SQL Server - A Practical Introduction to SSIS
DTS or Data Transformation Services is a powerful and flexible tool shipped with the SQL Server. The latest version of the SQL Server (SQL Server 2005) ships with SSIS or SQL Server Integration Services, a remodeled version of DTS with extra features. We will go over SSIS concepts in this chapter and learn to work with SSIS through a simple example.
DTS and SSIS allow users to automate data import, export from a variety of sources, perform sundry transformations, and format related changes during the process. An SSIS package consists of a series of tasks related to this sort of data extraction and transformation; the tasks are performed in a pre-defined order. Constituent parts of DTS/SSIS tasks are called DTS/SSIS objects and may be related broadly to either flow of control (the order in which execution happens) or data flow (where is the information extracted from? how? where does it go?).
DTS objects may contain connection managers that handle connection to the data source via configuration parameters such as location, type, information on columns and data etc. Variables are placeholders for entities and parameters that change during the course of the execution of the package. Event handlers contain code that executes every time a certain predefined event occurs and log providers record execution related information. Following are some basic DTS objects with short descriptions:
- Containers - Objects that group objects into a common 'bin'. Every SSIS package is a container that holds objects; other SSIS containers include ForEach loops, for loops, and event handlers.
- Destination Adapters - Destination adapters contain information about the output format and settings and load data from a variety of sources into the specified output format (Excel file, flat file, SQL Server etc.).
- Precedence Constraints - These evaluate certain conditions and use the results to order tasks for execution. The status of current task (success, failure etc.) often determines the flow of control at points where there are branches (i.e. a choice of two or more tasks)
- Source Adapters - These integral components actually make data fit for transfer into SQL Server or other formats by adapting data housed in several sources (flat text files, Excel files, other SQL Server databases etc.). Scripting components allow developers more control over this process; developers may use a scripting component to house code that readies data in a custom proprietary format for transfer.
- Tasks - In large part, SSIS tasks deal with data extraction and loading. They may allow the package to communicate with other objects and Windows entities to accomplish goals. Often used tasks include Data Flow tasks, Workflow tasks, Scripting tasks, and Database Maintenance tasks.
- Transformations - Data transformations include aggregations, sorting, merging, grouping etc. on the simple level. SSIS also allows a wide variety of complex transformations.
We will work with a books database over this tutorial. In this chapter, we will learn to import data into SQL Server through an SSIS package that converts Text files to SQL Server database table rows. We will also learn to schedule a job that executes the package at pre-defined intervals of time via the SQL Server Agent. Download the files author1.txt and author2.txt onto an appropriate directory on your computer.
Next, create an 'author' table in SQL Server. We will go over table creation in detail in the next chapter; for now, let us just create a table in the 'master' system database. Open SQL Server Management Studio through the corresponding entry in the SQL Server program group in the 'Start' Menu. First, connect to the default database either when Management Studio automatically opens the 'Connect' dialogue box or by right-clicking the corresponding entry in the object explorer and clicking 'Connect'.
Next, click on the 'New Query' icon on the left hand side of the top bar. A new blank window will open in the right hand panel. Copy and paste the following command into the window and click the '!' Execute icon on the top bar or simply hit the F5 key on your keyboard. This action will create a table named 'author' within the master database with columns authorid (an automatically increasing counter), lastname, and firstname. You may browse this table by expanding the 'Tables' folder within the 'Master' database within 'System Databases' (you may have to right click the folder and click 'Refresh' first).
USE [master] GO SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[author]( [authorid] [int] IDENTITY(1,1) NOT NULL, [lastname] [varchar](100) NOT NULL, [firstname] [varchar](100) NOT NULL, CONSTRAINT [PK_author] PRIMARY KEY CLUSTERED ( [authorid] ASC )WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY] ) ON [PRIMARY] GO
We are now ready to create our SSIS package. Go to the SQL Server program group and click on the Business Intelligence Development Studio Entry. The Visual Studio window will open up. Click on 'New Project'. In the New project dialogue, enter suitable values next to the 'Name:', 'Location:', and 'Solution Name:' select 'Integration Services Project'. Click the OK button.
The following screen will open. This screen's main tab level should say 'package.dtsx'; we are working on the SSIS package creation GUI. Right click on the bottom 'Connection Managers' tab and select 'New Flat File Connection' from the list.
The flat file connection manager dialogue should open. The 'File Name:' field should reflect the name and location of author1.txt on your computer. Select the other fields as shown in the next screen shot.

Flat File Connection Manger General Properties
If you open the 'author1.txt' file, you will see a tab separated list of author First Names and Last Names. We will need to enter this specification in our flat file connection manager. Click on the 'Advanced' Properties tab on the left hand panel of the Connection Manger Editor. You will see default names (column1 and column2) for each column. Click on each column name to view properties for the column. Change the names to 'firstname' and 'lastname'. The ',' character should be automatically chosen in the 'Column Delimiter' Field for column 1. For the second column, this field should contain '{CR}{LF}'. Enter a length of 50 for each field in the 'OutputColumnWidth' field. Make sure that the DataType field contains 'string[DT_STR}'. Once you have specified these details, click OK. You will now see a small icon with the name you specified in the 'Connection Managers' panel.

Flat File Connection Advanced Properties
Let us now create a destination connection manager that points to our SQL Server author table. Right click on the 'Connection Managers' panel once more. This time, select 'OLE DB Connection'. In the box that opens, select 'Native OLE DB\SQL Native Client' next to 'Provider'. Next, enter 'localhost' under the server name for your local default server and select or enter 'master' under 'Connect to Database' under the checked 'Select or Enter a Database Name' Radio Button. Click OK after testing the connection. Leave the next dialogue as is and create the connection. You have now created an OLE DB Connection Manger. You will now see an icon for the new connection in the 'Connection Managers' Panel.
The next step is to create a data flow object to facilitate data transfer between our two connections. This part is simple. Make sure that you are within the 'control flow' tab. Place your mouse over the 'toolbox' icon on the left hand side to view a list of available SSIS tasks. Scroll down and click on 'Data Flow Task'; hold the mouse button down and drag an instance of the task onto the central panel (you can access the toolbox through View -> Toolbox also). Enter a suitable name for the task. Once you have entered a name and clicked OK, you should see an icon labeled with the name you assigned to the task appear on the control flow tab. The green arrow underneath the task allows you hook up the successful completion of this task with the start of the next.

Data Flow Task Icon After Creation
You may now specify input and output for the data flow task. Double-click on the data flow task to get to the 'Data Flow Task' Tab to edit your task. Now, position your mouse over the 'toolbox' icon on the left hand side. Scroll down and click on 'Flat File Source' under data flow source. Hold the mouse button down and drag an instance of the Source onto the central panel. Double-click on the source icon to open the connection dialogue. In the initial dialogue of the Flat File Source editor, select the name of the flat file connection manager we created earlier by clicking on the down arrow at the end of the 'Flat File Connection Manger' box. Now, click on 'columns' in the left hand tab and make sure both columns are checked. Click OK.
The red cross in the Data Source task is now a green check. Now, click on the connection point at the center of the bottom border of the Source box and connect it to the OLE DB Destination box. You will see the two items connected by a green arrow.

A data flow task
Next, double click on the destination icon and specify basic properties. Select the localhost.master connection we created earlier. Select [dbo].author in the 'Name of Table or View' Field. Leave the default values in the other entries. Click on 'Mappings' in the left hand tab. Since matching names were provided, the columns in the source and destination should already be mapped. If not, simply click on 'firstname' in the source box and hold the mouse down. Drag the mouse onto the 'firstname' column in the destination box and let go.
Your simple package is now ready. It will import data from the specified flat file to SQL Server. Let us add an extra function to the package. Large databases often use more than one flat file as a source. The author information has been split into author1.txt and author2.txt for demonstration. Let us use a 'foreach' task to import data into the author table from all author flat files in a directory. Save your package by clicking 'save all' or right clicking on the package tab and selecting 'Save..'. Now, click on the control flow tab once again. Move over the toolbox and drag a 'foreach container' onto the work area.
Double click on the container. In the editor dialogue, select 'Foreach File enumerator' next to 'Enumerator'. Select or enter the directory where you have saved the author1.txt and author2.txt files under 'folder'. Type author*.txt under 'Files'. It is better that you use a folder that just holds author data files as we have just set up our package to loop through all files in the directory of format author<some_string>.txt. Next, click on the 'Variable Mappings' tab. Enter a name for a new variable; create it and map it to index 0. This step causes the name and path of the file being processed to be held in the variable you just created and assigned.

Foreach Collection Properties Editor

Foreach Variable Mapping
Next, drag the data flow task into the foreach container as shown below.

Foreach Container with Data Flow Task
Now, right click on the name of the Flat File Connection Manager in the bottom panel and select properties. You will see the properties window for this object in the right hand panel (usually the bottom half of this panel) of BIDS. Click on the ellipsis '...' symbol next to the connection String Property. An expression dialogue opens up. Now, map the connection string property to the variable you created earlier and click OK. You should see the value of the first file in your list in the Connection String box. Now, your package will take input from each author*.txt file in turn as the flat file connection manager for the source refers to a dynamic variable rather than a single source. Save the package. Build it using the top bar 'Build' menu.

Properties of Flat File Connection Manager

Set ConnectionString to Dynamic Variable.
A DBA may expect another process to add data files to a certain location at specific times during the day. To create a job that runs the package automatically at set intervals of time, open management studio, connect to the default database (the database that contains the author table we created earlier) and expand the SQL Server Agent. Right-click Jobs and then click New. In the new job dialogue, select 'Enabled'.
Enter a suitable name next to 'Step Name' and select 'SQL Server Integration Services package execution' from the 'Type' list. Click on the general tab and use file system as the package source. Click the ellipsis '...' icon next to package and browse to the location where you saved the package. Select the Windows account you normally connect to and administer your database from the 'Run as' list. Click 'OK' to Save your job with an appropriate name. Refresh to see your new job under the 'Jobs' folder
Right click on your new job and click on properties. Select the 'Schedules' tab. Click New. Type a name for the schedule. Enable the job by clicking the 'Enable box' if it is not already checked by default. Click the 'Recurring' radio button. Click 'change...' and enter the appropriate values in the Frequency, Daily Frequency, and Duration groups. Apply the changes and save your job by clicking OK.
This tutorial should have given you a basic understanding of the capabilities of SSIS. Typical SSIS packages are many times more complex than ours. Logging on or executing on failure links may be used to handle task failure. Processed files may be moved to a separate folder or even deleted in order to prevent duplicate inserts. Several tasks may be sequentially connected to create a regular program flow.










