Maximizing SQL Server Integration Services: Adding Functionality
April 6, 2018
After a bit of a spring break, I’m back! Believe me when I say March Madness is not limited to the basketball court. Hopefully, you have read through the first three blogs in this series and are anxiously waiting for this blog. (Well, I’d like to think you are.) In case you’ve found your way directly to this page, this is Part 4 in a series to help you get the most from SQL Server Integration Services (SSIS). In Part 1, technical design was the focus, including thoughts about data discovery, building an intermediate database, such as an operational data store, and determining whether additional logging is required in your solution. Between design and implementation, there are housekeeping tasks to be completed, such as building out the SSIS project shell (Part 2) and creating an SSIS template (Part 3).
At this point, the SSIS project has been configured, and it is time to really dig into the implementation process. Look at all of these beautiful controls in the SSIS Toolbox. With so many options, where does one start? No, seriously, where do I start!?
In the previous blog, we built out an SSIS template which focused on the Control Flow. Controls we used were the Execute SQL Task and a Sequence Container. The Data Flow Task was introduced, but we didn't do anything with it, and we even disabled it in the template to prevent potential build issues. Let's start with the Data Flow Task and look at a couple of the controls available within it. As you look at the SSIS Toolbox on your computer, there are more items in the toolbox than I'm showing, including other types of transformations and a number of source and destination controls. The Favorites list has the controls I use most often in SSIS. This blog will look at the OLE DB Source and the Lookup transformation, as they hold more power than some may realize. They also may help you overcome some design hurdles in your solution.
OLE DB Source
When defining a connection manager to connect to a relational database system, like SQL Server, Oracle, MySQL, etc., there are two common types of drivers: OLE DB and ADO.Net. OLE DB has been around forever (or close to it), as pre-.Net Windows development languages, like Visual Basic 6.0, used OLE DB drivers to access a database. With that shiny new .Net platform (now over 15 years old and the underlying development platform of the SQL Server Data Tools application), database drivers have been built to run natively in .Net applications. For this blog, I’m going to use old reliable, OLE DB. It makes sense, then, that to use the OLE DB connection manager, we need to use the OLE DB Source.
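For reference, an OLE DB connection string for SQL Server looks something like the sketch below. The server and database names are placeholders, and the provider shown is the SQL Server Native Client; your environment may use a different installed provider.

```
Provider=SQLNCLI11;Data Source=MyServer;Initial Catalog=MyDatabase;Integrated Security=SSPI;
```

The connection manager builds this string for you from the editor's fields, but recognizing the pieces helps later when making connection strings dynamic with expressions.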
The editor is easy enough to use, as you're up and running in as little as four mouse clicks. Choose the connection manager (and click OK), select the table or view containing the data for the package, and click OK to close the editor. Hurray, the source data has been selected, and it was like hitting the Easy Button. However, there are some lingering questions. After all, you are a data professional.
- The table I selected has 50 columns, but the package only needs five of them. Do I have to pull all the fields?
- Besides having 50 columns, there are millions of records in the table, and we only need a small percentage for this project. Can I filter the rows at the source?
- I’d really like to join a few tables together, but the database is managed by a vendor and my DBA won’t allow views to be created.
In the rush to get going, the Data Access mode was glossed over. Clicking the drop-down list displays three additional options. The two options using a variable will be ignored in this blog, as I haven't had the need to use them. Let's choose the SQL Command option.
Now, we have some flexibility. In the SQL command text area, you can paste in a query to get exactly what you need from the data source and nothing more. We don't have to stop here with this editor. Let's say this SSIS package needs to run for different delivery methods at different times. This is not a problem for SSIS. In Part 2, when we defined the project, we added Project Parameters to store connection strings to our databases. A new parameter could be added to allow the person running the SSIS package to change the delivery method. In this example, 'Post' would be replaced (single quotes and all) with ? (no quotes). This is a placeholder in SSIS for parameters and serves the same purpose as a parameter, @DeliveryMethod, in a stored procedure. Click on the Parameters… button. The editor has detected the parameter placeholder and added a mapping. Choose the project parameter and click OK. Now, when the package runs, it will reference the DeliveryMethod project parameter to get the actual value to use.
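As a sketch, the query in the SQL command text area might look like the following. The table and column names here are hypothetical, used only to illustrate where the ? placeholder goes.

```sql
-- Hypothetical table and columns, for illustration only.
-- The ? is the OLE DB parameter placeholder that the Parameters... button
-- maps to the DeliveryMethod project parameter.
SELECT OrderID, CustomerID, OrderDate, TotalDue
FROM dbo.Orders
WHERE DeliveryMethod = ?;
```

At runtime, SSIS substitutes the mapped parameter value, just as SQL Server would substitute @DeliveryMethod when executing a parameterized stored procedure.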
In the event data is needed from a second data source, such as a user-managed Excel worksheet or a database on another server, the Lookup transformation can be used to provide an additional field or five to the data stream. While the Lookup is a great transformation to use, it can be memory-intensive depending on how much data is in this second data source. When the Data Flow task performs the pre-execute phase, by default, the lookup will query the source and pull the results into memory (full cache), as it’s much faster than looking up each individual record, which is possible with this transformation (no cache mode).
In full cache mode, the editor allows you to define a query, but it doesn't permit parameters to be included in the query. If you work in a company with multiple subsidiaries or clients stored in the same database and the data flow is only concerned with a single subsidiary at runtime, by default, you'd have to pull in the lookup records across all entities. Fortunately, SSIS does give us a workaround, though it's not obvious. On control flow tasks, including the entirety of the Data Flow Task, the controls' properties include an Expressions collection. For connection managers, the Expressions allow connection strings to be more dynamic by referencing a project parameter. The same is true for most data flow transformations, but the Expressions collection lives on the Data Flow Task itself, not on the individual transformation.
To start, create a new variable to hold the query that goes into the Lookup transformation and open the Expression Editor. Place your query inside double quotes. In the place of the static value, concatenate in another parameter (project or package-level) or another variable that changes on each execution. If the field is a string, don't forget the single quotes: close the double-quoted literal, concatenate the variable, and reopen the literal so that a single quote lands on each side of the value.
Tip: If the variable being used is not a string, a type cast is required. Directly in front of the variable or parameter, include "(DT_STR, <string length>, <code page>)". Working on a U.S.-based system and converting an integer, this type cast for me is (DT_STR, 10, 1252), where the length of 10 will hold all possible values from the integer. The string length can be adjusted as needed.
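Putting those two points together, the variable's expression might look like one of the lines below. The table, column, parameter, and variable names are hypothetical; the first line shows the cast for an integer parameter, and the second shows the single-quote handling for a string variable.

```
"SELECT CustomerID, Region FROM dbo.Customers WHERE SubsidiaryID = " + (DT_STR, 10, 1252) @[$Project::SubsidiaryID]

"SELECT CustomerID, Region FROM dbo.Customers WHERE RegionCode = '" + @[User::RegionCode] + "'"
```

Click Evaluate Expression in the editor to confirm the resulting string is the query you expect before wiring it to the Lookup.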
With the query defined, locate the properties of the Data Flow Task in which you are working. About ¾ of the way down the list, by default, is the Expressions collection. Click on the ellipsis to open the Property Expressions Editor. In the Property drop-down list, locate the lookup transformation and specifically the SqlCommand property (Ex: Lookup.SqlCommand). Open the Expression Builder and add the previously created variable to the expression.
At this point, if you previously had a query in the lookup, it will be overwritten with the query defined in the variable. Open the Lookup’s editor and continue configuring the transformation with the field mapping and selection of the field(s) needed.
With the customizations to these two controls, we have made our SSIS package much more flexible, reducing or eliminating the need to create multiple packages that are the same but with different data subsets. Continue to think about how variables and parameters can be introduced into your SSIS solution.