We all know that SQL Server Integration Services (SSIS) is the ETL (Extraction, Transformation and Loading) tool widely used in many industries for data migration tasks. A SSIS package is the collection of Connections, Control Flow Tasks, Data Flow Task, Event Handlers, Parameters, Variables and Configurations.
Data flow task consists of source, destination and transformation components to transfer the data from one place to another place with necessary transformations. The data flow task will fail if we get error rows while executing the SSIS package. Users have an option to redirect the error rows to an error path or to a SQL table in an SSIS data flow task. However, even though the Errorcode and the ErrorColumn ID are available, the name of the column for the error row will not be available to the user. So, the question remains, how do we get name of the error row?
In earlier versions of SSIS, there was no easy way to get the error column name. In SSIS 2016 and later versions, we can achieve this by using a method called ‘GetIdentificationStringByID‘ in the Script component in Data Flow.
To elaborate the above said new feature of data flow task in SSIS 2016, let’s look at an example below:
I have the below customer information in excel which needs to be loaded into a SQL table for which I need to create a table in SQL Server
CustomerName | City | PhoneNumber |
X | CHENNAI | 9876543210 |
Y | NEW DELHI | 7891234560 |
We have to define the column names and datatypes to create a table in SQL, hence I have referred the current values in the above table and created the sql table by running the below SQL query in SQL Server Management Studio.
CREATE TABLE [dbo].[customer] (
[CUSTID] [int] IDENTITY (1000,1) NOT NULL,
[CUSTNAME] [varchar] (50) NULL,
[CITY] [varchar] (25) NULL,
[PHONE] [bigint] NULL
)
Once the table is created, then I have to import the customer information into SQL table using ImportExport wizard, and the final sql table called ‘dbo.customer’ as shown below:
But the process usually involves more than 2 customers. We receive many excel files with different customer information. Loading lots of data into the sql table 2 at a time makes this task repetitive and difficult.
In this instance I would recommend using one of the ETL tools called SSIS to assist users with this repetitive work by automating it. ‘ForEachLoop Container’ in SSIS will load all the files into SQL table. (To know how to create SSIS package with ForEachLoop Container, please refer the article https://www.red-gate.com/simple-talk/sql/ssis/ssis-basics-introducing-the-foreach-loop-container/).
The SSIS package was created and scheduled to run every morning to pick up the customer information and store it in SQL table. But, unfortunately one day the package failed with an error ‘The value could not be converted because of a potential loss of data’. We understood that the issue was due to the conversion of data while transferring from source to destination. Though the message provided us with the issue, it was very difficult to identify the specific column or row which caused the issue.
To identify the root cause of the issue, I validated the excel file. In that excel file, one of the customers information had the phone number with a country code i.e. +91-9087654321. This caused the data conversion error in SSIS package.
CustomerName | City | PhoneNumber |
ZZ | CHENNAI | +91-9087654321 |
I fixed the issue and re processed the job by altering the sql table column’s (‘[PHONE]’ in sql table) data type to varchar (14), initially it was bigint.
In our example, we had only 3 columns which was too less to investigate, hence we figured out the column of the row that caused the error and fixed it easily. But ideally the sql transaction tables will have numerous columns for which we need a standard way to figure the column names of the rows that cause an error.
As I said above, we can achieve this by using a method called ‘GetIdentificationStringByID‘ in the Script component in Data Flow which is one of the new features of SSIS 2016.
The below article will show you how we can get the column name for error rows in a SSIS 2016 Data Flow task with another example in detail and also with screenshots.
http://www.sqlservercentral.com/articles/Integration+Services+(SSIS)/178441/