Slowly Changing Dimension In SSIS

There are various types of slowly changing dimensions:

  • SCD Type 0  – With slowly changing dimension type 0, the dimension attribute value never changes, so facts are always grouped by this original value i.e.it is least frequently used type.
  • SCD Type 1  – With slowly changing dimension type 1, the old attribute value in the dimension row is overwritten with the new value; type 1 attributes always reflects the most recent assignment, and therefore this technique destroys history.

Consider the following Example:

ID PlayerName Team
1 MSD Chennai Super King
2 Virat RCB

If the Player changes his team  over time, as you can see in the change MSD moved to Rising Pune Super gaints  Team below, the record will be updated and destroys history.

ID PlayerName Team
1 MSD Rising Pune Super gaints
2 Virat RCB
  • SCD Type 2 – Preserving the history while adding new rows is known as Type 2 SCD.
ID PlayerName Team Effective Date Expiration Date
1 MSD Chennai Super King 03/02/2007 03/02/2015
2 Virat RCB 03/02/2007
1 MSD Rising Pune Super gaints 03/02/2015

We  add effective and expiration dates to indicate a time period during which the record was active. If the expiration date is NULL it indicates the current active record.

  • SCD Type 3 –Additional columns are added in the table to keep the latest values whereas the older existing columns will have older values.You can see that by using only a single historical column, you can maintain only one historical value per column. So Type 3 SCD has limited usability and is far less popular than Types1 and 2.

 

ID PlayerName Current Team Previous Team
1 MSD Rising Pune Super gaints Chennai Super King
2 Virat RCB
  • Surrogate Key It is a substitution for the natural primary key. The surrogate is usually implemented as an integer, acts as a unique key or primary key for the dimension and gets associated to the fact table using foreign key constraint in it.Surrogate key is important to manage slowly changing dimension.

Using Slowly Changing Dimension Transformation

Let’s first create a Players table, which will have source data and add some data to it; as you can notice below, I have added the ID column, which is a primary key at source and will work as a business key.

CREATE TABLE [dbo].[Players](
 [ID] int PRIMARY KEY,
 [PlayerName] [varchar](50) NULL,
 [Team] [varchar](50) NULL,
) ON [PRIMARY]
GO
INSERT INTO [dbo].[Players] ([ID], [PlayerName], [Team])
VALUES
(1, 'MSD', 'Chennai Super King'),
(2, 'Virat', 'RCB')
GO
SELECT * FROM [dbo].[Players]

Now let’s create a dimension table  for storing Players data from the source; as you can notice below I have added a Player_key column as a surrogate key, effective date and expiration date for keeping track of historical changes. I have also added a CurrentFlag column to indicate the current active record:

CREATE TABLE [dbo].[DimPlayers](
 [Player_key] [int] IDENTITY(1,1) NOT NULL,
 [ID] int,
 [PlayerName] [varchar](50) NULL,
 [Team] [varchar](50) NULL,
 [EffectiveDate] [date] NULL,
 [ExpirationDate] [date] NULL,
 [CurrentFlag] [char](1) NULL,
 CONSTRAINT [PK_DimSupplier] PRIMARY KEY CLUSTERED ([Player_key] ASC)
) ON [PRIMARY]
GO
SELECT * FROM [dbo].[DimPlayers]

creating  an SSIS package to implement Slowly Changing dimension.

  1. Add  Data flow Task and making oledb Source Connection to retrieve data from the source table(Players).

step1

2. Click on the Next button to move on to the next screen ; in the next screen you need to first select the destination dimension table(Dim Players) and map the columns, Select the Business key as shown .

step2

PlayerName to be treated as SCD Type 1 and Team to be treated as SCD Type 2 as shown below:step-3

Click on Next button to move on to the next screen, map the start date and end date column to Effective and Expiration Date as shown below.

step4

Click on next

iffered-dimention

Please Click on the Next button to move on the next screen and then click on the Finish button to complete the task .

final-step

on execution the package for the first time you will notice that two records from  created source table will be loaded in the dimension table as you can see below:

executing-pkg

SELECT * FROM [dbo].[DimPlayers]

Player_key ID PlayerName Team EffectiveDate ExpirationDate CurrentFlag
----------- ----------- ------------------------------- -------------- -----------
1 1 MSD Chennai Super King 2016-10-11 NULL NULL
2 2 Virat RCB 2016-10-11 NULL NULL

(2 row(s) affected)

Now let’s go back to source (Players) and update some records. With the below given script, I am updating Team  for ID=1

UPDATE [dbo].[Players] SET [Team] = ‘Rising Pune Super gaints’
WHERE [ID] = 1

Now execute the package again and this time you will notice one record (New) has been inserted and one record (old) has been update

history-capture

SELECT * FROM [dbo].[DimPlayers]

Player_key ID PlayerName Team EffectiveDate ExpirationDate CurrentFlag
----------- ----------- --------------------------------------------------
1 1 MSD Chennai Super King 2016-10-11 2016-10-11 NULL
2 2 Virat RCB 2016-10-11 NULL NULL
3 1 MSD Rising Pune Super gaints 2016-10-11 NULL NULL

(3 row(s) affected)

Thanks 🙂

Hope you guys like it.

Advertisements

3 Comments

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s