There are various types of slowly changing dimensions:
- SCD Type 0 – With slowly changing dimension type 0, the dimension attribute value never changes, so facts are always grouped by this original value i.e.it is least frequently used type.
- SCD Type 1 – With slowly changing dimension type 1, the old attribute value in the dimension row is overwritten with the new value; type 1 attributes always reflects the most recent assignment, and therefore this technique destroys history.
Consider the following Example:
|1||MSD||Chennai Super King|
If the Player changes his team over time, as you can see in the change MSD moved to Rising Pune Super gaints Team below, the record will be updated and destroys history.
|1||MSD||Rising Pune Super gaints|
- SCD Type 2 – Preserving the history while adding new rows is known as Type 2 SCD.
|ID||PlayerName||Team||Effective Date||Expiration Date|
|1||MSD||Chennai Super King||03/02/2007||03/02/2015|
|1||MSD||Rising Pune Super gaints||03/02/2015|
We add effective and expiration dates to indicate a time period during which the record was active. If the expiration date is NULL it indicates the current active record.
- SCD Type 3 –Additional columns are added in the table to keep the latest values whereas the older existing columns will have older values.You can see that by using only a single historical column, you can maintain only one historical value per column. So Type 3 SCD has limited usability and is far less popular than Types1 and 2.
|ID||PlayerName||Current Team||Previous Team|
|1||MSD||Rising Pune Super gaints||Chennai Super King|
- Surrogate Key It is a substitution for the natural primary key. The surrogate is usually implemented as an integer, acts as a unique key or primary key for the dimension and gets associated to the fact table using foreign key constraint in it.Surrogate key is important to manage slowly changing dimension.
Using Slowly Changing Dimension Transformation
Let’s first create a Players table, which will have source data and add some data to it; as you can notice below, I have added the ID column, which is a primary key at source and will work as a business key.
CREATE TABLE [dbo].[Players]( [ID] int PRIMARY KEY, [PlayerName] [varchar](50) NULL, [Team] [varchar](50) NULL, ) ON [PRIMARY] GO INSERT INTO [dbo].[Players] ([ID], [PlayerName], [Team]) VALUES (1, 'MSD', 'Chennai Super King'), (2, 'Virat', 'RCB') GO SELECT * FROM [dbo].[Players]
Now let’s create a dimension table for storing Players data from the source; as you can notice below I have added a Player_key column as a surrogate key, effective date and expiration date for keeping track of historical changes. I have also added a CurrentFlag column to indicate the current active record:
CREATE TABLE [dbo].[DimPlayers]( [Player_key] [int] IDENTITY(1,1) NOT NULL, [ID] int, [PlayerName] [varchar](50) NULL, [Team] [varchar](50) NULL, [EffectiveDate] [date] NULL, [ExpirationDate] [date] NULL, [CurrentFlag] [char](1) NULL, CONSTRAINT [PK_DimSupplier] PRIMARY KEY CLUSTERED ([Player_key] ASC) ) ON [PRIMARY] GO SELECT * FROM [dbo].[DimPlayers]
creating an SSIS package to implement Slowly Changing dimension.
- Add Data flow Task and making oledb Source Connection to retrieve data from the source table(Players).
2. Click on the Next button to move on to the next screen ; in the next screen you need to first select the destination dimension table(Dim Players) and map the columns, Select the Business key as shown .
PlayerName to be treated as SCD Type 1 and Team to be treated as SCD Type 2 as shown below:
Click on Next button to move on to the next screen, map the start date and end date column to Effective and Expiration Date as shown below.
Click on next
Please Click on the Next button to move on the next screen and then click on the Finish button to complete the task .
on execution the package for the first time you will notice that two records from created source table will be loaded in the dimension table as you can see below:
SELECT * FROM [dbo].[DimPlayers] Player_key ID PlayerName Team EffectiveDate ExpirationDate CurrentFlag ----------- ----------- ------------------------------- -------------- ----------- 1 1 MSD Chennai Super King 2016-10-11 NULL NULL 2 2 Virat RCB 2016-10-11 NULL NULL (2 row(s) affected)
Now let’s go back to source (Players) and update some records. With the below given script, I am updating Team for ID=1
UPDATE [dbo].[Players] SET [Team] = ‘Rising Pune Super gaints’
WHERE [ID] = 1
Now execute the package again and this time you will notice one record (New) has been inserted and one record (old) has been update
SELECT * FROM [dbo].[DimPlayers]
Player_key ID PlayerName Team EffectiveDate ExpirationDate CurrentFlag ----------- ----------- -------------------------------------------------- 1 1 MSD Chennai Super King 2016-10-11 2016-10-11 NULL 2 2 Virat RCB 2016-10-11 NULL NULL 3 1 MSD Rising Pune Super gaints 2016-10-11 NULL NULL (3 row(s) affected)
Hope you guys like it.