Thursday, February 13, 2014

Post #3: Dimensions They Are a-Changin’ (Slowly)

Dimensions change over time and there are well-established methods of handling these changes. In this blog post I will talk about some of the different methods of handling slowly changing dimensions (SCDs).


Type 0: Retain original

In this case we just retain the original value as the dimension’s value never changes. We use type 0 for things like credit score and date dimensions where we would never want to change the dimension. Type 0 is quite rare.

Type 1: Overwrite

With type 1, we just overwrite the original value of the dimension. It is used in cases where we don’t really need to record any history of the attribute’s value or in case we need to correct an error.

For example:
Before:
Supplier Key
Supplier Name
State
123142
Foo
AZ

After:
Supplier Key
Supplier Name
State
123142
Bar
AZ



Type 2: Add a new row

This is the most important type as it is encountered very frequently. With type 2, we record the history of changes. We need to add a 3 new columns to the table:
  1. row effective date or date/time stamp
  2. row expiration date or date/time stamp
  3. current row indicator.

So, if a type 2 attribute changes, we add a new row to the dimension table with the row effective date set to the current date and the row expiration date usually set to a far-off date like 9/9/9999. We also need to set the row expiration date of the affected row to the current date.

For example:
Before:
Supplier Key
Supplier Name
State
Row effective date
Row expiration date
Current row indicator
123142
Foo
AZ
1/7/2013
2/13/2014
Expired

After:
Supplier Key
Supplier Name
State
Row effective date
Row expiration date
Current row indicator
123149
Bar
AZ
2/13/2014
9/9/9999
Current


Type 3: Add a new attribute

For type 3, when changing the value of an attribute, we add a new column which will contain the new value. This approach is useful when we want to record changes but the frequency of change is low. It is not used as much as type 2.

For example:
Before:
Supplier Key
Supplier Name
State
123142
Foo
AZ

After:
Supplier Key
Old Supplier Name
State
Supplier Name
123142
Foo
AZ
Bar

Type 4: Add mini-dimension

This technique is used when a group of attributes in a dimension quickly changes. It is split off to a mini-dimension. The type 4 mini-dimension requires its own unique primary key; the primary keys of both the base dimension and mini-dimension are captured in the associated fact tables. The mini-dimension is the one that tracks all changes. The original dimension contains the current data.

For example:
Supplier Key
Supplier Name
State
123142
Foo
AZ


Supplier Key
Supplier Name
State
Row effective date
Row expiration date
123142
Bar
AZ
2/13/2013
2/13/2014
123142
Baz
AZ
2/13/2012
2/12/2014

Reference:

Ralph Kimball, M. R. The Datawarehouse Toolkit: The Definitive Guide to Dimensional Modeling, 3rd Edition.

No comments:

Post a Comment