Dimensions change over time and there are
well-established methods of handling these changes. In this blog post I will
talk about some of the different methods of handling slowly changing dimensions
(SCDs).
Type 0: Retain original
In this case we just retain the original
value as the dimension’s value never changes. We use type 0 for things like
credit score and date dimensions where we would never want to change the
dimension. Type 0 is quite rare.
Type 1: Overwrite
With type 1, we just overwrite the original
value of the dimension. It is used in cases where we don’t really need to
record any history of the attribute’s value or in case we need to correct an
error.
For example:
Before:
Supplier Key
|
Supplier Name
|
State
|
123142
|
Foo
|
AZ
|
After:
Supplier Key
|
Supplier Name
|
State
|
123142
|
Bar
|
AZ
|
Type 2: Add a new row
This is the most important type as it is encountered
very frequently. With type 2, we record the history of changes. We need to add
a 3 new columns to the table:
- row effective date or date/time stamp
- row expiration date or date/time stamp
- current row indicator.
So, if a type 2 attribute changes, we add a
new row to the dimension table with the row effective date set to the current
date and the row expiration date usually set to a far-off date like 9/9/9999.
We also need to set the row expiration date of the affected row to the current
date.
For example:
Before:
Supplier Key
|
Supplier Name
|
State
|
Row effective date
|
Row expiration date
|
Current row indicator
|
123142
|
Foo
|
AZ
|
1/7/2013
|
2/13/2014
|
Expired
|
After:
Supplier Key
|
Supplier Name
|
State
|
Row effective date
|
Row expiration date
|
Current row indicator
|
123149
|
Bar
|
AZ
|
2/13/2014
|
9/9/9999
|
Current
|
Type 3: Add a new attribute
For type 3, when changing the value of an
attribute, we add a new column which will contain the new value. This approach
is useful when we want to record changes but the frequency of change is low. It
is not used as much as type 2.
For example:
Before:
Supplier Key
|
Supplier Name
|
State
|
123142
|
Foo
|
AZ
|
After:
Supplier Key
|
Old Supplier Name
|
State
|
Supplier Name
|
123142
|
Foo
|
AZ
|
Bar
|
Type 4: Add mini-dimension
This technique is used when a group of attributes
in a dimension quickly changes. It is split off to a mini-dimension. The type 4
mini-dimension requires its own unique primary key; the primary keys of both
the base dimension and mini-dimension are captured in the associated fact
tables. The mini-dimension is the one that tracks all changes. The original
dimension contains the current data.
For example:
Supplier Key
|
Supplier Name
|
State
|
123142
|
Foo
|
AZ
|
Supplier Key
|
Supplier Name
|
State
|
Row effective date
|
Row expiration date
|
123142
|
Bar
|
AZ
|
2/13/2013
|
2/13/2014
|
123142
|
Baz
|
AZ
|
2/13/2012
|
2/12/2014
|
Reference:
Ralph Kimball, M. R. The Datawarehouse Toolkit: The Definitive Guide to Dimensional Modeling, 3rd Edition.
No comments:
Post a Comment