So, this semester I have taken a course on
Business Intelligence. This is my blog about Business
Intelligence. They say you learn a lesson better when you teach it. Hopefully,
that holds and this blog helps me learn BI better.
Dimensional modeling
Dimensional modeling is the preferred
method of representing analytic data. The two major goals of dimensional
modeling are:
- Delivering data that is
understandable to business users
- Delivering fast query
performance
Dimensional modeling focuses on simplicity.
Dimensional models are quite different from 3NF models of relational databases.
3NF models tend to emphasize on removal of data redundancy by organizing data
in discrete tables that are related to each other. Dimensional models on the
other hand, may have redundant data. The aim is to improve query performance
and keep things simple and easy to understand.
So now that we know why dimensional
modeling is used, let’s get into the fundamentals of data modeling.
Fact tables
Facts are a measure of an organization’s
business processes. So in case of retail stores, individual sales would be
facts. You could evaluate how well a retail store is doing based on its sales
performance. All of these facts are stored in fact tables. A row in a fact
table represents a measurement event; so in case of the retail store, that
could be sale of an individual item or total sales to a customer depending upon
the grain (level of detail).
As you may have figured, fact tables are
usually deep i.e. they have a large number of rows (unless your retail store is
not doing particularly well). Business Intelligence doesn’t bother itself with
analysis of individual rows in a fact table. We would usually want to add up
this data or perform some aggregation operations on it to draw meaningful conclusions
from it. That is how we could address business questions like ‘how well is the
store doing this quarter?’, ‘how much does the average customer spend?’, ‘what
is our top product?’ etc. Fact tables have two or more foreign keys which have references
to dimension tables. A combination of these foreign keys could be used to as a
primary key (called a composite key).
Dimension Tables
Dimensions describe the entities involved
in the events that are recorded in the fact table. So, in case of the retail
store, we can think of date, product, customer or store as dimensions because
they help describe a row in the fact table. Dimensions are used for grouping or
limiting results (similar to a GROUP BY clause in SQL). We could find sales by
month, sales by product etc.
Dimension tables are usually wide as they
have a large number of columns. They usually have fewer rows than fact tables. Dimension
tables have a primary key, which is usually referenced by multiple fact tables.
Distinguishing a fact from a dimension
In order to differentiate between facts and
dimensions, think of facts as being continuously valued. So, dollar amounts of
sales could assume any numeric value. Dimensions on the other hand have a discrete
set of values, like store locations which can only assume a limited set of
values.
This is how facts and dimensions are
related in a star schema. The fact table is at the center and the dimension
tables surround it.
Reference
Ralph Kimball, M. R. The Datawarehouse Toolkit: The Definitive Guide to Dimensional Modeling, 3rd Edition.