Monday, January 27, 2014

Post #1

So, this semester I have taken a course on Business Intelligence. This is my blog about Business Intelligence. They say you learn a lesson better when you teach it. Hopefully, that holds and this blog helps me learn BI better.

Dimensional modeling

Dimensional modeling is the preferred method of representing analytic data. The two major goals of dimensional modeling are:
  •          Delivering data that is understandable to business users
  •          Delivering fast query performance

Dimensional modeling focuses on simplicity. Dimensional models are quite different from 3NF models of relational databases. 3NF models tend to emphasize on removal of data redundancy by organizing data in discrete tables that are related to each other. Dimensional models on the other hand, may have redundant data. The aim is to improve query performance and keep things simple and easy to understand.
So now that we know why dimensional modeling is used, let’s get into the fundamentals of data modeling.

Fact tables

Facts are a measure of an organization’s business processes. So in case of retail stores, individual sales would be facts. You could evaluate how well a retail store is doing based on its sales performance. All of these facts are stored in fact tables. A row in a fact table represents a measurement event; so in case of the retail store, that could be sale of an individual item or total sales to a customer depending upon the grain (level of detail).

As you may have figured, fact tables are usually deep i.e. they have a large number of rows (unless your retail store is not doing particularly well). Business Intelligence doesn’t bother itself with analysis of individual rows in a fact table. We would usually want to add up this data or perform some aggregation operations on it to draw meaningful conclusions from it. That is how we could address business questions like ‘how well is the store doing this quarter?’, ‘how much does the average customer spend?’, ‘what is our top product?’ etc. Fact tables have two or more foreign keys which have references to dimension tables. A combination of these foreign keys could be used to as a primary key (called a composite key).


Dimension Tables

Dimensions describe the entities involved in the events that are recorded in the fact table. So, in case of the retail store, we can think of date, product, customer or store as dimensions because they help describe a row in the fact table. Dimensions are used for grouping or limiting results (similar to a GROUP BY clause in SQL). We could find sales by month, sales by product etc.

Dimension tables are usually wide as they have a large number of columns. They usually have fewer rows than fact tables. Dimension tables have a primary key, which is usually referenced by multiple fact tables.

Distinguishing a fact from a dimension

In order to differentiate between facts and dimensions, think of facts as being continuously valued. So, dollar amounts of sales could assume any numeric value. Dimensions on the other hand have a discrete set of values, like store locations which can only assume a limited set of values.


This is how facts and dimensions are related in a star schema. The fact table is at the center and the dimension tables surround it.











Reference

Ralph Kimball, M. R. The Datawarehouse Toolkit: The Definitive Guide to Dimensional Modeling, 3rd Edition.

No comments:

Post a Comment