Thursday, March 13, 2014

BI in the Airline Industry

Airlines are an industry that is known to leverage BI to its advantage. As discussed in the Data Warehouse Toolkit, the following is an airline’s bus matrix:

Over the next few years, airlines are planning on investing in information systems that will enable them to get to know their passengers better and deliver customized services to them. According to the 2013 SITA/Airline Business IT Trends Survey, all respondents to the survey said that they plan to invest in business intelligence (BI) solutions which will allow them to know more about their customers and have better information for decision making in their operations

Airlines are also investing in BI to improve operations and boost revenues. Similar to the retail industry, there is a strong desire to increase revenues using personalization. Nearly three quarters of airlines rate business intelligence for sales and marketing as a high priority.

Another innovative usage of BI in the airline industry is the optimization of fuel consumption. Mahindra Satyam. The IT services company is using its software framework to help some airlines save on fuel consumption. For example, it is helping a major Asia-Pacific airline through advance algorithms crunching several hundred thousand parameters and permutations to see how it can reduce fuel costs and meet tough EU emission requirements.

References:

Ralph Kimball, M. R. The Datawarehouse Toolkit: The Definitive Guide to Dimensional Modeling, 3rd Edition

Thursday, February 27, 2014

Self-service BI



Most BI solutions today require business users to look at standard pre-configured reports that have been put in place by the IT department. Self-service BI allows end users to generate their own customized reports without the intervention of the IT staff. This benefits both parties as it makes end users more independent while freeing up the time of the IT staff.

While gathering requirements, the IT staff may not completely understand how the data is required to be presented to the user. The self-service approach ensures that nothing is lost in translation.

SSBI solutions are intended to be used by users who do not have much technical knowledge. Two important factors in a self-service approach are:

  • User interface: The UI must be intuitive and easy to use.
  • Metadata dictionary: The IT staff needs to provide the users with a metadata dictionary for the users to understand how the data can be used.

Self-service BI sounds like the perfect solution as it empowers users to get things done without the involvement of the IT staff. However, there is a down side to the self-service approach.

Handing over the power to end users means having a lot more users accessing data and querying the system. Having multiple users firing queries and generating reports simultaneously could overload the system. Security is another concern as the system would need to ensure that access to sensitive data is controlled based upon a user’s position in the organization.

Another problem is that of organizing reports. Multiple analysts could each have multiple reports and analyses of the data. While making a decision it is important to have all the data in one place. Organizing the data could be an expensive and time-consuming process.

Self-service BI with its many privileges sounds promising and it does seem like the pros could outweigh the cons. There are several companies offering such solutions. For example, Business Objects' Crystal Reports and Microsoft’s project Gemini. However it is still early times and it isn't easy to forecast how SSBI could be implemented as a full-blown enterprise solution.

Refrences:

Thursday, February 13, 2014

Post #3: Dimensions They Are a-Changin’ (Slowly)

Dimensions change over time and there are well-established methods of handling these changes. In this blog post I will talk about some of the different methods of handling slowly changing dimensions (SCDs).


Type 0: Retain original

In this case we just retain the original value as the dimension’s value never changes. We use type 0 for things like credit score and date dimensions where we would never want to change the dimension. Type 0 is quite rare.

Type 1: Overwrite

With type 1, we just overwrite the original value of the dimension. It is used in cases where we don’t really need to record any history of the attribute’s value or in case we need to correct an error.

For example:
Before:
Supplier Key
Supplier Name
State
123142
Foo
AZ

After:
Supplier Key
Supplier Name
State
123142
Bar
AZ



Type 2: Add a new row

This is the most important type as it is encountered very frequently. With type 2, we record the history of changes. We need to add a 3 new columns to the table:
  1. row effective date or date/time stamp
  2. row expiration date or date/time stamp
  3. current row indicator.

So, if a type 2 attribute changes, we add a new row to the dimension table with the row effective date set to the current date and the row expiration date usually set to a far-off date like 9/9/9999. We also need to set the row expiration date of the affected row to the current date.

For example:
Before:
Supplier Key
Supplier Name
State
Row effective date
Row expiration date
Current row indicator
123142
Foo
AZ
1/7/2013
2/13/2014
Expired

After:
Supplier Key
Supplier Name
State
Row effective date
Row expiration date
Current row indicator
123149
Bar
AZ
2/13/2014
9/9/9999
Current


Type 3: Add a new attribute

For type 3, when changing the value of an attribute, we add a new column which will contain the new value. This approach is useful when we want to record changes but the frequency of change is low. It is not used as much as type 2.

For example:
Before:
Supplier Key
Supplier Name
State
123142
Foo
AZ

After:
Supplier Key
Old Supplier Name
State
Supplier Name
123142
Foo
AZ
Bar

Type 4: Add mini-dimension

This technique is used when a group of attributes in a dimension quickly changes. It is split off to a mini-dimension. The type 4 mini-dimension requires its own unique primary key; the primary keys of both the base dimension and mini-dimension are captured in the associated fact tables. The mini-dimension is the one that tracks all changes. The original dimension contains the current data.

For example:
Supplier Key
Supplier Name
State
123142
Foo
AZ


Supplier Key
Supplier Name
State
Row effective date
Row expiration date
123142
Bar
AZ
2/13/2013
2/13/2014
123142
Baz
AZ
2/13/2012
2/12/2014

Reference:

Ralph Kimball, M. R. The Datawarehouse Toolkit: The Definitive Guide to Dimensional Modeling, 3rd Edition.

Thursday, February 6, 2014

Post #2: Cloud based BI

I have been working on a project that involves cloud based BI. So for this post, I will be writing about what I have learnt from my experience so far.

Quite like the entire software industry, the trend to move towards the cloud has hit the BI industry. A lot of startups like GoodData, Birst and BIME have cloud products as their primary sources of revenue. Major players like Oracle, Microsoft, Microstrategy and SAP are slowly trying to expand into this space.

There are two variations of cloud BI tools – SaaS BI solutions which run on the cloud where all you need is an internet-enabled device, and desktop BI tools with an online option which allow you to directly share reports on the cloud. The former is definitely more interesting.

Cloud solutions for BI have advantages that are quite similar to cloud computing in general: 
  • No installation required: Most solutions offered are SaaS. This means you don’t need to install the software on the client machines and no hardware compatibility issues. 
  • High Scalability / Increased flexibility: An organization can quickly increase/decrease the number of users without much of a hassle. It can adapt to changing business needs.
  •  Lower costs: There is also no major expenditure like traditional IT environments. As cloud based BI solutions are offered as SaaS, there is no installation cost and no maintenance cost. Upgrades are also free. Of course cost also depends on the licensing model used. There is some variation here, vendors currently offer licenses on a ‘per user’ (with or without platform fee) and hourly basis.
  •  Easy access: As all the data is on the cloud, it can be accessed from anywhere through any device. A lot of BI solutions also offer a mobile app to view reports. This also makes for easier sharing of data and facilitates self-service BI.


However, cloud based solutions come with certain caveats:
  •  Integration of the tool with data sources is still difficult. Often, connectors need to be designed for specific organizational needs.
  • Data visualizations aren’t as powerful as their on premise counterparts. Cloud BI tools are getting there, they just aren’t there yet.
  • Security of data is not within the control of the organization. Organizations must rely on the vendor (who may actually do a better job actually).


As stated in this article from Forbes, soon, it won’t be a question of “yes” or “no” rather a question of when and how much. 

References:

Monday, January 27, 2014

Post #1

So, this semester I have taken a course on Business Intelligence. This is my blog about Business Intelligence. They say you learn a lesson better when you teach it. Hopefully, that holds and this blog helps me learn BI better.

Dimensional modeling

Dimensional modeling is the preferred method of representing analytic data. The two major goals of dimensional modeling are:
  •          Delivering data that is understandable to business users
  •          Delivering fast query performance

Dimensional modeling focuses on simplicity. Dimensional models are quite different from 3NF models of relational databases. 3NF models tend to emphasize on removal of data redundancy by organizing data in discrete tables that are related to each other. Dimensional models on the other hand, may have redundant data. The aim is to improve query performance and keep things simple and easy to understand.
So now that we know why dimensional modeling is used, let’s get into the fundamentals of data modeling.

Fact tables

Facts are a measure of an organization’s business processes. So in case of retail stores, individual sales would be facts. You could evaluate how well a retail store is doing based on its sales performance. All of these facts are stored in fact tables. A row in a fact table represents a measurement event; so in case of the retail store, that could be sale of an individual item or total sales to a customer depending upon the grain (level of detail).

As you may have figured, fact tables are usually deep i.e. they have a large number of rows (unless your retail store is not doing particularly well). Business Intelligence doesn’t bother itself with analysis of individual rows in a fact table. We would usually want to add up this data or perform some aggregation operations on it to draw meaningful conclusions from it. That is how we could address business questions like ‘how well is the store doing this quarter?’, ‘how much does the average customer spend?’, ‘what is our top product?’ etc. Fact tables have two or more foreign keys which have references to dimension tables. A combination of these foreign keys could be used to as a primary key (called a composite key).


Dimension Tables

Dimensions describe the entities involved in the events that are recorded in the fact table. So, in case of the retail store, we can think of date, product, customer or store as dimensions because they help describe a row in the fact table. Dimensions are used for grouping or limiting results (similar to a GROUP BY clause in SQL). We could find sales by month, sales by product etc.

Dimension tables are usually wide as they have a large number of columns. They usually have fewer rows than fact tables. Dimension tables have a primary key, which is usually referenced by multiple fact tables.

Distinguishing a fact from a dimension

In order to differentiate between facts and dimensions, think of facts as being continuously valued. So, dollar amounts of sales could assume any numeric value. Dimensions on the other hand have a discrete set of values, like store locations which can only assume a limited set of values.


This is how facts and dimensions are related in a star schema. The fact table is at the center and the dimension tables surround it.











Reference

Ralph Kimball, M. R. The Datawarehouse Toolkit: The Definitive Guide to Dimensional Modeling, 3rd Edition.