Star vs Snowflake


 

Deference between OLTP database and OLAP data warehouse:

Data warehouse database OLTP database
Designed for analysis of business measures by categories and attributes Designed for real time business operations.
Optimized for bulk loads and large, complex, unpredictable queries that access many rows per table. Optimized for a common set of transactions, usually adding or retrieving a single row at a time per table.
Loaded with consistent, valid data; requires no real time validation Optimized for validation of incoming data during transactions; uses validation data tables.
Supports few concurrent users relative to OLTP Supports thousands of concurrent users.

The question looks a bit absurd if you are an expert BI practitioner. However, that is one of the very basic question most of the beginners unable to answer.

Data warehouse

Data warehousing is the process of collecting data from multi-varied sources of an organizational and reposting it into one comprehensive and easily manipulated database. And data warehouse is a repository of an organization’s electronically stored data. Data warehouses are designed to facilitate reporting and analysis

Data mining

Data mining (also called data or knowledge discovery) is the process of analysing data from different perspectives and summarizing it into useful information – information that can be used to increase revenue, cuts costs, or both. Data mining software is one of a number of analytical tools for analysing data. It allows users to analyse data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.

Difference: Data warehouse vs. Data mining

Data warehouse is the database on which we apply data mining. 

Data warehouse Data mining
It consists of historical data, stored in the form of relational database. Data is acquired from different sources. This is the process of finding hidden information from a large sample of data.
Information about present time can  be obtained. Future information can be predicted.
This system answers questions like:Who are our customers?

Who is purchasing our products?

This system answers questions like:Who are not our customers?

Who are not purchasing our products?

Whether to build the data warehouse first or the data mart first?

That is one of the all-time debated question in data warehousing. What Ralph Kimball said was “The data warehouse is nothing more than the union of all the data marts”. However there are a lot of confusion in the fact that whether an organization has to start building a data warehouse or just start with a data mart and then expand. There are supporting facts for both approaches.

The Data Warehouse

A “data warehouse” will typically contain the full range of business intelligence available to a company from all sources. That data consists of transaction-processing records, corporate and marketing data, and other business operations information; for example, a bank might include loans, credit card statements, and demand deposits data, along with basic customer information. This internal data is frequently combined with statistical and demographic information obtained from outside sources. The cross-divisional nature of the data on file explains why a data warehouse is often called an enterprise warehouse  because the wealth of data it gathers supports the informational needs of the corporate enterprise as a whole.

The Data Mart

Here we move to the next level down in the information hierarchy. A company’s marketing, purchasing and finance departments will all make use of data stored in the enterprise warehouse. In many cases they will use the same data, but each department will massage that data in different ways. So each department sets up its own “data mart” designed to extract data from the enterprise warehouse. The key point here is that each mart processes the data in a form which suits its own departmental needs.

Data mart vs. Data warehouse

  1. Data warehouse focuses on enterprise wide across all or many of the subject areas of the organization
  2. Data mart is restricted to single business process or single business group
  3. Union of data mart is nothing but a data warehouse
  4. The decision to set up a mart usually originates in that part of an organization with the most business “pain”, and, thus, the opportunity for greatest gain. A warehouse usually comes into being when a senior executive notices a business problem recurring in several departments. Subsequent discussion reveals a greater need for cross-divisional data analysis than anyone thought. So a warehouse is born to collectively help all divisions behave as a single corporation.

Star and snowflake are most common types of dimensional modelling. Always a debating question in the data warehousing context is which one works better? You will hear arguments favouring both sides; however the question is incomplete without mentioning the system/business. The decision whether to employ a star schema or a snowflake schema should consider the relative strengths of the database platform in question and the query tool to be employed.

The star schema is the simplest data warehouse schema. It is called a star schema because the diagram resembles a star, with points radiating from a centre. The centre of the star consists of one or more fact tables and the points of the star are the dimension tables.

The snowflake schema is a variation of the star schema used in a data warehouse. The snowflake schema (sometimes called snowflake join schema) is a more complex schema than the star schema because the tables which describe the dimensions are normalized.

  Snowflake Schema Star Schema
Which Data warehouse? Good to use for small datawarehouses/datamarts Good for large datawarehouses
Normalization(dim table) 3 Normal Form 2 Normal Denormalized Form
Ease of Use More complex queries and hence less easy to understand Less complex queries and easy to understand
Ease of maintenance/change No redundancy and hence more easy to maintain and change Has redundant data and hence less easy to maintain/change
Query Performance More foreign keys-and hence more query execution time Less no. of foreign keys and hence lesser query execution time

 

Khan – SQLDBA – MCTS – www.addarr.com

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s