What is Data Vault Modeling and Why do we Need it?

On 17 Dec., 2020

Data vaults are becoming increasingly popular as organizations look for warehouse designs that are flexible, scalable, and adaptable. In this blog, let's look at Data Vault Modeling in detail.


Data Vault Modeling is a specific data modeling technique for designing highly flexible, scalable, and adaptable data structures for enterprise data warehouse repositories. The formal definition is as follows:

The Data Vault is a detail-oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. 

The design is flexible, scalable, consistent, and adaptable to the needs of the enterprise. It is a data model that is architected specifically to meet the needs of today’s data warehouses.


The Data Vault model consists of three basic entity types: Hubs, Links, and Satellites.

 

#1. The Hubs are composed of unique lists of business keys. The Links are composed of unique lists of associations (commonly referred to as transactions, or intersections of two or more business keys). The Satellites are descriptive data about the business key OR about the association.
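To make the three entity types concrete, here is a minimal sketch in Python. The class names, field names, and sample values (HubRow, LinkRow, SatelliteRow, "CUST-001", and so on) are illustrative assumptions, not part of any Data Vault standard or tool.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Tuple


@dataclass(frozen=True)
class HubRow:
    """One unique business key, plus when and where it was first seen."""
    business_key: str
    load_date: datetime
    record_source: str


@dataclass(frozen=True)
class LinkRow:
    """One unique association between two or more Hub business keys."""
    business_keys: Tuple[str, ...]
    load_date: datetime
    record_source: str


@dataclass(frozen=True)
class SatelliteRow:
    """Descriptive data about a business key or an association."""
    parent_keys: Tuple[str, ...]  # one key for a Hub, several for a Link
    load_date: datetime
    record_source: str
    attributes: Tuple[Tuple[str, str], ...]


# Hypothetical sample data: a customer, a product, and a purchase linking them.
now = datetime(2020, 12, 17, tzinfo=timezone.utc)
customer = HubRow("CUST-001", now, "crm")
product = HubRow("PROD-042", now, "erp")
purchase = LinkRow((customer.business_key, product.business_key), now, "orders")
customer_details = SatelliteRow(
    (customer.business_key,), now, "crm", (("name", "Ada"), ("city", "Pune"))
)
```

Notice that the Hub and Link rows carry only keys and load metadata. Everything descriptive lives in the Satellite row.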

Data Vault models are representative of business processes and are tied to the business through the business keys. Business keys indicate how the businesses integrate, connect, and access information in their systems. Data Vault models are built based on the conceptual understanding of the business.

#2. The Links represent associations across the business keys. The associations can change over time; some have a direction (akin to mathematical vectors), while others are directionless. Links are physical representations of foreign keys or, in data modeling terms, associative entities.

Hubs and Links are like the skeleton and ligaments of the human body – without them, we have no structure. Without them, our Data Warehouses are blobs of data loosely coupled with each other. But with them, we have a definition, structure, height, depth, and specific features. We as humans couldn’t survive without a skeleton. The Data Warehouse cannot survive without Hubs and Links. They form the foundations of how we hook the data together.

#3. Finally, the Satellites are added. Satellites are like skin, muscle, and organs. They add color, hair, eyes, and all the other features needed to describe us.

Remember this: the Data Vault is targeted to be an Enterprise Data Warehouse. Its job is to integrate disparate data from many different sources, and to Link it all together while maintaining source system context.


Role of Hub in detail 

The job of a Hub is to track the first time the Data Vault sees a business key arrive in the warehousing load, and where it came from. The Hub is a business key recording device. The business keys in a Hub should be defined at the same semantic granularity. 
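The "recording device" behavior can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the Hub is held in an in-memory dictionary, and the function name load_hub, the source names, and the sample keys are all hypothetical.

```python
from datetime import datetime, timezone


def load_hub(hub, incoming_keys, record_source, load_date):
    """Add business keys the Hub has not seen before; leave existing rows untouched.

    `hub` maps business key -> (load_date, record_source) of its first arrival.
    Returns the list of keys that were newly inserted.
    """
    inserted = []
    for key in incoming_keys:
        if key not in hub:
            hub[key] = (load_date, record_source)
            inserted.append(key)
    return inserted


hub_customer = {}
day1 = datetime(2020, 12, 16, tzinfo=timezone.utc)
day2 = datetime(2020, 12, 17, tzinfo=timezone.utc)

# The first load comes from a hypothetical CRM system, the second from billing.
first = load_hub(hub_customer, ["CUST-001", "CUST-002"], "crm", day1)
second = load_hub(hub_customer, ["CUST-002", "CUST-003"], "billing", day2)
```

After both loads, CUST-002 still shows the CRM system and the first load date, because the Hub records only the first time a key arrives.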

The purpose of the Hub is to provide a soft-integration point of raw data that is not altered from the source system but is supposed to have the same semantic meaning. 

The Hub key also allows a corporate business to track their information across lines of business; this provides a consistent view of the current state of application systems. These systems are supposed to synchronize, but often don’t – when they don’t synchronize, business keys begin to be replicated and worse yet, are then applied to different contextual data sets.


Role of Link in detail

Link entities act as the flexibility component of the Data Vault model. They are the glue that pulls together any related association of two or more business keys. Where business keys interact, Links are created.

Link entities are generated as a result of a transaction, discovery, relationship, or interaction between business units, business processes, or business keys themselves.
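A Link can be loaded in much the same spirit as a Hub: each unique combination of business keys is recorded once. The sketch below assumes an in-memory dictionary, and the names load_link, "orders", and the sample keys are made up for illustration.

```python
from datetime import date


def load_link(link, associations, record_source, load_date):
    """Record each unique combination of two or more business keys once.

    `link` maps a tuple of business keys -> (load_date, record_source).
    Returns the list of associations that were newly inserted.
    """
    inserted = []
    for keys in associations:
        combo = tuple(keys)
        if len(combo) < 2:
            raise ValueError("a Link needs at least two business keys")
        if combo not in link:
            link[combo] = (load_date, record_source)
            inserted.append(combo)
    return inserted


link_purchase = {}

# Hypothetical order feed: (customer key, product key) pairs.
batch1 = load_link(
    link_purchase,
    [("CUST-001", "PROD-042"), ("CUST-002", "PROD-042")],
    "orders",
    date(2020, 12, 16),
)
batch2 = load_link(
    link_purchase,
    [("CUST-001", "PROD-042"), ("CUST-001", "PROD-007")],
    "orders",
    date(2020, 12, 17),
)
```

The repeated pair in the second batch is ignored, so the Link stays a unique list of associations while new relationships are simply appended.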

Links provide flexibility to the Data Vault model by allowing change to the structure over time. The mutability of the model without loss of history is critical to the success and long-term viability of the enterprise data warehouse. 

In other words, the model itself can now be adapted, morphed, and changed at the speed of business without loss of auditability and compliance. 


Role of Satellite in detail

Satellite entities are the warehousing portion of the Data Vault. Satellites store data over time. Satellites are descriptive data that provide context to the keys and associations at a point in time or over a time period. Descriptive data in warehouses often changes; the purpose of the Satellite is to capture all deltas (all changes) to any of the descriptive data.

A Satellite is a time-dimensional table housing detailed information about the Hub’s or Link’s business keys. The purpose of the Satellite is to provide context to the business keys. Satellites are the data warehouse portion of the Data Vault.

The Satellite tracks data by delta, and only allows data to be loaded if there is at least one change to the record (other than the system fields: sequence, load-date, load-end-date, and record source).
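The delta rule can be sketched as follows. One common way to detect a change is to hash the descriptive attributes and compare the result with the latest stored row; that hashing approach is an assumption made here for illustration, not something this article prescribes. The function names, field names, and sample data are hypothetical too.

```python
import hashlib
from datetime import datetime, timezone


def hash_diff(attributes):
    """Hash only the descriptive attributes, in a stable order."""
    payload = "|".join(f"{name}={attributes[name]}" for name in sorted(attributes))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def load_satellite(satellite, parent_key, attributes, record_source, load_date):
    """Append a row only when the descriptive attributes changed.

    `satellite` maps parent key -> list of rows, oldest first.
    Returns True if a row was inserted.
    """
    history = satellite.setdefault(parent_key, [])
    new_hash = hash_diff(attributes)
    if history and history[-1]["hash_diff"] == new_hash:
        return False  # no delta: system fields alone never trigger a new row
    if history:
        history[-1]["load_end_date"] = load_date  # close the previous version
    history.append({
        "load_date": load_date,
        "load_end_date": None,
        "record_source": record_source,
        "hash_diff": new_hash,
        "attributes": dict(attributes),
    })
    return True


sat_customer = {}
day1 = datetime(2020, 12, 15, tzinfo=timezone.utc)
day2 = datetime(2020, 12, 16, tzinfo=timezone.utc)
day3 = datetime(2020, 12, 17, tzinfo=timezone.utc)

r1 = load_satellite(sat_customer, "CUST-001", {"name": "Ada", "city": "Pune"}, "crm", day1)
r2 = load_satellite(sat_customer, "CUST-001", {"name": "Ada", "city": "Pune"}, "crm", day2)
r3 = load_satellite(sat_customer, "CUST-001", {"name": "Ada", "city": "Mumbai"}, "crm", day3)
```

The second load changes nothing but the load date, so it is skipped. The third load changes the city, so a new row is added and the earlier row is end-dated.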

Satellites are typically arranged by type or classification of data, and by rate of change. There are many ways to classify data within a Satellite. For example, the attributes could be classified by data type, by content, or by context. Each of these yields the same result physically, but a different result in the understanding or interpretation of the model.


Applying the Data Vault

Data Vault modeling is especially useful when modeling a data warehouse. An Enterprise Data Warehouse (EDW) project is particularly well aligned with the features of data vault modeling.

One primary benefit is the ability to adapt easily to changes in both upstream sources and downstream data mart requirements. This provides us the ability to build incrementally and to run a truly agile data warehouse program.

A data vault data warehouse also integrates data easily and inherently manages history, providing a true enterprise data warehouse.

Data Vault modeling has also proven to be the preferred modeling pattern for special data warehouse situations including truly operational data warehousing, Big Data integration, Information model-based DW models, meta-data driven data warehouse deployments, and even data-driven generic data warehouse models.

Understanding the full benefits of the data vault modeling approach starts with getting your certification. This process is facilitated by Genesee Academy and includes materials, online lectures, exercises, two days in a classroom with lectures, labs, and group modeling exercises. On the last day, there is an exam that results in the certified data vault data modeler (CDVDM) designation.


Final words

The Data Vault approach is growing and adapting from year to year. Incremental changes to the modeling approach, rules, and best practices can be expected with some frequency.

If you still have questions or want more detail on this topic, you can contact the best software consulting company in India for a consultation.

Good luck!






 
