Evolution of the Data Warehouse

Teradata's Data Warehouse Solutions

Data Warehousing:
The Analytic Foundation

What is a data warehouse?

A data warehouse is a design pattern or data architecture that tracks integrated, consistent, and detailed data over time, establishing relationships between them using metadata and schema.

The origins of the data warehouse

The data warehouse concept started in 1988 when Barry Devlin and Paul Murphy published their groundbreaking paper in the IBM Systems Journal. Their vision sparked a need for more specific definitions of database implementations, which Bill Inmon and Ralph Kimball provided in the early 1990s – and Gartner further clarified definitions in 2005. Now any discussion on data warehousing also includes how or where a data warehouse solution is implemented, such as within the cloud, or spanning on-premises and cloud in a hybrid manner.

A data warehouse isn’t a collection of tables or measured in terabytes. It’s a design pattern, a data architecture with many characteristics:

Subject-Oriented

Reflects business entities and processes that the organization works with daily. The level of detail in the subject area is what is important: if detailed data is there, it is a data warehouse. If summary or only limited data is there, it is a data mart.

Integrated, Consistent

Data formats and values are standardized across all tables to ensure complete, accurate data that users can understand. It must also have integrity: e.g., it cannot have purchasing transactions without a corresponding customer record.

Nonvolatile History

A warehouse captures data changes and tracks data changes over time. All data is kept and does not change with transactional updates. Whether traditional, hybrid, or cloud, a data warehouse is effectively the “corporate memory” of its most meaningful data.

How Does Data Get into the Data Warehouse?

Data Sources

It’s not uncommon to have 200 or even 500 different applications sending data to the warehouse, which consolidates and integrates all such data into the subject areas. The warehouse gets input from applications such as enterprise resource planning (ERP), customer relationship management (CRM), and supply chain management (SCM).

Clickstream data from mouse clicks on web pages are another source, as is sensor data from machinery vehicles, and so on. There is also unstructured data such as JSON (JavaScript Object Notation) which does not conform to rows and columns but is still captured in the data warehouse.

Data Integration

Before the data goes into the data warehouse database, it passes through the data integration step, a complex process that rationalizes data from multiple sources into a single result. Originally this was called extract, transform, and load (ETL) because the data had to be pulled from the source, refined, then loaded into data warehouse relational tables.

Data Cleansing

Modern integration processes include data cleansing, which involves detecting and correcting corrupt or inaccurate records. Errors occur due to faulty inputs, hardware corruption, or simple human error. The data integration task combines the best, most accurate and most complete data from multiple applications into a clean, reliable “golden record” in the warehouse.

Data Loading

Data is loaded into the warehouse in a continuous process – typically all day long. Data loading leads to the business purpose of the warehouse: the foundation for finding answers to questions. Data scientists apply advanced mathematics to find patterns and anomalies, while business analysts use reports and dashboards with visualization.

Data loading segues to the business purpose of delivering data to the business users: getting insight and finding answers to business problems. Data scientists apply advanced mathematics to large amounts data to find patterns and anomalies. Multi-dimensional analysis is where all the data is highly summarized, enabling fast review of rollups by region, city, sales person, and product sold, for example. Executives and business analysts (or "citizen data scientists") use reports and dashboards with visualization, all pulling from the source of governed data: the data warehouse.

The Cloud Data Warehouse and Teradata Vantage

Both Teradata and the industry at large evolved to incorporate the benefits of cloud deployment and scalability. Teradata Vantage, the company’s flagship offering, builds on the strong foundation of Teradata Database and incorporates advanced analytic capabilities acquired with Aster Data in 2011.

Vantage is available for Amazon Web Services (AWS), Microsoft Azure, Google Cloud, Teradata infrastructure (Teradata Cloud or Customer Cloud), and commodity hardware running VMware virtualization software.

Vantage on AWS Data Warehouse Solutions
Amazon Web Services Use AWS infrastructure with Teradata Vantage
Vantage on Azure Data Warehouse Solutions
Microsoft Azure Combine Azure resources with Teradata Vantage
Vantage on Google Cloud Data Warehouse Solutions
Google Cloud Leverage Google Cloud with Teradata Vantage
Get started now with cloud analytics

Take your analytics into the cloud

Download White Paper