Virtual Data Processing for Business Intelligence

To make better-informed decisions, organizations need to collect and analyze diverse and accurate data in a timely manner. Traditional practice has been to extract, transform and load (ETL) data into an enterprise data warehouse (EDW) – a repository providing an integrated view of the business, including historical aspects.

Building data warehouses takes time: typically 6-24 months – even if you're enhancing an existing one. During that period, there's no way to get an enterprise view of the data. By the time the EDW is complete, business requirements may have changed, and any analytics developed in early iterations will need to be reworked.

Hadoop and other Big Data sources can add further complications, since their volume and variety of formats do not fit neatly into conventional ETL pipelines.

The Virtual Concept

Data virtualization (DV) technology underlies the new architecture of virtual data processing. Also called data federation, this approach quickly serves up comprehensive data views of remote sources, without having to run through slower ETL processes.

DV techniques can combine disparate sources within a logical layer or “virtual database”. This “information fabric” or “logical data warehouse” can be accessed by reporting applications, downstream (i.e. consumer) applications, and Online Transaction Processing (OLTP) applications.

DV maps data from heterogeneous sources (internal and external) and provides a virtualized layer on top of the source data that target applications can understand. Data is not physically moved out of the source applications, so there is no need to extract, persist or massage it, which makes the approach faster than traditional ETL.
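
As a rough illustration, the sketch below builds a "virtual view" over two independent data stores. Two in-memory SQLite databases stand in for heterogeneous source systems, and the table and column names are purely illustrative; a commercial DV platform would expose the same idea through its own modelling layer.

```python
# A minimal sketch of a virtual view: combine two sources on demand,
# without extracting or persisting anything. Schemas are hypothetical.
import sqlite3
import pandas as pd

# Source system 1: CRM-style customer data
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT, region TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                [(1, "Acme Ltd", "EU"), (2, "Globex", "NA")])

# Source system 2: order data held in a separate database
erp = sqlite3.connect(":memory:")
erp.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
erp.executemany("INSERT INTO orders VALUES (?, ?)",
                [(1, 1200.0), (1, 300.0), (2, 950.0)])

def virtual_customer_orders() -> pd.DataFrame:
    """Query both sources at request time and join the results in the virtual layer."""
    customers = pd.read_sql_query("SELECT id, name, region FROM customers", crm)
    orders = pd.read_sql_query("SELECT customer_id, amount FROM orders", erp)
    return customers.merge(orders, left_on="id", right_on="customer_id")

print(virtual_customer_orders())
```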

The technology can handle data quality, data transformation, data masking and similar functions by applying rules on the fly within the DV application. Analytical applications can be built very quickly, with no need to spend time understanding the design of source systems or devising extraction methods.
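
A minimal sketch of such on-the-fly rules is shown below, assuming rows arrive as Python dictionaries; the rule names and fields (mask_email, normalise_currency) are hypothetical stand-ins for whatever rules a DV product lets you configure.

```python
# Illustrative on-the-fly transformation and masking: rules are applied at
# query time in the virtual layer, and the source data is never modified.
from typing import Callable, Dict, List

Row = Dict[str, object]
Rule = Callable[[Row], Row]

def mask_email(row: Row) -> Row:
    """Data-masking rule: hide most of the local part of an email address."""
    user, _, domain = str(row["email"]).partition("@")
    return {**row, "email": user[:1] + "***@" + domain}

def normalise_currency(row: Row) -> Row:
    """Transformation rule: convert amounts recorded in cents to whole units."""
    return {**row, "amount": float(row["amount"]) / 100}

RULES: List[Rule] = [mask_email, normalise_currency]

def virtual_view(source_rows: List[Row]) -> List[Row]:
    """Apply every rule to each row as it is served; the source stays untouched."""
    result = []
    for row in source_rows:
        for rule in RULES:
            row = rule(row)
        result.append(row)
    return result

rows = [{"email": "jane.doe@example.com", "amount": 129900}]
print(virtual_view(rows))  # masked email, amount converted to 1299.0
```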

DV is also useful for data cleansing, since it provides easier access to operational data. Users can look at a report or dashboard, drill into the data, identify inaccurate records, or pinpoint operational problems.

DV can be deployed in a phased manner, and can help organizations gradually construct an enterprise data model.

Virtual Components

DV technology includes three major components:

1. Integrated development environment (IDE):

This is the user interface for development and access.

2. DV server environment:

Its logical layer, created by connecting the DV application to disparate databases, presents a single enterprise data model. Target applications query the source systems through the DV server.

The result set is stored in a cache database or file for future use. If the same query is issued again, it runs against the cache, avoiding excessive hits on the source data (a small caching sketch follows this list of components). The DV server can keep the cache in standard databases such as Oracle, SQL Server or DB2, or in an in-memory database.

3. Management environment:

This provides monitoring, administration and error handling.
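
The caching behaviour described under the DV server component can be pictured with the toy sketch below. It keys cached results by the query text and uses an in-memory SQLite table as the "source"; a real DV server would persist the cache in a proper database and add invalidation rules.

```python
# Toy query cache: the first request is delegated to the source system,
# repeated requests are answered from the cache.
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (region TEXT, amount REAL)")
source.executemany("INSERT INTO sales VALUES (?, ?)",
                   [("EU", 100.0), ("NA", 250.0), ("EU", 75.0)])

_cache = {}  # query text -> cached result set

def dv_query(sql: str):
    """Delegate to the source on a cache miss; reuse the stored result on a hit."""
    if sql not in _cache:
        _cache[sql] = source.execute(sql).fetchall()  # hit the source once
    return _cache[sql]                                # served from the cache

q = "SELECT region, SUM(amount) FROM sales GROUP BY region"
print(dv_query(q))  # first call queries the source
print(dv_query(q))  # second call is answered from the cache
```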

Business Intelligence (BI) Tools

BI tools make extensive use of dashboards: custom utilities that gather, organize and present information in an accessible way. They offer trend analysis, forecasting and drill-down capabilities. You can combine data from multiple sources, view it from different perspectives, and easily distribute it. 

Many tools can pull data from Excel spreadsheets and Access databases, or any database with an Application Programming Interface (API). This can be of value to organizations whose data is compartmentalized in separate information silos.
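
For example, a spreadsheet and a database table can be combined in a few lines. The sketch below assumes a file named sales.xlsx and a SQLite database budgets.db with the columns shown; both are illustrative, and reading .xlsx files with pandas requires the openpyxl package.

```python
# Combine a spreadsheet with a database table (file and column names assumed).
import sqlite3
import pandas as pd

actuals = pd.read_excel("sales.xlsx")  # e.g. columns: region, actual
with sqlite3.connect("budgets.db") as con:
    budgets = pd.read_sql_query("SELECT region, budget FROM budgets", con)

report = actuals.merge(budgets, on="region")
report["variance"] = report["actual"] - report["budget"]
print(report)
```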

Most BI tools move the data into a cache (a virtual storage space) or a separate data warehouse, effectively creating a separate database, so data can be manipulated for analysis without affecting the original databases. Data can be loaded manually, automatically at pre-set times, or when certain events occur. The tools can display data dynamically, from a variety of perspectives, in near real time.
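
A simple way to picture this cache-and-refresh pattern is to snapshot an operational table into a local cache database on a schedule, then run all analysis against the copy. The database and table names below are assumptions, and a real tool would use its own scheduler rather than a hand-rolled one.

```python
# Refresh a local BI cache from the operational system (names are illustrative).
import sqlite3

def refresh_cache(source_path: str, cache_path: str) -> None:
    """Snapshot the operational 'sales' table into the local cache database."""
    with sqlite3.connect(source_path) as src, sqlite3.connect(cache_path) as dst:
        rows = src.execute("SELECT region, amount FROM sales").fetchall()
        dst.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
        dst.execute("DELETE FROM sales")  # replace the previous snapshot
        dst.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# Run once here; in practice this would be triggered manually, by a scheduler
# (e.g. cron) at pre-set times, or by an event in the source system.
refresh_cache("operational.db", "bi_cache.db")
```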

BI tools can produce sophisticated graphics like time-dependent scatter plots, sparklines, and forecasts based on different user-generated scenarios. Often, the graphics can be customized with colors and themes to match your organization’s brand.
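
As a small, self-contained example of the kind of chart mentioned above, the sketch below plots synthetic daily sales as a time-dependent scatter plot with a naive linear trend line; real BI tools produce such visuals through their own chart builders rather than hand-written code.

```python
# Time-dependent scatter plot with a simple linear trend (synthetic data).
import matplotlib.pyplot as plt
import numpy as np

days = np.arange(1, 31)  # day of month
sales = 100 + 3 * days + np.random.normal(0, 10, size=days.size)

slope, intercept = np.polyfit(days, sales, 1)  # basis for a naive trend/forecast

plt.scatter(days, sales, label="daily sales")
plt.plot(days, slope * days + intercept, color="red", label="trend")
plt.xlabel("Day")
plt.ylabel("Sales")
plt.legend()
plt.show()
```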

Reports and program data can be emailed to staff on scheduled days and times. Dashboards can be viewed and manipulated from a smartphone or tablet, or from anywhere with an internet connection.

BI Best Practices

  • BI strategy should align with your overall IT strategy and enterprise goals. So create a business case for it, and outline the expected benefits, to bring stakeholders and senior management on board.
  • Develop a vision for the project that can be implemented in steps. It should be phased through multiple sub-projects, each going through iterations. Conceptual, logical, and physical data models should be drawn to provide a foundation for the overall data architecture.
  • Encourage parallel development tracks where multiple steps can be performed simultaneously and multiple activities can occur at the same time within each step.
  • Key Performance Indicators (KPIs) should be used to evaluate your progress. These may include statistical information like sales trends, profit values, customer satisfaction measurements, etc.
  • Constraints on the use of data still apply even when it is virtualized. If certain data may only be used within Europe and other data only within North America, that restriction must be enforced in the DV layer as well (a sketch of such a region check follows this list).
  • Every application has its own framework for handling updates. The DV layer should not be used to update source systems in real time. Results could be unpredictable and inconsistent.
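
To make the data-residency point concrete, here is a toy region check that a virtual layer could apply before delegating a query; the source names and region tags are invented purely for illustration.

```python
# Toy data-residency check: each source carries a region tag, and a caller
# only sees sources permitted for its region. Names and tags are illustrative.
SOURCE_REGIONS = {"crm_eu": "EU", "orders_na": "NA"}

def allowed_sources(caller_region: str) -> list:
    """Return only the sources the caller's region is allowed to use."""
    return [name for name, region in SOURCE_REGIONS.items()
            if region == caller_region]

print(allowed_sources("EU"))  # ['crm_eu'] – NA-only data is excluded
```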