Business Intelligence Glossary


OVERVIEW

Target audience: McGill administrative staff

Business Intelligence (BI) refers to a set of technologies and practices used to support planning and decision-making in a company or organization.  This article provides definitions for various BI-related terms.

In this article:

Select the links below to access the BI definitions. 

 A-C  |  D  |   E-L |  M-R |  S-Z |


A-C

Aggregation

Pre-calculated summaries of the detailed data in a data warehouse. Aggregation is an OLAP technique used to shorten response times, thereby enhancing the performance and usability of a data warehouse.

E.g. Expenditures in an academic year: the underlying table contains all details of expenses on a daily basis, and the database also stores the aggregated total.

Relates to OLAP, Dimensional hierarchy


Attribute

A field that makes up part of a data dimension. Each attribute is represented by a column in a table, a report, or a chart.

E.g. A Person dimension has attributes such as ID, name, age, and gender.

Relates to Dimension


Balanced scorecard

A performance management tool that summarizes an organization’s performance through multiple perspectives on a single page. It is used to integrate performance measures into the basic management structure of the organization.

Relates to Dashboard


BI (Business Intelligence)

Business Intelligence (BI) is a way of structuring, analyzing, and leveraging data. It is an environment that provides methodologies facilitating in-depth analysis of detailed business data. It includes technology (database and software applications) as well as analysis practices. Business intelligence allows users to receive information that is reliable, consistent, timely, understandable, and easily manipulated.

Relates to DWH - Data Warehouse, EDW (Enterprise Data Warehouse)


Conformed dimension

A dimension occurring in multiple domains of an organization that has been harmonized for coherent, consistent use. A conformed dimension can be used in different data warehouses in the same organization. This supports expansion and growth of the warehouse.

E.g. A Time dimension with attributes day, month, and year is conformed if it is defined once and reused across multiple cubes.

Relates to Dimension table


D

Dashboard

A reporting tool that presents an array of important indicators on a single screen. The information presented can include measurements, metrics, and scorecards. Dashboards often present a combination of metrics from across different areas, or in a mixed set of numeric and visual formats.

Contrasts with Balanced scorecard

Relates to Metrics


Data cube

A database structure that forms the basis for analysis applications. The name "cube" suggests that the data inside has multiple "dimensions" to it. That is, if a regular spreadsheet table has two dimensions through which its data can be viewed or calculated (one set of labels going across and one going down), a data cube has three or four or sometimes many more. These dimensions can be stacked, combined, and drilled into in multiple ways, and are well suited to browsing in pivot tables. Cubes often contain pre-calculated sub-totals, in myriad combinations and at different levels of aggregation, to enhance speed and usability.

Relates to OLAP, Dimension, Measures


Data mart

A data mart is a structure/access pattern specific to data warehouse environments, used to retrieve client-facing data. The data mart is a subset of the data warehouse and is usually oriented to a specific business line or team. Data warehouses have an enterprise-wide depth, whereas the information in data marts pertains to a single department.


Data quality

Data quality pertains to aspects such as availability, completeness, accuracy, consistency, relevance, and timeliness of data. High data quality is essential to business intelligence’s role as a means of decisional support.

Examples of poor data quality: missing fields, old or inaccurate information, data conflicts, inaccessible data in legacy systems.


Data staging

The data staging area is a system where all the data extraction, transformation, and loading (ETL) operations are performed. This is the work area where data warehouse developers clean, summarize, filter, decode, and prepare data.

Relates to ETL, Transformation


Data vault

The data vault is a detail-oriented, historical tracking. It is a uniquely linked set of normalized tables that support one or more functional areas of business. A data vault is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, consistent, and adaptable to the needs of the enterprise. It is a data model designed specifically to meet the needs of today’s enterprise data warehouses. Data vault is the modeling technique used in EDW - Enterprise Data Warehouse.


Data warehouse

A data warehouse (DW) is a system used for reporting and data analysis, and is considered a core component of business intelligence. DWs are central repositories of integrated data from one or more disparate sources. They store current and historical data used for creating analytical reports for workers throughout the enterprise in one single place.

At McGill, we have the following data warehouses: DWH (Data Warehouse), EDW (Enterprise Data Warehouse)


De-normalization

A design technique for relational databases. De-normalization simplifies the table structure of a database, making it quicker and easier for end users to consume. This is done at the cost of some internal efficiency. To de-normalize, data elements with multiple entries can be "flattened" into a single row with additional columns to store the duplicates, and cross-references or calculations may be done ahead of time to save the burden of doing them at the time a query is made.

E.g. A normalized Degree table contains one row per student, per awarded degree. To de-normalize, you would group each student’s rows into a single row and create additional columns to handle multiple degrees. "Awarded Degree" on multiple rows would be replaced by "Awarded Degree 1", "Awarded Degree 2", etc.

Contrasts with Normalization


Dimension

A perspective for analyzing the data in an OLAP cube. Each dimension in a cube is built on fields or columns found in the data. A dimension can be composed of a single attribute, or more. Its data can exist on a single level of granularity, or on multiple levels stored as a hierarchy.

When you are browsing a cube, you can view the data through different combinations of dimensions. If you were to describe a report as "Show me X, broken down by Y and Z," the Y and Z would represent dimensions.

E.g. For a Student Program Enrolment database, the dimensions could include Student, Degree, Program, Major, and Time.

Relates to Dimension table, Dimensional hierarchy, Attribute, Conformed dimension, Data cube


Dimension table

A dimension table is a database object containing the fields and elements that make up one dimension by which a set of data (facts) can be analyzed. A fact table is typically related to multiple dimension tables, using database keys to keep them in synch with one another.

E.g. Student Program Enrolment is a central fact table with multiple dimensions linked to it: Faculty, Student, Term, Degree, etc.

Relates to Dimension, Fact table


Dimensional hierarchy

An arrangement of multiple levels of granularity within a single dimension. With a hierarchy in place, data for a given dimension can be rolled up to aggregated totals, or drilled down into for finer analysis. This can be represented in a data model by multiple columns within a dimension table in standard star schemas called hierarchy columns.

E.g. Degrees are arranged in a hierarchy of Degree Level (Undergraduate, Graduate, etc.) with individual Degrees (Bachelor of Arts, Master of Science, Doctor of Philosophy, etc.) below.

Relates to Aggregation, Drill down, Roll up


Drill down

A BI feature allowing users to click on a value inside a dimension to view a lower (finer) level of detail. Drilling down is the equivalent of expanding or collapsing a sub-total in a table. It is available in dimensions that have a hierarchy.

E.g. From an summary report on Citizenship, a user can drill down into the region of Africa to see how it breaks into individual countries.

Relates to Drill through, Aggregation

Contrasts with Roll up


Drill through

A BI feature allowing users to click on an aggregated number (measure) in a BI table or chart to view the individual data records (facts) that compose that number.

E.g. If a report shows a headcount of 37 students in a given unit, the user can drill through to see exactly who those 37 students are and what their records show.

Relates to Fact, Measures


DWH (Data Warehouse)

McGill's operational data warehouse that provides range of specially- prepared data sources, built on top of Banner and current data from other systems. It is an Oracle database. Also known as ODW, Operational/Old/Oracle Data Warehouse and BanRep.


E-L

EDW (Enterprise Data Warehouse)

McGill's data warehouse that integrates current and historical data in one central repository of data from a range of disparate sources, built on top of Banner and current data from other systems. It is built with the Microsoft BI suite.


ELT (Extract, Load, and Transform)

For the last couple of decades, ETL (Extract, Transform, and Load) was the traditional approach for data warehousing and analytics. The ELT approach changes the old paradigm; instead of first loading extracted data into the target system, ELT performs transformations after the data is loaded into the data warehouse.

ELT is the technique used to manage data in EDW (Enterprise Data Warehouse).

Relates to ETL (Extract, Transform, and Load)


ETL (Extract, Transform, and Load)

The three-step process at the core of data warehousing. First, an organization’s operational data is extracted from multiple sources. Then, it is transformed (where necessary) so the data types, formats, and definitions match what is needed. Finally, it is loaded into the data warehouse where it can be consumed.

Relates to Data staging, ELT (Extract, Load, and Transform), Transformation


Fact

An individual record of business activity that is stored in a data warehouse. Each fact contains one or more measures (numbers, amounts, or prices) and a series of fields (dimensions) by which the fact can later be analyzed. Facts are the foundation of data warehouse tables and OLAP data cubes.

E.g. In the area of Faculty Research, one fact is recorded per researcher, per project that he/she proposes, per year. Each fact contains several measures (dollars requested, dollars awarded, etc.) and several dimension fields (the researcher, sponsoring agency, type of research, title of the project, etc.).

Relates to Fact table, Measures


Fact table

Central table in the data warehouse that contains the business facts being stored. The fact table has columns for each measure and each foreign key linking to a dimension table.

Relates to Fact, Dimension table


Forecasting

One of the activities rendered possible by BI. It is the formulation of trends, predictive models, and scenarios for the purpose of better decision-making.


Gap analysis

Gap analysis is the study of whether the available business data fully meets an organization’s end needs. This answers questions such as: Are the right data sources accessible? Is there enough data to reach meaningful conclusions? Is any data missing? Is data remotely located, or in legacy systems? Gap analysis examines this information and determines what resources and efforts are needed to satisfy requirements.

Relates to Data quality


Granularity

Granularity refers to the level of detail or summarization of data in the data warehouse.

Related to Aggregation


Institutional performance management

The process of basing an organization's actions and decisions on actual measured results of performance. This process integrates performance measures, benchmarks, and goals in order to achieve optimal results. The balanced scorecard is one approach used in performance management.

Relates to Balanced scorecard


Logical data model

A model that represents the logical structure of data, independent of software or hardware implementation constraints. This representation of business concepts is laid out in a visual format that clearly shows these concepts and their various relationships.

Contrasts with Physical data model


M-R

Measures

A measure is a numeric piece of information attached to a business fact that has occurred or a record that is stored. Measures, along with dimensions, are the foundation of data analysis: measures provide the "number" part of a cube while dimensions provide the "description" or "categorization" part. Measures are recorded individually and can then be aggregated to higher levels.

If you were to describe a report as "Show me X broken down by Y and Z," the X would represent the measure.

E.g. A researcher’s $50,000 of funding for year 2007. A student’s count of 4 courses registered in Fall 2009. A department’s average of 3.8 graduate advisees per professor.

Relates to Data cube

Contrasts with Metrics


Metadata

"Data about data," or the sum of all documentation about the data warehousing process. Metadata describes the contents of the data warehouse, its structure, and the processes involved in its setup. Metadata should be updated as the system evolves.

E.g. Documentation about common definitions of terms such as student, investigator, or project. Documentation about technical decisions in the design of the data warehouse.


Metrics

Measures of performance that monitor progress and assess trends within an organization. A metric is the comparison of two or more measures, or a single measure situated in a meaningful context.

E.g. A researcher’s annual amount of funding from 1997 to 2007. The metrics here are based on two measures: funds over time.

Relates to Measures


Normalization

A standard design technique for relational databases. Normalization helps a database and its users to consistently record and retrieve individual transactions with minimal risk of data loss, data error, or duplication.

E.g. Instead of referring to a researcher by name in multiple tables, simply store the name in a table and use an ID number to link to this researcher.

Contrasts with De-normalization


ODS (Operational Data Store)

An integrated database environment designed to support operational monitoring. The Operational Data Store (ODS) contains data that is business-oriented, volatile, current, and detailed. The ODS presents a current view of the state of operations.

E.g. McGill’s data warehouse presents characteristics of an operational data store. Data is refreshed daily, can be queried, and integrates information from across the university (HR/FIS/SIS).


OLAP (Online Analytical Processing)

Online Analytical Processing is an approach to quickly providing answers to complex analytical queries.

E.g. McGill’s BI data cubes are OLAP structures.

Relates to Data cube


OLTP (Online Transactional Processing)

Online Transactional Processing is a type of system that immediately handles operational data and transactions. This operational system allows the smooth functioning of an organization’s daily business. Usually OLTP offers no analytical capabilities.

E.g. Student registration on Minerva.


Partitioning

A technique where a data table or cube is divided into pieces on the server in order to improve manageability or performance.

E.g. A very large table containing historical academic records can be partitioned by term. This will improve performance when querying within any single term.

Relates to Logical data model


Physical data model

The actual implementation model for a data warehouse, based on the technology, hardware, and software.

Contrasts with Logical data model


RDBMS (Relational Database Management System)

A system used to store, process, and manage data arranged in relational tables. Such a database is organized and accessed according to the relationships between data values.

E.g. A database with tables for student, finance, and human resources contains relationships among those tables. A student can be an employee.


Roll up

A BI feature allowing users to summarize data at a higher level.

E.g. A report of admissions in the Department of Physics can be rolled up and summarized into a report of admissions to the Faculty of Science.

Relates to Aggregation

Contrasts with Drill down


S-Z

Slice and dice

An informal name for the feature of a BI tool that allows users to view data from any angle. Pivot tables and interactive pivot charts have this ability: users can easily rotate the presentation between rows and columns (or X and Y axes), swap dimensions and measures in and out, stack dimensions on top of each other, and expand or collapse dimension hierarchies.

E.g. After generating a report on student enrolments by faculty, a user swaps dimensions to see the same enrolments by degree level instead. She then adds a filter to look at only her faculty, or only Canadians and Permanent Residents.

Relates to Drill down, Roll up


Snapshot

A preserved view of a data set at a particular instant in time.

E.g. A snapshot of certain Banner tables is used to produce GDEU reports.


Star schema

A star schema is a logical model used to organize data in a data warehouse. The "star" is a central fact table with an array of dimension tables organized around it. This is a natural fit for cube data: it allows information to be viewed from many perspectives and facilitates multidimensional querying.

E.g. A central fact table for enrollment can have dimensions for Term, Faculty, and Citizenship.

Relates to Fact table, Dimension table, Logical data model


Transformation

Part of the ETL and ELT components of data warehousing. It includes activities that manipulate, validate, and standardize data.

E.g. Program codes are split into different columns: degree and major. The titles of majors can be standardized throughout the system. Misspelled program names are corrected.

Relates to Data quality, ETL (Extract, Transform, and Load), ELT (Extract, Load, and Transform)


references

ADDITIONAL REFERENCES: