The Data Requirements Gap

There are two typical challenges when we gather data requirements. 

First, we need to capture and document the requirements in a way the person providing them understands, so they can review them and be confident we have understood what they need.  Second, we need to document the requirements in a way the team delivering the data can understand, so they know what they need to build.

Document Hand Over

Let’s look at the way we typically capture data requirements.

We will often capture our data requirements in a traditional application-centric requirements document of the kind used to specify a report or a dashboard.  This will typically have a list of the fields required, plus a specification of the layout and a set of functional and non-functional requirements.

Or we will capture a set of high-level user stories that are focused on the presentation of the data (i.e. reports and dashboards) and pretty much ignore the data altogether.

Once the document is complete, the Business Analyst (BA) will hand it over to a data designer or modeller, who then has to work out what the list of fields or user stories means and where to find the underlying data.

Of course, the document or user stories typically hold minimal data context, so the data modeller will either guess or go back to the key users with more questions to discover the context.

This results in a gap between what the key users asked for, what the BA documented and finally what the data modeller designed for.

Data Driven or Report Driven Modelling Approach

When we first started delivering data using a Data Warehouse approach, we often took a data-driven method.  We would spend a long time looking at the data in the Systems of Record and modelling how we could land and store this data to make it easier to use in the future.  The problem with this approach was that it took a long time, because we often spent time understanding and modelling data that a user might never consume, just in case it was needed in the future.

Then we moved to a report driven method.  We would document the content required by the users and then acquire and model the data to meet only these reporting needs.  This was often faster to deliver, but the problem was we often ended up with a large number of disparate reporting data sets that overlapped and were hard to manage.

Both of these methods lead to a gap between what users expect and what is delivered, both in the time it takes to deliver data and content and in how quickly that data can be added to or changed in the future.

Business Event Driven Approach

Business Events are events that take place in the normal course of operating a business and that recur as business processes are executed.

What is BEAM✲?

BEAM✲ stands for Business Event Analysis & Modelling. It is a methodology for gathering business requirements for Agile Data Warehouses and for building those warehouses. It was developed by Lawrence Corr (@LawrenceCorr) and Jim Stagnitto (@JimStag), and published in their book Agile Data Warehouse Design: Collaborative Dimensional Modeling, from Whiteboard to Star Schema (Amazon, eBook).

Why BEAM✲?

There are three parts to the BEAM✲ method that provide the core of its value:

  1. Business Event Driven
    You identify data requirements based on Business Events.
  2. Bridging the language gap
    The data requirements are documented in a language that can be understood by business users and developers alike.
  3. Modelstorming
    You gather these requirements using a modelstorming approach that is interactive and fun.

Business Events

One of the benefits of documenting data requirements as Business Events is that an organisation’s Business Events will not change often.  In fact, they tend to change markedly only when the organisation’s core business model changes.

For example, a company that makes Ice Cream will still be making Ice Cream even when it introduces new Ice Cream products, opens in new markets or targets new types of customers.  However, the Data Factories that it uses to capture the data, and the types of analytics and reporting content it needs, will change over time.  If the organisation does change the core of what it does, for example by starting to deliver health care, then it is obvious that a major change has happened, and the impact on the data requirements will be major as well.

Bridging the Language Gap

In the past, we would typically ask the key business users data-specific questions that we understand as data experts.  For example, what are the key dimensions of the business data, or what is the “grain” of the data?  We understand these terms because we have worked with them for years.  Our business users have not, and this leads to either the business users or the developers having to translate.

Good requirements need both developers and business stakeholders to speak a common language, and part of BEAM✲’s magic is that it gives them this common language. For example, the business event has a Customer who Purchases Product. Business users understand this concept: there are two business aspects, Customer and Product, and the interaction between them. Developers can also understand this.

This enables us to ask the question “what events happen in your business?”, which is a much easier way to start gathering requirements than “what entities do you want to report on?”  Following up with “what other events happen around the event we just discussed?” is better than “does this FACT table share any dimensions with other FACT tables?”  Rather than relying on the BA or Developer to translate the data requirements, the BA can use the BEAM✲ method to facilitate agreement about the answers to these questions.

BEAM✲ relies on 7 Ws (or really 5 Ws and a couple of Hs): who, what, when, where, why, how and how many.

These simple questions enable you to gather all the data details you need to model and build. You take a business event like Customer Purchases Ice Cream and ask these questions to understand the aspects which are important to that process.

  • Who? – In this case, we have a Customer and probably an Employee who are involved in this event;
  • What? – The Product is one of our whats; it’s the item that’s involved;
  • When? – The date/time the purchase happened;
  • Where? – The store the purchase occurred in;
  • Why? – This can be interpreted in different ways, for example, it was a hot day;
  • How? – The key piece of data that tells us this actually happened, for example, OrderID or Purchase Receipt;
  • How Many? – These are the values that you need to know, e.g. how many products were purchased and what their costs were.

Using this pattern, you can understand the data behind every business event. It gives you a structured set of questions to document the requirements you need to model your data.
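To make the pattern concrete, here is a minimal sketch of how the answers for one business event could be recorded against the 7 Ws. It is only an illustration, not part of BEAM✲ itself: the BusinessEvent class, its add and unanswered methods and the detail names (such as "Quantity Purchased") are all hypothetical names we made up for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The seven question types, in the order they are listed above.
SEVEN_WS = ("who", "what", "when", "where", "why", "how", "how many")


@dataclass
class BusinessEvent:
    """Hypothetical container: one business event plus its 7 Ws details."""

    name: str
    # Maps a question type to the details the business users gave for it.
    details: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, question: str, detail: str) -> None:
        """Record one answer against one of the seven questions."""
        if question not in SEVEN_WS:
            raise ValueError(f"unknown question type: {question!r}")
        self.details.setdefault(question, []).append(detail)

    def unanswered(self) -> List[str]:
        """Return the questions that have no answer recorded yet."""
        return [q for q in SEVEN_WS if not self.details.get(q)]


# The example event from the text, with illustrative detail names.
event = BusinessEvent("Customer Purchases Ice Cream")
event.add("who", "Customer")
event.add("who", "Employee")
event.add("what", "Product")
event.add("when", "Purchase Date/Time")
event.add("where", "Store")
event.add("how", "OrderID")
event.add("how many", "Quantity Purchased")
event.add("how many", "Cost")

print(event.unanswered())  # ['why']
```

Keeping the question types in a fixed list means the sketch can tell you which questions you have not yet asked about an event, which is the kind of prompt a facilitator might want during a requirements session.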