How A Semantic Layer Makes Data Mesh Work At Scale

2022-05-27 20:35:25 By : Mr. Dave S.G

CTO, AtScale, empowering customers to democratize data, implement self-service BI & build more agile analytics for better decision making.

In my first article in the series, I introduced a model for assessing your organization's analytics maturity. I suggested a path to achieving maturity that included technology, process and people considerations. In the second article, I offered advice for choosing the right technology to build out your analytics foundation. In this article, we'll focus more on the people and process side of building a mature analytics capability for your organization.

Over the years, we've seen a couple of different organizational models for delivering analytics to the business. While both models have their advantages, each model has some severe drawbacks that make it inadequate for meeting the needs of today's data-hungry consumers.

1. Centralized. The birth of the data warehouse in the 1980s changed everything. By storing data in a single, curated location, everyone could find and query their data with confidence. With central control over the data platform and standards, data could be defined consistently and delivered reliably.

In practice, however, there are a few big problems with this approach. First, only IT could build the data warehouse because data had to be carefully curated and loaded, making IT a bottleneck for integrating new data. Second, since IT typically didn't understand the business, it struggled to translate business requirements into technical requirements—therefore exacerbating the bottleneck and frustrating customers. Finally, business users struggled to parse through thousands of data warehouse tables, making the centralized data warehouse appealing to only the most sophisticated users.

2. Decentralized. Later, as a result of end-user frustration and the explosion in popularity of visualization tools, business users took matters into their own hands. Instead of waiting for IT to deliver data, business users created their own data extracts, data models and reports. By decentralizing data preparation, business users broke free from IT and avoided the "lost in translation" issue associated with the centralized, IT-led approach.

In practice, however, this approach—like the centralized approach—also introduced some major challenges. First, with a lack of control over business definitions, business users created their own versions of reality with every dashboard they authored. As a result, competing business definitions and results destroyed management's confidence and trust in analytics outputs. Second, the decentralized approach drove a proliferation of competing and often incompatible platforms and tooling, making integrating analytics across business units difficult or impossible.

Hub And Spoke: A New Model For Delivering Analytics At Scale

It's clear that neither approach, centralized or decentralized, can deliver agility and consistency at the same time. These goals are in conflict. There is a model, however, that can deliver the best of both worlds if implemented with proper tooling and processes.

The "hub and spoke" model, sometimes referred to as a data mesh architecture, pairs a central data team with business-embedded data stewards. The central team owns the data platform and sets tooling and process standards, while the data stewards own the data models for their business domains. This approach solves the "anything goes" phenomenon of the decentralized model while empowering subject matter experts (SMEs) to create data products that match their needs.

The Semantic Layer Is The Key

The key to this approach lies in the semantic data model. It is the semantic model that creates a digital twin of the business by translating bits and bytes into business terms. Domain experts can encode their business knowledge into digital form for others to use.

For this approach to work at scale, it's critical to implement a common semantic layer platform that supports data model sharing, collaboration and ownership. With a semantic layer, the central data team (the "hub") can define common models and definitions (e.g., business calendar, product hierarchy, organization structure) while the domain experts (the "spokes") own and define their business process models (e.g., "shipping," "billing," "marketing"). With the ability to share model assets, business users can combine their models with models from other domain owners to create new mashups for answering deeper questions.
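To make the division of ownership concrete, here is a minimal Python sketch of the idea. It is an illustration under stated assumptions, not the API of any real semantic layer product: the class names (`Dimension`, `DomainModel`, `SemanticLayer`), the team names and the measure names are all hypothetical. The hub publishes shared definitions, each spoke publishes a model that may only reference those definitions, and a mashup combines models along the definitions they have in common.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dimension:
    """A shared business definition, e.g. a calendar or product hierarchy."""
    name: str
    owner: str  # team that owns the definition (here, the hub)


@dataclass
class DomainModel:
    """A business process model owned by one domain team (a spoke)."""
    name: str
    owner: str
    measures: list[str]
    dimensions: list[str]  # names of shared dimensions this model uses


@dataclass
class SemanticLayer:
    """Toy in-memory registry standing in for a semantic layer platform."""
    dimensions: dict[str, Dimension] = field(default_factory=dict)
    models: dict[str, DomainModel] = field(default_factory=dict)

    def publish_dimension(self, dim: Dimension) -> None:
        self.dimensions[dim.name] = dim

    def publish_model(self, model: DomainModel) -> None:
        # Spokes may only reference definitions that have been published.
        missing = [d for d in model.dimensions if d not in self.dimensions]
        if missing:
            raise ValueError(f"{model.name} uses undefined dimensions: {missing}")
        self.models[model.name] = model

    def mashup(self, name: str, owner: str, *model_names: str) -> DomainModel:
        """Combine published models along the dimensions they all share."""
        parts = [self.models[m] for m in model_names]
        shared = set(parts[0].dimensions)
        for part in parts[1:]:
            shared &= set(part.dimensions)
        if not shared:
            raise ValueError("models share no common dimensions")
        # Prefix each measure with its source model to keep ownership visible.
        measures = [f"{p.name}.{m}" for p in parts for m in p.measures]
        return DomainModel(name, owner, measures, sorted(shared))


layer = SemanticLayer()

# The hub defines the common models and definitions.
for dim_name in ("business_calendar", "product_hierarchy", "organization_structure"):
    layer.publish_dimension(Dimension(dim_name, owner="hub"))

# Each spoke defines its own business process model.
layer.publish_model(DomainModel(
    "shipping", "logistics", ["units_shipped"],
    ["business_calendar", "product_hierarchy"]))
layer.publish_model(DomainModel(
    "billing", "finance", ["invoiced_amount"],
    ["business_calendar", "organization_structure"]))

# A business user combines two domain models into a new mashup.
combined = layer.mashup("shipping_vs_billing", "finance", "shipping", "billing")
print(combined.measures, combined.dimensions)
```

In this sketch the two models can be combined only because both use the hub's business calendar. A spoke that invents its own calendar is rejected at publish time, which is the "consistent set of business definitions" guarantee expressed as a simple check.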

The "hub and spoke" model succeeds because it plays to the strengths of the IT and business teams. IT owns and operates the technical platform, while the business creates its own domain-specific data products using a consistent set of business definitions.

Moving to a "hub and spoke" model for delivering data products doesn't need to be disruptive. There are two paths to success depending on your existing model for analytics delivery.

If your current organization is centralized, the IT and business teams should jointly identify key business data domains and embed an analytics engineer into each. The analytics engineer may come from either the central IT team or the business team. Using a semantic layer platform, the embedded analytics engineer can work inside the business domain team (the spoke) to create data models and data products for that domain, while working with the central data team (the hub) to set standards for tooling and process.

If your current organization is decentralized, create a central data team (the hub) to establish standards for tooling and process. In addition to managing the semantic layer platform and its shared objects and models, the central data team may manage data pipelines and data platforms shared by domain teams. The business domain teams can then designate an existing team member to serve the role of the analytics engineer for the team.

The optimal organizational model for analytics will depend on your organization's size and maturity. However, it's never too early to build for scale. No matter how small your organization is, investing in a "hub and spoke" model with decentralized ownership of data products can pay dividends now and in the future. By promoting data stewardship and ownership by domain experts, and by using a common set of tools and semantic definitions, you empower your entire organization to create data products at scale.

Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives.