Data Mesh: Enabling cross-domain communication with Data Models
Data Mesh is a hot term in the data world right now, and as we wrote in our previous post on the topic, it contains elements that you need to consider – no matter what you think about the idea as a whole. One extremely important feature of the Data Mesh paradigm is that the data landscape of an organization is divided into business domains, connected via a network of Data Products.
However, anyone who has ever worked in an organization with more than a couple of departments knows that crossing the domain boundaries is often difficult, even in terms of casual water cooler talk – let alone moving and merging vast amounts of data in an organized fashion! Every business unit and function has its own terminology and very little understanding (or visibility) of what is going on elsewhere.
How is it possible, then, to build this highly interconnected Mesh across the entire organization? The answer is not “more technology” – it’s all about communication. There has to be a way to communicate between domains in order to understand what all that data is really about. That is what we’ll explore in this article.
Business Capabilities define your domains
Piethein Strengholt has written an excellent article on domains and Domain-Driven Design in the Data Mesh context. In it, he recommends that before you start figuring out technology stacks and letting loose your data teams, you should take a look at the big picture and identify your “problem spaces”. This is vitally important for clarifying responsibilities and improving the management of the whole setup.
Piethein writes: “For grouping your problem spaces, I encourage you to look at your business architecture. Within business architecture, there are business capabilities: abilities or capacities that a business may possess or exchange to achieve a specific purpose or outcome. Such an abstraction packs data, processes, organization, and technology within a particular context, aligned to the strategic business goals and objectives of your organization.”
In Piethein’s article, he provides an example “business capability map” of a fictional airline company, containing capabilities like “Online Ticket Management”, “Customer Management”, “Flight Management”, “Baggage Handling and Lost Items”, etc. These are, simply put, the building blocks of your business – stuff your organization does in everyday life.
Why do we want to focus on this (and Piethein’s excellent article) here at the start? The critical thing about successful Data Mesh design is that everything, absolutely everything, must be built around these business capabilities. They define your problem spaces: they are the domains that define the Mesh. Systems, databases, applications, APIs, teams, data products – all map into these domains.
This is a business-focused approach to the Mesh, which ensures real value delivery. Going the “usual IT route” and starting with systems architecture instead of business architecture ensures nothing but wasted resources and long-term failure.
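As a rough sketch of what "everything maps into a domain" could look like in practice, the snippet below keeps a small capability map and flags assets that are not assigned to a known domain. The capability names come from the airline example above; the asset names, the `Asset` class, and the `unmapped_assets` helper are hypothetical and only there for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Business capabilities from the airline example; these act as the domains.
DOMAINS = {
    "Online Ticket Management",
    "Customer Management",
    "Flight Management",
    "Baggage Handling and Lost Items",
}


@dataclass(frozen=True)
class Asset:
    """A system, database, API, team, or data product (hypothetical examples)."""
    name: str
    kind: str
    domain: Optional[str]  # None means nobody has assigned it to a domain yet


def unmapped_assets(assets, domains):
    """Return the names of assets whose domain is missing or not a known capability."""
    return [a.name for a in assets if a.domain not in domains]


assets = [
    Asset("booking-api", "API", "Online Ticket Management"),
    Asset("crm-db", "database", "Customer Management"),
    Asset("legacy-reporting", "application", None),
    Asset("ops-dashboard", "application", "Operations"),  # not in the capability map
]

print(unmapped_assets(assets, DOMAINS))
# ['legacy-reporting', 'ops-dashboard']
```

Anything this check reports is either an asset nobody owns yet or a sign that the capability map itself is incomplete. Both are worth a conversation before any technology decisions are made.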
Danger zone: why domain boundaries are critical
A single domain, such as “Online Ticket Management”, usually has its own people (your organizational structure likely reflects the business capabilities), processes, and systems. The IT/data landscape of an individual business domain is generally optimized for that domain; clearly, because your company is still in business, the everyday operations within a domain are possible and work reasonably well.
The problem is that while individual domains might be well-optimized internally, it might be very difficult to integrate information between domains. Sure, Sales feeds data to Finance every day, but cross-domain analytical needs require more than just transactional system-to-system feeds. The need to integrate cross-domain data is, after all, the whole point of much of the data management business!
Crossing domain boundaries with data is difficult for many reasons: systems designed for operational within-domain usage might not be easily interoperable, the processes might be difficult to synchronize, or simply the people involved are distant and don’t know each other very well. This is the danger zone, with all kinds of hazards to navigate through.
While technological obstacles are of course important to clear, technology alone isn’t sufficient to ensure successful cross-domain integration. We also need to ensure cross-domain understanding.
Crossing domain boundaries – the danger zone
Perhaps the most obvious problem we data modelers face when doing cross-domain work is the language barrier. And not in terms of English vs. Spanish! Unsurprisingly to us, but quite often as a bit of a shock to the people of one business domain, the folks over at that other domain either a) use the exact same words as you do but mean completely different things by them, or b) use utterly weird and different words for things you have quite clearly defined as something else.
This is not just the data modeler’s problem – it’s everyone’s problem. If we lack a common understanding of what things exist and what we call them, how can we ever hope to technically integrate any data? How can you define requirements for Customer analytics, if “Customer” means seven different things across the enterprise (“seven” is not an exaggeration by the way)? What good does your fancy Data Product do, if its content is not understood outside your domain?
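To make the two language problems concrete, here is a minimal sketch that compares per-domain vocabularies and reports (a) terms used with more than one meaning and (b) meanings that go by more than one term. The domains, terms, and definitions are invented for illustration, and matching meanings by exact text is a deliberate simplification: in reality, finding out that two definitions describe the same thing takes a conversation, not a string comparison.

```python
from collections import defaultdict

# Hypothetical vocabularies: domain -> {term used in that domain: what it refers to}.
vocabularies = {
    "Sales": {
        "Customer": "party that has signed a contract",
        "Client": "party that has signed a contract",
    },
    "Logistics": {
        "Customer": "recipient of a delivery",
    },
    "Finance": {
        "Debtor": "party that has signed a contract",
    },
}


def find_homonyms(vocabularies):
    """Terms that carry more than one meaning across domains (problem a)."""
    meanings = defaultdict(set)
    for terms in vocabularies.values():
        for term, meaning in terms.items():
            meanings[term].add(meaning)
    return {term: sorted(m) for term, m in meanings.items() if len(m) > 1}


def find_synonyms(vocabularies):
    """Meanings that are referred to by more than one term (problem b)."""
    terms_by_meaning = defaultdict(set)
    for terms in vocabularies.values():
        for term, meaning in terms.items():
            terms_by_meaning[meaning].add(term)
    return {m: sorted(t) for m, t in terms_by_meaning.items() if len(t) > 1}


print(find_homonyms(vocabularies))
# {'Customer': ['party that has signed a contract', 'recipient of a delivery']}
print(find_synonyms(vocabularies))
# {'party that has signed a contract': ['Client', 'Customer', 'Debtor']}
```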
How to build a common language and cross-domain understanding with Data Models
Let’s consider a (deceptively!) simple example. This is a conceptual model of Logistics data:
An example data model for Logistics
It could, for example, define the contents of a Logistics Data Product, managed by your Logistics Data Team in your Logistics Domain.
A very simple diagram like this is extremely effective in documenting a) what content is to be expected in the Data Product and b) what structure it should have. Neither point has to do with the technological solution of the product – it’s the business content and the structure of the data in real life. Sure, if the product is delivered as a relational database schema, it might look like the model above, but when we’re talking about communication, language, and understanding, that is not really important yet (naturally, there would be more detailed technical documentation to describe those aspects).
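For readers who prefer text to diagrams, a conceptual model can also be captured as a small, technology-neutral data structure. The sketch below is one possible way to do that, not a reproduction of the diagram: the entities Customer, Delivery, and Vehicle are the ones discussed in this article, while the relationship wording and the `ConceptualModel` and `Relationship` classes are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    source: str
    verb: str
    target: str


@dataclass
class ConceptualModel:
    """Business content and structure only; no tables, columns, or keys."""
    name: str
    entities: set
    relationships: list

    def validate(self):
        """Return relationships that mention an entity missing from the model."""
        return [
            r for r in self.relationships
            if r.source not in self.entities or r.target not in self.entities
        ]

    def describe(self):
        """Render each relationship as a plain business sentence."""
        return [f"{r.source} {r.verb} {r.target}" for r in self.relationships]


logistics = ConceptualModel(
    name="Logistics Data Product",
    entities={"Customer", "Delivery", "Vehicle"},
    relationships=[
        Relationship("Customer", "receives", "Delivery"),  # assumed wording
        Relationship("Vehicle", "carries", "Delivery"),    # assumed wording
    ],
)

print(logistics.describe())
# ['Customer receives Delivery', 'Vehicle carries Delivery']
print(logistics.validate())
# []
```

The sentences produced by `describe` are the kind of statement a business person can confirm or reject without knowing anything about the database behind the product.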
The model above also reveals some risks – we might be in the danger zone! Consider the Customer entity. For our Logistics Data Product, it’s important to be able to show which customers get which deliveries, but the Logistics department might have a completely different understanding of what a “Customer” means than the Customer Management department.
The same goes for the Vehicle entity. In the context of the Logistics Domain, it’s rather obvious, but the world doesn’t end there. Perhaps our Asset Management team in Finance has their own understanding of how to register vehicles belonging to our company and how they differ from rentals or contractors etc.
Conceptual Data Models (and, on a more detailed level, Logical Data Models as well) can make these danger zones visible. Cross-domain data usage is a given; it needs to happen. We should ensure that decisions regarding cross-domain data usage are made consciously, not “by accident”. That’s why we need to be able to document not only the technological designs of our data products but also the business content – that is the only way to achieve a cross-domain understanding of what is where.
The “what” is, however, an even more involved topic than merely documenting our own designs. We also need to consider common language across domains: what do we mean by things? What is a “Customer”?
Why a centralized Glossary is vital for a Data Mesh setup
Whatever Data Products and Domains you have, you need to be able to document their contents. But in documenting these contents, you will encounter language problems. There might be no shared understanding of even critical key terms like “Customer” or “Product” (which, by the way, are in our experience some of the most devious things to define in many organizations!).
The only way to solve cross-domain language problems in a sustainable and consistent fashion is to decide what words mean. The result of these decisions is the Business Glossary – a repository of terms and their definitions. Without a Glossary, data models are nothing.
Having agreed on terms in a Glossary means that you can start mapping Data Products to other Data Products in other Domains: Logistics has data on Deliveries related to Customers, Finance has data on Invoices related to Customers… This mapping exercise is simply not feasible at a large scale if it is done only at the lowest technical level of individual database columns and technical identifiers.
You need to have a higher-level understanding of a) what the core “things” you have data about are, b) how these are defined, and c) which Data Products contain information about which “things”. Technical metadata of the Products and their related components is necessary, but alone it’s not sufficient to give you that big picture.
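The three points above can be sketched as a tiny glossary plus a catalog of Data Products that refer to glossary terms. This is a minimal illustration under stated assumptions, not a recommended tool design: the definitions, product names, and the helper functions `products_by_term` and `undefined_terms` are all made up for the example.

```python
from collections import defaultdict

# a) + b) Hypothetical glossary: agreed term -> agreed definition.
glossary = {
    "Customer": "A party that has bought, or agreed to buy, goods or services from us.",
    "Delivery": "A shipment of goods handed over to a Customer.",
    "Invoice": "A request for payment issued to a Customer.",
}

# c) Hypothetical catalog: data product -> (owning domain, terms it holds data about).
data_products = {
    "Logistics Deliveries": ("Logistics", {"Delivery", "Customer", "Vehicle"}),
    "Finance Invoicing": ("Finance", {"Invoice", "Customer"}),
}


def products_by_term(data_products, glossary):
    """Index: glossary term -> data products (in name order) that contain it."""
    index = defaultdict(list)
    for product, (_domain, terms) in sorted(data_products.items()):
        for term in terms:
            if term in glossary:
                index[term].append(product)
    return dict(index)


def undefined_terms(data_products, glossary):
    """Terms that products use but nobody has defined in the glossary yet."""
    missing = {}
    for product, (_domain, terms) in data_products.items():
        unknown = sorted(terms - set(glossary))
        if unknown:
            missing[product] = unknown
    return missing


index = products_by_term(data_products, glossary)
shared = {term: prods for term, prods in index.items() if len(prods) > 1}

print(shared)
# {'Customer': ['Finance Invoicing', 'Logistics Deliveries']}
print(undefined_terms(data_products, glossary))
# {'Logistics Deliveries': ['Vehicle']}
```

Here "Customer" shows up as the link between the Logistics and Finance products, and "Vehicle" shows up as a term that still needs an agreed definition. Neither result depends on knowing a single column name.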
The Big Picture of a “Mesh piece”
It’s not just a Data Mesh issue…
We have been talking a lot about Data Mesh here. But let’s take a step back – is all of this actually specific to the Mesh at all?
In any organization following (or attempting to follow) any data management paradigm, you will end up with domain-specific solutions, domain-specific language, and a need to combine data across domains. The danger zones exist, Mesh or no Mesh!
The overall idea presented here on the value of common language and understanding based on business domains and business terms is not new. Whatever your approach, the same basic principles and lessons hold true.
What can be said is that if your approach is Data Mesh, then the importance of these principles and lessons is even greater. Optimizing a single Data Product won’t do much, but setting up a network of interconnected products with a shared language and understanding of their business contents enables something that is certain to be larger than the sum of its parts.