Why Data Mesh Is the Talk of the Town
The whole data world is now talking about Data Mesh, a new approach to data management that moves away from the centrally managed data platform.
Early adopters such as Zalando, Netflix, and many other data-savvy companies are doing it, and the rest are following their example.
The Slack community “Data Mesh Learning” is growing faster than most other data-related communities.
The term Data Mesh was originally introduced by Zhamak Dehghani from ThoughtWorks. The concept has a lot to do with the fact that companies such as Netflix, Facebook, and Amazon started out as software startups.
As they’ve become masters of utilizing their data assets (that’s what makes them so successful!) they’ve incorporated app-building practices, such as microservices and agile development, into data management.
The problems of the traditional approach
Data Mesh was developed to tackle the problems of a centrally managed data platform that’s been the paradigm for the past years. Most people are aware of how that works, but here’s a short recap.
The centrally managed basic architecture usually consists of a data lake for unstructured data (so-called big data) and a data warehouse for structured data, such as clients, customers, and products. The data is used for different purposes, such as BI, data analytics, AI/ML, and applications.
All of this is usually managed by a team or department of data engineers and other technical experts. The team is usually in a matrix structure, serving all departments and business units or domains.
This involves various technologies that we are not going to address here in more detail, as Data Mesh is not a new technology – it’s more of a design principle.
According to Dehghani, there are some fundamental challenges in the current, centralized approach.
First, most data is produced in operational systems and applications such as ERPs or CRMs. Dedicated people are responsible for these domain-specific applications.
Currently, there is little to no communication between the operational application team and the data platform team. The former doesn’t even necessarily know that “their” data is being brought to a data platform in the first place.
Second, the data team, in turn, doesn’t really understand what’s going on in business applications.
This leads to a situation where neither group takes ownership of the data as a whole. As a result, the data may be of poor quality, or quality issues are addressed too late in the process.
The data engineering team working on the centralized platform struggles to produce usable data for consumption because it doesn’t understand how the data is generated or what it actually means. There is no context: it’s just tables with weird names, rows, columns, or files.
How can you deal with data about, for example, customer leads if you don’t know what a “lead” is and what it looks like in the marketing applications?
The Data Mesh concept aims to solve these problems. Next comes an introduction to the very basics of Data Mesh. There’s a lot more to it, of course, but you’ll get the picture.
Data Mesh basics: Domain-centric data pipelines
Eric Evans introduced the theory of domain-driven design, providing a broad framework that emphasizes the meaning of vocabulary in business design. The idea is that applications should be based on domain expertise and language.
In Data Mesh, data pipelines should be built by business domains, meaning that one dedicated team should understand the whole data pipeline of a certain domain, from where it’s produced to how it’s consumed.
Let’s say a company builds automation for managing marketing leads. The dedicated team of that domain would understand both the domain (marketing) and its operational systems (CRM and other marketing applications). The team therefore becomes competent in dealing with leads and other customer data.
This aims to solve the “ownership dilemma” described earlier, and it reflects the fact that one must understand the business in order to understand and utilize the data it produces.
Data Mesh basics: Data as a product
In the “old architecture”, data is a by-product of operational systems. In Data Mesh, data becomes a product in itself. The dedicated team of a domain builds and maintains these data products.
Data products require product management. Consider, for instance, the biggest fashion online store in Europe, Zalando: they have teams building applications for their webshop. The teams are getting their directions from Product Owners (PO for short). The PO is in charge of the product vision, roadmap, and backlog. Similarly, the Data Mesh teams have their own Product Owners for their data products and similar processes with backlogs and roadmaps.
If data is a product, it should have a customer. In the case of a corporation, the customer is the person utilizing the data, such as a data scientist, business controller, or whoever needs to solve their problems with data.
Identifying the products and the customers is the key to Data Mesh thinking.
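To make the product thinking a little more concrete, here is a minimal Python sketch of how a domain team might describe one of its data products. Everything in it is an assumption made for illustration: the DataProduct class, its fields, and the example values do not come from Dehghani’s writing or from any particular tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataProduct:
    """Hypothetical descriptor for a data product owned by one domain team."""
    name: str
    domain: str          # business domain that builds and maintains the product
    product_owner: str   # person in charge of vision, roadmap, and backlog
    consumers: List[str] = field(default_factory=list)  # the product's "customers"
    backlog: List[str] = field(default_factory=list)    # planned improvements

    def add_consumer(self, role: str) -> None:
        """Register a consumer role once; duplicates are ignored."""
        if role not in self.consumers:
            self.consumers.append(role)


# Illustrative values only.
leads = DataProduct(
    name="marketing_leads",
    domain="marketing",
    product_owner="Marketing data PO",
)
leads.add_consumer("data scientist")
leads.add_consumer("business controller")
leads.add_consumer("data scientist")  # already registered, so nothing changes
print(leads.consumers)
```

The point of the sketch is only that a data product has a named owner and named customers, just like any other product.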
Building data warehouses or lakes isn’t the same as it was 10 years ago. Today it is based on short sprints, agile methods, and fast experimentation.
Data Mesh reflects a wider shift going on in the IT industry: from building monolithic enterprise applications and software to a microservices architecture, where applications are built from pieces with lots of APIs, and the same services can be utilized in multiple contexts.
Also, the teams are smaller, applying agile tactics as opposed to the large project teams and “waterfall” deployment of the monolithic approach.
Data management is starting to follow the same trends that influence application development as a whole.
Modeling Business Domains
There is still a need for centralized data storage because even though the data pipeline is built by domains, it doesn’t make sense to develop siloed data platforms for each domain.
There is just too much overlap between the departments. Who could really own customer data completely?
Because of the demand for separate data pipelines on the one hand and a centralized data platform on the other, the whole becomes quite complicated to understand and keep track of.
Business-driven data modeling (also called conceptual data modeling) is well suited to managing this complexity. In fact, it is ideal for domain-based thinking and for improving communication between all parties.
Let’s have a look at what that means.
What do you mean by…
In business-driven modeling, everything starts with defining the key business concepts. You go:
“What do you mean when you say Customer, Product, or Project?”
The domain experts then explain what they mean in a way that’s understandable for all participants: the developers, other subject matter experts, and even the project management!
After you’ve identified and defined these key concepts of the business domain, you draw relationships between them in order to explain how they relate to each other. This is something where only the actual business domain expert knows what is correct. Let’s take a look at an example:
“What is a Lead?”
Ask the Data Engineers, and they’ll have no idea. Is that surprising, when it’s not really even their job to know that?
No, it’s the marketing domain expert who knows that by heart. Let’s ask her!
The marketing rep explains that in our company, a “lead” is a customer that has subscribed to a newsletter, downloaded a white paper, and requested more information three times in the past two months.
Now we know what the lead data is about in our hypothetical company. You can easily imagine a thousand different answers to the question: “lead” is not a universal concept, and the definitions vary significantly across companies and industries.
The definition of a concept is a business decision: a rule set by the decision-maker of the business domain.
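Once a definition like this is written down, it can also be expressed as an explicit rule. The Python sketch below is one possible reading of the marketing rep’s definition, not the only one. The assumptions: “two months” is approximated as 60 days, the three requests for more information must fall inside that window, and the newsletter subscription and white paper download may have happened at any time. The Contact class and the is_lead function are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List

LOOKBACK = timedelta(days=60)  # assumption: "two months" approximated as 60 days
MIN_INFO_REQUESTS = 3          # "requested more information three times"


@dataclass
class Contact:
    subscribed_to_newsletter: bool
    downloaded_white_paper: bool
    info_request_dates: List[date] = field(default_factory=list)


def is_lead(contact: Contact, today: date) -> bool:
    """Apply the hypothetical company's lead definition to one contact."""
    cutoff = today - LOOKBACK
    # Count only the requests that fall inside the lookback window.
    recent = [d for d in contact.info_request_dates if cutoff <= d <= today]
    return (
        contact.subscribed_to_newsletter
        and contact.downloaded_white_paper
        and len(recent) >= MIN_INFO_REQUESTS
    )


today = date(2024, 6, 30)
active = Contact(True, True, [date(2024, 5, 10), date(2024, 6, 1), date(2024, 6, 20)])
stale = Contact(True, True, [date(2024, 1, 5), date(2024, 6, 1), date(2024, 6, 20)])
print(is_lead(active, today), is_lead(stale, today))
```

A different company would change the constants or the conditions, which is exactly why the rule belongs to the business domain and not to the data engineers.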
By doing conceptual modeling, you start to understand the domain’s rules and data structures, and that’s what Data Mesh is all about.
Ellie is a perfect match for Data Mesh
When you come up with documented business models of all domains, you’ll end up with business-driven information architecture.
We have developed Ellie with exactly that purpose in mind. You can create architecture, applications, and data platforms that are based on the language that the business uses.
And you can do it piece by piece, use case by use case. The models are all up to date and reusable, exactly what the Data Mesh approach expects models to be.
Ellie is fully cloud-based, so you can share all the models across your organization. One Ellie client has added more than 200 users to Ellie. At first, this seems a bit odd: are there 200 data pros doing modeling in that organization? The answer is no, but there are 200 business domain experts who participate in defining the key concepts and utilize the documented data structures in their daily work.
In Ellie, the business concepts, such as Customer and Product, are reusable. Once you’ve defined them, you can use them in different models.
The most useful thing about Ellie is that you get things done with it. It’s an easy-to-use data modeling tool with which the work actually proceeds, which is not very often the case when you deal with Enterprise Architecture or reverse engineering.
Even if Data Mesh isn’t your flavor of choice, the thinking behind it is sound. Getting familiar with its basics is recommended for all data professionals.