Data Mesh is Not for Everybody, or Is It?
By now, most of you have probably heard about Data Mesh, which is a whole new paradigm for data management.
We wrote about Data Mesh in an earlier blog post, but here is a short recap.
Data Mesh essentials:
- decentralized vs centralized architecture
- microservices vs monolithic IT-systems
- small and agile teams vs large allocations
- API infrastructure
In addition, there is no Data Mesh without concepts like data products and domain-driven design. We’ll get back to them later in the article.
The inspiration for this text arose from real customers’ use cases. As it happens, a growing number of large corporations are planning Data Mesh implementations that include Ellie, so we get to hear firsthand about the reality they live in.
Data Mesh – know your apples from your oranges
Today, Data Mesh is by far the most hyped concept in data management. Many old-school data warehouse guys have lost it for good, as they perceive Data Mesh as the latest attempt to tear down their beloved data warehouse paradigm.
Back in the day, there was lots of talk among big data, self-service BI, and Data Lake folks about wiping out the Enterprise Data Warehouse (EDW for short), with very little success.
What has happened here – the way we see it – is that people are mixing apples and oranges, if you will.
There is, perhaps, some confusion about use cases, and about how different the data management challenges are in different types of organizations. Data Mesh saw its first large-scale implementations in the context of tech companies such as Zalando (the biggest online fashion store in Europe) and Netflix. The use cases in those companies, whose entire business is built around digital services and data, can be quite different from those of more traditional organizations.
In the same fashion, Hadoop-driven big data architecture was originally supposed to be a solution for companies like Facebook (now Meta) and others for analyzing massive amounts of web data.
Predicting user behavior via web analytics can be quite an unusual use case for industries outside big tech and platform companies such as Meta and Amazon. Despite this, some people in the data business – for reasons beyond understanding – wanted to force this Hadoop way of doing things into all use cases everywhere, including Data Warehousing. This didn’t go very well.
The challenges of Data Mesh for a traditional company
The key to understanding Data Mesh for what it really is, beyond the tech hype, is to start with the basics.
Companies have domains, in other words, business functions or divisions. These domains build and develop, independently, different types of data products.
For instance, at Zalando, one domain might be the online store search engine. Consumers search for, say, Nike sneakers, and the engine recommends items matching that search. So there is a team responsible for the data products relating to that.
This small team consists of programmers led by a product owner, who represents the business but preferably also understands backlog management, epics, and other development lingo.
Different industries are, however, also different in terms of their data needs and related competencies. More “traditional” industries, such as manufacturing, logistics, construction, pharma, etc., might not share much with the tech-heavy web companies.
A business domain for an industrial company could be production, shipments, or billing.
Setting up domain-specific development teams is one thing, but in these industries, it might be difficult to find the all-important product owners: business experts who understand data and agile, and who can truly participate in data product development.
One important aspect of getting the business experts really involved in all this is the shared understanding of the data. To achieve that, we need to consider what a “data product” actually is. That, in turn, has to do with a concept called “domain-driven design”.
Data products and Domain-Driven Design: what are they?
There’s a lot of talk nowadays in professional social media about “data products”. There’s also a lot of confusion about what that really means.
A good article that explores the topic, and that makes an important distinction between “data products” and “data-as-a-product”, can be found here. However, for the purposes of this article, let’s try to make it as simple as possible.
A data product is an application or interface that uses or provides data. It could be a customer-facing application: for example, an online bank in which you apply for a mortgage, and which checks your credit score before making a decision. It might be an internal system monitoring a production line based on a feed of sensor readings. Or it might be just something that provides clean and useful data for your organization’s analytics needs.
As for the last example, for those familiar with data warehousing, a data mart is a type of data product. Data marts are usually domain-oriented data sets built for business intelligence purposes. Typically, they’re built by a small team to solve an analytical business problem, such as feeding a dashboard for monitoring sales per location.
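To make the data mart example a bit more tangible, here is a minimal Python sketch of a data product that serves sales per location. The names (`Sale`, `sales_per_location`) and the sample records are hypothetical and exist only for illustration; a real data mart would read from your organization’s own sources and would likely live in a database rather than in application code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sale:
    """One sales transaction (hypothetical, simplified schema)."""
    location: str
    amount: float


def sales_per_location(sales):
    """Aggregate raw sales into the data set a dashboard would consume."""
    totals = defaultdict(float)
    for sale in sales:
        totals[sale.location] += sale.amount
    return dict(totals)


# Illustrative sample data, not real figures.
sample = [
    Sale("Store A", 120.0),
    Sale("Store B", 80.0),
    Sale("Store A", 30.0),
]

for location, total in sorted(sales_per_location(sample).items()):
    print(f"{location}: {total:.2f}")
```

The point is not the aggregation itself but the contract: the consuming team gets a small, clearly named data set that answers one business question.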
Like Data Mesh, domain-driven design (DDD) has its roots in software development tradition. According to DDD, software development should be based on the terminology and logic of the business – the collaboration between business stakeholders and developers is therefore considered crucial.
Domain-driven design principles guide us to organize our technological solutions according to the “real” business, not the other way around. This is highly relevant in the context of data products: they need to match real business concepts and requirements to ensure quick value creation and reusability.
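As a rough illustration of that principle, the sketch below models a shipment for a hypothetical industrial company, using the words a logistics expert would use instead of source-system column names. Every name in it (`Shipment`, `order_number`, `transit_days`, and so on) is an assumption made for this example, not something taken from a real system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Shipment:
    """A shipment, described in the vocabulary of the business domain."""
    order_number: str
    customer_name: str
    dispatched_on: date
    delivered_on: Optional[date] = None  # None while still in transit

    def is_delivered(self) -> bool:
        return self.delivered_on is not None

    def transit_days(self) -> Optional[int]:
        """Days between dispatch and delivery, or None if not yet delivered."""
        if self.delivered_on is None:
            return None
        return (self.delivered_on - self.dispatched_on).days


# Hypothetical example records.
delivered = Shipment("SO-1001", "Example Customer", date(2022, 1, 3), date(2022, 1, 6))
in_transit = Shipment("SO-1002", "Example Customer", date(2022, 1, 5))

print(delivered.is_delivered(), delivered.transit_days())
print(in_transit.is_delivered(), in_transit.transit_days())
```

A business expert can read and challenge a model like this without knowing anything about the underlying tables, which is exactly the kind of shared understanding domain-driven design aims for.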
Use the best parts of Data Mesh
It might be that organizations need to be quite mature in terms of data management in order to implement Data Mesh to the fullest. Having data-savvy business experts in every domain to own their data product development pipelines is no small feat.
The thing is, however, that the essential parts of Data Mesh – domain-driven design and product thinking – are not at all out of reach even for “traditional” industries. Yes, Zalando’s search engine team lead is probably highly skilled in agile methods, and there’s a ton of technical gizmos running all kinds of automated data pipelines, but we have to consider what the core of the whole setup is.
Perhaps the essential question is:
How can we get business people and other domain experts, the actual consumers and producers of our data, involved in data development in a way that would ensure true business value from every project?
Business-Driven Data Modeling to the rescue
Several multinationals have approached us here at Ellie with that thought in mind: they are considering or already implementing Data Mesh and/or domain-driven design.
With Ellie, our data modeling & governance tool, they have the ability to define their data assets and products in a simple way that connects business and IT experts.
Ellie helps you create business-driven data models together with domain experts. These can be refined into more technical data models, which in turn can become the actual blueprint for your data product. This bridges the gap between business and IT.
A very useful feature of Ellie is its cloud-native architecture, which takes sharing and collaboration to a whole new level. No installations, no screenshots left rotting in shared folders – just give everyone direct access right there in their web browsers. Easy sharing and collaboration are vital in a Data Mesh setup, as it calls for maximum cooperation (even in remote working conditions) between data consumers and data engineers.
Judging by the interest and attention Ellie has experienced lately, it’s no wonder our revenue doubled in 2021. There’s a burning need for reaching that shared understanding about your data.
Don’t rush into Data Mesh
You can see data nerds arguing online about whether Data Mesh can replace Data Warehousing or whether it is just another hype term, doomed to be eventually forgotten.
On the one hand, we firmly believe that there’s something very valuable here, and some of its fundamentals are simply too important to be ignored. The Mesh paradigm and its decentralized, business-focused nature capture something that has perhaps been getting too little attention in the tech-heavy data talk of the last few years.
On the other hand, when you look at the conversation taking place on the Data Mesh Learning Slack channel, you can’t help wondering whether this resembles the hype around Hadoop-driven big data ten years ago. As with Hadoop, millions might go to waste as organizations rush to invest in Data Mesh as a purely technical exercise, without understanding its core idea or thinking about how it fits their specific industry.
Smart companies get the basics right first. They pay attention to how to utilize the best parts of Data Mesh, even where the overall organizational data maturity might not yet be enough for a full-fledged, bells-and-whistles implementation.
Thinking about your business domains and data products, and figuring out how to involve the real business experts in data development, is the key to success.
If you’re interested in hearing more about our experiences in helping companies realize the true value of Data Mesh, drop us a line, and we’ll have a chat!