July 16, 2022
7 minutes

How to think of Data as a Product

Product Design
Data Mesh
Data Industry
Juha Korpela
Chief Product Officer, Ellie.ai

“Data is an asset” is a smart-sounding sentence often found in corporate presentations and shiny consultant slides, but recently, especially with the waves of Data Mesh discussion washing over the data business, the tide has turned: now “Data is a Product”. Why is this distinction so important, why is product thinking so important for data, and how can we ensure good Data Product design? These are the topics we will be covering in this blog post!

Data Asset vs. Data Product

The old mantra of “data as an asset” has a good point about data being something that you should care about, keep track of, and maintain as you do with your organization’s physical assets. Some organizations have even adopted various kinds of “data accounting” methods – just like you track your financial assets, they want to track the data assets.

But an “asset” is inherently something relatively passive. If data is an asset, you’d like to hoard it and hope it produces positive value for you over time. The approach is inward-looking: how can I get a lot of data and keep it in good condition?

Product thinking is, by comparison, outward-looking: how is your product being used? By whom? How do the users benefit from it? With the ever-increasing need to utilize data in every possible part of almost any business model, and the rather worrying track record of value creation in data initiatives of the past, thinking of your data as a product might be just the trick needed to make a difference.

Data Product vs. Data as a Product

Something to understand when you start this journey is that not all “data products” are alike, and not everyone uses the term in the same way! In this rather thought-provoking article, author Xavier Gumara Rigol differentiates between “Data Products” (i.e. something that primarily uses data to do something, like a dashboard or even a self-driving car) and “Data as a Product” (i.e. something that primarily provides data, such as a data mart or a machine learning model providing forecasts via an API).

These terms are, however, often used interchangeably, and some consider the only difference between a “Data Product” and “Data as a Product” to be the level of refinement. Are you getting just raw data from the product, or is there something new happening inside the product that gets you more interesting data? In fact, you could consider that the self-driving car is merely taking this line of thinking to its logical conclusion, refining the input data all the way into automated decisions!

If we look at Data Mesh, its inventor Zhamak Dehghani quite clearly defines that the Mesh is all about “Data as a Product” – but then proceeds to use “Data Product” as a shorthand for these products. This is what we will do here as well, to keep things as simple as possible.

What is the benefit of product thinking for data?

Why is this so important? Because we want to create value out of our data.

The key to all product thinking is to understand that the product must be Feasible, Usable, and Valuable. In short, “Feasible” means that the product must be something that can actually be built at a reasonable cost; “Usable” means that its users must be able to find the product and be able to utilize it; and “Valuable” means that by using the product, the users will gain something out of it – value.

Data As A Product

This is an excellent way to think about data! We all want our data to be something that people use and create value out of it, and certainly, we want to avoid wasting money and effort building impossible fever dreams.

Applying product thinking to data means that we turn our eyes towards the actual users and real value creation.

Designing good Data Products

To ensure that your Data Products are actually Feasible, Usable, and Valuable, you must design them well. Data Mesh defines all kinds of capabilities and principles that should be applied to Products, but even if that isn’t your direction at all, you can benefit by following the same high-level thinking.

In Zhamak Dehghani’s book Data Mesh (O’Reilly), she defines what she calls “baseline data product usability attributes”, or in simpler terms, what a good Data Product should be:

  • Discoverable, i.e. something that you can know of and find in the first place
  • Addressable, i.e. clearly identified with one or more access points
  • Understandable, i.e. having a clear semantic meaning
  • Trustworthy and truthful, i.e. being transparent about its sources and quality
  • Natively accessible, i.e. available with whatever tools and connection methods its users require
  • Interoperable, i.e. something that you can safely link with other Data Products
  • Valuable on its own, i.e. containing enough information in itself to be valuable for someone
  • Secure, i.e. having access control and confidentiality built into it and controlled by documented policies.

These are good things to have for any Data Product in any setup, Mesh or no Mesh! At Ellie, we are especially keen on the Understandability and Interoperability aspects of Data Products. Turns out that some smart data modeling can do a lot of good in these areas!

Why data models are so important in Data Product design?

Consider the point above that a Data Product needs to be “Understandable”. What it means in practice is that the user needs to be able to understand what they are getting when they access the Data Product, and this is not just in terms of the technical structure of the output – it’s about the semantic meaning of the data.

In her book on Data Mesh, Dehghani writes about needing to be able to describe the business domain around the data in semantic terms: what is all this about and what are the things this domain deals with? For people within that domain, everything might be self-evidently clear; however, the whole idea of the Mesh, and in fact, any organization that wants to really benefit from data, is to share data between different domains.

Conceptual Model Ellie

We wrote about enabling communication across domains earlier. Our point was (and is!) that data models are an excellent way to concisely explain what “stuff” the domain is made of, and a shared glossary of terms that links these models enables other domains to understand what that means. Having a high-level domain model and well-defined terms in the glossary is vital for achieving cross-domain understanding and interoperability of Data Products.

However, on the level of individual Data Products this does not mean that the domain model should be implemented 1:1. Rather, a Data Product has its own structure which is deemed optimal for its own use cases and users. Some Data Products might be databases with dimensional schemas, some might be flat files, some might be event feeds – or even all of these at the same time if there are multiple different ways the Product needs to be accessed (remember “Natively accessible” from above!).

Logical Model using Ellie

The important thing about a Data Product-level data model is that it needs to be mappable to the model of the domain. The lower-level model is needed to describe what the product looks like, but in order to deliver the semantic meaning, it has to link with the core business terms used in that domain.

Ellie uses mappings between the entities in the more detailed logical models and the conceptual entities in the glossary to achieve this. By providing the domain models, the detailed logical models, and the glossary with all the definitions in one easily accessible and business-friendly place, enables the Data Product users to understand what the data means – and to link it with other Data Products that have to do with the same core terms. Ellie can also provide the data model information via APIs on request, making it possible to technically link the models directly into a Data Product.

Data Product thinking is good for you – in any situation

You should by now have an understanding of what all this product talk is about. The best part of this approach is that you don’t need massive technological or organizational changes upfront: you can just start thinking differently.

When you start considering data as an actual product – something that is Feasible, Usable, and Valuable and has its own users you need to cater to – you are already taking a big step forward on your Data Maturity journey. Of course, this has some practical implications for architecture and implementation, and we will be covering some of that in a later blog post. However, the important thing is to start the thinking process and consider your products not from a technological but from a design perspective.