January 26, 2021
5 Minutes

Modeler’s Corner #1: Identifying entities

Data Modeling

Modeler’s Corner is our series of blog posts on best practices and practical tips & tricks for all you Ellie modelers out there. The series focuses on everyday issues you might face as a Data Modeler. We aim to help you build the most informative, understandable, and efficient business-driven models with Ellie. For more comprehensive training needs, don’t hesitate to ask us!

What Is and What Should Never Be – Identifying entities

Building a jigsaw puzzle is one thing. Knowing which pieces in your box belong to that particular puzzle is another thing. The single most important phase in any modeling effort is also the one that causes the most problems: correctly identifying the entities that make up your model. A puzzle built from the wrong pieces will not make much sense to anyone.

At Ellie, we always talk about business-driven modeling. For us, this is the key to bridging the gap between business and IT specialists – a way to find out what the data is about, rather than what it technically looks like. This enables better and more precise communication and makes data utilization easier and more efficient. Business-driven modeling is all about modeling real life instead of systems and databases. However, all modelers at some point find themselves struggling to define what real life is made of!

That’s the Way

There are several good rules of thumb for defining entities that will help you build better models. We’ll walk you through some of them below.

Rule 1: Entities are singular nouns.

This is a very simple rule that you should always follow, with no exceptions. When you create an entity, it should be a singular noun. Not Customers, but Customer. Not Condition monitoring (e.g. for a machine in a factory), but perhaps Condition measurement instead. Think of the business problem or process you are modeling in verbal form, as a story: “A customer creates an order for some products. We then dispatch the delivery to them.” Pick the nouns from this story (customer, order, product, delivery), and you have very good candidates for entities.

Rule 2: Entities exist in the real world of the business.

This real-world rule is not always so straightforward, but it is very important. A system or a report isn’t really what your business is about, right? Your business is about customers, products, factories, and other things like that. Focus on these real-life concepts, and avoid adding artifacts such as systems or reports to your model, even though they may contain information about the real world. Later in a data initiative, we might decide that we need information about some part of the model presented as a report or some other analytical solution. Or maybe the data needs to be stored in a system. But you need to understand the difference between defining data content (i.e. a model of real-life entities) and utilizing or storing it (i.e. a report or a system).

Rule 3: Entities are countable.

The third rule may not be entirely self-evident at first, but it’s perhaps the most powerful one. Think about the occurrences or instances the concept represents: your Customer entity represents all your 15,000 customers – you don’t have to add 15,000 entities to your model! If you have an entity where there is nothing to count, you’ve likely modeled something wrong. A common mistake is to add things like process steps or abstract terms as entities. You can’t have three Analytics, nor can you have three Billings; these don’t even make grammatical sense, so why would they make sense in a data model? An easy way to test this is to pick an entity and say out loud to yourself: “I have three *[entity name in plural]*”. Does it make sense? Make a habit of this, and you will soon develop an intuitive ability to avoid uncountable entities, after which you no longer need to worry about talking to yourself and getting strange looks from your colleagues!

Rule 4: Attributes are not entities.

Put this way, rule 4 seems quite obvious. Goes without saying, right? However, it’s an extremely easy mistake to make. Your business experts care deeply about KPIs and measures such as turnover, account balance, or customer satisfaction. These are nouns that constantly come up in their “story” that we told you to listen to in rule 1! To avoid modeling attributes as entities, you should ask yourself the following question: “Do these words refer to things in the real world, or are these facts that we know about some other thing?” You can easily see that “customer satisfaction” is a fact that you know about a customer, and “account balance” tells you something about an account. List these as attributes of their respective entities, but don’t promote them to separate entities. Rule 3 (countability of entities) will also help you in these cases, as surely you cannot say “I have three turnovers”!
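To make rules 3 and 4 a little more concrete, here is a minimal Python sketch. The Customer and Account classes, their fields, and the sample values are hypothetical and only stand in for entities in a conceptual model: customer satisfaction and account balance are kept as attributes of the entities they describe, and counting instances of an entity gives a sensible answer.

```python
from dataclasses import dataclass

# Hypothetical entities for illustration only; not taken from any real Ellie model.
@dataclass
class Customer:
    name: str
    satisfaction: float  # a fact we know about a customer: an attribute, not an entity

@dataclass
class Account:
    account_id: str
    owner: Customer
    balance: float  # a fact we know about an account: an attribute, not an entity

customers = [
    Customer("Acme Oy", 4.2),
    Customer("Globex", 3.8),
    Customer("Initech", 4.5),
]
accounts = [
    Account("A-1", customers[0], 1200.0),
    Account("A-2", customers[1], -50.0),
]

# Countability check (rule 3): "I have three customers" makes sense.
print(f"I have {len(customers)} customers")  # prints "I have 3 customers"
```

Note that there is no Satisfaction or Balance class in the sketch: “I have three satisfactions” fails the countability test, so these stay as attributes.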

Dazed and Confused?

Why is this important, then? Can’t we just mindmap whatever on the canvas? Well, yes of course – nothing is stopping you from doing that, and what you’ve read above are merely our suggestions for best practices. There are, however, two main reasons to follow these guidelines.

First, if your model is to be used in actual data development, at some point it needs to turn into logical and physical models describing some kind of data storage. At that point, if you had “Analytics” as an entity, your data engineers will be extremely baffled! What do you store in that table? Your entities need to be something you have data about, and the link between the conceptual entity and the physical data item has to exist.
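As a rough illustration of that link, here is a minimal sketch using Python’s standard-library sqlite3 module. It assumes a simple one-table-per-entity mapping, which is a simplifying assumption rather than a rule; real logical and physical designs are often more involved. The table and column names are hypothetical. Each table has an obvious meaning and each row is one countable instance, whereas there is no sensible table to create for “Analytics”.

```python
import sqlite3

# A minimal sketch, assuming a hypothetical one-table-per-entity mapping.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id  INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        satisfaction REAL            -- an attribute of Customer, not its own table
    );
    -- "order" is a reserved word in SQL, hence the table name customer_order.
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
    );
""")
conn.executemany(
    "INSERT INTO customer (name, satisfaction) VALUES (?, ?)",
    [("Acme Oy", 4.2), ("Globex", 3.8)],
)
conn.executemany(
    "INSERT INTO customer_order (customer_id) VALUES (?)",
    [(1,), (1,), (2,)],
)

# Each row is one countable instance of the entity: "I have three orders".
(n_orders,) = conn.execute("SELECT COUNT(*) FROM customer_order").fetchone()
print(n_orders)  # prints 3
```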

Second, and perhaps more importantly, business-driven modeling is not only about providing requirements to an individual data initiative. It’s also about building your information architecture piece by piece and concept by concept. If you can define robust, sensible data concepts, you can easily re-use them in other models, use them to define the ownership scheme in your data governance system, or link data creation with your process diagrams (where concepts such as “invoice” or “customer” might be the “tokens” flowing through your process steps). A well-defined and widely understood set of key business data concepts (your business glossary) is a massively valuable asset for any company that wants to improve its data maturity. And it comes to you almost free, as a by-product of data modeling done right!

In the following installments of this series, we will talk more about drawing the models and, for example, identifying situations where certain template-like structures can be used as building blocks. All of that is (hopefully!) going to be good and helpful to you, but always remember that your model is only ever as good as its entities.