January 21, 2022

Modeler’s Corner #6: Category vs instance

Guide
Data Modeling
Juha Korpela
Chief Product Officer, Ellie.ai

Modeler’s Corner is our series of blog posts on best practices and practical tips & tricks for all you Ellie modelers out there. The series focuses on everyday issues you might face as a Data Modeler. We aim to help you build the most informative, understandable, and efficient business-driven models with Ellie. For more comprehensive training needs, don’t hesitate to ask us!

The “category vs instance” problem

Our previous Modeler’s Corner entries have dealt with various patterns, like subtypes and roles, that are helpful tools in any modeler’s toolbox. This time, we will cover something that is not so much a reusable pattern as a pattern-like problem that all modelers eventually come across. Being able to recognize this problem is a useful skill, and of course, there will be some recommendations on how to solve it.

Our topic is the “category vs instance” problem, also known as the “thing vs type of thing” problem. It’s not the easiest topic if you’ve just started modeling, but once you learn how and why this happens, many things will become clear!

Our example: investing and shares

Let’s consider the following example situation. You are an investor, and you own shares in a few companies. Some of them are perhaps publicly traded, some not, but in any case, you know that you own 300 shares of Company X and 150 shares of Company Y. Perhaps you have organized these in different portfolios.

If you already have a bit of experience in conceptual data modeling, or especially if you’ve read our earlier Modeler’s Corner entry about selecting the right entities, you’ll quickly come up with a list of entities that are needed for a data model describing the above situation. Clearly, there’s “Company” and “Portfolio” and then there’s “Share” and yeah, this is quite straightforward, let’s get drawing…

Hold your horses! What is a “Share”?

We have just walked into a minefield called the “category vs instance” problem. Or, to be more precise with this metaphor, we were about to walk into it, but luckily our category-vs-instance-mine-detector alerted us just in time! Let’s take a careful step back and consider the situation.

What is a “category”?

A “category” could also be called a “type of thing”, or a “classification”. This is pretty intuitive, but the important thing to realize is that the category itself can be an entity. In Ellie’s entity types classification, it corresponds to the Reference type.

For example, if you were modeling vehicles, you could have a “Vehicle Type” entity. This would have values like “car” and “helicopter”. Simply put, the category (or reference) entity is a list of all possible types.

What is an “instance”?

By “instance”, we mean here the actual thing: the specific vehicle, your car sitting on your driveway. This, of course, is modeled as its own entity. In the vehicle model, our “Vehicle” entity contains all these instances of actual cars and helicopters and boats and whatnot.

A category entity and an instance entity

In this vehicle example, our model would look like the diagram above. Note that there would be an alternative way of modeling this with subtypes – very useful if you have only a couple of possible categories – but for the purposes of this article, we are using a more generic approach.
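As a rough sketch of the same idea in code: the Python class and attribute names below (`VehicleType`, `Vehicle`, `registration`) are illustrative assumptions, not something Ellie produces. The category entity is a short list of possible types, and each instance refers to exactly one of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VehicleType:
    """Category (reference) entity: one entry per possible type."""
    name: str


@dataclass
class Vehicle:
    """Instance entity: one entry per actual, individual vehicle."""
    registration: str          # hypothetical identifying attribute
    vehicle_type: VehicleType  # each instance points at one category


CAR = VehicleType("car")
HELICOPTER = VehicleType("helicopter")

vehicles = [
    Vehicle("ABC-123", CAR),
    Vehicle("XYZ-789", CAR),
    Vehicle("OH-HCR", HELICOPTER),
]

# Many instances can share one category; each instance has one category.
cars = [v for v in vehicles if v.vehicle_type == CAR]
print(len(cars))
```

Note that each `Vehicle` has something of its own to say (here, a registration), which is what makes it worth modeling as an instance.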

Reconsidering the investment example: what is a “Share”?

Now we can get back to our original example. If you own 300 shares of Company X, what is a “Share”? A good practice is to always ask: “what is one of these?”

Clearly, it’s an instance: you have 300 instances of Company X shares and 150 instances of Company Y shares. A “Share” must be an instance because we know that in practice we own a specific number of them, and we could say “I’d like to buy one Share”. But are we really interested in the individual shares?

The answer is no. A single “Share” is nothing! It’s not something that we can have any meaningful information about. There is no practical difference between any of those 300 shares of Company X that you own.

What we have here is an example where the instance is meaningless. It would make no sense for us to add “Share” as an entity into our model: there is no information about them on the instance level. This is the mine we need to avoid.

How to solve the problem

So what is the correct solution, then? Simply put, we must choose. Either we model the instance, or we model the category.

In real life, you do not trade individual shares as if they were physical papers exchanging owners. Everything is managed digitally and highly automated. Central systems keep track of who owns what.

There is simply no need to track individual shares anywhere. What matters is the category, i.e. what types of shares you own, and how many of each type. This means that our model should treat the type of share as an entity and record the owned quantity per type, instead of attempting to keep track of individual shares. In fact, this is quite important in real life, because companies can issue different types of shares (so there can be more than one type per company).
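To make the difference concrete, here is a small Python sketch. The names (`ShareType`, `share_class`, `holdings`) are assumptions made up for illustration. Tracking instances would mean storing 300 interchangeable objects, while tracking the category needs just one quantity per type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShareType:
    """Category entity: a type of share issued by a company."""
    company: str
    share_class: str  # hypothetical attribute, e.g. "A" or "B"


company_x_a = ShareType("Company X", "A")
company_y_a = ShareType("Company Y", "A")

# Instance-level modeling: 300 entries with nothing to tell them apart.
individual_shares = [company_x_a for _ in range(300)]

# Category-level modeling: one owned quantity per share type.
holdings = {company_x_a: 300, company_y_a: 150}

# Both carry the same information; the list adds nothing but volume.
assert len(individual_shares) == holdings[company_x_a]
print(holdings[company_y_a])
```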

This way of thinking is the key to solving the “category vs instance” problem:

  1. You need to understand that there are these two options and they are different.
  2. You must make a conscious decision: either your entity represents the category, or it represents the instance. (Of course, it is possible to represent both in the same model, but in that case, there must be two separate entities: a single entity can never be both at the same time.)
  3. Once you’ve decided, it’s vital that you document the entity definitions accordingly. Otherwise, you or someone else will eventually use the same entity in some other context, and there the choice might not be the same anymore!

Investment example, modeled and ready

Now that we have identified and solved our problem (and avoided the mine!), we can safely finish drawing the model. Below is a conceptual model of our investments, with some additional details:

Example model of investments

Here, “Stock” lists the possible “share types” issued by a company, and “Composition” records that portfolio A includes 200 shares of stock X, portfolio B includes 100 shares of stock Y, et cetera.

Now the model avoids a fruitless attempt to record anything about individual shares, while it captures all the information we are actually interested in.
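For readers who think in code, the model described above could be sketched roughly as follows. This is only an illustration: the relationships are inferred from the description, and the attribute names and the `total_quantity` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Company:
    name: str


@dataclass(frozen=True)
class Stock:
    """Category entity: a share type issued by a company."""
    company: Company
    name: str


@dataclass(frozen=True)
class Portfolio:
    name: str


@dataclass
class Composition:
    """How many shares of one stock a given portfolio holds."""
    portfolio: Portfolio
    stock: Stock
    quantity: int


stock_x = Stock(Company("Company X"), "Stock X")
stock_y = Stock(Company("Company Y"), "Stock Y")
portfolio_a = Portfolio("A")
portfolio_b = Portfolio("B")

compositions = [
    Composition(portfolio_a, stock_x, 200),
    Composition(portfolio_b, stock_y, 100),
]


def total_quantity(stock: Stock) -> int:
    """Total shares of a stock across all portfolios (hypothetical helper)."""
    return sum(c.quantity for c in compositions if c.stock == stock)


print(total_quantity(stock_x))
```

There is no `Share` class anywhere in the sketch: the quantity lives on `Composition`, which is exactly the decision to model the category rather than the instance.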

Conclusion

If you are at all familiar with investing and stock markets, our example might seem quite obvious in hindsight. In real-life stock markets, no one has cared about the individual “paper shares” in ages: it’s all “two hundred this”, “one thousand that”. The overall lesson, however, is important for us data modelers.

There is a major difference between entities representing instances and entities representing categories. You should always make a conscious effort to identify which is which, and this should be documented (fortunately very easy with Ellie!). Deciding between category and instance should become a natural part of your entity identification and definition procedure.

Often, the first sign of a category-vs-instance minefield is a tingling sensation that “something doesn’t quite work with my model”. Learn to ask yourself “what is one of these?”, and you’ll have your mine detector up and running in no time!