October 11, 2021
10 Minutes

Modeler’s Corner #5: Roles and subtypes

Juha Korpela
Chief Product Officer, Ellie.ai

Modeler’s Corner is our series of blog posts on best practices and practical tips & tricks for all you Ellie modelers out there. The series focuses on everyday issues you might face as a Data Modeler. We aim to help you build the most informative, understandable, and efficient business-driven models with Ellie. For more comprehensive training needs, don’t hesitate to ask us!

Roles and Subtypes – how to do it and what’s the difference?

Previously in our Modeler’s Corner series, we covered subtypes and supertypes, a powerful modeling pattern for situations where an entity comes in several recognizable kinds that still share some common features. It’s a must-have for every data modeler’s toolbox – but occasionally you’ll run into something that almost feels like a subtype situation but doesn’t quite seem to fit. That means you’ve probably encountered roles.

In this Modeler’s Corner entry, we’ll cover roles – what they are, how they differ from subtypes, and what we think you should do with them. Interestingly, we find that whereas identifying your entities in the first place is where the most fundamental modeling mistakes tend to happen, the roles-versus-subtypes issue is perhaps the most common cause of “technical” modeling errors. Read on to find out how to get it right!

What are roles?

In the context of conceptual data modeling (which is what this blog series is all about!), “roles” are basically different ways an entity can interact with other entities. Consider an example: insurance policies and people involved with them. There are multiple people involved in different roles:

  • Policyholder – the person who holds the policy and is its “official” insured
  • Claimant – the person requesting payments from the insurance company
  • Additional insured – someone other than the policyholder who is still insured within the same policy
  • Beneficiary – the person who will benefit from the insurance, i.e. actually receive the money

Any single person could be in any of the roles – including combinations, such as being the policyholder and the beneficiary at the same time.

If you now consider our entities to be “Insurance Policy” and “Person”, you’ll see how there are at least these 4 different ways in which a Person might interact with an Insurance Policy. These are the roles a person can take in our insurance model.

What are the differences between roles and subtypes?

Now, if you have read our previous post on subtypes, you might feel the urge to define our above examples simply as subtypes of the Person entity – after all, aren’t they types of Person? However, this would be a major mistake!

The important distinction between roles and subtypes is that whereas subtypes are always mandatory and mutually exclusive, roles are neither. An instance of an entity (an individual person, for example) can only ever represent a single subtype – and it must always represent one subtype. With roles, this is not true. An individual person can act in multiple roles at the same time: someone might be both the policyholder and the beneficiary of the same insurance policy, while also acting as a claimant on a separate policy.
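To make the distinction concrete, here is a minimal sketch in Python. It is only an illustration, not how Ellie represents anything: the two policy subtypes (`LIFE`, `VEHICLE`), the policy identifiers, and the `roles_on` helper are all invented for this example. The point is the shape of the data: a subtype is one required value, while roles are a collection that can be empty or hold several entries.

```python
from dataclasses import dataclass, field
from enum import Enum


class PolicySubtype(Enum):
    # Hypothetical subtypes, invented for this illustration only.
    LIFE = "life"
    VEHICLE = "vehicle"


class Role(Enum):
    POLICYHOLDER = "policyholder"
    CLAIMANT = "claimant"
    ADDITIONAL_INSURED = "additional insured"
    BENEFICIARY = "beneficiary"


@dataclass
class InsurancePolicy:
    policy_id: str
    # Subtype: a single required value, so every policy is exactly one kind.
    subtype: PolicySubtype


@dataclass
class Person:
    name: str
    # Roles: an optional collection of (role, policy_id) pairs that may be
    # empty or hold several entries at the same time.
    roles: set[tuple[Role, str]] = field(default_factory=set)

    def roles_on(self, policy_id: str) -> set[Role]:
        """Return the roles this person holds on one policy."""
        return {role for role, pid in self.roles if pid == policy_id}


anna = Person("Anna")
anna.roles.add((Role.POLICYHOLDER, "P-1"))
anna.roles.add((Role.BENEFICIARY, "P-1"))  # two roles on the same policy
anna.roles.add((Role.CLAIMANT, "P-2"))     # a third role on another policy

bob = Person("Bob")  # a person with no roles at all is still a valid Person
```

Here `anna.roles_on("P-1")` yields both the policyholder and the beneficiary role, which is exactly what a mandatory, mutually exclusive subtype field could not express.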

Thus, roles are always optional. This also means that we need to have a different way to represent roles in our models – Ellie’s box-in-a-box notation is reserved for subtypes. It turns out there are in fact many different ways of representing roles, and while all are “correct”, not all of them are equally good in all situations.

3 ways to model roles – and our recommendations

Modeling roles as relationships

We said that “roles are basically different ways an entity can interact with other entities”, and if that sounds like relationships to you, that’s perfectly understandable. Modeling roles as relationships means that when an entity (“Person”) interacts with another entity (“Insurance Policy”) in different ways, you draw a relationship line between them – one for each way of interaction.

Roles as relationships. Note the differences in cardinality: only one policyholder per policy, but e.g. multiple claimants.

This makes your model structurally very simple. You have your core entities, you have your relationships, and all the different interactions are identified as separate relationships with clear naming (you wouldn’t use multiple relationships without naming them, would you?). The roles are visible and understandable.

The problem with this model is that you cannot express anything else about the roles. What if you have different attributes for the primary policyholders and the claimants? What if the people who are claimants should be linked with claim events, but the non-claimants obviously shouldn’t? The model is simple – but in many situations, perhaps too simple.
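For readers who think in code, the relationship-based model could be sketched roughly like this in Python. This is a hedged illustration rather than a real implementation: the class and field names are ours, and only the cardinalities for policyholder and claimants come from the diagram; the other two are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    name: str


@dataclass
class InsurancePolicy:
    policy_id: str
    # Each role is its own named relationship to Person.
    policyholder: Person                                    # exactly one per policy
    claimants: list[Person] = field(default_factory=list)   # zero or more
    # The cardinalities of the next two are assumptions for this sketch.
    additional_insured: list[Person] = field(default_factory=list)
    beneficiaries: list[Person] = field(default_factory=list)


anna = Person("Anna")
ben = Person("Ben")

policy = InsurancePolicy(policy_id="P-1", policyholder=anna)
policy.beneficiaries.append(anna)  # same person, second role on the same policy
policy.claimants.append(ben)
```

Notice that a claimant here is just a `Person` sitting in a list. There is no place to put a claimant-specific attribute or a link to claim events, short of adding it to every `Person`.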

Modeling roles as a bridge entity

Some of the more technically oriented among us might balk at the relationship-based model above because it implies a lot of foreign key attributes between the same two entities (this would come up when creating a more detailed logical model – which is, by the way, something that you can soon do with Ellie!). What if we come up with a new role?

A more generic and future-proof way would be to model the roles as a “bridge entity”, as these things tend to be called. In this separate entity, you would have the links between Persons and Insurance Policies, as well as the type of link, meaning the role.

Roles as a bridge entity. Note how the role type is defined as a reference entity.

Now you have a model which can handle any number of roles, with no changes whatsoever no matter how many roles you add later. The “Person-Insurance Policy role type” reference entity contains all the possible role types (in our case: policyholder, claimant, additional insured, and beneficiary).
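A rough Python sketch of the bridge entity might look like the following. Treat it as an assumption-laden illustration: the names `PersonPolicyRole`, `role_types`, and `people_in_role` are invented here, identifiers are plain strings, and the extra "premium payer" role is hypothetical, added only to show what introducing a new role involves.

```python
from dataclasses import dataclass

# Reference entity "Person-Insurance Policy role type": plain data, not structure.
role_types = {"policyholder", "claimant", "additional insured", "beneficiary"}


@dataclass(frozen=True)
class PersonPolicyRole:
    """One row of the bridge entity: who plays which role on which policy."""
    person_id: str
    policy_id: str
    role_type: str

    def __post_init__(self) -> None:
        # Only values present in the reference entity are accepted.
        if self.role_type not in role_types:
            raise ValueError(f"unknown role type: {self.role_type!r}")


bridge = [
    PersonPolicyRole("anna", "P-1", "policyholder"),
    PersonPolicyRole("anna", "P-1", "beneficiary"),
    PersonPolicyRole("ben", "P-1", "claimant"),
]

# A new role is one more reference value; no class or field has to change.
role_types.add("premium payer")  # hypothetical role, for illustration only
bridge.append(PersonPolicyRole("ben", "P-1", "premium payer"))


def people_in_role(policy_id: str, role_type: str) -> list[str]:
    """Return the ids of everyone holding a given role on a given policy."""
    return [r.person_id for r in bridge
            if r.policy_id == policy_id and r.role_type == role_type]
```

The structure never changes when roles are added, but also note that the word "claimant" now appears only as a data value, not as anything you could point at in the model.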

Technically, this is very well optimized and powerful. But take a step back and consider what it looks like from a business perspective: what on Earth is a “Person-Insurance Policy role”? Where are the claimants, asks your business expert? You have created a model that describes a technically nice solution, but in the process loses almost all contact with real business.

Some of the key reasons why we do conceptual modeling are improved communication and common understanding. This model forfeits those aspects in favor of technical detail, and should therefore be considered only as a possible implementation alternative on lower-level logical and physical models.

Modeling roles as separate entities

In most of the cases we’ve encountered, the best way to model roles is to model them as separate entities. This means that you have optional one-to-one relationships between the original entity (“Person”) and the role entities (“Claimant”, “Policyholder”, etc.) – each person can be a claimant and/or a policyholder, and so on. These role entities are then linked with the Insurance Policy in our example, and in fact with any other entities for which that particular role is relevant.

Roles as separate entities. Note the optional 1:1 “can be” relationships: each Person can be a Claimant, but every Claimant is always a Person.

Modeling the roles this way achieves two very important goals:

  1. Your model has all the important “business words” in it, for improved readability & accessibility.
  2. Your model can handle role-specific attributes and/or relationships.

Compared with the bridge-entity model, this one is far easier to understand intuitively; compared with the relationship model, it can express much more business logic. Furthermore, you now have additional entities in your Business Glossary: your next model might have to do with claims processing, so you can just re-use your Claimant entity there for added accuracy instead of the generic Person entity, while still retaining the knowledge that a Claimant is a Person.
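As a rough Python sketch of the separate-entity approach, under our own assumptions: the class names mirror the entities in the diagram, but `ClaimEvent`, the lookup dictionaries, keying by person name, and the `claimant_for` helper are invented for illustration, and only two of the four roles are shown to keep it short.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Person:
    name: str


@dataclass
class Policyholder:
    person: Person  # every Policyholder is always a Person


@dataclass
class ClaimEvent:
    description: str


@dataclass
class Claimant:
    person: Person  # every Claimant is always a Person
    # Role-specific relationship: only claimants link to claim events.
    claim_events: list[ClaimEvent] = field(default_factory=list)


@dataclass
class InsurancePolicy:
    policy_id: str
    policyholder: Policyholder
    claimants: list[Claimant] = field(default_factory=list)


# Optional 1:1 "can be" relationships, kept as lookups keyed by person name
# (a hypothetical stand-in for a proper identifier).
policyholders: dict[str, Policyholder] = {}
claimants: dict[str, Claimant] = {}


def claimant_for(person: Person) -> Optional[Claimant]:
    """Return the person's Claimant role, or None if they are not a claimant."""
    return claimants.get(person.name)


anna, ben = Person("Anna"), Person("Ben")
policyholders[anna.name] = Policyholder(anna)
claimants[ben.name] = Claimant(ben)
claimants[ben.name].claim_events.append(ClaimEvent("water damage"))  # made-up event

policy = InsurancePolicy("P-1", policyholders[anna.name], [claimants[ben.name]])
```

The business words are back as first-class names, and the claim events hang off `Claimant` only, so `claimant_for(anna)` simply returns `None` while Anna remains a perfectly valid `Person` and `Policyholder`.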

This is our recommendation for modeling roles: it maintains readability and business context while giving you the ability to express complex role-specific business logic. If there is little to no difference between the roles in terms of attributes and relationships, then it can of course be simplified into something like the bridge-entity model in technical implementation (e.g. in a logical model), but on the conceptual level, this way of modeling tends to give the best results in terms of understanding and communicating.

Combining roles and subtypes

An additional note before we wrap this up: when you have both roles and subtypes in your modeling toolbox, you can create extremely powerful models that are very rich in information while still being simple to read. Consider this: what if our insurance company offers policies not only to persons but also to companies?

The result is something we’ve seen over and over again in all kinds of businesses, small and large, across a variety of industries: the party model. But we’ll go into more detail on that some other time!

A party model utilizing both subtypes and roles – more on this later!