HUS – An Ellie Customer Success Story
In this blog post, we’ll tell you about a successful enterprise data warehousing project in a large organization. Judging by our decades of collective experience in the field, the approach applied in the project can be considered extremely advanced in terms of building an enterprise data warehouse (EDW). Here we describe in detail one of the largest, most ambitious, and most challenging data platform projects delivered in the EU, along with its architecture.
Helsinki University Hospital (HUS) is one of the largest hospitals in Europe. It encompasses 17 hospitals and has all major medical specialties represented. Of the 17 hospitals, Töölö hospital is one of the largest trauma centers in northern Europe, with a catchment area of two million inhabitants. Some two years ago HUS decided to build a data platform that would be used for all data, whether structured (such as financial data) or unstructured (such as X-ray images).
Now, if you happen to have some experience in data stuff, you already know this is a very difficult task! The concept of the enterprise data warehouse (EDW), meaning one centrally managed data warehouse, has not been a definite success in the past. It has rarely worked out as intended in practice. Back in the day, the technology just wasn’t ready. The projects were often too time-consuming, expensive, and difficult to implement. That is why larger corporations and public institutions ended up building several independent department- or location-wide data warehouses.
Things have changed, though. When HUS started their project, technology and methods weren’t the same as in the early 2000s or before. They had evolved a great deal. Perhaps now was the time to bring up the old concept again? Only this time it was called the data platform.
What are the pitfalls of a typical data warehouse project?
The project team was formed of seasoned data experts who knew very well the pitfalls of a data warehouse project and wanted to avoid them right from the start. These are:
- Lack of expandability
Many DW architecture solutions are difficult to expand and develop further. That’s why the Data Vault 2.0 methodology was chosen, as the team saw it as the most flexible option for modern multi-use platforms. To scale up the data processing, cloud technologies were used.
- IT-centric project
The team of data experts knew from the past that to build a platform that really benefits business, it is essential to engage business domain experts in the project. Conceptual data modeling was chosen to do that, as it’s an efficient method to gather business requirements.
- Lack of automation
The goal was to maximize the level of automation in the data pipeline. Data warehouse automation tools weren’t that common when HUS began the project. Even so, the team decided to adopt WhereScape, expecting it to save a lot of time and money.
- Source system-driven approach
Traditionally, data warehouses have been built by copying the source system structures into the DW. This could work for a local data warehouse but doesn’t work for a real EDW. For instance, at HUS, the Patient Management System was going to be changed. It wouldn’t have made sense to copy the structure of the old system, as the solution would’ve needed extensive refactoring very soon.
Many industry thought leaders nowadays support the idea of a model-based DW instead of the source system-based approach. The new Ellie data modeling tool was used to achieve this by designing a DW model based on business reality, not source structures.
The Architecture of an Enterprise Data Warehousing Project
At the start, the platform was named “Big Data platform”. Even though the name suggested unstructured data, the first implementation phase brought structured finance data into the DW model. Data Vault was chosen as the DW design methodology.
As Data Vault is a very complex design for business users, Ellie was used for building the business-driven conceptual data models first. These models would work as blueprints of the data warehouse.
HUS applies an architecture that combines a Data Lake (for big data) and a relational database system (for the traditional data warehouse). The Data Lake is used as a staging area: all data is brought into it in raw format. The Data Vault consists of a Raw Data Vault layer and a Business Data Vault layer.
The data mart layer on top of the Data Vault is built using a star schema design, which is typical in EDW solutions. In this design, the data marts are not physical implementations: they are virtualized using views. In many areas the data has already been refined and restructured in the Business Data Vault layer, which makes the views quicker to build and faster to query.
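To make the idea of a virtualized data mart more concrete, here is a minimal sketch using Python’s built-in sqlite3 module. It is not the HUS implementation: the table names (hub_cost_center, sat_cost_center), the columns, and the sample rows are all hypothetical, and an in-memory SQLite database simply stands in for the real relational platform. It shows a star schema dimension defined as a view over a Data Vault hub and satellite, so no dimension table is physically stored.

```python
import sqlite3

# All table names, column names, and rows below are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_cost_center (
    cost_center_hk TEXT PRIMARY KEY,  -- hash key
    cost_center_bk TEXT NOT NULL,     -- business key
    load_dts       TEXT NOT NULL,
    record_source  TEXT NOT NULL
);
CREATE TABLE sat_cost_center (
    cost_center_hk TEXT NOT NULL REFERENCES hub_cost_center (cost_center_hk),
    load_dts       TEXT NOT NULL,
    name           TEXT,
    department     TEXT,
    PRIMARY KEY (cost_center_hk, load_dts)
);

-- The dimension is a view: it exposes the latest satellite row per hub key.
CREATE VIEW dim_cost_center AS
SELECT h.cost_center_hk AS cost_center_key,
       h.cost_center_bk AS cost_center_code,
       s.name,
       s.department
FROM hub_cost_center AS h
JOIN sat_cost_center AS s ON s.cost_center_hk = h.cost_center_hk
WHERE s.load_dts = (
    SELECT MAX(s2.load_dts)
    FROM sat_cost_center AS s2
    WHERE s2.cost_center_hk = h.cost_center_hk
);
""")

conn.execute(
    "INSERT INTO hub_cost_center VALUES (?, ?, ?, ?)",
    ("hk1", "CC100", "2021-01-01", "finance_system"),
)
# Two versions of the same cost center: the satellite keeps the history.
conn.executemany(
    "INSERT INTO sat_cost_center VALUES (?, ?, ?, ?)",
    [
        ("hk1", "2021-01-01", "Surgery", "Old Dept"),
        ("hk1", "2021-06-01", "Surgery", "Trauma"),
    ],
)

rows = conn.execute(
    "SELECT cost_center_key, cost_center_code, name, department "
    "FROM dim_cost_center ORDER BY cost_center_code"
).fetchall()
print(rows)
```

The satellite keeps both versions of the cost center, while the view only returns the current one. A report built on dim_cost_center never has to know about hubs, satellites, or load timestamps.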
The process of designing an Enterprise Data Warehousing Project – Data Pipeline
At the beginning of a data platform exercise, it’s important to understand how the business and data are related to one another. Too often the assumption is that the data experts would just somehow magically dig out information from source system databases. This is one of the biggest misunderstandings in the history of data warehousing: the data expert cannot comprehend the data structures, their context, and how the data is generated, without talking to business domain experts.
Ellie as a business-driven data modeling tool turned out to be very suitable for the process.
In the HUS case, Ellie models were first composed with business domain experts, resulting in business-oriented conceptual data models. The way businesses think of their data is described in data models and data definitions. The main emphasis is therefore not on how the operational systems happen to define data but on the business “reality”.
In the second phase, these models were adjusted and modified based on the source system data, to account for, e.g., things that should exist from a business point of view but of which there is no systematic record. The result was a model that depicts almost perfectly the actual available data content, but still in business terms.
These models were then exported from Ellie and ingested into WhereScape, in which they were turned into Data Vault structures. With this process, the requirements (i.e. Ellie models) were automatically fed into the data pipeline.
In this way, the business view and interpretation of the data are reflected in the data warehouse structure. This is called model-driven development, and it leads to a more business-friendly data warehouse than the often-used source system-based approach, which only technically copies the old, vendor-specific ERP structures.
For reporting purposes, about 200 business controllers, who were considered the primary consumers of the data at this phase, were trained in Power BI and SQL queries. Other user groups, such as Data Scientists, use the Data Lake area of the platform for experimental data analytics and AI development.
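As a rough illustration of what “turning a conceptual model into Data Vault structures” means, the sketch below maps entities to hubs and satellites and relationships to links. It does not represent Ellie’s export format or WhereScape’s actual behavior: the Entity and Relationship classes, the to_data_vault function, the naming conventions, and the Patient/Visit example are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entity:
    """A business entity from a conceptual model (hypothetical structure)."""
    name: str
    business_key: str
    attributes: list[str] = field(default_factory=list)


@dataclass
class Relationship:
    """A named relationship between two or more entities."""
    name: str
    entities: tuple[str, ...]


def to_data_vault(entities: list[Entity],
                  relationships: list[Relationship]) -> dict[str, list[str]]:
    """Return a mapping of table name -> column names.

    Each entity becomes a hub (business key) and, if it has descriptive
    attributes, a satellite. Each relationship becomes a link.
    """
    tables: dict[str, list[str]] = {}
    for entity in entities:
        n = entity.name.lower()
        tables[f"hub_{n}"] = [f"{n}_hk", entity.business_key,
                              "load_dts", "record_source"]
        if entity.attributes:
            tables[f"sat_{n}"] = [f"{n}_hk", "load_dts", "record_source",
                                  *entity.attributes]
    for rel in relationships:
        r = rel.name.lower()
        hub_keys = [f"{e.lower()}_hk" for e in rel.entities]
        tables[f"link_{r}"] = [f"{r}_hk", *hub_keys,
                               "load_dts", "record_source"]
    return tables


# Hypothetical example model: two entities and one relationship.
model_entities = [
    Entity("Patient", "patient_id", ["name", "date_of_birth"]),
    Entity("Visit", "visit_id", ["admitted_at"]),
]
model_relationships = [Relationship("patient_visit", ("Patient", "Visit"))]

tables = to_data_vault(model_entities, model_relationships)
for table_name, columns in tables.items():
    print(table_name, columns)
```

The point is that the mapping is mechanical once the business model is agreed on, which is what makes it a good candidate for automation.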
What are the benefits of an Enterprise Data Warehousing Project?
A successful EDW implementation consists of two interconnected layers: the technical and the business layer.
The technical layer is about things such as Data Vault design, tables, ETL, metadata, coding, and configurations in your tools and cloud infrastructure. In the HUS case, the key success factors in this layer were not only in technology but also in ways of working: agile methodologies, piloting, and demoing. This meant that the team was able to bring some data to the business pretty quickly, even if it was just a fraction of all possible data.
The business layer is about collaboration with all counterparts such as internal IT staff, consultants, tech vendors, business experts, and so on. With Ellie’s modeling techniques you can speed that up a great deal, as communication between all stakeholders becomes easier and more precise. According to the IT managers of HUS, about 40% of the time was saved in the usually tedious requirements gathering phase by using Ellie.
Going forward, HUS can now continuously maintain and develop new models as their needs and data use cases expand and change. Ellie gives new users at HUS a clear overall picture of all the data assets in the organization.
In the HUS case, as with all data platform projects, internal selling was a necessity. This means that you need to inform all your data consumers that the new platform is coming up; it requires lots of PowerPoint presentations, pitching, and persuasion. If nobody knows about the new data platform, they won’t start using it, and it will become a legacy system in no time.
This was tackled at HUS right at the beginning, as the Product Owner of the platform wasn’t a technical expert, but a transformation leader. This was a true game-changer, as he was able to change the corporate data culture and bring in, e.g., Data Governance policies. This included more responsibility, commitment, and education for HUS IT and business people.
The New Era of Data Warehousing/Platforming
It is clear that if you build a shopping mall or a large skyscraper, there is a blueprint of what the building should eventually look like. Even the idea of building something without one would sound crazy. In data warehousing, it’s equally important.
After the higher-level blueprint (i.e. conceptual data models in Ellie) is done, the actual database should be built incrementally. This ensures that the so-called “time-to-market”, an essential KPI of such a project, is kept optimal. In other words, the data consumer (say, a business executive of a certain domain) should have their first BI reports available very quickly, from the first batch of data. After such practical results from the first area have been produced, the project moves to the next domain, and so forth. This is the essence of Agile development. Even though Agile methodology has been around for some time now, some still build data platforms as waterfall projects. This is one of the main reasons why DW and data platform projects are often considered long and heavy.
The so-called Big Data movement a few years back would have claimed this isn’t the best way to go, as data modeling is “old school” and takes too much time. It is in fact the opposite. All data professionals now know that without modeling, it all becomes a big mess: it will take basically forever to get anything out of that big data system. You would need an army of coders to develop and maintain this kind of monster.
As mentioned above, the technology and tools available have evolved massively. What used to be the bitter realities of data warehousing are no longer valid. Utilizing Ellie and the rest of the modern tech stack with modern, agile methodologies, you can speed up the development process enormously. The EDW is viable again!