For most of my data-focused career, I’ve been dealing with semantic layers one way or another, either because the tool I was using to present data required one explicitly or because the solution itself needed relationships defined in the data for it to make sense and be better organized.
With the recent focus and hype on AI-infused solutions, I’ve been seeing more and more chatter around semantic layers. What are they? What are they used for? Does my organization need one? And what do they have to do with AI?
What are semantic layers?
In its simplest form, a semantic layer is a collection of rules on how different data concepts are related. For example, your organization may have the concepts of office location and territory, where each office location belongs to one (and only one) territory. A semantic layer would contain the definition that a group of office locations makes up a territory. Similarly, a person may have a current address assigned to them. These definitions typically come from the business and how it operates, and a business analyst in your organization would usually be able to spell them out.
The semantic layer bridges the gap between how the data is stored and how it’s used by the business.
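To make the office location and territory example concrete, here is a minimal sketch in Python. The office names, territory names, and function names are all made up for illustration; a real semantic layer would hold these rules in a modeling tool rather than in a dictionary.

```python
from collections import defaultdict

# Hypothetical business rule: each office location belongs to exactly one
# territory. A dict keyed by office enforces "one (and only one)" by design.
OFFICE_TO_TERRITORY = {
    "Austin": "South",
    "Dallas": "South",
    "Chicago": "Midwest",
}


def offices_by_territory(office_to_territory):
    """Group office locations under the territory they roll up to."""
    grouped = defaultdict(list)
    for office, territory in office_to_territory.items():
        grouped[territory].append(office)
    return {territory: sorted(offices) for territory, offices in grouped.items()}


def territory_sales(sales_by_office, office_to_territory):
    """Roll office-level sales up to territories using the rule above."""
    totals = defaultdict(float)
    for office, amount in sales_by_office.items():
        totals[office_to_territory[office]] += amount
    return dict(totals)
```

The stored data only knows about offices; the rule is what lets a business user ask for sales by territory without knowing how the rollup works.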
History of semantic layers
Pre 2000s
1970s-1980s: As relational databases were being conceptualized, there was a need to create high-level, business-oriented views. These sometimes included some business logic in the form of rollups, simple aggregations, and more. These concepts laid the groundwork for modern-day semantic layers.
1980s-1990s: Data warehousing became common, and we saw the emergence of OLAP cubes. The purpose of data warehousing was to serve analytical processing for business consumption. We also saw the rise of Ralph Kimball’s dimensional modeling approach (which is still very much relevant today), which focused on business needs when relating data tables in a warehouse.
Online Analytical Processing (OLAP) cubes took data warehousing a step further: multi-dimensional “cubes” allowed data to be accessed at any intersection of the dimensions it was related to. You can visualize a 3-dimensional cube that hosts transaction data with the axes being Time, Cashier, and Product, and each cell holding the Sales Price. Every point in the cube holds the sales for one combination of the three dimensions, so the cube as a whole covers all combinations.
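The cube idea can be sketched in a few lines of Python. This is a toy illustration only, not how an OLAP engine stores data; the transactions, names, and the `roll_up` helper are invented for the example.

```python
from collections import defaultdict

# The three axes of the example cube.
DIMENSIONS = ("time", "cashier", "product")

# Made-up transactions: (time, cashier, product, sales price).
TRANSACTIONS = [
    ("2024-01", "Ana", "Coffee", 3.0),
    ("2024-01", "Ana", "Tea", 2.0),
    ("2024-01", "Ben", "Coffee", 3.0),
    ("2024-02", "Ana", "Coffee", 3.5),
]


def build_cube(transactions):
    """Each cell is one (time, cashier, product) combination holding its sales."""
    cube = defaultdict(float)
    for time, cashier, product, price in transactions:
        cube[(time, cashier, product)] += price
    return dict(cube)


def roll_up(cube, keep):
    """Sum sales over every dimension that is not listed in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(float)
    for cell, sales in cube.items():
        totals[tuple(cell[i] for i in positions)] += sales
    return dict(totals)
```

Calling `roll_up(build_cube(TRANSACTIONS), ("product",))` collapses the Time and Cashier axes and leaves sales per product, which is the kind of question a business user would ask of the cube.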
Prior to 2000, accessing data still required a high level of technical skill in addition to understanding how the business would use the data in order to solve problems or perform day-to-day operations.
Early 2000s: The Rise of Semantic Layers
The early 2000s saw a significant increase in the popularity of semantic layers. This was primarily driven by the adoption of business intelligence tools. Companies like Business Objects, Cognos, Hyperion, and MicroStrategy all had their own semantic layers. The aim was to make it easier for business users to access data.
Business Intelligence tools utilized their own semantic layers to provide:
- Consistency and governance
- Performance utilization
  - Caching and precalculated aggregates
  - Some tools had their own in-memory layer that served as a quicker way to store aggregated data for quick retrieval
- Dashboards and reporting
  - Users could create their own reporting and dashboards without going to IT by leveraging the business-friendly entities without having to worry about how the underlying data was structured.
The Fall of Semantic Layers
As BI tools (and semantic layers) became more popular, a new type of professional was born: the Business Intelligence Professional. These were highly analytical people who sat between IT and the business users and were able to translate business requirements into IT requirements. Additionally, they were able to create semantic models and configure the different business intelligence platforms to extract the needed business value from the stored data.
As business intelligence tools became more monolithic and harder to maintain, we started to see the emergence of departmental business intelligence tools. The most notable example is Tableau.
In 2005 Tableau launched with the promise of “eliminating IT” from business intelligence. Users had the ability to connect to databases, spreadsheets, and other data directly, without the need for someone in the organization to provide connectivity or curate the data.
Because it became so easy for business users to connect to data and manipulate it, there was no “single version of the truth”, no governance on the data being consumed, and certainly no centralized semantic layer that housed the enterprise’s business rules. Instead, each business user or department had their own view (and presentation) of the data. The time between requesting a report and actually receiving it was reduced dramatically. It was during this time that enterprise-wide semantic layers became less popular.
In parallel, more and more of the business rules were incorporated into the ETL and ELT processes. This allowed some of the semantics to be precalculated before they were consumed by the business intelligence layer. This had many drawbacks that were not apparent to the typical Data Engineer, but were very apparent (and important) to business intelligence professionals.
The Rise (Again) of Semantic Layers
As time went by, we started seeing business executives, business operators, and other data consumers doubt the veracity of the data. Since there was no centralized location, there was no central owner of the data. This is when the industry started seeing the creation of the Chief Data Officer role which, among other things, typically has the responsibility of data governance.
For some years, the battle between centralized BI and departmental BI continued. Agility versus uniformity constantly fueled arguments, and as companies started to force centralized BI, shadow IT groups started popping up within organizations. You can likely see this in your own organization, where departments run part of their operations in Excel because they lack access to proper data.
We also saw the popularity of Analytics Centers of Excellence increasing. They took care of data governance and the single version of the truth. The greatest tool at their disposal was the mighty semantic layer.
Enter Gen AI
No doubt generative AI has taken the world by storm. Everyone is trying to make sense of it: how do I use it? Am I doing it correctly? What do I not know? One thing is for certain: for Gen AI to work properly, it needs to understand how the user uses the data and what it means for them. This is accomplished by semantic layers. This little concept that has been sticking with us for decades is suddenly even more important than it was in the past.
There is a current push for smaller, purpose-built LLMs. This will likely increase the importance of semantic layers, which feed the necessary metadata to the applications that make use of these models.
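One way to picture “feeding metadata” to an AI application is to turn semantic definitions into plain-text context that travels with the user’s question. The sketch below is only an illustration under that assumption: the `SEMANTIC_MODEL` structure and `build_prompt_context` function are hypothetical, and no actual LLM API is called.

```python
# Hypothetical semantic definitions, written the way a business analyst
# might describe them.
SEMANTIC_MODEL = {
    "entities": {
        "office_location": "A physical office; belongs to exactly one territory.",
        "territory": "A group of office locations.",
    },
    "metrics": {
        "total_sales": "SUM(sales.amount)",
    },
}


def build_prompt_context(model, question):
    """Render the semantic definitions and the question as one text block."""
    lines = ["Business definitions:"]
    for name, description in model["entities"].items():
        lines.append(f"- {name}: {description}")
    lines.append("Metrics:")
    for name, expression in model["metrics"].items():
        lines.append(f"- {name} = {expression}")
    lines.append(f"Question: {question}")
    return "\n".join(lines)
```

The resulting text could be placed in front of a question such as “What were total sales by territory?” so the model sees the business meaning of each term instead of guessing it from table and column names.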
What’s going on right now?
Right now, we can see more and more semantic layer-only tools that are decoupled from a business intelligence platform. Companies like AtScale, Denodo, and Dremio promise to host the business rules and apply them to queries issued by business intelligence and visualization tools. They act as a broker between such tools and the underlying data. This, in theory, has the great benefit of letting users apply the same semantics from their tool of choice, whether that is a command-line SQL interface, a REST API call, or a visualization tool like Tableau. Additionally, companies like Tableau, which didn’t have semantic layers before, are adding semantic layer capabilities to their suite of tools. Others, such as Strategy (formerly MicroStrategy), are decoupling their powerful semantic layer from their BI suite to provide it as a standalone product.
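The broker role can be sketched as a small function that takes a business-level request (a metric and a dimension) and compiles it to SQL using centrally defined rules. This is a simplified illustration, not how AtScale, Denodo, Dremio, or any other product works internally; the table names, metric names, and `compile_query` function are assumptions made for the example, and SQLite from the Python standard library stands in for the underlying data store.

```python
import sqlite3

# Hypothetical semantic definitions held by the broker.
METRICS = {"total_sales": "SUM(s.amount)"}
DIMENSIONS = {"territory": "o.territory", "office": "o.name"}
FROM_CLAUSE = "sales AS s JOIN offices AS o ON s.office_id = o.id"


def compile_query(metric, dimension):
    """Translate a business request into SQL using the central definitions.

    Unknown metric or dimension names raise KeyError, so only terms the
    semantic layer defines can reach the database.
    """
    metric_sql = METRICS[metric]
    dimension_sql = DIMENSIONS[dimension]
    return (
        f"SELECT {dimension_sql} AS {dimension}, {metric_sql} AS {metric} "
        f"FROM {FROM_CLAUSE} GROUP BY {dimension_sql} ORDER BY {dimension_sql}"
    )


def run_demo():
    """Load made-up data into an in-memory database and ask for sales by territory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE offices (id INTEGER PRIMARY KEY, name TEXT, territory TEXT);
        CREATE TABLE sales (office_id INTEGER, amount REAL);
        INSERT INTO offices VALUES (1, 'Austin', 'South'), (2, 'Dallas', 'South'), (3, 'Chicago', 'Midwest');
        INSERT INTO sales VALUES (1, 100.0), (2, 50.0), (3, 25.0);
        """
    )
    rows = conn.execute(compile_query("total_sales", "territory")).fetchall()
    conn.close()
    return rows
```

Because every client goes through `compile_query`, a dashboard, a script, and an API call would all get the same definition of `total_sales`, which is the “single version of the truth” argument in miniature.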
Does my organization need one?
By now you probably have an idea of whether your organization could benefit from a semantic layer. If you want your organization to succeed in its quest to leverage AI properly and derive proper business insight from it, you should think about what is telling the AI how your business operates and how its data is organized.
What do I do now?
Contact Perficient for a conversation around how we can help your organization leverage analytical tools (including artificial intelligence) properly through our experience with semantic models.