Model Domain Identity but Depend on System Identity

Introduction

There have been many debates over the years, dating all the way back to when relational databases were first conceived, about which type of identifier or key is best to use: a natural key or a surrogate key. But as in most best-practice or “which one is better” debates, the wrong question is being asked. Asking an either/or question is searching for a silver bullet answer.

What should be asked is what type of scenario or activity each is good for. This builds context around when it is more appropriate to use one over the other, playing only to their individual strengths. With that context in place, each can be further generalized to having a single responsibility. A cost is associated with mixing those responsibilities, because we’re no longer playing to each one’s strengths. Like many other ideas, the strengths an identifier is most valued for are the very things that can turn into weaknesses if applied in the wrong context.

After identifying these responsibilities, along with some context for when and how to use each type of Id, cheaper and higher quality decisions can emerge. But before we can do that, we need to find and test the assumptions that tend to lead us astray, preventing us from reaching these deeper insights.

Note: These concepts aren’t bound to a relational database, or to any particular technology for that matter, but to keep the terminology consistent, I’ll use the terms popularized by relational databases: Primary Key, Surrogate Key, and Natural Key.

Finding the assumptions

Natural keys are so appealing because they feel so natural (pun intended) when we think and speak about a business domain. The terminology of a particular Id is used openly and freely by the business when they discuss their processes and the specific entities utilized within them. They’re telling you how they want to identify a particular entity. All too often, it’s these seemingly simple and innocent ideas that leave us with implicit assumptions about a domain concept, assumptions that impose details onto our solutions that don’t really exist. In this case, learning about a natural key is taken to imply a very specific type of implementation: that it should be the primary key.

Why does this happen? When we’re doing analysis, the domain experts are focused on the problem domain. Their view of the concepts and particular problems we’re trying to solve is much purer than ours. We’re mere visitors in the problem domain. Our primary domain, the solution domain, is consumed by technical details. So much so that we tend to get lost in them: the technologies we’re using, the hip new pattern we’re trying to apply, and so on, until we lose touch with the problems we’re being tasked to solve. Because of this, our view tends to be polluted with irrelevant and potentially damaging details during analysis.

When you listen to a domain expert speak about a concept or problem, how quickly do you jump to a solution in your head, polluting the ideas with all sorts of technical jargon? Instead, try to separate the activity of domain learning from solution exploring. With these separated, it’s our job, not the domain expert’s, to do the conscious mapping between the problem and solution domains. Make this a deliberate and thoughtful process, not an accidental one that reinforces conclusions drawn from imaginary and untested assumptions.

Separating Responsibilities

What the business is really telling us is how they want to reference a particular entity. This identifier is a concept of the problem domain. A primary key is a concept of the solution domain and NOT the problem domain. It is how the system wants to reference a particular entity. This is an important point. How an entity’s data is partitioned and related is a solution domain concern. Put more bluntly, it is a technical or implementation detail.

While there should be alignment between the problem and solution domains on the partitioning of the data and its responsibilities, how the data is partitioned and how it is pulled together when it’s needed is a technical detail. Business stakeholders and users of the software don’t care how this data is partitioned, or even that it is partitioned at all. As long as the rules are upheld and they get the benefits they were anticipating, they will be as satisfied as they’ll ever be.

After identifying these responsibilities, it should start to become clearer that a natural key needs to be modeled, but it shouldn’t be the primary key. The natural key can be an attribute on an entity just like any of its other attributes. It can have rules around it like any other piece of data, and in this case, the rule to be enforced is uniqueness. But what is the uniqueness bound to? A single entity? Time? The state of a process or another entity? The point is, not only should the requirement of uniqueness and identity not dictate the implementation of a primary key, but you should also determine, or at least consider, what the uniqueness rule of the data in the domain is bound to.

Adding to these responsibilities, and thus further strengthening the case to keep them separate, is the concept of data ownership. The system does not own the natural keys in a domain, yet they are treated as if it does when they are used as the primary key. The only way for the system to own the Id is for it to generate it. The concept the surrogate Id stands for should have a logical meaning in the domain, but its value should not. If it did, the system wouldn’t own it. The value should have no context in the domain whatsoever. While database-generated Ids can be used, they are most certainly not implied here. Any Id generation algorithm will work so long as it provides uniqueness across the entities of a certain type within the system.
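To make the separation concrete, here is a minimal sketch in Java. The Product entity and its Product Code are hypothetical examples, not taken from any particular system: the system generates and owns the surrogate Id, while the natural key is modeled as ordinary data with its own uniqueness rule.

```java
import java.util.UUID;

// A sketch, not a prescription: the surrogate Id is system-generated and has
// no meaning in the domain, while the natural key (a hypothetical product code)
// is just another attribute with a uniqueness rule of its own.
public final class Product {

    private final UUID id;          // surrogate key: owned and generated by the system
    private String productCode;     // natural key: domain-meaningful, changeable data

    private Product(UUID id, String productCode) {
        this.id = id;
        this.productCode = productCode;
    }

    // The system generates the identifier, so the system owns it.
    public static Product register(String productCode) {
        return new Product(UUID.randomUUID(), productCode);
    }

    public UUID id() {
        return id;
    }

    public String productCode() {
        return productCode;
    }

    // Uniqueness of the product code is a domain rule enforced elsewhere
    // (for example, a unique constraint or a domain service), not by making
    // the code the primary key.
    public void changeProductCode(String newProductCode) {
        this.productCode = newProductCode;
    }
}
```

The UUID here is only one option; any generation scheme works as long as the system produces the value and it is unique within the system for that entity type.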

Making Educated Decisions

Have you ever thought you understood something, made a decision, and by the time your project was done, wished you had made a different one? This is a very common experience because, in most cases, we’re fairly new to a particular domain when we start a project. Our understanding of it, and of the concepts from it that are represented in the software, will change over time. The problem is that, more often than not, our naive assumptions about the rules and concepts are tested too late in the game, when we receive requests to enhance or change the software after we have already built something around those assumptions.

Agile, or any type of iterative development practice for that matter, doesn’t inherently solve this. Those practices are great for receiving quick post-game feedback on decisions made on small pieces of work. But some decisions, if initially wrong, can be fairly costly to change. Those decisions should not be made solely on the “simplest thing that could possibly work” mantra. Instead, they should be evaluated and your assumptions tested through a series of trade-offs, resulting in the simplest educated thing that could possibly work. Without the trade-offs to build enough context, a better but slightly more complicated solution might not even be considered. Or even worse, when it finally is, it could turn a cheap decision into an expensive one.

During analysis, it is very easy to mistake slowly changing pieces of data for immutable ones. This is exactly the type of data that we falsely assume won’t change and that masquerades as a natural key. Using such a value as a natural key is a bet against change instead of a preparation to allow it. Even if your assumptions are currently right, this doesn’t mean the business won’t ask for a change in the future that will invalidate them. A value in the domain is immutable, until it isn’t.

A great example of this is a “code” that is used in a domain to identify an entity, something like a Product Code. While such codes uniquely identify a particular entity, they tend to have format requirements associated with them that are external to the software. More often than not, the people building the software are completely unaware of this. When marketing wants to change the format, when the separate system the value originates in changes the format, or when a simple typo occurs, that “immutable” code in your system will have to change.

Isolating Identity Value Changes

If what’s in question is the stability of our understanding of a concept or of the value in the domain and how it might change over time, more analysis is probably in order and it might be best to defer the decision. If a decision must be made, the reversibility of that decision should be considered. Stated another way, what would be the cost and impact if we made the decision incorrectly? If a value were used as a natural key and duplicated across a system, changing it would be fairly significant. The size of the request to change it is unfortunately disproportionate to the cost of the change.

To mitigate this risk, the decision and its various options should be made less significant in the architecture. This is accomplished by encapsulating the decision, making the cost of change more reasonable. This is exactly what good architecture is all about.

Architecture represents the significant design decisions that shape a system, where significant is measured by cost of change. -Grady Booch

By isolating and encapsulating responsibilities, when the details change, the cost of the change is proportional to the size of the request. With the domain data isolated to one place, it is free to change as often as it needs to. The separation of system identity and domain identity creates flexibility. If they were one and the same, a change would be a fairly difficult, distributed change; the isolation turns it into a cheap, local one. Only the immutable, system-generated, and system-owned Id is shared and duplicated within the system.
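As an illustration, here is a sketch, again with hypothetical types and building on the Product example above, of how other parts of the system hold only the surrogate Id, which keeps a natural key change local.

```java
import java.util.UUID;

// A sketch: other entities reference a Product only by its system-owned
// surrogate Id, never by its product code.
public final class OrderLine {

    private final UUID productId;   // reference by surrogate Id only
    private final int quantity;

    public OrderLine(UUID productId, int quantity) {
        this.productId = productId;
        this.quantity = quantity;
    }

    public UUID productId() {
        return productId;
    }

    public int quantity() {
        return quantity;
    }
}

// If marketing changes a product code, only the Product entity is touched:
//
//     product.changeProductCode("NEW-FORMAT-123");
//
// Every OrderLine keeps referencing the same immutable productId, so the
// change stays local instead of rippling through the system.
```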

Even with this flexibility within your system, if these values do change, there are probably going to be issues outside of your system. While this is true, neither a surrogate key nor a natural key will solve that. We were never trying to eliminate the problem, though, just to control its cost.

Technical Implications

Obviously, having one Id is simpler than having two Ids for the same entity. But keep in mind, they each have their single responsibility. The natural key is for the end user while the surrogate key is for the system. Correlation between the two should only need to occur at most once per business process. Depending on the UI design and architecture, this could be lowered further. Additionally, the correlation code in the software can be reused so it’s only written once.

After the correlation has happened, the rest of the process can use the internal system Id for the entity. As for viewing the natural key on various UI screens, it can be treated and composed like any other piece of data in the system.
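A minimal sketch of that correlation, using hypothetical repository and service names along with the Product entity sketched earlier: the natural key is translated to the surrogate Id once, at the edge of the business process, and only the Id flows through the rest of it.

```java
import java.util.Optional;
import java.util.UUID;

// Hypothetical lookup component; the only place the natural key is correlated
// to the surrogate Id for this process.
interface ProductRepository {
    Optional<Product> findByProductCode(String productCode);
}

final class PriceChangeService {

    private final ProductRepository products;

    PriceChangeService(ProductRepository products) {
        this.products = products;
    }

    // The end user works with the product code; the system works with the Id.
    public void changePrice(String productCode, long newPriceInCents) {
        UUID productId = products.findByProductCode(productCode)
                .map(Product::id)
                .orElseThrow(() -> new IllegalArgumentException(
                        "Unknown product code: " + productCode));

        // From here on, every step of the process is keyed by the surrogate Id.
        applyNewPrice(productId, newPriceInCents);
    }

    private void applyNewPrice(UUID productId, long newPriceInCents) {
        // ... remainder of the business process, using productId only ...
    }
}
```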

Summary

Some initial decisions aren’t as reversible or malleable as others, and as a result, carry a fairly significant cost if wrong. The risk of a high cost of change can be mitigated by evaluating the relevant trade-offs through an educated decision-making process. In this case, for the small investment of creating a second identifier owned and generated by the system, the cost of change is controlled, reasonable, and proportional to the size of the request from the business.

This is accomplished by insulating the identity values from business concerns and by depending only on the internal, system-generated surrogate Ids when operating within the system. The natural key is still modeled and accounted for, but it is treated like any other piece of data associated with the entity.

Bypassing the System is Risky Business

Introduction

From time to time, cases will arise when a system contains data in a state that the business considers to be invalid or undesirable. A common cause is a bug in the software altering a set of data in an unexpected or incorrect manner. A less common cause is an edge case that the software doesn’t currently support, creating a gap between what is reflected in the system and the real world. Regardless of the cause or case, more often than not, there can be negative business consequences if the invalid data isn’t corrected.

All too often, the response to these problems is to write scripts that bypass the system to update the data to a different, desired state. This data-centric thinking, viewing the problem as a data update problem, tends to mislead and to emphasize riskier solutions by narrowing the focus of the problem to the wrong area.

Bypassing the system is when you go directly to the database, circumventing the rules of the system, to apply an update to the data.

This definition does not discriminate in terms of technologies. Bypassing the system isn’t limited to using the built-in language, API, or tooling of the database of choice. A system is bypassed when any technology is used to circumvent its rules.

Through those quick and short-sighted solutions, the business can open itself up to much more risk than it had with the problem it was originally facing. Fortunately, there are much more direct solutions to these problems that help control and mitigate risk instead of increasing it.

The Risk

A software system defines rules and behaviors around data to constrain it in order to optimize for some business benefit in a controlled, reliable, repeatable, and automated way.

These are the very benefits that are undermined when a system is bypassed. Different code is run against the data to correct it than the code that was written to enforce its validity and maintain its integrity. This can further corrupt the data, or corrupt even more data that was actually valid. Beyond the obvious problems with data corruption, one often overlooked problem is that it can introduce undefined behaviors in the system, as the invalid state was not something that was designed for or considered.

Increasing the Risk

Today, systems are becoming larger and more complicated, and they are dealing with far more data than they have in the past. This is due to the digital transformation most businesses are going through to stay competitive in their respective markets. These systems aren’t just about capturing data and executing some logic around it. Enterprise-wide processes are being modeled and built directly into these systems.

This type of problem complexity is handled by splitting large problems into many smaller ones. This means multiple components in a system need to work together to solve a larger problem. If you’re bypassing the system, each component’s responsibility, how the components interact, and what their impact and influence on each other is must all be understood and considered.

Another way problem complexity is handled is through having multiple models to optimize or accommodate a particular scenario. If data is replicated or passed between these models and the behaviors of the system are bypassed, the introduction of inconsistent data is probable. Data also tends to flow through multiple models and business processes in a system. If this design is bypassed, inconsistent states or incorrect decisions in the related business processes are also probable.

In addition to advancements in the architecture space, many new technologies have been introduced and popularized in reaction to these types of demands. Using just one of these is often too limiting, so it’s fairly common to see newer systems being built with many different technologies and architectures. Databases are only able to enforce a very small subset of the rules of the systems being built today, so it’s very common to see business rules implemented in many different technologies on many different tiers.

Real Consequences

An insurance provider needed to update the amount they would reimburse customers for a particular product if it was defective. To update the amount, they bypassed the system using a SQL script. Because the rules of the system were not adhered to, data became corrupt. It took 5 business days for the company to realize its mistake. During that time, they lost $1.8 million due to the corrupt data incorrectly influencing decisions.

Another company, a software platform provider, already in the habit of executing scripts against a particular area of the system, executed a script that introduced duplicate records. This went unnoticed for a week or so until that data was actually needed. Because the software didn’t expect duplicate records, certain business operations failed, causing that department of the business to come to a screeching halt. This had a direct impact on the customer and on the platform provider’s ability to deliver what they agreed to in the time frame they agreed to deliver it in.

In both cases, the change seemed simple in isolation, but it had a very large negative impact on cost, time, and the organization’s ability to function as expected. Even worse, and unfortunately common when a system is bypassed, the impact was delayed. The cost of these problems to the business increases based on how valuable it is for the data to be correct, and that cost can be amplified by a delayed impact.

Taking a step back, have you ever experienced problems, no matter how big or how small, as a result of bypassing a system? Unfortunately, chances are, you have.

Leverage the System

Who owns the data? Who allowed the data to get into its current form? Who has the authority to transition the data from one state to another? You got it: the system.

Instead of viewing the problem as a data update problem, take a behavior-centric view of it. Ask yourself questions in terms of behaviors. What behavior is wrong that is producing this undesired data and that, when adjusted, will correct it? What sets of behaviors can be applied to transform the data into the desired state? Look at how the data in question is being used. What are the rules and invariants currently in the system that must always be true about the data? What else depends on the data? Is it pushed to other processes or models?

Investigate how the behaviors in the system can be leveraged to resolve the issue. More often than not, if you can leverage what the system already knows how to do, you can narrow your focus to only one component. Interacting with that single component will ensure the data maintains its integrity and the downstream processes are eventually consistent with the desired state changes.

How do you interact with that component? Well, who owns the data again? By taking a behavior-centric view of the problem, it should start becoming clearer that we need to extend the software in order to properly address the issue. Extend the system to detect the problem condition and then migrate the data into the desired state by interacting with the relevant component(s). The only new behavior is the detection of the problem condition. The existing code, and the behaviors it implements, are leveraged to ensure the system is kept in a valid and deterministic state.
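As a sketch of what such an extension might look like, using entirely hypothetical component names inspired by the duplicate records example above: the extension only detects the problem condition and then asks the system, through its existing behaviors, to apply the correction.

```java
import java.util.List;
import java.util.UUID;

// A one-off correction extension, not a definitive implementation: it detects
// the problem and delegates the correction to the system's existing behaviors
// instead of updating rows directly.
final class DuplicateRegistrationCorrection {

    private final RegistrationReadModel readModel;   // assumed existing query component
    private final RegistrationService registrations; // assumed existing behavior component

    DuplicateRegistrationCorrection(RegistrationReadModel readModel,
                                    RegistrationService registrations) {
        this.readModel = readModel;
        this.registrations = registrations;
    }

    // The only new behavior: detecting the problem condition.
    public void run() {
        List<UUID> duplicates = readModel.findDuplicateRegistrationIds();

        // Correction is delegated to the system, so every rule, side effect,
        // and downstream notification still applies.
        for (UUID registrationId : duplicates) {
            registrations.cancelRegistration(registrationId,
                    "Duplicate record detected by correction extension");
        }
    }
}

interface RegistrationReadModel {
    List<UUID> findDuplicateRegistrationIds();
}

interface RegistrationService {
    void cancelRegistration(UUID registrationId, String reason);
}
```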

It is important that, unless a bug was causing the data to be in the undesired state, the production code is not modified but extended. This is both the single-responsibility and open-closed principles in action. The extension’s only responsibility is to detect the problem and to send commands into the system to apply the corrections. The problem with changing working code is that it introduces the risk of breaking changes or of missing dependencies that should change as well.

Changing working code is a bad idea. If you change working code, it is less likely to work after. -Greg Young

In the case of a bug or an enhancement to the software, the code changes and migrations should be packaged, versioned, and tested together.

By modeling the problem as a set of behaviors, we’re able to simplify and partition the problem, isolate responsibilities, leverage existing behaviors, and maintain data integrity and consistency, thus reducing and controlling risk.

Summary

Code that bypasses the system can easily corrupt more data, introduce inconsistencies, and/or introduce undefined behaviors. This is because the rules of the system are circumvented. Even if the rules are duplicated, the duplication is very prone to mistakes through incorrect translation, missing context, or misunderstood assumptions. Scripts that bypass the system tend to deal with many accidental details or unrelated problems, whereas if the system can be leveraged, the focus will be only on the essential details, enabling a simple and direct solution to be found.

Bypassing the system places an increased risk on the business that, if it were known, would be largely unacceptable in most cases. The risk is unnecessary and can be mitigated by leveraging the system to do what it already knows how to do.

Developer on Fire Interview

I recently had the pleasure of being interviewed by Dave Rael on his Developer on Fire podcast. The show is pretty unique in that Dave follows a template he created that focuses on the guest, their experiences, and what makes them tick.

Here is the description of the show:

Podcast with inspiring interviews with successful software developers, architects, testers, and other professionals with stories of success, failure, excellence, and inspiration.

It was a lot of fun speaking with Dave and I hope you enjoy it!

You can listen to my interview here: Episode 106 - Gary Stonerock - Finding the Underlying Problem

Let me know what you think!