A CRDT Primer Part I: Defanging Order Theory
17 Nov 2015
When it comes to distributed systems, coordination is slow. This is true whether we’re talking about a system that favors consistency or one that favors high availability. CRDTs are data structures that promise strong eventual consistency for highly available systems without the costs of coordination. A gossip protocol will do the trick.
If you’ve read anything about state-based convergent CRDTs, you may have run across the term “monotonic join semi-lattice.” Despite the intimidating name, this concept (drawn from order theory) is built out of familiar elements. And rather than indicating that you need an advanced degree in mathematics to understand how CRDTs work, it turns out that with a little work, it can help clarify why, when it comes to state-based CRDTs, you can rely on gossip to get your system to converge on the One True Value.
In this post, I’m going to start from the familiar elements constituting a join semi-lattice and build up to the complete concept. In the next post, I’ll explain how state-based CRDTs work, relying on the lessons we’ve learned from order theory.
We really only need three core concepts. The first one is perfectly familiar: less than or equal to (written ≤). It’s a way of establishing an order between elements. If we can say that a ≤ b or b ≤ a, then we know that a is comparable to b.
Building off this last point, the next concept we need is incomparable. In order theory, if a and b are incomparable, we write a ∥ b. This means that we cannot compare a and b using ≤.
The last of the three core concepts is a join between two elements, written a ∨ b. Now, if you’re new to order theory, the concept of a join might not be familiar, but don’t worry. By the time we get there, it’ll feel like old hat.
This post promises to defang order theory. So let’s start by trying to understand what order means. Imagine that we have a collection of elements all of one type. For example, we could have a collection of integers. Any two integers can be compared to each other by using the “less than or equal to” relation. For example, 3 ≤ 5.
You can perform a simple exercise: try to find two integers that cannot be compared in this way. I call it a “simple” exercise, but it will take forever. This is because “less than or equal to” can be used to compare any two integers, which, as we will see, means that it is a total order.
In the language of order theory, integers make up a set. And the familiar “less than or equal to” relation that compares integers is called a binary relation. A binary relation is simply an operator that relates two elements. And now we have everything we need to define an order:
An order is a binary relation ≤ on a set S, written <S, ≤>.
So “less than or equal to” is one possible binary relation. But there are countless others. For example, the descendant-of relation is another example of a binary relation. Consider a daughter, mother, and grandfather. We can put any two of these people into the binary relation of descendant-of. For example: daughter ≤ grandfather.
But notice that for descendant-of, it really is a simple exercise to find two people who don’t stand in the relation. Just go through your list of friends and I’m sure you’ll find some people who are neither your ancestor nor your descendant.
In our first example, we defined ≤ as “less than or equal to.” In our second example, we defined it as “descendant-of.” In our first example, our set was all the integers. And we found that for any two integers, we could compare them with our binary relation. In our second example, however, our set was the set of all people. And it was possible to find two people who couldn’t be compared using our relation. So in the language of order theory, two such people would be called incomparable according to ≤. When two elements a and b are incomparable, we write a ∥ b.
Not only have we introduced the first two of our core concepts, but we’ve also established everything we need to talk about two types of order: total order and partial order. An order is total if for any a and b in the set, either a ≤ b or b ≤ a. Less than or equal to over the integers is an example.
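This totality is easy to check mechanically. Here’s a quick Python sketch (the range is an arbitrary sample, since we can’t enumerate all the integers) confirming that any two integers are comparable under ≤:

```python
import itertools

# Sample a range of integers; any two of them should be comparable under <=.
for a, b in itertools.product(range(-5, 6), repeat=2):
    assert a <= b or b <= a  # totality: every pair is comparable
```

Try running the analogous check for a partial order, like the subset relation, and the assertion will fail for some pairs.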
We can draw a diagram representing this order. Since we don’t have infinite space, I’m going to focus only on the set of integers from 5 through 10:
In this diagram we can draw an arrow between any two integers, where the arrow represents our binary relation. I haven’t drawn all the arrows, but you can add one between any two of these nodes. Keep in mind that ≤ is transitive, so if we have
5 → 6 → 7
there exists an implicit arrow between 5 and 7 that we can draw as
5 → 7
A partial order is weaker than a total order. It does not require that every pair a and b in a set can be compared. Some examples of partial orders are the descendant-of relation we’ve already discussed, the genus/species relation, and what we might call the “located-in” relation, which we’ll look at now. The diagram below shows a representation of the located-in order over a small set of locations:
The first thing you should notice is that our elements are broken into two separate blocks. No element from the block on the left can be put in the located-in relation with an element in the block on the right.
Here we’re defining ≤ as “located-in.” The diagram shows us some comparable pairs. For example, Bronx ≤ NYC and NYC ≤ US. Notice that, as with the diagram for integers, the arrow from Bronx to US is implied. Again, that’s because ≤ is transitive. So though I won’t draw an arrow between every pair of comparable elements, it should always be clear from the diagram.
Unlike the integers case, we also have incomparable pairs for “located-in.” For example, NYC ∥ Seattle and Mumbai ∥ Delhi.
Now let’s look at an example of a partial order that’s more directly relevant to CRDTs: vector clocks. A vector clock timestamp is a vector of integer timestamps. We can write them like [1, 2, 0]. Each element of the vector corresponds to a node or process. So, in essence, the vector clock timestamp is a collection of logical timestamps for all the nodes or processes we’re interested in.
When talking about vector clocks, we define ≤ as “happened-before.” A vector clock timestamp t1 happened-before a vector clock timestamp t2 if and only if every element in t1 is less than or equal to the corresponding element in t2. This is our definition of “happened-before.” So, for example,
[1, 2, 0] ≤ [1, 3, 1]
We can diagram the happened-before relation as follows:
As with located-in, our diagram has two separate blocks. In this case, each block has two elements that are comparable to each other. But if we have two vector clock timestamps, and neither happened-before the other, then we say they are incomparable. So, for example, if you try to compare elements across our two blocks from the diagram, you’ll find that they are incomparable. Take a pair like [1, 2, 0] and [0, 1, 1]:
Since some (but not all) of the elements of one are greater than or equal to the corresponding elements of the other, we cannot put these two timestamps in a “happened-before” relation. We conclude that [1, 2, 0] ∥ [0, 1, 1]. They are incomparable.
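In code, the happened-before check is just an element-wise comparison. Here’s a minimal Python sketch (the function names are my own, not a standard API):

```python
def happened_before(v1, v2):
    """True if every element of v1 is <= the corresponding element of v2."""
    return all(a <= b for a, b in zip(v1, v2))

def incomparable(v1, v2):
    """True if neither timestamp happened-before the other."""
    return not happened_before(v1, v2) and not happened_before(v2, v1)

assert happened_before([1, 2, 0], [1, 3, 1])  # comparable pair
assert incomparable([1, 2, 0], [0, 1, 1])     # neither happened-before the other
```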
Now that we’ve defined the happened-before relation, we can create a more interesting diagram of a more interesting order:
Here we’re drawing on a set of eight timestamps. The happened-before relation defines a partial order on this set. To reiterate, that’s because some elements in the set can be compared in terms of happened-before, but others cannot. For example, any timestamp connected to another by a chain of arrows happened-before it, while timestamps sitting in separate branches are incomparable.
We only have one more of our three core concepts to cover, and that’s the concept of a join. To understand a join, we need to understand what it means for an element to be an upper bound. If we have a set and we have a binary relation on that set, then an upper bound is an element u of the set such that every element x of the set satisfies x ≤ u. This is easier to see by looking at another diagram:
In this diagram, our set consists of the 15 locations represented. Earth is an upper bound on our set. Pick any other location in the set, and you’ll see that it is located-in Earth. There is no other upper bound.
If you look at our happened-before diagram for vector timestamps above, you can see that the topmost timestamp is the upper bound on that eight-element set of vector clock timestamps. Again, that’s because we can pick any other element in the set and see that it happened-before that topmost timestamp.
So now we’re ready to define a join. Here’s the technical definition (don’t worry, we’ll unpack it):
For a set S, an order <S, ≤>, and two elements a, b ∈ S, the join of a and b (written a ∨ b) is a least upper bound of {a, b} according to our order <S, ≤>.
This means that when we take the join of a and b, we’re looking for some element c (which could be a or b itself) for which a ≤ c and b ≤ c. We also want c to be the smallest element that satisfies this condition. So, for a simple example, 3 ∨ 5 is equal to 5. There’s no integer smaller than 5 that’s an upper bound on both 3 and 5. Notice that 7 is also an upper bound on 3 and 5, but it’s not a least upper bound since 5 ≤ 7.
When we’re dealing with a total order, as in the case of less than or equal on the integers, a join of two elements is always equal to one of those two elements. That’s because they can be directly compared, and so it’s always going to be true that one is greater than or equal to the other. However, if we’re talking about partial orders, this is not guaranteed. Let’s look again at our located-in diagram to illustrate:
What is Mumbai ∨ Delhi? Well, Mumbai is not located-in Delhi, and Delhi is not located-in Mumbai. So, in order to find the least upper bound on these two elements, we’re going to have to dig into our background set, in this case our 15-location set from the diagram. Since Earth is an upper bound on all those locations, we know that Earth is an upper bound on Mumbai and Delhi. However, it’s not the least upper bound. That’s because India is located-in Earth, and both Mumbai and Delhi are located-in India. Now, there’s nothing located-in India that both Mumbai and Delhi are located-in. This means that Mumbai ∨ Delhi = India.
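We can mimic this search in code. The sketch below encodes a toy located-in hierarchy as a parent map (the place names echo the ones discussed in this post, but the table itself is my own construction) and finds a join by walking up from one element until it hits a container of the other:

```python
# A toy located-in hierarchy: each place maps to the place it is directly located in.
located_in = {
    "Bronx": "NYC", "Brooklyn": "NYC", "NYC": "US", "Seattle": "US",
    "US": "Earth", "Mumbai": "India", "Delhi": "India", "India": "Earth",
}

def ancestors(place):
    """The chain of places containing `place`, starting with itself."""
    chain = [place]
    while place in located_in:
        place = located_in[place]
        chain.append(place)
    return chain

def join(a, b):
    """Least upper bound: the first ancestor of `a` that also contains `b`."""
    b_ups = set(ancestors(b))
    for up in ancestors(a):
        if up in b_ups:
            return up

assert join("Mumbai", "Delhi") == "India"
assert join("Bronx", "Seattle") == "US"
```

Because each place has exactly one parent here, the common containers of any two places form a chain, so the first one we meet walking up is guaranteed to be the least.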
Here are the diagrams of the orders we’ve considered so far with some joins indicated:
Hopefully these examples provide a good sense of how joins work. It’s important to note that joins obey three laws (all of which are relevant to understanding CRDTs):
Commutativity: a ∨ b = b ∨ a
Associativity: (a ∨ b) ∨ c = a ∨ (b ∨ c)
Idempotence: a ∨ a = a
Commutativity is useful because it means that it doesn’t matter what order we take the join of our elements in.
Associativity is important because it means that before looking for the join of three elements, we can start by taking the join of any two of the three, and then take the join of that result with the remaining element. As we’ll see in the next post, this is quite useful.
Idempotence is important because it means that no matter how many times we take the join of an element with itself, we get the same result.
When it comes to CRDTs, what we’re looking for is the ability to apply an operation in any order and as many times as we want without corrupting the result. The laws obeyed by joins give us exactly this.
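For a concrete instance of all three laws, take vector clock timestamps, whose join is the element-wise maximum. This Python sketch checks the laws on a few sample timestamps (the samples are arbitrary; the laws hold for any choice):

```python
def join(v1, v2):
    """Join of two vector clock timestamps: the element-wise maximum."""
    return [max(a, b) for a, b in zip(v1, v2)]

a, b, c = [1, 2, 0], [0, 1, 1], [2, 0, 0]

assert join(a, b) == join(b, a)                    # commutativity
assert join(join(a, b), c) == join(a, join(b, c))  # associativity
assert join(a, a) == a                             # idempotence
```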
Now we have everything we need to define a join semi-lattice. Let’s look at the definition first and then break it down:
A join semi-lattice is an order <S, ≤> for which there exists a join a ∨ b for any a, b ∈ S.
In order to understand this definition, we need to understand three concepts: set, order, and join, all of which we’ve covered. We need only think of a set as a collection of elements. An order, as we’ve seen, is a binary relation on a set. So if our set is the integers, then less than or equal to is an order on that set. Loosely, it orders the set in the sense that it puts its elements in order. If we have an order on a set, then we can draw a diagram using arrows as we’ve been doing to show how elements are related. And finally, the join of two elements is the least upper bound of those elements in terms of some order.
So a join semi-lattice is a type of order. That means that for any join semi-lattice, there will always be a corresponding diagram. What makes an order a join semi-lattice is that we can take any two elements from the set and find a join for them. In the case of the integers and less than or equal to, it’s straightforward to see that we have a join semi-lattice. That’s because for any two elements in the set of integers, the least upper bound is going to be one or the other of those two integers. The same idea will hold for any total order. Things get more interesting when we look at partial orders.
Returning to our diagram for the happened-before order, we can see with a little thought that for any two elements there exists a join:
Here are some examples: the join of two comparable timestamps is simply the later of the two, and the join of two incomparable timestamps is the earliest timestamp that both happened-before. It will be helpful to trace a few pairs on the diagram and convince yourself that their joins exist.
But not all partial orders are join semi-lattices. To see this, take a look at the following diagram:
No join exists for some pairs of elements here. Keep in mind that we’re saying the background set consists of only the seven elements shown on the diagram. And this is important, because if the background set were all possible vector timestamps, then we could always find a join for any two elements: the element-wise maximum.
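To make the failure concrete, here’s a small constructed example (my own, not the diagram’s exact timestamps): a set of four vector timestamps in which one pair has two incomparable upper bounds, and hence no least upper bound within the set:

```python
def happened_before(v1, v2):
    return all(a <= b for a, b in zip(v1, v2))

def join_in(s, x, y):
    """Least upper bound of x and y *within* the finite set s, or None."""
    uppers = [u for u in s if happened_before(x, u) and happened_before(y, u)]
    for u in uppers:
        if all(happened_before(u, v) for v in uppers):
            return u
    return None

s = [[1, 0], [0, 1], [2, 1], [1, 2]]
assert join_in(s, [1, 0], [0, 1]) is None  # [2, 1] and [1, 2] are incomparable upper bounds

s2 = [[1, 0], [0, 1], [1, 1]]
assert join_in(s2, [1, 0], [0, 1]) == [1, 1]  # add the element-wise max back and the join exists
```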
Ok. So now we’re finally ready to define the mythical monotonic join semi-lattice. Drumroll…
A monotonic join semi-lattice is a join semi-lattice that sounds more esoteric.
“Monotonic” in this context means that whenever we take a join a ∨ b, we are not moving “downwards” in terms of our order. And, understood in terms of our diagrams, joins by their nature never move downward. Let’s look at an example, returning to the located-in order:
Let’s start with the Bronx all the way at the bottom left. If you take the join of the Bronx and Brooklyn, then you get NYC. So you’ve moved up to NYC. And if you take the join of NYC with itself, you still get NYC. That’s because joins are idempotent.
Moving on, if you take the join of NYC with Seattle, then you move up to the US. If you take the join of the US and Mumbai, then you have to move all the way up to Earth, since there is no other upper bound in between.
At each step, you either moved up or stayed in the same place in our diagram, and this would have been true no matter which two elements you chose at each step. The important thing to notice is that the following statement is true:
a ≤ a ∨ b (and likewise b ≤ a ∨ b)
As an experiment, you can try taking joins in any order, always joining the result of one join with any other element you like. Write down the results in sequence. You should be able to place ≤ between any two results in your sequence. This is one way of understanding what it means for joins to be monotonic. We never go downwards.
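Here’s that experiment as a Python sketch, using element-wise max as the join for vector clock timestamps (the sample timestamps are arbitrary): shuffle the elements, fold the joins in whatever order comes up, and verify the sequence of results never moves downwards:

```python
import random

def join(v1, v2):
    """Join of two vector clock timestamps: the element-wise maximum."""
    return [max(a, b) for a, b in zip(v1, v2)]

def happened_before(v1, v2):
    return all(a <= b for a, b in zip(v1, v2))

timestamps = [[1, 0, 0], [0, 2, 0], [0, 0, 3], [1, 1, 1], [2, 0, 1]]
random.shuffle(timestamps)  # join order doesn't matter

# Fold the joins in the shuffled order, recording each intermediate result.
results = [timestamps[0]]
for t in timestamps[1:]:
    results.append(join(results[-1], t))

# The sequence never moves downwards: each result happened-before the next.
assert all(happened_before(results[i], results[i + 1])
           for i in range(len(results) - 1))
```

However the shuffle comes out, the final result is always the same element-wise maximum, which is exactly the convergence property we want.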
What’s the significance of this fact for CRDTs? In short, just as joins only ever move “upwards,” merges of state-based CRDTs converge on the One True Value, and for the very same reasons.
In this post, we unpacked the scary-sounding but actually relatively simple idea of a monotonic join semi-lattice. With this concept in hand, we are ready to tackle the wonders of state-based convergent CRDTs, which is exactly what we’ll do in Part II.
In the meantime, if you’d like to dig deeper into order theory, I recommend Introduction to Lattices and Order by B.A. Davey and H.A. Priestley.