Today, in order to both sharpen my understanding, (as well as hopefully to give you something you might not have known), I'd like to talk about vector clocks.
These are something I have thought about quite a bit recently (due to my class at Georgia Tech currently being Distributed Systems), and I thought they were
neat enough to talk about here. So through next few minutes, we will work up the *why* and the required requisite knowledge to work with vector clocks.
#### Logical Clocks
So why do we need any kind of special clock? Why won't a normal clock do?
Big surprise, but time is hard to keep. [Like really hard](https://en.wikipedia.org/wiki/History_of_timekeeping_devices).
Therefore, in a large distributed system (of possibly tens, or hundreds of machines), we really don't want to try and keep all clocks in sync.
Given we are talking about computer systems, where events happen on the scale of milliseconds down to microseconds or less, makes precision especially important.
This is exactly what we stated in the earlier section! This is because logical clocks are really only useful if they are at least *consistent*.
> To convince yourself of this, think about the following clock function that associates each unique event with a *time stamp*, that is not *consistent*
> ```java
> @Data
> public static final Time {
> Event e;
> Date d;
> }
>
> public synchronized static TimestampEvent c(Event e) {
> return new TimestampEvent(e, new Time(e, new Date()).hashCode());
> }
> ```
> Despite each event in time generating a unique time stamp, the new time stamp is not all that useful, as they cannot be ordered!
A synchronous clock like the one we defined above is also *strongly consistent*, however this attribute is much harder to maintain in a distributed context.
#### Lamport Clocks
A *Lamport Clock* is a type of *Logical Clock*, that guarantees *consistency*, but not *strong consistency*. We can pretty much use our example from above, where
we make each one of our timestamps a simple increment for the case of a single process/node generating events. Things get slightly more complex when we introduce
This *guarantees* that the time at \(p_j\) will be greater, ensuring that the timestamp both at \(p_j\) is always larger, which means all things that happened before
receiving this message (at least on the scale of this node), have smaller timestamps. Take a moment to convince yourself of this.
This is an interesting concept, because by only adding some ever growing integer, we can determine some order to our events. This is going to grow into something far
more useful in our next section.
#### Vector Clocks
Finally, on to what we were here for. A *Vector Clock* is pretty much just a vector of the *Lamport Clocks* at all the other nodes, or at least the
last known state of them. So to start, let every node keep a vector of size *n* in memory, where *n* is the number of processes/machines/nodes in the distributed system.
Each of these vectors should be zero initialized. To make our lives easier, call this vector *v*.
So each node, *i* is the index of the node, will keep a *Lamport Clock* in index *v[i]*. And for some node *j*, its *Lamport Clock* will be *v[j]*. Now, what do we do
when we have an event?
If the event, called *e*, is on *i*, then we just increment *v[i]*, just like a normal *Lamport Clock*. If, however, the event is the node *j* getting a message from
*i*, then we would ensure that all messages contain the senders' vector clock. Lets call the vector clock thats coming in *vi* (because its the vector clock of node *i*).
Node *j* would iterate over both vectors, and update its own vector to be the max of the two values in each index in the vector. This ensures that the end vector at
*j*, *vj*, is the max of the two vectors. Next, we would run *vj[j] += 1*, which is us just incrementing the clock for the receive event that we are experiencing currently.
Finally, we can now process our message, and do whatever business logic that we do.
#### Why do we care?
We would want to keep a *Vector Clock*, because this lets us know the general state of the system. For example, if a node thinks that it should hold on to some data, because
some node may request it, but it knows that data will not be needed after event 100 say, it can just wait till all *Lamport Clocks* in the vector reach at least 100, then
discard that information, because it knows that all nodes have used that information. This is basically just distributed garbage collection.
#### Conclusion
Hope this was somewhat informative, and gave you a general idea of what a vector clock is, and why you might want one. They may not come up all the time, but in your
career, just may need one to keep your applications and processes in sync.