Wrangling monotonic systems in TLA⁺

Tractable modeling of replicated logs & CRDTs

Wrangling monotonic systems in TLA⁺
TLA⁺ sees a lot of use modeling distributed systems. The ability to explore all possible interleavings of events makes concurrency simple to reason about. For this TLA⁺ uses something called finite model-checking, which is really just a breadth-first search through the entire state space. The key here - and this really must be emphasized - is that the model is finite. There can’t be an infinite number of states, or of course the model checker will run forever. [Read More]

Using TLA⁺ at Work

Designing a snapshot coordination system

Using TLA⁺ at Work
Here’s a short report of a time I used TLA⁺ at work, with interesting results. TLA⁺ is a formal specification language that is particularly effective when applied to concurrent & distributed systems. TLA⁺ made it tractable for an ordinary software engineer to reason about a tricky distributed systems problem, and it found a bug introduced by an “optimization” I tried to add (classic). The bug required 12 sequential steps to occur and would not have been uncovered by ordinary testing. [Read More]

How do you reason about a probabilistic distributed system?

How do you reason about a probabilistic distributed system?
In which I am stunted upon by coin flips Wasn’t too long ago that I felt pretty good about my knowledge of distributed systems. All someone really needed in order to understand them, I thought, was a thorough understanding of the paxos protocol and a willingness to reshape your brain in the image of TLA⁺. Maybe add a dash of conflict-free replicated datatypes, just so you know what “eventual consistency” means. [Read More]