Distributed-Systems -

Wrangling monotonic systems in TLA⁺

Tractable modeling of replicated logs & CRDTs

Posted on November 1, 2023 | Andrew Helwer

TLA⁺ sees a lot of use modeling distributed systems. The ability to explore all possible interleavings of events makes concurrency simple to reason about. For this TLA⁺ uses something called finite model-checking, which is really just a breadth-first search through the entire state space. The key here - and this really must be emphasized - is that the model is finite. There can’t be an infinite number of states, or of course the model checker will run forever.

[Read More]

tlaplus distributed-systems formal-methods

Using TLA⁺ at Work

Designing a snapshot coordination system

Posted on April 5, 2023 | Andrew Helwer

Here’s a short report of a time I used TLA⁺ at work, with interesting results. TLA⁺ is a formal specification language that is particularly effective when applied to concurrent & distributed systems. TLA⁺ made it tractable for an ordinary software engineer to reason about a tricky distributed systems problem, and it found a bug introduced by an “optimization” I tried to add (classic). The bug required 12 sequential steps to occur and would not have been uncovered by ordinary testing.

[Read More]

tlaplus distributed-systems formal-methods

How do you reason about a probabilistic distributed system?

Posted on September 11, 2020 | Andrew Helwer

In which I am stunted upon by coin flips

Wasn’t too long ago that I felt pretty good about my knowledge of distributed systems. All someone really needed in order to understand them, I thought, was a thorough understanding of the paxos protocol and a willingness to reshape your brain in the image of TLA⁺. Maybe add a dash of conflict-free replicated datatypes, just so you know what “eventual consistency” means. Past that it’s just some optimizations and mashups which come easily to your TLA⁺-addled brain.

[Read More]

distributed systems formal methods