The current state of TLA⁺ development

The current state of TLA⁺ development

The 2025 TLA⁺ Community Event was held last week on May 4th at McMaster University in Hamilton, Ontario, Canada. It was a satellite event to ETAPS 2025, which I also attended, and plan to write about in the near future. I gave a talk somewhat-hucksterishly titled It’s never been easier to write TLA⁺ tooling! which I will spin into a general account of the state of TLA⁺ development here. The conference talks were all recorded, so if you’d like this blog post in video form you can watch it below:

Outline

The thesis of this post is that almost all the dreams & desires we have for TLA⁺ are downstream of making it easy to develop in or on TLA⁺ language tooling. We break this down into three parts:

  1. Overview of existing TLA⁺ language tooling
  2. Overcoming the legacy code challenge
  3. Some TLA⁺ development ideas for the near and middle future

I’m very optimistic about the future of TLA⁺, despite the challenges. We now have the TLA⁺ Foundation, which has very generously been paying me a comfortable living wage to work on the tools for the past six months. I will talk about some of the things I’ve accomplished. Being paid to work on FOSS is a blessed existence; if that sounds appealing to you as well, the TLA⁺ Foundation is looking to fund more contributors!

Existing Tools

This is a summary of what TLA⁺ language tooling exists, to stoke the imagination for what tools you could contribute to or build upon.

Parsers

Parsers are of course the foundation of any language tooling, because your tools need to know what some source code actually says. We have a decent stable of them:

  • SANY (Syntactic ANalYzer) - our flagship parser, a Java-based recursive descent parser generated by JavaCC. This is the only parser that really handles all the syntax and does a lot of semantic checks (including level-checking, which is the closest thing TLA⁺ has to type checking) that other parsers don’t.
  • The TLAPM parser - a parser-combinator-based OCaml parser, this is used by the TLA⁺ Proof System. It handles almost all TLA⁺ syntax and does some rudimentary semantic analysis.
  • tree-sitter-tlaplus - I wrote & presented this at the 2021 TLA⁺ Community Event. Like all tree-sitter grammars, it uses a JavaScript-based DSL to generate a LR(1) parser in C along with some handwritten C to handle context-sensitive syntax. This makes it very portable, and a strong tree-sitter ecosystem has developed with nice bindings for Python, Rust, JavaScript, and other languages. This parser only handles syntax, no semantic checks, but that’s good enough for a lot of tools. Its LR(1) nature also makes it a great static analyzer for ambiguity introduced by proposed TLA⁺ language extensions.

Those are the big three. There are a couple others floating around, in varying stages of completion: one in Python, Haskell, Haskell (again), and Rust (I got halfway through writing that last one myself before getting lost in the sauce of the Rust type system & burning out).

Interpreters

Moving up the stack. For the longest time we had only one interpreter, and now we have two!

  • The one used by the TLC (Temporal Logic Checker) finite model checker - it’s written in Java and uses SANY for parsing. It also now has a REPL.
  • Spectacle - presented by William Schultz at the 2024 TLA⁺ conference, this is a web-based interpreter written in JavaScript using tree-sitter-tlaplus as a parser. I think this tool is really neat! It has a large focus on sharing specs & traces with colleagues.

Model Checkers

Again moving up the stack. We now have three model checkers!

  • TLC - mentioned previously, is the only fully-featured model checker. It’s a finite state model checker, with all the benefits (predictable & understandable performance) & drawbacks (combinatoric state explosion) that implies. It is the only checker that fully handles liveness, in particular refinement - a banner TLA⁺ language feature, and arguably its raison d’être. It is written in Java and uses SANY for parsing.
  • Apalache - a symbolic model checker that uses Z3 to model-check TLA⁺ specifications. It handles some liveness properties but not all. It is written in Scala, and also uses SANY for parsing.
  • Spectacle - this was recently updated to do basic safety checking by running a breadth-first search of the model’s state space, in the browser! Once you do the difficult work of developing a TLA⁺ interpreter it is not too much extra effort to add safety checking (liveness checking is where it really starts to get hairy). As covered above, this is written in JavaScript and uses tree-sitter-tlaplus for parsing.

Others

Here’s a grab bag of other notable TLA⁺ language tools:

  • TLAPM - a system for validating proofs written in TLA⁺’s proof language. It’s written in OCaml and uses its own parser, described above. TLA⁺ proofs are translated to obligations for discharge by an array of backend provers including Isabelle, Z3, and Zenon.
  • The Snowcat Type Checker - this is bundled with Apalache and verifies TLA⁺ type annotations encoded in comments. I think it’s a very neat project that is underutilized. Modulo some weirdness around conversions between functions and sequences, many industry-produced TLA⁺ specs have variables spanning a small set of values and would benefit from catching type errors at parse time.
  • Tlaplus-formatter & tlafmt - we now have not one but two TLA⁺ formatters! The former is written in Java and uses SANY for parsing, while the latter is written in Rust and uses tree-sitter-tlaplus.
  • LSP Server - written by Karolis Petrauskas, this is aimed at improving support for the TLA⁺ proof language in the TLA⁺ VS Code extension. It is written in OCaml on top of the TLAPM parser.
  • TLAUC - the TLA⁺ Unicode Converter, written by myself in Rust using tree-sitter-tlaplus. This converts your TLA⁺ specs between their LaTeX-like ASCII and Unicode representations. I introduced Unicode support to TLA⁺ at the 2024 TLA⁺ conference. Write your TLA⁺ specs in Unicode! It’s great!
  • VS Code Extension - slowly replacing the Eclipse-based TLA⁺ Toolbox as the IDE of choice, this is written in TypeScript and uses regexes to cobble together some concept of TLA⁺ parsing. Don’t get me wrong, it’s really a nice project! It recently bundled SANY so is offloading more and more functionality onto a real parser.

Summing it up

I think the current state of the TLA⁺ tooling ecosystem is strong. This section was quite long once I wrote it all out. People don’t just want to use TLA⁺, they want to make it better! That’s a really good energy to have in the community. A lot of people don’t like being tied to the JVM but alternatives are slowly taking shape.

Overcoming the Legacy Code Challenge

Many working software engineers have read (or at least heard of) this book:

It has a concise & objective definition of legacy code that I like: legacy code is code without tests. However, as I’ve grown in my career I’ve arrived at a fuzzier definition: legacy code is code not in living knowledge. The codebase or even a particular module does not exist in totality inside the heads of the people who work on it. We’ve probably all written software from scratch and know how easy it is to make massive changes because you know how every part works. If you hand the same task to a codebase newcomer you can reasonably expect it to take 10-100x as long to effect the same changes, and the codebase end state will be much less coherent.

We have a large shortfall of living knowledge in the TLA⁺ project, which I will further elaborate in the next section. The thesis of this section is simple: we must overcome this challenge if TLA⁺ is to thrive. TLA⁺ will trundle along on inertia for quite a while, but eventually find its way into irrelevance if we do not conquer the challenge before us.

One last note on living knowledge: it is interlinked with tests. Once living knowledge is lost it must be rebuilt from scratch, and that is only possible if you can write tests for the codebase. Poke & prod it, see how it flows and is interlinked. Most of Working Effectively with Legacy Code deals with how to bootstrap your way to safely making changes in a codebase that lacks tests, when writing tests requires changing the codebase to make it testable.

The Challenge for TLA⁺

First, a quick history: Leslie Lamport published the paper A Temporal Logic of Actions in April of 1990. Jean-Charles Gregoir wrote the initial SANY TLA⁺ parser in the late 1990s. The TLC model checker was written over 25 years ago, announced in the 1999 paper Model Checking TLA⁺ Specifications by Yuan Yu and Panagiotis Manolios which described writing an interpreter & safety checker for TLA⁺. By 2002 the liveness checker had been developed. The tools were subsequently heavily modified around 2010 for the release of TLA⁺ 2, which added the proof language. Around this time Inria & Microsoft Research partnered to write TLAPM so the proof language could be formally checked. Inria funding lapsed in the late 2010s and TLAPM development slowed dramatically. Development on the Apalache symbolic model checker began in the late 2010s, and accelerated with Informal Systems bringing it in-house in the early 2020s, but development has similarly slowed dramatically since Informal Systems spun out Apalache near the end of 2024. The TLA⁺ Foundation was founded to support TLA⁺ as it was spun out from Microsoft Research in the leadup to Leslie Lamport’s retirement at the end of 2024.

We will focus on the challenge of maintaining SANY and TLC. By comparison to SANY & TLC, TLAPM and Apalache are much less approachable by non-academic software engineers and I would be interested in watching talks on how their development can be ramped up & made sustainable.

So, we have a large quarter-century-old Java codebase that is difficult to unit test due to prolific global static state, where all the original authors have long since left the project, and where the three regular core contributors lack knowledge of its many parts. Each of us focuses on a different area: I tend to work on the parser, Markus Kuppe tends to work on the model checker, and Calvin Loncaric tends to work on the interpreter & is also our resident true Java expert. There are many things we want to do but are restrained by the unknowns. This all sounds somewhat dire, so why am I optimistic?

The Dream and Our Blessings

Let’s start with a dream: a world where we are bold and unafraid to make large changes to the core TLA⁺ tools. How do we get there?

Thankfully, we start from a position of strength. Language tooling is a blessed area of software engineering. It is one of the few places where program requirements are relatively complete & unambiguous. It is actually possible to complete a parser! Usually starting a software project is kind of like casting a curse upon yourself, where you must continually provide it your labor or it will disappear from relevance and all your work will be for nought. Not so here. Of course, conventional language tooling demands do become stronger over time. Parsers have to be useful in a language server context now, which requires better error recovery than was previously standard. But these are from relatively slow changes in development culture and only crop up every decade or so.

Concrete Strategies

Enough faffing around. Here are the three strategies I think will bring us from dream to reality:

  1. Tests - Of course! The TLA⁺ Foundation has been funding me to put a ton of effort into developing standardized implementation-independent test suites for TLA⁺ parsers, then applying them to the existing parsers. We now have a nice source-code-input/expected-AST-output syntax suite applied to the three prominent parsers (good for conformance!) along with a semantic-level test suite with lots of in-language assertions about levels and references (thanks to lobste.rs users for some great suggestions on its design) and - my favorite - a set of 150ish TLA⁺ parse inputs that should each trigger a specific standardized error code in any conformant parser. Combined, I think these make every TLA⁺ parser open for development! We are no longer constrained to analyzing bugs by git-bisecting & comparing inscrutable code; just write a test and trace it in the debugger!
  2. Developer Onboarding - We must go beyond documentation. Treat the codebase as a fixed artifact to be studied & learned by any new contributor, then develop a syllabus for it. The TLA⁺ Foundation has been funding me for this in various ways, such as writing developing & contributing guides, highlighting good first issues in monthly development updates, and - most substantially - writing a long tutorial on how to write your own minimal TLA⁺ model checker! This last one is very fun and is based on the excellent free online textbook Crafting Interpreters by Robert Nystrom. Give it a look!
  3. Grants & Stipends - The fundamental contradiction of FOSS development is that every substantial FOSS project requires full-time commitment for long periods of time, but people need to eat and it’s hard to make money giving something away for free. When I first got into this game I thought Linux was something people developed on their evenings and weekends. That’s true for some people, but the overwhelming majority of code is contributed by people working on Linux full-time! A lot of digital ink has been spilled about how to manage the fundamental FOSS contradiction. Foundations with corporate sponsorship are one method, and that’s what we have. The TLA⁺ Foundation can fund people to work on the tools, and they are. Interested? Join us!

Summing it up

So, we are starting from a position of strength and have concrete strategies to overcome the remaining challenges. We should also recognize that our struggle is shared. People in the TLA⁺ community occasionally say things like “TLA⁺ is not a programming language”, but we face the same challenges as the developers of every other programming language out there! Even the exotic parts of TLA⁺ are “just” algorithms. By which I mean learning algorithms is relatively easy, fun, & contained compared to the challenge of developing an unknown codebase. As complicated as the liveness checking algorithm is, it’s “just” an algorithm on graphs that an ordinary developer can learn in a few days of study, given a suitable tutorial. And it’s our job to write that tutorial and make it fun.

What’s Next?

I ended the talk with a few ideas I had for TLA⁺ tool development in the near future. One, inspired by attendance at Antithesis BugBash last month, is generative testing for the TLA⁺ tools! It’s all well & good to lock me in an office to crank out hundreds of test cases, but what if that process could be automated? It would certainly be more fun to develop than a large static test corpus. Actually it’s good to have both: static test cases to nail down the core language functionality, and generative tests to explore the nooks & crannies.

I also tentatively proposed simplifications to the TLA⁺ syntax. I’m a bit of a weird guy who likes the TLA⁺ syntax (especially in its Unicode form) but other people do struggle with it, as Hillel Wayne can attest from his work teaching TLA⁺ both online and offline. I’m not a purist who is against evolving the syntax. I have what I think is quite a nice RFC to combine & disambiguate the set filter & set map syntax. Larger changes will have to be debated. This is design & consensus work; the actual code changes will likely not be substantial in comparison.

I also want to improve the SANY API & consumption experience so people can easily build their own tools instead of requesting features be upstreamed. A new API has already been prototyped, and all SANY unit tests are currently exercising it. Markus has also put effort into automatically publishing a Java package from the rolling release. Finally, we need to write a collection of how-to guides for consuming & developing against the package. There are actually quite a few projects & features in TLA⁺ land that exist but have not had documentation written & disseminated for them. I think I could spend an entire year writing only tutorials & documentation. I am coming closer to being persuaded that would actually be a good use of time, maybe even the best use of my time for TLA⁺.

The One Billion States Per Minute Initiative

I wanted to end the talk with a bang so here it is: I think we can increase the TLC model checker throughput to 1 billion states per minute (1000x speedup) by writing a bytecode interpreter. Currently, TLC clocks about 1 million states per minute. Calvin has done some very preliminary prototyping with transpiling TLA⁺ specs to C++ then compiling & running them as little breadth-first search programs, and saw some head-turning numbers. The TLC interpreter is a tree-walk interpreter written in Java. As optimized as it is, there’s a limit to what can be done with the format. This would be a bold, substantial project but I think it is worth exploring.

Conclusion

Thanks for reading! I hope this has invigorated & inspired you to consider developing your own TLA⁺ tooling or contribute to the existing tools, while giving a nice overview of the current state of things.

Links & Discussions

Murat Demirbas wrote a blog post version of his talk at the TLA⁺ community event, on using TLA⁺ to specify cross-shard transactions in MongoDB. He also wrote a post summarizing other talks from the event.

A. Jesse Jiryu Davis wrote a blog post version of his TLA⁺ community event talk, on the possibility of adding statistical & performance modeling capabilities to TLA⁺. He also posted his notes about other talks at the event.

Discuss this post here: