TLA⁺ Unicode support -

TLA⁺ was developed by Leslie Lamport, originator of \(\LaTeX\), so it’s unsurprising that TLA⁺ syntax looks pretty \(\LaTeX\)-y. It’s a very mathy language, with much use of symbols like (among others) \A,\E, /\, \/, and \in denoting \(\forall\), \(\exists\), \(\land\), \(\lor\), and \(\in\) respectively. The language tools include a tla2tex command to format TLA⁺ specs into \(\LaTeX\) for integration in research papers. However, research papers are not where I spend the most time looking at TLA⁺. Here’s the tale of how I brought those beautiful symbols into the code editor, and how I learned to work with others in FOSS along the way!

Real \(\TeX\)-heads know that a proper typesetting language should always separate presentation from content, but that border has shifted since the 80s! With the advent of Unicode and its widespread implementation in commonly-installed fonts, many of the symbols that were once unrepresentable in a typical monospace text editor are now rendered without issue. While APL blazed this trail back in the 60s with language-specific font & hardware keyboard requirements, modern mathy languages like Lean and Agda showed that incorporating Unicode symbols is relatively free of hassle for end users. This conclusion is not unanimous; closely-related language Idris intentionally does not support Unicode at this time.

Photograph of an IBM 5100 microcomputer with APL keyboard

An IBM 5100 microcomputer with APL keyboard (source: Wikimedia Commons)

In the preceding years I had slowly constructed my own personal suite of Unicode TLA⁺ tooling:

A TLA⁺ tree-sitter grammar for syntax highlighting
An ASCII/Unicode spec conversion tool, more complicated than search & replace since some TLA⁺ syntax depends on vertical alignment (this is also why ligatures do not work)
An as-you-type Unicode translation plugin for Neovim

I then dogfooded these tools across several paid contracts & personal projects, writing my specs in Unicode with the Neovim plugin then converting them to ASCII for use with existing language tooling. I loved it!

Using the Neovim plugin to write Unicode TLA⁺

After building consensus for the idea in monthly TLA⁺ community calls, in the new year I decided to take the plunge and dedicate several months of full-time unpaid work implementing Unicode support in the main TLA⁺ language tools. The deadline? A talk I would give on the feature at TLA⁺ Conf 2024 on April 15th in Seattle!

Technical difficulties

The main TLA⁺ language tools comprise a mass of 25-year-old Java code with lots of static global variables, largely written by people decades-gone from the project. Unit testing is spotty: some places have excellent coverage, others have essentially no coverage whatsoever. The static global variables ensure writing unit tests is very difficult, as it is never clear what part of the system needs to be initialized in order for the unit to function.

The code I wanted to change - the parser, called SANY - had essentially no test coverage. Moreover, much of the code was generated by an ancient version of the JavaCC parser generator and then subsequently modified, meaning the code was in a sort of deadlock. Trying to synchronize the JavaCC grammar with the hand-modified generated code meant lots of dangerous changes to untested foundational code, but further modifying the generated code meant it would be harder to synchronize with the JavaCC grammar in the future. The solution? Write lots and lots and lots of tests! Ain’t nothing to it but to do it.

My prior work on the TLA⁺ tree-sitter grammar had produced a respectable corpus of syntax tests using tree-sitter’s standard test corpus format. Here I saw an opportunity for a dual win: if I adapted (and expanded) these tests for use with SANY, not only would I get excellent test coverage but I would solve the problem of conformance between the various TLA⁺ parsers! This idea of a universal TLA⁺ syntax test corpus powered me through the unimaginable slog of writing translation code for all 230 (no joke!) types of named syntax nodes in the TLA⁺ language. It took over a month, but I got it done, and the feeling of finally seeing all 182 new test cases turn green was incredibly gratifying and made it all worth it. I slumped back in my chair, gazed at the wall for a while, submitted the PR, then turned off my computer and went for a celebratory walk to my favorite burrito place!

Once I’d cultivated the legacy code wasteland into the most comprehensively-tested part of the entire codebase, it was smooth sailing. Mostly!

FOSS has social problems to match technical problems. This was my first major FOSS project involving other people; everything prior had been either simple bugfixes or fully greenfield. In my post about writing the TLA⁺ tree-sitter grammar I spoke of the psychological transition associated with becoming a free software developer. There was more to come! Working collaboratively with others in a volunteer-driven environment requires a completely different approach compared to development as an employee.

While employed I was rewarded for taking ownership of a project. This entailed generating lots of noise and diffs and generally treating commits as an instrument to bludgeon the codebase into the desired shape. If all this commotion broke something, no worries - a feedback loop existed! My ass was on the line. If I messed up bad enough maybe management would hold a contentious meeting, or I would lose out on bonus/pay raise/promotion, or any number of other material consequences ensuring the thing gets fixed and is perhaps less likely to break in the future. No such feedback loop exists in a FOSS project! If code gets merged then breaks something, there is some social obligation for me to fix it but I could just as easily disappear and never be heard from again. So, the threshold for accepting a contribution to an actively-used FOSS project is much higher than in corporations, where coworkers pester each other to give rubber-stamp PR signoffs.

I slowly became acquainted with the use of git commits as tools of communication. When you submit a PR, its constituent commits shouldn’t just be your occasional development checkpoints. Each commit should be logically contained. At minimum, the build & tests should pass on each commit. This has a practical aspect: git-bisect is an incredibly useful tool, and its usefulness must be preserved. Slicing, dicing, and recombining working commits into publishable form required me to finally move my knowledge beyond the git basics. I glimpsed its real power and even began thinking - is git… good actually???

I managed to boil my understanding of good FOSS contribution behavior down to a handful of points:

Be surgical! Play code golf - what’s the smallest diff you can make that still accomplishes your goal?
If your PR can be split up into multiple PRs, each with some individual value, then split it up
Build consensus on the need, scope, and method for a change before working on it
Write lots and lots of tests, then submit them before writing your main changes

I’ve been claiming these are unique to FOSS, but after my talk at the conference an audience member came up and said these principles were, almost word-for-word, the practices he tries to instill in his all-remote team at his job. So who knows! Maybe I didn’t learn how to work with others in FOSS, I just learned how to work with others in general!

The 2024 TLA⁺ conference was a sub-conference of OSSNA/ELC, and later in the week I saw many of these sentiments echoed by Dmitry Baryshkov of Linaro in his talk:

Successfully merged

The final PR adding Unicode support to SANY was merged a week before I gave my talk. There is some cleanup yet to be done but the big achievement is complete. I can now watch the slow percolation of these changes through the community, as more tooling adapts or is created to take advantage of Unicode TLA⁺ and more published specs use Unicode symbols.

Beyond the immediate, all the work I did writing tests for the parser opens some exciting possibilities for the future. The TLA⁺ Foundation is now a thing, so control over the evolution of the language has passed to the community - and now with these tests the language can actually be changed, instead of having RFCs languish for years! I’ve already proposed one modification that has caught some interest.

There is a lot more I would like to do to improve the TLA⁺ tools. However, I face the classic FOSS funding problem. I attempt to square this circle by working contracts most of the year to tide me over as I work on FOSS the rest of the year, so - if your company has need of someone to write a TLA⁺ spec or analyze & develop a distributed system, consider getting them to hire me! If you want to be more directly charitable, you can also join the TLA⁺ Foundation to fund myself or others’ work on the language through a grant process!

Discussions

tlaplus formal-methods

TLA⁺ Unicode support

Learning to work with others in open source

TLA⁺ Unicode support

Learning to work with others in open source

Technical difficulties

Successfully merged

Discussions

Technical difficulties

Social difficulties

Successfully merged

Discussions