Andrew Helwer
https://ahelwer.ca/
Two pictures of quantum computation (Thu, 10 Dec 2020)
https://ahelwer.ca/post/2020-12-06-sum-over-paths/
<p>Interpretations of quantum mechanics are boring. Boring!
Maybe the universe has a strict partition between quantum and non-quantum.
Maybe there are a bunch of parallel universes with limited crosstalk.
Or maybe it’s whatever the Bohmian mechanics people are talking about.
Shut up and calculate, I think.
I don’t say this out of some disdain for idle philosophizing or to put on airs of a salt-of-the-earth laborer in the equation mines.
It’s just there are so, <em>so</em> many interesting things you can learn about in quantum theory without ever going <em>near</em> the interpretation question.
Yet it’s many people’s first & last stop.
Rise above!</p>
<p>This one is about quantum interference.
A humble phenomenon, usually first encountered in high school physics when learning about whether light is a particle or a wave.
Maybe your physics teacher took your class down to the dark, empty, disused school basement and fired a laser through a grating.
You saw an interference pattern on the wall that kinda resembled what you saw, with water, in the ripple tank back upstairs.
And that was that.</p>
<p>Now, years later, maybe you’re a programmer who watched a video online where a nice man walks you through how quantum computers work (slides: <a href="https://ahelwer.ca/files/qc-for-cs.pdf">pdf</a>, <a href="https://ahelwer.ca/files/qc-for-cs.pptx">pptx</a>):</p>
<figure><a href="https://youtu.be/F_Riqjdh2oM" target="_blank"><img src="https://ahelwer.ca/img/common/quantum-video-preview.png"/></a>
</figure>
<p>It’s a state machine! Easy! Qbits are just pieces of state hopping around the unit circle.
With a bit of linear algebra, the vault door protecting the treasures of quantum physics is flung open.
<a href="https://ahelwer.ca/post/2018-12-07-chsh/">Entanglement</a>! Teleportation! <a href="https://ahelwer.ca/post/2019-12-21-quantum-chemistry/">Simulating physical reality</a>! Breaking RSA! How polarized sunglasses work!
Nothing withstands your intellectual onslaught.
Until one day you’re reading a pop-science article about quantum computing, just to laugh at it of course, no tortured analogies for you because you KNOW. THE. MATH., and you see a learned physicist quoted as saying quantum computers make use of quantum interference.
Quantum interference! That thing you learned about in high school… let’s see how it really works, with this unit circle state machine of ours.
You run through a couple computations, looking for it.</p>
<p>It is nowhere to be found.</p>
<h2 id="textbook-ambition">Textbook ambition</h2>
<p>There’s a blog post out there called <a href="http://aurellem.org/thoughts/html/sussman-reading-list.html">Prof. Sussman’s Reading List</a>, where the professor has some fairly <em>ambitious</em> reading recommendations for high schoolers.
If you have a university degree and have read them all I’ll count you a brighter person than I.
One choice pick is <em><a href="https://en.wikipedia.org/wiki/Quantum_Computing_Since_Democritus">Quantum Computing Since Democritus</a></em>, a book by well-known complexity theorist Scott Aaronson.
I started reading it after striking out with the Mermin relativity text.
It’s a great read if you have a computer science degree - you know when you pick up a brand new textbook, all enthused & with the best of intentions, power through the first few chapters filled with material you already know, then take a break (never resumed) upon encountering the first concept requiring actual thought?
Imagine if you could read a book that kept the feeling of those glorious early chapters most of the way through - that’s this book.
It isn’t a book that’s good for learning the material.
But if you already <em>know</em> the basic material, the book injects an enthusiasm, playfulness, and perspective that will have you thinking maybe complexity theory could really be for you.</p>
<figure><img src="https://ahelwer.ca/img/sum-over-paths/qcsd.jpg"/>
</figure>
<p>The book starts off with a fun dialogue from Democritus I can’t resist reproducing here:</p>
<p><em>Intellect: By convention there is sweetness, by convention bitterness, by convention color, in reality only atoms and the void.</em></p>
<p><em>Senses: Foolish intellect! Do you seek to overthrow us, while it is from us that you take your evidence?</em></p>
<p>Anyway. I held my own while reading this book up until <a href="https://www.scottaaronson.com/democritus/lec9.html">chapter nine</a> - actually I remember the exact paragraph & figure which shut down the run; talking about applying two Hadamard-like transformations to a qbit, it says:</p>
<p><em>“Intuitively, even though there are two “paths” that lead to the outcome \(|0\rangle\), one of those paths has positive amplitude and the other has negative amplitude. As a result, the two paths interfere destructively and cancel each other out. By contrast, the two paths leading to the outcome \(|1\rangle\) both have positive amplitude, and therefore interfere constructively."</em></p>
<figure><img src="https://ahelwer.ca/img/sum-over-paths/interference.gif"/>
</figure>
<p>Needless to say I didn’t find this intuitive at all, whatsoever.
It took me a long time to figure out exactly what was meant by this aside.
And now I will share this hard-won knowledge with you, dear reader!</p>
<h2 id="negative-probability-in-my-universe">Negative probability? In my universe?</h2>
<p>Aaronson presents quantum mechanics as a generalization of probability theory.
We’re all familiar with how classical probability works - all the possible outcomes of an event have an associated probability between \(0\) and \(1\), and the sum of all of those probabilities must equal \(1\).
Imagine for a second you’re a supreme being designing your own universe.
Maybe you want to tweak a few parameters.
Perhaps - just to see what happens - you fiddle with your probability rules.
Instead of the probabilities themselves summing to one, <em>the sum of their squares</em> must equal one.
This is, essentially, how quantum mechanics works.</p>
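<p>As a sanity check, here’s that rule change in a few lines of numpy (a sketch of mine, not from the original post):</p>

```python
import numpy as np

# Classical distribution: non-negative entries summing to 1.
classical = np.array([0.5, 0.5])
assert np.isclose(classical.sum(), 1.0)

# Quantum state: entries (amplitudes) may be negative; it's the
# sum of their *squares* that must equal 1.
quantum = np.array([1 / np.sqrt(2), -1 / np.sqrt(2)])
assert np.isclose(np.sum(quantum ** 2), 1.0)
```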
<p>What are some implications of this rule change?
The main one we’re interested in is how “probabilities” (let’s call them <em>amplitudes</em> from now on) can now be negative, since the square of a negative number is a positive number (if this confuses you go read the extended explanation in the <a href="https://www.scottaaronson.com/democritus/lec9.html">QCSD lecture notes</a>).
What could it possibly mean for an event to have negative amplitude?
As we’ll see below, an event with negative amplitude can cancel out an event with positive amplitude, ensuring it never happens!
This is the basic idea underpinning destructive interference in quantum mechanics.</p>
<p>If you’re used to thinking about quantum computing the usual way, where vectors of amplitudes are transformed by matrix multiplication, you’ll miss this phenomenon.
We need to see things differently!
We need to use a different <em>picture</em> of quantum computation called <em>sum-over-paths</em>.
As opposed to the standard (also called Schrödinger) approach where we keep a full vector of \(2^n\) amplitudes around for \(n\) qbits, in the sum-over-paths (also called Feynman) approach we focus on the final value of a single amplitude and trace all the different paths through the computation that contributed to it.
How does it work?</p>
<h2 id="show-me-the-math">Show me the math</h2>
<p>Let’s look at a very basic example - applying the Hadamard operator twice to a single qbit, initialized to state \(|1\rangle\).
First, the standard approach with which we’re all familiar:</p>
<p>$$
HH|1\rangle =
\begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{-1}{\sqrt{2}} \\ \end{bmatrix}
\begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{-1}{\sqrt{2}} \\ \end{bmatrix}
\begin{bmatrix} 0 \\ 1 \end{bmatrix} =
\begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{-1}{\sqrt{2}} \\ \end{bmatrix}
\begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{-1}{\sqrt{2}} \end{bmatrix} =
\begin{bmatrix} 0 \\ 1 \end{bmatrix} =
|1\rangle
$$</p>
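<p>If you want to poke at this yourself, the same calculation takes a few lines of numpy (my sketch, not part of the post’s original tooling):</p>

```python
import numpy as np

H = np.array([[1, 1],
              [1, -1]]) / np.sqrt(2)  # Hadamard operator
ket1 = np.array([0.0, 1.0])           # |1>

# Apply H twice, exactly as in the matrix calculation above.
result = H @ (H @ ket1)
assert np.allclose(result, [0.0, 1.0])  # back to |1>
```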
<p>Quantum interference is hidden somewhere in the calculations above, we just can’t see it!
Let’s reveal it with the sum-over-paths method, which is easiest to understand recursively.
Say we want to know the final value of amplitude \(|0\rangle\) (the topmost one) after having applied the two Hadamard operators; we don’t care about any other amplitudes.
Take a look at the final step in the computation again:</p>
<p>$$
\begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{-1}{\sqrt{2}} \\ \end{bmatrix}
\begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{-1}{\sqrt{2}} \end{bmatrix} =
\begin{bmatrix} 0 \\ 1 \end{bmatrix}
$$</p>
<p>Zooming in on the only value we care about (the topmost element of the final vector), by the rules of matrix multiplication we can write this as follows:</p>
<p>$$
\frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} + \frac{1}{\sqrt{2}} \cdot \frac{-1}{\sqrt{2}} = 0
$$</p>
<p>Because of how matrix multiplication works, the final value of our amplitude \(|0\rangle\) depends both on the previous value of amplitude \(|0\rangle\) <em>and</em> the previous value of amplitude \(|1\rangle\); both contribute.
We can think of these two contributions as two paths coming together and (in this case) destructively interfering and canceling each other out!
Also, our two paths really branch into four paths since this is the second of two Hadamard operators; here’s the full calculation linking all the way back to the initial state:</p>
<p>$$
\frac{1}{\sqrt{2}} \left( \frac{1}{\sqrt{2}} \cdot 0 + \frac{1}{\sqrt{2}} \cdot 1 \right) + \frac{1}{\sqrt{2}} \left( \frac{1}{\sqrt{2}} \cdot 0 + \frac{-1}{\sqrt{2}} \cdot 1 \right) = 0
$$</p>
<p>We could imagine, given a much larger series of gates acting on many qbits, how these paths could fan out recursively many times until they hit the initial state - then bubble back up, coming together at each fork to interfere constructively or destructively, accumulating in the final value of the amplitude we care about.
Pretty neat!
Okay, so we’ve found the missing interference.
Now what?
All we’ve done is rewrite matrix multiplication at greater length!
Ah, but interference is just one of the prizes we’ve acquired.</p>
<h2 id="exponential-burdens-of-another-kind">Exponential burdens of another kind</h2>
<p>With the standard picture of quantum computation we have to keep a \(2^n\)-sized vector of amplitudes around to model the behavior of \(n\) qbits.
Sure you can be clever and keep the qbits factored as long as possible, but once things get maximally entangled there’s no way around it.
You need an exponential amount of space.
This is a problem when you’re trying to simulate quantum computation on a classical computer.
Classically simulating quantum computers is of interest not only to complexity theorists, but anyone who cares about the promise of quantum computation!
If we figure out a way to efficiently (as in, with polynomial time/space overhead) simulate quantum computation on a classical computer, then we don’t have to bother with all the exotic nanofabrication and near-absolute-zero vacuum environments and whatnot.
Just run your quantum program on the efficient classical simulator and send your condolences to the resultant horde of despondent grad students.</p>
<p>Now, currently there’s no good reason to believe that efficient quantum simulation is possible.
That doesn’t mean we don’t have options.
Like the one we’ve just learned about!
It turns out the sum-over-paths picture is a classic time/space tradeoff.
By recursively tracing these paths back through the computation, we never have to keep the entire state vector in memory; the (big!) downside is that our computation takes exponentially more time.
Given \(n\) qbits and \(m\) gates, the standard picture uses \(O(m2^n)\) time and \(O(2^n)\) space, while the sum-over-paths picture uses \(O(4^m)\) time and \(O(n+m)\) space.
How does it work?</p>
<p>Let’s define a recursive function \(f(|\psi\rangle, U, i)\), where:</p>
<ul>
<li>\(|\psi\rangle\) is the initial state vector</li>
<li>\(U\) is a list of operators to apply to \(|\psi\rangle\), ordered from last to first</li>
<li>\(i\) is the index of the amplitude we want to calculate</li>
</ul>
<p>We’ll use linear algebra indexing conventions:</p>
<ul>
<li>Indices count from \(1\)</li>
<li>For a vector, \(|\psi\rangle[n]\) is amplitude \(n\) in \(|\psi\rangle\)</li>
<li>For a matrix, \(U[m,n]\) is the element in row \(m\) and column \(n\) of \(U\)</li>
</ul>
<p>We have the following three recursive cases for \(f(|\psi\rangle, U, i)\):</p>
<ol>
<li>Base case: \(U = [\ ]\), return \(|\psi\rangle[i]\)</li>
<li>General case: \(U = [U_m, \ldots, U_{1}]\), where \(U_m\) is a \(2 \times 2\) matrix acting on 1 qbit; in the \(i\)th row of the full matrix \(M = \mathbb{I}_2 \otimes \ldots \otimes U_m \otimes \ldots \otimes \mathbb{I}_2\) (which would be multiplied against the state vector), there will be at most two nonzero entries at some indices \(a\) and \(b\); this means we need the amplitudes at indices \(a\) and \(b\) from the state prior to \(U_m\) being applied, so return:</li>
</ol>
<p>$$
M[i,a]\cdot f(|\psi\rangle, Tail(U), a) + M[i,b] \cdot f(|\psi\rangle, Tail(U), b)
$$</p>
<ol start="3">
<li>General case: \(U = [U_m, \ldots, U_{1}]\), where \(U_m\) is an operator acting on two qbits; in the \(i\)th row of the full matrix \(M\) (which would be multiplied against the state vector), there will be at most four nonzero entries at some indices \(a\), \(b\), \(c\), and \(d\); this means we need the amplitudes at indices \(a\), \(b\), \(c\), and \(d\) from the state prior to \(U_m\) being applied, so return:</li>
</ol>
<p>$$
M[i,a]\cdot f(|\psi\rangle, Tail(U), a) + M[i,b] \cdot f(|\psi\rangle, Tail(U), b) +
$$$$
M[i,c] \cdot f(|\psi\rangle, Tail(U), c) + M[i,d] \cdot f(|\psi\rangle, Tail(U), d)
$$</p>
<p>We can apply this recursive algorithm to our example with the Hadamard operators up above:</p>
<p>$$
f(|1\rangle, [H, H], 1) = H[1,1] f(|1\rangle, [H], 1) + H[1,2] f(|1\rangle, [H], 2)
$$
$$
f(|1\rangle, [H, H], 1) = \frac{1}{\sqrt{2}} f(|1\rangle, [H], 1) + \frac{1}{\sqrt{2}} f(|1\rangle, [H], 2)
$$
$$
f(|1\rangle, [H, H], 1) = \frac{1}{\sqrt{2}} \left( \frac{1}{\sqrt{2}} f(|1\rangle, [\ ], 1) + \frac{1}{\sqrt{2}} f(|1\rangle, [\ ], 2) \right) + \frac{1}{\sqrt{2}} f(|1\rangle, [H], 2)
$$
$$
f(|1\rangle, [H, H], 1) = \frac{1}{\sqrt{2}} \left( \frac{1}{\sqrt{2}} \cdot 0 + \frac{1}{\sqrt{2}} \cdot 1 \right) + \frac{1}{\sqrt{2}} f(|1\rangle, [H], 2)
$$
$$
f(|1\rangle, [H, H], 1) = \frac{1}{2} + \frac{1}{\sqrt{2}} \left( \frac{1}{\sqrt{2}} f(|1\rangle, [\ ], 1) + \frac{-1}{\sqrt{2}} f(|1\rangle, [\ ], 2) \right)
$$
$$
f(|1\rangle, [H, H], 1) = \frac{1}{2} + \frac{1}{\sqrt{2}} \left( \frac{1}{\sqrt{2}} \cdot 0 + \frac{-1}{\sqrt{2}} \cdot 1 \right)
$$
$$
f(|1\rangle, [H, H], 1) = 0
$$</p>
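<p>The recursion above can be sketched in Python. This is a simplification of my own: it uses 0-based indexing (unlike the 1-based convention above) and assumes each operator in the list is already expanded to the full matrix \(M\), so the single-qbit and two-qbit cases collapse into one loop over the nonzero entries of row \(i\):</p>

```python
import numpy as np

H = np.array([[1, 1],
              [1, -1]]) / np.sqrt(2)  # Hadamard operator

def f(psi, ops, i):
    """Final value of amplitude i, with ops ordered last-to-first.

    0-indexed; each op is assumed to already be the full expanded
    matrix M, so one loop covers both general cases.
    """
    if not ops:                 # base case: read the initial state
        return psi[i]
    M, rest = ops[0], ops[1:]
    # One recursive call per nonzero entry in row i: each is a
    # "path" through the previous state that feeds amplitude i.
    return sum(M[i, j] * f(psi, rest, j)
               for j in range(M.shape[1]) if M[i, j] != 0)

ket1 = np.array([0.0, 1.0])           # |1>
amp0 = f(ket1, [H, H], 0)             # final amplitude of |0>
amp1 = f(ket1, [H, H], 1)             # final amplitude of |1>
assert abs(amp0) < 1e-12 and abs(amp1 - 1.0) < 1e-12
```

<p>Note the function never materializes the full state vector; it only ever holds one path’s worth of amplitudes on the call stack, which is exactly the time/space tradeoff discussed below.</p>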
<p>It’s easy to see how this algorithm would take \(O(4^m)\) time; if there were \(m\) 2-qbit operators, there would be four additional recursive calls at each level.
An <a href="https://arxiv.org/abs/1612.05903">algorithmic hybrid</a> of these two pictures exists which tries to use the best of each approach.
It’s also fun to think about how you’d combine the pictures yourself, for example using memoization to avoid recalculating recursive calls with the same parameters.</p>
<h2 id="implementation-in-the-microsoft-qdk">Implementation in the Microsoft QDK</h2>
<p>Microsoft’s Quantum Development Kit includes a very interesting <a href="https://www.linkedin.com/pulse/implementing-quantum-simulator-q-c-andr%C3%A9s-paz/">simulation framework</a> enabling developers to write their own classical simulator, then run their Q# programs on it!
The QDK ships with three built-in simulators: a standard <a href="https://docs.microsoft.com/en-us/quantum/user-guide/machines/full-state-simulator">full-state simulator</a>, a <a href="https://docs.microsoft.com/en-us/quantum/user-guide/machines/qc-trace-simulator/">trace simulator</a> to estimate resource usage (number of qbits etc.), and a <a href="https://docs.microsoft.com/en-us/quantum/user-guide/machines/toffoli-simulator">Toffoli simulator</a> to run reversible classical programs.
We can write our own sum-over-paths simulator following <a href="https://docs.microsoft.com/en-us/samples/browse/?languages=qsharp&terms=simulator">the QDK simulator code samples</a>.
Unfortunately I ran out of time to actually write the simulator before this post’s publish date, but it seems like a nice Christmas holiday project, so check back on this section in the next couple of weeks!
You can follow along with development at my repo <a href="https://github.com/ahelwer/quantum-experiments/tree/master/SumOverPaths">here</a>.</p>
<h2 id="what-about-the-picture">What about the picture?</h2>
<p>Only one mystery remains.
What’s the deal with this diagram?
How do we semantically map our sum-over-paths algorithm to its tree structure?</p>
<figure><img src="https://ahelwer.ca/img/sum-over-paths/interference.gif"/>
</figure>
<p>I actually had a chance to <a href="https://news.ycombinator.com/item?id=17427119">ask Scott Aaronson about this directly</a> in a HN AMA about two-and-a-half years ago, to give you an idea of how long this topic has been stewing in the back of my head.
I didn’t really understand the answer; this diagram sort of works the exact opposite of how I understand sum-over-paths, because it branches out from the initial state instead of all the paths converging at the final state.
I think we can probably rewrite our recursive algorithm as a bottom-up iterative algorithm based on depth-first search, in which case the diagram makes a bit more sense.
If you think you can explain this, I have an <a href="https://quantumcomputing.stackexchange.com/q/15054/4153">as-yet unanswered question on QCSE</a> for this very topic.
Any help would be much appreciated!</p>
<h2 id="further-reading">Further reading</h2>
<p>If you enjoyed this content, I have two past posts on quantum computing that I think are pretty good:</p>
<ul>
<li><a href="https://ahelwer.ca/post/2018-12-07-chsh/">On entanglement and the CHSH game</a></li>
<li><a href="https://ahelwer.ca/post/2019-12-21-quantum-chemistry/">On simulating physical reality with a quantum computer</a></li>
</ul>
<p>If you don’t yet understand quantum computing but somehow made it to this section of the post, I made a lecture aimed at computer scientists that I wish I’d had access to when struggling through the material initially (slides: <a href="https://ahelwer.ca/files/qc-for-cs.pdf">pdf</a>, <a href="https://ahelwer.ca/files/qc-for-cs.pptx">pptx</a>):</p>
<figure><a href="https://youtu.be/F_Riqjdh2oM" target="_blank"><img src="https://ahelwer.ca/img/common/quantum-video-preview.png"/></a>
</figure>
<p>The <a href="https://docs.microsoft.com/en-us/quantum/tutorials/intro-to-katas">Quantum Katas</a> are also a very neat approach to interactively learning quantum computing by solving short Q# problems in a series of Jupyter notebooks.</p>
<p>And of course, I recommend <a href="https://www.cambridge.org/core/books/quantum-computing-since-democritus/197A4CD13738E10AAD787DBB78D8E92C">Quantum Computing Since Democritus</a> - the book that kicked this whole thing off.
Thanks for reading!</p>
<p><em>This blog post is part of the <a href="https://devblogs.microsoft.com/qsharp/q-advent-calendar-2020/">2020 Q# Advent Calendar</a>.</em></p>
How do you reason about a probabilistic distributed system?
https://ahelwer.ca/post/2020-04-15-probabilistic-distsys/
Fri, 11 Sep 2020
<h2 id="in-which-i-am-stunted-upon-by-coin-flips">In which I am stunted upon by coin flips</h2>
<p>Wasn’t too long ago that I felt pretty good about my knowledge of distributed systems.
All someone <em>really</em> needed in order to understand them, I thought, was a <a href="https://www.youtube.com/watch?v=JEpsBg0AO6o">thorough understanding of the paxos protocol</a> and a willingness to reshape your brain in the image of TLA+.
Maybe add a dash of conflict-free replicated datatypes, just so you know what “eventual consistency” means.
Past that it’s just some optimizations and mashups which come easily to your TLA+-addled brain.</p>
<p>This belief proved surprisingly robust over a number of years, even surviving an aborted attempt at analyzing the <a href="https://github.com/ahelwer/tla-experiments/blob/master/Nano.tla">Nano cryptocurrency</a>.
It was only after encountering <a href="https://muratbuffalo.blogspot.com/2018/06/snowflake-to-avalanche-novel-metastable.html">the snowflake family of consensus protocols</a> that I realized my theory just wasn’t up to the challenge.
The issue was <em>probability</em>: snowflake protocols reach consensus by iteratively polling sets of other nodes at random, and the argument that consensus is eventually reached is a statistical argument deriving an upper bound on the probability of failure.</p>
<p>I didn’t <em>dislike</em> probability & statistics, I just tried to keep my distance as much as possible.
All the algorithms in distributed systems I’d encountered so far involved <em>nondeterminism</em>, sure, but not probability.
I’d assumed nondeterminism was just a more flexible way of reasoning about probability.
This idea of mine would prove to be a source of great unnecessary confusion as I learned the art of reasoning about probabilistic distributed systems, so I’ll do you a favor and give you the core lesson of this entire post in one sentence:</p>
<p><strong>You cannot model probability with nondeterminism, and you cannot model nondeterminism with probability.</strong></p>
<h2 id="models-theyre-good-folks">Models: they’re good, folks!</h2>
<p>Have you ever been writing some multithreaded code, happily plugging in a mutex here, a semaphore there, or even just using some nice message-passing primitives to make your threads all get along?
Maybe you’ll be familiar, then, with what often comes next.
A scratch at the back of your mind, a thought - <em>“oh, wait…"</em> - as you realize something weird will happen if thread \(A\) manages to reach some step before thread \(B\) has finished its assigned task.
No worries! Slap on another WaitHandle, problem solved.
Except the problem wasn’t solved. Not really.
You consider it a bit more - what if thread \(C\) comes in with a message at this inopportune time?
You realize with dawning horror you’re actually tracing cracks in the foundation.
Patch them with mutexes! Semaphores! Anything!
Alas, you are beyond help. It’s around this time that your brain, catching a glimpse of the infinite plane of combinatorial state explosion, wisely ducks its head back down for the day and leaves you with a woozy, fuzzy, clenching feeling for having the gall to ask it to fix all this.</p>
<p>I’ve felt like this many times, and formal models are the only cure I’ve ever found.
Your brain isn’t built to hold massive state spaces in its working memory, so don’t even try.
Let a model checking program churn through all those states to find the bugs.
At this point I won’t even touch a multithreaded program or distributed system without whipping up a quick TLA+ spec of its desired workings.
I just specify all the possible events in the system, how those events affect the system state, what things I always want to remain true (the invariants), then let the model checker rip.
In TLA+, we model concurrency with nondeterminism; in a concurrent system, we have no idea whether thread \(A\) will execute a step before thread \(B\).
We can represent this with a nondeterministic state machine as follows:</p>
<figure><img src="https://ahelwer.ca/img/probabilistic-distsys/nondeterministic.svg" width="10000"/>
</figure>
<p>So you’ll be in state \(s_3\) if thread \(A\) executes its step before thread \(B\), and state \(s_4\) if thread \(B\) executes its step before thread \(A\).
Maybe \(s_3\) and \(s_4\) are even the same state, who knows.
The model checker will explore both of these possible execution orders, and <strong>in a well-designed concurrent system we should <em>never</em> end up in a bad state just because of a certain order of execution</strong>.</p>
<p>Readers might wonder how exactly this models concurrency, where steps can happen uh, concurrently.
The short answer is you have to ensure all the steps in your model are atomic or independent: either impossible in the real world for two of your steps to happen at the exact same time (for example, by assuming use of a lower-level hardware synchronization primitive) or impossible for execution of one step to directly affect the same variables as another step (for example, if the steps are executed on different computers within a timespan less than the network latency between them).
If the steps in your model satisfy this requirement, checking all possible execution orders accurately models concurrency.
If they don’t, you need to break the steps down further so they do.
This model nicely captures & exposes all that is difficult about concurrency.</p>
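<p>To make that concrete, here’s a toy exhaustive exploration in Python (a hypothetical sketch of mine, not TLA+, which does this properly): two threads each read a shared counter and write back read-value-plus-one, and enumerating every interleaving of those atomic steps exposes the classic lost-update bad state.</p>

```python
def run(schedule):
    """Execute one interleaving of the threads' atomic steps."""
    counter = 0
    local = {"A": None, "B": None}
    for thread, step in schedule:
        if step == "read":
            local[thread] = counter            # atomic read
        else:
            counter = local[thread] + 1        # atomic write
    return counter

def interleavings(xs, ys):
    """Yield all orderings that preserve each thread's own order."""
    if not xs:
        yield ys
        return
    if not ys:
        yield xs
        return
    for tail in interleavings(xs[1:], ys):
        yield [xs[0]] + tail
    for tail in interleavings(xs, ys[1:]):
        yield [ys[0]] + tail

steps_a = [("A", "read"), ("A", "write")]
steps_b = [("B", "read"), ("B", "write")]
results = {run(s) for s in interleavings(steps_a, steps_b)}
assert results == {1, 2}  # the 1 is the lost-update bad state
```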
<p>What questions can we ask about this sort of model?
The most important questions are <em>reachability queries</em> - can we reach a <em>bad state</em> (two caches disagreeing on a value, deadlock, dogs & cats living together, etc.) from the starting state?
These questions are called <em>safety properties</em>, and if they are answered in the negative then the system is safe.
Another type of query is something like “are we always guaranteed to eventually end up in a good state?”
These are called <em>liveness properties</em>.
Turns out these two types of questions can get you pretty far in concurrent & distributed systems.
Definitely far enough to make a whole career out of writing rock-solid software in places others would falter.
However, these questions also have a drawback: their answers are absolute.
True or false.
No probability involved, no room for nuance.</p>
<p>What if one of the threads flips a coin, and if it’s heads it does one thing, tails another?
Entire state spaces, bifurcated by a probabilistic event.
Maybe those state spaces contain further coin flips, or other types of randomness.
In this system your questions might change from the form “is it possible to reach a bad state” to “what is the probability of reaching a bad state?”
Unfortunately these types of questions just cannot be answered within the nondeterministic model used above.
<strong>You cannot model probability with nondeterminism.</strong>
We must use a new type of model, a state machine that handles probability directly.</p>
<h2 id="leaving-the-beautiful-pure-discrete-realm">Leaving the beautiful pure discrete realm</h2>
<p>TLA+ can’t handle probability at this time, so we’d have to use a specialized modeling language like <a href="http://www.prismmodelchecker.org/">PRISM</a> which handles probabilistic state machines.
Let’s look at the standard hello-world example for probabilistic state machines: the <a href="http://www.prismmodelchecker.org/bibitem.php?key=KY76">1976 Knuth-Yao method</a> for simulating a fair six-sided die with a series of coin flips.
This is really quite a neat problem and I encourage you to ponder it for a second before seeing how they did it!
Any sequence of \(n\) coin flips will give you an event which has probability \(\frac{1}{2^n}\) of occurring.
Simulating a fair six-sided die requires generating an event with probability \(\frac{1}{6}\) of occurring.
You might then reason this problem is impossible, because you cannot evenly divide \(2^n\) by \(6\) for any \(n\) (this follows from the uniqueness of prime factorization).
Indeed, there is no way to simulate a six-sided die with a finite number of coin flips.
We have to use an algorithm that isn’t guaranteed to terminate, though the probability it runs forever is vanishingly small.
Here it is:</p>
<figure><img src="https://ahelwer.ca/img/probabilistic-distsys/knuth-yao.svg" width="10000"/>
</figure>
<p>You can see that if you somehow only flip heads, or only flip tails, you’ll never reach one of the accepting states (here labeled with the die number they represent).
There are some fun ways to contextualize the probabilities of you only flipping heads or tails a certain number of times in a row.
For example, there are only <a href="https://www.popularmechanics.com/space/a27259/how-many-particles-are-in-the-entire-universe/">around \(2^{268}\) subatomic particles in the observable universe</a>; if you manage to flip heads 268 times in a row, that’s the same as picking the correct subatomic particle out of a universe-wide random draw.
Maybe go look at the <a href="https://en.wikipedia.org/wiki/Hubble_Ultra-Deep_Field">Hubble Ultra-Deep Field</a> as you ponder this probability.
Another way: assuming you’re between the ages of 25 and 34 and live in the USA, your annual all-cause mortality rate is <a href="https://www.cdc.gov/nchs/products/databriefs/db355.htm">about 129/100,000</a>.
Assuming deaths are uniformly distributed throughout the year, this means your chances of dying today are about 1 in 283,000.
This is just 18-19 all-heads or all-tails coin flips in a row.
What I’m saying is that you really, really shouldn’t worry about having to flip the coin very many times.</p>
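<p>If you’d rather poke at this empirically, here’s a quick Monte Carlo sketch. The transition table below is the PRISM tutorial’s encoding of the Knuth-Yao machine (an assumption on my part; the labeling may differ cosmetically from the figure above):</p>

```python
import random

# state -> (heads successor, tails successor); "1".."6" are the
# absorbing die outcomes (PRISM tutorial encoding).
STEPS = {
    "s0": ("s1", "s2"),
    "s1": ("s3", "s4"),
    "s2": ("s5", "s6"),
    "s3": ("s1", "1"),
    "s4": ("2", "3"),
    "s5": ("4", "5"),
    "s6": ("s2", "6"),
}

def roll(rng):
    """Flip fair coins until we fall into an absorbing state."""
    state = "s0"
    while state in STEPS:
        heads, tails = STEPS[state]
        state = heads if rng.random() < 0.5 else tails
    return int(state)

rng = random.Random(42)
counts = [0] * 7
for _ in range(60_000):
    counts[roll(rng)] += 1
# Each outcome should land near 10,000 if the die is fair.
assert all(abs(c - 10_000) < 500 for c in counts[1:])
```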
<p>This probabilistic state machine model we’ve created is called a <em>Discrete-Time Markov Chain</em>, or DTMC.
In DTMCs, every transition has an associated probability and the probabilities of all out-flowing transitions must sum to one for every state (accepting states can be thought to have a loopback with probability 1).
The above rumination on termination probabilities is summed up in <em>the long run theorem</em>: in the long run, every path in a finite Markov chain ends in an absorbing state, which is a state (or group of states) from which there is an entrance but no exit.
What questions can we ask of DTMCs?
The most interesting one - the reason why we’re here - is “what is the probability of eventually reaching a certain state?”
The long run theorem tells us we have a 100% chance of eventually reaching <em>one</em> of the Knuth-Yao state machine’s accepting states.
What about the probability of ending up in a specific accepting state?
It should be \(\frac{1}{6}\). Is it?</p>
<p>Let’s try to reason this out with basic probability.
What are the chances of ending up in accepting state \(1\)?
Well, you can get there by flipping \(HHT\).
The probability of that happening is \(\frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{8} \).
But you can also get there by flipping \(HHHHT\).
The probability of <em>that</em> happening is \(\frac{1}{2^5} = \frac{1}{32} \).
We have to add this to the first probability, so now our probability is \(\frac{1}{8} + \frac{1}{32} = \frac{5}{32}\).
But we can <em>also</em> get there by flipping \(HHHHHHT\), with probability \(\frac{1}{2^7} = \frac{1}{128}\).
I’m sure you can see where this is going.
We’re dealing with something truly horrific, an infinite sum of infinite products.
If we repeat this process a few more times we can see it numerically converging to \(\frac{1}{6}\), or \(0.16666\ldots\), but how do we get a nice closed-form solution?
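(One consolation: for this particular accepting state, the paths happen to form a plain geometric series with ratio \(\frac{1}{4}\), since each extra trip around the loop costs two more flips, so a closed form does fall out directly:
$$
\frac{1}{8} + \frac{1}{32} + \frac{1}{128} + \cdots = \sum_{k=0}^{\infty} \frac{1}{8}\left(\frac{1}{4}\right)^k = \frac{1/8}{1 - 1/4} = \frac{1}{6}
$$
The general case isn’t this tidy.)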
To avoid sending my readers through a math-heavy meatgrinder from which few would emerge, I’ve pushed the algorithm’s explanation to this post’s appendix (or you can just <a href="http://www.prismmodelchecker.org/tutorial/die.php">get PRISM to calculate it</a> and never think about this again).
For now just know you can write the DTMC as a matrix and the problem reduces to solving a simple system of linear equations.
The Knuth-Yao state machine indeed accurately simulates a fair six-sided die.</p>
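<p>As a quick sanity check of that convergence (a Python sketch, separate from the PRISM workflow): each extra trip around the loop multiplies a path’s probability by \(\frac{1}{4}\), so the terms form a geometric series summing to \(\frac{1/8}{1 - 1/4} = \frac{1}{6}\).</p>

```python
from fractions import Fraction

# Paths reaching accepting state 1: HHT, HHHHT, HHHHHHT, ...
# Each extra loop multiplies the path probability by 1/4, giving the
# geometric series 1/8 + 1/32 + 1/128 + ... = (1/8) / (1 - 1/4) = 1/6.
total = Fraction(0)
for k in range(20):  # k = number of extra trips around the loop
    total += Fraction(1, 8) * Fraction(1, 4) ** k

print(float(total))  # numerically converges to 1/6 = 0.1666...
```

<p>Twenty terms already agree with \(\frac{1}{6}\) to about twelve decimal places, but this only works because the sum happens to be geometric; the linear-equation method in the appendix handles arbitrary DTMCs.</p>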
<p>Okay, so now we can model probabilistic systems.
Are we done? Sadly no.
Remember the other half of this post’s lesson: <strong>you cannot model nondeterminism with probability</strong>.
Let’s go back to our concurrent system where either thread \(A\) or thread \(B\) can take a step; we don’t know which will execute first.
How do we model this in a DTMC?
“Easy!” you might say. “Each thread has a \(\frac{1}{2}\) chance of going first, or a \(\frac{1}{n}\) chance if there are \(n\) threads in the system. Plug in these probabilities and fire up the model checker!”
Bzzt. Wrong. This is probably the most conceptually difficult part of this post.
Basically, by modeling your threads in this way, you are making an <em>assumption about the thread scheduler</em>.
Your assumptions are the foundation of your model; they <em>must</em> accurately correspond to the system you’re reasoning about.
If they don’t, your model is useless.
It’ll be able to generate a lot of nice-looking numbers that hold absolutely no relation to reality.
In this case we can’t assume, in general, that the thread scheduler will assign each thread processor time with uniform probability.
That assumption makes even less sense in a fully distributed system with processes running on separate computers connected by a network.</p>
<p>So what can we assume about the scheduler?
Well, nothing - that’s why we need nondeterminism.
It enables us to explore what happens under <em>every</em> possible scheduling system.
So how do we model that in a DTMC?
We can’t. We need something new. Something that combines probability with the power of nondeterminism.</p>
<h2 id="an-automata-to-surpass-dtmcs">An automaton to surpass DTMCs</h2>
<p>Meet the <em>Markov Decision Process</em> (MDP).
It’s a prickly entity, prone to sucking up your mind’s comprehension ability as you muddle through treatises on probabilistic temporal logic, and your computer’s memory as the model checker explores its depths.
At first glance, it’s literally just a DTMC with nondeterministic steps.
The tricky part is how that changes what questions you can ask about the model.</p>
<p>Let’s think of a very simple system which uses both probability and nondeterminism.
We have two threads, \(A\) and \(B\), which are scheduled nondeterministically.
We also have two coins, an unfair one \(U\) with a \(\frac{3}{4}\) chance of landing on heads and a \(\frac{1}{4}\) chance of landing on tails, and a fair one \(F\) with a 50/50 chance of landing on heads or tails.
Whichever thread goes first grabs the unfair coin \(U\) and gives it a flip.
The thread going second grabs the remaining coin \(F\) and gives it a flip itself.
Here’s how this looks as an MDP; we label states with \(A_x B_y\), where \(x\) and \(y\) are one of \(\_\), \(H\), or \(T\) to represent not-yet-flipped, heads, or tails respectively for threads \(A\) and \(B\):</p>
<figure><img src="https://ahelwer.ca/img/probabilistic-distsys/mdp.svg" width="10000"/>
</figure>
<p>Each MDP state transition has two components: a label, so the nondeterministic scheduler can pick (“decide” to take) that step, and one or more probabilistic forks branching off of it, whose probabilities must sum to 1.
We can see how this is a hybrid of DTMCs and nondeterministic state machines: if each transition only has a single fork with probability 1, the MDP reduces to our basic nondeterministic state machine; if each state only has a single outgoing transition with some associated probabilities, the MDP reduces to a DTMC.</p>
<p>Here’s how our MDP looks in PRISM code:</p>
<pre><code>mdp
module StrangeCoinGame
// Not-yet-flipped: 0, heads: 1, tails: 2
AFlip : [0 .. 2] init 0;
BFlip : [0 .. 2] init 0;
// Choose one of the threads to go first and flip the unfair coin
[] AFlip = 0 & BFlip = 0 -> 0.75 : (AFlip' = 1) + 0.25 : (AFlip' = 2);
[] AFlip = 0 & BFlip = 0 -> 0.75 : (BFlip' = 1) + 0.25 : (BFlip' = 2);
// The second thread flips the fair coin
[] AFlip != 0 & BFlip = 0 -> 0.5 : (BFlip' = 1) + 0.5 : (BFlip' = 2);
[] AFlip = 0 & BFlip != 0 -> 0.5 : (AFlip' = 1) + 0.5 : (AFlip' = 2);
// Loopback in accepting states
[] AFlip != 0 & BFlip != 0 -> (AFlip' = AFlip) & (BFlip' = BFlip);
endmodule
</code></pre><p>Now for the grand finale!
What questions can we ask of this model?
Let’s start with something simple, like “what is the probability of reaching state \(A_H B_T\)?”
I encourage readers to mull this one over for a bit.
If \(A\) goes before \(B\), the probability of reaching state \(A_H B_T\) is \(\frac{3}{4} \cdot \frac{1}{2} = \frac{3}{8}\).
However, if \(B\) goes before \(A\), the probability of reaching state \(A_H B_T\) is \(\frac{1}{4} \cdot \frac{1}{2} = \frac{1}{8}\).
Remember we don’t know/assume anything about the probability of \(A\) going before \(B\) or vice-versa.
How, then, are we supposed to answer the question “what is the probability of reaching state \(A_H B_T\)?”
We can’t!
It’s an invalid question!
<strong>The only questions we can ask about MDPs are questions about the maximum or minimum probabilities of reaching a state, across all possible execution orders.</strong>
This makes more sense when you consider how MDPs are model checked, which is by generating a DTMC for every possible order of execution (expensive!), then finding the reachability probability in each of those DTMCs and taking the global max or min.
So, the claims about your system become “it will reach a bad state with at most X% probability” or “it always has at least a Y% probability of success”.</p>
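<p>Here’s how those two numbers fall out, as a small Python sketch that enumerates the two scheduler choices by hand (the model checker does this for us on systems far too large to enumerate manually):</p>

```python
from fractions import Fraction

# P(heads) for the unfair coin U and the fair coin F
U_HEADS = Fraction(3, 4)
F_HEADS = Fraction(1, 2)

# Probability of reaching state A_H B_T under each scheduler decision;
# whichever thread goes first flips U, and the other flips F.
p_a_first = U_HEADS * (1 - F_HEADS)  # A heads on U, B tails on F: 3/8
p_b_first = F_HEADS * (1 - U_HEADS)  # A heads on F, B tails on U: 1/8

# The only valid questions: max/min probability across all schedules.
print(max(p_a_first, p_b_first))  # 3/8
print(min(p_a_first, p_b_first))  # 1/8
```

<p>So the honest claim about this system is “state \(A_H B_T\) is reached with probability at least \(\frac{1}{8}\) and at most \(\frac{3}{8}\)”, with no single number in between.</p>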
<p>Working with distributed systems requires a catastrophic mindset.
If there’s an ordering of events that could cause your system to fail, you must assume it will happen and evaluate your design within that regime (at scale, it’s a certainty the behavior will occur sooner or later).
So when dealing with probability, the ordering of events that gives the highest chance of failure <em>is</em> your chance of failure.
And that’s how you reason about a probabilistic distributed system.</p>
<h2 id="melting-the-snow">Melting the snow</h2>
<p>Actually using MDPs to analyze the Snowflake protocols is deserving of its own post, since this one is getting quite long and I still have to write the above-promised appendix.
Instead I’ll just throw a bunch of links at you.
<a href="https://ipfs.io/ipfs/QmUy4jh5mGNZvLkjies1RWM4YuvJh5o2FYopNPVYwrRVGV">Here</a> is the original paper presenting the Snowflake family of protocols, posted pseudonymously on IPFS by “Team Rocket” (almost certainly Emin Gün Sirer et al., let’s be real); it is being productized by a company called <a href="https://www.avalabs.org/">Ava Labs</a>.
<a href="https://muratbuffalo.blogspot.com/2018/06/snowflake-to-avalanche-novel-metastable.html">Here</a> is a good writeup & summary of the protocols by Murat Demirbas, a professor who researches distributed systems at SUNY Buffalo.</p>
<p><a href="https://sarahjamielewis.com/">Sarah Jamie Lewis</a> is working on analyzing the Snowflake protocols with MDPs and also a type of model we didn’t cover called Continuous-Time Markov Chains (CTMCs) - perhaps CTMCs will be the subject of another post.
She’s developing an interesting attack called Snowfall using Byzantine response delays, detailed <a href="https://git.openprivacy.ca/sarah/formal-verification/raw/branch/master/snowfall.pdf">here</a>.
Her formal models can all be found in <a href="https://git.openprivacy.ca/sarah/formal-verification">this git repo</a>.</p>
<p>For myself I’ve modeled the most basic Snowflake protocol (called Slush) as an MDP in PRISM <a href="https://github.com/ahelwer/avalanche-analysis/blob/master/slush/slush.prism">here</a>.
Look forward to a future post on what I learned - model checking MDPs is very expensive and the model is difficult to scale!
CTMCs are apparently more scalable than MDPs, although I still don’t understand them very well, so what they lose in model fidelity is unknown to me.</p>
<p>Finally, if you’re interested in the math & algorithms behind DTMCs, I can’t recommend this paper enough: <a href="http://i-cav.org/2015/wp-content/uploads/2015/07/mod12_katoen.pdf"><em>Model Checking Meets Probability: A Gentle Introduction</em></a> by Joost-Pieter Katoen, a professor at RWTH Aachen University.
Without this paper I would never have been able to understand this material & write this post.
Alternatively if you’re in search of an enormous tome containing all that is known about formal models and the checking thereof, see the <a href="https://link.springer.com/book/10.1007/978-3-319-10575-8">Handbook of Model Checking</a>, just published in 2018.</p>
<h2 id="corrections">Corrections</h2>
<p><a href="https://twitter.com/pressron">Ron Pressler</a> correctly points out <a href="https://old.reddit.com/r/tlaplus/comments/j06ohw/how_do_you_reason_about_a_probabilistic/g6owlxy/?utm_source=reddit&utm_medium=web2x&context=3">here</a> that it isn’t nondeterminism per se which fails at modeling probability, but rather our inability to express properties over the domain of all system behaviors (beyond \(\forall\) and \(\exists\)).
If we could write a TLA+ function that sums the value (or takes the max/min) of variables across every single possible system behavior, we could reason usefully about probability in TLA+.
Unfortunately we cannot do that at this time.</p>
<h2 id="appendix-calculating-reachability-probabilities">Appendix: calculating reachability probabilities</h2>
<p>What follows will be a condensed & simplified version of the algorithm presented in the paper <a href="http://i-cav.org/2015/wp-content/uploads/2015/07/mod12_katoen.pdf"><em>Model Checking Meets Probability: A Gentle Introduction</em></a>.
Recall our Knuth-Yao DTMC:</p>
<figure><img src="https://ahelwer.ca/img/probabilistic-distsys/knuth-yao.svg" width="10000"/>
</figure>
<p>Let’s try to calculate the probability of reaching accepting state \(2\) from the starting state.
We use a simple recursive algorithm to convert this into an easily-solved system of linear equations.
First, some definitions:</p>
<ul>
<li>\(P(s, t)\) is the probability associated with the transition between state \(s\) and \(t\)</li>
<li>\(G\) is the set of goal states for which we want to calculate the reachability probability</li>
<li>\(x_s\) is the probability of reaching \(G\) from a specific state \(s\)</li>
</ul>
<p>Our objective is to find \(x_s\) where \(s\) is the start state and \(G\) is the set \(\{2\}\) containing only the accepting state \(2\), but in order to do that we have to find \(x_s\) for every state \(s\) in the DTMC.
We do it with three simple rules:</p>
<ol>
<li>Base case 1: if \(s \in G\), then \(x_s = 1\)</li>
<li>Base case 2: if \(G\) is not reachable from \(s\), then \(x_s = 0\)</li>
<li>Recursive case: otherwise, \(x_s = \sum_{t \notin G} P(s, t) \cdot x_t + \sum_{u \in G} P(s, u)\)</li>
</ol>
<p>The equation in the recursive case looks fairly horrific, but worry not - we’ll get there.
For the base cases it’s trivial to mark all the states in \(G\) as 1, and we can run a breadth-first search backwards from the states in \(G\) to find all the states from which \(G\) is reachable and mark the others as 0.
For the recursive case it’s easiest to think about splitting this into two sub-cases: for a given \(s\), the first \(\sum_{t \notin G}\) is the probability of reaching \(G\) in a roundabout way by going through some intermediate state(s) (this is recursive since it depends on the probability of reaching \(G\) from those states).
The second \(\sum_{u \in G}\) is the probability of reaching \(G\) directly, in a single step.
Add them together and you get \(x_s\).</p>
<p>Let’s apply this algorithm to our example; here are the base cases:</p>
<ol>
<li>\(x_2 = 1\)</li>
<li>\(x_{s_2} = x_{s_5} = x_{s_6} = x_1 = x_3 = x_4 = x_5 = x_6 = 0\)</li>
</ol>
<p>For \(x_{s_0}, x_{s_1}, x_{s_3},\) and \(x_{s_4}\) we have:</p>
<ul>
<li>\(x_{s_0} = \frac{1}{2} x_{s_1} + \frac{1}{2} x_{s_2}\)</li>
<li>\(x_{s_1} = \frac{1}{2} x_{s_3} + \frac{1}{2} x_{s_4}\)</li>
<li>\(x_{s_3} = \frac{1}{2} x_{s_1} + \frac{1}{2} x_{1}\)</li>
<li>\(x_{s_4} = \frac{1}{2} x_3 + \frac{1}{2} \)</li>
</ul>
<p>This is a system of linear equations!
Using Gaussian elimination to solve for \(x_{s_0}\), we see that it indeed equals \(\frac{1}{6}\).
The above paper explains how to mechanically translate this into a matrix that can be solved with a quick call to a linear algebra library, but I hope you see how this algorithm works conceptually.</p>
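<p>To make that concrete, here’s a minimal Python sketch that writes the four equations above as a matrix \((I - P)x = b\) and solves it with textbook Gauss-Jordan elimination over exact fractions (in practice you’d hand the matrix to a linear algebra routine like <code>numpy.linalg.solve</code> instead):</p>

```python
from fractions import Fraction as F

# Unknowns, in order: x_{s0}, x_{s1}, x_{s3}, x_{s4}.
# Moving every unknown to the left-hand side gives (I - P)x = b:
#   x_{s0} - 1/2 x_{s1}                       = 0
#            x_{s1} - 1/2 x_{s3} - 1/2 x_{s4} = 0
#          - 1/2 x_{s1} + x_{s3}              = 0
#                                      x_{s4} = 1/2
A = [
    [F(1), F(-1, 2), F(0),     F(0)],
    [F(0), F(1),     F(-1, 2), F(-1, 2)],
    [F(0), F(-1, 2), F(1),     F(0)],
    [F(0), F(0),     F(0),     F(1)],
]
b = [F(0), F(0), F(0), F(1, 2)]

# Gauss-Jordan elimination with exact rational arithmetic.
n = len(A)
M = [row + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
for col in range(n):
    # Find a row with a nonzero pivot in this column and swap it up.
    pivot = next(r for r in range(col, n) if M[r][col] != 0)
    M[col], M[pivot] = M[pivot], M[col]
    # Zero out this column in every other row.
    for r in range(n):
        if r != col and M[r][col] != 0:
            factor = M[r][col] / M[col][col]
            M[r] = [a - factor * p for a, p in zip(M[r], M[col])]

x = [M[r][n] / M[r][r] for r in range(n)]
print(x[0])  # x_{s0} = 1/6
```

<p>The solver also gives the intermediate probabilities \(x_{s_1} = \frac{1}{3}\), \(x_{s_3} = \frac{1}{6}\), and \(x_{s_4} = \frac{1}{2}\), matching what you’d get by substitution.</p>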
Meditation
https://ahelwer.ca/post/2020-08-29-meditation/
Sat, 29 Aug 2020 00:00:00 +0000https://ahelwer.ca/post/2020-08-29-meditation/<p>This isn’t going to be a post about how adopting a several-thousand-year-old practice <a href="https://www.independent.co.uk/life-style/health-and-families/healthy-living/mindfulness-sells-buddhist-meditation-teachings-neoliberalism-attention-economy-a8225676.html">can make you a better servant of capital</a>.
Instead, let’s talk about when I feel the lowest of the low.
It comes after spending any number of hours on my computer, maybe even a full day, endlessly circling around different websites in search of stimulation, the quick jolt that comes with learning an interesting fact or watching a funny short video or seeing someone get dunked on for having a bad political opinion.
I’ll eventually wake into a state that could be identified as conscious thought, look back on the many (genuinely exciting!) things I wanted to learn or do that day, and contrast it with what I actually did.
What did I actually do? I honestly don’t even know. I doubt I could list even three things I’d read or experienced the entire preceding eight hours.
At this moment I feel very bad. I know one of the ~30,000 days in my life is gone.
And I didn’t get anything out of it.
Actually, I’m worse off, because spending your time in this way just begets even more time spent in this way.</p>
<p>I don’t have any social media apps on my phone.
I deleted my twitter account.
I deleted my reddit account.
I don’t have an instagram account.
My facebook account exists only to be part of some groups.
I use LeechBlock.
I redirect websites to 0.0.0.0 in my hosts file.
Everyone I know does most of these things.
Everyone I know still has trouble spending their time how they actually want to spend it.
If I were to write down a list of the things I find most important in life, endlessly circling around a handful of websites would not even be in consideration.</p>
<p>If I get stuck in the loop for more than a week or so my brain starts to eat itself.
There’s a visceral gnawing feeling.
My thoughts dwell on how to defeat this thing which is deleting hours from my life.</p>
<p>So far the only thing which has helped is mindfulness meditation.
After sitting for 10 or 20 minutes and focusing on my breath, my mind becomes quiet.
Traffic on the <a href="https://en.wikipedia.org/wiki/Default_mode_network">default mode network</a> dies down a little bit.
I’m no longer instantly pulled into the online vortex upon encountering the slightest speedbump in my work.
Instead my mind just… stays quiet, and I keep the problem in my head.
This doesn’t last forever.
I could probably meditate for 10 minutes out of every hour and still have spent more time on the things I actually wanted to spend time on, at the end of the day.</p>
<p>Deciding to meditate is hard.
The idea of spending ten minutes intentionally doing nothing runs against a lot of messaging I’ve internalized.
It’s also very difficult to keep my attention on my breath instead of running away to consider whatever figment wanders through my thoughts.
There’s an interesting metaphor I’ve heard.
Your mind is like a train station.
Thoughts think themselves - they are trains pulling into the station.
A train of thought will shortly pull out of the station, but you don’t have to be on it.
Or maybe you just boarded it without thinking, then don’t realize what you’ve done until you’re a mile down the track.
No worries; you can always return to the station.</p>
<p>I decided to try mindfulness meditation after listening to the audio version of Robert Wright’s book <a href="http://whybuddhismistrue.net/">Why Buddhism Is True</a> while on a cross-country drive.
It’s about how many beliefs in Buddhism - specifically Western Atheistic Buddhism - have support from modern psychology & neuroscience.
I wasn’t motivated to start meditating by thinking it would help me in this struggle for my attention; the discovery was quite accidental.
I use the <a href="https://www.wakingup.com">Waking Up</a> app.
Friends of mine also report good experiences with <a href="https://www.headspace.com/">Headspace</a>.</p>
<p>It’s troubling I have to go to such lengths to live my life the way I think I want to live it.
Maybe this struggle is uniquely modern, maybe it’s one for the ages.
I still have issues sometimes, especially before bed when my defenses are lower and I can get sucked into the loop instead of reading a good book or just going to sleep.
Still, for the first time in my life I’ve found an actually effective tool for resisting the attention economy.</p>
<h2 id="further-reading--watching">Further Reading & Watching</h2>
<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
<iframe src="https://www.youtube-nocookie.com/embed/wf2VxeIm1no" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" allowfullscreen title="YouTube Video"></iframe>
</div>
Taking my home work setup seriously
https://ahelwer.ca/post/2020-08-09-home-ergonomics/
Fri, 14 Aug 2020 00:00:00 +0000https://ahelwer.ca/post/2020-08-09-home-ergonomics/<p>The headlines don’t lie.
<a href="https://www.crn.com/news/mobility/microsoft-extends-work-from-home-option-through-january">Microsoft</a>, <a href="https://www.cnn.com/2020/07/27/tech/google-work-from-home-extension/index.html">Google</a>, <a href="https://www.theverge.com/2020/7/15/21326017/amazon-work-from-home-extend-january-2021-corporate">Amazon</a>, <a href="https://www.cnbc.com/2020/08/06/facebook-will-allow-employees-to-work-remotely-until-july-2021.html">Facebook</a>, and a whole host of other tech companies have announced employees will be working from home until early-mid 2021.
There are reasons to believe this will be pushed back; even if the <a href="https://blogs.sciencemag.org/pipeline/archives/2020/04/15/coronavirus-vaccine-prospects">staggeringly ambitious</a> timelines for vaccine development are met, <a href="https://www.cbc.ca/player/play/1772035651845">a vaccine might not be a silver bullet</a> and the pandemic could require management for the next 2-3 years.
As a software engineer in big tech’s orbit, this means it’s time to settle in for the long haul and take my home work setup seriously.</p>
<p>I’ve been very concerned with ergonomics ever since early-career wrist issues had me mousing with my non-dominant hand for six months.
One of my university friends had such terrible pain that he became a pioneer in <a href="https://livestream.com/internetsociety3/hopeconf/videos/131671656">open-source speech recognition for programming by voice</a> (although his issues didn’t turn out to be from repetitive stress injuries).
All of which is to say I haven’t used a non-ergonomic keyboard or mouse in about a decade.
Still, my home setup, while decent, fell short in a few crucial areas.
It was time to make some improvements - I’m in this career for the long run.
Join me on this somewhat self-indulgent journey!</p>
<h1 id="the-objective">The Objective</h1>
<p>My top-level aims were as follows:</p>
<ol>
<li>Create a setup where I’m happy & comfortable working for long periods of time</li>
<li>Create a setup which will not cause injury in the mid to long term</li>
</ol>
<p>To facilitate these aims, I chose these constraints:</p>
<ol>
<li>Focus on basic uncomplicated ergonomics, anchored in <em>adjustability</em></li>
<li>Prioritize <em>frugality</em> by buying used, and only buying features I definitely need</li>
<li>Avoid falling into the <em>consumerist perfection trap</em></li>
</ol>
<h3 id="ergonomics--adjustability">Ergonomics & Adjustability</h3>
<p>When it comes to ergonomics, the science isn’t complicated; here’s how you get the biggest bang for your buck:</p>
<ol>
<li>Set the relative height of your chair & desk so that you type & mouse with straight wrists (not flexed up or down)</li>
<li>Reduce wrist pronation (inward twisting) by using a split/tented keyboard and vertical mouse or trackball</li>
<li>Ensure monitors are placed high enough so you can see them without sagging your neck forward (this is a big drawback with laptops)</li>
</ol>
<p>All the rest of it - standing desks, exotic chair designs, desk treadmills, mechanical keyswitches, keyboard trays, gas spring monitor arms - are basically a huge waste of time & money if you can’t nail these three basic things.
I’m not going to spend time on good posture because that falls in the domain of flossing & working out every day: if you want to do it, you’ll do it.
Programmer’s slouch can even be accommodated to a degree:</p>
<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
<iframe src="https://www.youtube-nocookie.com/embed/LXYLVbt7j7o" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" allowfullscreen title="YouTube Video"></iframe>
</div>
<p>Regarding adjustability, we’ve all probably heard the tale of how the USAF designed its airplane cockpits for “average” body dimensions, with the end result that <a href="https://www.thestar.com/news/insight/2016/01/16/when-us-air-force-discovered-the-flaw-of-averages.html">literally nobody exactly fit those designs</a>.
This informs our approach here: prioritizing adjustability over other concerns like “quality” or whatever. Everyone reading this post will have body dimensions purely unique to themselves, and mere inches can make a difference.</p>
<h3 id="frugality">Frugality</h3>
<p>Buy used. Not complicated.
Computing equipment depreciates like milk left out in the sun, so you can snag perfectly good five-year-old equipment for 75-80% off MSRP.
Check Craigslist first (or Facebook Marketplace), then eBay.
I’ve been doing this for years and haven’t been burned once.
Single-core CPU performance <a href="https://www.cpubenchmark.net/year-on-year.html">has been flat for almost a decade now</a>; you probably don’t really need the latest & greatest to do your work.
Also, most ergonomic equipment is built like a tank and can easily stand the test of time & use.</p>
<p>The second component to frugality is much more subjective, but basically amounts to not buying what you don’t really need.
A great example here is a <a href="https://youtu.be/__K4V8pFhf4">$25 generic metal bolt monitor stand vs. $300 gas spring monitor arms</a> from Ergotron:</p>
<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
<iframe src="https://www.youtube-nocookie.com/embed/__K4V8pFhf4" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" allowfullscreen title="YouTube Video"></iframe>
</div>
<p>If you’re like me, once your monitors are in the appropriate position they’re basically never touched again.
Buying a fancy gas spring stand would be a complete waste of money.
The example makes it sound simple, but it’s really easy to forget this as you’re watching flashy product videos.
Buying stuff is fun!
Which leads into our next constraint…</p>
<h3 id="the-consumerist-perfection-trap">The Consumerist Perfection Trap</h3>
<p>This is best illustrated by a YouTube comment I found under <a href="https://www.youtube.com/watch?v=LALQsqZP1nA">a Linus Tech Tips video</a> reviewing the very expensive Ergodox split mechanical keyboard:</p>
<figure><img src="https://ahelwer.ca/img/ergonomics/ergodox-youtube-comment.PNG"/>
</figure>
<p>Humans are endlessly adaptable, for better or worse.
Before setting out, know this: you might sink days of research and thousands of dollars into your work setup, but it will never be quite perfect.
Just accept this. Live with it.
It’s called <a href="https://en.wikipedia.org/wiki/Hedonic_treadmill">hedonic adaptation</a>.
Even if your monitors end up an inch too close to your face, remember that the whole setup is <em>massively</em> better than sitting at your kitchen table or couch with a laptop.
Buying things is fun, and spending money begets spending more money.
It’s a good idea to let a setup sit for a couple weeks or a month before rushing off with further tweaks & improvements.</p>
<p>Inside us all there is a void.
People want to complete themselves and fill this void with spirituality, or hedonistic pursuits, or material things.
If you’ll indulge a metaphor, this is not a void that can be filled - its nature is more akin to a black hole of the cosmic variety.
Feeding it things - for example, expensive ergonomic equipment - will simply add to its mass and pull.
Only if left alone might it slowly evaporate.
You must learn to live with it.
Materialism is the belief that something outside yourself will finally bring you permanent satisfaction, and we don’t want to be materialistic.</p>
<p>Just remember why you’re doing this: to facilitate work and avoid injury.</p>
<h1 id="the-setup">The Setup</h1>
<p>The main event. Here’s what I ended up getting.</p>
<h3 id="desk-uplift-v2-motorized-adjustable-standing-desk">Desk: UPLIFT V2 motorized adjustable standing desk</h3>
<p>This one hurt the most: it cost $835!
I’d been spoiled with a motorized adjustable standing desk while working at Microsoft, and was loath to give it up.
I ended up springing for an uplift desk because it drives me crazy when things are wobbly, and many standing desks are allegedly quite wobbly if you’re a taller person.
I bought a <a href="https://www.upliftdesk.com/uplift-v2-standing-desk-v2-or-v2-commercial/">very basic uplift desk model</a>: 60x24" white laminate with just the memory pad and power grommet added on.
No way was I going to find something like this on the used market, sadly.
Buy once, cry once.
It came with a very nice cushy mat which really does improve the standing experience.</p>
<figure><img src="https://ahelwer.ca/img/ergonomics/uplift-desk.jpeg"/>
</figure>
<p>If you’re looking to save money here and don’t mind buying the motorized legs & desk top separately, there are allegedly <a href="https://lobste.rs/s/fvnhyd/taking_my_home_work_setup_seriously#c_sj3n6z">good deals to be had on Monoprice</a>.</p>
<h3 id="chair-haworth-zody">Chair: Haworth Zody</h3>
<p>Here’s where I got a pretty good deal.
Forget Herman Miller: their chairs still go for crazy prices used, but one can easily find a gently-used <a href="https://www.haworth.com/na/en/products/stools/zody-1.html">Haworth Zody</a> for $200 (vs. $1400 new) on Craigslist in any large-ish metropolitan area.
I also grew used to these while working at Microsoft, and it’s a truly wonderful high-quality chair - and very adjustable!
You can change seat height, seat depth, tilt tension (or lock it from leaning back), lumbar support height, and armrest position along all axes (height, side-to-side, forward/backward).</p>
<figure><img src="https://ahelwer.ca/img/ergonomics/haworth-zody.jpg"/>
</figure>
<p>If you’re buying used you should test two things.
One, put it on a level surface and extend seat height to maximum; check whether it’s developed a wobble.
Two, test whether the tilt tension control still works (adjusted with a crank on the right side).
I use it locked all the time anyway, but you might care about this feature - I’d just use its absence to secure a further discount!</p>
<p>The only other advice here is not to buy one of those <a href="https://youtu.be/x3DpVTyQOyM">ergonomic kneeling chairs</a>.
They’re cheap & compact, but if you’re like me your back will get tired and you’ll just revert to a horrible forward-slouching posture.
If you really think you’ve got what it takes, try sitting on an inexpensive yoga ball first - at least you might be able to use it for other things when the whole venture goes sideways.</p>
<h3 id="keyboard-goldtouch-gtn-0099">Keyboard: Goldtouch GTN-0099</h3>
<p>Let’s start this section with a hot take: mechanical keyswitches are wholly unnecessary for a good keyboard experience.
I had a <a href="https://www.daskeyboard.com/model-s-professional/">Das Keyboard</a> long ago (without keylabels for <em>extra hacker cred</em>), and it was nice for a bit but you quickly tune out the clickiness (although your roommates don’t).
In the end it just isn’t worth the huge extra cost & noise (yes I’ve heard of Cherry MX Silent Red switches, leave me alone you fanatics).
Mechanical keyswitches provide very dubious ergonomic benefit.
I will say it’s humorous seeing tons of love & effort dumped into custom mechanical keyboards that still use a flat slab layout like they’re some throwaway Dell keyboard from 2006.
Show some love toward your wrists, you barbarians!</p>
<p>For a number of years I used the <a href="https://www.microsoft.com/en-us/p/microsoft-sculpt-ergonomic-desktop/8xk02kz6k69w?activetab=pivot%3aoverviewtab">Microsoft Sculpt Ergonomic Keyboard</a>, which was nearly perfect except for the build quality being complete crap - I burned through two of them in four years (this seems to be a common story).
After the second one died I started using the <a href="https://www.microsoft.com/accessories/en-us/products/keyboards/natural-ergonomic-keyboard-4000/b2m-00012">Microsoft Natural Ergonomic Keyboard 4000</a>, an enormous dinosaur which can reliably be found for $20 used.
Unfortunately it has a full number pad on the right side, meaning you’re either reaching way out toward your mouse or pushing the keyboard to your left and typing with asymmetric hand positions:</p>
<figure><img src="https://ahelwer.ca/img/ergonomics/ms4k-hand-position.jpg"/>
</figure>
<p>I wanted a keyboard that was split/tented, no number pad, bombproof build quality, and less than $120 or so.
It came down to the <a href="https://kinesis-ergo.com/shop/freestyle2-for-pc-us/">Kinesis Freestyle2</a> and the <a href="https://shop.goldtouch.com/products/goldtouch-v2-adjustable-comfort-keyboard-pc-only">Goldtouch GTN-0099</a>.
In the end I chose Goldtouch because it’s much less expensive ($60 on eBay!) and looks decidedly less cheap than the Kinesis and its weird sold-separately tenting kit.
I couldn’t be happier!
I love the Goldtouch’s chunkiness & solidity.
The keyswitches are sturdy & pleasant to type on.
Also, it is very adjustable - the keyboard halves are connected by a lockable ball joint you can set to any reasonable combination of split and tent angle.</p>
<figure><img src="https://ahelwer.ca/img/ergonomics/goldtouch-keyboard.jpg"/>
</figure>
<p>It strikes me as odd that I so rarely saw mention of Goldtouch in any of the (many) discussions of ergonomic keyboards I’ve read in my time online.
Maybe it’s because of their business-centric marketing, or the website design & name giving the impression it’s run by a group of humorless physiotherapists who’ve never pondered the glory of a unicorn-vomit RGB lighting setup.
Realistically though, most online keyboard discussion is just driven by mechanical keyswitch enthusiasts who have no interest in their product line.
It should be noted that pretty much the only mechanical keyboards meeting my requirements were the <a href="https://kinesis-ergo.com/shop/freestyle-pro/">Kinesis Freestyle Pro</a> ($210 with tenting kit), the <a href="https://kinesis-ergo.com/shop/advantage2/">Kinesis Advantage2</a> ($330), and the <a href="https://ergodox-ez.com/">Ergodox EZ</a> ($350).
All of these are beautifully-designed tools, but do they really justify a price multiple of up to 5x a non-mechanical Goldtouch keyboard?
I think not.
Goldtouch makes excellent keyboards and I hope they get the love they deserve.</p>
<h3 id="mouse-logitech-mx-vertical-ergonomic-mouse">Mouse: Logitech MX Vertical Ergonomic mouse</h3>
<p>Before this I had a cheap <a href="https://www.anker.com/products/variant/anker-24g-wireless-vertical-ergonomic-optical-mouse/A7852011">$25 vertical mouse from Anker</a>, which unsurprisingly died within a year or two of purchase.
After that I bought the more expensive <a href="https://www.logitech.com/en-us/product/mx-vertical-ergonomic-mouse">$90 vertical mouse model from Logitech</a> hoping for better build quality.
It’s worked out well so far!
Mousing was the source of my wrist injuries a decade ago and they haven’t yet reoccurred.</p>
<figure><img src="https://ahelwer.ca/img/ergonomics/logitech-mouse.jpg"/>
</figure>
<p>My friend David (the aforementioned who programs via speech) recommends using a trackball instead of a vertical mouse.
I’m considering this route myself; of my meagre remaining ergonomic complaints, the most prominent is the very slight outer wrist discomfort I get from prolonged mousing.
It makes sense that rotating a ball would produce less stress than moving your entire hand/wrist/arm.
Supposedly Kensington is a good brand here.
I’d probably keep my vertical mouse around in case I want to play a video game with shooting mechanics, though.</p>
<h3 id="monitors-2x-lg-24ud58-b-24-4k-ips-monitors">Monitors: 2x LG 24UD58-B 24" 4k IPS monitors</h3>
<p>This (along with the uplift desk) is the only place I really splurged, after reading the post “<a href="https://tonsky.me/blog/monitors/"><em>Time to upgrade your monitor</em></a>” by Nikita Prokopov.
I was really captured by the idea that since 4k monitors are essentially just four 1080p monitors stuck together, using a 4k monitor with 2x scaling is like having a single really, really high quality 1080p monitor.
And it’s true!
I love my 4k monitors.
They really do make everything - text included - look very beautiful.
Maybe even in an enduring way; I’ve had them for a month now and the effect hasn’t worn off yet.
They were $300 each, which was surprisingly inexpensive because 4k has always felt like that new thing that hasn’t quite landed yet, like 3D displays.
Nope, 4k panels are a commodity now!
This also means used 1080p monitors are basically given away for free if you want to live the many-monitor/low-DPI life.</p>
<figure><img src="https://ahelwer.ca/img/ergonomics/dual-monitors.jpg"
alt="Wallpaper sourced from here."/><figcaption>
<p>Wallpaper sourced from <a href="https://www.reddit.com/r/wallpapers/comments/711kc9/dual_4k_nebula76802160/">here</a>.</p>
</figcaption>
</figure>
<p>Monitor manufacturers don’t seem to have caught on to the 2x scaling use case.
The <a href="https://www.lg.com/us/monitors/lg-24UD58-B-4k-uhd-led-monitor">LG 24UD58-B</a> was the smallest 4k monitor I could find, at 24".
Everything else was at least 27", which is enormous!
Even 24" is quite big; ideally they’d be around 20", the size of the 1080p monitors in my setup back at the office.
Most progress in the monitor world seems to be along axes relevant for gaming & entertainment rather than productivity (refresh rate, response time, high dynamic range, wide color gamut) but I’m sure a smaller 4k monitor will be released eventually.</p>
<p>I was originally leaning toward buying an ultrawide monitor since they’re essentially just a 16:9 dual-monitor setup without the bezel interrupting the middle - curved, too!
Sadly these all max out at 1440p vertical resolution and are intensely expensive: around $1500.
Maybe in five years I’ll be ready to upgrade my monitor setup again, and a 7680x2160 ultrawide will be on the market for a reasonable price (although <a href="https://www.reddit.com/r/ultrawidemasterrace/comments/d33atd/any_talk_of_7680x2160/">this</a> thread claims such resolutions are beyond DisplayPort’s current capabilities so I’m not optimistic about the last point).</p>
<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
<iframe src="https://www.youtube-nocookie.com/embed/jBhQrGXYyw4" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" allowfullscreen title="YouTube Video"></iframe>
</div>
<p>Some commenters <a href="https://lobste.rs/s/fvnhyd/taking_my_home_work_setup_seriously#c_vdwxl7">mentioned</a> the ergonomic drawbacks of the standard symmetrical dual-monitor setup: your neck always has to be twisted off-center when looking at a screen.
I’ll be experimenting with some different monitor setups to see what works; the idea of putting one monitor flat in the middle and another tilted to the side (even oriented vertically!) sounds like a good place to start.</p>
<h3 id="videoconferencing--laptop-first-gen-microsoft-surface-book--surface-pen">Videoconferencing & Laptop: First-gen Microsoft Surface Book + Surface Pen</h3>
<p>These regularly show up used on Craigslist for $450-$550.
They’re really <a href="https://support.microsoft.com/en-us/help/4488969/surface-book-tech-specs">quite an incredible deal</a>: you get a good-quality 1080p webcam plus a decent microphone & speakers, and a built-in <em>digitizer</em>!
You can use the Surface Pen with the digitizer to diagram your ideas in real time for others in the call.
It’s also handy to have this videoconferencing platform in laptop format, so you can choose where to take your call.
Having a laptop in general is also just convenient, verging on necessary.
This one is more than adequate as a mobile dev machine.
Its 3000x2000 screen & tablet mode also make reading papers & textbooks pleasant, and it’s very useful for marking up & signing PDFs (a greater quality-of-life capability than it sounds!)
<figure><img src="https://ahelwer.ca/img/ergonomics/surface-book.jpg"/>
</figure>
<p>If you don’t have a pre-existing workstation you can also buy a <a href="https://www.microsoft.com/en-us/p/surface-dock/8qrh2npz0s0p">Surface Dock</a> (<em>definitely</em> buy this used) and use the Surface Book as your main dev machine.
I’d look for one of the higher-spec’d Surface Book models if you plan to go this route.
Also, a note on the dock: it was infamously buggy on release, so be sure to <a href="https://support.microsoft.com/en-us/help/4023478/surface-update-your-surface-dock">update its firmware</a> before use.</p>
<h3 id="miscellaneous">Miscellaneous</h3>
<p>For a monitor stand I bought one of the indistinguishable $25-$35ish <a href="https://www.tykesupply.com/Dual_Monitor_Stands-Dual_LCD_Monitor_Stand.html">metal desk clamp monitor stands</a>; the only real requirement I had was it being tall enough to hold the monitors at eye level.
I might swap this out for a pair of single-monitor stands of the same desk clamp design, as the arms of the dual-monitor model push the monitors out a bit further from the wall than I’d like.
But then I’d probably have to buy some longer DisplayPort cables.
Something to let sit for a bit.</p>
<figure><img src="https://ahelwer.ca/img/ergonomics/monitor-arms.jpg"/>
</figure>
<p>For my workstation I’m just continuing to use the small form factor machine I built nearly seven years ago driven by an i5-4670k with 16 GB DDR3 RAM.
In fact, if you want to be really frugal, I actually endorse building this PC today!
The i5-4670k can easily be overclocked and the only real constraint is the expense of larger RAM sizes with DDR3.
I saw a very expensive (for the time) motherboard + i5-4670k combo go on eBay for $40 (!!!) not too long ago.
There’s been <a href="https://news.ycombinator.com/item?id=23223147">a lot of talk</a> about how the new AMD Ryzen 5 3600 CPU has incredible price/performance, but it’s basically blown out of the water by buying something used.
Remember single-threaded CPU performance has been <a href="https://www.cpubenchmark.net/year-on-year.html">basically flat for a decade now</a>.</p>
<figure><img src="https://ahelwer.ca/img/ergonomics/workstation.jpg"/>
</figure>
<p>I was lucky enough to get a pair of Bose QC35 noise-canceling bluetooth headphones from work, which I use to listen to music & also as a microphone during calls (although the Surface Book’s microphone array works just fine).
If I didn’t have these I would probably purchase a used pair of wired Sennheiser or Audio-Technica headphones.
Truth be told my QC35s annoyingly drop/cut out regularly despite trying several different USB bluetooth adapters.
Sometimes I miss the wired life.</p>
<p>One last notable quality-of-life purchase was a <a href="https://www.amazon.com/gp/product/B01FWGK8QC">flat cat6 ethernet cable</a> to connect my workstation to the router directly instead of through wi-fi.
It’s surprisingly easy to hide this cable under the baseboard and even run it beneath carpet across door thresholds, if you have some tent poles lying around!</p>
<h1 id="future-improvements">Future Improvements?</h1>
<p>At this point I’ve ticked the box on nearly every ergonomic gadget there is, except a desk treadmill (which I did previously own!) and a keyboard tray.
Keyboard trays, for those not in the know, put your keyboard & mouse on an adjustable tray attached to an arm that retracts along a track on the underside of the desk.
The big buzzword capability here is <em>negative tilt</em>, where you can tilt the tray down away from you to keep your wrists straight without bending your elbows so much.
Unfortunately keyboard trays tend to be (1) <a href="https://www.youtube.com/watch?v=KX5CVnafwcc&t=49s">wobbly as hell</a>, and (2) <a href="https://www.humanscale.com/products/product-buy.cfm?group=KeyboardSystems">expensive as hell</a>.
They also make the underside of your desk less roomy if you’re using it for something other than typing.
There’s a brand called Humanscale that seems sturdier than most, but it’s very expensive (allegedly there are excellent deals on eBay).
Their flagship tray also has a very neat feature where you can elevate your mouse independently of your keyboard, useful for boards like the Goldtouch which raise your hand position as you tent them more.
If you’re looking for the ultimate in adjustability you really can’t beat a keyboard tray, despite their drawbacks.
I’m going to skip out on them for a while; perhaps in a year or two I might look into them again.</p>
<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
<iframe src="https://www.youtube-nocookie.com/embed/zddoDlxQa-g" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" allowfullscreen title="YouTube Video"></iframe>
</div>
<p>There’s also an ergonomic issue which doesn’t affect me personally but is so common it bears mention: people whose legs are too short to comfortably reach the ground no matter how they adjust their desk or chair.
The solution here is very simple, and also cheap - buy a footrest!
Stop mucking around trying to find a desk that goes low enough for your needs.
Bring the floor to you!
Alternatively, a keyboard tray would also help with this problem.</p>
<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
<iframe src="https://www.youtube-nocookie.com/embed/lnnlRKXodQc" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" allowfullscreen title="YouTube Video"></iframe>
</div>
<h1 id="tallying-the-damages">Tallying the Damages</h1>
<table>
<thead>
<tr>
<th style="text-align:center">Item</th>
<th style="text-align:center">Cost</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:center"><a href="https://www.upliftdesk.com/uplift-v2-standing-desk-v2-or-v2-commercial/">Uplift Desk</a></td>
<td style="text-align:center">$835</td>
</tr>
<tr>
<td style="text-align:center"><a href="https://www.haworth.com/na/en/products/stools/zody-1.html">Haworth Zody Chair</a></td>
<td style="text-align:center">$200</td>
</tr>
<tr>
<td style="text-align:center"><a href="https://shop.goldtouch.com/products/goldtouch-v2-adjustable-comfort-keyboard-pc-only">Goldtouch Keyboard</a></td>
<td style="text-align:center">$80</td>
</tr>
<tr>
<td style="text-align:center"><a href="https://www.logitech.com/en-us/product/mx-vertical-ergonomic-mouse">Logitech Vertical Mouse</a></td>
<td style="text-align:center">$85</td>
</tr>
<tr>
<td style="text-align:center"><a href="https://www.lg.com/us/monitors/lg-24UD58-B-4k-uhd-led-monitor">2x LG 4k IPS Monitors</a></td>
<td style="text-align:center">$600</td>
</tr>
<tr>
<td style="text-align:center"><a href="https://support.microsoft.com/en-us/help/4488969/surface-book-tech-specs">Microsoft Surface Book</a></td>
<td style="text-align:center">$450</td>
</tr>
<tr>
<td style="text-align:center"><a href="https://www.tykesupply.com/Dual_Monitor_Stands-Dual_LCD_Monitor_Stand.html">Monitor Stand</a></td>
<td style="text-align:center">$35</td>
</tr>
<tr>
<td style="text-align:center">Miscellaneous*</td>
<td style="text-align:center">$100</td>
</tr>
<tr>
<td style="text-align:center">Total</td>
<td style="text-align:center">$2,385</td>
</tr>
</tbody>
</table>
<p><em>*Display & ethernet cables, cable management supplies, mousepad, etc.</em></p>
<p>It adds up quick.
You could also include the hypothetical cost of recreating my existing workstation with used components, which would add around $300 or so.
We might then subtract the $350 I got from selling my existing desk & monitor on the used market.
I’m currently an independent contractor so I think this can all be deducted on my taxes (pending confirmation by a tax professional) but it still stings a bit.
If I hadn’t splurged on the Uplift desk & 4k monitors I could have cut this down by $1000 or more.
Still, I’m happy with how it all turned out!</p>
<figure><img src="https://ahelwer.ca/img/ergonomics/home-improvement/6-complete-home-office.jpg"
alt="Poster from Ars Obscura Bookbinding &amp; Restoration in Seattle."/><figcaption>
<p>Poster from <a href="https://www.arsobscurabookbinding.com/?p=143">Ars Obscura Bookbinding & Restoration</a> in Seattle.</p>
</figcaption>
</figure>
<h1 id="other-reading--watching">Other Reading & Watching</h1>
<ul>
<li><a href="https://news.ycombinator.com/item?id=24169729">Hacker News</a> and <a href="https://lobste.rs/s/fvnhyd/taking_my_home_work_setup_seriously">Lobsters</a> discussions for this post - the latter has some great advice on the drawbacks of dual-monitor setups and how to save money on a motorized adjustable standing desk.</li>
<li><a href="https://www.troyhunt.com/building-the-ultimate-home-office-again/"><em>Building the Ultimate Home Office (Again)</em></a> by Troy Hunt, a somewhat more maximalist approach to this same problem; inspired me to write this! [<a href="https://news.ycombinator.com/item?id=23938124">HN Discussion</a>]</li>
<li><a href="https://www.nealstephenson.com/news/2015/03/09/notes-on-416-days-of-treadmill-desk-usage/"><em>Notes on 416 Days of Treadmill Desk Usage</em></a> by Neal Stephenson (the author), the post that originally inspired my purchase of a desk treadmill five years ago (later sold during a move) and may one day inspire it again. [<a href="https://news.ycombinator.com/item?id=9174746">HN Discussion</a>]</li>
<li><a href="https://tonsky.me/blog/monitors/"><em>Time to upgrade your monitor</em></a> by Nikita Prokopov, this post had quite an effect on me for reasons I’ve been unable to discern. [<a href="https://news.ycombinator.com/item?id=23551983">HN Discussion</a>]</li>
<li><a href="https://www.youtube.com/user/TheErgonomicsGuy"><em>The Ergonomics Guy</em></a>: a very valuable if underproduced YouTube channel run by a physical therapist named Steve Meagher; features general advice, overviews of ergonomic device categories, and specific product reviews.</li>
<li><a href="https://www.youtube.com/user/LinusTechTips"><em>Linus Tech Tips</em></a>: another YouTube channel that reviews many, many pieces of computer equipment with lots of context & detail.</li>
</ul>
<h1 id="bonus-some-home-improvement">Bonus: Some Home Improvement</h1>
<p>Upon moving into this Atlanta apartment (sight unseen after driving from Seattle) I saw the nook I’d planned to use for my home office was occupied by a permanent desk.
Since all the units in the building were being renovated and ours was still in the old style, I got permission from building management to tear out the desk & do with the nook as I pleased.
After consulting my wonderful & talented sister Carolyn Helwer (<a href="https://www.blockinc.ca/carolyn-helwer-bio">who works as an interior designer!</a>) I painted it a nice navy colour ("<a href="https://www.sherwin-williams.com/homeowners/color/find-and-explore-colors/paint-colors-by-family/SW9177-salty-dog">Salty Dog</a>" from Sherwin-Williams) and ended up loving the result.
It took a lot of coats of paint!</p>
<p><figure><img src="https://ahelwer.ca/img/ergonomics/home-improvement/1-permanent-desk.jpg"
alt="The desk before being torn out."/><figcaption>
<p>The desk before being torn out.</p>
</figcaption>
</figure>
<figure><img src="https://ahelwer.ca/img/ergonomics/home-improvement/2-desk-torn-out.jpg"
alt="Success! Wall required a good bit of drywall mud &amp; sanding."/><figcaption>
<p>Success! Wall required a good bit of drywall mud & sanding.</p>
</figcaption>
</figure>
<figure><img src="https://ahelwer.ca/img/ergonomics/home-improvement/3-paint-prep.jpg"
alt="Prepped for paint."/><figcaption>
<p>Prepped for paint.</p>
</figcaption>
</figure>
<figure><img src="https://ahelwer.ca/img/ergonomics/home-improvement/4-partially-painted.jpg"
alt="After one coat. It took three coats + primer to get a good solid colour!"/><figcaption>
<p>After one coat. It took three coats + primer to get a good solid colour!</p>
</figcaption>
</figure>
<figure><img src="https://ahelwer.ca/img/ergonomics/home-improvement/5-final-product.jpg"
alt="Proud of the result! Channeled my teenage summer job painting apartment interiors."/><figcaption>
<p>Proud of the result! Channeled my teenage summer job painting apartment interiors.</p>
</figcaption>
</figure>
</p>
Doing a math assignment with the Lean theorem prover
https://ahelwer.ca/post/2020-04-05-lean-assignment/
Sun, 05 Apr 2020 00:00:00 +0000https://ahelwer.ca/post/2020-04-05-lean-assignment/<p>Turn back the clock to 2009: a confused physics major newly infatuated with math and computer science, I enrolled in MATH 273: Numbers and Proofs at the University of Calgary.
This wasn’t my first encounter with mathematical proof; in first-year calculus I’d mastered rote regurgitation of delta-epsilon proofs.
Despite writing out several dozen, their meaning never progressed beyond a sort of incantation I can summon to this day (for every \( \epsilon > 0 \) there exists a \( \delta > 0 \) such that…).
We were told on the first day of MATH 273 that the purpose of proof is to compel belief.
This was a bright start but sadly marked the beginning of a deeply embarrassing semester reaching its nadir when I asked the professor “how do we prove a definition” fully two-thirds of the way through the course.
I got a B-.</p>
<p>My understanding of proofs improved over the course of the next decade, but I don’t think I <em>really</em> got them until encountering the Lean theorem prover this past year.
With Lean you have your hypotheses, you have your proof goal (the thing you’re trying to prove), and you have a set of <em>moves</em> you can make.
It’s very much like a game; in fact the best introduction to Lean is <a href="https://wwwf.imperial.ac.uk/~buzzard/xena/natural_number_game/">The Natural Number Game</a>, which is the best puzzle game of any type I’ve played in years.
Before I learned Lean, writing proofs was like playing chess without knowing that bishops existed or how knights moved and thinking pawns could teleport around the board.
Just knowing the rules, knowing what constitutes a valid move and knowing the space (or at least more of the space) of valid moves, was so powerful for my understanding of how to write a proof.
Writing proofs became like navigating an endlessly fascinating maze, using theorems to hop from place to place until finding the conclusion!</p>
<p>I wanted to re-examine some MATH 273 assignments in light of my newfound proof powers.
One candidate came easily to mind - the closed-form equation for the sum of the first \(n\) natural numbers:</p>
<p>$$
\sum_{k = 0}^n k = \frac{n \cdot (n + 1)}{2}
$$</p>
<p>This seemingly arbitrary equation has an intuitive explanation.
Consider a list of all the numbers in the sum, from \(1\) to \(n\) (exclude 0 because it doesn’t change the result).
Pair up the numbers on the outside edges of the list, moving inward: \(1\) and \(n\), \(2\) and \(n - 1\), \(3\) and \(n - 2\), etc.
In cases where the number in the middle of the list lacks a pair, put it to the side:</p>
<table>
<thead>
<tr>
<th style="text-align:center">\(n\)</th>
<th style="text-align:center">List of Numbers</th>
<th style="text-align:center">Pairs</th>
<th style="text-align:center">Remainder</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:center">\(0\)</td>
<td style="text-align:center">\(\{\}\)</td>
<td style="text-align:center">\(\{\}\)</td>
<td style="text-align:center"></td>
</tr>
<tr>
<td style="text-align:center">\(1\)</td>
<td style="text-align:center">\(\{1\}\)</td>
<td style="text-align:center">\(\{\}\)</td>
<td style="text-align:center">1</td>
</tr>
<tr>
<td style="text-align:center">\(2\)</td>
<td style="text-align:center">\(\{1, 2\}\)</td>
<td style="text-align:center">\(\{(1, 2)\}\)</td>
<td style="text-align:center"></td>
</tr>
<tr>
<td style="text-align:center">\(3\)</td>
<td style="text-align:center">\(\{1, 2, 3\}\)</td>
<td style="text-align:center">\(\{(1, 3)\}\)</td>
<td style="text-align:center">2</td>
</tr>
<tr>
<td style="text-align:center">\(4\)</td>
<td style="text-align:center">\(\{1, 2, 3, 4\}\)</td>
<td style="text-align:center">\(\{(1, 4), (2, 3)\}\)</td>
<td style="text-align:center"></td>
</tr>
<tr>
<td style="text-align:center">\(5\)</td>
<td style="text-align:center">\(\{1, 2, 3, 4, 5\}\)</td>
<td style="text-align:center">\(\{(1, 5), (2, 4)\}\)</td>
<td style="text-align:center">3</td>
</tr>
<tr>
<td style="text-align:center">\(6\)</td>
<td style="text-align:center">\(\{1, 2, 3, 4, 5, 6\}\)</td>
<td style="text-align:center">\(\{(1, 6), (2, 5), (3, 4)\}\)</td>
<td style="text-align:center"></td>
</tr>
<tr>
<td style="text-align:center">\(7\)</td>
<td style="text-align:center">\(\{1, 2, 3, 4, 5, 6, 7\}\)</td>
<td style="text-align:center">\(\{(1, 7), (2, 6), (3, 5)\}\)</td>
<td style="text-align:center">4</td>
</tr>
<tr>
<td style="text-align:center">\(\ldots\)</td>
<td style="text-align:center">\(\ldots\)</td>
<td style="text-align:center">\(\ldots\)</td>
<td></td>
</tr>
</tbody>
</table>
<p>If you add the numbers in each pair together you’ll always get \(n + 1\), and if \(n\) is even then there are \(n/2\) such pairs - voila!
Things are a bit more complicated when \(n\) is odd; there are \((n - 1)/2\) pairs, each summing to \(n + 1\), with a single remainder \((n + 1)/2\):</p>
<p>$$
\frac{n - 1}{2} \cdot (n + 1) + \frac{n + 1}{2}
$$
$$
\frac{(n - 1) \cdot (n + 1) + (n + 1)}{2}
$$
$$
\frac{((n - 1) + 1) \cdot (n + 1)}{2}
$$
$$
\frac{n \cdot (n + 1)}{2}
$$</p>
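<p>As a concrete check, take \(n = 7\): the pairs \((1, 7)\), \((2, 6)\), \((3, 5)\) each sum to \(8\), with remainder \(4\):</p>
<p>$$
3 \cdot 8 + 4 = 28 = \frac{7 \cdot 8}{2}
$$</p>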
<p>This chain of reasoning might compel belief in our soft human brains, but things have changed and that’s no longer sufficient for the turbulent times in which we live.
What would a computer say?</p>
<h1 id="reasoning-about-numbers-like-a-computer">Reasoning about numbers like a computer</h1>
<p>You may or may not know the grand tale of how in the early 20th century logicians such as Bertrand Russell (and many others) fought to put all of mathematics on a solid axiomatic basis.
The idea was to have a fairly small number of simple axioms - things which are just <em>assumed</em> to be true - and build up all of mathematics following logically from this foundation.
This, basically, is how computers understand mathematics.</p>
<figure><img src="https://ahelwer.ca/img/lean-assignment/logicomix.jpg"
alt="If this topic at all interests you, read Logicomix! Seriously, it&rsquo;s great!"/><figcaption>
<p>If this topic at all interests you, read <a href="https://en.wikipedia.org/wiki/Logicomix">Logicomix</a>! Seriously, it’s great!</p>
</figcaption>
</figure>
<p>We don’t need to delve very deeply into this topic beyond explaining how the natural numbers are defined axiomatically (they were actually formalized a few decades earlier, in 1889, as <a href="https://en.wikipedia.org/wiki/Peano_axioms">The Peano Axioms</a>).
There are nine axioms, but the ones we’re interested in are:</p>
<ol>
<li>There is a natural number, \(0\)</li>
<li>There is a <em>successor function</em>, \(S\), where \(S(n)\) is the number after \(n\) (basically \(n + 1\))</li>
</ol>
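<p>These two axioms map almost directly onto how Lean itself defines the natural numbers as an inductive type (Lean 3 syntax, lightly simplified from the core library):</p>
<pre><code>inductive nat
| zero : nat
| succ (n : nat) : nat
</code></pre>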
<p>In this formalization every natural number is just some number of nested successor functions applied to zero.
For example:</p>
<table>
<thead>
<tr>
<th style="text-align:center">Conventional notation</th>
<th style="text-align:center">Peano notation</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:center">\(0\)</td>
<td style="text-align:center">\(0\)</td>
</tr>
<tr>
<td style="text-align:center">\(1\)</td>
<td style="text-align:center">\(S(0)\)</td>
</tr>
<tr>
<td style="text-align:center">\(2\)</td>
<td style="text-align:center">\(S(S(0))\)</td>
</tr>
<tr>
<td style="text-align:center">\(3\)</td>
<td style="text-align:center">\(S(S(S(0)))\)</td>
</tr>
<tr>
<td style="text-align:center">\(4\)</td>
<td style="text-align:center">\(S(S(S(S(0))))\)</td>
</tr>
<tr>
<td style="text-align:center">\(\ldots\)</td>
<td style="text-align:center">\(\ldots\)</td>
</tr>
</tbody>
</table>
<p>This is somewhat similar to counting in unary.
We also need to define addition; this is done recursively with a base case and general case:</p>
<ul>
<li>Base case: \(n + 0 = n\)</li>
<li>General case: \(n + S(m) = S(n + m)\)</li>
</ul>
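<p>For illustration, here is how this recursive definition of addition could be written in Lean; the <code>add</code> name is just for this sketch, though Lean’s built-in <code>nat.add</code> is defined essentially the same way:</p>
<pre><code>def add : ℕ → ℕ → ℕ
| n 0            := n                  -- base case: n + 0 = n
| n (nat.succ m) := nat.succ (add n m) -- general case: n + S(m) = S(n + m)
</code></pre>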
<p>It’s difficult to see how this defines addition; here’s how it works when calculating \(1 + 2\):</p>
<table>
<thead>
<tr>
<th style="text-align:center">Recursion level</th>
<th style="text-align:center">Value</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:center">\(0\)</td>
<td style="text-align:center">\(S(0) + S(S(0))\)</td>
</tr>
<tr>
<td style="text-align:center">\(1\)</td>
<td style="text-align:center">\(S(S(0) + S(0))\)</td>
</tr>
<tr>
<td style="text-align:center">\(2\)</td>
<td style="text-align:center">\(S(S(S(0) + 0))\)</td>
</tr>
<tr>
<td style="text-align:center">\(3\)</td>
<td style="text-align:center">\(S(S(S(0)))\)</td>
</tr>
</tbody>
</table>
<p>The final ingredient we need is <a href="https://en.wikipedia.org/wiki/Mathematical_induction">mathematical induction</a>, which is a tactic used to prove that a certain proposition \(P\) holds for all natural numbers.
It suffices to show that:</p>
<ul>
<li>Base case: \(P(0)\) is true</li>
<li>Inductive case: assuming \(P(n)\) is true, \(P(S(n))\) is true</li>
</ul>
<p>The usual analogy here is an infinite line of dominos: if you knock down the first domino (show the base case holds) and know that each domino will knock down the domino after it (show the inductive case holds), then you know that all dominos will be knocked down (the proposition will hold for all natural numbers).</p>
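<p>Here is a small Lean sketch of the induction tactic in action; the statement is illustrative (mathlib already provides it as <code>nat.zero_add</code>):</p>
<pre><code>example (n : ℕ) : 0 + n = n :=
begin
  induction n with d hd,
  -- base case: 0 + 0 = 0 holds by definition of addition
  refl,
  -- inductive case: rewrite 0 + S(d) to S(0 + d), then apply hypothesis hd
  rw nat.add_succ,
  rw hd,
end
</code></pre>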
<p>With these primitive tools we can use computers to construct many interesting theorems about the natural numbers!</p>
<h1 id="stating-the-problem-in-lean">Stating the problem in Lean</h1>
<p>To follow along in this section, you can use the <a href="https://leanprover-community.github.io/lean-web-editor/">Lean web editor</a> or <a href="https://github.com/leanprover-community/mathlib/blob/master/docs/install/project.md">create a new Lean project on your own computer</a>.</p>
<p>Recall the theorem we want to prove:</p>
<p>$$
\sum_{k = 0}^n k = \frac{n \cdot (n + 1)}{2}
$$</p>
<p>It turns out that integer division is annoyingly difficult to reason about, so let’s rewrite this as follows:</p>
<p>$$
2 \cdot \sum_{k = 0}^n k = n \cdot (n + 1)
$$</p>
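<p>To see why division is a nuisance here: division on ℕ in Lean is truncated integer division, so identities that hold over the rationals can silently fail or require parity side-conditions. A quick check:</p>
<pre><code>#eval (3 : ℕ) / 2  -- truncated division: evaluates to 1, not 1.5
</code></pre>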
<p>The first thing we want to do in Lean is recursively define the sum of the first \(n\) natural numbers (you can think of this as the left-hand-side of the equation):</p>
<pre><code>def sum_of_first_n_nat : ℕ → ℕ
| 0 := 0
| (nat.succ n) := (nat.succ n) + sum_of_first_n_nat n
</code></pre><p>In Lean, the symbol ℕ (for the natural numbers) is expressed with <code>\nat</code>.
The successor function \(S\) is defined as <code>nat.succ</code>.
We’ve defined here a function which takes an \(x \in \mathbb{N}\) and returns the sum of all the natural numbers from \(0\) to \(x\).
The <code>|</code> character pattern-matches on the value of \(x\); the top line is the base case and the bottom line is the general case.
You can test this function as follows (place your cursor at the end of the line to see the result, which should be <code>10</code>, in the Lean goal window):</p>
<pre><code>#eval sum_of_first_n_nat 4
</code></pre><p>Now for the main event!
Let’s write our theorem in Lean, then prove it:</p>
<pre><code>theorem closed_eq_sum_of_first_n_nat (n : ℕ) :
2 * (sum_of_first_n_nat n) = n * (nat.succ n) :=
begin
end
</code></pre><p>Putting your cursor between the <code>begin</code> and <code>end</code> markers will bring up the current proof goal in the Lean goal window.</p>
<h2 id="the-base-case">The base case</h2>
<p>The <code>⊢</code> character denotes the thing we are currently trying to prove; we’re going to prove it inductively. Type:</p>
<pre><code>induction n with d hd,
</code></pre><p>The Lean proof window will now show the following two goals, the base case and the inductive case:</p>
<pre><code>2 goals
case nat.zero
⊢ 2 * sum_of_first_n_nat 0 = 0 * 1
case nat.succ
d : ℕ,
hd : 2 * sum_of_first_n_nat d = d * nat.succ d
⊢ 2 * sum_of_first_n_nat (nat.succ d) = nat.succ d * nat.succ (nat.succ d)
</code></pre><p>We will now make use of the workhorse rewrite tactic, <code>rw</code>.
This tactic modifies our proof goal according to an existing definition or equation.
Here, we can rewrite <code>sum_of_first_n_nat 0</code> to its value by definition (<code>0</code>):</p>
<pre><code>rw sum_of_first_n_nat,
</code></pre><p>By default Lean applies all statements to the topmost proof goal, so our base case will be rewritten as:</p>
<pre><code>case nat.zero
⊢ 2 * 0 = 0 * 1
</code></pre><p>We can now make use of the basic theorems <code>nat.mul_zero</code>, which says <code>n * 0 = 0</code>, and <code>nat.zero_mul</code>, which says <code>0 * n = 0</code>:</p>
<pre><code>rw nat.mul_zero,
rw nat.zero_mul,
</code></pre><p>This proves the base case!</p>
<h2 id="the-inductive-case">The inductive case</h2>
<p>Only the inductive case remains:</p>
<pre><code>case nat.succ
d : ℕ,
hd : 2 * sum_of_first_n_nat d = d * nat.succ d
⊢ 2 * sum_of_first_n_nat (nat.succ d) = nat.succ d * nat.succ (nat.succ d)
</code></pre><p>It’s worth taking some time to break down what everything means here.
<code>d</code> and <code>hd</code> are our hypotheses: things which we are assuming for the purpose of proving the conclusion, denoted by <code>⊢</code>.
We can use our hypotheses in rewrites of the conclusion.
Similar to the base case, we’ll first rewrite this expression according to the definition of the general case of <code>sum_of_first_n_nat</code>:</p>
<pre><code>rw sum_of_first_n_nat,
</code></pre><p>This transforms our proof goal to:</p>
<pre><code>d : ℕ,
hd : 2 * sum_of_first_n_nat d = d * nat.succ d
⊢ 2 * (nat.succ d + sum_of_first_n_nat d) = nat.succ d * nat.succ (nat.succ d)
</code></pre><p>We now want to multiply out <code>2</code>, with the aim of being able to rewrite our proof goal with our inductive hypothesis <code>hd</code>.
Use the <code>nat.left_distrib</code> theorem:</p>
<pre><code>rw nat.left_distrib,
</code></pre><p>Now we have:</p>
<pre><code>d : ℕ,
hd : 2 * sum_of_first_n_nat d = d * nat.succ d
⊢ 2 * nat.succ d + 2 * sum_of_first_n_nat d = nat.succ d * nat.succ (nat.succ d)
</code></pre><p>We can now rewrite <code>2 * sum_of_first_n_nat d</code> in the proof goal using our inductive hypothesis <code>hd</code>!</p>
<pre><code>rw hd,
</code></pre><p>to get:</p>
<pre><code>⊢ 2 * nat.succ d + d * nat.succ d = nat.succ d * nat.succ (nat.succ d)
</code></pre><p>At this point, these are clearly equal; we just need some simple algebraic manipulations to prove it.
To save space I’ll do them all in one shot, with comments.</p>
<pre><code>--rewrites nat.succ n to n + 1:
rw nat.succ_eq_add_one,
--rewrites nat.succ (n + m) to n + nat.succ m (from defn of addition)
--note rw usually rewrites from left to right over an equality; ← (\l) does right to left
rw ← nat.add_succ,
--rewrites nat.succ 1 to 2 (for clarity)
rw (show nat.succ 1 = 2, by refl),
--multiplies out d + 1
rw left_distrib (d + 1) d 2,
--moving things around with commutativity
rw mul_comm 2 (d + 1),
rw mul_comm d (d + 1),
rw add_comm,
</code></pre><p>And we’re done!
Our proof is formally verified in Lean!
In case you lost track at some point, <a href="https://leanprover-community.github.io/lean-web-editor/#code=import%20tactic%0A%0Adef%20sum_of_first_n_nat%20%3A%20%E2%84%95%20%E2%86%92%20%E2%84%95%0A%7C%200%20%3A%3D%200%0A%7C%20%28nat.succ%20n%29%20%3A%3D%20%28nat.succ%20n%29%20%2B%20sum_of_first_n_nat%20n%0A%0A%23eval%20sum_of_first_n_nat%204%0A%0Atheorem%20closed_eq_sum_of_first_n_nat%20%28n%20%3A%20%E2%84%95%29%20%3A%0A%20%20%20%202%20*%20%28sum_of_first_n_nat%20n%29%20%3D%20n%20*%20%28nat.succ%20n%29%20%3A%3D%0Abegin%0Ainduction%20n%20with%20d%20hd%2C%0A%20%20rw%20sum_of_first_n_nat%2C%0A%20%20rw%20nat.mul_zero%2C%0A%20%20rw%20nat.zero_mul%2C%0Arw%20sum_of_first_n_nat%2C%0Arw%20nat.left_distrib%2C%0Arw%20hd%2C%0A--rewrites%20nat.succ%20n%20to%20n%20%2B%201%3A%0Arw%20nat.succ_eq_add_one%2C%0A--rewrites%20nat.succ%20%28n%20%2B%20m%29%20to%20n%20%2B%20nat.succ%20m%20%28from%20defn%20of%20addition%29%0A--note%20rw%20usually%20rewrites%20from%20left%20to%20right%20over%20an%20equality%3B%20%E2%86%90%20%28%5Cl%29%20does%20right%20to%20left%0Arw%20%E2%86%90%20nat.add_succ%2C%0A--rewrites%20nat.succ%201%20to%202%0Arw%20%28show%20nat.succ%201%20%3D%202%2C%20by%20refl%29%2C%0A--multiplies%20out%20d%20%2B%201%0Arw%20left_distrib%20%28d%20%2B%201%29%20d%202%2C%0A--moving%20things%20around%20with%20commutativity%0Arw%20mul_comm%202%20%28d%20%2B%201%29%2C%0Arw%20mul_comm%20d%20%28d%20%2B%201%29%2C%0Arw%20add_comm%2C%0Aend%0A%0Atheorem%20closed_eq_sum_of_first_n_nat_with_ring%20%28n%20%3A%20%E2%84%95%29%20%3A%0A%20%20%20%202%20*%20%28sum_of_first_n_nat%20n%29%20%3D%20n%20*%20%28nat.succ%20n%29%20%3A%3D%0Abegin%0Ainduction%20n%20with%20d%20hd%2C%0A%20%20rw%20sum_of_first_n_nat%2C%0A%20%20ring%2C%0Arw%20sum_of_first_n_nat%2C%0Arw%20nat.left_distrib%2C%0Arw%20hd%2C%0Aring%2C%0Aend">here</a> is a link to the Lean web editor with the full proof (move your cursor to the end of each line to see the proof state at that position).
</p>
<p>Now, some of you might be rolling your eyes at the last section.
Do we really want to be wasting all this time multiplying things out line by line like we’re in grade school?
Fortunately Lean provides some more advanced tactics which take care of the busywork.
The entire final sequence of commands after <code>rw hd</code> can be replaced with just one tactic: <code>ring</code>.
This tells Lean to intelligently search through the <a href="https://en.wikipedia.org/wiki/Ring_(mathematics)">basic equalities underpinning the natural numbers</a> in an attempt to make both sides of the equation equal each other.
It’s part of the community-maintained mathlib library, and can be accessed by putting <code>import tactic</code> at the top of your <code>.lean</code> file. This extensibility with powerful tactics is one of Lean’s great features.</p>
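<p>For comparison, here is the condensed <code>ring</code>-based version of the proof, taken from the linked web editor:</p>
<pre><code>theorem closed_eq_sum_of_first_n_nat_with_ring (n : ℕ) :
    2 * (sum_of_first_n_nat n) = n * (nat.succ n) :=
begin
induction n with d hd,
  rw sum_of_first_n_nat,
  ring,
rw sum_of_first_n_nat,
rw nat.left_distrib,
rw hd,
ring,
end
</code></pre>
<p>Ten lines of algebraic shuffling collapse into a single tactic call in each case.</p>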
<h1 id="in-the-end">In the end</h1>
<p>If this exercise appealed to you, you’ll greatly enjoy <a href="https://wwwf.imperial.ac.uk/~buzzard/xena/natural_number_game/">The Natural Number Game</a>, a gamified introduction to Lean.
I myself first heard about Lean from this excellent lecture by Dr. Kevin Buzzard of Imperial College London:
<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
<iframe src="https://www.youtube-nocookie.com/embed/Dp-mQ3HxgDE" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" allowfullscreen title="YouTube Video"></iframe>
</div>
You can see the <a href="https://news.ycombinator.com/item?id=21200721">Hacker News comments</a> and <a href="https://lobste.rs/s/5vdr9e/future_mathematics">Lobste.rs comments</a> for interesting discussion.</p>
<p>I believe learning Lean has brought great clarity to my understanding of mathematical proofs.
In a way it’s like the perfect scratch pad, focusing your mind on the goal while keeping track of all your assumptions and checking your thinking.
Maybe I would’ve had an easier time learning proofs if I’d used Lean in the first place, although I have no way of knowing.</p>
<p>For now, I’m hoping to use Lean to formalize some results in quantum information processing.
Maybe once coverage of existing theorems becomes great enough, my work will be of use to real QIP theoreticians.
The idea of Lean being the basis for a mathematical search engine is also particularly interesting.
It would be a nice way for people who know more about computer science than math, like myself, to play a small part in the development of the field.</p>
Simulating physical reality with a quantum computer
https://ahelwer.ca/post/2019-12-21-quantum-chemistry/
Sat, 21 Dec 2019 00:00:00 +0000https://ahelwer.ca/post/2019-12-21-quantum-chemistry/<h2 id="quantum-computers-not-just-for-breaking-rsa">Quantum Computers: Not Just for Breaking RSA</h2>
<p>There’s no denying it, Shor’s algorithm was a blockbuster result.
The thought of an exotic new computer breaking all widely-used public-key crypto plays well with the public imagination, and so you’d be forgiven for believing quantum computing is ultimately a sort of billion-dollar make-work project for software engineers: forcing our profession to relive a Y2K-like mass upgrade of old systems to new, <a href="https://blog.cloudflare.com/the-tls-post-quantum-experiment/">quantum-safe encryption algorithms</a>.
Great for consultants, bad for people who want resources invested toward actual social good.</p>
<p>In reality, breaking public-key crypto - while <em>interesting</em> - is more of an unfortunate side-effect than quantum computing’s <em>raison d’être</em>.
The original motivation for building a quantum computer was simulating quantum-mechanical systems: particles, atoms, molecules, proteins.
Simulating quantum-mechanical systems with a classical computer requires resources scaling exponentially with the size of the system being simulated.
By contrast, quantum computers get a “native” speedup: the parts of the problem requiring exponential classical resources are things that quantum computers get for free.
Classically intractable simulations move within reach, hopefully revolutionizing material design, pharmaceuticals, and even high-temperature superconductors.</p>
<p>Simulating physical reality with a quantum computer is obviously appealing on several levels, but how is it actually done?
Let’s be clear from the outset: in order to understand how physics is simulated, you must understand the physics being simulated.
This post isn’t the place to learn quantum mechanics, so all I will do is set up an extremely simple quantum system, explain how it works in a very “spherical chickens in a vacuum” type way, and walk through how we’d describe that system to a quantum computer.</p>
<h2 id="the-problem">The Problem</h2>
<p>All simulations, quantum or classical, follow the same general structure:</p>
<ol>
<li>A physical system of interest is described, usually in some kind of modeling program</li>
<li>The system’s initial state is described, usually also in the same program</li>
<li>The system description and initial state are compiled into a form understood by the simulation program</li>
<li>The simulation program is executed, usually to derive the state of the system (within a margin of error) after some period of time has passed</li>
<li>The simulation program’s output state is translated back to the modeling program, or some other such human-usable form</li>
</ol>
<p>The goal is to have a correspondence like this:
<img src="https://ahelwer.ca/img/hamsim/simulation.PNG" alt=""></p>
<p>When we simulate quantum-mechanical systems on a quantum computer, we call this <em>Hamiltonian Simulation</em>.</p>
<h3 id="describing-the-system">Describing the system</h3>
<p>Consider a very simple quantum-mechanical system: a single electron sitting in an infinite uniform magnetic field.
Electrons have a property called <a href="https://en.wikipedia.org/wiki/Spin_(physics)">spin</a>, which is <em>very loosely</em> analogous to the spin of a tennis ball (although the particle is not literally spinning in a physical sense): when the electron is fired through a magnetic field, it will curve in a direction corresponding to its spin (spin-up or spin-down) as demonstrated in the famous <a href="https://en.wikipedia.org/wiki/Stern%E2%80%93Gerlach_experiment">Stern-Gerlach experiment</a>.
Note this phenomenon is different from the <a href="https://en.wikipedia.org/wiki/Lorentz_force">Lorentz force</a> which affects all charged particles moving through an electromagnetic field:</p>
<p><img src="https://ahelwer.ca/img/hamsim/cyclotron.jpg" alt="">Cyclotron demonstrating Lorentz force (source: <a href="https://commons.wikimedia.org/wiki/File:Cyclotron_motion_wider_view.jpg">Wikimedia Commons</a>)</p>
<p>The magnetic field does more than just alter the electron’s path through space: it also affects the spin itself!
We want to describe how the electron spin changes over time in a magnetic field; how?
Physicists use something called a <em>Hamiltonian</em> to describe these systems.
A Hamiltonian, in the context of quantum mechanics, is a matrix; for example:</p>
<p>$$
H =
\begin{bmatrix}
0 & 1 \\
1 & 0
\end{bmatrix}
$$</p>
<p>The Hamiltonian of a system describes its energy.
How exactly does a matrix encode the energy of a system?
Sadly, that’s outside the purview of this post; you’ll just have to accept that it does.
It can be very complicated to derive the Hamiltonian for a given system, but luckily in our particular case the Hamiltonian is quite famous!
Depending on the direction of the magnetic field, our Hamiltonian is one of the <a href="https://en.wikipedia.org/wiki/Pauli_matrices">Pauli spin operators</a>:</p>
<p>$$
\sigma_x =
\begin{bmatrix}
0 & 1 \\
1 & 0
\end{bmatrix},
\sigma_y =
\begin{bmatrix}
0 & -i \\
i & 0
\end{bmatrix},
\sigma_z =
\begin{bmatrix}
1 & 0 \\
0 & -1
\end{bmatrix}
$$</p>
<p>If the magnetic field points along the \( x \) direction the Hamiltonian is \( \sigma_x \), along the \( y \) direction it’s \( \sigma_y \), and along the \( z \) direction it’s \( \sigma_z \) (multiplied by the magnetic field strength, which we will ignore here for simplicity).
Let’s say our magnetic field points in the \( z \) direction, so our Hamiltonian is \( \sigma_z \).
With this Hamiltonian, we have a complete description of how our magnetic field affects the electron’s spin over time.</p>
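<p>The Pauli matrices are small enough to check by hand, or in a few lines of code. This Python sketch (my own; the post itself uses no code here) confirms that each Pauli matrix squares to the identity, a fact we’ll lean on later when compiling the Hamiltonian:</p>

```python
# The three Pauli spin operators as 2x2 complex matrices (nested lists).
sigma_x = [[0, 1], [1, 0]]
sigma_y = [[0, -1j], [1j, 0]]
sigma_z = [[1, 0], [0, -1]]

def matmul(a, b):
    """Multiply two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Each Pauli matrix squares to the identity matrix.
identity = [[1, 0], [0, 1]]
for sigma in (sigma_x, sigma_y, sigma_z):
    assert matmul(sigma, sigma) == identity
```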
<p>Incidentally, rotating particle spins with a magnetic field is the mechanism behind <a href="https://en.wikipedia.org/wiki/Magnetic_resonance_imaging">Magnetic Resonance Imaging</a> (MRI).</p>
<h3 id="describing-the-initial-state">Describing the initial state</h3>
<p>Like any property of the quantum realm, spin isn’t limited to concrete values like up & down.
Particles can also be in a <em>superposition</em> of spin-up and spin-down.
Let’s cut past the handwaviness here; superposition has a precise mathematical definition, and that’s <em>linear combination</em>.
The state of our quantum system is expressed as a vector.
A spin-up particle has this state (the \( |\uparrow\rangle \) is using something called <a href="https://en.wikipedia.org/wiki/Bra%E2%80%93ket_notation">bra-ket notation</a>):
$$
|\uparrow \rangle =
\begin{bmatrix}
1 \\ 0
\end{bmatrix}
$$</p>
<p>A spin-down particle has this state:
$$
|\downarrow \rangle =
\begin{bmatrix}
0 \\ 1
\end{bmatrix}
$$</p>
<p>A particle in exactly equal superposition of spin-up and spin-down has this state (by convention called the \(|+\rangle\) state):
$$
|+\rangle =
\frac{1}{\sqrt{2}} \cdot
\begin{bmatrix}
1 \\ 0
\end{bmatrix}
+
\frac{1}{\sqrt{2}} \cdot
\begin{bmatrix}
0 \\ 1
\end{bmatrix} =
\begin{bmatrix}
\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}}
\end{bmatrix}
$$</p>
<p>The values in this vector are complex numbers called <em>amplitudes</em>.
The squares of the amplitudes’ absolute values must sum to one:
$$
\begin{bmatrix}
a \\ b
\end{bmatrix} : a,b \in \mathbb{C},
|a|^2 + |b|^2 = 1
$$</p>
<p>When we measure the spin of our particle (for example by firing it through a Stern-Gerlach apparatus), it <em>collapses</em> probabilistically to spin-up or spin-down.
The probability of collapsing to spin-up is given by the absolute-value-squared of the top vector value, and the probability of collapsing to spin-down is given by the absolute-value-squared of the bottom vector value.
So, a particle in the \(|+\rangle\) state has a 50/50 chance of collapsing to spin-up or spin-down:</p>
<p>$$
|+\rangle =
\begin{bmatrix}
\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}}
\end{bmatrix}
$$</p>
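<p>This rule is easy to check numerically. Here’s a small Python sketch (mine, not the post’s) computing the collapse probabilities for the \(|+\rangle\) state:</p>

```python
import math

# The |+> state: equal superposition of spin-up and spin-down.
plus = [1 / math.sqrt(2), 1 / math.sqrt(2)]

# Probability of collapse to each basis state = |amplitude|^2.
probs = [abs(a) ** 2 for a in plus]

assert all(math.isclose(p, 0.5) for p in probs)  # 50/50 chance
assert math.isclose(sum(probs), 1.0)             # state is normalized
```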
<p>For our one-electron system, let’s say our particle starts out in the spin-up state:
$$
|\uparrow \rangle =
\begin{bmatrix}
1 \\ 0
\end{bmatrix}
$$</p>
<p>A particle in a magnetic field will have its spin rotated around the state space, through different superpositions of spin-up and spin-down.
Our goal is to track how our particle’s spin changes over time.</p>
<h3 id="compiling-to-the-quantum-computer">Compiling to the quantum computer</h3>
<p>We have our Hamiltonian and initial state vector, but how do we translate them into a form understood by a quantum computer?
This is really the most complicated step of all; we want to take our Hamiltonian matrix and compile it into a series of quantum logic gates.</p>
<p><em>If you don’t know much about quantum logic gates or how a quantum computer works in general, I gave a talk aimed at computer scientists here (slides: <a href="https://ahelwer.ca/files/qc-for-cs.pdf">pdf</a>, <a href="https://ahelwer.ca/files/qc-for-cs.pptx">pptx</a>):</em></p>
<figure><a href="https://youtu.be/F_Riqjdh2oM" target="_blank"><img src="https://ahelwer.ca/img/common/quantum-video-preview.png"/></a>
</figure>
<p>First, the good news: no compilation is necessary for our state vector.
It works as-is.
Things are quite a bit different for our Hamiltonian, however.
We compile it by solving for \( U(t) \) in the following <a href="https://en.wikipedia.org/wiki/Hamiltonian_(quantum_mechanics)#Schr%C3%B6dinger_equation">variant of the Schrödinger equation</a>:
$$
U(t) = e^{-i H t}
$$
Where:</p>
<ul>
<li>\( U(t) \) is a matrix that takes time \( t \) as a parameter</li>
<li>\( e \) is <a href="https://en.wikipedia.org/wiki/E_(mathematical_constant)">Euler’s number</a></li>
<li>\( i \) is the imaginary number such that \( i^2 = -1 \)</li>
<li>\( H \) is our Hamiltonian</li>
<li>\( t \) is time (in seconds), a variable</li>
</ul>
<p>There’s one very strange thing about this equation: it has a matrix as an exponent!
What does that even mean?!
How can you raise something to the power of a matrix?
You can’t, at least not directly - you have to use an identity for \( e^x \) involving the <a href="https://en.wikipedia.org/wiki/Exponential_function#Computation">Taylor series</a>:</p>
<p>$$
e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \ldots
$$</p>
<p>Plugging our Hamiltonian \( H \) into that identity moves it out of the exponent and into the (somewhat) more comfortable waters of an infinite sum:</p>
<p>$$
U(t) = e^{-iHt} =
\sum_{n=0}^{\infty} \frac{(-iHt)^n}{n!} =
1 + (-iHt) + \frac{(-iHt)^2}{2!} + \frac{(-iHt)^3}{3!} + \ldots
$$</p>
<p>which expands to:</p>
<p>$$
U(t) =
1 - i(Ht) - \frac{(Ht)^2}{2!} + \frac{i(Ht)^3}{3!} + \frac{(Ht)^4}{4!} - \frac{i(Ht)^5}{5!} - \frac{(Ht)^6}{6!} \ldots
$$</p>
<p>Note that raising \(-i\) to successive powers follows the cycle \(1 \rightarrow -i \rightarrow -1 \rightarrow i \rightarrow 1\).
Note also that for \(H = \sigma_z\), \(H^2 = \mathbb{I}\), the identity matrix, which is equivalent to the scalar \(1\).
Thus the above series becomes:</p>
<p>$$
U(t) =
1 - iHt - \frac{t^2}{2!} + \frac{iHt^3}{3!} + \frac{t^4}{4!} - \frac{iHt^5}{5!} - \frac{t^6}{6!} \ldots
$$</p>
<p>To translate this into something we can use, we need the <a href="https://en.wikipedia.org/wiki/Trigonometric_functions#Power_series_expansion">Taylor series expansions for sine and cosine</a>:</p>
<p>$$
\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \ldots
$$</p>
<p>$$
\cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \ldots
$$</p>
<p>You can then extract \(\cos(t)\) from the \(U(t)\) series directly:</p>
<p>$$
U(t) = \cos(t) -iHt + \frac{iHt^3}{3!} - \frac{iHt^5}{5!} + \frac{iHt^7}{7!} + \ldots
$$</p>
<p>From which it’s easy to extract \(\sin(t)\):</p>
<p>$$
U(t) = \cos(t) - iH \left( t - \frac{t^3}{3!} + \frac{t^5}{5!} - \frac{t^7}{7!} + \ldots \right) = \cos(t) - iH\sin(t)
$$</p>
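<p>We can double-check this derivation numerically: truncate the Taylor series of \(e^{-iHt}\) for \(H = \sigma_z\) and compare it against the closed form \(\cos(t)\mathbb{I} - i\sin(t)H\). A Python sketch (the helper names are my own, not anything standard):</p>

```python
import math

sigma_z = [[1, 0], [0, -1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def scale(c, m):
    return [[c * x for x in row] for row in m]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def expm(h, t, terms=30):
    """Truncated Taylor series for e^{-iHt}: sum of (-iHt)^n / n!."""
    total = [[0, 0], [0, 0]]
    power = [[1, 0], [0, 1]]   # (-iHt)^0 = identity
    step = scale(-1j * t, h)   # the matrix -iHt
    for n in range(terms):
        total = add(total, scale(1 / math.factorial(n), power))
        power = matmul(power, step)
    return total

t = 3.0
u = expm(sigma_z, t)
# Closed form derived above: cos(t)*I - i*sin(t)*sigma_z.
expected = [[math.cos(t) - 1j * math.sin(t), 0],
            [0, math.cos(t) + 1j * math.sin(t)]]
for i in range(2):
    for j in range(2):
        assert abs(u[i][j] - expected[i][j]) < 1e-9
```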
<p>Almost there! Now multiply both sides of the equation by the identity matrix and expand \(H\) to get:</p>
<p>$$
\begin{bmatrix}1 & 0 \\ 0 & 1\end{bmatrix}U(t) = \begin{bmatrix}1 & 0 \\ 0 & 1\end{bmatrix} \left( \cos(t) - i \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}\sin(t) \right)
$$
$$
U(t) =
\begin{bmatrix}\cos(t) & 0 \\ 0 & \cos(t)\end{bmatrix} - \begin{bmatrix} i\sin(t) & 0 \\ 0 & -i\sin(t) \end{bmatrix}
$$
$$
U(t) = \begin{bmatrix}\cos(t) - i\sin(t) & 0 \\ 0 & \cos(t) + i\sin(t)\end{bmatrix}
$$</p>
<p>Recall the famous identity for Euler’s formula:</p>
<p>$$e^{ix} = \cos(x) + i\sin(x)$$</p>
<p>Plugging it in to our matrix, we finally have:</p>
<p>$$U(t) = e^{-iHt} = \begin{bmatrix} e^{-it} & 0 \\ 0 & e^{it} \end{bmatrix}$$</p>
<p>We successfully solved for \(U(t)\)!
I wish we could say we are done, but not quite.
\(U(t)\) must be the cumulative effect of our quantum logic circuit, but how do we actually build a circuit with that effect?
Fortunately here it’s very simple; there’s a fundamental quantum logic gate called the <a href="https://docs.microsoft.com/en-us/qsharp/api/qsharp/microsoft.quantum.intrinsic.rz?view=qsharp-preview">Rz gate</a> which almost exactly matches our \(U(t)\)! Here it is:</p>
<p>$$
Rz(\theta) =
\begin{bmatrix}
e^{-i \theta/2} & 0 \\
0 & e^{i \theta/2}
\end{bmatrix}
$$</p>
<p>At long last, we end up with our compiled Hamiltonian!</p>
<p>$$
U(t) = Rz(2t)
$$</p>
<p>Can you compile the Hamiltonian when it’s \(\sigma_x\) or \(\sigma_y\)?</p>
<h3 id="running-the-simulation">Running the simulation</h3>
<p>All the pieces are in place.
Flip the switch and let it rip!
Let’s see what our electron’s spin will be after, say, three seconds in the magnetic field:</p>
<p>$$
U(3)\begin{bmatrix} 1 \\ 0 \end{bmatrix} =
Rz(2 \cdot 3)\begin{bmatrix} 1 \\ 0 \end{bmatrix} =
\begin{bmatrix}
e^{-i 6/2} & 0 \\
0 & e^{i 6/2}
\end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} =
e^{-i3}\begin{bmatrix} 1 \\ 0 \end{bmatrix}
$$</p>
<p>Very exciting! I guess! What does this even mean?
Actually, we’ve stumbled upon a very interesting phenomenon; to understand it, you’ll need to understand <em>phase invariance</em>.
Basically, if two quantum states differ only by a <em>phase</em> (a scalar multiplier \(e^{i\theta}\)) then they are actually the exact same state.
Multiple ways of writing the same thing.
So, after three seconds in the magnetic field, our electron’s spin is <em>unchanged</em>.
You may think this is a letdown, but actually we’ve discovered something very important: an eigenstate of our system!</p>
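<p>Phase invariance is easy to see numerically. In this Python sketch (variable names mine), applying \(U(3)\) to \(|\uparrow\rangle\) multiplies the amplitudes by the global factor \(e^{-i3}\) only, so the measurement probabilities don’t budge:</p>

```python
import cmath, math

t = 3.0
# U(t) = diag(e^{-it}, e^{it}), the compiled Hamiltonian for H = sigma_z.
u = [[cmath.exp(-1j * t), 0], [0, cmath.exp(1j * t)]]

spin_up = [1, 0]
after = [sum(u[i][j] * spin_up[j] for j in range(2)) for i in range(2)]

# The new state is e^{-i3} * |up>: a global phase, nothing more.
assert abs(after[0] - cmath.exp(-3j)) < 1e-12 and abs(after[1]) < 1e-12
# Measurement probabilities are unchanged: still certainly spin-up.
assert math.isclose(abs(after[0]) ** 2, 1.0)
```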
<p>Eigenstates are the system’s stable states.
As an analogy, think of a compass in Earth’s magnetic field.
The compass spins freely until pointing toward magnetic north.
Once it points north, it’s in an eigenstate and won’t spin any further.
That’s the situation here, except our “compass” (the electron’s spin) was already pointing “north”.
Eigenstates are so-called because the state is an <em>eigenvector</em> of the Hamiltonian: multiplying the eigenvector by the Hamiltonian matrix is the same as just multiplying it by a scalar.
We have the useful and important theorem that if something is an eigenvector of our Hamiltonian \(H\), it’s also an eigenvector of \(U(t) = e^{-iHt}\) (and vice-versa).</p>
<p>Eigenstates are very important.
An electron in a specific atomic orbital is in an eigenstate.
A fully-folded protein is in an eigenstate.
Finding eigenstates is one of the main motivations for using Hamiltonian simulation.
We found an eigenstate by accident, but there are algorithms you can use to search for them on purpose.
Can you find a second eigenstate?</p>
<p>Enough about that - what about if we run our simulation on a start state that isn’t an eigenstate?
How about running it for \(\pi/2\) seconds on our 50/50 up-down superposition state?</p>
<p>$$
Rz(2 \pi/2)\begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix} =
\begin{bmatrix}
e^{-i \pi/2} & 0 \\
0 & e^{i \pi/2}
\end{bmatrix}
\begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix} =
\begin{bmatrix}
-i & 0 \\
0 & i
\end{bmatrix}
\begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix} =
-i \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{-1}{\sqrt{2}} \end{bmatrix}
$$</p>
<p>So our particle’s spin changes from the \(|+\rangle\) state to what’s called the \(|-\rangle\) state, picking up a global phase along the way. Fascinating!
Of course, in reality you can’t just peek at the quantum state vector; you must destructively measure it.
You can run this over and over to get a pretty good idea of the vector’s value, through a process called <a href="https://en.wikipedia.org/wiki/Quantum_tomography">quantum tomography</a>.</p>
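<p>The same calculation in a Python sketch (my own scaffolding, not anything from a real quantum toolchain): run \(Rz(\pi)\) on \(|+\rangle\) and confirm the result is \(|-\rangle\) up to a global phase of \(-i\):</p>

```python
import cmath, math

def rz(theta):
    """The Rz gate: diag(e^{-i theta/2}, e^{i theta/2})."""
    return [[cmath.exp(-1j * theta / 2), 0], [0, cmath.exp(1j * theta / 2)]]

s = 1 / math.sqrt(2)
plus = [s, s]
minus = [s, -s]

# Simulate for t = pi/2 seconds: U(t) = Rz(2t) = Rz(pi).
u = rz(math.pi)
after = [sum(u[i][j] * plus[j] for j in range(2)) for i in range(2)]

# Result should be -i * |->, i.e. |-> up to a global phase.
assert all(abs(after[i] - (-1j) * minus[i]) < 1e-12 for i in range(2))
```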
<h2 id="upping-the-ante">Upping the ante</h2>
<p>Puzzled readers might be wondering where the promised quantum speedup comes in.
It appears when the system you’re analyzing grows in size: if you’re analyzing a system with \(n\) properties (for example, \(n\) electrons each with their own spin) then the state vector is of size \(2^n\).
If the properties in your system don’t affect each other then you can usually factor the exponential-sized state vector into a product of smaller vectors, but once the properties start affecting one another (entanglement) you have to bite the bullet and keep the full \(2^n\)-sized vector around.
Quantum computers get this \(2^n\)-sized vector for free, simply because of how quantum mechanics works.</p>
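<p>To make the blow-up concrete: the joint state of \(n\) unentangled qbits is the tensor (Kronecker) product of the individual state vectors, which already has \(2^n\) entries. A Python sketch, with illustrative values of my choosing:</p>

```python
import math

def kron(u, v):
    """Tensor product of two state vectors."""
    return [a * b for a in u for b in v]

s = 1 / math.sqrt(2)
plus = [s, s]

# Joint state of n qbits, each in |+>: the vector doubles with every qbit.
n = 10
state = [1]
for _ in range(n):
    state = kron(state, plus)

assert len(state) == 2 ** n  # 1024 amplitudes for just 10 qbits
# The joint state is still normalized: squared amplitudes sum to 1.
assert math.isclose(sum(abs(a) ** 2 for a in state), 1.0)
```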
<h2 id="how-it-works-in-reality">How it works in reality</h2>
<p>Nobody ever stands at a chalkboard deriving the Hamiltonian of their system, outside of undergraduate physics classes.
The most complicated Hamiltonian you can derive by hand is probably that of elemental Hydrogen, so this is all done numerically; scientists create a model of their system in a program like <a href="https://nwchemgit.github.io/">NWChem</a>, then it sucks up a whole bunch of (classical) computing power before spitting out the Hamiltonian.
Similarly, nobody stands around writing down Taylor series expansions and doing weird algebraic tricks to compile their Hamiltonian; there are (very complicated) compilation methods that work given certain properties about your Hamiltonian (for example using <a href="https://docs.microsoft.com/en-us/qsharp/api/qsharp/microsoft.quantum.intrinsic.exp?view=qsharp-preview">Q#’s Exp operator</a>).
Both the process of deriving and compiling a system Hamiltonian are fast-moving, active areas of research.</p>
<h2 id="in-conclusion">In conclusion</h2>
<p>True credit to you if you made it this far, reader.
That was a whole lot of math just to get the most absolutely basic, hello-world type example out the door.
If you’re interested in learning more on this topic here are some resources I’ve found:</p>
<ul>
<li>The upcoming textbook <a href="https://www.manning.com/books/learn-quantum-computing-with-python-and-q-sharp"><em>Learn Quantum Computing with Python and Q#</em></a> by Sarah C. Kaiser and Christopher E. Granade has an entire chapter introducing quantum chemistry</li>
<li><a href="https://youtu.be/PerdRJ-offU">A talk</a> by Robin Kothari on Hamiltonian simulation from a pure computer science perspective</li>
<li>The review paper <a href="https://arxiv.org/abs/1808.10402"><em>Quantum computational chemistry</em></a> by McArdle et al. offers a significantly more technical overview of the material</li>
<li>Chris Kang has written <a href="https://christopherkang.me/blog/2020/12/24/qsharp-advent-calendar/">a post</a> examining how the Hamiltonian is itself derived, plus more detail on compiling the Hamiltonian into a series of gates</li>
</ul>
<p><em>Posted for the <a href="https://devblogs.microsoft.com/qsharp/q-advent-calendar-2019/">Q# Advent Calendar 2019</a></em></p>
<p><em>Credit to <a href="https://phys.washington.edu/people/nathan-wiebe">Nathan Wiebe</a> and <a href="https://www.cgranade.com/">Chris Granade</a> for patiently helping me with the basics of Hamiltonian Simulation.</em></p>
Walking the faster-than-light tightrope
https://ahelwer.ca/post/2018-12-07-chsh/
Fri, 07 Dec 2018 00:00:00 +0000https://ahelwer.ca/post/2018-12-07-chsh/<h2 id="measurement-and-signaling-in-the-nonlocal-world">Measurement and signaling in the nonlocal world</h2>
<p>Popular understanding of quantum mechanics usually focuses on three learning objectives:</p>
<ol>
<li>At small scales, particle properties (position, momentum, spin, etc.) are in <em>superposition</em> - they don’t have a definite value, but instead are “smeared” across multiple possible values.</li>
<li><em>Measuring</em> a superposed particle property makes it <em>collapse</em> probabilistically to a specific value. We don’t simply discover the property’s pre-existing value; rather the property is forced to take on a definite value by the act of measurement.</li>
<li>Particles can be <em>entangled</em>, which means operations on one affect the other instantaneously across arbitrarily-large distances (known as <em>nonlocality</em>); however, this has restrictions and cannot be used for faster-than-light (FTL) communication.</li>
</ol>
<figure><img src="https://ahelwer.ca/img/chsh/mob.jpeg"
alt="Enraged Bohmian Mechanics enthusiasts approach the comment section (Source: Wellcome Collection)"/><figcaption>
<p>Enraged Bohmian Mechanics enthusiasts approach the comment section (Source: <a href="https://wellcomecollection.org/">Wellcome Collection</a>)</p>
</figcaption>
</figure>
<p>This post focuses on the third point, specifically the part about FTL communication.
There’s something called the “no-communication theorem” or “no-signaling principle” which shows that it’s impossible for us to use a pair of entangled particles as a FTL communication channel (much to the chagrin of many, many works of science fiction).
Let’s be more precise: <em>communication</em> is a technical term which means I have some chosen bit (0 or 1) I can send to my counterpart on the other end of a channel.
The channel doesn’t have to be perfect: all that matters is the receiver can discern the sent bit with probability better than a coin flip.
A channel which enables the receiver to determine the correct bit only 51% of the time still involves communication, since we can re-send the same bit arbitrarily-many times to establish high levels of certainty of which bit was sent.
The no-communication theorem says entangled particles, while indeed affecting one another in a FTL way, can never beat a coin flip when determining which bit was sent.</p>
<p>The no-communication theorem is obviously disheartening, so let’s stay in the Denial stage for a bit and poke around.
Sending a full bit via a pair of entangled particles is, like many things humans want, too much to ask of the universe.
Maybe there’s a hidden consolation prize, though?
Clearly <em>some kind</em> of FTL interaction is taking place; surely it isn’t completely useless?
Generations of clever scientists have considered this problem, and come up with very interesting scenarios where entanglement gives us FTL… coordination?
Correlation?
It’s difficult to describe a phenomenon which isn’t communication, but it is <em>something</em>.
Here we’ll learn just what that <em>something</em> is.</p>
<h2 id="preliminaries-entanglement-and-measurement">Preliminaries: entanglement and measurement</h2>
<p><em>The rest of this post assumes basic familiarity with the bra-ket mathematical formalism of quantum computing; if you do not have this, you can watch a lecture I’ve created aimed at computer scientists here (slides: <a href="https://ahelwer.ca/files/qc-for-cs.pdf">pdf</a>, <a href="https://ahelwer.ca/files/qc-for-cs.pptx">pptx</a>):</em></p>
<figure><a href="https://youtu.be/F_Riqjdh2oM" target="_blank"><img src="https://ahelwer.ca/img/common/quantum-video-preview.png"/></a>
</figure>
<p>Review of the actual vector values of the common quantum states \(|0\rangle\), \(|1\rangle\), \(|+\rangle\), and \(|-\rangle\):
$$
|0\rangle = \begin{bmatrix} 1 \\ 0 \end{bmatrix},
|1\rangle = \begin{bmatrix} 0 \\ 1 \end{bmatrix},
|+\rangle = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix},
|-\rangle = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{-1}{\sqrt{2}} \end{bmatrix}
$$</p>
<p>Here we’ll establish some knowledge required for the main event.
Recall two qbits are entangled when you cannot factor their product state into the tensor product of two individual qbit states:</p>
<p>$$
|\Phi^+\rangle =
\begin{bmatrix} \frac{1}{\sqrt{2}} \\ 0 \\ 0 \\ \frac{1}{\sqrt{2}} \end{bmatrix} \neq
\begin{bmatrix} a \\ b \end{bmatrix} \otimes \begin{bmatrix} c \\ d \end{bmatrix}
$$
Try to factor this! You can’t; you get a system of four equations with no solution.</p>
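<p>One way to see the “no solution” claim in code: any product state \([a, b] \otimes [c, d] = [ac, ad, bc, bd]\) satisfies \(v_0 v_3 = v_1 v_2\) (both equal \(abcd\)), and \(|\Phi^+\rangle\) violates exactly that. A Python sketch, with hypothetical helper names:</p>

```python
import math

def kron(u, v):
    """Tensor product of two state vectors."""
    return [a * b for a in u for b in v]

def is_product_consistent(v):
    """Necessary condition for a 4-vector to factor as [a,b] (x) [c,d]:
    (ac)(bd) == (ad)(bc), i.e. v0*v3 == v1*v2."""
    return math.isclose(v[0] * v[3], v[1] * v[2])

s = 1 / math.sqrt(2)
phi_plus = [s, 0, 0, s]          # the entangled Bell state
product = kron([s, s], [1, 0])   # |+> (x) |0>, an unentangled state

assert is_product_consistent(product)
assert not is_product_consistent(phi_plus)  # 1/2 != 0: cannot be factored
```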
<p>If you were to measure this state in the computational basis (collapsing each qbit to \(|0\rangle\) or \(|1\rangle\)), it will only ever collapse to \(|00\rangle\) or \(|11\rangle\) with equal probability.
If you measure one of the qbits before the other, it <em>instantaneously</em> forces the other qbit into the same state as the first.
So if you measure one qbit and it collapses to \(|0\rangle\), you know the other qbit also collapsed to \(|0\rangle\).</p>
<p>The computational basis is not the only way of measuring qbits; you can also measure in another basis called the <em>sign basis</em>:</p>
<figure><img src="https://ahelwer.ca/img/chsh/1-classic-and-sign-basis.svg"
alt="The computational basis on the left, and the sign basis on the right. A qbit (itself a vector on the unit circle) measured in these bases collapses probabilistically to one of the basis vectors. This is the unit circle, with top element of the 2-vector as the x-coordinate and bottom element of the 2-vector as the y-coordinate." width="10000"/><figcaption>
<p>The computational basis on the left, and the sign basis on the right. A qbit (itself a vector on the unit circle) measured in these bases collapses probabilistically to one of the basis vectors. This is the unit circle, with top element of the 2-vector as the x-coordinate and bottom element of the 2-vector as the y-coordinate.</p>
</figcaption>
</figure>
<p>After measuring in the sign basis, your qbit will be in state \(|+\rangle\) or \(|-\rangle\) instead of \(|0\rangle\) or \(|1\rangle\).
These types of measurements are called <em>projective measurements</em>, because what we’re doing is projecting a quantum state onto one of the two measurement basis vectors:</p>
<figure><img src="https://ahelwer.ca/img/chsh/2-projection.svg"
alt="The same state projected onto the computational and sign bases. The length of the projection on a basis vector is proportional to the probability of collapsing to that basis vector when measuring in that basis." width="10000"/><figcaption>
<p>The same state projected onto the computational and sign bases. The length of the projection on a basis vector is proportional to the probability of collapsing to that basis vector when measuring in that basis.</p>
</figcaption>
</figure>
<p>In addition to the computational & sign bases, we can use <em>any pair of orthonormal vectors</em> as a basis for qbit measurement!</p>
<figure><img src="https://ahelwer.ca/img/chsh/3-other-orthonormal-bases.svg"
alt="Any two vectors of length \(1\) forming a right angle at the origin represent a valid measurement basis." width="10000"/><figcaption>
<p>Any two vectors of length \(1\) forming a right angle at the origin represent a valid measurement basis.</p>
</figcaption>
</figure>
<p>Something very curious happens if we measure our entangled qbit state in the sign basis - instead of collapsing to \(|00\rangle\) or \(|11\rangle\), it collapses to \(|++\rangle\) or \(|--\rangle\)!
In fact, regardless of the basis in which you measure, if one qbit collapses to a value then the other entangled qbit also collapses to that same value!
This is a property called <em>rotational invariance</em>, and we make much use of it below.</p>
<p>There’s one more detail to cover, and that’s how to calculate the probability of collapse in these other measurement bases.
A bit of high-school trigonometry does the trick:</p>
<figure><img src="https://ahelwer.ca/img/chsh/4-born-rule.svg"
alt="Measuring a quantum state in the sign basis. The state to be measured is \(|0\rangle\) rotated \(\pi/8\) radians counter-clockwise around the unit circle." width="10000"/><figcaption>
<p>Measuring a quantum state in the sign basis. The state to be measured is \(|0\rangle\) rotated \(\pi/8\) radians counter-clockwise around the unit circle.</p>
</figcaption>
</figure>
<p>The angle between the state vector and \(|+\rangle\) basis vector is \(\pi/8\) radians, and the angle between the state vector and \(|-\rangle\) basis vector is \(3\pi/8\) radians.
The distance from the origin to point \(x\) is given by \(\cos(3\pi/8)=0.38\), and the distance from the origin to point \(y\) is given by \(\sin(3\pi/8)=0.92\).
Square the absolute value of this distance to get the probability of collapse to that basis vector: \(0.85\) for \(|+\rangle\), and \(0.15\) for \(|-\rangle\).</p>
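<p>These numbers are easy to check with a few lines of Python. A small sketch using only the standard library, tracking the state and the measurement basis as angles on the unit circle:</p>

```python
import math

def collapse_probabilities(state_angle, basis_angle):
    """Born-rule probabilities of collapsing to each basis vector when a
    state at `state_angle` (radians from |0>) is measured in the basis
    whose first vector sits at `basis_angle`."""
    theta = state_angle - basis_angle  # angle to the first basis vector
    return math.cos(theta) ** 2, math.sin(theta) ** 2

# The state |0> rotated pi/8 counter-clockwise, measured in the sign
# basis (whose first vector, |+>, sits at pi/4 on the unit circle):
p_plus, p_minus = collapse_probabilities(math.pi / 8, math.pi / 4)
print(round(p_plus, 2), round(p_minus, 2))  # 0.85 0.15
```

<p>The two probabilities always sum to \(1\), as they must for a valid measurement.</p>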
<h2 id="a-curious-game">A curious game</h2>
<p>Consider a game involving two people, Alice and Bob. Alice is given a random bit \(X\) and Bob a random bit \(Y\).
Alice then outputs a chosen bit \(A\) and Bob a chosen bit \(B\).
Their objective?
Satisfy the logical formula \(X \cdot Y = A \oplus B\).
The catch? Alice and Bob cannot communicate.</p>
<p>Let’s break down this formula; here’s the truth table for \(X \cdot Y\):</p>
<table>
<thead>
<tr>
<th style="text-align:center">\(X\)</th>
<th style="text-align:center">\(Y\)</th>
<th style="text-align:center">\(X \cdot Y\)</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:center">0</td>
<td style="text-align:center">0</td>
<td style="text-align:center">0</td>
</tr>
<tr>
<td style="text-align:center">0</td>
<td style="text-align:center">1</td>
<td style="text-align:center">0</td>
</tr>
<tr>
<td style="text-align:center">1</td>
<td style="text-align:center">0</td>
<td style="text-align:center">0</td>
</tr>
<tr>
<td style="text-align:center">1</td>
<td style="text-align:center">1</td>
<td style="text-align:center">1</td>
</tr>
</tbody>
</table>
<p>Here’s the truth table for \(A \oplus B\):</p>
<table>
<thead>
<tr>
<th style="text-align:center">\(A\)</th>
<th style="text-align:center">\(B\)</th>
<th style="text-align:center">\(A \oplus B\)</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:center">0</td>
<td style="text-align:center">0</td>
<td style="text-align:center">0</td>
</tr>
<tr>
<td style="text-align:center">0</td>
<td style="text-align:center">1</td>
<td style="text-align:center">1</td>
</tr>
<tr>
<td style="text-align:center">1</td>
<td style="text-align:center">0</td>
<td style="text-align:center">1</td>
</tr>
<tr>
<td style="text-align:center">1</td>
<td style="text-align:center">1</td>
<td style="text-align:center">0</td>
</tr>
</tbody>
</table>
<p>Careful consideration of the above rules and truth tables gives rise to an important observation: Alice and Bob cannot possibly win the game every time.
All they can do is pick an approach which maximizes their probability of success.
The optimal classical strategy is quite simple: Alice and Bob both always output \(0\), regardless of their input values.
This gives them a 75% chance of winning; they’ll only lose in the case where both \(X\) and \(Y\) are \(1\).
Proof omitted, but this is the best possible classical strategy.</p>
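<p>That omitted proof is small enough to brute-force. A deterministic classical strategy is just a pair of functions from each player's input bit to their output bit; a Python sketch enumerates every such pair and confirms none wins more than three of the four input cases:</p>

```python
from itertools import product

def wins(a_strategy, b_strategy):
    """Count input pairs (x, y) on which x*y == a XOR b is satisfied."""
    return sum(
        (x & y) == (a_strategy[x] ^ b_strategy[y])
        for x, y in product((0, 1), repeat=2)
    )

# A deterministic strategy maps input bit -> output bit: (out_on_0, out_on_1).
strategies = list(product((0, 1), repeat=2))
best = max(wins(a, b) for a in strategies for b in strategies)
print(best)  # 3 -> at most 3 of 4 cases, i.e. a 75% success rate
```

<p>Randomized classical strategies can't help either: any mix of deterministic strategies wins with probability equal to a weighted average of their success rates, which is at most the best deterministic rate.</p>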
<p>What if we bend the rules a little so Alice and Bob are each in possession of one half of a two-qbit entangled quantum state in addition to their random input bit?
Something remarkable happens: a strategy exists which enables them to win 85% of the time instead of 75%!
A ten-percentage-point advantage doesn’t seem earth-shattering at first glance, but the implications are far-reaching (and exquisitely detailed in <a href="http://www.scholarpedia.org/article/Bell's_theorem">this Scholarpedia article</a>).
How does it work?</p>
<p>The high-level strategy has Alice and Bob measuring their entangled qbits in different bases depending on whether they’re given a \(0\) or a \(1\).
They use the results of those measurements to decide whether to output \(0\) or \(1\) themselves.
Amazingly, through clever choice of measurement bases we can ensure our \(\theta\) (the angle between the state vector and one of the measurement basis vectors) always equals \(\pi/8\) for the outcome we want.
What is \(\cos^2(\pi/8)\)?
Approximately \(0.85\), of course!</p>
<p>Let’s look at Alice’s strategy first:</p>
<table>
<thead>
<tr>
<th style="text-align:center">If Alice receives</th>
<th style="text-align:center">she measures in the</th>
<th style="text-align:center">and if she gets</th>
<th style="text-align:center">she outputs</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:center">0</td>
<td style="text-align:center">computational basis</td>
<td style="text-align:center">\(|0\rangle\)</td>
<td style="text-align:center">0</td>
</tr>
<tr>
<td style="text-align:center">0</td>
<td style="text-align:center">computational basis</td>
<td style="text-align:center">\(|1\rangle\)</td>
<td style="text-align:center">1</td>
</tr>
<tr>
<td style="text-align:center">1</td>
<td style="text-align:center">sign basis</td>
<td style="text-align:center">\(|+\rangle\)</td>
<td style="text-align:center">0</td>
</tr>
<tr>
<td style="text-align:center">1</td>
<td style="text-align:center">sign basis</td>
<td style="text-align:center">\(|-\rangle\)</td>
<td style="text-align:center">1</td>
</tr>
</tbody>
</table>
<p>Simple so far?
Bob’s strategy is a bit more complicated.
If he receives a \(0\), he measures in the computational basis rotated \(\pi/8\) radians counter-clockwise around the unit circle (henceforth called the \(\pi/8\) basis):</p>
<figure><img src="https://ahelwer.ca/img/chsh/5-bob-input-zero.svg"
alt="Bob&rsquo;s measurement basis given input \(0\)." width="10000"/><figcaption>
<p>Bob’s measurement basis given input \(0\).</p>
</figcaption>
</figure>
<p>If Bob receives a \(1\), he measures in the computational basis rotated \(\pi/8\) radians <em>clockwise</em> around the unit circle (henceforth called the \(-\pi/8\) basis):</p>
<figure><img src="https://ahelwer.ca/img/chsh/6-bob-input-one.svg"
alt="Bob&rsquo;s measurement basis given input \(1\)." width="10000"/><figcaption>
<p>Bob’s measurement basis given input \(1\).</p>
</figcaption>
</figure>
<p>As marked on the diagrams, Bob outputs a \(0\) if his measurement results in the vector closest to horizontal.
If it results in the vector closest to vertical, he outputs a \(1\).
Curiously, these strategies result in an 85% chance of satisfying the logical formula \(X \cdot Y = A \oplus B\)!
It also works regardless of who measures their qbit first, ensuring no communication is required between Alice and Bob.</p>
<p>Let’s take an example and see how it plays out.
Consider the case when both Alice and Bob receive \(1\) as input.
Recall this is the case that trips up the classical strategy - we want Alice OR Bob to output \(1\) here, but not both.
Say Alice measures her qbit first, in the sign basis since she received a \(1\) as input.
She has a 50% chance of measuring \(|+\rangle\), and a 50% chance of measuring \(|-\rangle\).
Whichever outcome she measured, Bob’s qbit is now also in that state:</p>
<figure><img src="https://ahelwer.ca/img/chsh/7-bob-after-alice-measurement.svg"
alt="The two possible states of Bob&rsquo;s qbit after Alice measures hers." width="10000"/><figcaption>
<p>The two possible states of Bob’s qbit after Alice measures hers.</p>
</figcaption>
</figure>
<p>Bob will then measure his qbit in the \(-\pi/8\) basis, since he also received a \(1\) as input:</p>
<figure><img src="https://ahelwer.ca/img/chsh/8-bob-projective-measurements.svg"
alt="Bob&rsquo;s measurement on his two possible states." width="10000"/><figcaption>
<p>Bob’s measurement on his two possible states.</p>
</figcaption>
</figure>
<p>Here’s how the cases break down:</p>
<table>
<thead>
<tr>
<th style="text-align:center">If Alice outputs</th>
<th style="text-align:center">then Bob’s qbit is</th>
<th style="text-align:center">so Bob outputs 0 with probability</th>
<th style="text-align:center">and 1 with probability</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:center">0</td>
<td style="text-align:center">\(|+\rangle\)</td>
<td style="text-align:center">\(\cos^2(3\pi/8)=0.15\)</td>
<td style="text-align:center">\(\sin^2(3\pi/8)=0.85\)</td>
</tr>
<tr>
<td style="text-align:center">1</td>
<td style="text-align:center">\(|-\rangle\)</td>
<td style="text-align:center">\(\cos^2(-\pi/8)=0.85\)</td>
<td style="text-align:center">\(\sin^2(-\pi/8)=0.15\)</td>
</tr>
</tbody>
</table>
<p>We see that in the case where Alice measured \(|+\rangle\) (and will thus output \(0\)) Bob has an 85% chance of outputting \(1\), and in the case where Alice measured \(|-\rangle\) (and will thus output \(1\)) Bob has an 85% chance of outputting \(0\).
So, there’s an 85% chance overall that exactly one of Alice and Bob will output \(1\), satisfying \(X \cdot Y = A \oplus B\)!
You can similarly analyze the cases where the inputs are \(00\), \(01\), and \(10\) (or when Bob measures before Alice) and see they always give an 85% chance of success.
Amazing!</p>
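<p>The case analysis above can be carried out for all four input pairs with nothing but the Born rule. A Python sketch modeling each measurement basis as an angle on the unit circle, and using rotational invariance (after Alice measures, Bob's qbit points along whichever basis vector Alice collapsed to):</p>

```python
import math

ALICE_BASIS = {0: 0.0, 1: math.pi / 4}         # input 0: computational; input 1: sign
BOB_BASIS = {0: math.pi / 8, 1: -math.pi / 8}  # input 0: pi/8 basis; input 1: -pi/8 basis

def win_probability(x, y):
    """Chance that x*y == a XOR b under the entangled-qbit strategy."""
    total = 0.0
    for a in (0, 1):
        # Alice collapses to her basis vector `a` with probability 1/2;
        # by rotational invariance Bob's qbit now points along that vector.
        bob_state = ALICE_BASIS[x] + a * math.pi / 2
        for b in (0, 1):
            theta = bob_state - (BOB_BASIS[y] + b * math.pi / 2)
            if (x & y) == (a ^ b):
                total += 0.5 * math.cos(theta) ** 2
    return total

for x in (0, 1):
    for y in (0, 1):
        print(x, y, round(win_probability(x, y), 4))  # 0.8536 in every case
```

<p>Every input pair comes out to \(\cos^2(\pi/8) \approx 0.85\), matching the hand calculation.</p>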
<h2 id="applications-and-implications">Applications and implications</h2>
<p>The game described above is called <em>the CHSH game</em>, after the initials of the physicists who first proposed it.
It’s a bit of a head-scratcher how we can actually make use of it.
We might consider two space admirals fighting separate space battles separated by light-years; the outcome of those battles is either a loss (input 0) or a win (input 1), and if they <em>both</em> win then <em>one of them</em> needs to proceed to a further objective (output 1) while the other returns home (output 0).
Or something. Or a <a href="https://youtu.be/_kLb1glm6EM">quantum twist on the prisoner’s dilemma</a>.
I’m sure others have put more thought & imagination into this than I have.</p>
<p>One concrete application of these <em>nonlocal games</em> as they’re called (there are others beyond the CHSH game; see <em><a href="https://arxiv.org/abs/quant-ph/0404076">Consequences and Limits of Nonlocal Strategies</a></em> by Cleve et al.) is <a href="https://en.wikipedia.org/wiki/Device-independent_quantum_cryptography">device-independent quantum cryptography;</a> see also <a href="https://quantumcomputing.stackexchange.com/q/4874/4153">this</a> QCSE question.
The basic idea is you’ll be able to execute cryptographic operations on untrusted quantum computers, by using nonlocal games to test that they’re honestly using quantum phenomena.</p>
<p>The implications of the CHSH game are extreme.
They form the core of Bell’s theorem, which has been called “the most profound discovery of science” and states that no physical theory of local hidden variables can ever reproduce all the predictions of quantum mechanics.
It’s all about locality, or rather how locality <em>isn’t true</em> - quantum entanglement really is faster-than-light, and doesn’t have any obvious medium through which it works!
Before Bell’s theorem, people suspected there was some mechanism whereby entangled particles decided how they would collapse at time of <em>entanglement</em>, not at time of <em>measurement</em>, then carried that decision (a “hidden variable”) with them until they were measured.
The CHSH game blows this theory out of the water.
Consider the following argument:</p>
<ol>
<li>If particles decide at time of entanglement how they will collapse, they must carry within them some information (the local hidden variable). This information can be represented as a string of classical bits.</li>
<li>Since the information is sufficient to completely describe the way in which the entangled qbits collapse, Alice and Bob could, if given access to that same string of classical bits, emulate the behavior of their qbits.</li>
<li>If Alice and Bob could emulate the behavior of their qbits, they could implement the quantum strategy with purely classical methods using the string of classical bits. Thus, there must exist some classical strategy giving an 85% success rate with some string of bits as input.</li>
<li>However, there exists no string of bits which enables a classical strategy with success rate above 75% (proof omitted, just take my word for it); by contradiction, the behavior of entangled particles is not reducible to a string of bits (local hidden variable) and thus the entangled particles must instantaneously affect one another at time of measurement.</li>
</ol>
<p>This isn’t just theoretical - the <a href="https://en.wikipedia.org/wiki/Bell_test_experiments">experiments have been done and the numbers are in</a>.
Nonlocality is fundamental to how our universe works. For a detailed treatment of this topic, see the <a href="http://www.scholarpedia.org/article/Bell%27s_theorem">Scholarpedia</a> or <a href="https://plato.stanford.edu/entries/bell-theorem/">SEP</a> articles.</p>
<p>In the end, the CHSH game is to me paradigmatic of the subtlety of quantum mechanics - and how exploring that subtlety yields rich rewards in a way no other field seems to match.</p>
<h2 id="implementation-in-q">Implementation in Q#</h2>
<p>It’s easy to distrust abstract reasoning, especially in unintuitive realms like probability; unless you’re a hardened mathematician, you want to see the results!
To that end, I wrote up the CHSH game in <a href="https://www.microsoft.com/en-us/quantum/development-kit">Q#, Microsoft’s quantum language</a>.
The game is played 10,000 times with random input bits given to Alice and Bob (plus another random bit controlling who measures first) to see whether they really win 85% of the time vs. 75% of the time with the classical strategy.
Lo and behold:</p>
<pre><code>PS C:\Users\ahelwer\source\quantum-experiments\chsh> dotnet run
Classical success rate: 0.7474
Quantum success rate: 0.8592
SPOOKY
</code></pre><p>Really should have published this blog post on/around Halloween.</p>
<p>You can find the source code <a href="https://github.com/ahelwer/quantum-experiments/tree/master/CHSH">here</a>.
The code is very simple except for one thing: measurement.
Here’s how we specify measurement in common bases like the computational or sign bases:</p>
<pre><code>// Two equivalent methods of measuring in the computational basis
Measure([PauliZ], [qubit]);
M(qubit);
// Measuring in the sign basis
Measure([PauliX], [qubit]);
</code></pre><p>The PauliZ and PauliX parameters refer to the <a href="https://en.wikipedia.org/wiki/Pauli_matrices">Pauli matrices</a>:</p>
<p>$$
\sigma_z = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix},
\sigma_x = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
$$</p>
<p>It seems odd to specify measurement bases with matrices, but the two bases we want here (the computational and sign bases) are the <em>eigenvectors</em> of these matrices.
Recall an eigenvector of a matrix is a vector which, when multiplied by the matrix, is changed only by a constant multiplicative factor (called an <em>eigenvalue</em>):</p>
<p>$$
\sigma_z|0\rangle =
\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} =
\begin{bmatrix} 1 \\ 0 \end{bmatrix} =
1|0\rangle
$$
$$
\sigma_z|1\rangle =
\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} =
\begin{bmatrix} 0 \\ -1 \end{bmatrix} =
-1|1\rangle
$$
$$
\sigma_x|+\rangle =
\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix} =
\begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix} =
1|+\rangle
$$
$$
\sigma_x|-\rangle =
\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{-1}{\sqrt{2}} \end{bmatrix} =
\begin{bmatrix} \frac{-1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix} =
-1|-\rangle
$$</p>
<p>It is thus convenient to identify common measurement bases in this way; we call the matrices <em>observables</em>.
You can think of these matrices as corresponding in some way to the measurement device, which forces the quantum state to collapse to one of its eigenvectors and displays the eigenvalue to the user - perhaps deflecting a needle in the positive direction if the state collapsed to the eigenvector with eigenvalue 1, and deflecting a needle in the negative direction if the state collapsed to the eigenvector with eigenvalue -1.</p>
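<p>The four eigenvector equations above can be verified mechanically. A Python sketch, using plain nested lists rather than any linear algebra library:</p>

```python
import math

def matvec(m, v):
    """Multiply a 2x2 matrix (row-major nested lists) by a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

sigma_z = [[1, 0], [0, -1]]
sigma_x = [[0, 1], [1, 0]]
s = 1 / math.sqrt(2)
ket0, ket1 = [1, 0], [0, 1]
ket_plus, ket_minus = [s, s], [s, -s]

assert matvec(sigma_z, ket0) == [c * 1 for c in ket0]      # eigenvalue +1
assert matvec(sigma_z, ket1) == [c * -1 for c in ket1]     # eigenvalue -1
assert matvec(sigma_x, ket_plus) == [c * 1 for c in ket_plus]
assert matvec(sigma_x, ket_minus) == [c * -1 for c in ket_minus]
print("all four eigenvector equations hold")
```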
<p>Things are slightly more complicated when measuring in Bob’s nonstandard bases.
To measure in the π/8 basis, we do the following:</p>
<ol>
<li>Rotate the quantum state π/8 radians <em>clockwise</em> around the unit circle</li>
<li>Measure in the computational basis</li>
</ol>
<p>This is a bit strange; here’s what we’re doing graphically:</p>
<figure><img src="https://ahelwer.ca/img/chsh/9-qsharp-positive-basis.svg"
alt="On the left is what we want to do; on the right is how we actually do it." width="10000"/><figcaption>
<p>On the left is what we want to do; on the right is how we actually do it.</p>
</figcaption>
</figure>
<p>This method is weird, but it works. To measure in the -π/8 basis, we do something similar:</p>
<ol>
<li>Rotate the quantum state π/8 radians <em>counter-clockwise</em> around the unit circle</li>
<li>Measure in the computational basis</li>
</ol>
<p>Again, here’s the graphical representation of what we’re doing:</p>
<figure><img src="https://ahelwer.ca/img/chsh/10-qsharp-negative-basis.svg"
alt="On the left is what we want to do; on the right is how we actually do it." width="10000"/><figcaption>
<p>On the left is what we want to do; on the right is how we actually do it.</p>
</figcaption>
</figure>
<p>In terms of actual Q# code, we write this as follows:</p>
<pre><code>// Measure in π/8 basis
let rotation = -2.0 * PI() / 8.0;
Ry(rotation, qubit);
return M(qubit);
// Measure in -π/8 basis
let rotation = 2.0 * PI() / 8.0;
Ry(rotation, qubit);
return M(qubit);
</code></pre><p>We use <a href="https://docs.microsoft.com/en-us/qsharp/api/qsharp/microsoft.quantum.intrinsic.ry?view=qsharp-preview">the Ry operator</a> to rotate the quantum state in the unit circle plane.</p>
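<p>The rotate-then-measure trick is easy to sanity-check numerically: rotating the state \(\pi/8\) clockwise and measuring in the computational basis should yield the same probabilities as measuring the original state directly in the \(\pi/8\) basis. A Python sketch (pure standard library; angles are tracked directly rather than through Ry matrices):</p>

```python
import math

def measure_probs(state_angle, basis_angle):
    """Born-rule probabilities of the two outcomes when a state at
    `state_angle` is measured in the basis at `basis_angle`."""
    theta = state_angle - basis_angle
    return math.cos(theta) ** 2, math.sin(theta) ** 2

state = math.pi / 4  # say, the |+> state

# What we want: measure directly in the pi/8 basis.
direct = measure_probs(state, math.pi / 8)

# What the Q# code does: rotate the state pi/8 clockwise, then
# measure in the computational basis.
rotated = measure_probs(state - math.pi / 8, 0.0)

assert all(abs(d - r) < 1e-12 for d, r in zip(direct, rotated))
print("direct and rotate-then-measure probabilities agree")
```

<p>The same identity holds for any state angle, since both procedures depend only on the difference between the state and basis angles.</p>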
<p>Credit to <a href="https://twitter.com/tcNickolas">Mariia Mykhailova</a> from the Microsoft Quantum team for <a href="https://quantumcomputing.stackexchange.com/q/4223/4153">explaining how to do this</a>!</p>
<h2 id="resources-and-further-reading">Resources and further reading</h2>
<ul>
<li>
<p>Dr. Umesh Vazirani’s <em>Quantum Mechanics & Quantum Computation</em> lectures [<a href="https://youtu.be/WP41P6fnGOY">1</a>, <a href="https://youtu.be/kET99ApqYKU">2</a>, <a href="https://youtu.be/r2oI2lF8wgw">3</a>, <a href="https://youtu.be/ErLGu8YuS6U">4</a>]</p>
</li>
<li>
<p><a href="https://quantumcomputing.stackexchange.com/">Quantum Computing Stack Exchange</a> - very friendly to newcomers!</p>
</li>
<li>
<p><em>Consequences and Limits of Nonlocal Strategies</em> by Richard Cleve, Peter Høyer, Ben Toner, and John Watrous [<a href="https://arxiv.org/abs/quant-ph/0404076v2">arxiv</a>]</p>
</li>
<li>
<p><em>Bell nonlocality</em> by Nicolas Brunner, Daniel Cavalcanti, Stefano Pironio, Valerio Scarani, and Stephanie Wehner [<a href="https://arxiv.org/abs/1303.2849">arxiv</a>]</p>
</li>
</ul>
<p><em>This post is part of the 2018 Q# Advent Calendar; see <a href="https://blogs.msdn.microsoft.com/visualstudio/2018/11/15/q-advent-calendar-2018/">here</a> for more!</em></p>
<p><em>Thanks to <a href="https://www.linkedin.com/in/samtgoodman/">Sam Goodman</a> and <a href="https://twitter.com/tcNickolas">Mariia Mykhailova</a> for their tremendous efforts editing this post.</em></p>
Checking Firewall Equivalence with Z3
https://ahelwer.ca/post/2018-02-13-z3-firewall/
Tue, 13 Feb 2018 00:00:00 +0000https://ahelwer.ca/post/2018-02-13-z3-firewall/<p>Lessons I’ve learned from software engineering are uniformly cynical:</p>
<ul>
<li>Abstraction almost always fails; you can’t build something on top of a system without understanding how that system works.</li>
<li>Bleeding-edge methods are a recipe for disaster.</li>
<li>Everything good is hype and you’ll only ever get a small fraction of the utility being promised.</li>
</ul>
<p>Imagine my surprise, then, when the Z3 constraint solver from Microsoft Research effortlessly dispatched the thorniest technical problem I’ve been given in my short professional career.</p>
<h2 id="the-problem">The Problem</h2>
<p>Microsoft Azure has a <em>lot</em> of computers in its datacenters - on the order of millions.
For security, each of these computers is configured with a firewall which accepts communication from a comparatively small set of authorized servers.
These firewalls aren’t created by hand - they’re automatically generated during deployment.
We wanted to update the firewalls from a confused overlapping whitelist/blacklist system to a simple whitelist.
Any change in this domain carries substantial risk:</p>
<ul>
<li>Accidentally allowing connections from computers which should be blocked, a significant security issue.</li>
<li>Accidentally blocking connections from computers which should be allowed, a significant availability issue.</li>
</ul>
<p>Thus we wanted strong guarantees that firewalls generated with the new method blocked & allowed the exact same connections as firewalls generated with the old method.
This is very difficult; the naive solution of checking all 2^80 packets against both firewalls would take a computer 38 million years to finish at a brisk rate of one billion packets per second!
There’s another way: give the problem to the Z3 theorem prover from Microsoft Research, and it checks equivalence in a fraction of a second. How?</p>
<h2 id="indistinguishable-from-magic">Indistinguishable from magic</h2>
<p>Z3 is variously described as a theorem prover, SMT solver, and constraint solver.
I like to think of it as an Oracle.
Let’s think - if we had access to an Oracle, what question would we ask to solve the firewall equivalence problem?
First: we require an understanding of firewalls and packets.</p>
<p>Every piece of information sent over the network is encapsulated in a packet.
Like a proper piece of correspondence, packets contain two important pieces of information: where they came from, and where they’re going.
We’ll say each address is a single number, like 50 or 167.
So, the packet [23, 75] came from source address 23 and is heading to destination address 75 (in real life these numbers are IPv4 or IPv6 addresses, but these are just [very large] numbers and so the simplification works).</p>
<p>Firewalls are lists of rules saying which packets to block and which to allow.
Rules are expressed in terms of source and destination address ranges, plus a decision - block or allow.
We say a packet <em>matches</em> a rule if the packet’s source address is in the rule’s source range and destination in the destination range, in which case the decision is applied to that packet.
For example, we can write a rule to block any packets originating in the address range 100-150 and headed to an address in 60-70.
This rule would block the packet [125, 65].</p>
<p>Rules can overlap.
If a packet matches both a block and allow rule, the block rule ‘wins’ and the packet is blocked.
If a packet doesn’t match any rules, it is blocked by default.
A packet only gets through if it matches at least one allow rule and zero block rules.</p>
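<p>Before translating any of this to Z3, it can help to see the semantics as ordinary code. A Python sketch of the model (the rule format here is my own invention for illustration: each rule is a tuple of source range, destination range, and an allow flag):</p>

```python
def matches(rule, packet):
    """True if the packet's src/dst fall within the rule's ranges."""
    src_lo, src_hi, dst_lo, dst_hi, _ = rule
    src, dst = packet
    return src_lo <= src <= src_hi and dst_lo <= dst <= dst_hi

def allows(firewall, packet):
    """A packet gets through only if it matches at least one allow rule
    and zero block rules; block wins ties, and the default is block."""
    matched = [rule for rule in firewall if matches(rule, packet)]
    if any(not allow for *_, allow in matched):
        return False  # matched a block rule: block wins
    return any(allow for *_, allow in matched)

firewall = [
    (100, 150, 60, 70, False),  # block rule from the example above
    (0, 255, 0, 255, True),     # allow everything else
]
print(allows(firewall, (125, 65)))  # False: hits the block rule
print(allows(firewall, (23, 75)))   # True: matches only the allow rule
```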
<p>Let’s return to the question of the question.
What should we ask?
I submit the following: Oracle, what is a packet blocked by one firewall but allowed by the other?
If the Oracle replies there is no such packet, we know the firewalls are equivalent (hurrah!).
If it replies with an example of such a packet, we know the firewalls are not equal and have a really great lead on figuring out <em>why</em> they aren’t equal.</p>
<p>Z3, for all its amazing capabilities, can’t understand queries in plain English.
The problem now becomes stating our question in a form understood by Z3: first-order logic.</p>
<h2 id="the-right-question">The right question</h2>
<p>First-order logic is not scary.
We require only two logical operators: <em>and</em> and <em>not</em>.
To ask our question of Z3, we must do three things:</p>
<ol>
<li>Tell Z3 what a packet is.</li>
<li>Tell Z3 what a firewall is.</li>
<li>Tell Z3 we want to find a packet blocked by one firewall but allowed by the other.</li>
</ol>
<p>Z3 works with popular programming languages Java, C#, C++, and Python, but for simplicity we’ll use its native language.
You can follow along on the Z3 web demo here: <a href="http://rise4fun.com/Z3">http://rise4fun.com/Z3</a></p>
<p>The first task is easy.
Our simple packets have two fields: source, and destination.
We describe this to Z3 by declaring integer constants <em>src</em> and <em>dst</em>.
Z3’s mission is to find values for these constants - once all the wiring is in place, their values tell us a packet accepted by one firewall but not the other.
Here’s how you declare the constants in Z3:</p>
<pre><code>(declare-const src Int)
(declare-const dst Int)
</code></pre><p>The second task is the real meat of the problem: tell Z3 what a firewall is.
First, let’s define what it means for a packet to match a rule:</p>
<pre><code>(define-fun matches ((srcLower Int) (srcUpper Int) (dstLower Int) (dstUpper Int)) Bool
(and (<= srcLower src) (<= src srcUpper) (<= dstLower dst) (<= dst dstUpper))
)
</code></pre><p>This function is <em>true</em> if <em>src</em> is in the rule’s source address range and <em>dst</em> is in the rule’s destination range.
Otherwise it is <em>false</em>.</p>
<p>Now we define what it means for a firewall to accept or block a packet.
Let’s use a simple firewall with two rules, an allow rule and a block rule.
The firewall function returns true if the packet is allowed, and false if it is blocked.
Here’s how we state this to Z3, using the match function defined above:</p>
<pre><code>(define-fun firewall1 () Bool
(and
(matches 0 10 20 30)
(not (matches 5 10 25 30))
)
)
</code></pre><p>Z3 is now a firewall expert. On to the third task!</p>
<h2 id="satisfaction">Satisfaction</h2>
<p>The third task is to actually verify firewall equivalence. First, define a second firewall so we have something to check:</p>
<pre><code>(define-fun firewall2 () Bool
(and
(matches 1 10 20 30)
(not (matches 5 10 25 30))
)
)
</code></pre><p>It’s time! We have everything we need.
Let’s ask Z3 the question - what is a packet blocked by one firewall but allowed by the other?</p>
<pre><code>(assert (not (= firewall1 firewall2)))
(check-sat)
(get-model)
</code></pre><p>Click the run button in the web demo and… boom!
Z3 finds us a packet - for me, [0, 20] - that is accepted by firewall1 but blocked by firewall2.
This works for any two firewalls!
All we have to do is change the contents of the firewall1 and firewall2 functions.</p>
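<p>For this toy address space, you can cross-check Z3's answer by brute force. A Python sketch replaying the two example firewalls (addresses confined to 0-40, so exhaustive search is instant):</p>

```python
def matches(src, dst, src_lo, src_hi, dst_lo, dst_hi):
    return src_lo <= src <= src_hi and dst_lo <= dst <= dst_hi

def firewall1(src, dst):
    return matches(src, dst, 0, 10, 20, 30) and not matches(src, dst, 5, 10, 25, 30)

def firewall2(src, dst):
    return matches(src, dst, 1, 10, 20, 30) and not matches(src, dst, 5, 10, 25, 30)

# Search for every packet on which the two firewalls disagree.
witnesses = [(s, d) for s in range(41) for d in range(41)
             if firewall1(s, d) != firewall2(s, d)]
print(witnesses)  # packets with src 0 and dst 20 through 30
```

<p>The disagreement set is exactly the packets with source \(0\) and destination in \(20\)-\(30\) - which includes the [0, 20] witness Z3 reported. Of course, brute force only works because the toy address space is tiny; Z3's value is finding such witnesses over the full \(2^{80}\) space.</p>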
<p>This all seems a bit magical, so let’s break down the last step.
First, we assert the two firewalls are <em>not</em> equivalent.
Then we ask Z3 to check this assertion with the check-sat instruction!
This has two possible outcomes:</p>
<ol>
<li>The firewalls are <em>not</em> equivalent: check-sat returns <em>satisfiable</em>, and the get-model instruction provides a packet demonstrating firewall inequivalence.</li>
<li>The firewalls <em>are</em> equivalent: check-sat returns <em>unsatisfiable</em> and no packet is produced.</li>
</ol>
<p>Either way, we have our answer.
Z3 ruthlessly tracks down values of <em>src</em> and <em>dst</em> representing a packet accepted by one firewall but not the other.
This is very fast: clever logic manipulation rules enable Z3 to process 300-rule firewalls in a fraction of a second.</p>
<h2 id="in-the-real-world">In the real world</h2>
<p>Real packets don’t exactly correspond to our model.
Instead of simple numbers, they use IPv4 or IPv6 source & destination addresses, port numbers, and protocol numbers.
Z3 handles these with no real changes to the core logic; Z3 bitvectors are a drop-in replacement type for the address numbers in our model.
The actual firewall-checking code used inside Azure has been open-sourced, and is available <a href="https://github.com/Z3Prover/FirewallChecker">here</a>.</p>
<h2 id="beyond-the-firewall">Beyond the firewall</h2>
<p>This problem hardly taxes Z3’s abilities - its repertoire extends to nonlinear constraints over the real numbers.
For all that expansive power, Z3 significantly decreased problem complexity compared to other approaches.
The code was simple to write and easy to understand.
If you’re facing a thorny problem that seems like it could be stated in terms of satisfiability, I very much recommend giving Z3 a try.</p>
<p><em>For an in-depth whitepaper on this topic, see</em> “<a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/nbjorner-icdcit2015.pdf">Checking Cloud Contracts in Microsoft Azure</a>” <em>by Nikolaj Bjørner and Karthick Jayaraman.</em></p>
Formal Verification, Casually Explained
https://ahelwer.ca/post/2018-02-12-formal-verification/
Mon, 12 Feb 2018 00:00:00 +0000https://ahelwer.ca/post/2018-02-12-formal-verification/<h2 id="why-are-we-here">Why are we here?</h2>
<p>What guarantees does formal verification provide?
This question rests at the apex of a hierarchy of inquiry extending all the way down to how we can know anything at all!</p>
<h2 id="what-do-we-mean-by-software-correctness">What do we mean by software correctness?</h2>
<p>There are precisely two different ways for a piece of software to be correct:</p>
<ul>
<li>The supreme deity of the universe descends from the heavens and decrees, with all the weight of Objective Truth, that a certain piece of software is correct.</li>
<li>We have a list of things we want the software to do, and use logic to prove the software does these things.</li>
</ul>
<p>Crafty readers will come up with, like, twenty different caveats to the second definition.
I encourage it! Take a minute to unearth some of the hidden assumptions. Good? Good.
I’ll talk about those hidden assumptions in greater detail below, but the point I want to make is this: <em>there is no such thing as inherently correct software</em>.
Your wacky program could say 1 * 1 = 2, and that isn’t incorrect in any objective sense - if that’s how you want your program to behave, heck, knock yourself out.
We call our list of things we want the software to do a <em>specification</em>.
A program is only correct (or incorrect) relative to its specification; if the program is correct relative to its specification, we say it <em>implements</em> or <em>refines</em> the specification.</p>
<h2 id="a-door-guard-to-the-law">A door guard to The Law</h2>
<figure><img src="https://ahelwer.ca/img/formal-verification/tao-spec.PNG"/>
</figure>
<p>It seems like we’re just kicking the can down the road.
We now have the means to say whether a piece of software is correct, but how do we know the specification itself is correct?
There are two ways. One: write an even higher-level specification and prove our original spec implements it (recursing as many times as you like). Two: admit that the root of truth here is an idea germinated within a fallible human brain, transcribed to spec through fallible cognitive processes, with much lost in translation during transit over the fundamentally unbridgeable lacuna between mind and material world.
This is obviously depressing, but take heart!
Writing specifications is far better than the alternative, which is getting right down to business and spewing out a bunch of code that vaguely accomplishes whatever you remember your good idea to be.
Clearly a horrifying practice not to be attempted except as an exercise in learning the symbol keys on your unlabeled mechanical keyboard.
Furthermore, formal specification systems have numerous tools to perform sanity checks on your spec, from syntactic analysis to finite model checking or even formal proofs of desired properties.</p>
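Finite model checking, in particular, can be sketched in a few lines: enumerate every reachable state of the system and confirm an invariant holds in each one. This is a toy explicit-state checker of my own devising (real tools like TLC are far more capable), run on the classic hour-clock example.

```python
from collections import deque

def model_check(init_states, next_states, invariant):
    """Breadth-first search over all reachable states. Returns a state
    violating the invariant if one is reachable, else None."""
    seen = set(init_states)
    frontier = deque(init_states)
    while frontier:
        s = frontier.popleft()
        if not invariant(s):
            return s
        for t in next_states(s):
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return None

# Toy spec: an hour clock. States are integers; the sole action ticks
# the clock forward by one hour, wrapping at 24.
violation = model_check(
    init_states=[0],
    next_states=lambda h: [(h + 1) % 24],
    invariant=lambda h: 0 <= h < 24,
)
assert violation is None  # every reachable state satisfies the invariant
```

Because the state space here is finite, the check is a genuine proof of the invariant for this model; the catch is that real systems' state spaces explode combinatorially.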
<h2 id="tracing-cracks-in-the-edifice">Tracing cracks in the edifice</h2>
<p>Let’s further exercise our radical doubt against this chain of truth we’re trying to build.
First, as we saw above, this chain is fundamentally without an anchor; epistemologically unsatisfying, but better than the alternative.
Second, proving a program implements the spec (or that a lower-level spec refines a higher-level one) - how can we be sure the proof itself is without error?
Third, how do we account for all the bizarre behavior of the real world: meteorites hitting servers, the arbiter problem, overly-aggressive compiler optimizations, ship anchors cutting undersea cables, and obscure floating point arithmetic errors in old Intel CPUs?
Let’s talk about the third issue first; you can think of it as the far end of the proof chain, where the rubber hits the road and the program executes on a real computer shuffling real electric potentials around.
Unfortunately, this end is as unmoored as the other. All we can do is present an idealized model of the world and prove our program implements the spec under the assumptions of the model.
You can make this model as detailed as you like, all the way down to the electric potentials level (or beyond!) but with software it usually stops above the hardware level and safely assumes all is well below.</p>
<p>Since humans are endlessly fallible, even and especially in mathematical proof, we want a program to do the heavy lifting when proving implementation.
There’s the obvious question: how do we know this verifier program is itself correct?
It seems everywhere we turn it’s turtles all the way down.
Now, many computer scientists will be familiar with the concept of a bootstrapping compiler: a compiler which compiles itself.
Why, then, could a verifier not verify itself?
This is a very good question, and those possessing a little math knowledge will respond with some vaguely-articulated Gödelian objection.
I’m intuitively skeptical of this objection, but very interested in what obstacles arise when bootstrapping a verifier; perhaps a future blog post will detail such an attempt.
Suffice it to say we can construct a verifier (also called a proof assistant or proof system) which provides extremely strong guarantees about whether our implementation proofs are correct.</p>
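To make "a program doing the heavy lifting" concrete: in a proof assistant such as Lean, every proof is a term checked mechanically by a small trusted kernel. A trivial sketch, revisiting the post's 1 * 1 = 2 example:

```lean
-- Lean verifies the (correct) counterpart of the post's example by
-- definitional computation: 1 * 1 reduces to 1, so `rfl` suffices.
theorem one_times_one : 1 * 1 = 1 := rfl

-- The false claim `theorem bad : 1 * 1 = 2 := rfl` fails to type-check,
-- so no proof of it can be constructed this way.
```

The trust then concentrates in the kernel, which is small enough to audit by hand - a much better bargain than trusting every hand-written proof.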
<h2 id="a-low-bar-to-clear">A low bar to clear</h2>
<p>After spending all that time tearing apart our foundations, let’s pause to consider the goal of formal verification.
We want very strong guarantees our software is correct, yes, but why?
Because correct software provides greater utility than incorrect software.
How do we ensure the correctness of conventional software?
Code review, automated testing, and real-world use.
It’s easy to see these provide significantly weaker guarantees of correctness.
For formal verification, we need only three weak assumptions for our program to be guaranteed correct:</p>
<ul>
<li>Our top-level specification accurately corresponds to our idea.</li>
<li>If our implementation proof contains errors, they are caught by our verifier program.</li>
<li>If an event occurs in the real world, its effect on our program is captured by our model.</li>
</ul>
<p>Contrast this with a selection of assumptions required for conventional software correctness:</p>
<ul>
<li>A specification exists, and all programmers have the same understanding of the specification.</li>
<li>Validation performed by automated tests conforms to specification requirements.</li>
<li>If the specification is changed, the code & tests are changed, and vice versa.</li>
<li>Automated tests exercise every combination of code path and variable assignment (lol).</li>
<li>Automated tests do not contain bugs which incorrectly pass a failing condition.</li>
<li>The compiler does not contain bugs in its translation to executable code.</li>
</ul>
<p>And so on.
While formal verification may not be fully philosophically satisfying, it accomplishes our goal of writing correct software to a degree unimaginable in conventional software development.</p>
<h2 id="all-our-enterprise-brought-to-ruin">All our enterprise brought to ruin</h2>
<p>Now let’s get practical.
Can we formally verify a piece of software, today?
How difficult is it?
How much does it cost?
As of late 2016 (ed. note: now early 2018), these don’t all have pleasant answers.
Academics say formal verification is ready for prime time; this isn’t necessarily untrue, but the economics don’t favor widespread industry use just yet.
The most illustrative case study for large-scale state-of-the-art formally-verified software development is <a href="https://www.usenix.org/node/186162">Project Ironclad</a> from Microsoft Research.
The team created a full formally-verified software stack from scratch, plus apps; they saw an overhead of five lines of verification code for every line of actual code.
Project costs were informally estimated at an order of magnitude higher than for equivalent conventional software.
This is within striking distance!
Formal verification is now a tooling problem - we need reliable pre-check-in validation tools supporting fast partial verification, human-friendly error messages when the verifier is confused, and (hardest of all) smarter verifiers which don’t require as much help.
I expect economics to favor formal verification within a decade, strongly so when factoring in maintenance costs.</p>
<h2 id="specifics">Specifics</h2>
<p>You have an idea in your head.
You’re sold on formal verification.
In which language should your spec be written?
I’ll argue your first spec should be in English (or your informal tongue of choice).
Writing, as they say, is nature’s way of letting you know how sloppy your thinking is.
Mathematics, in turn, is nature’s way of letting you know how sloppy your writing is - and your second specification will be written in the language of mathematics.
So, <a href="https://en.wikipedia.org/wiki/TLA%2B">TLA+</a>.</p>
<p>Then what?
TLA+ unfortunately lacks the ability to recursively refine your spec down to the level of executable code.
At some point it gets close enough to make the hop yourself, but this leaves a distasteful gap in our free-floating chain of perfection.
The <a href="https://github.com/Microsoft/dafny/wiki/INSTALL">Dafny</a> language (also from Microsoft Research) is billed as your one-stop shop for specification & verification, but Project Ironclad (and the extension Project Ironfleet) seems to be its only large-scale application.
<a href="https://www.fstar-lang.org/">F*</a> deserves mention, if only for someone to finally get around to writing its Wikipedia article.</p>
<p>Nothing has yet emerged as an obviously dominant solution.
To me this is exciting; there is an obvious gap that, with the right backing and community, could give rise to something new.
Something special.</p>
<h2 id="formal-specification-and-machine-learning">Formal specification and machine learning</h2>
<p>Nobody cares about this</p>
<h2 id="conclusion">Conclusion</h2>
<p>If you enjoyed reading this you should go research the DAO hack and write an opinionated analysis.</p>
<p><em>Credit to <a href="https://twitter.com/Hillelogram">@hillelogram</a> for encouraging me to just publish this, 1.5 years later</em></p>
About me
https://ahelwer.ca/page/about/
Mon, 01 Jan 0001 00:00:00 +0000https://ahelwer.ca/page/about/<p>My name is Andrew Helwer, and I’m a software engineering consultant specializing in formal methods & distributed systems.
I have nearly a decade of industry experience, having worked for Microsoft Quantum, Microsoft Azure, and Acceleware.
I have a BSc in computer science from the University of Calgary, and am primarily interested in distributed systems, formal methods, and quantum computing.
I’m also a TLA+ enthusiast!
You can <a href="https://www.linkedin.com/in/ahelwer/">contact me on LinkedIn</a>, or email <a href="mailto:ahelwer@disjunctive.llc">ahelwer@disjunctive.llc</a> for consulting-related inquiries.
My personal email is <a href="mailto:ahelwer@protonmail.com">ahelwer@protonmail.com</a>.</p>
<p>Currently I am based in Atlanta, GA.</p>