Willow Remembers What You Did Last Summer
Let's talk about noise
A surface code has one job. Watch a qubit. Notice if something goes wrong. Tell you about it before the wrongness compounds into a logical error. The whole apparatus, the millions of dollars in dilution refrigerators, the years of fabrication, the decoder running on its own dedicated workstation, exists to keep a quantum bit alive a little longer than it would otherwise stay.
The thing the surface code most fears is a string. A line of errors stitched end to end across the code patch, sliding past detection until it has eaten through the encoding and flipped the logical state. The threshold theorems say you can prevent these strings from forming, provided the errors that make them up arrive independently. Errors at different times on different qubits, drawn from independent coin flips. Independent coins? Exponentially rare strings. Correlated coins? These are fine so long as the correlations decay fast enough with time and distance. If they decay slowly, the math falls over, and the qubit is useless.
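To make "independent coins, exponentially rare strings" concrete, here is a minimal sketch using a toy stand-in: a repetition code decoded by majority vote, not a surface code. It assumes every bit flips independently with the same probability p, which is a hypothetical parameter chosen for illustration. Under that assumption the failure probability is an exact binomial tail, and it shrinks rapidly as the code grows.

```python
from math import comb


def majority_vote_failure(distance: int, p: float) -> float:
    """Probability that more than half of `distance` independent bits flip.

    Toy model only: a repetition code with majority-vote decoding and
    independent, identically distributed bit flips of probability p.
    """
    if distance < 1 or distance % 2 == 0:
        raise ValueError("distance must be a positive odd integer")
    first_failing_count = distance // 2 + 1
    return sum(
        comb(distance, k) * p**k * (1 - p) ** (distance - k)
        for k in range(first_failing_count, distance + 1)
    )


if __name__ == "__main__":
    p = 0.01  # hypothetical per-bit flip probability
    for d in (3, 5, 7, 9, 11):
        print(d, majority_vote_failure(d, p))
```

Each step up in distance multiplies the failure probability by a roughly constant small factor, which is the exponential suppression the theorems rely on. The independence assumption is doing all the work here; nothing in this sketch survives if the flips arrive together.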
For thirty years the field has been treating the chip’s coins as fast-decaying enough. Guess what, the chip has been making other plans.
In October 2024 five authors from CSIRO, Monash, and the University of Melbourne posted a paper to arXiv with the title “Detrimental non-Markovian errors for surface code memory.” The first author is a graduate student named John Kam. The senior author, Muhammad Usman, runs the Quantum Systems group at CSIRO. The paper is twenty pages long, contains hundreds of Monte Carlo simulations, each at ten million shots, and arrives at one finding stated three or four different ways for emphasis: if the noise on a syndrome qubit is correlated in time, in a manner the authors call streaky, the surface code loses its threshold.
Streaky is a technical term. It means an error that lasts. The syndrome qubit gets flipped for some number of rounds, then is left alone, then later something flips it again for another stretch. Streaks. The authors test what happens when those streaks have a polynomial tail. Two errors a few rounds apart are common; two errors many rounds apart are rare, though more common than independence would predict. This is the kind of noise that rain on a tin roof makes, where the loudest drops are the slowest to fall. Cosmic rays produce it. A coupler holding a stray excitation produces it. A two-level system drifting in and out of resonance with a qubit produces it. Which is to say: the noise a real superconducting chip has, as opposed to the noise the threshold theorems were proved against.
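A rough way to see what "streaky" means is to generate such a sequence and measure how long it remembers itself. The sketch below is a toy and not the noise model from the paper: it assumes streak lengths drawn from a Pareto distribution (a polynomial tail) and quiet gaps drawn from an exponential distribution, with `alpha` and `mean_gap` as hypothetical parameters. It then compares the streaky sequence against a shuffled copy, which has exactly the same number of errors but no memory.

```python
import random


def streaky_series(n, alpha, mean_gap, rng):
    """Binary error record: quiet gaps interrupted by heavy-tailed streaks.

    Toy assumptions: gap lengths are exponential with mean `mean_gap`,
    streak lengths are Pareto(alpha) rounded down, so always at least 1.
    """
    series = []
    while len(series) < n:
        gap = int(rng.expovariate(1.0 / mean_gap))
        series.extend([0] * gap)
        streak = int(rng.paretovariate(alpha))
        series.extend([1] * streak)
    return series[:n]


def autocorrelation(series, lag):
    """Normalized autocorrelation of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    if var == 0:
        return 0.0
    cov = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    ) / (n - lag)
    return cov / var


if __name__ == "__main__":
    rng = random.Random(0)
    streaky = streaky_series(100_000, alpha=1.5, mean_gap=50.0, rng=rng)
    shuffled = streaky[:]
    rng.shuffle(shuffled)  # same error count, temporal structure destroyed
    for lag in (1, 5, 20, 100):
        print(lag, autocorrelation(streaky, lag), autocorrelation(shuffled, lag))
```

The shuffled copy is the independent-coins world: its autocorrelation hovers near zero at every lag. The streaky record keeps a positive correlation many rounds out, even though both sequences have the same per-round error rate.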
Under streaky correlations on syndrome qubits, the authors find, the logical error rate at distance fifteen is fifty-eight times worse than the Markovian model predicts. Same per-qubit error rate. Only the temporal correlation changes. Fifty-eight. At distance fifteen, the curve stops being exponential. The independent-noise version of the same chip would reach a one-in-a-trillion error rate — the level any useful quantum algorithm would need — at distance 37. The correlated version has no realistic distance projection at all. A bigger code still helps, but never enough.
The threshold theorems promised that bigger codes always make things better. They were proved against a different noise model.
Now turn to Nature, volume 638, page 920, February 2025. Two hundred and eighty-five authors. Google Quantum AI and Collaborators, “Quantum error correction below the surface code threshold.” This is the paper that made Willow famous and got into the Wall Street Journal. Distance-3, distance-5, distance-7 surface codes. Logical error rate cut roughly in half each time the distance climbs by two. The team calls it Λ ≈ 2.14. They show the curves. Extremely good curves.
Had the paper ended on page 922, it would be a clean victory.
It didn’t. On page 923 the team reports something else. They ran a distance-29 repetition code for five and a half hours, twenty billion cycles in total, expecting the logical error rate to keep going down as they made the code bigger. It went down beautifully, for a while. Then around distance fifteen it stopped. A floor at 10⁻¹⁰ per cycle. Below that floor the suppression dies. The team examined the data to figure out what was holding them up and identified two kinds of correlated events. The first: a single detector spikes, decays over hundreds of cycles, hypothesized to be a two-level system or a coupler doing something it shouldn’t. The second, and this is the one to slow down for: six events in five and a half hours, each one lighting up roughly thirty qubits at once, decaying exponentially with a time constant of 369 microseconds, of unknown origin. Quote, page 923: “We do not yet understand the cause of these events.”
Six times in almost six hours, the chip throws a small tantrum involving a third of its qubits, and the team that built the chip can only describe what they saw. They had ruled out cosmic rays already, having engineered against those, and these new events are three orders of magnitude rarer. Whatever the events are, they sit outside every model the team has.
This is the kind of finding that gets buried under its own framing. A team of almost three hundred people, employed by the largest technology company in the world, running an experiment for five and a half wall-clock hours on the most carefully engineered superconducting chip humans have ever built, watches their own apparatus produce six anomalous events whose mechanism remains opaque. The events are reported in a peer-reviewed Nature paper. The paper that explains the mechanism behind them has yet to appear from anyone.
This is roughly the experimental physics equivalent of finding a goat in your kitchen. Elsewhere we have discussed the fact that the corporate announcement that followed the finding declared the kitchen goat-free.
The Kam-Usman paper and the Google Willow paper, posted to arXiv ten weeks apart in the autumn of 2024, are the simulation and the experiment of the same phenomenon. Kam and colleagues did the simulation. The Willow team produced the data. The simulation matched what the data showed: temporally correlated bursts on syndrome qubits produce a noise floor at high distance.
Kam and colleagues, who are honest scientists, stay inside their simulations and say what those show. The Willow team, who are also honest scientists, stay inside their measurements and say what they measured. Both are correct on their own terms. Put together, the two papers tell a story neither tells alone: the chip has a memory, the theorems assume otherwise, and the difference between those two assumptions becomes a wall somewhere between distance eleven and distance fifteen.
Why does the chip have a memory? Because it is a physical object embedded in another physical object that is itself embedded in the universe. A qubit lives in a cavity which sits on a chip in a copper can in a dilution refrigerator in a room. Rooms contain cosmic ray flux, vibrations from the building’s HVAC, and the kind of stray noise that turns a delivery truck idling outside the building into a measurement problem. Every one of these layers has a timescale. Two-level systems in the chip substrate diffuse on milliseconds to seconds. Phonon modes ring out over tens of microseconds. Quasiparticle bursts from a cosmic ray decay over tens of milliseconds. Whatever causes the six-events-per-six-hours mystery decays over 369 microseconds.
The first threshold theorem was proved in 1996 against an idealized noise model in which all these timescales collapse to zero. The bath forgets between gates. Robert Alicki and a small group of colleagues argued in 2001 and again in 2006 that this idealization is physically out of bounds, since thermodynamics will refuse it. More theorems were proved, extending the threshold to cases where the correlations decay sufficiently fast, and these allowed the community to seek funding for the engineering. The skeptics’ point, that the decay rate of correlations in a real chip remained unmeasured, was filed away.
The Willow paper measured it. The Kam-Usman paper computed the consequences numerically: the surface code, as currently designed, hits a wall at the scale of the algorithms everyone actually wants to run.
The industry is well aware of the correlated noise problem. The psychology of the industry may be a topic for study in its own right. One of the industry's major theorists, Scott Aaronson, has just given an interview to the Superposition Guy’s podcast (yes, there is such a thing) that was reproduced on the Quantum Insider webpage, in which he describes another famous quantum computing skeptic, Gil Kalai, as clinging to “some principle of correlated noise that comes on top of quantum mechanics and somehow screens off quantum computation.” But just two paragraphs down, Aaronson hedges: “If this is all that’s going on — simple uncorrelated noise — then quantum error correction is going to work. It’s merely a staggeringly hard engineering problem to build this at the scale where it works.”
The Willow “below threshold” experiment told us, as early as December 2024, that this is not all that’s going on. The theorist is speaking, on the record, 18 months later. The problem is there. The community simply refuses to deal with it.
I have been following the field since 1993, my first year in college. Willow is a truly amazing piece of technical work, the kind I thought I might not be lucky enough to see in my lifetime, and the engineering is outstanding. Willow’s logical qubit lives more than twice as long as the best physical qubit in the chip that hosts it. Distance-7 produces 0.143% logical error per cycle, an actual measurement on an actual machine, performed by an actual team that has spent the better part of two decades getting good at making transmons behave. Within the regime the threshold theorem describes, things have improved dramatically and continue to improve. All of this stands.
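For a sense of what the measured numbers imply if nothing changes, here is a back-of-the-envelope sketch. It takes the two figures quoted in this piece, Λ ≈ 2.14 and a distance-7 logical error of 0.143% per cycle, and assumes, purely for illustration, that Λ stays constant at every larger distance. The function names are hypothetical, and the constant-Λ assumption is the whole premise of the calculation, not an established fact.

```python
LAMBDA = 2.14         # error suppression per distance step of two, as quoted
EPS_D7 = 0.143 / 100  # logical error per cycle at distance 7, as quoted


def projected_error(distance, lam=LAMBDA, eps7=EPS_D7):
    """Naive projection: divide the error by lam for every +2 in distance."""
    return eps7 / lam ** ((distance - 7) / 2)


def distance_for_target(target, lam=LAMBDA, eps7=EPS_D7):
    """Smallest odd distance >= 7 whose projected error is at or below target."""
    if target <= 0 or lam <= 1:
        raise ValueError("need target > 0 and lam > 1 for the search to end")
    d = 7
    while projected_error(d, lam, eps7) > target:
        d += 2
    return d


if __name__ == "__main__":
    for target in (1e-6, 1e-9, 1e-12):
        print(target, distance_for_target(target))
```

Under these assumptions a target of 10⁻⁶ per cycle lands at distance 27. Every number this projection produces beyond the measured distances is only as good as the assumption that the suppression factor never changes.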
The same paper reports, on page 923, that the regime the threshold theorems describe ends somewhere around distance fifteen. Beyond that distance there is a structure of correlated noise the theorem was built without, and which a different theorem, if anyone proves it, will need to absorb. Simulations have been done. Data have been produced. The theorem is, at this writing, still being applied at scales where the data and the simulation both indicate it has stopped applying.
Depending on your temperament, this is either an exciting open problem or a very expensive bug. The next round of error-correction papers will tell us which. My own bet is that the field will respond by developing decoders aware of streak correlations, a perfectly reasonable response, and that for some span of years these decoders will produce thresholds that look like they have closed the issue. Beneath the decoder, the actual physics will continue to be whatever it is and do whatever it does. The wall will move back. Whether it falls down depends on whether anyone goes and looks at the world around the qubit.
Don’t hold your breath.
And now, a bedtime story you can read to your kids:
The 105 Piglets
Once upon a time, on a hill above a quiet valley, beneath a single sycamore tree, there lived 53 piglets. Each piglet had one job. Hold a secret. The secret was a yes or a no. They could think about both, but when the Farmer asked them, they had to choose. Just one. Hold it tight.
The piglets were young and the wind was strong. Every time a leaf fell, every time a door slammed in the farmhouse, every time a star a billion years away thought of exploding — the piglets flinched and forgot their secret.
So the Farmer built them little houses, in groups of three, then five, then seven. “Look at your neighbor,” he said. “If you forget, your neighbor will remind you.” The piglets nodded. The Farmer nodded. The investors, who had come up the hill in shiny boots and stood below the sycamore tree, nodded hardest of all, and gave the Farmer lots of money.
It worked. One piglet would flinch, six would whisper, and the secret would be saved. With the money he got, the Farmer bought more piglets, more groups, built more houses. So many houses that they left the sycamore tree and reached a willow grove nearby. 105 piglets now, all on the hill, all whispering their secrets to one another, all safe from the falling leaves.
The Farmer did not know about the Wolf.
Well, not quite. He heard about it when he was your age, but he never saw one. So he thought there were none, or maybe he forgot, distracted as he was with building more houses for the piglets.
The Wolf does not eat piglets, not in this story at least. He is vegetarian. The Wolf does not knock houses down. The Wolf sometimes drives a truck three valleys over. The Wolf sometimes travels on a beam of light no one can see, that has been traveling since before there were hills, to arrive, this Tuesday, at exactly the wrong moment. The Wolf leans in close, very close, and whispers, very softly, to the piglets:
Forget
and thirty piglets, in five different houses on the same side of the hill, all forget the same secret at the same instant.
Their neighbors cannot remind them. Their neighbors forget too.
One summer night in August 2024, the 105 piglets were living on the richest farm, on the most carefully shielded hillside ever built, surrounded by the willows. For five and a half hours, the piglets remembered. And then, halfway through, a Wolf showed itself on the hillside. The Farmer scratched his head. He could not understand. Before the Wolf appeared, the bigger the group of piglets, the better they remembered. After that moment, the bigger groups still remembered better, but not nearly as much better as they used to. And with the Wolf on the hillside, after five and a half hours, no matter how big the group, the piglets forgot everything.
The Farmer told his Boss about everything. By then it was already winter. The Farmer’s Boss spoke on the news. He was really happy. He said the piglets remembered. Everyone clapped. He forgot to mention the Wolf. Instead he said something that no one understood, about many farms and many universes. The investors cheered and wrote more checks to the farm.
That same morning, the Farmer wrote a note to the other farmers in the valley. He told them about the Wolf. He said he didn't know yet where it came from, but he was sure they could figure it out together. They are still trying, or maybe not.
Five weeks before the Farmer’s Boss spoke on the news, a young shepherd in Melbourne (that’s in Australia), watching from a neighboring hill with four friends, had drawn the Wolf on the back of a napkin, asking very simple questions that the Farmer and all the other farmers in the valley should have asked.
What if every whisper brings more Wolves?
What if one whisper makes the next whisper louder?
And the young shepherd was right: the Wolf showed itself on the hillside exactly where the napkin said it would.
A year and a half later, the young shepherd drew the hillside on a napkin again. He found a place where the Wolf stands and the wind suddenly changes direction. Before it, bigger groups of piglets remember better. After it, bigger groups forget faster.
No one in the valley has spoken about that place.
Here is where this bedtime story ends, my dear, because the grown-ups have not finished it yet. The Wolf is still on the hill, making the piglets forget. Some grown-ups are pretending the Wolf is a rumor, because their careers depend on the Wolf being a little further away than it is. Other grown-ups, quieter ones, have gone outside to listen to the wind.
Sleep tight, my love. The Wolf has a name in the grown-up books. They call it correlated noise. The dangerous kind, the kind that is on the shepherd’s napkin, the kind that streaks — one forgetting pulling the next forgetting along with it, like the wind rattling the leaves of the willow trees on the hillside.
Now you know more than the investors.
Goodnight.





