expectation-field replay
When I first started building topicspace, the idea was not to generate more questions. It was to maintain continuity of understanding in a changing field. The expectation-field replay is the first version of that idea that actually feels like the original concept.
A lot of systems either reset their view every day, lock onto one thesis and quietly drift, or accumulate notes without showing how the underlying understanding changed. What I wanted instead was a system that could make its expectations explicit, carry them forward through time, and show when they weakened, split, or were retired.
The setup is simple. At T0, the system forms an initial expectation from the field as it exists that day. Then it walks forward day by day, updating that expectation deterministically against the evolving field. If the structural support holds, the expectation stays alive. If support weakens, it enters a weakening state. If the field clearly splits, narrower child expectations can spawn. If support breaks badly enough for long enough, the expectation is retired. The important point is that the system is no longer just producing daily prose. It is maintaining an explicit expectation object with a trajectory.
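The lifecycle described above can be sketched as a small state machine. Everything here (class names, thresholds, the retire rule) is an illustrative assumption, not topicspace's actual implementation:

```python
from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    ACTIVE = "active"
    WEAKENING = "weakening"
    RETIRED = "retired"

@dataclass
class Expectation:
    handle: str          # stable id, e.g. "E-001"
    claim: str           # the forward claim formed at T0
    status: Status = Status.ACTIVE
    weak_days: int = 0   # consecutive days below the support threshold

def step(exp, support, split_detected, weaken_at=0.5, retire_after=10):
    """One daily, deterministic update against the field.

    `support` is the day's structural support in [0, 1]; the thresholds
    are placeholders, not the real ones. Returns any spawned children."""
    if exp.status is Status.RETIRED:
        return []                       # retired expectations stay retired
    if support < weaken_at:
        exp.weak_days += 1
        exp.status = (Status.RETIRED if exp.weak_days >= retire_after
                      else Status.WEAKENING)
    else:
        exp.weak_days = 0
        exp.status = Status.ACTIVE
    children = []
    if split_detected and exp.status is not Status.RETIRED:
        # A clear split in the field spawns a narrower child expectation.
        children.append(Expectation(exp.handle + ".child", "narrower claim"))
    return children
```

The point of the sketch is only that the object carries state forward: weakening accumulates, recovery resets it, and retirement is a consequence of sustained broken support rather than a one-day event.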
The two expectations being tracked
The replay runs against tracked expectations that topicspace already maintains. Each expectation has a stable handle, a cohort of actors it covers, and at any point in time a forward claim the system has formed about what the field implies next. The two families I ran this experiment on are the ones that have produced the most structural movement over the last six months.
E-001 · AI hardware continues to outperform software.
15 actors, mostly chip/server/infrastructure: NVDA, AMD, SMCI, DELL, INTC, ARM, AVGO, TSM, ASML, MU, MRVL, VRT, ANET, VST, CEG. The expectation tracks whether the hardware/software bifurcation measured in the R-001 case study is holding, widening, or reversing.
E-001 · T0 expectation · 2025-11-19
Prediction: the hardware/software gap will persist, with hardware showing early bullish signs and software facing bearish pressures.
Grounded in: bullish signals from ASML, AVGO, CEG, VRT, and VST; software under pressure with CRM showing downside confirmation; mixed overall outlook with a slight bullish tilt.
E-002 · The loud divergence cluster does not resolve downward.
A cohort of names where narrative pressure has been positive but price has been weak — the “story not being paid” cluster: ANET, VST, NFLX, TTD, MSFT, META, PLTR, CEG, plus two recent additions (ZETA, MELI) not yet in the backtest history. The expectation tracks whether this divergence eventually resolves with price moving down to meet the narrative, or whether the narrative deflates instead.
E-002 · T0 expectation · 2025-11-19
Prediction: the divergence will not resolve downward in the near term; the narrative-price gap persists rather than snapping shut.
Grounded in: a neutral to slightly bullish field — most cohort members in a neutral state and two showing bullish trends, with no clear downward shift in the narrative.
These are the two objects whose evolution the replay tracks. Each starts as one expectation. Over the next 61 trading days each gets re-checked daily against the structural field, can spawn narrower child expectations when the field splits, and can weaken or be retired when the structure no longer supports it.
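That daily walk can be written as a minimal replay driver. The evaluator hooks here are stand-ins I've assumed for illustration: `support_fn(exp, day)` returns structural support in [0, 1], and `split_fn(exp, day)` returns any narrower children the field implies that day. Neither is a real topicspace interface, and the thresholds are invented:

```python
def replay(parent, days, support_fn, split_fn, weaken_at=0.5, retire_after=10):
    """Walk forward day by day, re-checking every live expectation.

    Returns a flat event log of (day, expectation id, state)."""
    live = {parent["id"]: parent}
    events = []
    for day in days:
        for exp in list(live.values()):     # snapshot: children start next day
            if support_fn(exp, day) < weaken_at:
                exp["weak_days"] = exp.get("weak_days", 0) + 1
                state = ("retired" if exp["weak_days"] >= retire_after
                         else "weakened")
            else:
                exp["weak_days"] = 0
                state = "stable"
            if state != "retired":
                for child in split_fn(exp, day):
                    live[child["id"]] = child
                    state = "split"
            events.append((day, exp["id"], state))
            if state == "retired":
                del live[exp["id"]]         # retired expectations leave the walk
    return events
```

The event log is the trajectory: one row per live expectation per trading day, which is exactly the shape the lifecycle summaries below are computed from.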
What a single trajectory looks like
I ran this replay first on one family — the hardware/software gap (E-001) — over 61 trading days from November 19, 2025 to February 18, 2026. The result was not a noise cloud. It was one initial expectation, six child expectations, and a readable lifecycle.
Here is what the parent expectation lived through, one cell per trading day:
Stable days made up 44% of the window, weakened 33%, strengthened 17%, split 4%, and retired 3%. That is much closer to evolving understanding than to repeated rephrasing. The parent stayed alive through the full window, weakened in January, and partially recovered. Five of the six children were retired; one survived in weakening status at the end.
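Those shares fall straight out of a day-per-cell trajectory. The counts below are illustrative, chosen only to roughly approximate the reported mix over a 61-day window, not taken from the actual run:

```python
from collections import Counter

# One state per trading day (illustrative counts, not the actual run).
days = (["stable"] * 27 + ["weakened"] * 20 + ["strengthened"] * 10
        + ["split"] * 2 + ["retired"] * 2)          # 61 trading days

# Percentage share of each state across the window.
shares = {state: round(100 * n / len(days))
          for state, n in Counter(days).items()}
print(shares)
```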
Broad expectations are durable; narrow ones are mostly noise
The expectation graph for E-001 makes the difference between the parent and its children visible. The parent runs the full window. Most children spawn, struggle, and fade.
Read down the list and a pattern surfaces: nearly every split was a slightly different attempt to name “the subset that is more bullish than the rest of hardware.” The system kept proposing narrower sub-cohorts of the same underlying claim — energy-and-infrastructure leadership, custom-silicon leadership, memory-and-server leadership — and the field refused to sustain any of those distinctions for long. The parent (hardware vs software) was real. The children were over-refinements.
That same pattern showed up across both families. In a forward-correctness overlay, the initial T0 expectations of both major families held up much better than the narrower derived children. E-001’s parent survived 60 days with weakening status but stayed alive. E-002’s parent survived 60 days as active and never weakened materially. The broadest framings the system formed from the full field were the most stable objects in the whole replay.
The narrower children fared much worse. Across E-001 and E-002, nine children were born and seven were retired within the replay windows. That matches the earlier candidate backtests: the system is good at seeing structural differences, but many of the narrower splits it can name do not remain durable for long. The replay makes that visible in a cleaner way than the question graph ever did. The parent holds the thread; the children are attempts at refinement, and most of those refinements fail.
The two families behave differently
Comparing the two parent expectations side by side shows that the hardware/software family is structurally noisier than the divergence-cluster family. The parent for E-002 spent more days stable, fewer days weakened, and survived the full window without ever materially weakening.
This is the kind of structural difference the replay can show that the daily briefing cannot. The two families look superficially similar on any given day, but their trajectories are markedly different. E-001 is an expectation whose underlying field legitimately changes a lot — it splits often, weakens often, recovers often. E-002 is more cohesive.
Weakening is an early-warning signal, not a label
A third finding is that weakening is not just a label; it behaves like an early-warning signal. For expectations that were eventually retired, the weakened event preceded retirement by a measurable margin: about 19.8 trading days on average in E-001 and 13.5 in E-002 — roughly two to three trading weeks, and the pattern held across both families and both starting dates.
That does not make weakening a perfect predictor. The observed lead-time distribution ranged from a few days to several weeks. But it does suggest the system is doing more than labelling a retirement after the fact. It is picking up deterioration before the expectation fully breaks.
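The lead-time measurement itself is simple once you have an event log. The events and day indices below are made up for illustration; only the shape of the computation is the point:

```python
# Illustrative event log: (expectation id, event, trading-day index).
events = [
    ("E-001.c1", "weakened", 12), ("E-001.c1", "retired", 31),
    ("E-001.c2", "weakened", 20), ("E-001.c2", "retired", 41),
]

first_weakened = {}
lead_times = []
for exp_id, kind, day in events:
    if kind == "weakened":
        # Keep only the FIRST weakened event per expectation.
        first_weakened.setdefault(exp_id, day)
    elif kind == "retired" and exp_id in first_weakened:
        lead_times.append(day - first_weakened[exp_id])

avg_lead = sum(lead_times) / len(lead_times)   # in trading days
```

Measuring from the first weakened event is a deliberate choice: it asks how early the earliest warning arrived, not how long the final decline took.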
The pattern held at a second starting date
The most important question after the first run was whether these results were artifacts of one T0. I reran the replay starting February 10, 2026 for both families. The broad patterns held.
| run | born | durable | weakened | retired | incomplete | % durable @ 30D |
|---|---|---|---|---|---|---|
| E-001 · T0 Nov 19 | 7 | 1 | 1 | 0 | 5 | 50% |
| E-001 · T0 Feb 10 | 9 | 2 | 1 | 0 | 6 | 67% |
| E-002 · T0 Nov 19 | 4 | 1 | 0 | 0 | 3 | 100% |
| E-002 · T0 Feb 10 | 3 | 0 | 0 | 0 | 2 | 0% |
E-001 consistently produced more expectations than E-002 across both T0s, which suggests the hardware/software family is structurally noisier while the divergence-cluster family is more cohesive. All four parent expectations across the two families and two T0s survived their replay windows alive. The children remained much noisier than the parents.
The architecture-level result appears reproducible: the system can maintain broad expectation continuity over time, but its narrower descendants are far less reliable.
What this is and isn't
That reproducibility is the part that feels important to me. The original idea was never “can we generate a lot of clever follow-on questions?” It was “can we make the system’s evolving expectations observable?” The expectation-field replay is the first artifact that actually answers that question.
It lets me ask: what did topicspace believe about the hardware/software gap on each day in this window, when did that belief weaken, when did it split, which narrower interpretations survived, and which were retired? I could not ask that cleanly in the question-graph version, because the graph was centered on questions and candidates rather than on the expectation itself. Re-centering expectations as first-class objects was the necessary correction.
What this does not prove is that the expectations were externally right. The current evaluator tests structural persistence, not truth. It can tell me that an expectation stayed coherent, that weakening preceded retirement, and that broad frames survived where narrow ones did not. It does not yet tell me whether surviving expectations tracked reality well enough to be useful beyond internal coherence. That remains the next layer. The current system is better at maintaining an evolving belief trace than at proving the external correctness of that belief trace.
Still, that is enough to say something meaningful. topicspace now appears capable of maintaining continuity of understanding across changing conditions in a way that is explicit, inspectable, and falsifiable. The strongest objects are broad expectations formed at T0. They can weaken, recover, and spawn narrower children without being silently overwritten. The weaker objects are the children themselves, many of which do not last. That may not be a flaw. It may be the honest shape of belief in a noisy field: durable broad frames surrounded by many short-lived attempts at refinement.
The expectation-field replay suggests that topicspace is better at maintaining broad, evolving expectations over time than at producing many durable narrow refinements — and that is much closer to the original idea than the question graph ever was.
Epilogue · where these expectations stand today
Nearly six months after the original T0, both initial expectations have now broken. More importantly, they broke in different enough ways that the difference is itself a finding. The Feb. 10 T0 ran forward into late April and early May; this is what the two cohorts look like as of 2026-05-12.
E-001 broke compositionally
The original framing was broad: hardware would outperform software. By the end of the Feb. 10 replay window, the expectation’s supporting share had fallen from 1.00 to 0.07. Only CEG still backed the original frame.
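If supporting share means the fraction of the cohort whose daily state still backs the original frame — that is my reading, since the post does not define it formally — then 0.07 is consistent with exactly one survivor out of fifteen:

```python
# The E-001 cohort from the case study.
cohort = ["NVDA", "AMD", "SMCI", "DELL", "INTC", "ARM", "AVGO", "TSM",
          "ASML", "MU", "MRVL", "VRT", "ANET", "VST", "CEG"]

# By the end of the Feb. 10 window, only CEG still backed the frame.
still_backing = {"CEG"}

supporting_share = round(len(still_backing) / len(cohort), 2)   # 1/15
```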
What happened is more interesting than simple failure. Hardware did not lose as a bloc. It split internally.
The names that anchored the bullish T0 framing — the AI-infrastructure complex of ASML, AVGO, ANET, VST, CEG, and MRVL — are now mostly in DIVERGENCE or REPRICING: positive narrative, negative price. Meanwhile the names that were neutral or weak at T0 — MU, INTC, AMD, DELL, and SMCI — are the ones actually leading on price.
So E-001 was directionally right but compositionally wrong. Hardware did outperform software on average. But the subset that led was not the one the original expectation implied. The actually-profitable slice turned out to be memory, CPU, and server names where the AI narrative attached late and weakly, while the most narratively confident AI-infrastructure names became the ones the market was least willing to pay for.
E-002 broke directionally
E-002 failed more cleanly. The original framing was that the divergence cluster would persist rather than resolve downward. Instead, the supporting share fell from 1.00 to 0.38, and every member of the cohort is now negative on relative return. The narrative-price gap closed by price falling to meet the narrative, not by the narrative being rewarded.
Software is now the cohort producing the clearest sustained negative-confirming pattern in the system. CRM has moved into NEG_CONFIRMATION. MSFT, META, NFLX, PLTR, and TTD all remain in DIVERGENCE with falling price. Whether that reflects structural weakness or an overshoot in sentiment is a separate question. What matters here is that the market has behaved for months as if software is the weaker side of the field. Persistence is not correctness. But it is not nothing either.
How they broke matters more than that they broke
For the experiment, this is the cleanest possible outcome: not that the system was right, but that it stayed coherent long enough to be wrong in a specific, decomposable way.
We can now say how each expectation failed:
- E-001 broke compositionally — hardware outperformed in aggregate, but the wrong subset led
- E-002 broke directionally — the divergence resolved downward, exactly the path the original framing treated as unlikely
That is the value of an honest expectation system. It does not just tell you that a view failed. It tells you which part failed, when it failed, and what replaced it. Knowing how an expectation broke is more useful than knowing only that it broke.