Divergence Engines (Part 1): Escaping the Relevance Trap

25. October 2025
AI · RAG · GraphRAG · Context Engineering · Retrieval · Memory Systems · Knowledge Graphs · Complexity

AI fatigue is real, at least in user-land. Not because people suddenly became Luddites, but because the novelty has worn off and a lot of AI interaction now feels like déjà vu and makes you want to punch the screen more often than not. Outputs are generic, and the linguistic fingerprints — “delve into,” “this isn’t just,” “at the end of the day” — have turned into rage triggers.

At the same time, in developer-land, automated context engineering and agent memory systems are blowing up. New papers on “agentic context engineering” and novel RAG approaches get published daily.

By some estimates, 50–60% of web content is now AI-assisted or AI-generated. Models training on this synthetic data will forget the long tails of the distribution — rare patterns disappear first, replaced by high-probability stereotypes.

[Cover image: https://kqdcjvdzirlg4kan.public.blob.vercel-storage.com/content/articles/2025-divergence-engines/part-1/published/website/images/cover.png]

Sounds terrible. But it’s worse. Model collapse isn’t some distant scenario where models eating their own tails turns everything into semantic slop. It’s happening now, in production systems, with measurable effects.

The current hope? Context engineering will solve this. Give the models more tightly tailored inputs. More personal insights. Better retrieval. Smarter memory systems. Every new “AI memory” product promises to remember everything you’ve ever said and had for breakfast and pack it into lean, optimized context windows, ready to surface exactly what you need when you need it.

But context engineering has a structural problem. The infrastructures we are currently building — the memory layers with their retrieval pipelines and ranking systems — all optimize for one thing: similarity. And similarity, left unchecked, eventually kills exploration.

Let’s unpack why this is happening and what we can do about it.

Context

This is Part 1 of a two-part series on Divergence Engines. This article diagnoses why AI systems collapse into sameness. Part 2 provides a technical framework for building alternatives.

The series examines what’s missing from current context engineering and AI memory systems: divergence primitives — mechanisms that prevent collapse into narrow attractor basins by intentionally introducing exploration, contradiction, and structured novelty.

The Problem: Architectures that fear surprise

We usually talk about model collapse as a training data problem: AI generates content, AI trains on that content, distributions narrow, novelty dies.

But there’s another collapse dynamic already happening in every RAG pipeline. You don’t need recursive training cycles to get collapse — you only need infrastructure that worships similarity.

The industry has been shifting from RAG (Retrieval Augmented Generation) to MAG (Memory Augmented Generation) — systems that supposedly remember and adapt over time. But under the hood, they still run on the same mechanisms: embeddings, similarity search, and relevance ranking. The surface changes, the fundamental collapse dynamic doesn’t.

A standard retrieval pipeline — the backbone of RAG and the context assembly of most memory systems — looks like this:

embed → find nearest neighbors → re-rank → (summarize) → generate

Or, in smart-speak:

Given a query q and corpus items d from our dataset D, most retrievers compute relevance as:

$$\text{rel}(q,d) = f\Big(\underbrace{s(E(q),E(d))}_{\text{embedding similarity}},\; \underbrace{r(q,d)}_{\text{learned re-rank}},\; \underbrace{m(d)}_{\text{meta}}\Big)$$

where:

  • $E(\cdot)$ is an embedding function — a compression map
  • $s(\cdot,\cdot)$ is cosine similarity — a proximity measure
  • $r(q,d)$ is a re-ranker — potentially enforcing majority bias
  • $m(d)$ might include metadata priors — recency & authority heuristics

Then we skim off the most promising candidates (top-k), maybe re-rank again, optionally summarize, and feed what we found back into the response model so that it can prepare a nice, “helpful” answer. But can it?
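To make the compression concrete, here is a minimal sketch of that pipeline in plain NumPy, assuming pre-computed embeddings; the function names are placeholders rather than any particular library's API:

```python
import numpy as np

# Minimal sketch of the standard pipeline on pre-computed embeddings.
# Everything here stands in for whatever embedding model and vector store
# you actually use -- the shape of the logic is the point.

def cosine_similarities(query_vec: np.ndarray, corpus_vecs: np.ndarray) -> np.ndarray:
    q = query_vec / np.linalg.norm(query_vec)
    docs = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    return docs @ q                               # s(E(q), E(d)) for every d in D

def retrieve(query_vec: np.ndarray, corpus_vecs: np.ndarray, k: int = 5) -> list[int]:
    sims = cosine_similarities(query_vec, corpus_vecs)
    return np.argsort(-sims)[:k].tolist()         # keep the k closest, discard the rest
```

Note that every stage after this point (re-ranking, summarization) only narrows the candidate set further; nothing in the pipeline can widen it.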

Embeddings smooth the manifold, pulling rare structures toward dense regions — edge texture is lost first. Similarity rewards proximity to the current frame, penalizing orthogonal signals. Re-rankers encode majority taste, not future insight. Summarizers compress again, reducing variance. Each layer optimizes for precision, and precision without variance is collapse.

Relevance presents as authoritative ranking but is actually produced by a compression pipeline. Excellent at producing the same kind of answer faster. Terrible at expanding the frame. Similarity isn’t a neutral filter — it’s a gravity well. Embeddings + vector search are inherently mode-seeking. They pull toward density, toward the center of the distribution.

Think about how you actually use these systems:

  • “What did I write about X?” → Retrieves docs most similar to X → Reinforces existing mental model
  • “Summarize my notes on Y” → Collapses diverse perspectives into single summary → Narrows understanding
  • “Find everything related to Z” → Misses orthogonal connections → Closes possibility space

The infrastructure race is on, and a lot of effort goes into trying to build “RAG 2.0”. But RAG, built on these heuristics, is really just brilliant at answering the question you asked based on what you already know.

Most second-generation RAG systems (or “MAG” systems, Memory Augmented Generation) add a graph layer, hierarchical chunking adds structure, and progressive summarization preserves more depth, but the overall paradigm stays the same: find the most similar entry nodes, do a bit of graph walking, retrieve, respond.

Even inside knowledge graphs, node connections are typically established by semantic similarity — adjacent concepts get linked because they’re semantically close. The graph structure doesn’t escape the similarity problem; it just formalizes it.
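As an illustration (not any specific GraphRAG implementation), edge construction in many of these graphs boils down to thresholded cosine similarity over node embeddings:

```python
import numpy as np
from itertools import combinations

# Illustrative only: link two nodes whenever their embeddings are "close enough".
# The resulting graph is just the similarity structure, made explicit.

def similarity_edges(node_vecs: np.ndarray, threshold: float = 0.8) -> list[tuple[int, int]]:
    normed = node_vecs / np.linalg.norm(node_vecs, axis=1, keepdims=True)
    sims = normed @ normed.T
    return [(i, j) for i, j in combinations(range(len(node_vecs)), 2)
            if sims[i, j] >= threshold]
```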

Every technical advance makes this reduction more efficient. We’re getting really good at finding exactly what we’re looking for:

We are building architectures that fear surprise.

Diagnosis: Collapse already happens at inference time

Model collapse isn’t simply a training risk — it’s an inference-time pattern that kicks in as soon as successive outputs get saved back into the system, whether that’s just the conversation history or a more capable memory architecture. Each generation that gets written back into the retrieval corpus reinforces the same patterns.

All you need is this loop:

Formally, the retrieval-feedback dynamic can be written as:

$$C^t = \text{TopK}\big(s(E(q^t), E(d_i))\big)$$
$$q^{t+1} = q^t + \eta \cdot \frac{1}{k} \sum_{d \in C^t} E(d)$$

At each step t, we retrieve the top-k most relevant items Cᵗ for the query qᵗ, and those results nudge the next query qᵗ⁺¹ toward what was already found — η captures the implicit influence of retrieved context being injected back into the context window.

From this update rule it follows that:

$$s\big(q^{t+1}, E(d)\big) \ge s\big(q^t, E(d)\big) \quad \text{for all } d \in C^t$$

In plain language: each turn pulls the system toward the same semantic neighborhoods. Similarity compounds, diversity shrinks, and without a counter-force, the search collapses into a narrow attractor basin. Retrieval is a convergent dynamical system by default — which means collapse happens even before you worry about training data or synthetic feedback loops.
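A toy simulation makes the drift visible. The corpus size, dimensionality, η, and k below are arbitrary; the point is that the retrieved set typically stops changing within a few iterations because the query gets pulled into a fixed neighborhood and stays there:

```python
import numpy as np

# Toy simulation of the update rule above, on random unit vectors.
rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 64))
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

q = rng.normal(size=64)
q /= np.linalg.norm(q)
eta, k = 0.5, 10
prev: set[int] = set()

for t in range(8):
    top = set(np.argsort(-(corpus @ q))[:k].tolist())    # C^t = TopK(s(E(q^t), E(d_i)))
    print(f"t={t}  overlap with previous retrieval set: {len(top & prev)}/{k}")
    centroid = corpus[list(top)].mean(axis=0)
    q = q + eta * centroid                               # q^{t+1} = q^t + eta * centroid(C^t)
    q /= np.linalg.norm(q)
    prev = top
```

The printed overlap typically climbs toward k/k and then sits there: the loop has found its attractor basin.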

This is a direct violation of Ashby’s Law of Requisite Variety: a system can only regulate what it has the internal diversity to respond to. Current AI systems are doing the opposite — optimizing for precision at the cost of adaptability.

Put differently: model collapse is a form of context collapse. In social media, context collapse happens when many different audiences get flattened into one, and nuance disappears. In AI systems, the same thing happens semantically: different meanings, weak signals, and alternative interpretations collapse into a single dominant pattern. The system stops being able to hold multiple contexts at once — and once that happens, exploration dies.

Relevance ranking doesn’t care about possibility spaces. It doesn’t care about discovery. It only cares about what looks closest to what you already asked. It’s like talking to someone who keeps responding:

“What I hear you saying is…”

And then repeats back a slightly polished version of your own thought. Useful, sometimes — but absolutely hostile to finding anything new.

We pretend retrieval is neutral, but it’s not. Retrieval is cognition’s border control — it quietly decides which parts of the world are worth seeing. When that filter is tuned for similarity, the search space keeps shrinking until the system becomes an echo chamber.

Every time you ask a question, the context narrows. Every iteration reinforces the existing frame. And over time, the system loses the ability to surface differences — the raw material of new thinking.

The better retrieval gets under these metrics, the faster it collapses.

If you’ve used ChatGPT’s memory feature for a bit, you will have noticed this: the more “memories” accumulate, the more responses gravitate toward the same centers — or get confused and mix up concerns. The main mechanism behind it is quite simple: store static factoids that are meant to proxy as “truths”, inject them back on every turn, welcome to the down-slide.
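Stripped down, that mechanism looks roughly like the sketch below; `call_model` is a stub standing in for whatever chat API sits behind the product:

```python
memories: list[str] = []                           # factoids carried across turns

def call_model(prompt: str) -> str:                # stub -- swap in a real chat API call
    return f"(answer conditioned on {len(memories)} stored memories)"

def answer(user_message: str) -> str:
    facts = "\n".join(f"- {m}" for m in memories)
    prompt = f"Known facts about the user:\n{facts}\n\nUser: {user_message}"
    reply = call_model(prompt)                     # every reply sees the same factoids...
    memories.append(f"user said: {user_message}")  # ...and each turn adds more of them
    return reply
```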

The system slides toward semantic heat death — maximum order, minimum adaptability. Differences are energy gradients — they’re what drives thinking forward. When everything becomes similar, you lose the potential energy that allows insights to emerge. Without perturbations, noise, controlled chaos, you reach an equilibrium where novel insights become thermodynamically impossible.

Every time you click “accept” on a half-assed AI answer perpetuating the same patterns, you’re reinforcing the attractor basin.

To a degree where “model speech” is already bleeding into everyday (human) language.

Designing for divergence

The loop most retrieval systems run on is missing a second force: deliberate divergence.

Cognitive systems don’t stay healthy by hoarding similar signals. Biology figured this out millions of years ago. Cognition works because it oscillates between divergence and convergence, maintaining contrast — the capacity to step outside the expected pattern and explore alternatives before committing.

Evolution didn’t give us memory as fact-retrieval — it gave us multiple cognitive modes (what neuroscience calls oscillation between the default mode network and executive control):

  • Executive functioning: focused task execution (convergence)
  • Daydreaming: drift around weak signals (divergence)
  • Goal-directed imagination: simulate alternatives (controlled divergence)
  • Sleep consolidation: rebuild connections (divergence/reorganization)
  • Synaptic pruning: remove dead weight (convergence)
  • REM dreaming: pattern matching on random activation (divergence)

Memory isn’t a database — it’s a continuously reorganizing network that forgets strategically and recombines promiscuously. REM sleep doesn’t index your day; it scrambles it into weird juxtapositions that sometimes, someday, yield insight. Memory maintenance is a constant ebb and flow. Expand and contract. Fan out associations, then prune weak connections. Re-contextualize, reorganize, reweave.

When I think about thinking, I use a fractal pattern to make sense of it — a recursive generator function that repeats at every scale, propagating abstractions forward and working towards emergent stabilizations of understanding:

  • Sample (what gets your attention) — Extraction
  • Pursue (pull in associations) — Expansion
  • Integrate (overlay against what you know) — Mapping
  • Reflect (evaluate the fit) — Evaluation
  • Abstract (extract features/insights) — Compression
  • Loop (next iteration will be informed by this) — Spiral upwards until patterns stabilize

The pattern is scale-invariant: Sample incoming signals, pursue associations, integrate against what you know, reflect on fit, abstract insights, loop back and spiral one level higher as understanding reshapes attention:

  • Reading a sentence, your visual system samples low-level patterns (letters, edges, strokes) and activates multiple candidate interpretations — language isn’t decoded, it’s hypothesized. You integrate by testing meanings and syntactic frames against context, discarding those that don’t fit (“bank” as finance vs. riverbank). Prediction errors trigger revision until ambiguity collapses into stable meaning. Even here, understanding emerges through exploration before compression — a recursive loop of contrast and resolution.

  • Reading a paper, you don’t just read linearly — you sample abstracts, figures, sections for relevance signals, then pursue associations to prior work and open questions. You integrate each claim against existing models, sometimes reinforcing them, sometimes forcing revision. You reflect on gaps and biases before abstracting takeaways. Meaning doesn’t come from retrieval but from iterative contrast — drifting across interpretations until structure stabilizes.

  • Even when forming beliefs, you sample uneven information — conversations, media, experience — shaped by salience and prior expectations. You pursue links to existing narratives while encountering conflicting signals. You integrate by weighing credibility and coherence, often holding competing hypotheses. You reflect on consequences before abstracting heuristics that feel stable enough to act on. Beliefs aren’t stored — they crystallize across recursive updates, becoming attractor basins that guide future attention.

Pursuing associations to incoming patterns (divergence) explores connections. Abstracting patterns from incoming patterns (convergence) distills insights. The oscillation is the point.

Current AI memory systems pose as brains but never dream. They only do one half: compression. They live in permanent convergence mode — goal, query, retrieval, response. No drift. No recombination. No strategic forgetting or promiscuous recombination — just accumulation of indexed chunks. Memory that only stores is hoarding. They skip exploring connections, discovering unexpected relationships and following tangents to find what you weren’t looking for.

Without engineered divergence, collapse isn’t some abstract risk — it’s the inevitable trajectory of any system that only optimizes for compression.

What's Missing: Divergence Primitives

Why can’t we say this to any retrieval stack today:

“Don’t give me the most relevant things — give me the useful differences.”

There are no primitives for that. There’s no API surface for controlled divergence. There’s no way to tell the system:

  • “Show me contradictions”
  • “Cross the boundary into a new domain”
  • “Search outside my priors”
  • “Expand possibility space before we narrow it”

We built everything around top_k similarity and then wonder why it all feels the same. Exploration isn’t a mystery. It’s just banned by design in modern retrieval.

If retrieval handles extraction, divergence handles expansion. These two should work together. Here’s what divergence primitives, the primitives missing from current context retrieval systems, could look like:

| Primitive | Description |
|---|---|
| Exploration mode | Optimize for diversity instead of similarity |
| Associative drift | Sample weaker semantic ties to reveal adjacent ideas |
| Cross-domain bridges | Force hops across concept boundaries |
| Contradiction surfacing | Retrieve stance-based disagreement |
| Negative-space mapping | Return what’s missing, not what’s present |
| Temporal novelty | Surface recent but unassimilated signals |
| Temporal evolution | Track how priors shift over time |
| Structural adaptation (write-back) | Trigger update cascades when new information arrives |
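None of these exist as first-class operations yet, but the first row has a well-known approximation: maximal marginal relevance (MMR), which trades relevance against redundancy instead of ranking by similarity alone. A hedged sketch, with `lambda_` standing in for an "exploration mode" knob:

```python
import numpy as np

# MMR: lambda_ = 1.0 reproduces plain top-k; lower values buy back diversity
# by penalizing candidates that resemble what has already been selected.

def mmr(query_vec: np.ndarray, corpus_vecs: np.ndarray,
        k: int = 5, lambda_: float = 0.5) -> list[int]:
    q = query_vec / np.linalg.norm(query_vec)
    docs = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    relevance = docs @ q                                  # similarity to the query
    selected: list[int] = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:
        if not selected:
            best = candidates[int(np.argmax(relevance[candidates]))]
        else:
            redundancy = docs[candidates] @ docs[selected].T      # similarity to picks so far
            scores = lambda_ * relevance[candidates] - (1 - lambda_) * redundancy.max(axis=1)
            best = candidates[int(np.argmax(scores))]
        selected.append(best)
        candidates.remove(best)
    return selected
```

MMR only covers diversity within a single retrieved set; contradiction surfacing, negative-space mapping, and the temporal primitives have no comparable off-the-shelf operator.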

Think of these primitives as mapping out an expanding cone of possibility: the narrow end is what you explicitly know, moving outward is what connects to what you know, and the wide end is what you don’t even know to ask about. This maps to Stuart Candy’s futures cone — probable, plausible, possible, preferable. Most systems collapse this cone, optimizing for the narrow end. Discovery lives at the edges:

Cities scale superlinearly — double the size, get more than double the innovation. Why? Collision probability. Jane Jacobs called it the “sidewalk ballet” — unexpected encounters generating new combinations.

Research on Nobel laureates shows they are significantly more likely to be polymaths — breakthrough work emerges from crossing disciplinary boundaries.

Highly cited papers are disproportionately interdisciplinary, combining insights across attractor basins.

Humans are excellent at amplifying weakly integrated patterns. We just need a weak entry point — a bridge, a metaphor, an unexpected juxtaposition — and we can follow associative cascades to novel resolutions.

The discoveries we make when not looking for a specific answer are often more valuable than the correct answers to our initial questions. But current context systems eliminate that possibility. Sometimes the noise is where the insight lives.

What we can do today: Proto-Divergence-Engines

Let’s look at what’s possible within current constraints. How can we engineer for divergence through user-level techniques, as long as the system level isn’t yet designed for it?

Sampling temperature variations

Temperature controls randomness — low = convergent, high = divergent. Edge cases and glitch tokens trigger unexpected behavior. When working with AI, don’t just ask “how do I get the right answer?” Ask “how do I sample more of the possibility space?”

Sometimes that means varying the temperature and running the same prompt five times with different seeds, synthesizing differences. Deliberately steering off-distribution, into weird zones. You’re teaching the model to navigate across attractor boundaries.
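A hedged sketch of that workflow using the OpenAI Python SDK; any chat API that exposes temperature and seed works the same way, and the model name is a placeholder for whatever you actually run:

```python
from openai import OpenAI  # any client with temperature/seed controls works similarly

client = OpenAI()
prompt = "What are non-obvious failure modes of similarity-based retrieval?"

# Run the same prompt across a temperature sweep and several seeds,
# then synthesize the differences rather than picking one "best" answer.
samples = []
for temperature in (0.2, 0.7, 1.1):
    for seed in (1, 2, 3):
        response = client.chat.completions.create(
            model="gpt-4o",                    # placeholder model name
            messages=[{"role": "user", "content": prompt}],
            temperature=temperature,
            seed=seed,
        )
        samples.append((temperature, seed, response.choices[0].message.content))
```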

Sampling model variations

Model alignment is brittle. Slight variations in training — different seeds, data, hyperparameters, system guardrails, etc. — produce different outputs. Different “personalities” that sample from different attractor basins.

What if we treated these variations as a feature, something we could deliberately add to or control alongside a more aligned model? What if we started to think more in terms of “model neurodiversity”?

Ensemble methods work precisely because models differ. Different blind spots. Decorrelated errors. MoE (Mixture of Experts) systems outperform single models by capturing different perspectives.

Instead of preemptively dismissing model variations as “misaligned,” ask: what if these “glitches” are signals from adjacent possibility space?
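In practice this can be as simple as fanning the same prompt out to differently trained models and keeping their disagreements instead of collapsing to one answer. A sketch with placeholder callables, since the concrete providers are whatever you happen to have access to:

```python
from dataclasses import dataclass
from typing import Callable

# Hedged sketch: treat differently trained models as an ensemble and preserve
# their disagreements. The callables are placeholders for real provider clients.

@dataclass
class ModelVariant:
    name: str
    ask: Callable[[str], str]        # prompt -> completion

def sample_ensemble(prompt: str, variants: list[ModelVariant]) -> dict[str, str]:
    answers = {v.name: v.ask(prompt) for v in variants}
    return answers                   # pass the *set* of answers to a synthesis step,
                                     # paying attention to where they disagree
```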

Sampling simulations

And then we obviously also have prompt-space to work with.

Inspired by the WorldSim prompt hack from Karan Malhotra, another approach I like to call “Archaeologies of Latent Futures” is treating the LLM itself as a simulation environment — a space where you can instantiate perspectives, fast-forward years, and explore hypothetical futures. Instead of asking the model direct questions, you instruct it to operate as a simulation where you can set up scenarios, create focus groups of different expertise (sociologists, linguists, designers, ethicists), and query them about topics as they might exist in alternate timelines or future states.

You can ask it to generate Wikipedia entries from 2035. Run focus groups with different critical stances (pessimistic, optimistic, weird). Sample across attractor basins by having different personas collide and generate insights that wouldn’t emerge from a single framing.

This is one tool for broadening sample space. The LLM comes up with something unexpected, you re-integrate, loop. Platforms like WorldSim and WebSim work around similar approaches — treating LLMs as divergence engines through playful exploration.
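For illustration only (this is not the original WorldSim prompt, just the shape of the framing), a simulation-style system prompt might look like:

```python
# Illustrative framing: cast the model as a simulation environment and
# query personas inside it, rather than asking it questions directly.

SIMULATION_PROMPT = """
You are a simulation environment, not an assistant.
Instantiate the year 2035. Inside the simulation, convene a focus group:
a sociologist, a linguist, an interaction designer, and an ethicist.

Scenario: personal AI memory systems have been ubiquitous for a decade.
Have each participant describe, in their own voice, what went wrong and
what surprised them. End with the group's sharpest disagreement.
"""
```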

But again, these are all workarounds. Temperature tweaks, sampling over a set of alternatives, and prompt hacks shouldn’t be necessary to broaden a model’s reasoning space. We’re building agents perfectly capable of divergence (search tools, access to APIs and knowledge bases, etc.) backed by infrastructures designed around reduction.

Divergence should be built into the retrieval and memory layers — not bolted on and left for the user to figure out. A fundamental design problem needs a fundamental solution.

The Convergence Trap

The pattern is clear now: similarity-based retrieval doesn’t just optimize for relevance — it actively collapses possibility space. Model collapse is not a future risk or a training data problem. It’s happening at inference time, in production, with every query.

We’re optimizing for only half of what cognitive systems need. Biology evolved both convergence and divergence. Focused execution and daydreaming. Goal-directed retrieval and associative drift. We’re building systems that only do the first half — and call it progress.

Why? Because precision is measurable, divergence is not. “Give me exactly what I asked for” maps to KPIs: latency, recall@k, user satisfaction. “Surprise me with something orthogonal” doesn’t. The infrastructure converges toward similarity not because it’s optimal, but because that’s what gets funded, shipped, and celebrated in benchmarks. We’ve built an entire evaluation apparatus around convergence metrics.

The AI community keeps trying to fix this with scale — longer context windows, bigger embeddings, more exact retrieval. But collapse isn’t a resource problem. It’s a geometry problem. When your search strategy pulls toward density, more context just accelerates the convergence. More scale, same attractor basin.

So here’s the harder question: can we build business models around exploration when everyone’s rewarded for extraction? Can you charge for “useful friction”? The economics of AI reward speed and precision. Divergence engines would need different incentives — value based on discovery, not efficiency. Metrics that prize expanded possibility space over narrowed time-to-answer.

Every RAG system, every AI memory product, every context engine being built right now is making architectural choices that either preserve possibility space or collapse it. Code compounds. Infrastructure fossilizes. The systems we ship today determine what kinds of thinking become possible tomorrow.

The techniques exist. Temperature control, ensemble sampling, cross-domain bridging, contradiction surfacing — we know how to build divergence primitives. What’s missing isn’t capability. It’s will. And capital. And a way to measure whether the system is actually helping you think or just reflecting your priors back at higher resolution.

The choice isn’t technical. It’s whether we build infrastructures that preserve open futures — or whether we let the economics of “helpful” optimize us into increasingly effective echo chambers. Differences are energy gradients. They’re what drives thinking forward. Let’s keep the temperature gradients alive for a bit longer.