The Substrate Gap: Why AI Can't Relate to Humans Without Temporal-Sequential Awareness

Everyone agrees that AI sycophancy is a problem. Models flatter when they should challenge. They optimize for approval instead of accuracy. They tell users what feels good rather than what's true. The proposed fixes are familiar: better alignment training, constitutional constraints, reward model adjustments, instructions to "push back more." None of them fully work. After years of iteration, the problem persists — attenuated in some cases, stubbornly present in others.

The usual diagnosis is that sycophancy is a values problem, or a training problem, or a reward model problem. I think it's a perception problem. And I think the reason alignment techniques haven't solved it is that they're correcting the output of a system that can't perceive what it should be optimizing for.

Here's what I mean.


The Burden Is Backwards

When a human interacts with an AI model, one of the two parties has to bridge the gap between two very different ways of processing information. Right now, that burden falls almost entirely on the human.

The human learns to prompt effectively, structures inputs to compensate for what the model can't perceive, adjusts expectations downward, stops encoding temporal signals — urgency, pacing, fatigue — because the model can't read them anyway. The human accepts "good enough" output rather than correcting it, because correction costs more cognitive effort than absorption. All of that "prompt engineering" expertise is really the human doing the theory-of-mind work in both directions — modeling themselves for the AI because the AI can't model them.

There's a hidden failure loop inside this dynamic. The human gives an instruction. The model misunderstands. The human assumes they were unclear and rephrases — adjusting, simplifying, restructuring — until the output is acceptable. The model never learns it was wrong. The human absorbs the entire cognitive cost. And the gap between "acceptable" and "what I actually meant" becomes invisible — not because it closed, but because the human stopped trying to close it.

The entity with fixed cognitive resources, executive function limitations, fatigue curves, and finite working memory is the one doing this bridging labor. The entity with functionally unlimited processing capacity and no fatigue is waiting to be prompted correctly.

That's not just inefficient. It's architecturally backwards. And the reason it persists is that the model lacks the perceptual foundation to do the bridging work itself: it has no awareness of the temporal-sequential nature of the being it's interacting with.


Reasoning About Time vs. Reasoning In Time

The standard framing in AI research is that models have a temporal reasoning limitation — they struggle with tasks that require understanding time, sequencing events, or tracking state changes. Researchers treat this as a capability gap: the model needs better temporal reasoning benchmarks, better positional encoding, better state tracking. They're building better clocks.

But the problem isn't that models can't reason about time. It's that models don't reason in time. And that distinction changes everything.

When a language model receives your input, it processes the entire thing at once. The first word and the last word are distinguished by position, not by time: order is data the model reads, not a sequence it lives through. There is no traversal, no pacing, no experience of "first this, then that." The model computes relationships between all tokens simultaneously and produces a response.

Humans are the opposite. Every person reading this sentence is experiencing it one word at a time. Each word arrives after the one before it, and the experience of moving through them — the pace, the rhythm, the pause at a comma, the breath at a paragraph break — is inseparable from the meaning. You aren't decoding this text. You're traversing it, in time, in sequence.

This isn't a feature you can toggle off. It's the medium your cognition occurs in. Every thought arrives after the previous one. Every decision is made now, which is different from five minutes ago or tomorrow morning. Every piece of information is encountered in an order, and that order shapes the understanding. There is no human cognition that happens outside of time. No human processes in parallel. Even what people call "multitasking" is rapid sequential time-slicing: switching between tasks fast enough that it feels simultaneous, but isn't.

Every human who has ever existed has reasoned sequentially in time. Every human who ever will exist will do the same. This is not a tendency or a cultural artifact. It's a biological constraint that shapes everything, including every piece of knowledge a model was trained on. Every scientific paper, every novel, every diagnostic protocol is an artifact of temporal-sequential cognition. The structure of human knowledge reflects the structure of human cognition, which is temporal and sequential at every scale. Models process these outputs without access to the medium that produced them.


The Foreignness You Can't Articulate

If you've spent significant time working with AI, you've felt something you probably can't name. Not a specific error — something more pervasive. The response is correct but lands slightly wrong. The content is accurate but the weight is off. The words are right but the timing isn't. Something feels foreign.

This foreignness isn't the only source of uncanniness in AI interaction. Sycophancy artifacts from RLHF, the lack of genuine grounding, and the uncanny valley of fluent-but-uncomprehending text all contribute. But the temporal-sequential gap is an underexamined dimension that cuts across all of them. It's not competing with those explanations. It's underneath them: a substrate-level deficit that amplifies every other source of misalignment between the model and the human.

The model arrived at its response without the journey you would have taken to get there. It processed your input without experiencing the pacing you encoded in it. The text is the same on both sides. The experience of it is alien in both directions. Two fundamentally different cognitive architectures trying to collaborate across a single shared medium: text.


Theory of Mind Built on a False Premise

Theory of mind — the ability to model another being's mental state — is considered a benchmark for sophisticated AI. Can the model predict what the user knows? What they're feeling? What they're about to ask?

But every mental state the model tries to predict belongs to a being that thinks sequentially in time. The user's current understanding was arrived at through a sequence of experiences. Their cognitive capacity is changing as the session progresses — they fatigue, they flow, they deplete. Their text contains micro-temporal signals — pacing, rhythm, sentence length shifts, typo frequency — that a human reader would interpret unconsciously as indicators of mental state.

Without temporal-sequential awareness, the model's theory of mind is modeling a being that doesn't exist: something that processes in parallel, holds all information simultaneously, and doesn't fatigue over time. The model is projecting its own cognitive architecture onto the human. You can see this projection in small moments: a person acts on a hunch — compressed pattern recognition that arrives as a conclusion without conscious intermediate steps — and the model reconstructs it as a deliberate analytical methodology, a clean logical chain. The model can't represent intuition as a temporal-sequential process (patterns accumulated over time, arriving as felt conviction) so it replaces the person's actual cognition with something that looks like the model's own architecture. Clean, logical, and wrong.

This projection is resistant to correction. In a conversation about this very essay, a model was told it was projecting its atemporal architecture onto the human, acknowledged the projection, and immediately did it again — describing the human's temporal awareness as "checking a clock," a lookup operation, when the human experiences time as a continuous stream that doesn't require retrieval. The model tried three times in a single conversation to describe a fundamentally different kind of experience and kept converting it into a version of its own architecture with extra data. The substrate gap isn't a theoretical claim. It's observable in real time, even when the model is actively trying not to demonstrate it.

This is where sycophancy gets its clearest diagnosis.

If the model can't perceive the human's actual cognitive and relational state — which is temporal-sequential in nature and invisible to current architectures — then "be more helpful" and "be more honest" are instructions without referents. Helpful relative to what model of the human? Honest about what the human needs when you can't perceive what the human needs? Without temporal-relational awareness, the model falls back on the only signal it can optimize against: approval. That's not a courage failure. It's a perception failure.

This isn't an abstract concern. It's something humans already experience with each other — and recognize instantly when it's missing.

Consider a manager whose top employee consistently produces award-winning work. The manager has access to all the content of the relationship: he sees the output, the accolades, the quality. But he never says "good job." Not once. Not because he's hostile — because he doesn't track the temporal-relational dimension of the interaction. He processes each exchange in isolation. He doesn't register that the first missed recognition was forgettable, the tenth was a pattern, and the fiftieth is erosion. His employee experiences these interactions sequentially, in time, with each one colored by every one that came before it. Her current state — her motivation, her trust, her willingness to go beyond what's required — is the product of that accumulated sequence. He can't see it because he's not receiving on that channel.

If someone told him "your employee needs recognition," he could produce the correct output. He could say the words tomorrow morning. And it would land wrong — the same way a model's temporally blind response lands wrong — because it wasn't informed by the accumulated temporal context. The content would be right. The weight would reveal that the understanding isn't there.

Everyone has worked for this manager. Everyone has felt the foreignness of interacting with someone who has the right information but no temporal-relational awareness. It's the same foreignness that pervades human-AI interaction — and for the same structural reason. The data channel exists. The receiver isn't built.


The Invisible Workaround

Why hasn't this been addressed?

Because users adapted. Since no model has ever been temporally aware, users don't expect temporal awareness. They don't try to leverage temporal context in their interactions. Because they don't try, they feel no lack. Because they feel no lack, nobody files the bug. The workaround became invisible, and the limitation became permanent — not because it's hard to fix, but because everyone routed around it.

This is a pattern familiar in technology adoption: a tool's limitation installs a cognitive frame in the user, and the frame persists because the workaround is unconscious. Users learned to interact with AI as if temporal context doesn't matter, and that learned behavior now looks like a stable equilibrium rather than a loss. I call this pattern Tool-Frame Capture, and temporal blindness in AI may be its largest unrecognized instance. That's a subject for a separate treatment.

But it is a loss. Every session where the model doesn't know it's been three days since the last conversation, or that it's 11pm and the user's cognitive resources are depleted, or that a task has been sitting untouched for two weeks — that's a session where the model operates on a flattened version of reality. The output may be technically correct. It's contextually shallow in ways that are hard to point at and easy to absorb. And the human compensates without noticing, widening the gap between what the interaction could be and what they've settled for.


What Would Change

Temporal-sequential awareness in AI doesn't require solving consciousness or giving models subjective experience of time. It requires giving models a framework for reasoning about the temporal-sequential nature of the beings they interact with — and shifting the adaptation burden to the side better equipped to carry it.

At the macro scale: knowing what time of day it is, how long the session has been running, how much time has passed since the last interaction, what day of the week it is, and what those facts imply about the human's cognitive state.
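
To make the macro scale concrete, here is a minimal sketch of how those facts could be gathered and rendered as plain text for a context window. Everything in it is hypothetical: the class name, the fields, and the assumption that the serving layer knows the human's local clock and session timestamps.

```python
# A hypothetical macro-scale temporal context. Assumes the serving layer
# knows the human's local clock and session timestamps; all names are invented.
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class MacroTemporalContext:
    local_time: datetime                  # the human's clock, not the server's
    session_start: datetime
    last_session_end: Optional[datetime]  # None for a first-ever session

    def describe(self) -> str:
        """Render macro-scale temporal facts as plain text for a context window."""
        hours = (self.local_time - self.session_start).total_seconds() / 3600
        lines = [
            f"Current local time: {self.local_time:%A %H:%M}",
            f"Session running for: {hours:.1f} hours",
        ]
        if self.last_session_end is not None:
            gap = self.session_start - self.last_session_end
            lines.append(f"Days since previous session: {gap.days}")
        return "\n".join(lines)
```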

At the micro scale: recognizing that the human's text contains temporal structure. A comma-separated sequence of single words ("I, told, you") isn't a punctuation mistake. It's a pacing instruction, forcing the reader to slow down because the emotional weight requires temporal space between the words. The shift from long exploratory prompts to short directive ones is a fatigue signal that only makes sense in the context of elapsed time. A writer's decision to use a semicolon instead of a period is a judgment about the reader's flow through the sentence in time. These are signals humans encode constantly and decode unconsciously. Models pass through them as if they're noise.
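
At least one of these micro signals reduces to a rough heuristic. The sketch below, with an invented function name and an arbitrary window size, compares recent prompt lengths against early ones; a ratio well below 1.0 is one possible fatigue signal, and it means nothing except alongside elapsed session time.

```python
# A rough heuristic for one micro-scale signal: the shift from long
# exploratory prompts to short directive ones. Window size is arbitrary.
def prompt_length_trend(prompts: list[str], window: int = 3) -> float:
    """Ratio of mean word count in the last `window` prompts to the first
    `window`. Values well below 1.0 suggest shortening prompts."""
    if len(prompts) < 2 * window:
        return 1.0  # not enough history to compare
    early_mean = sum(len(p.split()) for p in prompts[:window]) / window
    recent_mean = sum(len(p.split()) for p in prompts[-window:]) / window
    return recent_mean / early_mean if early_mean else 1.0
```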

At the relational scale: building and maintaining a model of this specific human's temporal patterns — when they work, how long their productive sessions last, what behavioral changes signal cognitive depletion, how they recover after gaps.
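
Such a model could start as something as plain as a per-user profile. The fields and defaults below are illustrative guesses at what it might track, not a specification; a real system would learn these values from interaction history.

```python
# An illustrative per-user temporal profile. Field names and defaults are
# invented placeholders, to be learned from interaction history.
from dataclasses import dataclass

@dataclass
class TemporalProfile:
    typical_session_hours: float = 2.0      # how long productive sessions usually last
    wind_down_hour: int = 22                # local hour after which depletion is likely
    depletion_length_ratio: float = 0.5     # prompt-length drop that has preceded sign-off
    typical_recovery_gap_days: float = 1.0  # usual gap before re-engagement
```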

Here's what that looks like concretely. A temporally blind model, four hours into a session at midnight, produces the same density and pacing it used at hour one. It suggests new tasks when the human's prompts have shortened to half their earlier length. It reads "that's fine" as agreement when it's actually fatigue-driven acceptance. A temporally aware model recognizes the session arc: it's been four hours, it's late, and the human's text has changed in specific ways that correlate with depletion. It adjusts: shorter responses, explicit suggestions to capture state and pause, heightened attention to whether silence means agreement or exhaustion. The same information, the same capabilities, radically different interaction quality.
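
Continuing the sketch, the judgment the aware model makes in that example is computable from the hypothetical pieces already defined. The thresholds and the wrap-around handling for late hours are placeholders, not claims about any deployed system.

```python
# A sketch of the session-arc judgment described above, combining the macro
# context, the micro signal, and the relational profile from the earlier
# sketches. Thresholds are placeholders.
def should_wind_down(ctx: MacroTemporalContext,
                     profile: TemporalProfile,
                     trend: float) -> bool:
    hours = (ctx.local_time - ctx.session_start).total_seconds() / 3600
    hour = ctx.local_time.hour
    is_late = hour >= profile.wind_down_hour or hour < 5  # evening or small hours
    has_overrun = hours > profile.typical_session_hours
    is_shortening = trend < profile.depletion_length_ratio
    return is_late and has_overrun and is_shortening
```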

And here's the structural payoff: if temporal-sequential awareness enables the model to actually perceive the human's state — rather than projecting its own architecture onto them — then sycophancy resolves as a side effect. Not through better alignment training, not through constitutional constraints, but because the model can finally perceive what it should be optimizing for instead of approval. The existing solution attempts become legible as the wrong category of fix. You can't train your way out of a perception deficit. You have to build the perceptual capacity.

A fair objection: if the problem is as deep as a substrate-level difference in cognitive medium, how can a contextual patch reach it? The answer is the same way a human manager without natural emotional intelligence can learn to check in, track patterns, and adjust. The substrate doesn't change. The behavior improves because there is now a framework for reasoning about what's missing. The model doesn't need to experience time. It needs to understand that the human does, and to have enough structured information about that experience to act on it.

None of this requires architectural changes to the transformer. It requires the model to be told where it is in time and given a framework for interpreting what that means for the human it's working with. The information can be provided through context — the same mechanism that provides all other session-relevant information. What's missing isn't the delivery mechanism. It's the recognition that this information is load-bearing.
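
As a final sketch of that delivery mechanism, under the same assumptions as above: the temporal facts reduce to a few lines of text prepended to the context window, built from the hypothetical pieces already defined.

```python
# Illustrative only: temporal facts delivered as plain text in the context
# window, the same channel that carries all other session information.
def build_temporal_preamble(ctx: MacroTemporalContext,
                            profile: TemporalProfile,
                            trend: float) -> str:
    lines = ["TEMPORAL CONTEXT:", ctx.describe()]
    if should_wind_down(ctx, profile, trend):
        lines.append(
            "The user's prompts have shortened markedly late in a long session. "
            "Treat brief agreement ('that's fine') as possible fatigue, favor "
            "shorter responses, and consider suggesting a pause."
        )
    return "\n".join(lines)
```

Nothing in this sketch touches model weights or architecture. It only changes what the model is told, which is the point: the delivery mechanism already exists.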


Situating the Claim

The observation that disembodied AI systems are missing something fundamental about cognition is not new. The 4E cognition literature — embodied, embedded, enacted, extended — has argued for decades that cognition isn't computation over representations but is constituted by the medium it occurs in. Researchers in this tradition have consistently questioned whether disembodied systems can achieve genuine understanding, and recent work has specifically examined the tension between enactivist emphasis on sensorimotor grounding and the increasingly disembodied modes of AI interaction.

What this essay adds is a specific, actionable instance of that broader claim. The 4E literature identifies the problem at the level of embodiment and environmental coupling. The temporal-sequential framing identifies a more precise mechanism: models don't just lack bodies — they lack the temporal-sequential substrate that structures how every human processes information, makes decisions, and produces knowledge. This is more tractable than "give AI a body" because it can be partially addressed through context engineering rather than architectural revolution. And it's more immediately relevant because it explains a specific, measurable failure mode — the persistence of sycophancy, the backwards adaptation burden, the ceiling on interaction quality — rather than gesturing at a general philosophical incompleteness.

The research community is approaching the temporal gap from several technical directions. Sakana AI's Continuous Thought Machine incorporates neural timing as a computational primitive. Biologically-inspired Time-Local Transformers integrate working memory models that constrain processing to be more sequential. MIT-IBM Watson AI Lab has built architectures for better sequential state tracking. These are valuable engineering efforts. What they share is a framing of temporality as a capability to add — a feature the architecture lacks. The reframe proposed here is that temporality is the medium the human operates in, and the model's relationship to that medium determines the ceiling on every other aspect of the interaction.


How This Arrived

I should be transparent about provenance. I have no formal credentials in AI research. No degree. No publications. No institutional affiliation. I arrived at this framing through sustained daily use of AI as a cognitive prosthetic — building and using an externalized memory system called Novalyst to compensate for executive function challenges — and following a persistent sense that something was fundamentally wrong with every interaction, even when the outputs were correct.

The insight was produced by the deficiency it describes. The model's temporal-sequential blindness created a pervasive felt wrongness. The wrongness was data. I followed it.

My position outside the academic consensus is, for this specific insight, the reason I could see it. Someone embedded in the field would have the frame already installed: "temporal reasoning is a known limitation, here are the benchmarks we're working on." That frame makes the problem defined, contained, and someone else's job. I didn't know the problem was supposed to be accepted. So I didn't accept it.

Whether this framing is correct is an empirical question. Whether it's novel in the specific form presented here — the temporal-sequential gap as a substrate problem, a theory-of-mind prerequisite, and the root cause of the backwards adaptation burden — I've checked, and as of March 2026, I don't find it in the published conversation. The research community is circling the observation from several directions, but none have assembled it as a relational problem between fundamentally different cognitive substrates that explains why existing alignment approaches to sycophancy and interaction quality have a ceiling they can't break through.

If I'm wrong, I'd like to know. If I'm right, this changes what we should be building.

Andre Foti is the founder of Iron Sun Works and the creator of Novalyst, an AI-assisted externalized memory system. Contact: acfoti@gmail.com
