Verification Scarcity: A Systems Model of Agentic AI Constraints
Jim Montgomery
Version 2.1
Abstract
Agentic AI systems face compounding structural constraints arising independently from thermodynamics, capital formation, reliability theory, verification economics, and organizational design. This paper derives those constraints from first principles and identifies the conditions under which reliable large-scale agentic deployment is possible. The central finding is a single organizational variable — correction probability at the verification point — that appears independently across the reliability model, the verification cost model, and the organizational authority model, and is shown to be the same constraint in each frame. The analysis produces a structural prediction about labor market value that inverts the dominant displacement narrative: the capabilities most identified as vulnerable are shown to be structurally required for viable agentic deployment. Empirical data is cited where it confirms structural predictions; gaps are explicitly marked as open variables with specified resolution paths.
Preamble
This document derives structural constraints on agentic AI deployment from first principles, not from literature review. The conclusions below follow from thermodynamics, information theory, computational complexity, and organizational economics. Where empirical data confirms the structural predictions, it is cited. Where empirical data is incomplete, unknowns are explicitly marked [UNKNOWN: description].
The framework is not a critique of AI capability. It is a derivation of what the physical and mathematical structure of these systems requires and forbids, and what the organizational conditions are under which reliable deployment is possible at all. The analysis converges on a single variable that appears across every layer: the correction probability of an authoritative expert at the verification point.
I. The Physical Layer: Thermodynamic Constraints
1.1 Landauer’s Floor
Every logically irreversible operation, such as erasing one bit, has a minimum energy cost. The lower bound is set by Landauer’s Principle:
\[ W_{\min} = k_B T \ln 2 \]
where \(k_B\) is Boltzmann’s constant and \(T\) is the absolute temperature. This is not an engineering problem; it is a physical invariant. No architectural improvement eliminates it.
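To give the bound a sense of scale, the short Python sketch below evaluates \(W_{\min} = k_B T \ln 2\). The function name and the 300 K operating temperature are illustrative assumptions, not values taken from this paper.

```python
import math

# Boltzmann constant in joules per kelvin (exact value in the SI).
K_B = 1.380649e-23


def landauer_limit_joules(temperature_kelvin: float) -> float:
    """Return the Landauer bound k_B * T * ln(2) for erasing one bit."""
    if temperature_kelvin <= 0:
        raise ValueError("temperature must be positive (kelvin)")
    return K_B * temperature_kelvin * math.log(2)


# Assumed room temperature of 300 K (illustrative only).
w_min = landauer_limit_joules(300.0)
print(f"Landauer bound at 300 K: {w_min:.3e} J per bit erased")
```

The bound scales linearly with temperature, so cooling lowers the floor but never removes it.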
1.2 Operational Power Model
The total power cost of inference at scale:
\[ P_{total} = \left( C f V^2 + I_{leak} V \right) \cdot \mathrm{PUE} \]
- \(C\): capacitance
- \(f\): switching frequency
- \(V\): supply voltage
- \(I_{leak}\): leakage current (dominates as \(V\) approaches threshold)
- \(\mathrm{PUE}\): Power Usage Effectiveness
As \(V\) approaches the threshold voltage, \(I_{leak} V\) dominates, creating a hard physical floor on the cost per inference operation. Architectural improvements (sparse models, neuromorphic hardware, edge compute) shift the curve but cannot cross the floor.
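The power model \(P_{total} = (C f V^2 + I_{leak} V) \cdot \mathrm{PUE}\) can be explored numerically. The sketch below is a minimal illustration: every numeric value is hypothetical, and leakage current is held constant for simplicity, whereas in real devices it varies with voltage and temperature.

```python
def inference_power_watts(capacitance_f: float, frequency_hz: float,
                          voltage_v: float, leakage_a: float,
                          pue: float) -> float:
    """Total power: (dynamic + leakage) scaled by facility overhead (PUE)."""
    dynamic = capacitance_f * frequency_hz * voltage_v ** 2
    leakage = leakage_a * voltage_v
    return (dynamic + leakage) * pue


def leakage_share(capacitance_f: float, frequency_hz: float,
                  voltage_v: float, leakage_a: float) -> float:
    """Fraction of chip power attributable to leakage at a given voltage."""
    dynamic = capacitance_f * frequency_hz * voltage_v ** 2
    leakage = leakage_a * voltage_v
    return leakage / (dynamic + leakage)


# Hypothetical device: 1 nF switched capacitance, 1 GHz, 0.2 A leakage.
for volts in (1.0, 0.75, 0.5):
    share = leakage_share(1e-9, 1e9, volts, 0.2)
    total = inference_power_watts(1e-9, 1e9, volts, 0.2, pue=1.5)
    print(f"V={volts:.2f} V  total={total:.3f} W  leakage share={share:.1%}")
```

Because the dynamic term falls with \(V^2\) while the leakage term falls only with \(V\), the leakage share rises as voltage is reduced, which is the mechanism behind the floor described above.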
1.3 The Jevons Paradox of Compute
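Efficiency gains do not resolve the energy constraint. As inference cost per operation decreases, demand increases by a proportionally greater amount, producing net energy consumption growth:
\[ \frac{dE_{total}}{d\eta} > 0 \]
where \(E_{total}\) is total energy consumption and \(\eta\) is efficiency.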
The scale of this dynamic is now measurable. US data centers consumed approximately 4.4% of total national electricity in 2023 and are projected by the US Department of Energy’s Lawrence Berkeley National Laboratory to reach 6.7%–12% by 2028 (DOE/LBNL, 2024). Globally, the IEA projects data center electricity consumption will double to 945 TWh by 2030, with AI-focused data centers growing at 30% annually (IEA, Energy and AI, 2025). In 2025, data centers accounted for approximately 50% of all US electricity demand growth (IEA / Fortune, April 2026).
The Gigawatt Ceiling is therefore a demand-side constraint as much as a supply-side one. Infrastructure buildout accelerates utilization faster than grid capacity scales.
plain language explainer
This section establishes that AI compute runs into physical limits on energy that cannot be engineered away, and that efficiency improvements make the problem worse rather than better.
Landauer's Floor (1.1): Every time a computer performs an irreversible logical operation (a calculation step that erases information), it must expend a minimum amount of energy. The formula \(W_{\min} = k_B T \ln 2\) multiplies three quantities: Boltzmann's constant (a standard physics number that converts temperature into energy units), the ambient temperature, and \(\ln 2\) (a mathematical constant arising from the binary nature of computation: one bit means one yes/no choice). At room temperature this works out to roughly \(2.9 \times 10^{-21}\) joules per operation, an extremely small number, but AI inference involves trillions of operations, and the floor is absolute. No chip design, algorithm, or materials advance can eliminate it. This is a law of physics, not an engineering problem.
Operational Power Model (1.2): The actual power consumed at scale has two main components. The first, \(C f V^2\), is dynamic power: energy consumed when transistors switch state. Voltage (\(V\)) appears squared, which is why chip designers obsessively reduce it: halving voltage cuts this term to one quarter. The second, \(I_{leak} V\), is leakage power: current that seeps through transistors even when they are not switching. As designers push voltage lower to reduce the first term, leakage becomes proportionally more significant and eventually dominates. There is a voltage threshold below which transistors stop working reliably; near that threshold, leakage cannot be further reduced. The PUE multiplier adds data center overhead (cooling, networking, lighting) on top. Together these create a hard floor on inference cost per operation that architecture can approach but never cross.
Jevons Paradox (1.3): When something becomes more efficient to run, people use more of it, and total consumption rises rather than falls. The formula \(dE_{total}/d\eta > 0\) states this formally: as efficiency (\(\eta\)) increases, total energy consumption moves in the same direction. The data confirms it: US data center electricity share rising from approximately 4.4% toward 6.7–12%, global data center consumption doubling to 945 TWh by 2030, data centers accounting for roughly half of all US electricity demand growth in 2025. The Gigawatt Ceiling is the resulting constraint: infrastructure buildout cannot keep pace because demand acceleration is structurally faster than grid capacity growth.
II. The Capital Layer: Hardware Obsolescence Cascade
2.1 The Stranded Asset Mechanism
Data center capital is financed on mismatched amortization schedules:
- Building depreciation: 20–40 years
- Hardware depreciation: 3–7 years
- AI hardware economic obsolescence (accelerating): currently compressing toward 2–3 years
When next-generation architecture renders current inference hardware economically uncompetitive before debt service completes, the asset’s revenue-generating capacity falls below its financing cost. The debt does not obsolete with the hardware.
2.2 Capital Entropy
Let \(A(t)\) be the economic value of deployed hardware at time \(t\), and \(D(t)\) the outstanding debt obligation:
\[ U_{capital}(t) = A(t) - D(t) \]
When architectural obsolescence drives \(A(t)\) below \(D(t)\) before the amortization schedule completes, \(U_{capital}(t) < 0\).
The scale of the exposure is significant. Goldman Sachs’ baseline model projects $765 billion in annual AI CapEx in 2026 across compute, data centers, and power, growing toward $1.6 trillion annually by 2031 (Goldman Sachs, April 2026). Big-5 hyperscaler spending alone reached approximately $725 billion in 2026 following Q1 earnings revisions (CFA Analysis, April 2026). The IEA notes that five large technology companies raised capex to over $400 billion in 2025 and are set to increase it by a further 75% in 2026 (IEA, April 2026).
At these investment magnitudes, even a modest debt-financed fraction at 7-year schedules against a 2–3 year economic obsolescence horizon produces stranded exposure in the hundreds of billions USD.
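The mismatch between a 7-year repayment schedule and a 2–3 year obsolescence horizon can be illustrated with \(U_{capital}(t) = A(t) - D(t)\). The sketch below assumes, purely for illustration, a linear decline in asset value, straight-line principal repayment with interest ignored, and a hypothetical 60% debt-financed share; none of these figures come from the sources cited above.

```python
def asset_value(t_years: float, initial_value: float,
                obsolescence_years: float) -> float:
    """Hypothetical linear decline to zero at the obsolescence horizon."""
    return initial_value * max(0.0, 1.0 - t_years / obsolescence_years)


def outstanding_debt(t_years: float, principal: float,
                     schedule_years: float) -> float:
    """Straight-line principal repayment; interest is ignored for simplicity."""
    return principal * max(0.0, 1.0 - t_years / schedule_years)


def capital_position(t_years: float, initial_value: float, principal: float,
                     obsolescence_years: float, schedule_years: float) -> float:
    """U_capital(t) = A(t) - D(t); negative means the asset is underwater."""
    return (asset_value(t_years, initial_value, obsolescence_years)
            - outstanding_debt(t_years, principal, schedule_years))


# Hypothetical: 100 units of hardware, 60 units debt-financed,
# 3-year economic life against a 7-year repayment schedule.
for year in range(0, 8):
    u = capital_position(year, 100.0, 60.0, 3.0, 7.0)
    print(f"year {year}: U_capital = {u:+.1f}")
```

Under these assumptions the position turns negative before the hardware reaches its obsolescence horizon and stays negative until the debt is retired.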
Historical analogs: Telecom dark fiber overbuild (1999–2001); shale debt structured at $100/bbl against collapsed oil prices. Both produced cascading non-performance of loans and destruction of lender balance sheets.
[UNKNOWN: Precise debt-financing fraction of 2026 AI infrastructure capex and lender concentration — required to size the cascade exposure accurately]
2.3 Interaction with Thermodynamic Constraint
Organizations servicing stranded hardware debt while simultaneously absorbing exponential verification overhead (Section IV) face a bilateral cost squeeze: the capital cost of past infrastructure and the operational cost of present verification both compound against revenue.
plain language explainer
This section argues that AI hardware becomes economically worthless faster than the loans used to buy it get paid off, and that at current investment scales this creates a structural financial exposure with clear historical precedent.
The Stranded Asset Mechanism (2.1): Data center buildings are depreciated over 20–40 years. The servers inside are depreciated over 3–7 years. But AI hardware is becoming economically obsolete in 2–3 years — not because it breaks, but because next-generation hardware runs AI inference so much more cheaply that operators using older hardware cannot compete on cost. Revenue falls below what is needed to service the debt used to buy the hardware. The debt does not care: the lender still expects repayment on the original schedule.
Capital Entropy (2.2): The formula \(U_{capital}(t) = A(t) - D(t)\) is straightforward accounting: at any point in time, take the economic value of deployed hardware (\(A\)) and subtract what is still owed on it (\(D\)). When obsolescence drives asset value below outstanding debt faster than the repayment schedule retires that debt, the result is negative: the asset is underwater. At $765 billion in projected annual AI capital expenditure for 2026, even a modest debt-financed fraction on 7-year schedules against a 2–3 year obsolescence horizon produces stranded exposure in the hundreds of billions of dollars. The historical analogues, telecom dark fiber (1999–2001) and shale oil debt structured at $100/barrel, both followed the same logical structure and produced cascading loan defaults and lender balance sheet destruction.
Bilateral Squeeze (2.3): Organizations carrying stranded hardware debt face two simultaneous cost pressures: the capital cost of past infrastructure still being paid off, and the growing operational cost of verification as agentic systems scale. Both compound against revenue at the same time. This is not one problem to be solved sequentially but two running concurrently.
III. The Reliability Layer: Geometric Decay
3.1 Base Model
For an \(n\)-step agentic workflow where each step has independent success probability \(p\), system reliability is:
\[ R(n) = p^n \]
This is geometric decay with no floor above zero. At \(p = 0.99\), \(n = 100\):
\[ R(100) = 0.99^{100} \approx 0.366 \]
A 99%-accurate agent executing a 100-step workflow produces reliable output only 36.6% of the time. This is not an edge case — it is the central operating reality of any sufficiently complex agentic pipeline.
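The compounding in \(R(n) = p^n\) is easy to check directly. The following is a minimal sketch; the function name and the chain lengths in the loop are illustrative choices.

```python
def chain_reliability(p: float, n: int) -> float:
    """End-to-end reliability R(n) = p**n for n independent steps."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    return p ** n


# Reliability of a 99%-accurate agent at several chain lengths.
for steps in (10, 50, 100, 500):
    print(f"n={steps:>3}: R = {chain_reliability(0.99, steps):.3f}")
```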
Deployment data confirms the structural prediction. Gartner projects over 40% of agentic AI projects will be canceled by end of 2027 due to escalating costs, unclear ROI, or inadequate risk controls (Gartner, June 2025). Separately, Forrester and Anaconda 2026 data show 88% of agent pilots failing to reach production (Digital Applied, April 2026). Meanwhile, PwC’s 2026 AI Performance Study of 1,217 senior executives confirms that 74% of AI’s economic value is captured by just 20% of organizations, with the majority still trapped in pilot mode (PwC, April 2026).
3.2 The Precision Requirement
To maintain a target reliability \(R_{target}\) as complexity increases, the required per-step precision is:
\[ p_{required} = e^{\ln(R_{target})/n} = R_{target}^{1/n} \]
As \(n\) grows, \(p_{required} \to 1\). The precision requirement imposed by complexity scales faster than any realistic model improvement trajectory.
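Inverting the base model gives \(p_{required} = e^{\ln(R_{target})/n}\). The sketch below computes it for a hypothetical 95% reliability target; the target and chain lengths are illustrative assumptions.

```python
import math


def required_step_precision(r_target: float, n: int) -> float:
    """Per-step success probability needed so that p**n equals r_target."""
    if not 0.0 < r_target <= 1.0:
        raise ValueError("r_target must be in (0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return math.exp(math.log(r_target) / n)


# Hypothetical target: 95% end-to-end reliability.
for steps in (10, 100, 1000):
    p_req = required_step_precision(0.95, steps)
    print(f"n={steps:>4}: required p = {p_req:.6f}")
```

Each tenfold increase in chain length cuts the tolerable per-step error rate by roughly a factor of ten.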
3.3 Modified Model: Checkpointing with Correction
Insert \(m\) verification checkpoints at intervals of \(n/m\) steps, each with correction probability \(v_c\):
\[ R_{checkpoint} = \left[ 1 - \left(1 - p^{n/m}\right)\left(1 - v_c\right) \right]^m \]
Key result: As \(v_c \to 1\), \(R_{checkpoint} \to 1\) regardless of \(n\). Checkpointing with high-fidelity correction breaks geometric decay.
Critical constraint: \(v_c\) is the expensive variable. It requires a human verifier with sufficient domain expertise to distinguish a correct output from a plausible-but-wrong one. Low-authority or low-expertise verification returns \(v_c \to 0\), collapsing \(R_{checkpoint}\) back toward \(p^n\).
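The checkpoint formula \(R_{checkpoint} = [1 - (1 - p^{n/m})(1 - v_c)]^m\) and its two limiting cases can be checked numerically. This is a minimal sketch; the parameter values (10 checkpoints over 100 steps) are hypothetical.

```python
def checkpoint_reliability(p: float, n: int, m: int, v_c: float) -> float:
    """Reliability with m checkpoints, each correcting a failed segment
    with probability v_c."""
    if m < 1:
        raise ValueError("m must be at least 1")
    segment_success = p ** (n / m)
    # A segment is lost only if it fails AND the checkpoint misses the failure.
    uncorrected_failure = (1.0 - segment_success) * (1.0 - v_c)
    return (1.0 - uncorrected_failure) ** m


# Sweep correction probability for a 100-step chain with 10 checkpoints.
for v_c in (0.0, 0.5, 0.9, 1.0):
    r = checkpoint_reliability(0.99, 100, 10, v_c)
    print(f"v_c={v_c:.1f}: R_checkpoint = {r:.3f}")
```

At \(v_c = 0\) the result equals \(p^n\), as if no checkpoints existed; at \(v_c = 1\) it equals 1 regardless of chain length.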
3.4 Modified Model: Parallel Redundancy
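Run \(k\) independent chains, accept best or majority result:
\[ R_{parallel} = 1 - \left(1 - p^n\right)^k \]
At \(k = 3\), base \(R = 0.366\) improves to \(R_{parallel} \approx 0.745\). Cost scales linearly with \(k\). Parallel redundancy buys reliability at proportional compute cost but does not break the underlying decay.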
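The redundancy formula \(R_{parallel} = 1 - (1 - p^n)^k\) can be tabulated against its compute cost. The sketch below is illustrative; it measures cost simply as the number of chains run, which is an assumption rather than a cost model from this paper.

```python
def parallel_reliability(p: float, n: int, k: int) -> float:
    """Probability that at least one of k independent n-step chains succeeds."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return 1.0 - (1.0 - p ** n) ** k


# Reliability versus relative compute cost (cost assumed proportional to k).
for chains in (1, 2, 3, 5, 10):
    r = parallel_reliability(0.99, 100, chains)
    print(f"k={chains:>2}: R_parallel = {r:.3f}, relative cost = {chains}x")
```

Note that the formula gives the probability that at least one chain succeeds, so it presumes some way of recognizing the successful chain.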
3.5 Unified Reliability Model
Combining checkpointing and redundancy:
\[ R_{net} = \left[ 1 - \left( \left(1 - p^{n/m}\right)\left(1 - v_c\right) \right)^k \right]^m \]
At each of \(m\) checkpoints, \(k\) parallel chains each fail and go uncorrected with probability \((1 - p^{n/m})(1 - v_c)\). The checkpoint is lost only if all \(k\) chains fail uncorrected: \(\left[(1 - p^{n/m})(1 - v_c)\right]^k\). This is consistent with the correction model in Section 3.3; \(v_c\) remains an independent correction event, not a multiplier on per-step success probability.
The only lever that breaks geometric decay without proportionally scaling compute cost is \(v_c\). All other optimizations (parallelism, checkpointing without correction) are cost multipliers on the same degrading base. The escape from reliability decay is not an engineering problem; it is an organizational one.
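The unified expression \(R_{net} = [1 - ((1 - p^{n/m})(1 - v_c))^k]^m\) makes it possible to compare the two levers side by side. The sketch below is a hedged illustration with hypothetical parameter values; it contrasts raising \(v_c\) at fixed compute against raising \(k\) with no correction.

```python
def net_reliability(p: float, n: int, m: int, k: int, v_c: float) -> float:
    """Unified model: m checkpoints, k parallel chains per segment,
    correction probability v_c at each checkpoint."""
    if m < 1 or k < 1:
        raise ValueError("m and k must be at least 1")
    uncorrected_failure = (1.0 - p ** (n / m)) * (1.0 - v_c)
    # A checkpoint is lost only if all k chains fail uncorrected.
    return (1.0 - uncorrected_failure ** k) ** m


# Lever 1: raise correction probability, single chain (compute cost 1x).
for v_c in (0.0, 0.5, 0.9):
    r = net_reliability(0.99, 100, 10, 1, v_c)
    print(f"k=1, v_c={v_c:.1f}: R_net = {r:.3f}")

# Lever 2: add parallel chains with no correction (compute cost k times).
for chains in (1, 2, 3):
    r = net_reliability(0.99, 100, 10, chains, 0.0)
    print(f"k={chains}, v_c=0.0: R_net = {r:.3f}")
```

Setting \(k = 1\) recovers the checkpoint model, and setting \(m = 1\), \(v_c = 0\) recovers the parallel model, which is a quick consistency check on the combined form.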
3.6 Time-Variant Precision Decay
\(p\) is not static. As models are trained on increasing volumes of agent-generated synthetic data, the Kullback-Leibler divergence between the training distribution and ground truth grows:
\[ D_{KL}(P \,\|\, Q) = \sum_x P(x) \ln \frac{P(x)}{Q(x)} \]
Where \(P\) is the ground-truth distribution and \(Q\) is the synthetic-data-contaminated training distribution. As \(D_{KL}\) grows, the model loses grounding in physical reality (stochastic drift). This adds a time derivative to \(p\):
\[ \frac{dp}{dt} = -\gamma \, D_{KL}(t) \]
Where \(\gamma\) is the contamination rate. The geometric decay in Section 3.1 therefore accelerates over time independent of chain length. Reliability is a function of both \(n\) and \(t\).
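To see how \(dp/dt = -\gamma \, D_{KL}(t)\) feeds into \(R = p^n\), the sketch below assumes a hypothetical linear drift \(D_{KL}(t) = d_0 + g t\), which integrates to \(p(t) = p_0 - \gamma (d_0 t + g t^2 / 2)\). The drift form and every parameter value are assumptions for illustration; the paper leaves \(\gamma\) as an open variable.

```python
def step_precision(t: float, p0: float, gamma: float,
                   kl0: float, kl_growth: float) -> float:
    """p(t) from dp/dt = -gamma * D_KL(t), with the ASSUMED linear drift
    D_KL(t) = kl0 + kl_growth * t. Result is clamped to [0, 1]."""
    accumulated_drift = gamma * (kl0 * t + 0.5 * kl_growth * t ** 2)
    return min(1.0, max(0.0, p0 - accumulated_drift))


def reliability_over_time(t: float, n: int, p0: float, gamma: float,
                          kl0: float, kl_growth: float) -> float:
    """End-to-end reliability R(n, t) = p(t) ** n."""
    return step_precision(t, p0, gamma, kl0, kl_growth) ** n


# Hypothetical parameters: p0 = 0.99, gamma = 0.001, D_KL(t) = 0.1 + 0.05 t.
for t in (0, 2, 4, 8):
    r = reliability_over_time(t, 100, 0.99, 0.001, 0.1, 0.05)
    print(f"t={t}: R(100, t) = {r:.3f}")
```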
[UNKNOWN: Empirical measurement of γ for current production model families — requires longitudinal benchmark tracking against verified ground-truth datasets]
plain language explainer
This section establishes that when AI systems execute multi-step tasks, small per-step error rates compound into large system-level failure rates. The math is ordinary probability applied to sequences. The implications are severe and the deployment data confirms the prediction.
Base Model (3.1): \(R(n) = p^n\). Take the probability that any single step is correct (\(p\)), and raise it to the power of the number of steps (\(n\)). At 99% per-step accuracy across 100 steps: \(0.99^{100} \approx 0.366\). A 36.6% chance the full workflow is correct. This is not a worst-case scenario; it is the straightforward output of probability applied to sequences. A chain is only as reliable as the product of every link's reliability. The deployment statistics cited (88% of agent pilots failing to reach production, 40%+ of projects canceled) are consistent with this prediction.
Precision Requirement (3.2): The formula \(p_{required} = e^{\ln(R_{target})/n}\) reverses the compounding calculation: given a desired end-to-end reliability and a number of steps, it calculates the required per-step accuracy. As the number of steps increases, the required per-step accuracy approaches 1.0, which is perfect. No current model achieves this for long workflows, and model improvement trajectories are not on a path to reach it.
Checkpointing with Correction (3.3): Inserting human review points throughout the workflow, rather than only checking the final output, can break the compounding failure problem. The formula \(R_{checkpoint} = [1 - (1 - p^{n/m})(1 - v_c)]^m\) captures this: divide the workflow into \(m\) segments, each reviewed with correction probability \(v_c\). As \(v_c\) approaches 1.0, overall reliability approaches 1.0 regardless of chain length. The critical constraint is \(v_c\) itself: a reviewer without domain expertise, or without decision authority, has a \(v_c\) near zero, which collapses the formula back to \(p^n\) as though there were no checkpoints. Checkbox review by someone who cannot distinguish a correct output from a plausible-but-wrong one is not a checkpoint; it is theater.
Parallel Redundancy (3.4): Running k independent chains simultaneously and taking the best result improves reliability — at k=3, from 36.6% to approximately 74.5% — but at three times the compute cost. The underlying decay is not addressed. You have bought more chances at the correct answer, not a better process for producing one.
Unified Model (3.5): Combining checkpointing and parallel redundancy gives \(R_{net} = [1 - ((1 - p^{n/m})(1 - v_c))^k]^m\). The conclusion is precise: \(v_c\) is the only lever that breaks geometric decay without proportionally scaling compute cost. Every other optimization (more parallel runs, more checkpoints) is a cost multiplier on the same degrading base. The escape from reliability decay is not an engineering problem. It is an organizational one.
Time-Variant Precision Decay (3.6): The per-step accuracy \(p\) is not fixed. It degrades over time as AI models train increasingly on data generated by other AI models rather than by humans or verified real-world sources. The KL divergence \(D_{KL}(P \,\|\, Q)\) measures the gap between what the real world actually looks like (\(P\)) and what the model has learned from its training data (\(Q\)). As that gap grows, accuracy falls at a rate given by \(dp/dt = -\gamma \cdot D_{KL}(t)\): accuracy degrades proportionally to how far the model has already drifted, scaled by a contamination rate \(\gamma\). All earlier reliability estimates in this section assumed \(p\) was stable. It is not. Those estimates are therefore optimistic, and the gap widens over time without active intervention to maintain training data quality.
One additional note on p: the section frames precision decay primarily as a training-contamination problem. But per-step accuracy is also simply a property of the model's internal structure at any given moment — and that property is domain-specific. The same model may have p = 0.97 for summarization tasks and p = 0.81 for specialized legal reasoning, not because of synthetic data contamination but because of how and on what the model was trained, and the inherent difficulty structure of the domain. The reliability figures derived above are real, but they require empirical measurement per domain per task type, not inference from general benchmark scores.
IV. The Verification Layer: Cost Asymmetry
4.1 The Fundamental Asymmetry
Generating an agentic output is computationally tractable. Verifying that output, particularly detecting plausible-but-wrong results, approaches \(NP\)-hard for sufficiently complex outputs. The generation-verification cost ratio is therefore not fixed; it worsens as output complexity increases.
Formally, verification cost \(C_v\) as a function of output complexity \(\mathrm{Complexity}(O)\) and human cognitive bandwidth \(B\):
\[ C_v = \int \frac{\mathrm{Complexity}(O)}{B} \, dt, \qquad \frac{dC_v}{dn} > \frac{dC_{gen}}{dn} \]
Verification cost grows faster than generation cost as task complexity increases. This is the structural ceiling on agentic ROI.
4.2 The Admissibility Gap
In high-stakes domains, outputs must be not merely accurate but auditable — bound to a deterministic evidence chain. The gap between outputs that appear audit-shaped (citations, professional prose, specific numbers) and outputs that are actually admissible (bound to verifiable, resolvable evidence) is the Admissibility Gap.
AI systems produce audit-shaped outputs at high volume. The human cost of determining admissibility scales with volume. At sufficient volume, the verification budget is exhausted and admissibility checking becomes stochastic — which means high-credibility errors travel further before detection.
4.3 Verification Budget as Finite Resource
Holding a community’s verification capacity \(V\) fixed, any increase in agentic output volume \(O\) mechanically dilutes verification per claim \(V/O\):
\[ \frac{d(V/O)}{dO} < 0 \]
This is not a resourcing problem that scales away with hiring. Verification requires domain expertise with long formation timelines. The resource is structurally scarce.
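The dilution statement \(d(V/O)/dO < 0\) follows from \(d(V/O)/dO = -V/O^2\) when \(V\) is fixed. The sketch below tabulates both quantities; the capacity figure (expert review hours) and output volumes are hypothetical.

```python
def verification_per_claim(capacity: float, outputs: float) -> float:
    """Verification available per output, V / O, with fixed capacity V."""
    if outputs <= 0:
        raise ValueError("outputs must be positive")
    return capacity / outputs


def dilution_slope(capacity: float, outputs: float) -> float:
    """Analytic derivative d(V/O)/dO = -V / O**2 (always negative for V > 0)."""
    if outputs <= 0:
        raise ValueError("outputs must be positive")
    return -capacity / outputs ** 2


# Hypothetical: 1,000 expert review hours against growing output volume.
for volume in (100, 1_000, 10_000, 100_000):
    per_claim = verification_per_claim(1_000.0, volume)
    slope = dilution_slope(1_000.0, volume)
    print(f"O={volume:>7}: V/O = {per_claim:.4f} h, slope = {slope:.2e}")
```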
plain language explainer
This section establishes that producing AI output is cheap and fast while verifying that output is expensive and slow — and that the gap between the two widens as outputs become more complex. This asymmetry is the structural ceiling on how much value agentic AI can actually deliver.
The Fundamental Asymmetry (4.1): Generation scales in a manageable way with complexity: it is computationally tractable. Verifying that sufficiently complex outputs are actually correct approaches a class of problems with no known efficient algorithm: computationally intractable at scale. The formula \(C_v = \int \mathrm{Complexity}(O)/B \, dt\) adds up verification cost continuously over time: at each moment, cost equals the complexity of the output divided by the human cognitive bandwidth available to evaluate it, that is, how much an expert can process per unit of time. The accompanying statement \(dC_v/dn > dC_{gen}/dn\) says: as complexity increases, verification cost grows faster than generation cost. You can generate a 10,000-word legal analysis in seconds. Determining whether it is actually correct requires a lawyer, time, and domain knowledge. That gap widens as outputs become more sophisticated, because sophisticated wrong answers are harder to detect than simple ones.
The Admissibility Gap (4.2): In high-stakes domains, an output must not merely appear correct — it must be auditable, traceable to a verifiable evidence chain. AI systems produce outputs that look audit-ready at high volume: proper citations, professional prose, specific numbers, confident framing. Whether any given output is actually admissible requires expert review. As volume grows, the same expert budget is spread across more documents. The fraction receiving genuine admissibility review falls. Errors that would have been caught at low volume now travel further — into briefs, reports, decisions — before detection. Whether any given error gets caught becomes effectively random at high volume.
Verification Budget as Finite Resource (4.3): The formula \(d(V/O)/dO < 0\) states that as output volume increases, verification per output decreases: a fixed total capacity divided by a growing number of outputs. This cannot be resolved by hiring. Domain expertise requires long formation timelines. You cannot hire your way to twice as many experienced radiologists, securities lawyers, or structural engineers in a year. The supply of expert verifiers is structurally constrained regardless of willingness to pay, and the demand for their attention is growing faster than they can be produced.
V. The Organizational Layer: Authority-Expertise Decoupling
5.1 The Principal-Agent-AI Three-Node System
Traditional principal-agent problems involve two nodes: Principal (management) → Agent (employee). Agentic AI introduces a third: Principal (management) → Expert (verifier) → Agent (AI).
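When the Expert is denied decision authority, Verification Latency is introduced. Total task cost, with \(\Psi\) the verification cost function:
\[ C_{total} = C_{gen} + \sum_{i=1}^{k} \Psi(L_i, \sigma_i) \, (1 + \tau)^i \]
- \(L_i\): labor cost to verify at hierarchy level \(i\)
- \(\sigma_i\): output uncertainty (entropy) at level \(i\)
- \(\tau\): delay constant from bureaucratic decoupling
- \(k\): number of hierarchical hops between agent output and decision authority
Cost grows exponentially with \(k\). At \(k = 0\) (authority co-located with expertise at the verification point), \(C_{total} = C_{gen}\).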
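The cost model \(C_{total} = C_{gen} + \sum_{i=1}^{k} \Psi(L_i, \sigma_i)(1 + \tau)^i\) can be evaluated for different hierarchy depths. In the sketch below the per-level verification costs \(\Psi(L_i, \sigma_i)\) are passed in as precomputed numbers, since the paper does not specify a functional form for \(\Psi\); all values are hypothetical.

```python
from typing import Sequence


def total_task_cost(c_gen: float, level_costs: Sequence[float],
                    tau: float) -> float:
    """C_total = C_gen + sum over hops i=1..k of level_costs[i-1] * (1+tau)**i.

    level_costs holds the precomputed Psi(L_i, sigma_i) for each hop;
    its length is k, the number of hops to decision authority.
    """
    if tau < 0:
        raise ValueError("tau must be non-negative")
    overhead = sum(cost * (1.0 + tau) ** i
                   for i, cost in enumerate(level_costs, start=1))
    return c_gen + overhead


# Hypothetical: generation cost 10, each hop costs 10 to verify, tau = 0.5.
for hops in range(0, 5):
    cost = total_task_cost(10.0, [10.0] * hops, tau=0.5)
    print(f"k={hops}: C_total = {cost:.2f}")
```

With an empty list of hops the sum vanishes and total cost equals generation cost, matching the \(k = 0\) case.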
5.2 The Equivalence: Authority Index = Correction Probability
The precision decay model (Section 3.3) and the organizational cost model (Section 5.1) express the same constraint in different frames.
Let \(\alpha \in [0, 1]\) be the Authority Index: the degree to which a verifier has decision authority over the output they are evaluating.
When \(\alpha = 0\) (the expert is an observer with no authority), the incentive to perform high-fidelity verification collapses (moral hazard). The precision decay equation:
\[ \frac{dp}{dt} = -\lambda (1 - \alpha) \]
At \(\alpha = 0\): \(dp/dt = -\lambda\), maximum decay. At \(\alpha = 1\): \(dp/dt = 0\), decay halted by motivated expert correction.
The equivalence: \(v_c\) in the reliability model and \(\alpha\) in the organizational model are the same variable. Organizational structure is not a soft consideration adjacent to the technical reliability problem. It is a direct input to \(R_{net}\).
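For a constant authority index, the decay equation \(dp/dt = -\lambda(1 - \alpha)\) integrates to \(p(t) = p_0 - \lambda(1 - \alpha)t\). The sketch below evaluates that closed form; the decay constant \(\lambda\), the starting precision, and the time horizon are hypothetical values chosen only to show the two limiting cases.

```python
def precision_after(t: float, p0: float, decay_rate: float,
                    alpha: float) -> float:
    """Closed-form p(t) = p0 - decay_rate * (1 - alpha) * t for constant
    alpha, clamped to [0, 1]. decay_rate plays the role of lambda."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return min(1.0, max(0.0, p0 - decay_rate * (1.0 - alpha) * t))


# Hypothetical: p0 = 0.99, lambda = 0.002 per period, 10 periods.
for alpha in (0.0, 0.5, 1.0):
    p_t = precision_after(10.0, 0.99, 0.002, alpha)
    print(f"alpha={alpha:.1f}: p(10) = {p_t:.3f}")
```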
5.3 The Fundamental Invariant
When decision authority is co-located with domain expertise at the point where information and verification intersect, verification latency \(\tau \to 0\), correction probability \(v_c \to 1\), and the reliability model escapes geometric decay.
This invariant holds wherever authority-expertise co-location is achieved — across domains, organizational sizes, and industries. The specific organizational form is an instance of the invariant, not the invariant itself. The invariant is the thing.
5.4 Adverse Selection of Output
Organizations with \(\alpha \to 0\) (authority-expertise decoupling) do not simply fail to verify; they systematically select against the outputs most in need of expert judgment. High-complexity, high-value agentic outputs require the most expertise to evaluate. Without authority at the expertise level, organizations filter these out in favor of low-complexity, easily-signable outputs.
Result: the measurable productivity gains from agentic AI accrue to organizations with \(\alpha \to 1\) and disappear into verification overhead for organizations with \(\alpha \to 0\). The PwC 74/20 split is the empirical expression of this selection effect.
plain language explainer
This section shows that organizational structure is a direct technical input to system reliability — not a management consideration adjacent to it. The correction probability vc from the reliability model and the authority index α introduced here are the same variable, seen from different angles.
Three-Node System (5.1): Traditional management theory has two players: a principal (sets objectives) and an agent (executes). AI adds a third node: the AI does the work, an expert verifies it, and management holds decision authority. When the expert and the decision authority are different people at different hierarchical levels, delay and cost compound with every layer between them. The formula \(C_{total} = C_{gen} + \sum_{i=1}^{k} \Psi(L_i, \sigma_i) \cdot (1+\tau)^i\) captures this: the \((1+\tau)^i\) term means cost grows exponentially with hierarchy depth, not linearly. Each additional hop multiplies the accumulated cost again. At \(k = 0\), where the expert who verifies also has authority to act on that verification, the sum disappears entirely and total cost equals just the generation cost.
The Equivalence (5.2): The authority index \(\alpha\) runs from 0 (expert can review but cannot act on findings) to 1 (expert's judgment is the decision). Correction probability \(v_c = f(\alpha)\): as authority increases, correction probability increases; at \(\alpha = 0\), \(v_c\) approaches zero. The decay equation \(dp/dt = -\lambda(1-\alpha)\) shows the mechanism: an expert without authority has no functional incentive to verify deeply. The output proceeds regardless of their assessment. The cost of careful review falls on the expert; the benefit of catching errors accrues elsewhere; the expert has no lever to act on what they find. Verification becomes performative. Insert \(\alpha = 0\) into the reliability formulas from Section III and \(R_{net}\) collapses to \(p^n\). Organizational structure is a parameter in the reliability model.
The Fundamental Invariant (5.3): One structural condition resolves the reliability problem, and it appears across domains, organization sizes, and industries in different forms: decision authority and domain expertise must be co-located at the point where outputs are evaluated. When this condition holds, verification latency goes to zero, correction probability reaches its maximum, and the system escapes geometric reliability decay. Any organizational structure that achieves this works. Any structure that separates authority from expertise fails — regardless of how sophisticated or well-intentioned the design.
Adverse Selection of Output (5.4): Organizations with low α do not simply verify poorly. They systematically avoid the outputs that most need expert verification, because those outputs are also the hardest to approve without genuine expertise. High-complexity, high-value agentic tasks require the most expert judgment to evaluate. Without authority at the expertise level, these create unresolvable friction. The path of least resistance is to process lower-complexity, more easily approved outputs instead. The result: organizations with high α capture the gains from agentic AI; organizations with low α churn through verification overhead on low-value work and miss the value entirely. The finding that 74% of AI economic value is captured by just 20% of organizations is the empirical expression of this selection dynamic.
VI. The Integrated System Model
6.1 Net Utility Function
Agentic system viability over time:
\[ U_{net}(t) = V(A, D) - \left[ \Phi(E) + \Psi(L, \sigma) + \Omega(M) \right] \]
| Term | Definition |
|---|---|
| \(V(A, D)\) | Gross value: function of Autonomy and Data Fidelity |
| \(\Phi(E)\) | Energy cost: inference energy × electricity price × PUE |
| \(\Psi(L, \sigma)\) | Verification cost: labor against uncertainty |
| \(\Omega(M)\) | Maintenance: hardware amortization and model retraining |
System survives only if \(U_{net}(t) > 0\). The stranded asset cascade (Section II) adds a time-indexed debt service term to the cost side, further compressing the window of viability for organizations carrying hardware debt against accelerating obsolescence.
6.2 The Trust-Autonomy Duality
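Autonomy and reliability are coupled, not independent:
\[ \frac{dA}{dt} = k_1(V) - k_2(\sigma), \qquad \frac{d\sigma}{dt} = \alpha_c(n) \, D_{KL}(t) - \beta(H) \]
- \(\alpha_c(n)\): complexity factor (increases with chain length)
- \(\beta(H)\): human correction rate (function of expert authority \(\alpha\))
As \(D_{KL}\) grows (model contamination) and \(\beta(H)\) falls (authority decoupling), \(\sigma\) increases without bound. Autonomy growth is eventually overwhelmed by entropy growth.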
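The coupled system \(dA/dt = k_1(V) - k_2(\sigma)\), \(d\sigma/dt = \alpha_c(n) D_{KL}(t) - \beta(H)\) can be stepped forward with a simple Euler scheme. The paper does not give functional forms, so the sketch below assumes linear ones (\(k_1 = aV\), \(k_2 = b\sigma\), \(\alpha_c = cn\), constant \(\beta\), linear \(D_{KL}\) growth) and clamps both state variables at zero. Every form and number here is a hypothetical choice for illustration.

```python
def simulate_trust_autonomy(steps: int, dt: float, value: float,
                            chain_length: int, correction_rate: float,
                            a0: float = 0.5, sigma0: float = 0.1,
                            k1_gain: float = 0.1, k2_gain: float = 0.2,
                            complexity_gain: float = 0.001,
                            kl0: float = 0.1, kl_growth: float = 0.05):
    """Euler integration of the coupled autonomy/uncertainty equations
    under ASSUMED linear functional forms. Returns (autonomy, sigma)."""
    autonomy, sigma = a0, sigma0
    for step in range(steps):
        t = step * dt
        d_kl = kl0 + kl_growth * t                      # assumed drift
        d_autonomy = k1_gain * value - k2_gain * sigma  # dA/dt
        d_sigma = complexity_gain * chain_length * d_kl - correction_rate
        autonomy = max(0.0, autonomy + d_autonomy * dt)
        sigma = max(0.0, sigma + d_sigma * dt)
    return autonomy, sigma


# Compare weak and strong expert correction over the same horizon.
for beta in (0.0, 1.0):
    a_end, s_end = simulate_trust_autonomy(200, 0.1, value=1.0,
                                           chain_length=100,
                                           correction_rate=beta)
    print(f"beta={beta:.1f}: autonomy={a_end:.2f}, sigma={s_end:.2f}")
```

Under these assumptions, the run with no correction ends with higher uncertainty and lower autonomy than the run with strong correction, which is the qualitative behavior the duality describes.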
6.3 The Two Interacting Entropic Systems
Two distinct decay processes interact multiplicatively:
| System | Mechanism | Metric |
|---|---|---|
| Agentic Entropy (\(\Delta S_{agent}\)) | Agents optimize for local correctness, eroding global architectural intent | Stochastic drift: local success masks global failure |
| Cognitive Debt (\(\Delta D_{cog}\)) | Human supervisors lose system-level mental model as AI velocity exceeds comprehension bandwidth | Oversight collapse: loss of capacity to detect the next wave of entropy |
| Interaction | \(\Delta S_{agent}\) increases opacity → deepens \(\Delta D_{cog}\) → prevents detection of next \(\Delta S_{agent}\) | Non-linear amplification: each system’s failure accelerates the other |
The interaction term is multiplicative, not additive. This follows directly from the framework in Section 5: undetected error accumulation = errors generated × (1 − correction probability). Correction probability degrades as cognitive debt increases. Therefore:
\[ \Delta S_{total} = \Delta S_{agent} \times \Delta D_{cog} \]
This is a mathematical identity given those definitions — not an empirical assertion. If either factor is zero (no entropy generated, or full correction capacity intact), the compound failure mode does not occur. Additive formulations lack this property: they produce nonzero total entropy even when one system is at zero, which is not consistent with the coupling mechanism described above.
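The zero-factor property that distinguishes the multiplicative form can be checked in a few lines. The sketch below implements the stated identity, undetected errors = errors generated × (1 − correction probability), next to an additive alternative included only for contrast; the additive function and the sample numbers are hypothetical.

```python
def undetected_errors(errors_generated: float,
                      correction_probability: float) -> float:
    """Multiplicative form: errors generated times probability of escape."""
    if not 0.0 <= correction_probability <= 1.0:
        raise ValueError("correction_probability must be in [0, 1]")
    return errors_generated * (1.0 - correction_probability)


def additive_contrast(errors_generated: float,
                      correction_probability: float) -> float:
    """Hypothetical additive alternative, shown only to illustrate why it
    fails: it stays nonzero when one factor is zero."""
    return errors_generated + (1.0 - correction_probability)


cases = [(100.0, 1.0), (0.0, 0.3), (100.0, 0.9)]
for generated, v_c in cases:
    mult = undetected_errors(generated, v_c)
    add = additive_contrast(generated, v_c)
    print(f"generated={generated:>5}, v_c={v_c}: "
          f"multiplicative={mult:.1f}, additive={add:.1f}")
```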
The same multiplicative structure appears across complex systems failure literature: Reason’s Swiss Cheese Model (1990), Perrow’s Normal Accidents (1984), and Shannon’s noisy channel (error rates compounding multiplicatively through chained channels). All share the same underlying form: hazard introduction rate × probability of escaping detection = multiplicative, as a structural identity.
As agentic entropy makes the system more complex and opaque, the cognitive debt of the verifier deepens. Deepened cognitive debt prevents detection and correction of the next entropy wave. The loop is self-reinforcing and accelerates without an external corrective force, which is, again, \(\alpha\): expert authority at the verification point.
plain language explainer
This section integrates every prior constraint into a single system model and identifies two interacting decay processes that amplify each other non-linearly. It also identifies the one variable that breaks the cycle.
Net Utility Function (6.1): Unet(t) = V(A, D) − [Φ(E) + Ψ(L, σ) + Ω(M)]. Net utility equals value generated minus total costs. The value side depends on how autonomous the system is and how reliable its data is. The cost side runs three pressures simultaneously: energy costs from Section I (inference energy × electricity price × PUE overhead), verification costs from Sections III and IV (expert labor against output uncertainty), and maintenance costs including hardware amortization from Section II and ongoing model retraining. The stranded asset debt service from Section II adds a time-indexed term that grows as obsolescence outpaces repayment. The system remains viable only if the left side stays positive. These pressures are concurrent, not sequential.
Trust-Autonomy Duality (6.2): Two coupled equations govern how autonomy and uncertainty evolve over time. The first, $dA/dt = k_1(V) - k_2(\sigma)$, says autonomy grows when the system delivers value and contracts when uncertainty is high. The second, $d\sigma/dt = \alpha c(n) \cdot D_{KL}(t) - \beta(H)$, says uncertainty grows with chain length and model drift, and is reduced by expert correction. As model contamination increases and authority-expertise decoupling reduces correction rates, uncertainty increases without bound, suppressing autonomy growth. The viability window closes from both sides simultaneously.
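The qualitative behaviour described here can be explored with a simple forward-Euler integration. The paper does not specify functional forms for the rate terms in this passage, so the sketch below assumes linear ones throughout; every constant, the clamping of autonomy to [0, 1], and the function name `simulate` are illustrative assumptions, not the author's calibration.

```python
# Assumed linear coefficients (hypothetical, chosen only to show the dynamics)
K1 = 0.05      # autonomy gain per unit of delivered value
K2 = 0.10      # autonomy loss per unit of uncertainty
GROWTH = 0.002 # uncertainty growth per unit of chain length x drift
BETA = 0.05    # uncertainty reduction per unit of expert correction H


def simulate(correction_h: float, steps: int = 600, dt: float = 0.1):
    """Integrate dA/dt and d(sigma)/dt with forward Euler.

    Returns a list of (autonomy, sigma) pairs, one per time step.
    """
    value, n, dkl = 1.0, 10, 1.0   # held constant for simplicity
    autonomy, sigma = 0.5, 0.1     # initial conditions
    history = []
    for _ in range(steps):
        d_autonomy = K1 * value - K2 * sigma
        d_sigma = GROWTH * n * dkl - BETA * correction_h
        autonomy = min(1.0, max(0.0, autonomy + d_autonomy * dt))
        sigma = max(0.0, sigma + d_sigma * dt)  # uncertainty cannot go negative
        history.append((autonomy, sigma))
    return history


no_correction = simulate(correction_h=0.0)
with_correction = simulate(correction_h=1.0)
print("final (A, sigma) without correction:", no_correction[-1])
print("final (A, sigma) with correction:   ", with_correction[-1])
```

Under these assumed coefficients, removing expert correction lets uncertainty climb steadily until it drives autonomy back down, while sufficient correction holds uncertainty at its floor. Different coefficients would change the timing but not the direction of that comparison.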
Two Interacting Entropic Systems (6.3): Two distinct failure processes interact to amplify each other. Agentic entropy is the tendency of AI agents to optimize locally while global architectural integrity erodes — each step looks correct while the whole drifts. Cognitive debt is the progressive loss, by human supervisors, of the mental model needed to oversee the system — AI velocity exceeds the rate at which humans can track what the system is doing. Each makes the other worse: agentic entropy increases system opacity, which deepens cognitive debt; deepened cognitive debt prevents detection and correction of the next entropy wave, which allows further opacity to accumulate.
The interaction is multiplicative: $\Delta S_{total} = \Delta S_{agent} \times \Delta D_{cog}$. This is not a modeling choice but a structural identity. An additive formulation would produce nonzero total failure even when one factor is zero, which is physically wrong: if no entropy is being generated, there is nothing for cognitive debt to fail to catch; if correction capacity is fully intact, errors are caught as generated and do not accumulate. Multiplication correctly produces zero when either factor is zero. This same structure appears across established failure literature (Reason's Swiss Cheese Model, Perrow's Normal Accidents, Shannon's noisy channel model), all of which share the same underlying form: hazard introduction rate × probability of escaping detection.
The loop is self-reinforcing and accelerates without an external corrective force. That force is, again, $v_c$: expert authority co-located at the verification point. Every failure mode identified across this paper (thermodynamic, capital, reliability, verification, organizational) converges on the same single variable.
VII. The Labor Value Inversion
7.1 The Counter-Narrative Emergent
The dominant public frame asserts that AI displaces knowledge workers. The model produces the opposite structural conclusion.
$v_c$, the only lever that breaks geometric reliability decay without proportionally scaling cost, requires three things that cannot be automated away:
Domain expertise sufficient to distinguish correct from plausible-but-wrong output. An agent cannot verify another agent’s output against ground truth it doesn’t possess. Only a human with domain expertise can close this loop.
Systems thinking sufficient to detect local correctness masking global architectural failure. This is the Cognitive Debt problem inverted: the same capability that is destroyed by AI velocity in low-authority organizations becomes the irreplaceable asset in high-authority ones.
Intuition — the pattern recognition capability that operates below the threshold of articulable rules — sufficient to identify when an output is audit-shaped but inadmissible. This is precisely what long formation in a domain builds and what no training run replicates, because it is grounded in physical and social reality, not in the token distribution of prior outputs.
These are not peripheral skills. They are the structural inputs to the only escape from $R(n) = p^n$.
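To make the geometric decay and the effect of correction concrete, the following sketch computes end-to-end reliability for chains of different lengths. The uncorrected case is simply $p^n$. The corrected case assumes, purely for illustration, that each step's error is caught and fixed with independent probability `v_c`, giving an effective per-step precision of `p + (1 - p) * v_c`; the paper's own corrected form is defined in an earlier section and may differ.

```python
def reliability(p: float, n: int, v_c: float = 0.0) -> float:
    """End-to-end reliability of an n-step chain.

    p   : per-step precision
    v_c : probability that an error is caught and corrected at the
          verification point (assumed independent at every step)
    """
    p_eff = p + (1.0 - p) * v_c  # assumed form, see the lead-in above
    return p_eff ** n


for n in (1, 10, 50, 100):
    uncorrected = reliability(0.99, n)
    corrected = reliability(0.99, n, v_c=0.9)
    print(f"n={n:3d}  uncorrected={uncorrected:.3f}  corrected={corrected:.3f}")
```

With a per-step precision of 0.99, a 100-step chain falls to roughly 0.37 uncorrected, while a correction probability of 0.9 under the assumed form keeps it near 0.90. Raising `v_c` changes the base of the exponent, which is why it behaves differently from adding more steps of the same quality.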
7.2 The Empirical Confirmation
Software engineering roles in 2026 confirm the prediction. Senior engineers with systems judgment are increasing in market value. Junior developers whose primary function is producing outputs that look correct are being restructured out — replaced by agents that produce the same class of output at lower marginal cost.
This is not a talent market fluctuation. It is the labor market expressing the mathematical constraint. The model predicted it before the market showed it. The same dynamic will propagate through any domain characterized by high $n$ (complex multi-step workflows), high drift risk (models operating far from verified ground truth), and currently low $v_c$ (authority-expertise decoupling). Healthcare, law, engineering design, and financial analysis are the next wave.
7.3 The Formal Statement
Let $V_{labor}$ be the market value of a labor input:
$$V_{labor} \propto v_c(\alpha, \text{expertise}, \text{systems judgment})$$
Labor value is a direct function of verification capability under authority. As agentic output volume increases across the economy, $v_c$-capable labor becomes scarcer relative to demand, increasing its price. Simultaneously, labor whose primary output is indistinguishable from agentic output (pattern-matching, first-draft generation, routine summarization) is displaced.
The inversion is not symmetric. The increase in value for $v_c$-capable labor is driven by the mathematical structure of the reliability problem, which gets harder as agentic deployment deepens. The displacement of substitutable labor is driven by cost pressure. Both are mandatory outcomes of the same underlying system.
plain language explainer
This section derives a specific prediction about labor market value directly from the mathematical structure of the reliability problem — and shows the prediction is already being confirmed in the software engineering market.
The Counter-Narrative (7.1): The dominant public frame treats AI as a broad displacer of knowledge workers. The structural model produces a different and more precise conclusion. $v_c$, the only variable that breaks geometric reliability decay without proportionally scaling cost, requires three human capabilities that cannot be automated away.
First: domain expertise sufficient to distinguish a correct output from a plausible-but-wrong one. An AI agent cannot verify another agent's output against ground truth it does not independently possess. The verification loop cannot close without a human who has that ground-truth access.
Second: systems thinking sufficient to detect when local correctness is masking global architectural drift. This is the cognitive debt problem in reverse: the same capability that cognitive debt destroys in low-authority organizations becomes the irreplaceable asset in high-authority ones.
Third: intuition — pattern recognition built through long formation in a domain against real-world outcomes, not against a distribution of prior text outputs. An experienced practitioner notices something is wrong before they can articulate why. This is grounded in physical and social reality that no training run replicates, because training runs learn from text about outcomes, not from the outcomes themselves.
These are not peripheral or soft skills. They are the structural inputs to the only escape from $R(n) = p^n$.
Empirical Confirmation (7.2): The 2026 software engineering labor market is expressing the prediction. Senior engineers with systems judgment are appreciating in market value. Junior developers whose primary output is structurally similar to what agents produce — pattern-matching, first-draft generation, routine implementation — are being displaced. The model predicted this before the market showed it. The same dynamic will propagate through any domain sharing the same structural parameters: long multi-step workflows, models operating far from verified ground truth, and current authority-expertise decoupling. Healthcare, law, engineering design, and financial analysis are identified as the next wave on these grounds — not domain-specific reasoning, but structural parameter matching.
The Formal Statement (7.3): $V_{labor} \propto v_c(\alpha, \text{expertise}, \text{systems judgment})$. Labor market value is proportional to verification capability under authority. As agentic output volume increases across the economy, demand for $v_c$-capable labor grows while supply remains structurally constrained by long formation timelines. Price rises. Simultaneously, labor whose primary output is indistinguishable from agentic output faces displacement by cost pressure from a substitute that produces the same class of output at lower and falling marginal cost. Both movements are mandatory outcomes of the same underlying system. They are not symmetric: the appreciation of $v_c$-capable labor is driven by the mathematical structure getting harder as deployment deepens; the displacement of substitutable labor is driven by cost pressure. Neither is optional.
VIII. Structural Conclusions
1. Reliability decay is mathematically required for any agentic pipeline of sufficient complexity. No model improvement escapes $R(n) = p^n$ without external correction. The escape requires $v_c$, which requires expert authority. This is a structural necessity, not a design choice.
2. Verification cost grows faster than generation cost as task complexity increases. This is the structural ceiling on agentic ROI, and it is not resolvable by scaling compute.
3. Organizational structure is a direct input to system reliability, not a management consideration adjacent to it: $v_c = f(\alpha)$. Authority-expertise co-location is a technical requirement derivable from the reliability model.
4. Two entropic systems interact multiplicatively. Agentic entropy and cognitive debt amplify each other non-linearly. The only corrective force is expert authority at the verification point.
5. Hardware capital is being structured on mismatched timelines. Economic obsolescence is outpacing amortization schedules at investment magnitudes ($765B+ annual AI CapEx in 2026) that will produce non-performance cascades in the hundreds of billions USD.
[UNKNOWN: Precise exposure size pending debt-financing fraction data]
6. Model contamination adds a time derivative to base precision. $p$ is not static; it degrades as synthetic training data accumulates. Reliability is a function of both pipeline complexity $n$ and time $t$.
[UNKNOWN: Empirical γ for production model families]
7. Labor value inverts against the dominant narrative. The mathematical structure requires expertise, systems thinking, and intuition, which are precisely the capabilities the displacement narrative treats as vulnerable. The market is already expressing this in software roles, and will propagate through every high-$n$, high-drift domain.
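Conclusion 6 can be made slightly more concrete with one line of calculus. The sketch below assumes only that the geometric form $R = p^n$ carries over when base precision varies with time as $p(t)$; no functional form for $p(t)$ is assumed, since the contamination rate γ is flagged as unknown.

```latex
R(n, t) = p(t)^{n}
\quad\Longrightarrow\quad
\frac{\partial R}{\partial t} = n\, p(t)^{n-1}\, \frac{dp}{dt}
\quad\Longrightarrow\quad
\frac{1}{R}\frac{\partial R}{\partial t} = n \cdot \frac{1}{p}\frac{dp}{dt}
```

Under that assumption, any fractional decline in base precision appears in end-to-end reliability amplified by the chain length $n$, so contamination hurts long pipelines proportionally more than short ones.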
IX. Open Variables
| Unknown | Description | Resolution Path |
|---|---|---|
| γ | Contamination rate: speed at which synthetic training data degrades base precision | Longitudinal benchmark tracking against verified ground-truth datasets |
| Debt exposure | Precise debt-financing fraction of 2026 AI infrastructure capex and lender concentration | Financial disclosure analysis; structured finance data |
| $v_{max}$ | Ceiling on correction probability achievable by human expert under full authority | Empirical study of expert-in-loop system performance at high α |
| Geopolitical and geoeconomic factors | External variables — including competitive capability development, access constraints, and policy interventions — that interact with but are not fully captured by the structural model | Partially inferrable from public data and market signals; military, intelligence, and policy trajectories are not resolvable from open sources |
structural assessment
This note addresses what the model establishes firmly, where it makes claims that slightly outrun their formal grounding, and what remains open.
What the argument establishes: The layered structure (physical constraints → capital mismatches → reliability decay → verification cost asymmetry → organizational structure → labor value inversion) is logically sequenced and each layer genuinely builds on the prior rather than merely following it. The unification of $v_c$ across Sections III, IV, V, and VI is the paper's strongest structural move: one variable appears independently in the reliability model, the verification cost model, the organizational authority model, and the entropy interaction model, and is shown to be the same thing in each. That convergence is not forced. It is the output of working each layer through separately. The explicit flagging of unknowns throughout is methodologically sound: the paper distinguishes between conclusions that follow from the structure and parameters that require empirical calibration before specific exposure sizes can be calculated.
Three qualifications worth holding:
The NP-hard verification claim (Section 4.1) is the paper's strongest assertion and its least formally supported. That verification of sufficiently complex outputs is computationally harder than generation is intuitively correct and practically observable. Asserting this reaches formal NP-hardness is a meaningful technical claim stated without proof. It is likely directionally right. It is not demonstrated here.
The Jevons Paradox formulation (Section 1.3) is presented as a structural invariant expressed as a derivative. It is more accurately an empirical regularity with strong historical support. The data cited supports the claim in the current range of efficiency gains. Whether the relationship holds at the extremes is an open empirical question. The paper states it slightly more strongly than the evidence strictly licenses — though this does not materially affect any downstream conclusion.
The equivalence $v_c = f(\alpha)$ (Section 5.2) is the hinge on which the entire organizational layer turns. The underlying behavioral claim (that experts without decision authority perform lower-quality verification due to moral hazard) is well-grounded in organizational economics and consistent with observed behavior. The paper treats it as a mathematical identity derivable from the reliability model. It is in fact an empirical claim about incentive structures that the model then formalizes. The claim is correct; its epistemic status is slightly elevated above what it has formally earned.
What the paper does not address: Whether AI systems could develop verification capabilities that materially change the $v_c$ dynamic; the paper assumes verification remains a human function without stating that assumption explicitly. What happens if $v_{max}$ (the ceiling on human expert correction probability under full authority) is materially lower than the model implicitly assumes; this is flagged as an open variable but not stress-tested in the body. Regulatory responses to the capital cascade are not modeled. The geopolitical and geoeconomic factors acknowledged in Section IX are external variables the structural model cannot resolve from first principles.
Overall: This is a structural argument, not a trend piece. The thermodynamic, capital, and reliability layers rest on physics, accounting, and probability respectively, and the conclusions follow necessarily from their premises. The verification and organizational layers contain the paper's only genuine argumentative dependencies — empirical claims about human behavior and computational complexity that are well-reasoned and consistent with available evidence, but not derived in the same sense as the prior layers. The labor inversion conclusion is counterintuitive, follows legitimately from the structural model, and is already being confirmed in the software engineering market. The three qualifications above are worth carrying as you read — they do not undermine the argument, but they mark where the paper is reasoning rather than deriving.
Document captures the structural model as of 2026-05-02. Mathematical additions and empirical calibration of open variables to follow as data emerges.