June 8, 2026

Stop "Tokenmaxxing" Your Diligence

The AI pricing model most PE firms signed up for is changing fast. Token-based billing is replacing per-seat subscriptions, and deal teams running diligence through general-purpose tools are about to find out how much reprocessing actually costs. Whether those tokens go toward analysis or repetition comes down to architecture.

At a Glance

The AI pricing model most firms signed up for suddenly changed from fixed to variable pricing.
A deal team that re-processes the same VDR materials in every session is paying for the same answer multiple times, which maximizes token use, but doesn’t necessarily increase leverage.
Without an efficient architecture, many workflows will consume tokens inefficiently, and the pricing model now makes that visible.

In software engineering, token burn can signal leverage. In diligence, it can signal waste.

The hottest productivity metric in Silicon Valley over the past year has been "tokenmaxxing." Nvidia CEO Jensen Huang said he would be "deeply alarmed" if a $500,000 engineer did not spend at least $250,000 on tokens annually.

The underlying logic equated token volume with leverage, turning consumption itself into a status metric. The point of encouraging token use was never to waste tokens; it was to encourage productivity. But the metric did not hold up. Employees at prominent firms, like Meta, were caught gaming the system to inflate their token usage, sparking a flurry of headlines declaring the end of tokenmaxxing.

PE firms are similarly pushing their deal teams to use AI aggressively, and that means burning tokens. But PE diligence has different dynamics, evaluation metrics, and definitions of success. In a PE context, firms are focused on efficiency, and that applies to using AI too.

PE firms are not rewarding token consumption for its own sake, but if the platform underneath their workflows was not built for efficiency, the result can look similar: high token burn, without commensurate analytical value. As token bills start rising, firms need to analyze whether the infrastructure underneath their chosen AI platform is burning tokens on analysis or on reprocessing.

AI pricing is shifting from per-seat to per-token

The AI pricing model most firms signed up for suddenly changed from fixed to variable pricing.

Anthropic moved enterprise billing from fixed per-seat subscriptions to usage-based pricing, with seat fees covering platform access and all usage billed separately at standard API rates.
OpenAI shifted Codex from per-message pricing to token-based metering on April 2 across Business and Enterprise plans.
GitHub tightened Copilot usage limits on April 20, temporarily pausing new Copilot Pro signups and acknowledging that agentic workflows have fundamentally changed the platform’s compute demands, with long-running sessions consuming far more resources than the original plan structure supported.

Per-seat pricing subsidized early adoption, but with OpenAI and Anthropic both widely expected to pursue public offerings, token pricing better serves the revenue growth that lofty valuations demand. Every prompt, every response, every agentic tool call drives more token usage.

The reprocessing problem for document-heavy diligence

The cost problem in PE diligence is not that firms use too many tokens. It is that without an efficient architecture, many workflows will consume tokens inefficiently, and the pricing model now makes that visible.

When a deal team uses a general-purpose chat interface for diligence, many workflows effectively recreate context repeatedly across users, sessions, and workstreams. The same document gets re-ingested every time someone asks a new question about it.

A team of four running parallel workstreams over three weeks will re-process the same materials repeatedly, paying ingestion costs each time. Every follow-up, every refinement is a fresh round trip through a frontier model. The tokens consumed in re-processing often dwarf the original query.
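
To make the arithmetic concrete, here is a rough sketch comparing the two patterns. The team size and three-week window come from the scenario above; every other number (tokens of material per session, sessions per day, tokens of retrieved context) is a hypothetical placeholder, not a figure from any vendor or deal. It also assumes one-time ingestion costs the same token volume as loading the materials once, and it counts input tokens only.

```python
def reingest_tokens(material_tokens, team_size, workdays, sessions_per_day):
    """Input tokens when every new session reloads the same materials."""
    sessions = team_size * workdays * sessions_per_day
    return sessions * material_tokens


def index_once_tokens(material_tokens, team_size, workdays,
                      sessions_per_day, context_tokens):
    """Input tokens when materials are ingested once and each session
    sends only a retrieved slice of context."""
    sessions = team_size * workdays * sessions_per_day
    return material_tokens + sessions * context_tokens


# Scenario from the text: four people, three five-day weeks.
TEAM_SIZE = 4
WORKDAYS = 15

# Hypothetical placeholders; substitute your own estimates.
MATERIAL_TOKENS = 150_000   # assumed tokens of source material per session
SESSIONS_PER_DAY = 2        # assumed new sessions per person per day
CONTEXT_TOKENS = 5_000      # assumed tokens of retrieved context per session

repeated = reingest_tokens(MATERIAL_TOKENS, TEAM_SIZE, WORKDAYS, SESSIONS_PER_DAY)
indexed = index_once_tokens(MATERIAL_TOKENS, TEAM_SIZE, WORKDAYS,
                            SESSIONS_PER_DAY, CONTEXT_TOKENS)
ratio = repeated / indexed

print(f"re-ingest every session: {repeated:,} input tokens")
print(f"ingest once, then retrieve: {indexed:,} input tokens")
print(f"ratio: {ratio:.0f}x")
```

At a flat per-token rate, cost scales with these counts, so the ratio is what matters more than the absolute figures. Changing the placeholders changes the ratio, but the structure of the comparison stays the same: one pattern multiplies the material size by the number of sessions, the other pays for it once.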

This mirrors what's happening with coding agents more broadly. An experienced engineer might close a task in three iterations. Someone less familiar with the tooling might need twenty. Both produce a working result, but the token cost profiles diverge by an order of magnitude. The issue is not the usage. It is the absence of architecture to make the usage efficient.
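
One reason the gap can exceed the raw iteration count: when each turn resends the accumulated conversation, the input grows with every round. A minimal sketch, assuming a fixed starting context and a fixed amount of new material appended per turn (both numbers are hypothetical):

```python
def cumulative_input_tokens(iterations, base_context, added_per_turn):
    """Total input tokens across a session in which every turn resends
    the base context plus everything added in earlier turns."""
    total = 0
    context = base_context
    for _ in range(iterations):
        total += context            # this turn pays for the whole context so far
        context += added_per_turn   # the exchange is appended for the next turn
    return total


# Hypothetical sizes, for illustration only.
BASE_CONTEXT = 2_000
ADDED_PER_TURN = 4_000

short_run = cumulative_input_tokens(3, BASE_CONTEXT, ADDED_PER_TURN)
long_run = cumulative_input_tokens(20, BASE_CONTEXT, ADDED_PER_TURN)

print(f"3 iterations:  {short_run:,} input tokens")
print(f"20 iterations: {long_run:,} input tokens")
```

Under these assumptions, twenty iterations consume roughly 44 times the input tokens of three, even though the iteration count is less than 7 times higher.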

How efficient architecture controls token costs

In a companion piece on AI architecture in PE due diligence, we examined how purpose-built infrastructure produces measurable accuracy gains over direct model access. The same design decisions that improve accuracy also compress token consumption.

Three choices that matter:

Ingestion architecture. When a platform indexes a VDR once, every subsequent query draws from that index. No re-processing. No duplication across team members or workstreams.

Retrieval efficiency. A structured retrieval pipeline delivers only the relevant context to the model for each query. A general-purpose chat interface often resends large portions of prior context with every interaction. Across hundreds of queries over a multi-week process, the gap in token consumption is substantial.

Model routing. Not every diligence task requires a frontier model. Extraction and summarization can run on lighter, less expensive models. Cross-document synthesis and judgment calls need frontier capabilities. A platform that routes intelligently optimizes cost and performance simultaneously.
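
The three choices above can be sketched together in a few lines. This is an illustration under stated assumptions, not a description of any vendor's implementation: the class and function names, the task labels, the model tier names, and the keyword-overlap retrieval are all hypothetical stand-ins. A production system would use a real document parser, a real search index, and real model endpoints.

```python
from dataclasses import dataclass, field

# Hypothetical task labels that a lighter model could handle.
LIGHT_TASKS = {"extraction", "summarization"}


@dataclass
class DealIndex:
    """Shared index: each document is ingested once, then reused by every query."""
    chunks: dict = field(default_factory=dict)
    ingest_count: int = 0

    def ingest(self, doc_id, text):
        if doc_id in self.chunks:   # already indexed, so no re-processing
            return
        # Naive paragraph chunking; a stand-in for a real parsing pipeline.
        self.chunks[doc_id] = [p.strip() for p in text.split("\n\n") if p.strip()]
        self.ingest_count += 1

    def retrieve(self, question, top_k=2):
        """Return only the chunks sharing the most words with the question."""
        terms = set(question.lower().split())
        scored = [
            (len(terms & set(chunk.lower().split())), chunk)
            for doc_chunks in self.chunks.values()
            for chunk in doc_chunks
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [chunk for score, chunk in scored[:top_k] if score > 0]


def route(task):
    """Pick a model tier by task type; anything unlisted goes to the frontier tier."""
    return "light-model" if task in LIGHT_TASKS else "frontier-model"


index = DealIndex()
memo = "Revenue grew in 2024.\n\nThe lease expires in 2027."
index.ingest("cim", memo)
index.ingest("cim", memo)   # a second analyst loads the same file: no-op
context = index.retrieve("when does the lease expire", top_k=1)

print(index.ingest_count)
print(context)
print(route("extraction"), route("synthesis"))
```

The design point is that the expensive step (ingestion) sits behind a check that makes repeats free, the model only ever sees the retrieved slice rather than the whole document, and the choice of model is made per task rather than per platform.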

ToltIQ was built around these principles because the firms we work with need cost behavior they can forecast, not just accuracy they can measure. These are infrastructure decisions that determine what AI actually costs across the life of a deal.

Spend on analysis, not repetition

Jensen Huang’s framing is worth taking seriously, even if his context is a long way from PE. He is right that token consumption will become a meaningful line item for knowledge work. The challenge is for firms to decide whether they should treat token usage as a cost to manage or a signal to optimize around.

In PE diligence, the goal is getting the fullest possible picture of a target company across every document in the data room. A deal team that re-processes the same VDR materials in every session is paying for the same answer multiple times, which maximizes token use, but doesn’t necessarily increase leverage.

Controlling AI costs by capping usage, restricting access, or throttling just pushes the work back onto people. Choosing infrastructure that leverages leading AI models while eliminating redundancy before the meter starts running will provide more value from every token. This way, the spend is focused on analysis and synthesis, not repetition.

Tokenmaxxing works for Silicon Valley. For PE firms, the value of AI is measured in competitive advantages and deeper analytical capabilities, not in how many tokens the meter logged.