May 20, 2026

PE Due Diligence: Why AI Architecture Matters More Than Model Selection

Private equity firms evaluating AI for due diligence tend to focus on which model to use. The benchmarking data suggests they are focused on the wrong variable. How AI is deployed determines not just accuracy, but whether it works across the full scope of a live deal: persistent knowledge, team collaboration, cost predictability, and security.

At a Glance

  • Some PE firms are optimizing for the wrong variable. The model you pick matters far less than the infrastructure it runs through.
  • Purpose-built architecture improved accuracy by up to 9.63 percentage points. The gains were largest in exactly the areas where deal teams lose the most time.
  • If your AI resets between sessions, your diligence has gaps. Deal knowledge has to persist across workstreams, documents, and weeks of analysis.
  • Counterparties and investors are starting to ask how your team handles confidential materials when AI touches them. Most firms don't have a good answer yet.

Private equity firms evaluating AI for due diligence spend enormous time debating models: ChatGPT vs. Claude, frontier vs. mid-tier, newest release vs. last quarter’s version. Those choices matter, but they are not what determines whether AI creates leverage at deal scale.

Frontier models have largely converged on a baseline level of competence for diligence tasks. Once firms try to operationalize AI across the deal process, the core problem stops being model capabilities and becomes knowledge persistence. The performance gap increasingly comes from how context is managed, retrieved, and preserved across the process.

That makes infrastructure the determining factor. It governs how documents are ingested, how deal knowledge is maintained across the process, how a team works inside the platform, and how the system handles the volume and sensitivity of real virtual data room (VDR) work.

Maintaining the right context across thousands of documents, multiple workstreams, and weeks of evolving analysis is where most AI deployments fall short. The failure point is rarely the model itself. It is the system’s ability to preserve and manage context as diligence evolves.

What PE due diligence actually requires

A typical mid-market VDR may contain hundreds or even thousands of documents: financial statements, credit agreements, customer contracts, management presentations, compliance records, and in many cases scanned PDFs of documents executed years before the current deal team was assembled. The documents arrive in every format and follow no consistent structure, and scan quality is often poor. The seller’s narrative runs through all of them, and the deal team’s job is to confirm or challenge it before the investment committee (IC) meets.

That work has four requirements that any AI deployment in PE diligence needs to meet.

Scale. AI has to analyze every document in the VDR effectively, not just a curated subset. A model or system that handles 20 or even 100 documents well but fragments at 500 or 5,000 is not reliable enough for diligence.

Persistence. Due diligence is rarely, if ever, conducted in a single session. It may span weeks, across defined workstreams, with findings from one thread informing how another interprets what it is seeing. A deal environment needs to carry knowledge across that entire process, to ensure continuity, accuracy, and pattern recognition across the full record.

Collaboration. Deal teams are not single users asking questions in sequence. Financial, legal, commercial, and compliance workstreams run in parallel, and the quality of what reaches IC depends on how well those threads stay connected. Team use has to be a foundational part of the architecture, not an optional add-on.

Security. VDR documents include unpublished financials, management agreements, employee information, litigation history, and transaction terms that may constitute material non-public information. The data handling posture of an AI platform matters to internal compliance, to limited partner (LP) due diligence on firm operations, and to counterparties whose confidential materials appear in the VDR.

These are the conditions under which diligence actually happens. Any AI deployment that does not meet all four is only delivering partial capability on a process that requires complete coverage.

What the benchmark data shows about AI accuracy

Whether architecture matters as much as the model itself is a testable question. ToltIQ AI Research tested it with the Vals AI CorpFin V2 framework, an independent benchmark comprising 360 prompts drawn from 18 financial documents across 20 PE due diligence use cases. Four frontier models were evaluated under two conditions: accessed directly, and deployed through a purpose-built ingestion and retrieval architecture. Each prompt was run five times per model in each condition, producing 14,400 scored responses.

The principal finding was that deployment architecture is at least as consequential as model selection. Three of the four models tested showed statistically significant accuracy gains when deployed through purpose-built architecture, with improvements ranging from 6.63 to 9.63 percentage points. Opus 4.5 achieved the highest aggregate accuracy at 85.11%, against a 76.38% standalone baseline in the same controlled evaluation, a gain of 8.73 points. Sonnet 4.5 followed at 83.54%, and GPT 5.1 at 80.45%. The gains were consistent across architecturally distinct models, which indicates they reflect structural characteristics of the retrieval pipeline rather than any model-specific interaction.
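The figures above can be checked with simple arithmetic. The sketch below assumes the five runs apply to each model in each of the two conditions, which is the reading that reproduces the stated response count. The function and constant names are illustrative, not part of the study.

```python
# Study design as described in the text.
PROMPTS = 360
MODELS = 4
CONDITIONS = 2       # direct access vs. purpose-built architecture
RUNS_PER_PROMPT = 5  # assumption: per model, per condition


def total_scored_responses(prompts, models, conditions, runs):
    """Number of scored responses produced by the full evaluation grid."""
    return prompts * models * conditions * runs


def gain_in_points(with_architecture_pct, standalone_pct):
    """Accuracy gain in percentage points (not a relative percentage)."""
    return round(with_architecture_pct - standalone_pct, 2)


total = total_scored_responses(PROMPTS, MODELS, CONDITIONS, RUNS_PER_PROMPT)
opus_gain = gain_in_points(85.11, 76.38)

print(total)      # 14400
print(opus_gain)  # 8.73
```

The Opus 4.5 gain of 8.73 points falls inside the reported 6.63 to 9.63 range.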

The pattern of gains was not uniform across use cases. The biggest improvements appeared in the exact places where diligence teams lose the most time and miss the most context: locating and reconciling information spread across large, inconsistent document sets. Gains in confidential information memorandum (CIM) Analysis and VDR Analysis each exceeded 13 percentage points across multiple models. Synthesis-intensive tasks, including Quality of Earnings Analysis and Credit Term Benchmarking, remained below 81% accuracy across all configurations, pointing to where human oversight remains essential regardless of platform or model choice.

Since this evaluation was completed, both Anthropic and OpenAI have released new model generations not reflected in this analysis. The same methodology will be applied to those models in a follow-up study.

The practical implication for PE firms is direct. A firm optimizing model selection is adjusting a variable that accounts for a fraction of the performance gap. The infrastructure decision is the consequential one.

What to look for in purpose-built architecture

The organizing principle is straightforward: the full data room, indexed once, available to the whole deal team, for the duration of the process.

At ingestion, every document in the VDR should be processed through a pipeline designed specifically for the structural and linguistic characteristics of PE deal documents: intelligent chunking, semantic embedding, vector-based retrieval, and context-aware prompt construction.
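The four stages named above can be sketched in a few lines. This is a toy illustration and not any vendor's pipeline: the word-window chunker and the bag-of-words "embedding" are deliberate stand-ins for the learned chunking and embedding models a production system would use, and the document names and text are made up.

```python
import math
import re
from collections import Counter


def chunk(text, size=40, overlap=10):
    """Split text into overlapping word windows (stand-in for 'intelligent chunking')."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def embed(text):
    """Toy bag-of-words vector; a real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(documents):
    """Index once: returns a list of (doc_name, chunk_text, vector) entries."""
    return [(name, c, embed(c)) for name, text in documents.items() for c in chunk(text)]


def retrieve(index, question, k=2):
    """Vector-based retrieval: the k chunks most similar to the question."""
    q = embed(question)
    return sorted(index, key=lambda entry: cosine(q, entry[2]), reverse=True)[:k]


def build_prompt(question, hits):
    """Context-aware prompt construction: cite each excerpt's source document."""
    context = "\n".join(f"[{name}] {text}" for name, text, _ in hits)
    return f"Answer using only the excerpts below.\n{context}\n\nQuestion: {question}"


# Made-up sample documents for illustration only.
documents = {
    "credit_agreement.txt": "The borrower shall maintain a total leverage ratio not exceeding 4.5x tested quarterly.",
    "customer_contract.txt": "Either party may terminate this agreement upon a change of control with 30 days notice.",
}
index = build_index(documents)
question = "What is the leverage ratio covenant?"
hits = retrieve(index, question, k=1)
prompt = build_prompt(question, hits)
print(prompt)
```

The point of the design is that the model only ever sees the retrieved excerpts, each tagged with its source, rather than the whole data room.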

Every member of the deal team can then query that same indexed corpus. The financial workstream and the legal workstream draw from the same knowledge base, with findings accumulating across the process rather than resetting between sessions. At deal scale (10,000 documents or more), that architecture is what keeps output consistent and accurate. What gets presented to the IC is then a coherent record the whole team built together, not a set of session fragments reconciled at the end.
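Persistence across sessions and workstreams can be illustrated with a minimal file-backed store. Everything here is hypothetical: the `DealKnowledgeStore` class, the JSON layout, and the sample findings are invented for illustration, and a real platform would add concurrency control, access checks, and a proper database.

```python
import json
import os
import tempfile


class DealKnowledgeStore:
    """Hypothetical store: findings are written to disk, so they outlive any one session."""

    def __init__(self, path):
        self.path = path

    def _load(self):
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return json.load(f)

    def add_finding(self, workstream, source_doc, note):
        findings = self._load()
        findings.append({"workstream": workstream, "source": source_doc, "note": note})
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(findings, f, indent=2)

    def findings(self, workstream=None):
        """All findings, or only those logged by one workstream."""
        items = self._load()
        if workstream is None:
            return items
        return [item for item in items if item["workstream"] == workstream]


store_path = os.path.join(tempfile.mkdtemp(), "example_deal_findings.json")

# Session 1: the legal workstream records a finding (made-up example).
legal = DealKnowledgeStore(store_path)
legal.add_finding("legal", "customer_contract.pdf", "Change-of-control termination right")

# Session 2: a fresh object, as if days later; the earlier finding is still there.
financial = DealKnowledgeStore(store_path)
financial.add_finding("financial", "qoe_workbook.xlsx", "Revenue concentrated in one customer")
all_findings = financial.findings()
print(len(all_findings))
```

Because the second session reads the legal finding before adding its own, the financial workstream can interpret revenue concentration in light of the termination right instead of rediscovering it.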

Platforms that are model-agnostic give PE firms and deal teams access to multiple frontier models within the same environment, allowing them to use different models for different tasks as capabilities evolve, with no vendor lock-in and no workflow rebuild when a new model becomes available.
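One way to picture model-agnostic design is a thin routing layer between the workflow and the models. The sketch below is an assumption about how such a layer could look, not a description of any specific platform: `ModelRouter`, the task name, and the stub "models" (plain functions that tag their input) are all hypothetical.

```python
from typing import Callable, Dict

ModelFn = Callable[[str], str]


class ModelRouter:
    """Hypothetical router: workflows name a task, not a vendor model."""

    def __init__(self) -> None:
        self._models: Dict[str, ModelFn] = {}
        self._routes: Dict[str, str] = {}

    def register(self, name: str, fn: ModelFn) -> None:
        self._models[name] = fn

    def route(self, task: str, model_name: str) -> None:
        if model_name not in self._models:
            raise KeyError(f"unknown model: {model_name}")
        self._routes[task] = model_name

    def run(self, task: str, prompt: str) -> str:
        return self._models[self._routes[task]](prompt)


router = ModelRouter()
# Stub models standing in for real provider clients.
router.register("model-a", lambda p: f"[model-a] {p}")
router.register("model-b", lambda p: f"[model-b] {p}")

router.route("contract_review", "model-a")
before = router.run("contract_review", "Summarize the termination clauses.")

# Swapping models is a one-line routing change; the calling workflow is untouched.
router.route("contract_review", "model-b")
after = router.run("contract_review", "Summarize the termination clauses.")
print(before)
print(after)
```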

Token use and cost predictability

AI infrastructure cost in PE is increasingly a budget planning problem, not just a procurement one. The industry shift toward usage-based pricing accelerated in Q1 2026, when Anthropic moved enterprise billing from fixed per-seat subscriptions to per-token pricing with mandatory monthly spending commitments. OpenAI made parallel moves across its enterprise and agentic products during the same period.

For PE firms accessing frontier models directly, costs now scale with document volume, query frequency, and agentic workflow expansion in ways that are difficult to forecast at signing. A firm running multiple active deals simultaneously may face a materially different cost profile than the one initially modeled.

Architecture changes that equation. Systems designed to structure and retrieve information efficiently reduce redundant processing across large document environments, which can materially improve cost predictability as usage scales across active deals.
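A rough back-of-envelope model shows why reduced redundant processing matters. Every number below is a placeholder chosen for illustration, not a real price or a measured workload, and the model ignores output tokens and any separate embedding charges.

```python
def direct_cost(doc_tokens, queries, price_per_million):
    """Assumes every query re-sends the full document context to the model."""
    return doc_tokens * queries * price_per_million / 1_000_000


def retrieval_cost(doc_tokens, queries, chunk_tokens, chunks_per_query, price_per_million):
    """Assumes documents are processed once, then each query sends only retrieved chunks."""
    per_query_tokens = chunk_tokens * chunks_per_query
    return (doc_tokens + per_query_tokens * queries) * price_per_million / 1_000_000


# Hypothetical workload: 150,000 tokens of documents, 500 queries over the deal,
# 8 retrieved chunks of 500 tokens per query, at a placeholder 3.00 per million tokens.
direct = direct_cost(150_000, 500, 3.0)
indexed = retrieval_cost(150_000, 500, 500, 8, 3.0)
print(direct, indexed)
```

Under these assumptions the direct approach grows with documents times queries, while the indexed approach grows with documents plus queries, which is what makes it easier to forecast as deal volume rises.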

For PE firms, predictable AI costs are becoming a budgeting requirement, not simply a procurement consideration.

The security requirement

Deal documents are among the most commercially sensitive materials in financial services, so the security posture of an AI platform draws scrutiny from internal compliance, from LPs, and from counterparties alike.

AI governance is beginning to appear in some LP operational diligence questionnaires, alongside cybersecurity and data handling policies. The question being asked with increasing frequency is: how is our confidential information handled when your team runs AI on it? For PE firms, that makes the answer a matter of investor relations as much as internal compliance.

In practice, that requires a platform that operates under NDA, stores each firm’s data in a dedicated single-tenant environment, and applies zero data retention. The underlying frontier models should neither retain client data nor train on it. Deal-level confidentiality requires deal-level isolation. These are design decisions that should be foundational to how a platform is built.
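Deal-level isolation can be sketched as an access check enforced at every read and write. This is an in-memory illustration only: `IsolatedDealStore`, the deal IDs, and the user names are hypothetical, and real isolation also depends on infrastructure the sketch does not model, such as single-tenant storage, encryption, and provider retention settings.

```python
class DealIsolationError(PermissionError):
    """Raised when a user touches a deal they are not assigned to."""


class IsolatedDealStore:
    """Hypothetical store: each deal has its own workspace and member roster."""

    def __init__(self):
        self._deals = {}

    def create_deal(self, deal_id, members):
        self._deals[deal_id] = {"members": set(members), "documents": {}}

    def _workspace(self, deal_id, user):
        # Every operation passes through this check; there is no cross-deal lookup.
        deal = self._deals.get(deal_id)
        if deal is None or user not in deal["members"]:
            raise DealIsolationError(f"{user} has no access to {deal_id}")
        return deal

    def add_document(self, deal_id, user, name, text):
        self._workspace(deal_id, user)["documents"][name] = text

    def read_document(self, deal_id, user, name):
        return self._workspace(deal_id, user)["documents"][name]


store = IsolatedDealStore()
store.create_deal("deal-a", members=["alice"])
store.create_deal("deal-b", members=["bob"])
store.add_document("deal-a", "alice", "spa_draft.txt", "Draft purchase agreement text")

try:
    store.read_document("deal-a", "bob", "spa_draft.txt")
    blocked = False
except DealIsolationError:
    blocked = True
print(blocked)
```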

The infrastructure decision

The competitive landscape among frontier model providers will keep shifting, model selection will continue to matter, and token pricing will introduce budgetary variables that are difficult to project at signing.

Every variable that matters in AI diligence runs through infrastructure. Whether AI performs at deal scale comes down to how documents are ingested, how deal knowledge is maintained across workstreams, how the team operates inside the platform, how costs behave as usage scales, and how the system handles the security requirements of the material.

At this stage of AI adoption, frontier model access is nearly universal. The choice of infrastructure is now the differentiator that separates deal teams getting real diligence value from those still working around the technology’s limitations.