AI due diligence checklist for operators and leadership teams

When a company or vendor says AI is central to the value story, the real question is not whether the demo is compelling. It is whether the architecture, controls, and team can actually support the claim under real operating conditions.

Good AI due diligence is not a generic tour of the market. It is a focused attempt to answer a more practical set of questions: what is technically real, what is fragile, what is missing, and what has to be true for the current value story to hold up after the deal or commitment is made.

Start with the operating claim

Before inspecting the stack, write down the core AI claim in plain English. Is the company claiming superior automation, differentiated retrieval quality, proprietary data advantage, faster workflows, or lower service cost? If you do not define the claim first, the diligence becomes a tour of tools instead of an evaluation of value.

What to inspect

  1. Architecture reality: Determine what is proprietary versus assembled from third-party APIs, wrappers, and integrations. There is nothing wrong with using external models, but the actual defensibility often lives in the data, the workflow fit, or the surrounding process rather than in the model layer.
  2. Grounding and retrieval quality: If the product depends on retrieval, inspect document handling, chunking, ranking, citation behavior, freshness, and fallback logic. Weak retrieval is one of the fastest ways for an AI product story to collapse in production; a minimal grounding check is sketched after this list.
  3. Evaluation discipline: Ask how quality is measured. Useful answers include structured evals, regression testing, prompt versioning, and scenario-based reviews. Weak answers sound like anecdotes and hand-picked demos (a bare-bones regression harness is sketched after this list).
  4. Controls and governance: Look for access controls, tool permissions, incident handling, abuse cases, and at least a basic governance model. If powerful workflows exist without permission boundaries, the risk profile changes immediately (a deny-by-default gate is sketched after this list).
  5. Observability: Teams should be able to explain what they monitor, how failures are surfaced, and how they investigate low-quality outputs. If there is no telemetry and no useful audit trail, the system will be hard to trust at scale (an example audit record follows the list).
  6. Team capability: Figure out who can actually maintain the system. A polished product narrative backed by one overstretched engineer or an overly vendor-dependent team deserves a different risk rating.
  7. Data dependency: Many AI claims rely less on model sophistication and more on difficult data acquisition, cleanup, labeling, and policy alignment work. Inspect the data reality, not just the model choices.
  8. Near-term operating debt: Identify what breaks when usage grows, new customers arrive, regulations tighten, or prompts drift. The best diligence outputs expose where future friction is already visible.
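
To make item 2 concrete, here is the kind of minimal grounding check a diligence team can ask whether anything like it exists. It is a sketch under stated assumptions: the Chunk shape, the 90-day freshness budget, and the check_grounding helper are hypothetical, not a description of any particular product.

    from dataclasses import dataclass
    from datetime import datetime, timedelta, timezone

    @dataclass
    class Chunk:
        doc_id: str
        text: str
        last_updated: datetime  # expected to be timezone-aware

    MAX_AGE_DAYS = 90  # assumed freshness budget; tune per domain

    def check_grounding(citations, retrieved):
        """Return human-readable grounding problems; an empty list passes."""
        problems = []
        if not citations:
            problems.append("answer cites nothing, so it cannot be verified")
        now = datetime.now(timezone.utc)
        for doc_id in citations:
            chunk = retrieved.get(doc_id)
            if chunk is None:
                problems.append(f"citation {doc_id} is not in the retrieved set")
            elif now - chunk.last_updated > timedelta(days=MAX_AGE_DAYS):
                problems.append(f"citation {doc_id} is stale ({chunk.last_updated.date()})")
        return problems

A team with real retrieval discipline will have something far richer than this; the warning sign is having nothing at all.
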
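For item 3, the distance between anecdote and discipline can be as small as a pinned scenario set with a baseline. A bare-bones regression harness might look like the following; the scenarios, the containment scorer, and the tolerance are all illustrative assumptions.

    SCENARIOS = [
        {"id": "refund-policy", "input": "What is the refund window?", "must_contain": "30 days"},
        {"id": "escalation", "input": "My order arrived broken.", "must_contain": "replacement"},
    ]
    BASELINE = {"refund-policy": 1.0, "escalation": 1.0}  # pinned from the last approved release
    TOLERANCE = 0.05  # assumed; tighter for high-stakes workflows

    def score(case, output):
        """Crude containment scorer; real suites use rubrics or judge models."""
        return 1.0 if case["must_contain"].lower() in output.lower() else 0.0

    def run_regression(generate):
        """`generate` is the system under test: input text in, output text out."""
        failures = []
        for case in SCENARIOS:
            s = score(case, generate(case["input"]))
            if s < BASELINE[case["id"]] - TOLERANCE:
                failures.append(f"{case['id']}: score {s:.2f} fell below its baseline")
        return failures

Prompt versioning then has teeth: the baseline only moves when a change is reviewed and the new scores are pinned deliberately.
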
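For item 4, one concrete question is whether anything like a deny-by-default gate sits between the model and its tools. A minimal sketch, with the role names, tool names, and escalation threshold all assumed for illustration:

    ALLOWED_TOOLS = {  # hypothetical roles and tools, deny-by-default
        "support_agent": {"search_kb", "draft_reply"},
        "billing_agent": {"search_kb", "issue_refund"},
    }

    def authorize(role, tool, amount=0.0):
        """Refuse anything not explicitly granted; gate high-impact actions harder."""
        if tool not in ALLOWED_TOOLS.get(role, set()):
            return False  # unknown role or unlisted tool
        if tool == "issue_refund" and amount > 500.0:  # assumed escalation threshold
            return False  # route to a human approver instead of executing
        return True
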
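For item 5, a useful probe is whether every model call leaves enough of a trail to reconstruct a bad output. A minimal structured audit record, with illustrative field names and a print call standing in for a real telemetry sink, might look like this:

    import hashlib
    import json
    import time

    def audit_record(user_input, output, prompt_version, model, retrieved_ids, flagged):
        """Emit one structured record per model call."""
        record = {
            "ts": time.time(),
            "prompt_version": prompt_version,  # which prompt produced this output
            "model": model,                    # which model version answered
            "input_sha256": hashlib.sha256(user_input.encode()).hexdigest(),
            "retrieved_ids": retrieved_ids,    # what grounding material was used
            "output_chars": len(output),
            "flagged_low_quality": flagged,    # user or reviewer feedback signal
        }
        print(json.dumps(record))  # stand-in for a real log or telemetry sink
        return record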

Common warning signs

  • The demo is polished, but there is no repeatable evaluation framework.
  • The company cannot clearly explain how retrieval quality is maintained over time.
  • All differentiation is attributed to prompt engineering instead of data, workflow fit, or proprietary process.
  • Leadership talks about automation savings, but there is no measurement model behind the claim.
  • There is no serious answer for governance, incident response, or privilege boundaries.

What a useful diligence output looks like

A useful diligence memo should not end with broad commentary like "promising market" or "strong product vision." It should rank issues by severity, connect them to business impact, and explain what has to happen in the first 100 days for the thesis to remain intact.
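
One way to force that ranking, sketched here with hypothetical fields and example findings, is to treat the memo's issues as a sortable register rather than free-form prose:

    from dataclasses import dataclass

    @dataclass(order=True)
    class Finding:
        severity: int        # 1 = thesis-threatening, 3 = manageable debt
        issue: str
        business_impact: str
        first_100_days: str

    findings = sorted([
        Finding(2, "No regression evals",
                "Quality drift quietly erodes the automation-savings claim",
                "Stand up a pinned eval suite before the next model change"),
        Finding(1, "One engineer owns the retrieval pipeline",
                "Key-person risk sits directly under the core product claim",
                "Hire or cross-train a second owner and document the pipeline"),
    ])
    for f in findings:
        print(f"P{f.severity}: {f.issue} -> first 100 days: {f.first_100_days}")

Sorting by severity keeps thesis-threatening items at the top and ties each one to a concrete first-100-days action.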

If the AI story is strong, the memo should say why. If it is fragile, the memo should make the fragility explicit. The whole point is to reduce the distance between technical truth and financial decision-making.