Architecture Brief

Evidence-bound document QA built to refuse weak answers.

This system was designed for legal and compliance-style knowledge workflows where the answer is only useful if the evidence is real, traceable, and strong enough to support the response.

System Map

Six stages from incoming question to evidence grade.

The architecture is intentionally defensive. Each stage exists to reduce unsupported answers and make the final output auditable.

Stage 1

Threat screen

Inspect the incoming query for injection attempts, unsafe instructions, or prompt patterns that should not survive into retrieval.
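
A minimal sketch of a rule-based threat screen, assuming a small set of regex patterns. The pattern names and the patterns themselves (INJECTION_PATTERNS, screen_query, is_safe) are hypothetical and illustrative. A production screen would more likely combine a maintained ruleset with a trained classifier.

```python
import re

# Hypothetical, illustrative patterns only; not a complete injection ruleset.
INJECTION_PATTERNS = {
    "override_instructions": re.compile(
        r"\b(ignore|disregard|forget)\b.{0,40}\b(previous|prior|above|system)\b"
        r".{0,20}\b(instructions?|prompts?|rules?)\b",
        re.IGNORECASE,
    ),
    "role_hijack": re.compile(
        r"\byou are now\b|\bact as (?:an? )?(?:unrestricted|jailbroken)\b",
        re.IGNORECASE,
    ),
    "prompt_exfiltration": re.compile(
        r"\b(reveal|print|show|repeat)\b.{0,30}\b(system prompt|hidden instructions)\b",
        re.IGNORECASE,
    ),
}


def screen_query(query: str) -> list[str]:
    """Return the label of every suspicious pattern found in the query."""
    return [label for label, pattern in INJECTION_PATTERNS.items() if pattern.search(query)]


def is_safe(query: str) -> bool:
    """A query passes the screen only if no pattern matched."""
    return not screen_query(query)
```

A query that fails the screen would be rejected or logged before it ever reaches retrieval.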

Stage 2

Hybrid retrieval

Combine lexical and vector retrieval to improve recall, then rerank the merged candidates to improve passage quality before answer generation.
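
A minimal sketch of the fuse-then-rerank step. The brief does not name a fusion method, so reciprocal rank fusion (RRF) with the commonly used constant k = 60 is swapped in here as an assumption. The reranker is modeled as a caller-supplied scoring function (for example, a cross-encoder). The function names are hypothetical.

```python
from typing import Callable, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of passage IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def hybrid_retrieve(
    lexical: Sequence[str],
    dense: Sequence[str],
    rerank_score: Callable[[str], float],  # hypothetical reranker hook
    candidates: int = 20,
    top_n: int = 5,
) -> list[str]:
    """Fuse sparse and dense rankings for recall, then rerank the pool for precision."""
    pool = reciprocal_rank_fusion([lexical, dense])[:candidates]
    return sorted(pool, key=rerank_score, reverse=True)[:top_n]
```

Fusion widens the candidate pool so that a passage found by only one retriever still survives; the reranker then decides which passages are good enough to reach generation.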

Stage 3

Confidence gate

Require enough support before the system proceeds. If the evidence is thin, the answer path stops instead of bluffing.
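
A minimal sketch of a confidence gate, assuming reranker scores are normalized to [0, 1]. The thresholds (min_score, min_passages) and the class and function names are hypothetical and would need calibration against real evaluation data.

```python
from dataclasses import dataclass, field


@dataclass
class Passage:
    doc_id: str
    text: str
    score: float  # reranker relevance score, assumed to lie in [0, 1]


@dataclass
class GateDecision:
    proceed: bool
    reason: str
    support: list[Passage] = field(default_factory=list)


def confidence_gate(passages: list[Passage], min_score: float = 0.6, min_passages: int = 2) -> GateDecision:
    """Allow generation only when enough passages clear the relevance threshold."""
    strong = [p for p in passages if p.score >= min_score]
    if len(strong) < min_passages:
        # Thin evidence: stop the answer path instead of generating anyway.
        return GateDecision(False, f"{len(strong)} passage(s) scored >= {min_score}; need {min_passages}")
    return GateDecision(True, "sufficient support", strong)
```

Because the gate runs before generation, a refusal costs nothing downstream and never produces a draft that has to be discarded.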

Stage 4

Verification pass

Challenge the draft answer against the retrieved support so the draft has to survive a second, independent evidence check.
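
A minimal sketch of a sentence-level verification pass. A lexical-overlap check is swapped in here as a simple stand-in for whatever verifier the real system uses (an LLM judge or an entailment model would be more typical). The stopword list, the 0.7 threshold, and the function names are hypothetical.

```python
import re

# Hypothetical stopword list; only tokens longer than 3 characters are checked.
STOPWORDS = {"that", "with", "from", "this", "must", "shall", "have", "been", "were", "which"}


def content_tokens(text: str) -> set[str]:
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if len(t) > 3 and t not in STOPWORDS}


def sentence_support(sentence: str, passages: list[str]) -> float:
    """Fraction of the sentence's content tokens found in the best-matching passage."""
    tokens = content_tokens(sentence)
    if not tokens:
        return 1.0  # nothing checkable in this sentence
    return max((len(tokens & content_tokens(p)) / len(tokens) for p in passages), default=0.0)


def verify_answer(answer: str, passages: list[str], threshold: float = 0.7) -> list[str]:
    """Return the draft sentences that the retrieved passages do not support."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    return [s for s in sentences if sentence_support(s, passages) < threshold]
```

Any returned sentence would be dropped or sent back for revision; an empty list means every sentence found support.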

Stage 5

Citation validator

Confirm that every cited passage exists, matches the claim it is attached to, and actually contains the evidence the answer attributes to it.
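
A minimal sketch of a citation validator, assuming each citation carries the claim it supports, a document ID, and an exact quoted span. The citation format, the corpus dictionary, and the "figures in the claim must appear in the quote" rule are hypothetical choices made for illustration.

```python
import re
from dataclasses import dataclass

NUMBER = re.compile(r"\d+(?:\.\d+)?")


@dataclass
class Citation:
    claim: str   # the answer sentence this citation supports
    doc_id: str  # ID of the cited source document
    quote: str   # exact span the answer says it relies on


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line wrapping does not break matching."""
    return " ".join(text.lower().split())


def validate_citation(c: Citation, corpus: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the citation checks out."""
    source = corpus.get(c.doc_id)
    if source is None:
        return [f"{c.doc_id}: cited document does not exist"]
    problems = []
    if normalize(c.quote) not in normalize(source):
        problems.append(f"{c.doc_id}: quoted span not found in source")
    # Every figure stated in the claim should appear in the quoted evidence.
    missing = set(NUMBER.findall(c.claim)) - set(NUMBER.findall(c.quote))
    if missing:
        problems.append(f"{c.doc_id}: claim figures {sorted(missing)} not in quote")
    return problems
```

Checking the quote against the source catches fabricated evidence, while the figure check catches a real quote attached to a claim it does not support.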

Stage 6

Evidence grade

Grade support quality and log traces so the team can evaluate behavior from telemetry rather than anecdotes.
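
A minimal sketch of evidence grading plus a structured trace, assuming the earlier stages report a supported-sentence fraction, a citation error count, and a top reranker score. The grade labels and thresholds form a hypothetical rubric, not calibrated values, and the JSON-per-line log format is an assumption.

```python
import json
import logging
import time
import uuid

logger = logging.getLogger("qa.telemetry")  # hypothetical logger name


def grade_evidence(supported_fraction: float, citation_errors: int, top_score: float) -> str:
    """Hypothetical rubric mapping stage metrics to a coarse evidence grade."""
    if citation_errors > 0 or supported_fraction < 0.5:
        return "insufficient"
    if supported_fraction >= 0.9 and top_score >= 0.8:
        return "strong"
    return "moderate"


def log_trace(query: str, grade: str, **metrics: float) -> dict:
    """Emit one JSON record per answer so behavior can be analyzed in aggregate."""
    trace = {"trace_id": str(uuid.uuid4()), "ts": time.time(), "query": query, "grade": grade, **metrics}
    logger.info(json.dumps(trace))
    return trace
```

One structured record per request makes it possible to track grade distributions across releases and turn regressions into queryable data rather than anecdotes.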

Build notes

The design targets a class of use cases where "close enough" is not acceptable. The answer has to either point to real support or decline to answer.

  • retrieval combines dense search, sparse search, and reranking rather than relying on one mechanism
  • answer generation is downstream of a confidence gate, not ahead of it
  • citation handling is a first-class control rather than a cosmetic add-on
  • telemetry is built in so teams can inspect behavior, failure modes, and regressions over time

Production Posture

Refusal is part of the product.

This design treats unsupported output as a failure condition, not as an acceptable side effect of model behavior.

Evaluation

Telemetry gives the team something real to improve.

Logging, traces, and evidence grading create a path to regression testing and operating discipline.

Best Fit

Useful where citations carry real consequences.

The pattern fits legal, compliance, policy, and other knowledge workflows where weak sourcing creates risk quickly.

Next Step

If this is the kind of system you need built or reviewed, start with Selected Work.

This brief is meant to show architecture judgment and product posture. The right next step is usually a direct conversation about the operating constraints in your own system.