See what you're missing before you commit.
Multiple AI models debate your question in structured blind rounds. Instead of one confident answer, you get a map of what's solid, what's contested, and what only you can decide.
One AI. One opinion. Zero accountability.
The Confidence Trap
Single AI models hallucinate with full confidence. You can't tell a solid answer from a fabrication.
The Brand Bias
You trust Claude because it's Anthropic, or GPT because it's OpenAI. Brand deference lets weak reasoning hide behind a logo.
The Black Box
You get an answer. You don't get to see how it was reached, what was considered, or what was missed.
The Consensus Debate Protocol
A 6-phase pipeline that produces information no single model can generate.
Input Analysis
User submits any type of thinking — a question, proposal, prediction, belief, or idea. The system classifies the input type and adapts the entire downstream pipeline.
- Factual, normative, proposal, prediction, brainstorm, evaluation, belief, or definition
- Honest disclaimer for pure fact retrieval: "Debate shows smaller gains on pure fact retrieval"
Optional Grounding
For evidence-heavy inputs, the system retrieves verified external sources and anchors all models to the same evidence base.
- Web search or uploaded documents
- Prevents debates from becoming contests of memorized training data
Blind Round
Three AI models answer independently and simultaneously. No model sees any other's response. Identities are anonymized — Model A, Model B, Model C, never brand names.
- Convergence independence — claims multiple models arrived at independently
- This is the strongest trust signal in the entire system
Deliberation Rounds
Models receive anonymized responses from others. They must challenge, support, or refine specific claims — not argue in prose.
- If a model changes position, it must state what changed its mind
- Vagueness penalty downweights claims that get broader to dodge disagreement
- Minority corrections — when a lone dissenter catches something the majority missed
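The structured-move requirement described above can be sketched as a data shape. This is an illustrative model of our own (the field names are hypothetical), not the product's actual schema:

```python
from dataclasses import dataclass
from typing import Literal, Optional

@dataclass
class DeliberationMove:
    """One structured move in a deliberation round: a model must act on a
    specific claim rather than argue in free prose."""
    model: str                                    # anonymized: "Model A", etc.
    action: Literal["challenge", "support", "refine"]
    target_claim_id: str                          # the specific claim addressed
    rationale: str
    position_changed: bool = False
    change_reason: Optional[str] = None           # required if position changed

    def __post_init__(self):
        # A model that changes position must state what changed its mind.
        if self.position_changed and not self.change_reason:
            raise ValueError("position change requires a stated reason")

move = DeliberationMove("Model B", "challenge", "claim-3",
                        rationale="Cited regulation was superseded in 2024")
```

Constraining moves to this shape is what makes the vagueness penalty and minority-correction tracking possible: every challenge points at one identifiable claim.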
Consensus Check
An adjudicator evaluates the debate. The consensus score is decomposed into 4 visible components — not a black box number.
- Stance Alignment (0–40)
- Empirical Claim Overlap (0–25)
- Framework Agreement (0–20)
- Confidence Convergence (0–15)
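As an illustration of how the four components combine, here is a minimal Python sketch. The component names and ranges come from the list above; the clamp-and-sum aggregation is our own assumption about a plausible implementation, not the product's actual scoring code.

```python
def consensus_score(stance_alignment: float,
                    empirical_overlap: float,
                    framework_agreement: float,
                    confidence_convergence: float) -> float:
    """Clamp each component to its documented range, then sum to 0-100."""
    parts = [
        (stance_alignment, 40),        # Stance Alignment (0-40)
        (empirical_overlap, 25),       # Empirical Claim Overlap (0-25)
        (framework_agreement, 20),     # Framework Agreement (0-20)
        (confidence_convergence, 15),  # Confidence Convergence (0-15)
    ]
    return sum(min(max(value, 0), cap) for value, cap in parts)

print(consensus_score(34, 19, 14, 7))  # prints 74, i.e. 0.74 when normalized
```

Because each component is capped and visible, a reader can see whether a high score comes from genuine stance alignment or merely from models converging on confidence.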
Synthesis: The Four-Part Output
The synthesizer compiles everything into four distinct sections — this is the product.
The Four-Part Output
Strong Ground
Claims all models converged on. Blind agreements at the top, later adoptions below. Each claim tagged with confidence spread and evidence cited.
“Here's what you can rely on.”
Fault Lines
Precise points of disagreement, expressed as conditionals. Tagged with which models are on which side and whether the split is empirical or normative.
“Here's where the uncertainty lives.”
Blind Spots
Claims only one model raised but others validated after seeing them. Tagged with origin, validation status, and significance.
“Here's what you would have missed asking just one AI.”
Your Call
Decision points where AI knowledge runs out entirely. Classified as: values decisions, risk appetite, priorities, or genuine unknowables.
“Here's where you bring your own judgment.”
Disagreement is the product.
Existing multi-model systems like FusionFactory fuse multiple AIs to produce a single optimized answer — they merge disagreement away. We do the opposite.
We preserve the disagreement structure because compliance officers and legal analysts need to see where models diverge, not just get a blended result that hides the fault lines.
ICLR 2025 found that multi-agent debate doesn't reliably beat a single strong model on accuracy benchmarks. We don't fight that finding — we built around it.
Our product's value isn't a more accurate answer. It's information that doesn't exist without multiple independent models confronting each other: convergence independence, fault lines, blind spots, minority corrections.
| | Single AI | AI Fusion | CDP |
|---|---|---|---|
| Output | One confident answer | One optimized answer | Structured uncertainty map |
| Disagreement | Hidden | Merged away | Preserved as the product |
| Audit trail | None | None | Full reasoning transparency |
| Blind spots | Invisible | Invisible | Surfaced and tagged |
| Trust signal | "Trust the brand" | "Trust the blend" | "See the evidence" |
Built for decisions with consequences.
Compliance Officers
Before: synthesize regulatory guidance across 3 AI tools manually.
After: structured analysis with audit trail in minutes. EU AI Act ready.
Legal Analysts
Before: ask one AI and hope.
After: see exactly where legal reasoning diverges and which assumptions drive each conclusion.
Research & Strategy Teams
Before: get one confident prediction.
After: see the conditions under which different outcomes follow, and what would need to be true for each.
Before, analysts spent 4 hours synthesizing regulatory guidance across tools. Now the debate map produces structured analysis in 12 minutes — audit trail included.
— Enterprise pilot, Financial Services
Start free. Scale when you're ready.
Free
Start exploring structured debate
- 5 debates/day (150/month)
- 2 rounds maximum
- Models: Gemini Flash + Groq/Llama + Mistral
- Algorithmic consensus scoring
- Simplified debate map output
- Watermarked output
- No audit export
Pro
For professionals who need clarity
- 100 debates/month included
- $0.35/debate overage
- Up to 4 rounds
- Models: Claude Sonnet + GPT-4o + Gemini Pro
- 2-model adjudicator committee
- Full four-part output
- Full debate map with drill-down
- JSON audit trail export
- No watermark
- 2 API keys for MCP integration
Enterprise
Tailored for your organization
- Unlimited debates
- Up to 10 rounds
- Configurable models (BYOK option)
- On-premise or private cloud deployment
- Ephemeral mode (zero data stored)
- EU AI Act, SOC2, HIPAA-ready
- Full audit trail on client infrastructure
- Dedicated support + SLA
All payments processed on web. No app store markup. Same pricing model as ChatGPT and Perplexity.
Bring structured debate into any AI workflow.
The Consensus Debate Protocol MCP server is the first public tool — add multi-model deliberation to Claude Code, Cursor, Windsurf, or any MCP-compatible client.
MCP (Model Context Protocol) lets AI assistants call external tools. Our MCP server wraps the full debate engine into two simple tools. Start a debate from your terminal, IDE, or Claude Desktop — get back a structured uncertainty map without leaving your workflow.
Use debate() when a question has significant consequences and needs verification beyond a single AI answer. The system runs the full protocol — blind rounds, deliberation, consensus checking — and returns the four-part output plus raw structured data.
MCP Client Compatibility
| Client | MCP Support | Experience |
|---|---|---|
| Claude Code (terminal) | Full | Round-by-round progress in terminal |
| Cursor / Windsurf / Zed | Full | IDE integration |
| Claude.ai Desktop | Full | Power user workflow |
| Claude.ai Web | Not yet | Use web app directly |
| ChatGPT | Not yet | Use web app directly |
MCP server launches Month 8. REST API available Month 6 with Python + JS SDKs. Join the waitlist for early access.
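As a concrete sketch, registering an MCP server in Claude Desktop's `claude_desktop_config.json` generally follows the shape below. The server name and launch command here are placeholders, since the package is not yet published:

```json
{
  "mcpServers": {
    "consensus-debate": {
      "command": "npx",
      "args": ["-y", "consensus-debate-mcp"]
    }
  }
}
```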
```js
// MCP Tool: debate()

// Input
{
  question: "Should we migrate our auth system to OAuth 2.1?",
  context: "Current system uses session tokens, 50k DAU",
  rounds: 3
}

// Output includes both narrative and structured data
{
  narrative: "Three models deliberated over 3 rounds...",
  consensus_score: 0.74,
  consensus_reached: false,
  rounds_taken: 3,
  strong_ground: ["OAuth 2.1 improves security posture...", ...],
  fault_lines: [
    {
      condition: "If migration budget exceeds $200k",
      position_a: "Phased migration recommended",
      position_b: "Full cutover more cost-effective",
      type: "empirical"
    }
  ],
  blind_spots: ["Session token rotation vulnerability..."],
  your_call: ["Risk appetite for 2-week auth downtime window"],
  round_summaries: [...]
}

// MCP Tool: debate_status()
// For async workflows — start debate, do other work, check back
{ debate_id: "dbt_a1b2c3d4" }
// Returns: { status: "complete", result: { ... } }
```

Transparent methodology. Honest benchmarks.
Built on foundations from MIT/Google DeepMind (Du et al. 2023), ACL 2024 ReConcile, A-HMAD 2025, and informed by ICLR 2025 findings on multi-agent debate limitations.
Stop trusting. Start verifying.
Run your first debate in 10 seconds. No API keys. No setup.