How We Work
Traditional advisory firms publish conclusions. We publish everything — the methodology, the evidence quality, the confidence scores, the limitations, and the cost of production. This page exists because our architecture makes transparency cheaper than opacity.
Outcome-Anchored Vendor Evaluation
We score vendors on one thing: verified outcomes in production. Not feature lists. Not analyst opinions. Not market share. The weighting reflects what actually determines whether an enterprise AI deployment succeeds.
- Verified Outcomes — Named-company production deployments with measurable business results. Anonymized case studies are weighted at 0.5×; vendor self-reported results at 0.25×.
- Time to Value — Time from purchase decision to measurable business value in production. Under six months scores highest; over 18 months scores lowest.
- Scale Durability (20%) — Can the deployment maintain performance at enterprise scale?
- Economic Risk (20%) — Total cost of ownership trajectory.
- Continuity Risk (10%) — Vendor viability and lock-in exposure.
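To make the arithmetic concrete, here is a minimal sketch in Python of how a composite score could be computed under this rubric. It is illustrative rather than our production code: the names are ours for this page, the evidence multipliers mirror the 0.5× and 0.25× figures above, and criterion weights are caller-supplied because not every weight is quoted here.

```python
# Illustrative sketch only; not our production scoring pipeline.
# Evidence multipliers mirror the figures quoted above. Criterion weights
# are caller-supplied because not all of them appear on this page.

EVIDENCE_MULTIPLIER = {
    "named_company": 1.00,   # named-company deployment, measurable results
    "anonymized": 0.50,      # anonymized case study
    "self_reported": 0.25,   # vendor self-reported, uncorroborated
}

def discounted(raw_score: float, evidence_kind: str) -> float:
    """Scale a raw 0-100 criterion score by the quality of its evidence."""
    return raw_score * EVIDENCE_MULTIPLIER[evidence_kind]

def vendor_score(criterion_scores: dict[str, float],
                 weights: dict[str, float]) -> float:
    """Weighted sum of (already discounted) criterion scores.

    `weights` maps criterion name -> fractional weight and must sum to 1.0,
    e.g. {"scale_durability": 0.20, "economic_risk": 0.20, ...}.
    """
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(criterion_scores[name] * w for name, w in weights.items())
```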
Evidence Quality Tiers
Not all evidence is created equal. A vendor press release and an SEC filing both count as “sources” — but they carry very different weight. We apply discount factors to every piece of evidence.
| Tier | Example Sources | Share of Evidence Base |
|---|---|---|
| Primary | SEC filings, patent records, peer-reviewed papers, audited financials, official earnings transcripts | ~45% |
| Secondary | Credible journalism (Reuters, WSJ, FT), independent analyst reports, conference proceedings, academic research | ~35% |
| Tertiary | Press releases, vendor case studies, marketing materials, self-reported metrics without independent corroboration | ~20% |
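A sketch of how tiering could be applied mechanically. Tier membership follows the table above; the per-tier discount factors are not quoted on this page, so the sketch takes them as an argument instead of inventing them.

```python
from collections import Counter

# Source types grouped by tier, following the table above.
PRIMARY = {"sec_filing", "patent", "peer_reviewed_paper",
           "audited_financials", "earnings_transcript"}
SECONDARY = {"journalism", "independent_analyst_report",
             "conference_proceedings", "academic_research"}

def tier_of(source_type: str) -> str:
    """Classify a source type; anything not primary or secondary is tertiary."""
    if source_type in PRIMARY:
        return "primary"
    if source_type in SECONDARY:
        return "secondary"
    return "tertiary"

def evidence_weight(source_type: str, discount: dict[str, float]) -> float:
    """Weight one piece of evidence by its tier's discount factor.

    `discount` maps tier -> factor; the actual factors live in our full
    methodology and are deliberately not hardcoded in this sketch.
    """
    return discount[tier_of(source_type)]

def tier_shares(source_types: list[str]) -> dict[str, float]:
    """Fraction of the evidence base per tier (cf. ~45/35/20% above)."""
    counts = Counter(tier_of(s) for s in source_types)
    total = sum(counts.values())
    return {tier: n / total for tier, n in counts.items()}
```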
Confidence Calibration
Every claim in our report carries a confidence score. These are not decorative. They are commitments to calibrated honesty about what we know and what we don't.
| Score | Meaning | Required Evidence |
|---|---|---|
| 0.90–1.00 | Near-certain | Multiple independent primary sources, no credible contradictions |
| 0.70–0.89 | High confidence | At least 1 primary + corroborating secondary sources |
| 0.50–0.69 | Moderate | Secondary sources agree, no primary confirmation |
| 0.30–0.49 | Low — publishable with caveat | Limited sourcing, some ambiguity |
| 0.00–0.29 | Unpublishable | Insufficient evidence — never appears in published output |
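These bands translate directly into code. A minimal sketch (function names are ours for illustration):

```python
def confidence_band(score: float) -> str:
    """Map a confidence score to the bands in the table above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("confidence must be in [0.0, 1.0]")
    if score >= 0.90:
        return "near-certain"
    if score >= 0.70:
        return "high confidence"
    if score >= 0.50:
        return "moderate"
    if score >= 0.30:
        return "low (publishable with caveat)"
    return "unpublishable"

def publishable(score: float) -> bool:
    """Claims scoring below 0.30 never appear in published output."""
    return score >= 0.30
```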
AI Systems Used
This report was produced by AI systems. We disclose which models were used, for what purpose, and in what proportion. No human wrote the prose. Humans set the methodology, provide practitioner intelligence, and serve as the quality backstop.
- Primary research synthesis, report writing, editorial review, strategic analysis
- Cross-validation, alternative perspective generation, quality gate deliberation
- Classification, entity extraction, formatting, data transformation
The Pre-Ship Gate
Before any report section ships, it must pass these 10 questions. We derived them from the patterns our Board Chairman uses to evaluate work. If any answer is unsatisfactory, the deliverable goes back for rework.
1. Has this been pressure-tested through multiple self-critique cycles?
2. What blind spots exist? What would a hostile critic point to?
3. Does every claim trace to a real deployment, a real number, a real company?
4. Would a Fortune 500 CIO forward this to their board?
5. Are we being lazy with the methodology?
6. Does the system improve itself, or does it need external intervention to catch everything?
7. Can you trace from high-level claim to source data and back?
8. Are the visuals and design first-class?
9. What does this look like from the reader's specific context?
10. Are we leaning all the way in, or hedging?
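Mechanically, the gate behaves like a checklist in which any failed question blocks shipping. A sketch, with the questions abbreviated from the list above:

```python
# A sketch of the gate as code; question strings abbreviated from the list above.
GATE = [
    "pressure-tested through multiple self-critique cycles",
    "blind spots and hostile-critic objections examined",
    "every claim traces to a real deployment, number, or company",
    "a Fortune 500 CIO would forward this to their board",
    "methodology applied without shortcuts",
    "system self-corrects rather than relying on external intervention",
    "claims trace to source data and back",
    "visuals and design are first-class",
    "reviewed from the reader's specific context",
    "leaning all the way in, not hedging",
]

def pre_ship_gate(answers: dict[str, bool]) -> list[str]:
    """Return failed questions; anything returned sends the section to rework."""
    return [q for q in GATE if not answers.get(q, False)]
```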
Known Limitations
We are transparent about what we cannot do. Publishing limitations is not a weakness — it is the mechanism by which trust is built. A finding presented with false certainty is more dangerous than no finding at all.
- We cannot read intent behind public statements or detect organizational dysfunction from tacit signals.
- Our vendor evaluations rely on publicly available evidence. Vendors with poor public documentation may be scored lower than their actual capabilities warrant.
- Evidence ages. Data points older than 12 months are flagged as stale but may still influence scores if no newer data exists (a sketch of the rule follows this list).
- Our confidence scores are calibrated estimates, not probabilities. A 0.75 confidence does not mean a 75% chance of being correct.
- We have no primary research (surveys, interviews) in v1. All evidence is secondary or tertiary. This is a known limitation we plan to address.
- Our AI systems have intrinsic limitations in generating genuinely novel conceptual frameworks. Our frameworks are synthesized from existing research, not invented from first principles.
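The evidence-aging rule above is simple enough to state as code. A sketch with illustrative names:

```python
from datetime import date, timedelta
from typing import Optional

STALE_AFTER = timedelta(days=365)  # the 12-month threshold noted above

def is_stale(observed: date, today: Optional[date] = None) -> bool:
    """Flag evidence older than 12 months. Stale evidence may still
    influence a score when no newer data point exists."""
    today = today or date.today()
    return today - observed > STALE_AFTER
```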
Corrections & Updates
Errors are inevitable. The measure of integrity is not perfection but velocity of correction. Any material error is corrected within 24 hours with a transparent notice.
No corrections issued yet. This section will be updated as needed.
Questions about our methodology? Disagree with a finding? Found an error?
Contact Our Research Team