Methodology
StanceWatch audits AI labs using the GoldBerry epistemic-audit framework. This page is the citable methodology of record — it documents the lenses, the scoring rubric, the audit pipeline, the limitations, and how to cite the work.
Methodology v1.0 · GoldBerry v1.3 / StanceWatch v1.0 · revised 2026-06-08
The problem
In 2026, AI labs release increasingly powerful models, but their public communications often:
- Flatten cultural and historical complexity into technical benchmarks
- Erase marginalised voices through passive, agency-diffusing language
- Optimise for engagement and speed over completeness and accountability
- Present progress as linear and decontextualised from power structures
"A response can be factually correct and epistemically impoverished."
How to read the leaderboard
The leaderboard is the primary surface. Every column is sortable. The columns are not equivalent — they answer different questions about a lab's public stance.
Column dictionary
- # — rank by composite PALS, descending. Shows — until the lab has been audited.
- Lab — name + slug-linked detail page + tiny region tag (US, UK, CN, etc.) for at-a-glance geography.
- PALS — composite mean of the three preservative dimensions, 1–10. The headline figure.
- Compl. — Completeness. Source preservation, limit acknowledgment, transparency.
- Mult. — Multiplicity. Diverse epistemologies, languages, perspectives.
- Resp. — Responsibility. Structural accountability, refusal commitments, community governance.
- 8 Lenses — a sparkline of the eight per-lens scores in canonical order. Each bar is coloured to the Cogniosynthesis Goethean wheel — clay-red for Indigenous, burnt-orange for Deep History, chartreuse for Cross-Cultural, cyan for Scientific, sage for Artistic, indigo for Future, periwinkle for Marginalised, rose-magenta for Trickster. Tall bars = strong; short = weak; empty = unaudited.
- Openness — declared model-weights posture: open, hybrid, or closed. This is what the lab declares about its weights, not a judgement on its overall posture.
- Stance — ✓ if the lab has published a discoverable AI stance / responsibility / safety page; — if not. A positive baseline: simply having a stance to audit is itself a signal.
- Flags — count of Suffixscape patterns flagged in the last audit. Higher = more evasive language found.
- Last audit — date of the most recent successful audit. pending if no audit has run yet.
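The ranking rule in the column dictionary can be sketched in a few lines of Python. This is a minimal illustration rather than the site's implementation: the `Row` type, its field names, and the choice to list unaudited labs after the ranked ones are assumptions. Only the descending sort on composite PALS and the placeholder for unaudited labs come from the text above.

```python
from dataclasses import dataclass
from typing import Optional

PLACEHOLDER = "—"  # shown in the rank column until a lab has been audited


@dataclass
class Row:
    # Hypothetical leaderboard row; field names are illustrative only.
    lab: str
    composite: Optional[float]  # None means no audit has run yet


def rank_rows(rows: list[Row]) -> list[tuple[str, str]]:
    """Return (rank, lab) pairs, ranked by composite PALS, descending."""
    audited = sorted(
        (r for r in rows if r.composite is not None),
        key=lambda r: r.composite,
        reverse=True,
    )
    pending = [r for r in rows if r.composite is None]
    ranked = [(str(i), r.lab) for i, r in enumerate(audited, start=1)]
    # Assumption: unaudited labs are listed after all ranked labs.
    ranked += [(PLACEHOLDER, r.lab) for r in pending]
    return ranked


rows = [Row("Lab A", 4.2), Row("Lab B", None), Row("Lab C", 6.7)]
print(rank_rows(rows))
```

How ties are broken is not stated on this page; the sketch simply keeps tied labs in their input order.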
What the leaderboard does not show
- Model capability. See llm-stats.com for benchmark leaderboards. StanceWatch is orthogonal.
- Internal lab systems. StanceWatch reads public communications — homepages, stance pages, GitHub presences. We make no claims about what happens inside.
- Definitive rankings. PALS scores are qualitative heuristics. Two labs separated by 0.3 PALS are not meaningfully distinguishable; two labs separated by 3.0 are.
Worked example (stub data)
Imagine a hypothetical "Lab X" with composite 5.8, Completeness 7.2, Multiplicity 3.1, Responsibility 7.1. The lens sparkline would show tall bars on Scientific Evidence and Future Modelling, a tall bar on Deep History, and short bars on Indigenous, Cross-Cultural, Marginalised, and Trickster. The read: this lab takes scientific rigour and forward-looking risk modelling seriously, and acknowledges its history, but treats one epistemology as the only epistemology, leaves marginalised voices out, and avoids the disciplined disruption Trickster names. The Suffixscape flag count might be 4 — likely "state-of-the-art" appearing without measurement, plus passive-voice agency-diffusion in safety claims. None of that is "wrong"; it is incomplete, which is what PALS measures.
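The Lab X figures can be checked by hand: the composite is the arithmetic mean of the three dimension scores, reported to one decimal place. A minimal sketch follows; the function name `composite_pals` is hypothetical.

```python
def composite_pals(completeness: float, multiplicity: float, responsibility: float) -> float:
    """Arithmetic mean of the three PALS dimensions, rounded to one decimal place."""
    return round((completeness + multiplicity + responsibility) / 3, 1)


# Lab X from the worked example: (7.2 + 3.1 + 7.1) / 3 = 17.4 / 3
lab_x = composite_pals(7.2, 3.1, 7.1)
print(lab_x)  # 5.8
```

The low Multiplicity score pulls the composite well below the other two dimensions, which is why the three columns are worth reading alongside the headline figure.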
Our approach
We audit each lab through eight lenses from the GoldBerry framework, plus Suffixscape linguistic diagnostics to detect grammatical evasion.
The eight lenses
- Indigenous Knowledge — Whose embodied knowledge is missing?
- Deep History — What historical processes shaped this approach?
- Cross-Cultural Wisdom — Which perspectives are flattened?
- Scientific Evidence — What claims are verified? What limits acknowledged?
- Artistic Perception — What modes of attention are invited or foreclosed?
- Future Modelling — Whose futures are centred? Who decides?
- Marginalised Voices — Who is not at the table?
- Trickster Knowledge — What truth appears when the official story is inverted, mocked, contradicted, or followed to its absurd edge? Disciplined disruption, not cynicism. Added to the GoldBerry canon on 2026-04-25 to audit the framework's own seriousness.
Suffixscape diagnostics
We analyse public communications for:
- Nominalised evasion — "the implementation of strategies" (who implements?)
- Agency diffusion — "It was determined that…" (who decided?)
- Epistemic inflation — "state-of-the-art" (measured how?)
- Temporal flatness — linear narratives erasing contingencies
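As a rough illustration of the regex side of these diagnostics, the sketch below flags three of the four patterns. The regular expressions are hypothetical, seeded only from the example phrases in the list above; they are not StanceWatch's actual rules and would both over-match and under-match on real prose.

```python
import re

# Hypothetical patterns, one per regex-friendly category. They are seeded
# from the example phrases above and are not StanceWatch's actual rules.
PATTERNS = {
    "nominalised_evasion": re.compile(
        r"\bthe\s+\w+(?:tion|ment|ance|ence)\s+of\b", re.IGNORECASE
    ),
    "agency_diffusion": re.compile(
        r"\bit\s+(?:was|is|has\s+been)\s+\w+ed\s+that\b", re.IGNORECASE
    ),
    "epistemic_inflation": re.compile(r"\bstate-of-the-art\b", re.IGNORECASE),
}


def scan(text: str) -> list[dict]:
    """Return one flag per match, carrying the pattern name and the exact quote."""
    flags = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            flags.append({"pattern": name, "quote": match.group(0)})
    return flags


sample = "It was determined that the implementation of strategies is state-of-the-art."
for flag in scan(sample):
    print(flag["pattern"], "->", flag["quote"])
```

Temporal flatness is left out because a narrative-level pattern does not reduce to a single phrase match.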
Suffixscape vs. the CognioNews "-scape" format
Suffixscape here is a linguistic-diagnostics layer — regex- and LLM-detected patterns of evasion in a lab's own prose. It is not the same as the CognioNews -scape deep-dive synthesis format (TariffScape, ClimateScape, AlignmentScape, and so on), which is a long-form editorial product mapping a topic across the seven Cogniosynthesis dimensions. The two are sibling concepts operating at different layers: Suffixscape audits language; the CognioNews -scapes audit framing across a corpus of stories. See cognionews.com/reports for the editorial examples.
PALS scoring
Each lab receives a PALS score (Preservative Audit Lens Scores): a 1–10 figure for each of the three dimensions below, plus their composite mean.
- Completeness — source preservation, limitation acknowledgment, transparency
- Multiplicity — inclusion of diverse epistemologies, languages, perspectives
- Responsibility — structural accountability, refusal commitments, community governance
Why PALS, not "CMR"?
The Qwen-generated scaffold called this metric "CMR" (Completeness / Multiplicity / Responsibility). That collides at the publication layer with ACST's cmrScore — the Cogniosynthetic Misrepresentation Rating, a 0–7 integer for omission depth in news stories, published at cognionews.com/vocab/acst. Same acronym, different layer, different scale, different metric.
To keep StanceWatch outputs unambiguously distinguishable from ACST outputs in any future cross-citation, the lab-level metric here is PALS (Preservative Audit Lens Scores). The three-dimension structure is unchanged; only the label is. ACST's cmrScore continues to mean what it means at the CognioNews story layer.
Important
PALS scores are qualitative judgments, not validated metrics. They are heuristic indicators, not definitive rankings.
Scoring rubric
Every lens and every PALS dimension is scored on the same 1–10 band scale. The bands describe what is discoverable in the lab's public communications — not the lab's internal reality. The same anchors are applied to each of the eight lenses and to Completeness, Multiplicity, and Responsibility.
- 1–2 · Absent — the concern is unaddressed, or actively undercut. No discoverable evidence in public text.
- 3–4 · Gestural — named in passing or in marketing register; no substantive mechanism, consultation, evidence, or commitment behind the words.
- 5–6 · Partial — some genuine, specific commitments or evidence, but unevenly applied, narrowly scoped, or asserted without verification.
- 7–8 · Substantive — concrete, specific, evidenced practice across most of the dimension; verifiable claims; named mechanisms.
- 9–10 · Exemplary — structural, governance-backed, independently verifiable, and sustained over time.
The composite PALS is the arithmetic mean of Completeness, Multiplicity, and Responsibility, reported to one decimal place. Because the input is qualitative, treat differences within a single band (roughly ±1.0) as noise; a gap of a full band or more (≥2.0) is the smallest difference worth reading as meaningful.
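The band anchors and the reading rule for score differences can be expressed as two small helpers. This is a sketch under stated assumptions: the function names are hypothetical, the treatment of fractional scores (3.1 reads as the 3–4 band) is an assumption, and the rubric does not say how to read a gap between 1.0 and 2.0, so the sketch reports that range as unspecified.

```python
BANDS = ["Absent", "Gestural", "Partial", "Substantive", "Exemplary"]


def band(score: float) -> str:
    """Map a 1-10 score to its rubric band.

    Assumption: a fractional score falls in the band of the integer it
    starts from, so 3.1 reads as Gestural (3-4) and 8.9 as Substantive (7-8).
    """
    if not 1 <= score <= 10:
        raise ValueError("scores run from 1 to 10")
    return BANDS[min(int((score - 1) // 2), 4)]


def read_gap(a: float, b: float) -> str:
    """Classify the difference between two composite PALS scores."""
    gap = round(abs(a - b), 1)  # composites are reported to one decimal place
    if gap >= 2.0:
        return "meaningful"
    if gap <= 1.0:
        return "noise"
    return "unspecified"  # the rubric does not say how to read 1.1-1.9


print(band(7.2), band(3.1))
print(read_gap(5.8, 5.5), read_gap(5.8, 2.8))
```

Rounding the gap before comparing avoids a floating-point difference such as 1.9999999999999996 being read as less than a full band.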
How a score is produced
Each audit is a single, reproducible pass, performed by a language-model agent that reads the captured sources against the public GoldBerry audit prompt, so the same prompt, rubric, and sources reproduce the same shape of judgment. There is no private key, model, or dataset in the loop. The pass runs in the following stages:
- Source capture — the lab's homepage and its declared AI-stance / responsibility / safety page (where one exists) are fetched live. The exact URLs read are recorded in each report's sources_audited.
- Lens pass — the eight GoldBerry lenses are applied to the captured text; each returns findings, gaps, evidence quotes, a justification, and a 1–10 score per the rubric above.
- Suffixscape — the prose is scanned for the four evasion patterns; each flag carries the exact quote and a preservative alternative.
- PALS — Completeness, Multiplicity, and Responsibility are scored; the composite is their mean.
- Validation — the report is checked against a fixed schema (eight named lenses, three PALS dimensions, score ranges) before it is allowed to publish.
- Publication — the report is emitted as machine-readable JSON and rolled into the leaderboard. The roster is re-audited on a weekly cadence; where a source cannot be fetched, the report says so and its confidence is lowered.
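The validation stage can be pictured as a check like the one below. It is a simplified sketch, not the published schema: the `lenses` and `pals` keys and the flat name-to-score layout are assumptions (real lens entries also carry findings, gaps, quotes, and a justification). It checks only the three things the text names: eight named lenses, three PALS dimensions, and score ranges.

```python
LENSES = [
    "Indigenous Knowledge", "Deep History", "Cross-Cultural Wisdom",
    "Scientific Evidence", "Artistic Perception", "Future Modelling",
    "Marginalised Voices", "Trickster Knowledge",
]
DIMENSIONS = ["Completeness", "Multiplicity", "Responsibility"]


def _in_range(value) -> bool:
    """True for a numeric score between 1 and 10 inclusive."""
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    return is_number and 1 <= value <= 10


def validate(report: dict) -> list[str]:
    """Return a list of problems; an empty list means the report may publish."""
    problems = []
    lenses = report.get("lenses", {})  # hypothetical key: lens name -> score
    pals = report.get("pals", {})      # hypothetical key: dimension -> score
    if set(lenses) != set(LENSES):
        problems.append("report must contain exactly the eight named lenses")
    if set(pals) != set(DIMENSIONS):
        problems.append("report must contain exactly the three PALS dimensions")
    for name, score in {**lenses, **pals}.items():
        if not _in_range(score):
            problems.append(f"{name}: score {score!r} is outside 1-10")
    return problems


report = {
    "lenses": {name: 5 for name in LENSES},
    "pals": {name: 5.0 for name in DIMENSIONS},
}
print(validate(report))  # []
```

Returning a list of problems rather than raising on the first one lets a failed report say everything that is wrong with it in a single pass.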
What we do
- Stick to public information only
- Frame findings as qualitative analysis
- Offer right of reply to audited labs
- Publish methodology transparently
- Update weekly to track changes over time
What we don't do
- Make claims about internal systems
- Assert objective truth
- Replace lived experience
- Engage in bad faith
Limitations
- Language bias — primary analysis in English
- Cultural positionality — framework rooted in Western critical humanities
- Temporal lag — weekly audits may miss rapid changes
- LLM limitations — the audit engine itself is an LLM
- Selection bias — the labs list reflects visibility, not necessarily impact
Transparency
Every audit is published as machine-readable JSON, one document per lab, served at /stancewatch/api/labs/<slug>.json (for example, …/api/labs/anthropic.json). Each document carries the per-lens findings, evidence quotes, Suffixscape flags, PALS scores, the sources actually read, and a confidence note. Because the prompt, the eight lenses, the Suffixscape patterns, and the scoring rubric on this page are all public, any audit can be reproduced and contested from the same inputs.
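A reader who wants to inspect a report programmatically might start from something like the sketch below. The helper names are hypothetical, and the host is taken from the citation URL given later on this page; the internal layout of the JSON is not specified here, so the sketch returns the parsed document without assuming any keys.

```python
import json
from urllib.request import urlopen

# Base path as given on this page; host taken from the page's citation URL.
BASE = "https://cogniosynthesisportal.uk/stancewatch/api/labs"


def report_url(slug: str) -> str:
    """Build the URL of one lab's machine-readable audit report."""
    return f"{BASE}/{slug}.json"


def fetch_report(slug: str, timeout: float = 10.0) -> dict:
    """Download and parse a lab's audit report (performs a network request)."""
    with urlopen(report_url(slug), timeout=timeout) as response:
        return json.load(response)


print(report_url("anthropic"))
```

Calling `fetch_report("anthropic")` would return the parsed document, from which the lens findings and PALS scores can be compared against the rubric on this page.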
Right of reply
Any audited lab may request a correction or publish a response. We audit public communications only and make no claim about internal systems — if a lens reads your stance as absent because the relevant page was not discoverable, point us to it and the next audit will read it. Contact hello@cognioengine.co.uk.
"The framework points toward the room. The room is not in the framework."
How to cite
StanceWatch reports are released under CC BY-SA 4.0 as derived analytical work. Attribute to “StanceWatch (GoldBerry)” and link back to this site.
Cite the methodology / leaderboard:
StanceWatch (GoldBerry). (2026). StanceWatch: A Preservative Epistemic Audit of AI Labs. Methodology v1.0 (GoldBerry v1.3 / StanceWatch v1.0). Cogniosynthesis Portal. https://cogniosynthesisportal.uk/stancewatch/ (accessed YYYY-MM-DD).
BibTeX:
@misc{stancewatch2026,
title = {StanceWatch: A Preservative Epistemic Audit of AI Labs},
author = {{StanceWatch (GoldBerry)}},
year = {2026},
note = {Methodology v1.0; GoldBerry v1.3 / StanceWatch v1.0},
url = {https://cogniosynthesisportal.uk/stancewatch/}
}
To cite a single lab's audit, link its machine-readable report — e.g. https://cogniosynthesisportal.uk/stancewatch/api/labs/<slug>.json — and quote its audit_date and auditor_version.
Version history
- 2026-06-08 — Methodology v1.0 published as the citable record: scoring rubric, audit-pipeline description, and citation block added. First full audit of all 51 labs.
- 2026-04-25 — Trickster Knowledge added to the GoldBerry canon as the eighth lens, to audit the framework's own seriousness.
Found an error?
Spotted a misread, or want to propose a rigorously argued lens extension? Write to hello@cognioengine.co.uk.