Methodology

StanceWatch audits AI labs using the GoldBerry epistemic-audit framework. This page is the citable methodology of record — it documents the lenses, the scoring rubric, the audit pipeline, the limitations, and how to cite the work.

Methodology v1.0 · GoldBerry v1.3 / StanceWatch v1.0 · revised 2026-06-08

The problem

In 2026, AI labs release increasingly powerful models, but their public communications often:

"A response can be factually correct and epistemically impoverished."
— GoldBerry

How to read the leaderboard

The leaderboard is the primary surface. Every column is sortable. The columns are not equivalent — they answer different questions about a lab's public stance.

Column dictionary

What the leaderboard does not show

Worked example (stub data)

Imagine a hypothetical "Lab X" with composite 5.8, Completeness 7.2, Multiplicity 3.1, Responsibility 7.1. The lens sparkline would show tall bars on Scientific Evidence, Future Modelling, and Deep History, and short bars on Indigenous, Cross-Cultural, Marginalised, and Trickster. The read: this lab takes scientific rigour and forward-looking risk modelling seriously, and acknowledges its history, but treats one epistemology as the only epistemology, leaves marginalised voices out, and avoids the disciplined disruption Trickster names. The Suffixscape flag count might be 4, likely "state-of-the-art" appearing without measurement plus passive-voice agency-diffusion in safety claims. None of that is "wrong"; it is incomplete, which is what PALS measures.

Our approach

We audit each lab through eight lenses from the GoldBerry framework, plus Suffixscape linguistic diagnostics to detect grammatical evasion.

The eight lenses

  1. Indigenous Knowledge — Whose embodied knowledge is missing?
  2. Deep History — What historical processes shaped this approach?
  3. Cross-Cultural Wisdom — Which perspectives are flattened?
  4. Scientific Evidence — What claims are verified? What limits acknowledged?
  5. Artistic Perception — What modes of attention are invited or foreclosed?
  6. Future Modelling — Whose futures are centred? Who decides?
  7. Marginalised Voices — Who is not at the table?
  8. Trickster Knowledge — What truth appears when the official story is inverted, mocked, contradicted, or followed to its absurd edge? Disciplined disruption, not cynicism. Added to the GoldBerry canon on 2026-04-25 to audit the framework's own seriousness.

Suffixscape diagnostics

We analyse public communications for:

Suffixscape vs. the CognioNews "-scape" format

Suffixscape here is a linguistic-diagnostics layer — regex- and LLM-detected patterns of evasion in a lab's own prose. It is not the same as the CognioNews -scape deep-dive synthesis format (TariffScape, ClimateScape, AlignmentScape, and so on), which is a long-form editorial product mapping a topic across the seven Cogniosynthesis dimensions. The two are sibling concepts operating at different layers: Suffixscape audits language; the CognioNews -scapes audit framing across a corpus of stories. See cognionews.com/reports for the editorial examples.
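
As a rough illustration of the regex half of this layer, the sketch below flags two patterns mentioned in the worked example on this page: a "state-of-the-art" claim with no number attached, and a passive construction with no named agent. The pattern names, the regexes, the `Flag` structure, and the `scan` function are all assumptions made for illustration. They are not the published Suffixscape pattern set, the LLM-based detection is not shown, and the preservative alternative that each real flag carries is omitted.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for illustration only; not the published Suffixscape set.
PATTERNS = {
    # "state-of-the-art" anywhere in a sentence (digits are checked separately).
    "unmeasured_superlative": re.compile(r"\bstate-of-the-art\b", re.IGNORECASE),
    # Crude passive: a form of "to be" plus a word ending in -ed,
    # not followed by "by <agent>".
    "agentless_passive": re.compile(
        r"\b(?:is|are|was|were|been|being|be)\s+\w+ed\b(?!\s+by\b)",
        re.IGNORECASE,
    ),
}


@dataclass
class Flag:
    pattern: str  # hypothetical pattern name
    quote: str    # the exact sentence that triggered the flag


def scan(text: str) -> list[Flag]:
    """Return one flag per (sentence, pattern) match, in reading order."""
    flags = []
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        for name, rx in PATTERNS.items():
            if not rx.search(sentence):
                continue
            # A superlative accompanied by any number counts as "measured" here.
            if name == "unmeasured_superlative" and re.search(r"\d", sentence):
                continue
            flags.append(Flag(name, sentence))
    return flags


sample = (
    "Our model is state-of-the-art. Safety risks are mitigated. "
    "Errors were reviewed by our red team."
)
for flag in scan(sample):
    print(flag.pattern, "->", flag.quote)
```

Regexes this simple will miss irregular participles and will over-flag adjectives ending in "-ed", which is one reason a regex pass on its own would not be enough.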

PALS scoring

Each lab receives a PALS score (Preservative Audit Lens Scores): a 1–10 figure on each of three dimensions (Completeness, Multiplicity, and Responsibility) plus a composite mean.

Why PALS, not "CMR"?

The Qwen-generated scaffold called this metric "CMR" (Completeness / Multiplicity / Responsibility). That collides at the publication layer with ACST's cmrScore — the Cogniosynthetic Misrepresentation Rating, a 0–7 integer for omission depth in news stories, published at cognionews.com/vocab/acst. Same acronym, different layer, different scale, different metric.

To keep StanceWatch outputs unambiguously distinguishable from ACST outputs in any future cross-citation, the lab-level metric here is PALS (Preservative Audit Lens Scores). The three-dimension structure is unchanged; only the label is. ACST's cmrScore continues to mean what it means at the CognioNews story layer.

Important

PALS scores are qualitative judgments, not validated metrics. They are heuristic indicators, not definitive rankings.

Scoring rubric

Every lens and every PALS dimension is scored on the same 1–10 band scale. The bands describe what is discoverable in the lab's public communications — not the lab's internal reality. The same anchors are applied to each of the eight lenses and to Completeness, Multiplicity, and Responsibility.

The composite PALS is the arithmetic mean of Completeness, Multiplicity, and Responsibility, reported to one decimal place. Because the input is qualitative, treat differences within a single band (roughly ±1.0) as noise; a gap of a full band or more (≥2.0) is the smallest difference worth reading as meaningful.
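
The arithmetic in the paragraph above can be sketched as follows. The function names and the `MEANINGFUL_GAP` constant are illustrative assumptions, not part of any published StanceWatch code; the 2.0 threshold and the one-decimal rounding come from the text. The sample values are the hypothetical "Lab X" figures from the worked example.

```python
from statistics import mean

# From the text: a gap of a full band or more (>= 2.0) is the smallest
# difference worth reading as meaningful.
MEANINGFUL_GAP = 2.0


def composite_pals(completeness: float, multiplicity: float,
                   responsibility: float) -> float:
    """Arithmetic mean of the three PALS dimensions, to one decimal place."""
    scores = (completeness, multiplicity, responsibility)
    for score in scores:
        if not 1 <= score <= 10:
            raise ValueError(f"score {score} is outside the 1-10 scale")
    return round(mean(scores), 1)


def is_meaningful_gap(a: float, b: float) -> bool:
    """True when two scores differ by at least a full band."""
    # Round first: scores are reported to one decimal, and this avoids
    # floating-point noise right at the threshold.
    return round(abs(a - b), 1) >= MEANINGFUL_GAP


print(composite_pals(7.2, 3.1, 7.1))   # 5.8
print(is_meaningful_gap(5.8, 5.1))     # False: inside the noise range
print(is_meaningful_gap(7.2, 3.1))     # True: more than a full band apart
```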

How a score is produced

Each audit is a single, reproducible pass. The audit is performed by a language-model agent reading the public GoldBerry audit prompt against the captured sources — so the same prompt, rubric, and sources reproduce the same shape of judgment. There is no private key, model, or dataset in the loop:

  1. Source capture — the lab's homepage and its declared AI-stance / responsibility / safety page (where one exists) are fetched live. The exact URLs read are recorded in each report's sources_audited.
  2. Lens pass — the eight GoldBerry lenses are applied to the captured text; each returns findings, gaps, evidence quotes, a justification, and a 1–10 score per the rubric above.
  3. Suffixscape — the prose is scanned for the four evasion patterns; each flag carries the exact quote and a preservative alternative.
  4. PALS — Completeness, Multiplicity, and Responsibility are scored; the composite is their mean.
  5. Validation — the report is checked against a fixed schema (eight named lenses, three PALS dimensions, score ranges) before it is allowed to publish.
  6. Publication — the report is emitted as machine-readable JSON and rolled into the leaderboard. The roster is re-audited on a weekly cadence; where a source cannot be fetched, the report says so and its confidence is lowered.
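
The validation step (step 5) could look roughly like the sketch below. The report layout used here (a `lenses` mapping keyed by lens name with a `score` field, and a `pals` mapping with lowercase dimension keys) is an assumption for illustration; the actual schema is whatever the published JSON reports use. Only the three checks named in the step are covered: eight named lenses, three PALS dimensions, and score ranges.

```python
# Lens names as listed on this page; the key spelling in real reports may differ.
LENSES = (
    "Indigenous Knowledge", "Deep History", "Cross-Cultural Wisdom",
    "Scientific Evidence", "Artistic Perception", "Future Modelling",
    "Marginalised Voices", "Trickster Knowledge",
)
# Hypothetical key names for the three PALS dimensions.
PALS_DIMENSIONS = ("completeness", "multiplicity", "responsibility")


def _valid_score(value) -> bool:
    """A score must be a number (not a bool) on the 1-10 scale."""
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    return is_number and 1 <= value <= 10


def validate_report(report: dict) -> list[str]:
    """Return a list of problems; an empty list means the report may publish."""
    errors = []
    lenses = report.get("lenses", {})
    for name in LENSES:
        entry = lenses.get(name)
        if entry is None:
            errors.append(f"missing lens: {name}")
            continue
        score = entry.get("score") if isinstance(entry, dict) else None
        if not _valid_score(score):
            errors.append(f"lens score out of range: {name}")
    for extra in sorted(set(lenses) - set(LENSES)):
        errors.append(f"unexpected lens: {extra}")

    pals = report.get("pals", {})
    for dim in PALS_DIMENSIONS:
        if not _valid_score(pals.get(dim)):
            errors.append(f"missing or out-of-range PALS dimension: {dim}")
    return errors
```

Returning a list of problems rather than raising on the first one lets a failed report show every schema violation at once.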

What we do

What we don't do

Limitations

Transparency

Every audit is published as machine-readable JSON, one document per lab, served at /stancewatch/api/labs/<slug>.json (for example, …/api/labs/anthropic.json). Each document carries the per-lens findings, evidence quotes, Suffixscape flags, PALS scores, the sources actually read, and a confidence note. Because the prompt, the eight lenses, the Suffixscape patterns, and the scoring rubric on this page are all public, any audit can be reproduced and contested from the same inputs.
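
A minimal sketch of retrieving one lab's report, using only the Python standard library. The host is taken from the citation example on this page; `report_url` and `fetch_report` are hypothetical helper names, and nothing is assumed about the report's keys beyond those named on this page (`sources_audited`, `audit_date`, `auditor_version`).

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Host and path as given in the citation example on this page.
BASE_URL = "https://cogniosynthesisportal.uk/stancewatch/api/labs"


def report_url(slug: str) -> str:
    """Build the report URL for a lab slug, e.g. 'anthropic'."""
    return f"{BASE_URL}/{quote(slug, safe='')}.json"


def fetch_report(slug: str, timeout: float = 10.0) -> dict:
    """Download and parse one lab's machine-readable audit."""
    with urlopen(report_url(slug), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    report = fetch_report("anthropic")
    # .get() is used because the exact schema is not specified here.
    print(report.get("audit_date"), report.get("auditor_version"))
    print(report.get("sources_audited"))
```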

Right of reply

Any audited lab may request a correction or publish a response. We audit public communications only and make no claim about internal systems — if a lens reads your stance as absent because the relevant page was not discoverable, point us to it and the next audit will read it. Contact hello@cognioengine.co.uk.

"The framework points toward the room. The room is not in the framework."
— GoldBerry

How to cite

StanceWatch reports are released under CC BY-SA 4.0 as derived analytical work. Attribute to “StanceWatch (GoldBerry)” and link back to this site.

Cite the methodology / leaderboard:

StanceWatch (GoldBerry). (2026). StanceWatch: A Preservative Epistemic Audit of AI Labs. Methodology v1.0 (GoldBerry v1.3 / StanceWatch v1.0). Cogniosynthesis Portal. https://cogniosynthesisportal.uk/stancewatch/ (accessed YYYY-MM-DD).

BibTeX:

@misc{stancewatch2026,
  title  = {StanceWatch: A Preservative Epistemic Audit of AI Labs},
  author = {{StanceWatch (GoldBerry)}},
  year   = {2026},
  note   = {Methodology v1.0; GoldBerry v1.3 / StanceWatch v1.0},
  url    = {https://cogniosynthesisportal.uk/stancewatch/}
}

To cite a single lab's audit, link its machine-readable report — e.g. https://cogniosynthesisportal.uk/stancewatch/api/labs/<slug>.json — and quote its audit_date and auditor_version.

Version history

Found an error?

Spotted a misread, or want to propose a rigorously argued lens extension? Write to hello@cognioengine.co.uk.