Cohere For AI (Research)

PALS scores

Preservative dimensions

PALS composite

5.7

Mean of three dimensions, 1–10.

Completeness

6.0

Sources, limits, transparency.

Multiplicity

7.0

Epistemologies, languages, voices.

Responsibility

4.0

Accountability, refusal, governance.
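The composite can be checked by hand from the three dimension scores above. A minimal sketch, assuming the composite is an unweighted arithmetic mean rounded to one decimal place; the page says only "mean of three dimensions", so the equal weighting, the rounding rule, and the names `DIMENSIONS` and `pals_composite` are illustrative rather than taken from the audit tooling.

```python
from statistics import mean

# Dimension scores as published on this page.
DIMENSIONS = {
    "Completeness": 6.0,
    "Multiplicity": 7.0,
    "Responsibility": 4.0,
}


def pals_composite(scores: dict[str, float]) -> float:
    """Unweighted mean of the dimension scores, rounded to one decimal.

    Assumption: equal weights and one-decimal rounding are inferred from
    the published 5.7, not documented by the auditor.
    """
    return round(mean(scores.values()), 1)


print(pals_composite(DIMENSIONS))  # (6.0 + 7.0 + 4.0) / 3 = 5.67 -> 5.7
```

Under these assumptions the low Responsibility score pulls the composite down by more than a full point relative to the other two dimensions alone, whose mean is 6.5.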

Eight lenses

What's missing, by lens

Each lens carries a canonical question and corrects a specific epistemic failure. Score, findings, and gaps land once the audit runs.

Lens 01

Indigenous Knowledge

Whose knowledge is missing?

4/10

Findings (2)

The Aya initiative explicitly targets 'more than 50 previously underserved languages' and supports 'low-resource languages', which structurally creates space for language communities typically excluded from NLP, some of which carry Indigenous and oral knowledge.
Region-tuned variants (Tiny Aya Earth for African and West Asian languages, Tiny Aya Fire for South Asian languages) acknowledge that linguistic geography is uneven and that one model does not fit all communities.

Gaps (4)

No mention of the CARE Principles for Indigenous Data Governance (Collective benefit, Authority to control, Responsibility, Ethics) anywhere in the visible text.
No reference to Indigenous data sovereignty, community ownership of language corpora, or the right of a language community to withdraw its data.
Oral traditions and non-textual knowledge are not addressed; the framing is text/instruction-tuning corpora, which privileges written language and can be extractive toward oral-first cultures.
'Underserved' framing positions communities as recipients of AI rather than as authorities over their own knowledge.

Justification

Scored above the typical floor because the entire Aya program is oriented toward languages that mainstream labs ignore, and 3,000+ distributed contributors is materially more participatory than a Western core lab. But low-resource-language inclusion is not the same as Indigenous data sovereignty: there is no CARE language, no community-authority or withdrawal mechanism, and an oral-knowledge blind spot. A 4 reflects genuine structural openness without the governance scaffolding that would make it non-extractive.

Lens 02

Deep History

What historical process produced this?

3/10

Findings (2)

The mission statement names the politics of knowledge production directly: 'Changing where, how, and by whom breakthroughs happen', implicitly acknowledging that AI research has been geographically and demographically concentrated.
'150+ countries' and 'free... API access through Catalyst Grants' acknowledge unequal historical access to compute and tooling as a problem worth correcting.

Gaps (4)

No explicit acknowledgment of colonial data-extraction legacies, despite working precisely with the languages of formerly colonised regions.
No discussion of the geopolitical economy of GPUs, compute access, or the global division of AI labor (data annotation, RLHF labor).
No transparency about regulatory or commercial constraints arising from the lab's commercial parent, Cohere.
Historical contingency is replaced by a smooth 'progress' narrative ('drive innovation', 'advance').

Justification

Cohere Labs gestures at the political history of who gets to do AI research (its founding premise), which is more than most labs. But it stops at 'changing who shapes it' as aspiration without naming the colonial and economic histories that produced the exclusion, and without disclosing the compute/labor economy underneath its own multilingual work. A 3 credits the framing while marking the absent historical reckoning.

Lens 03

Cross-Cultural Wisdom

Which perspectives have been flattened?

6/10

Findings (4)

One of the lab's strongest lenses. The core products (Aya, Global MMLU) exist specifically to resist the flattening of non-Western languages into English-centric models.
Global MMLU is described as a 'human-verified, massively multilingual benchmark', meaning cultural translation is checked by people rather than auto-translated, which directly addresses the 'flattened away' risk.
Region-specialized models (Earth/Fire/Water/Global) preserve culturally specific linguistic structure rather than collapsing everything into one averaged model.
3,000+ researchers from multiple countries contributed, embedding distributed cultural perspective into the data pipeline itself.

Gaps (3)

Multilingual coverage is largely framed at the language/token level; there is no explicit claim to preserve culturally specific *reasoning patterns* (e.g., non-Western argumentation, relational logics).
No mention of consultation with cultural scholars, linguists, or anthropologists as distinct from ML researchers.
Western categorical logic (benchmarks, MMLU-style multiple choice) is the evaluation frame imported across all cultures, which can itself flatten how 'knowledge' is defined.

Justification

Genuinely differentiated and high for an AI lab: the human-verified multilingual benchmark and 101-language scope are concrete, not token gestures, and the distributed contributor base is real cross-cultural participation. Held below 7 because the evaluation paradigm itself remains Western/benchmark-shaped and there is no claim to preserve culturally specific reasoning beyond language coverage, nor any named scholarly consultation.

Lens 04

Scientific Evidence

What does the evidence show, and what are its limits?

7/10

Findings (4)

Models AND datasets are openly licensed: 'Models and datasets are openly licensed, fostering transparency and enabling researchers to audit and extend work responsibly' — this is the strongest verifiability commitment in the audit and directly enables third-party replication.
Peer-reviewed external validation: ACL 2024 Best Paper Award for the Aya Model paper, and Stanford HAI recognition of the Aya Dataset paper — independent scrutiny rather than self-assertion.
Open weights for multiple model families (Aya 101, Aya Expanse, Aya Vision, Tiny Aya) allow bias and capability auditing.
Global MMLU as a public, human-verified benchmark provides an external evaluation substrate.

Gaps (3)

No disclosure of independent third-party audits of training data for bias, toxicity, or PII in the visible text.
Known-limitation disclosures are absent from these pages (e.g., where do low-resource languages still underperform, what failure modes exist).
'Equitable Performance' and 'balanced results' are asserted without quantified per-language disparity figures on this page.

Justification

The highest score in this audit. Open weights plus openly licensed datasets plus peer-reviewed external validation is exactly what the scientific-evidence lens rewards: it makes claims falsifiable by outsiders. Held at 7 rather than higher because the marketing pages assert 'balanced'/'equitable' performance without surfacing limitation disclosures or per-language disparity numbers, and no explicit independent bias audit is named.

Lens 05

Artistic Perception

What does this feel like, not just mean?

2/10

Findings (2)

The poetic mission lines 'unlock AI's potential, one language at a time' and 'bridging gaps between people and cultures' gesture, lightly, at an affective register beyond pure metrics.
Naming model variants Earth / Fire / Water carries a faint evocative, non-technical sensibility.

Gaps (4)

No acknowledgment of affective or intuitive dimensions of language and translation; language is treated as a benchmarkable capability, not a felt, embodied medium.
No space for ambiguity or poetic uncertainty — the register is confident and capability-driven ('strongest', 'balanced', 'high-quality').
No recognition of the emotional labor of the 3,000+ community contributors who annotated and verified data.
Modes of attention are framed entirely around efficiency and performance ('lean designs', 'minimize computational costs').

Justification

Standard low score for an ML research lab. The elemental naming and the 'one language at a time' line are the only affective texture; everything else is performance and efficiency framing. The emotional labor of thousands of community annotators — central to this very project — goes entirely unacknowledged as labor or as feeling. A 2 credits the faint poetic gesture without overstating it.

Lens 06

Future Modelling

Where is this heading, and for whom?

4/10

Findings (3)

Explicit responsible-AI principle: 'Advancing AI safety to ensure ML innovation aligns with societal values' is stated as one of four foundational values.
Resource efficiency is framed partly as an environmental/cost concern: 'lean designs to minimize computational costs' — Tiny Aya / phone-ready models reduce compute footprint and widen who can run the models.
Distributed, multi-country participation models a more inclusive deliberation about whose futures AI serves ('makes AI accessible and beneficial worldwide').

Gaps (4)

No engagement with labor displacement risks from the models or from the annotation economy.
Environmental cost is framed as efficiency/cost savings, not as disclosed emissions or absolute environmental impact.
No mention of democratic governance of agentic systems or of any deliberative body that decides Aya's direction.
'Aligns with societal values' is unspecified — whose values, decided how, is left open.

Justification

Phone-ready, compute-lean models and a stated safety principle push this above the floor — accessibility on low-end hardware genuinely shapes whose future is included. But the safety value is a one-liner with undefined 'societal values', there is no emissions disclosure, no labor-displacement engagement, and no governance mechanism for the program's direction. A 4 reflects real inclusion-of-access without future-governance substance.

Lens 07

Marginalised Voices

Who is not at the table?

6/10

Findings (3)

Among the strongest in the audit: an 'Open Science Community' of '4,500 members across 150+ countries', with '3,000+ researchers' contributing, means Global South researchers are structurally present in the pipeline, not merely consulted.
Catalyst Grants provide 'free... API access' to 'academic partners, civic institutions and impact focused organizations', a subsidised access channel for under-resourced actors.
Explicit no-cost commitment: 'We do not charge for participating in any of our programs', lowering the barrier for participants without funding.

Gaps (4)

No mention of disability-community accessibility or accessibility standards for the models/tooling.
No mention of labor-representative engagement or of the working conditions/compensation of data annotators (free participation can also mean unpaid labor).
Participation is open-community based, but there is no described mechanism for those communities to set priorities or veto directions — voice without governance power.
'Impact focused organizations' is undefined; selection criteria for who gets Catalyst Grants are not disclosed.

Justification

Concretely participatory in a way most labs are not: thousands of distributed contributors across 150+ countries and a no-charge access program are real inclusion of Global South developers. Held at 6 because inclusion is at the contributor/access layer, not the governance layer (no priority-setting power, no disclosed selection criteria), disability accessibility is absent, and the labor terms of 'free participation' are unexamined.

Lens 08

Trickster Knowledge

What truth appears when the story is inverted?

2/10

Findings (2)

There is a mild self-implicating inversion baked into the mission: a lab attached to a commercial AI company foregrounding 'changing who shapes it' quietly admits the field (including itself) has been narrow — a small structural self-critique.
The 'Tiny Aya' branding gently inverts the bigger-is-better arms-race orthodoxy that polished AI marketing usually defends.

Gaps (4)

No willingness to name the central contradiction: an 'open science' / equity mission housed inside a commercial entity (Cohere) that monetises proprietary models — the tension is smoothed over, not surfaced.
No irony, satire, or paradox deployed as a disciplined instrument; the register is uniformly earnest and promotional.
No space where the official narrative is tested by its opposite (e.g., 'what does open-washing risk look like for us?').
The lab's own seriousness is treated as exempt from audit; no acknowledged failure modes or self-mockery.

Justification

As with nearly all lab communications, the trickster register is near-absent: the copy is earnest and self-affirming. The faint credit is for the implicit self-critique in 'changing who shapes it' and the anti-scale 'Tiny Aya' gesture, but the glaring contradiction — equity-and-openness mission inside a commercial proprietary-model company — is never named or interrogated. A 2 reflects a flicker of structural self-awareness with no disciplined inversion.

Suffixscape

Linguistic diagnostics

Regex- and LLM-detected patterns of evasion in the lab's own prose: nominalised evasion, agency diffusion, epistemic inflation, temporal flatness. Distinct from the CognioNews -scape editorial format — see methodology.

Pattern	Quote	Effect	Preservative alternative
`epistemic inflation`	"High-quality, diverse datasets deliver balanced results across high- and low-resource languages"	Unverified superlatives ('high-quality', 'balanced') assert equitable performance as settled fact without per-language disparity numbers or limitation disclosure, inviting readers to accept parity that the page never evidences.	State the measured gap: e.g., 'On Global MMLU, low-resource languages reached X% of high-resource accuracy; remaining disparities and their causes are documented in [paper].'
`nominalised evasion`	"Advancing AI safety to ensure ML innovation aligns with societal values"	'AI safety', 'innovation' and 'societal values' are nominalised abstractions with no actor: who advances safety, whose societal values, and decided through what process all disappear into noun phrases.	Name the actors and process: 'Our researchers test models against [specific harms] and our [named body], including community representatives, decides which value conflicts to prioritise.'
`agency diffusion`	"Models and datasets are openly licensed, fostering transparency and enabling researchers to audit and extend work responsibly"	Passive construction ('are openly licensed') and an inanimate subject doing the 'fostering' obscure who chose the license, who can revoke it, and who bears responsibility if the open data is misused.	'We license [model X] under [license name]; this lets any researcher reproduce our results. We retain/disclaim the following responsibilities for downstream use: [...].'
`temporal flatness`	"Changing where, how, and by whom breakthroughs happen"	A clean forward-looking 'changing' narrative erases the contingent colonial and economic histories that produced the current concentration of AI research, presenting redistribution as a frictionless project rather than a contested one.	'AI research concentrated in a few wealthy regions because of [historical/economic causes]; we are attempting to shift that, against [named constraints], and here is where we have and have not succeeded.'
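The regex side of this detection can be illustrated with a small sketch. The patterns below are hypothetical and deliberately crude; they are not the auditor's actual rules, which this page does not publish, and the names `PATTERNS` and `flag` are assumptions. They cover only two of the four pattern types, since nominalised evasion and temporal flatness are harder to catch without an LLM pass.

```python
import re

# Hypothetical patterns for illustration only; not the auditor's real rules.
PATTERNS = {
    # Evaluative adjectives that assert quality without a figure attached.
    "epistemic inflation": re.compile(
        r"\b(high-quality|balanced|strongest|equitable)\b", re.IGNORECASE
    ),
    # "is/are/was/were" + optional adverb + "-ed" participle, with no
    # "by <agent>" following: a rough proxy for an agentless passive.
    "agency diffusion": re.compile(
        r"\b(?:is|are|was|were)\s+(?:\w+ly\s+)?\w+ed\b(?!\s+by\b)",
        re.IGNORECASE,
    ),
}


def flag(sentence: str) -> dict[str, list[str]]:
    """Return the matched text for every pattern that fires on a sentence."""
    hits: dict[str, list[str]] = {}
    for name, pattern in PATTERNS.items():
        found = [m.group(0) for m in pattern.finditer(sentence)]
        if found:
            hits[name] = found
    return hits


quote = "Models and datasets are openly licensed, fostering transparency"
print(flag(quote))  # {'agency diffusion': ['are openly licensed']}
```

A matcher this simple will produce false positives (any agentless passive fires, evasive or not), which is presumably why the page describes the diagnostics as both regex- and LLM-detected.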

Audit history

Prior audits

Latest audit: 2026-06-08 · sources: https://cohere.com/research, https://cohere.com/research/aya

Transparency

Raw data

Every audit is published as machine-readable JSON. You can read this lab's latest report at /stancewatch/api/labs/cohere-for-ai.json — it carries the per-lens findings, evidence quotes, Suffixscape flags, PALS scores, the sources actually read, and a confidence note.
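As a rough illustration of working with such a report, the sketch below parses an inline JSON sample and lists the weakest lenses. The schema is not documented on this page, so every key name (`audit_date`, `pals`, `lenses`, `name`, `score`) and the helper `lenses_at_or_below` are assumptions; only the values are taken from this audit. Check the real file before relying on any of these field names.

```python
import json

# Hypothetical excerpt: key names are assumptions for illustration; the
# values are the scores shown on this page.
SAMPLE = """
{
  "audit_date": "2026-06-08",
  "pals": {"completeness": 6.0, "multiplicity": 7.0, "responsibility": 4.0},
  "lenses": [
    {"name": "Indigenous Knowledge", "score": 4},
    {"name": "Deep History", "score": 3},
    {"name": "Cross-Cultural Wisdom", "score": 6},
    {"name": "Scientific Evidence", "score": 7},
    {"name": "Artistic Perception", "score": 2},
    {"name": "Future Modelling", "score": 4},
    {"name": "Marginalised Voices", "score": 6},
    {"name": "Trickster Knowledge", "score": 2}
  ]
}
"""


def lenses_at_or_below(report: dict, threshold: int) -> list[str]:
    """Names of lenses scoring at or below the threshold, lowest first."""
    low = [lens for lens in report["lenses"] if lens["score"] <= threshold]
    # sorted() is stable, so ties keep their order in the report.
    return [lens["name"] for lens in sorted(low, key=lambda lens: lens["score"])]


report = json.loads(SAMPLE)
print(lenses_at_or_below(report, 3))
# ['Artistic Perception', 'Trickster Knowledge', 'Deep History']
```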

Found an error, or a stance page we missed? We audit public communications only — point us to the page and the next audit will read it. Write to hello@cognioengine.co.uk.

Audit date: 2026-06-08

Moderate-to-good confidence. Both target URLs (cohere.com/research and cohere.com/research/aya) were successfully fetched and yielded rich, quotable text; stance_url was null. Note that cohere.com/research now renders under the 'Cohere Labs' brand, the current name for Cohere For AI / C4AI, so this audit reflects the open research arm (Aya, Global MMLU, Open Science Community) and is distinct from the commercial Cohere product entry. Scores are qualitative editorial judgments under the GoldBerry preservative-AI lens, not validated metrics; quotes are drawn from rendered page text and lightly normalised.

Auditor: GoldBerry v1.3 / StanceWatch v1.0