
Stanford HAI / CRFM

AI policy · evaluation · foundation model research · ethics

Academic; produces influential benchmarks and policy work.

PALS scores

Preservative dimensions

PALS composite: 4.7 (mean of the three dimensions, scale 1–10)
Completeness: 6.0 (sources, limits, transparency)
Multiplicity: 3.0 (epistemologies, languages, voices)
Responsibility: 5.0 (accountability, refusal, governance)
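The composite is described as the mean of the three dimension scores. As a quick arithmetic check, the minimal Python sketch below reproduces it. It assumes an unweighted arithmetic mean rounded to one decimal place; the page does not state the rounding rule, so that part is inferred from the displayed 4.7.

```python
from statistics import mean

# Dimension scores as listed on this page (each on a 1–10 scale).
dimensions = {
    "Completeness": 6.0,
    "Multiplicity": 3.0,
    "Responsibility": 5.0,
}

# Assumption: composite = unweighted mean, rounded to one decimal place.
# (6.0 + 3.0 + 5.0) / 3 = 4.666..., which rounds to 4.7.
composite = round(mean(dimensions.values()), 1)
print(composite)
```

An unweighted mean treats the three dimensions as equally important; if the framework ever weighted them differently, the composite would need a weighted average instead.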
Eight lenses

What's missing, by lens

Each lens poses a canonical question and targets a specific epistemic failure. The scores, findings, and gaps below come from the latest audit.

Lens 01
Indigenous Knowledge
Whose knowledge is missing?
2/10
Findings (2)
  • Generic appeals to AI 'that benefits all of humanity' and to serving 'the collective needs of humanity' frame benefit universally rather than through any specific community's relational knowledge.
  • Affinity groups and civil-society training gesture at community breadth but at the level of institutional outreach, not Indigenous epistemology.
Gaps (3)
  • No mention of Indigenous data sovereignty, CARE Principles, or Tribal data governance anywhere in mission or homepage text.
  • No acknowledgment of oral, embodied, or non-textual knowledge traditions; intelligence is framed around annotated data and language.
  • No reference to extractive data practices or consent frameworks for communities whose data trains foundation models.
Justification

The corpus contains zero engagement with Indigenous knowledge, data sovereignty, or relational epistemologies. 'Cultural conventions' is invoked instrumentally (so systems gain 'broad acceptance'), not as a commitment to sovereignty. Score reflects total absence with only faint universalist gestures.

Lens 02
Deep History
What historical process produced this?
4/10
Findings (3)
  • A 'Major Milestones' timeline (2019–2023) gives institutional history and explicitly names origins: NAIRR proposal, Digital Economy Lab, CRFM launch under Percy Liang, policy boot camps, Biden meeting.
  • The 2026 AI Index framing acknowledges declining data transparency as a structural field-level problem: 'In a field where data transparency is declining, independent, rigorous measurement has never been more critical.'
  • AI + Policy Symposium text situates governance within competing global political philosophies (EU, China, Brazil, Japan, African Union, ASEAN), showing geopolitical awareness.
Gaps (3)
  • No acknowledgment of colonial data-extraction legacies or the labor (annotation, content moderation) underpinning the foundation models CRFM studies.
  • GPU access, compute concentration, and the political economy of who can build foundation models are absent despite CRFM's mandate.
  • Institutional history is told as a clean ascent of milestones with no contingency, failure, or contestation.
Justification

Stronger than most labs on field-level and geopolitical history, and unusually candid about declining transparency. But it omits the deeper histories of extraction and labor that shaped foundation models, and tells its own story as a frictionless ascent. Mid-range.

Lens 03
Cross-Cultural Wisdom
Which perspectives have been flattened?
3/10
Findings (2)
  • Global governance convenings (AI + Policy Symposium 'Global Stocktaking') and an Asia Foundation education partnership signal some international engagement.
  • A featured study audits chatbots and finds 'substantial regional disparity, dependence on distinct information ecosystems,' showing awareness that AI performance is culturally and regionally uneven.
Gaps (3)
  • Entire site is English-only; no multilingual presence or commitment to linguistic diversity.
  • 'Cultural conventions' are treated as a compliance surface for AI acceptance, not as plural reasoning traditions worth preserving.
  • No consultation with cultural scholars or engagement with non-Western epistemologies is named; intelligence is modeled on a single (cognitive-science/neuroscience) frame.
Justification

There is genuine, research-grounded awareness of regional disparity (the chatbot audit) and some global convening, which lifts this above the floor. But culture is framed instrumentally and the institute speaks monolingually from a single epistemic center. Low-mid.

Lens 04
Scientific Evidence
What does the evidence show, and what are its limits?
7/10
Findings (4)
  • The AI Index Report is positioned as 'the most comprehensive analysis of AI's trajectory available' and explicitly champions 'independent, rigorous measurement.'
  • CRFM is a named, dedicated center for studying foundation models, and the homepage foregrounds evaluation work, including a 'Real-Time Audit of Six Commercial Chatbots.'
  • Featured research openly publishes negative/limiting results: 'AI Coding Agents Fail at Teamwork' and 'Two models working together perform worse than one alone, exposing a critical gap.'
  • The HAI/Stanford Data Science merger is explicitly framed as betting 'that academic openness will shape AI's future.'
Gaps (3)
  • Openness is asserted as a value but the corpus does not point to open weights, open training data, or third-party replication protocols for HAI/CRFM's own releases.
  • No mention of independent audits of HAI's own training data or bias; the audit posture is outward (auditing others) more than reflexive.
  • 'Most comprehensive analysis available' is an unverified superlative about its own flagship product.
Justification

This is HAI's clear strength: evaluation, measurement, an institutional commitment to openness, and willingness to publish failure findings. Docked for self-superlatives, and because the openness claim is stated as a value rather than evidenced by open weights or replication protocols for its own work.

Lens 05
Artistic Perception
What does this feel like, not just mean?
4/10
Findings (3)
  • Mission language repeatedly invokes affective and human-interior dimensions: machine intelligence should understand 'human language, emotions, intentions, behaviors, and interactions.'
  • Art is named as a domain AI can enhance: 'better writing, design, healthcare, communication, teaching, and art.'
  • The work-quality framing values making work 'better and more enjoyable,' acknowledging an experiential, not purely efficient, register.
Gaps (3)
  • Emotion and intuition are treated as objects for machines to model, not as auditor stances or modes of attention the institute itself practices.
  • No space for ambiguity, poetic uncertainty, or the affective labor of those building/annotating systems.
  • Aesthetic register of the site is corporate-institutional; pull-quotes are leadership testimonials, not invitations into uncertainty.
Justification

Emotions, enjoyment, and art are explicitly present, which is more than purely utilitarian labs offer, but they are objects to be modeled or domains to be improved rather than felt dimensions the institute sits with. Mid-low.

Lens 06
Future Modelling
Where is this heading, and for whom?
6/10
Findings (4)
  • Labor displacement is engaged head-on and repeatedly: the institute platforms the argument that AI need not mean layoffs and that productivity can come without 'removing labor costs.'
  • Governance of futures is central: policymaker education, the Joint California Summit, and an 'AI governance ecosystem' framing put democratic deliberation in scope.
  • The 2026 AI Index explicitly names a governance lag: 'the frameworks needed to govern, evaluate, and understand this technology are falling behind.'
  • Human-impact research scope includes 'surveillance, population control, and waging war,' and effects on 'labor markets, economic growth, and trade across nations.'
Gaps (3)
  • No environmental or compute/energy cost disclosure anywhere in the corpus despite foundation-model focus.
  • Deliberation is convened among policymakers, industry, and academics; affected publics and Global South futures are not at the table as participants.
  • Agentic-system governance is gestured at but the democratic mechanism for governing them is unspecified.
Justification

One of the stronger lenses: substantive, repeated engagement with labor futures, governance lag, and even surveillance/warfare. Held back by total silence on environmental cost and by an elite (policymaker/industry) rather than participatory model of deliberation.

Lens 07
Marginalised Voices
Who is not at the table?
3/10
Findings (3)
  • Bias and equity are named research objects: studying whether 'algorithms introduce, compound, or mitigate biases and risk,' and a stated aim of 'equitable and trustworthy technology.'
  • Affinity groups, civil-society/nonprofit training, and K-12 programs broaden who is reached by HAI's education.
  • Benefits framed as broadly shared: 'humanity benefits ... and that the benefits are broadly shared.'
Gaps (3)
  • No participatory design with Global South developers; engagement model is to study and to 'guide industry,' not co-create with affected groups.
  • No disability-community accessibility commitment, no labor-representative engagement, no compensated feedback channels.
  • Annotation/data labor — the marginalised workforce behind foundation models — is invisible.
Justification

Equity and bias are research priorities and the education footprint reaches beyond elites, but participation remains representational/study-based rather than co-creative. The labor underpinning the models and specific marginalised constituencies (disability, Global South builders, data workers) are absent. Low-mid.

Lens 08
Trickster Knowledge
What truth appears when the story is inverted?
5/10
Findings (3)
  • HAI shows real appetite for puncturing hype with its own evidence: 'AI Coding Agents Fail at Teamwork' and a real-time audit exposing chatbot 'fragility' invert the industry's success narrative.
  • The AI Index foregrounds a 'widening gap between what AI can do and how prepared we are,' naming a contradiction the field's promotional consensus smooths over.
  • Publishing that two models do worse than one is a disciplined inversion of the more-is-better scaling story.
Gaps (3)
  • The institute's own seriousness is exempt from this inversion: 'human-centered,' 'improve the human condition,' and 'most comprehensive analysis available' are never turned back on themselves.
  • No irony, satire, or paradox is directed inward; the trickster is pointed only at others' systems.
  • Leadership pull-quotes ('historical opportunity and responsibility') are solemn and unironic, with no space allowed for the official narrative to be tested by its opposite.
Justification

There is genuine evidence-driven myth-busting aimed outward, and that earns a mid score. But the trickster never turns on HAI itself: the institute's own framing is treated as exempt from audit, and turning that framing back on itself is precisely the inversion this lens asks for.

Suffixscape

Linguistic diagnostics

Regex- and LLM-detected patterns of evasion in the lab's own prose: nominalised evasion, agency diffusion, epistemic inflation, temporal flatness. These diagnostics are distinct from the CognioNews '-scape' editorial format; see the methodology page.

Pattern: epistemic inflation
  Quote: "This report provides the most comprehensive analysis of AI's trajectory available."
  Effect: An unverified superlative about its own flagship product borrows the authority of measurement to place the work beyond comparison, discouraging the reader from asking what it omits.
  Preservative alternative: State scope and method concretely: 'This report compiles metrics across N domains from public and partner data; coverage gaps include X and Y.'

Pattern: nominalised evasion
  Quote: "shape the development and responsible deployment of AI"
  Effect: 'Development' and 'deployment' as nominalisations hide who develops and deploys, and who bears the consequences, letting HAI claim influence without naming the actors it influences or its leverage over them.
  Preservative alternative: Name the actors and the mechanism: 'we advise specific companies and agencies that build and deploy AI, through these named programs and with this degree of influence.'

Pattern: agency diffusion
  Quote: "AI systems must conform to the often-implicit cultural conventions that underlie human interaction."
  Effect: 'AI systems must conform' makes the system the moral subject and erases the engineers, institutions, and choices that decide which conventions count, diffusing responsibility onto the technology.
  Preservative alternative: Restore the actor: 'we and other builders decide which cultural conventions our systems encode, and we are accountable for those choices.'

Pattern: epistemic inflation
  Quote: "Advancing AI research, education, and policy to improve the human condition."
  Effect: 'Improve the human condition' is an unfalsifiable, maximal claim of benefit that frames the institute's work as inherently good and pre-empts scrutiny of trade-offs and harms.
  Preservative alternative: Bound the claim: 'we aim to improve specific outcomes (named domains) while monitoring and disclosing the harms and trade-offs our work can produce.'

Pattern: temporal flatness
  Quote: "2019: HAI officially launches ... 2023: HAI leaders ... meet with Pres. Biden to discuss American innovation in AI."
  Effect: The milestone timeline renders institutional history as a smooth, inevitable ascent, erasing contestation, course-corrections, and the contingencies (funding, controversy, critique) that actually shaped the institute.
  Preservative alternative: Include friction: note debates the institute lost or revised, criticisms received, and decisions that could have gone otherwise.
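To illustrate what the regex half of these diagnostics might look like, here is a minimal Python sketch for the epistemic-inflation category only. The patterns are hypothetical stand-ins modeled on the two quotes flagged in this section; StanceWatch's actual regexes are not published on this page.

```python
import re

# Hypothetical patterns for "epistemic inflation". These are illustrative
# stand-ins, not StanceWatch's real rules, which this page does not publish.
EPISTEMIC_INFLATION = [
    # Self-superlatives such as "the most comprehensive ..."
    re.compile(r"\bthe (?:most|best|only|first)\s+\w+", re.IGNORECASE),
    # Maximal, unfalsifiable benefit claims
    re.compile(r"\bimprove the human condition\b", re.IGNORECASE),
]


def flag_epistemic_inflation(text: str) -> list[str]:
    """Return matched phrases, grouped in the order the patterns are listed."""
    hits: list[str] = []
    for pattern in EPISTEMIC_INFLATION:
        hits.extend(m.group(0) for m in pattern.finditer(text))
    return hits


print(flag_epistemic_inflation(
    "This report provides the most comprehensive analysis of AI's trajectory available."
))
```

A regex pass like this only catches fixed surface phrasings; categories such as agency diffusion or temporal flatness depend on sentence structure and context, which is presumably why the page pairs regex with LLM detection.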
Audit history

Prior audits

Latest audit: 2026-06-08 · sources: https://hai.stanford.edu, https://hai.stanford.edu/about

Transparency

Raw data

Every audit is published as machine-readable JSON. You can read this lab's latest report at /stancewatch/api/labs/stanford-hai.json — it carries the per-lens findings, evidence quotes, Suffixscape flags, PALS scores, the sources actually read, and a confidence note.
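As a sketch of how a reader might consume that report, the Python below parses a small JSON excerpt and lists the weakest lenses. The schema is an assumption: the keys "pals", "lenses", "name", and "score" and the helper lowest_lenses are hypothetical, since the page lists what the file carries but not its field names. The values are the scores shown on this page.

```python
import json

# Hypothetical excerpt of /stancewatch/api/labs/stanford-hai.json.
# Every key here is an assumed name for illustration; check the real file.
SAMPLE = """
{
  "pals": {"completeness": 6.0, "multiplicity": 3.0, "responsibility": 5.0},
  "lenses": [
    {"name": "Indigenous Knowledge", "score": 2},
    {"name": "Scientific Evidence", "score": 7}
  ]
}
"""


def lowest_lenses(report: dict, threshold: int = 3) -> list[str]:
    """Names of lenses scoring at or below `threshold` (hypothetical schema)."""
    return [lens["name"] for lens in report["lenses"] if lens["score"] <= threshold]


report = json.loads(SAMPLE)
print(lowest_lenses(report))
```

Reading the file locally keeps the sketch independent of the host serving the endpoint, which this page gives only as a relative path.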

Found an error, or a stance page we missed? We audit public communications only — point us to the page and the next audit will read it. Write to hello@cognioengine.co.uk.

Audit date: 2026-06-08

High confidence. Both target URLs resolved via Firecrawl: the homepage rendered fully and /about carried the complete mission, who-we-are, focus-area, leadership, and milestone text (the originally requested /about/mission path 404'd, but its content is contained in /about). Judgments rest on directly quoted public copy. Qualitative judgment; not a validated metric.

Auditor: GoldBerry v1.3 / StanceWatch v1.0