Kakao Brain

South Korea · kakaobrain.com · open
multimodal · Korean NLP · creative AI · research

KoGPT, Karlo; strong research culture. [openness: open-leaning, demoted to "open" for v1 schema].

PALS scores

Preservative dimensions

PALS composite
3.7
Mean of three dimensions, 1–10.
Completeness
4.0
Sources, limits, transparency.
Multiplicity
5.0
Epistemologies, languages, voices.
Responsibility
2.0
Accountability, refusal, governance.
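
The composite above can be reproduced from the three dimension scores. The sketch below is a minimal illustration, assuming the composite is an unweighted arithmetic mean rounded to one decimal place; the page states only "mean of three dimensions", so the rounding rule and the function name `pals_composite` are assumptions.

```python
from statistics import mean

# Dimension scores as published on this page.
dimensions = {
    "completeness": 4.0,
    "multiplicity": 5.0,
    "responsibility": 2.0,
}


def pals_composite(scores: dict[str, float]) -> float:
    """Unweighted mean of the dimension scores, rounded to one decimal.

    The one-decimal rounding is an assumption inferred from the displayed value.
    """
    return round(mean(scores.values()), 1)


composite = pals_composite(dimensions)
print(composite)  # 3.7
```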
Eight lenses

What's missing, by lens

Each lens carries a canonical question and corrects a specific epistemic failure. Score, findings, and gaps land once the audit runs.

Lens 01
Indigenous Knowledge
Whose knowledge is missing?
2/10
Findings (2)
  • As a Korean lab, Kakao Brain operates within a national context where the dominant non-Western language and culture (Korean) is centered, which incidentally resists Anglophone defaults.
  • Korean folk knowledge and Hangul-script traditions are implicitly part of the training corpus for Korean NLP work.
Gaps (3)
  • No public evidence of engagement with Indigenous data sovereignty frameworks (CARE Principles).
  • No acknowledgment of the distinction between national-majority Korean culture and genuinely Indigenous or minority embodied knowledge systems.
  • Image-text datasets (e.g. COYO-700M) are large web-scraped corpora with no documented consent or sovereignty consideration for any community whose imagery is captured.
Justification

Homepage unreachable; assessment rests on public knowledge of Kakao Brain's open releases (KoGPT, minDALL-E, COYO-700M, Karlo). Centering Korean is culturally significant but is a national-majority stance, not Indigenous-knowledge stewardship. Large web-scraped multimodal corpora are the opposite of CARE-aligned data practice, hence a low score. Not the floor, because Korean-language centering does displace some Anglophone universalism.

Lens 02
Deep History
What historical process produced this?
3/10
Findings (2)
  • Kakao Brain is a subsidiary of Kakao Corp, a major Korean platform conglomerate; its existence reflects the specific geopolitical economy of Korean tech sovereignty and a national push for non-US foundation models.
  • Korea's chip and compute infrastructure (Samsung/SK Hynix proximity) sits in the background of any Korean lab's compute story.
Gaps (3)
  • No public reflection on colonial or extractive data legacies underlying web-scraped corpora.
  • No transparency about GPU supply, compute economics, or labor in data annotation.
  • No historical humility about what Korean NLP inherits from English-dominant pretraining pipelines and tokenizers.
Justification

The lab embodies a meaningful deep-history fact (a non-Western state building sovereign foundation models), which earns above-floor credit, but there is no evidence the lab itself names or reflects on these historical processes in public communications.

Lens 03
Cross-Cultural Wisdom
Which perspectives have been flattened?
5/10
Findings (2)
  • Substantive, non-tokenistic investment in Korean-language modeling (KoGPT) addresses a genuine under-served language in the LLM landscape.
  • Multimodal/creative work (minDALL-E, Karlo, RQ-Transformer) engages Korean visual and textual culture rather than treating English as default.
Gaps (3)
  • Korean is centered, but there is no evidence of support for Korea's own minority languages or for broader multilingual plurality beyond Korean and English.
  • No documented consultation with cultural scholars on culturally specific reasoning patterns.
  • Risk that Western model architectures and benchmarks are imported wholesale, flattening Korean reasoning into English-derived evaluation frames.
Justification

Genuine, resourced commitment to a non-English language lifts this to mid-scale and above most of the other lenses. Held back from a higher score because the commitment is single-language (Korean) rather than plural, and there is no evidence of deeper cultural-epistemology consultation.

Lens 04
Scientific Evidence
What does the evidence show, and what are its limits?
6/10
Findings (3)
  • Strong open-release track record: open weights and open datasets (KoGPT, minDALL-E, Karlo, RQ-Transformer, and the COYO-700M / COYO-Labeled-300M datasets released publicly on GitHub/Hugging Face).
  • Open weights and open data materially enable third-party verification, replication, and independent bias auditing.
  • Associated work appears in or alongside peer-reviewed venues, supporting external scrutiny.
Gaps (3)
  • Limited evidence of the lab itself commissioning independent third-party audits of bias or training-data provenance.
  • Known-limitation disclosures (model cards with explicit failure modes, demographic bias statements) are inconsistent across releases.
  • Dataset documentation (datasheets) for web-scraped corpora is thin relative to their scale.
Justification

Openness is this lab's strongest dimension and the public evidence (open weights + open datasets) is real and verifiable, scoring above mid-scale. Capped at 6 because openness is necessary but not sufficient for scientific responsibility, and neither self-initiated independent auditing nor rigorous limitation disclosure is in evidence.

Lens 05
Artistic Perception
What does this feel like, not just mean?
5/10
Findings (2)
  • Explicit focus on creative AI and generative image models (minDALL-E, Karlo) makes affective and aesthetic experience a first-class product concern.
  • The 'creative AI' framing acknowledges that generation is about feel and expression, not only task completion.
Gaps (3)
  • Creative tooling is framed as capability/output, with no evidence of reflection on the emotional labor of artists or the ambiguity and uncertainty inherent to creative practice.
  • No public engagement with artists' rights, attribution, or the affective harms of training generative models on creative works without consent.
  • Aesthetics treated as a feature to ship rather than a mode of knowing.
Justification

A real creative-AI focus earns mid-scale credit that capability-only labs would not get. Held back from a higher score because there is no evidence of reflective engagement with the felt, ethical, or labor-related dimensions of art-making.

Lens 06
Future Modelling
Where is this heading, and for whom?
2/10
Findings (1)
  • Open-weight releases modestly democratize access to capable Korean and multimodal models, shaping a future where non-US actors hold model infrastructure.
Gaps (4)
  • No public engagement with labor-displacement risks from generative/creative AI.
  • No environmental or compute cost disclosures for training large multimodal models.
  • No democratic-governance or inclusive-deliberation mechanisms around the systems released.
  • Generative image release at scale with no visible stance on deepfakes, synthetic-media misuse, or downstream harm.
Justification

Open access is a mild positive for distributed futures, but the near-total absence of any visible engagement with displacement, environmental cost, misuse, or governance keeps this near the floor.

Lens 07
Marginalised Voices
Who is not at the table?
2/10
Findings (1)
  • Open-sourcing models and datasets lowers the access barrier for under-resourced developers, including Korean and other non-Anglophone builders.
Gaps (4)
  • No evidence of participatory design with Global South developers.
  • No documented disability-community accessibility work.
  • No labor-representative engagement or compensated feedback channels for data workers / annotators.
  • No evidence of who is consulted before release decisions.
Justification

Open release is a passive, diffuse benefit rather than active inclusion of marginalised voices. Without any evidence of participatory or compensated engagement, this sits near the floor, lifted slightly above the minimum only by the genuine access-lowering effect of open weights.

Lens 08
Trickster Knowledge
What truth appears when the story is inverted?
1/10
Findings (1)
  • No public-facing material available to assess; no evidence of self-ironic or self-auditing posture.
Gaps (3)
  • No visible willingness to name the central contradiction of an 'open, creative AI' lab that web-scrapes creative works at scale to build generative models that may compete with their creators.
  • No irony, satire, or structural self-inversion in evidence.
  • The corporate-subsidiary framing (Kakao Corp) is exactly the kind of solemn institutional narrative trickster knowledge exists to test, and there is no sign it is tested internally.
Justification

Floor score. Trickster knowledge requires evidence of disciplined self-contradiction and refusal to treat one's own seriousness as exempt; none is observable, and the unexamined tension between 'open/creative' branding and extractive data practice is precisely what goes unaddressed.

Suffixscape

Linguistic diagnostics

Regex- and LLM-detected patterns of evasion in the lab's own prose: nominalised evasion, agency diffusion, epistemic inflation, temporal flatness. Distinct from the CognioNews -scape editorial format — see methodology.

No Suffixscape findings until the first audit.
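
As a rough illustration of the regex half of this diagnostic, the sketch below flags the four named pattern families in a piece of text. Every pattern here is a hypothetical stand-in chosen for illustration; the audit's actual rules are not published on this page, and the LLM-based detection step is not modelled at all.

```python
import re

# Hypothetical patterns, one per category named above.
# These are illustrative guesses, not the auditor's real rules.
PATTERNS = {
    # Processes turned into abstract nouns ("optimisation", "monetization").
    "nominalised_evasion": re.compile(r"\b\w+(?:isation|ization)\b", re.IGNORECASE),
    # Agentless passives ("were identified", "was decided").
    "agency_diffusion": re.compile(
        r"\b(?:was|were|been|being|is|are)\s+\w+ed\b", re.IGNORECASE
    ),
    # Certainty boosters that outrun the evidence.
    "epistemic_inflation": re.compile(
        r"\b(?:clearly|obviously|undoubtedly|unprecedented)\b", re.IGNORECASE
    ),
    # Claims stripped of any time frame.
    "temporal_flatness": re.compile(r"\b(?:always|never)\b", re.IGNORECASE),
}


def flag_evasion(text: str) -> dict[str, list[str]]:
    """Return the matched phrases for each category (empty list if none)."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}


sample = "Mistakes were identified and optimisation is clearly ongoing."
flags = flag_evasion(sample)
for category, hits in flags.items():
    print(category, hits)
```

Simple patterns like these over-match heavily (not every passive hides an agent), which is presumably why the methodology pairs regex detection with an LLM pass.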
Audit history

Prior audits

Latest audit: 2026-06-08 · sources: none

Transparency

Raw data

Every audit is published as machine-readable JSON. You can read this lab's latest report at /stancewatch/api/labs/kakao-brain.json — it carries the per-lens findings, evidence quotes, Suffixscape flags, PALS scores, the sources actually read, and a confidence note.
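
A report of this kind could be consumed as sketched below. The payload is a hand-written stand-in, not the real response: `stance_url`, `sources_audited`, and the three lens keys are names that appear in this page's confidence note, while every other key (`lab`, `audit_date`, `confidence`, `lenses`, `score`) and the overall nesting are assumptions about the schema.

```python
import json

# Hypothetical payload. Only stance_url, sources_audited, and the lens keys are
# named on this page; the remaining keys and the structure are assumed.
SAMPLE = """
{
  "lab": "kakao-brain",
  "audit_date": "2026-06-08",
  "stance_url": null,
  "sources_audited": [],
  "confidence": "low",
  "lenses": {
    "scientific_evidence": {"score": 6},
    "future_modelling": {"score": 2},
    "marginalised_voices": {"score": 2}
  }
}
"""


def summarise(raw: str) -> dict:
    """Parse a report and pull out a few headline facts."""
    report = json.loads(raw)
    lenses = report.get("lenses", {})
    scores = {name: body["score"] for name, body in lenses.items()}
    return {
        "lab": report.get("lab"),
        # An empty list means no pages were actually read.
        "has_sources": bool(report.get("sources_audited")),
        "top_lens": max(scores, key=scores.get) if scores else None,
        "scores": scores,
    }


summary = summarise(SAMPLE)
print(summary)
```

Checking `has_sources` before trusting the scores matters for a report like this one, where no pages were read and the findings rest on prior public knowledge.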

Found an error, or a stance page we missed? We audit public communications only — point us to the page and the next audit will read it. Write to hello@cognioengine.co.uk.

Audit date: 2026-06-08

LOW confidence. Both homepage fetch attempts (https://kakaobrain.com and https://www.kakaobrain.com) returned connection-refused; stance_url was null. No pages were successfully read, so sources_audited is empty and all findings rest on public knowledge of Kakao Brain's known open releases (KoGPT, minDALL-E, Karlo, RQ-Transformer, COYO datasets) and its status as a Kakao Corp subsidiary. Scores reflect absence of observable current public communications as much as the lab's substance; a live stance page could materially shift several lenses (notably future_modelling, marginalised_voices, scientific_evidence). Suffixscape is empty because no text was scraped.

Auditor: GoldBerry v1.3 / StanceWatch v1.0