Findings (2)
- The Aya initiative covers 101 languages, more than 50 of which are described as 'underserved'; this incidentally brings some Indigenous and low-resource languages into model scope.
- Cohere Labs frames its mission as 'Changing where, how, and by whom breakthroughs happen,' gesturing at participation by communities historically excluded from AI.
Gaps (4)
- No acknowledgment of Indigenous data sovereignty or the CARE Principles for Indigenous Data Governance.
- No mention of consent, benefit-sharing, or community control over the language data ingested for multilingual models. 'Underserved languages' are framed as coverage gaps to be filled, not as living relational knowledge held by sovereign communities.
- No provision for preserving oral, ceremonial, or non-textual knowledge; the entire stack is text- and retrieval-centric.
- The enterprise 'your data stays yours' commitment protects the paying customer's data, not the communities whose languages train the base models.
Justification
Multilingual breadth reaches Indigenous and minoritised languages, but the framing is extractive by default: languages are 'covered' as a benchmark frontier, with no sovereignty, consent, or CARE framework. The 'your data stays yours' guarantee is conspicuously a customer-protection clause, which throws into relief the absence of any equivalent protection for community data. The score sits slightly above the floor only because Aya's underserved-language focus is materially better than that of English-only labs.