BOULDER, Colo. — The University of Colorado Boulder is leading an effort to preserve the Arapaho language, for which there are fewer than 100 living native speakers today.
“There are not a lot of speakers left, and all the speakers are now getting into their 70s and even 80s,” said CU Boulder linguistics professor Andrew Cowell.
The Arapaho are a Native American tribe whose historic lands include Colorado and Wyoming.
Cowell has led the project to document the language in audio and video recordings and store the material in databases. The work is organized into two databases: a collection of more than 100,000 sentences in which native speakers share their stories and culture, and a lexical database, which is a structured collection of information about words, including their meanings, relationships, and linguistic properties.

“What we're able to do is [apply] computational approaches to the database where we can actually determine what the most common words are and what the least common words are, and so we've produced a student dictionary,” Cowell said.
The work is a passion project for Cowell, who has been researching the Arapaho language for 25 years. He said the relationships he built along the way were the most rewarding part.
“It does take a while to get to know people, to convince people that you're not there just to exploit them or just grab some data from them and run,” Cowell said. “I eventually was adopted into the tribe by a family and given an Arapaho name.”

Over the course of the project, Cowell and his team worked with 100 different speakers to build the database, which is now being used to teach a new generation the language of their parents and grandparents.
“There are several younger people who are starting to speak the language, and I've been working with them, as well, for almost 20 years down in Oklahoma and in Wyoming to help them learn the language better, and so they're using the database,” Cowell said.
The database is currently available to CU students and faculty. Eventually, 5,000 sentences with detailed linguistic labeling will be made available to the public.
“Many Native American tribes are concerned about what they call data sovereignty, especially with AI and ChatGPT and those kinds of tools,” Cowell said. “They worry that their language and their culture and sensitive information is just going to be scraped and taken onto the web, and then they don't really have any control over it.”
