In March 2018, Peter-Lucas Jones and the ten other staff at Te Hiku Media, a small non-profit radio station nestled just below New Zealand’s northernmost tip, were in disbelief. In ten days, thanks to a competition the station had launched, Māori speakers across New Zealand had recorded over 300 hours of annotated audio in their mother tongue. It was enough data to build language tech for te reo Māori, the Māori language – including automatic speech recognition (ASR) and speech-to-text.
The small staff of Māori language broadcasters and one engineer were about to become pioneers in indigenous speech recognition technology. But building the tools was only half the battle. Te Hiku soon found itself fending off corporate entities trying to develop their own indigenous data sets and resisting detrimental western approaches to data sharing. Guarding their data became the priority, because the only people truly interested in revitalising the Māori language were the Māori people themselves.
Languages around the world are dying – the UN estimates that an indigenous language dies every two weeks. Racist assimilation policies are largely to blame. Well into the 20th century, Māori children were often shamed or physically beaten when they spoke their native language in schools. As a result, when that generation reached adulthood, many chose not to pass the language on to their own children, to protect them from the same persecution. This was a major cause of Māori language decline between 1920 and 1960. Now, the fluent population within many indigenous groups is both shrinking and aging. Both the language and the traditional knowledge embedded in it are at risk of extinction.
Jones, the CEO of Te Hiku, and Keoni Mahelona, the chief technology officer, started to see a need for speech recognition after they digitised the massive audio collection Te Hiku had accumulated over 30 years of radio broadcasting. “We’d captured all these idiomatic phrases, colloquialisms and unique phrases,” Jones says. It was the native sound of their language, one less adulterated by English and time. But to make this resource useful to Māori people living across the country and the world, Te Hiku would need to transcribe the audio. With thousands of hours of Māori recordings to get through, they would need to teach a computer to understand their language.
The tools for building speech-to-text systems – which would let Te Hiku transcribe its radio content – and other speech recognition technology are fairly accessible; Mozilla’s open-source DeepSpeech is one example. The real challenge for indigenous communities is a lack of annotated data to build with. Building speech recognition tools from scratch, with no prior data, typically requires a ballpark figure of 10,000 hours of annotated audio, according to Kelly Davis, cofounder of Coqui, a start-up for open-source speech technology. That’s an extremely daunting, if not impossible, requirement for small indigenous languages with little prior documentation.
But with just those first 320 hours of data, Te Hiku was able to build a speech-to-text engine with an initial word error rate of 14 per cent, according to Mahelona, a Native Hawaiian who’s been working at Te Hiku for seven years. For reference, Google’s ASR achieves a word error rate of 6.7 per cent with a 12,500-hour data set, according to one 2018 conference abstract. “The fact that they are getting word error rates that low for just over 300 hours, for a language that basically didn’t have speech recognition before, that’s very impressive,” Davis says.
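Word error rate is conventionally calculated as the number of word substitutions, deletions and insertions needed to turn a system’s transcript into a correct reference transcript, divided by the number of words in the reference. A 14 per cent rate therefore means roughly one word in seven comes out wrong. The short Python sketch below illustrates that standard calculation using word-level edit distance; it is not Te Hiku’s or Google’s evaluation code, and the function name and te reo Māori example phrases are hypothetical illustrations rather than data from Te Hiku’s corpus.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dist[i][j] = fewest edits turning the first i reference words
    # into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


# Illustrative only: one misrecognised word out of four gives a 25 per cent WER.
print(word_error_rate("kia ora koutou katoa", "kia ora kotou katoa"))
```

Counting errors against the reference length is why the figure can exceed 100 per cent when a system inserts many spurious words.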
Mahelona and Jones started presenting their success at conferences. What matters is not that they were the first to build ASR tools for an indigenous language, Mahelona says, “but that we proved it was possible.” Language revitalisation experts from other indigenous communities, including the Mohawk in southeastern Canada and Native Hawaiians, have approached Te Hiku about using its code and mimicking its strategy. “Technology is a force multiplier,” says Nathan Brinklow, professor of Mohawk at Queen’s University in Canada. “They are leading the way. But this is something regular people can do.”