By some estimates, more than a quarter of the world’s languages are spoken in Africa. However, many of them are underrepresented in the development of artificial intelligence (AI). This is largely due to a lack of investment and of readily available data: most AI tools are trained on European and Chinese languages.
Because many African languages are primarily spoken rather than written, there is little text on which to train AI models, and this hinders the creation of tools that can serve their speakers effectively. As a result, millions of people across the continent are being left out of the benefits of AI technology.
Researchers at the University of Pretoria, led by Prof Vukosi Marivate, have been working to address this issue. They have recently released the largest known dataset of African languages, which includes 9,000 hours of speech recorded across Kenya, Nigeria, and South Africa. The dataset covers 18 African languages, including Kikuyu, Dholuo, Hausa, Yoruba, isiZulu, and Tshivenda, and is intended to provide a foundation for the development of AI-ready datasets in these languages.
The African Next Voices project, which created the dataset, was made possible by a $2.2m grant from the Gates Foundation. The data will be open access, allowing developers to build tools that can translate, transcribe, and respond in African languages. This initiative has the potential to bridge the linguistic divide in AI technology and provide more equitable access to its benefits.
There are already examples of how indigenous languages can be used in AI to solve real-life challenges in Africa. For instance, a farmer in South Africa uses an app called AI-Farmer, which recognizes several South African languages, to help solve problems on her farm. The app allows her to ask questions and receive useful answers in her native language, Setswana.
Companies like Lelapa AI are also working to build AI tools in African languages for banks and telecoms firms. According to Pelonomi Moiloa, CEO of Lelapa AI, language can be a significant barrier to accessing essential services, and initiatives like African Next Voices are crucial in addressing this issue.
The inclusion of African languages in AI technology is a matter not only of convenience but also of cultural preservation. As Prof Marivate notes, language is not just a means of communication but also a repository of history, culture, and knowledge. The loss of indigenous languages could mean the loss of unique perspectives and ways of understanding the world. The African Next Voices project is an important step towards ensuring that African languages are represented in the development of AI technology.