Data Generation and Language Technology for Low-Resourced African Languages

This research project focuses on using manual and automated methods to build bilingual corpora for several language pairs, each involving at least one low-resourced African language.

Developing natural language processing (NLP) techniques for tasks such as machine translation (MT) requires monolingual and cross-lingual resources. At present, the lack of such data makes it difficult to apply recent advances in NLP to low-resource languages and language pairs in the developing world. In Uganda, for example, where more than 40 distinct languages are spoken, there are neither monolingual nor bilingual/multilingual resources for building the kinds of NLP systems that have significantly benefited well-resourced languages.

We plan to use the resulting corpora to explore several NLP applications for these low-resourced African languages.