
Inzwi

Bilingual Voice Assistant Platform

Year

2026

Role

Solo Developer

Crowdsourcing platform for building a bilingual English-Shona voice assistant. Community-driven data collection with gamification, audio recording, translation, and WhatsApp integration.

Inzwi

A crowdsourcing platform for building a bilingual English-Shona voice assistant through community data collection

The Problem

Shona is spoken by over 10 million people in Zimbabwe, yet it has almost no representation in modern language technology: no Siri, no Alexa, and only rudimentary machine translation. The reason is simple: there's no training data. No speech corpora, no parallel text datasets, no emotion-labeled audio. Without data, there are no models. Without models, there's no voice for an entire language in the digital world.

The Solution

Inzwi (meaning "voice" in Shona) is a platform that tackles this from the ground up. Instead of waiting for big tech to notice low-resource African languages, it crowdsources the data directly from native speakers through a gamified web platform β€” then uses that data to finetune speech and language models.

The platform collects three types of data:

  1. Text translations β€” English-Shona sentence pairs for bilingual model training
  2. Audio recordings β€” Read speech and emotional speech for STT and TTS training
  3. Validations β€” Community peer review to ensure data quality

How It Works

  1. Contribute β€” Users translate sentences, record audio, or validate others' contributions
  2. Gamification β€” Points, badges, leaderboards, and progress tracking keep contributors engaged
  3. Quality Pipeline β€” Multi-stage validation (community voting + automated audio checks) ensures clean data
  4. Model Training β€” Collected data is used to finetune Whisper (STT), LLaMA (bilingual LLM), XTTS (TTS), and emotion recognition models
  5. WhatsApp Bot β€” Meets users where they are, enabling contribution and interaction via WhatsApp
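The automated audio checks in step 3 could be sketched as a pure function that rejects clips that are too quiet or clipped. This is purely illustrative: the function name, thresholds, and result shape here are assumptions, not the platform's actual implementation.

```typescript
// Illustrative automated audio quality gate: reject recordings that
// are too quiet (low RMS energy) or heavily clipped. Thresholds are
// placeholder values, not the platform's real tuning.
interface AudioCheckResult {
  ok: boolean;
  reason?: "too_quiet" | "clipping";
}

function checkAudioQuality(
  samples: Float32Array,   // PCM samples in [-1, 1]
  minRms = 0.01,           // minimum acceptable RMS energy
  maxClipRatio = 0.01      // max fraction of near-full-scale samples
): AudioCheckResult {
  let sumSquares = 0;
  let clipped = 0;
  for (const s of samples) {
    sumSquares += s * s;
    if (Math.abs(s) >= 0.99) clipped++;
  }
  const rms = Math.sqrt(sumSquares / samples.length);
  if (rms < minRms) return { ok: false, reason: "too_quiet" };
  if (clipped / samples.length > maxClipRatio) {
    return { ok: false, reason: "clipping" };
  }
  return { ok: true };
}
```

In the browser, the same check can run client-side on the decoded buffer before upload, so contributors get instant feedback instead of a rejection after the fact.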

Key Features

  • Audio Recording Module β€” Web-based recording with client-side quality checks (volume, noise detection)
  • Translation Module β€” Community-driven English ↔ Shona text translation with dialect tagging
  • Validation System β€” Peer validation for audio accuracy and translation quality
  • Emotional Speech Collection β€” Prompted scenarios for 5 emotions (neutral, happy, sad, angry, surprised)
  • Gamification β€” Leaderboards, badges, achievements, and contributor stats
  • WhatsApp Integration β€” Bot for audio collection and freeform conversation via Meta Cloud API
  • Dialect Support β€” Tags for Zezuru, Karanga, Manyika, Ndau, and Korekore dialects
  • PWA β€” Offline-capable progressive web app for low-bandwidth environments
  • Freeform Mode β€” Open-ended conversational data collection alongside structured tasks
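The dialect and emotion tags above suggest a shape for the shared contribution types. A minimal sketch, assuming hypothetical names (the real schema in `shared/` may differ):

```typescript
// Hypothetical shared types for contributions. The five dialect tags
// and five emotions mirror the feature list; field names are assumed.
type ShonaDialect = "zezuru" | "karanga" | "manyika" | "ndau" | "korekore";
type Emotion = "neutral" | "happy" | "sad" | "angry" | "surprised";

interface Translation {
  kind: "translation";
  englishText: string;
  shonaText: string;
  dialect?: ShonaDialect; // optional dialect tagging
}

interface Recording {
  kind: "recording";
  sentenceId: string;
  audioUrl: string;
  emotion?: Emotion;      // set for emotional speech prompts
  dialect?: ShonaDialect;
}

type Contribution = Translation | Recording;
```

A discriminated union like this lets the validation UI switch on `kind` and keeps the frontend and Fastify backend agreeing on one contract.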

Technical Stack

  • Frontend β€” Next.js 14, TypeScript, Tailwind CSS, NextAuth.js
  • Backend β€” Node.js, Fastify, TypeScript, Zod validation
  • Database β€” PostgreSQL with Prisma ORM
  • Storage β€” Azure Blob Storage for audio files
  • WhatsApp β€” Meta Cloud API for bot integration
  • ML Pipeline β€” PyTorch, HuggingFace Transformers (Whisper, LLaMA, XTTS, Wav2Vec2)
  • Deployment β€” Vercel (frontend), VPS (backend)

Architecture

inzwi/
β”œβ”€β”€ frontend/     # Next.js 14 PWA
β”œβ”€β”€ backend/      # Fastify API server
β”‚   β”œβ”€β”€ routes/   # Auth, recordings, sentences, validations,
β”‚   β”‚             # freeform, WhatsApp, admin, profile
β”‚   β”œβ”€β”€ services/ # Business logic
β”‚   └── prisma/   # Database schema & migrations
β”œβ”€β”€ shared/       # Shared TypeScript types
└── docs/         # Research proposal, data architecture

The Bigger Picture

Inzwi is my final-year research project at MSU (2026). The goal isn't just a platform β€” it's a replicable framework for any low-resource language. The crowdsourcing methodology, data pipeline, and model finetuning strategy are designed to be adapted for other African languages facing the same data scarcity.

Target Datasets

  • 50,000–100,000 bilingual sentence pairs
  • 50–100 hours of read Shona speech
  • 10–20 hours of emotion-labeled speech
  • The first emotion-labeled Shona speech dataset

Why I Built This

"Inzwi" means "voice." This project is about giving my language a voice in the digital world. Shona speakers deserve the same access to language technology that English speakers take for granted. Rather than waiting for someone else to build it, I'm building the data infrastructure and models myself β€” with help from the community.


Built With

TypeScript · Next.js · Node.js · PostgreSQL · Tailwind CSS · Azure
