Inzwi
Bilingual Voice Assistant Platform
2026
Solo Developer
Crowdsourcing platform for building a bilingual English-Shona voice assistant. Community-driven data collection with gamification, audio recording, translation, and WhatsApp integration.
A crowdsourcing platform for building a bilingual English-Shona voice assistant through community data collection
The Problem
Shona is spoken by over 10 million people in Zimbabwe, yet it is barely represented in modern language technology. Siri and Alexa do not support it, and the tools that do exist are limited by how little training data is available: few speech corpora, few parallel text datasets, and no emotion-labeled audio. Without data, there are no models. Without models, there's no voice for an entire language in the digital world.
The Solution
Inzwi (meaning "voice" in Shona) is a platform that tackles this from the ground up. Instead of waiting for big tech to notice low-resource African languages, it crowdsources the data directly from native speakers through a gamified web platform, then uses that data to finetune speech and language models.
The platform collects three types of data:
- Text translations: English-Shona sentence pairs for bilingual model training
- Audio recordings: Read speech and emotional speech for STT and TTS training
- Validations: Community peer review to ensure data quality
How It Works
- Contribute: Users translate sentences, record audio, or validate others' contributions
- Gamification: Points, badges, leaderboards, and progress tracking keep contributors engaged
- Quality Pipeline: Multi-stage validation (community voting + automated audio checks) ensures clean data
- Model Training: Collected data is used to finetune Whisper (STT), LLaMA (bilingual LLM), XTTS (TTS), and emotion recognition models
- WhatsApp Bot: Meets users where they are, enabling contribution and interaction via WhatsApp
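As a rough illustration of the community-voting stage in the quality pipeline, the sketch below turns a list of peer votes into a status. The type names, the `decideStatus` function, the minimum of three votes, and the two-thirds approval ratio are all assumptions made for this example, not values taken from the Inzwi codebase.

```typescript
// Hypothetical vote aggregation; names and thresholds are assumptions.
type Vote = "approve" | "reject";
type Status = "pending" | "accepted" | "rejected";

interface VotePolicy {
  minVotes: number;     // votes required before any decision is made
  approveRatio: number; // share of approvals needed to accept (0..1)
}

const DEFAULT_POLICY: VotePolicy = { minVotes: 3, approveRatio: 2 / 3 };

function decideStatus(votes: Vote[], policy: VotePolicy = DEFAULT_POLICY): Status {
  // Too few reviewers so far: keep the contribution in the queue.
  if (votes.length < policy.minVotes) return "pending";
  const approvals = votes.filter((v) => v === "approve").length;
  return approvals / votes.length >= policy.approveRatio ? "accepted" : "rejected";
}
```

Keeping the policy as a plain object would make it easy to use stricter thresholds for audio than for text, if that turned out to be useful.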
Key Features
- Audio Recording Module: Web-based recording with client-side quality checks (volume, noise detection)
- Translation Module: Community-driven English ↔ Shona text translation with dialect tagging
- Validation System: Peer validation for audio accuracy and translation quality
- Emotional Speech Collection: Prompted scenarios for 5 emotions (neutral, happy, sad, angry, surprised)
- Gamification: Leaderboards, badges, achievements, and contributor stats
- WhatsApp Integration: Bot for audio collection and freeform conversation via Meta Cloud API
- Dialect Support: Tags for Zezuru, Karanga, Manyika, Ndau, and Korekore dialects
- PWA: Offline-capable progressive web app for low-bandwidth environments
- Freeform Mode: Open-ended conversational data collection alongside structured tasks
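The client-side quality checks in the Audio Recording Module (volume, noise detection) could look roughly like the sketch below. It assumes the recording is already decoded into a `Float32Array` of samples in the range [-1, 1], and that it contains at least a short pause so the quietest frames reflect background noise. The function names and every threshold are illustrative assumptions, not the platform's actual values.

```typescript
// Illustrative sketch only: names and thresholds are assumptions.
interface AudioQuality {
  rms: number;          // overall loudness
  clippedRatio: number; // share of samples at or near full scale
  noiseFloor: number;   // average level of the quietest frames
  issues: string[];
  ok: boolean;
}

// Root-mean-square level of a block of samples.
function rmsLevel(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  const sumSquares = samples.reduce((acc, s) => acc + s * s, 0);
  return Math.sqrt(sumSquares / samples.length);
}

function checkAudio(samples: Float32Array, frameSize = 1024): AudioQuality {
  const rms = rmsLevel(samples);

  // Count samples at or near full scale, a sign of clipping.
  const clipped = samples.reduce((n, s) => (Math.abs(s) >= 0.99 ? n + 1 : n), 0);
  const clippedRatio = samples.length > 0 ? clipped / samples.length : 0;

  // Per-frame levels; the quietest 10% of frames approximate background noise.
  const frameLevels: number[] = [];
  for (let i = 0; i + frameSize <= samples.length; i += frameSize) {
    frameLevels.push(rmsLevel(samples.subarray(i, i + frameSize)));
  }
  frameLevels.sort((a, b) => a - b);
  const quietest = frameLevels.slice(0, Math.max(1, Math.floor(frameLevels.length / 10)));
  const noiseFloor = quietest.length > 0
    ? quietest.reduce((a, b) => a + b, 0) / quietest.length
    : 0;

  const issues: string[] = [];
  if (rms < 0.01) issues.push("too quiet");
  if (clippedRatio > 0.01) issues.push("clipping");
  if (noiseFloor > 0.05) issues.push("noisy background");

  return { rms, clippedRatio, noiseFloor, issues, ok: issues.length === 0 };
}
```

Running a check like this in the browser before upload means a contributor can be asked to re-record straight away, which matters on low-bandwidth connections where a rejected upload is costly.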
Technical Stack
- Frontend: Next.js 14, TypeScript, Tailwind CSS, NextAuth.js
- Backend: Node.js, Fastify, TypeScript, Zod validation
- Database: PostgreSQL with Prisma ORM
- Storage: Azure Blob Storage for audio files
- WhatsApp: Meta Cloud API for bot integration
- ML Pipeline: PyTorch, HuggingFace Transformers (Whisper, LLaMA, XTTS, Wav2Vec2)
- Deployment: Vercel (frontend), VPS (backend)
Architecture
inzwi/
├── frontend/          # Next.js 14 PWA
├── backend/           # Fastify API server
│   ├── routes/        # Auth, recordings, sentences, validations,
│   │                  # freeform, WhatsApp, admin, profile
│   ├── services/      # Business logic
│   └── prisma/        # Database schema & migrations
├── shared/            # Shared TypeScript types
└── docs/              # Research proposal, data architecture
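To give a sense of what the shared/ directory might hold, here is a minimal sketch of types that both the frontend and backend could import. The dialect and emotion values come from the feature list; the interface shapes, field names, and the `isDialect` helper are assumptions for illustration and may not match the real schema.

```typescript
// Hypothetical shared types; in a real shared/ package these would be exported.
const DIALECTS = ["zezuru", "karanga", "manyika", "ndau", "korekore"] as const;
type Dialect = (typeof DIALECTS)[number];

const EMOTIONS = ["neutral", "happy", "sad", "angry", "surprised"] as const;
type Emotion = (typeof EMOTIONS)[number];

interface SentencePair {
  id: string;
  english: string;
  shona: string;
  dialect?: Dialect; // optional tag supplied by the translator
}

interface Recording {
  id: string;
  sentenceId: string | null; // null for freeform recordings (assumed convention)
  blobUrl: string;           // location of the audio file in blob storage
  durationSeconds: number;
  emotion: Emotion;
  dialect?: Dialect;
}

// Runtime guard for untrusted input such as query parameters.
function isDialect(value: string): value is Dialect {
  return (DIALECTS as readonly string[]).indexOf(value) !== -1;
}
```

Deriving the union types from `as const` arrays keeps the runtime list and the compile-time type in one place, so adding a dialect would be a one-line change.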
The Bigger Picture
Inzwi is my final-year research project at MSU (2026). The goal isn't just a platform; it's a replicable framework for any low-resource language. The crowdsourcing methodology, data pipeline, and model finetuning strategy are designed to be adapted for other African languages facing the same data scarcity.
Target Datasets
- 50,000–100,000 bilingual sentence pairs
- 50–100 hours of read Shona speech
- 10–20 hours of emotion-labeled speech
- First Shona emotional speech dataset of its kind
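Progress toward these targets could be reported with a small helper like the one below, for example on a contributor dashboard. The target ranges are the ones listed above; the `Target` shape and the `progress` function are hypothetical names introduced for this sketch.

```typescript
// Hypothetical progress helper; only the numeric ranges come from the targets above.
interface Target {
  label: string;
  unit: string;
  min: number; // lower bound of the target range
  max: number; // upper bound of the target range
}

const TARGETS: Target[] = [
  { label: "bilingual sentence pairs", unit: "pairs", min: 50000, max: 100000 },
  { label: "read Shona speech", unit: "hours", min: 50, max: 100 },
  { label: "emotion-labeled speech", unit: "hours", min: 10, max: 20 },
];

interface Progress {
  percentOfMin: number; // capped at 100
  percentOfMax: number; // capped at 100
  reachedMin: boolean;
}

function progress(collected: number, target: Target): Progress {
  const pct = (goal: number) => Math.min(100, Math.round((collected / goal) * 100));
  return {
    percentOfMin: pct(target.min),
    percentOfMax: pct(target.max),
    reachedMin: collected >= target.min,
  };
}
```

Reporting against both ends of the range separates "enough to start finetuning" from "the full goal", which is one plausible way to read a range-style target.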
Why I Built This
"Inzwi" means "voice." This project is about giving my language a voice in the digital world. Shona speakers deserve the same access to language technology that English speakers take for granted. Rather than waiting for someone else to build it, I'm building the data infrastructure and models myself, with help from the community.