
Developed by Nari Labs, Dia-1.6B is generating buzz with its ultra-realistic dialogue generation capabilities, quickly gaining over 6.5K stars on GitHub in just two days after being open-sourced! It’s reported to outperform ElevenLabs and Sesame, achieving emotional control, non-verbal sounds (like laughter and coughs), and zero-shot voice cloning with only 1.6B parameters – all while maintaining impressive running efficiency. It supports generating multi-character dialogues from text scripts, distinguishing roles using tags like [S1] and [S2], producing natural speech with non-verbal expressions and voice cloning, currently limited to English. Model weights and a Gradio Demo are available on Hugging Face for testing.

Key Features:
- Multi-Character Dialogue Generation: Use tags like [S1], [S2] to distinguish roles, generating multi-character dialogues with natural pacing and emotional transitions.
- Human-like Expression: Supports non-verbal emotions such as laughter (laugh), sighs (sigh), and coughs (cough).
- Zero-Shot Voice Cloning: Provide a short audio prompt to clone a user's or a character's voice, with no fine-tuning required.
- High-Quality Speech Synthesis: Sound quality rivals ElevenLabs and Sesame, with natural details and realistic emotional changes.
- Real-Time Inference Speed: Approximately 40 tokens/s on an A4000 GPU for a smooth experience without waiting.
- Gradio Interface Support: Includes a usable Web UI for immediate text-to-speech testing.
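The speaker-tag format above is simple enough to work with directly in Python. A minimal sketch of splitting a script into (speaker, text) turns; `split_turns` is an illustrative helper, not part of the Dia API:

```python
import re

def split_turns(script: str):
    """Split a Dia-style script into (speaker, text) turns.

    Illustrative helper, not part of the Dia API: assumes each turn is
    introduced by an [S1], [S2], ... tag, as in Dia's example scripts.
    """
    # Capture each [Sn] tag, then pair it with the text up to the next tag.
    parts = re.split(r"(\[S\d+\])", script)
    turns = []
    for i in range(1, len(parts) - 1, 2):
        speaker = parts[i].strip("[]")
        text = parts[i + 1].strip()
        if text:
            turns.append((speaker, text))
    return turns

turns = split_turns("[S1] Dia is amazing! [S2] Yeah, it generates laughs too! (laughs)")
# → [('S1', 'Dia is amazing!'), ('S2', 'Yeah, it generates laughs too! (laughs)')]
```

This kind of splitting is handy for validating scripts before synthesis, or for routing turns to different downstream voices.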
Getting Started with Dia-1.6B:
The official Dia repository provides a detailed installation guide and a Gradio demo.
Online Experience: No configuration required – simply open the Hugging Face Demo to input scripts or audio for testing: https://huggingface.co/spaces/nari-labs/Dia-1.6B
Installation and Deployment Steps:
```shell
# Clone the project and enter the directory
git clone https://github.com/nari-labs/dia.git
cd dia

# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -e .

# Start the Gradio UI
python app.py
```
Then open http://localhost:7860 to enter a script or upload audio and generate dialogue.
Example script: `[S1] Dia is amazing! [S2] Yeah, it generates laughs too! (laughs)`
You can also install Dia as a Python package for API access:

```shell
# Install directly from GitHub
pip install git+https://github.com/nari-labs/dia.git
```
Python example:

```python
import soundfile as sf
from dia.model import Dia

# Load the pretrained model from Hugging Face
model = Dia.from_pretrained("nari-labs/Dia-1.6B")

text = "[S1] Dia is an open weights text to dialogue model. [S2] You get full control over scripts and voices. [S1] Wow. Amazing. (laughs) [S2] Try it now on Git hub or Hugging Face."

# Generate the waveform and save it at 44.1 kHz
output = model.generate(text)
sf.write("simple.mp3", output, 44100)
```
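The example above writes the generated waveform with soundfile, a third-party library. If it isn't available, Python's standard-library `wave` module can write the samples as 16-bit PCM WAV instead. A minimal sketch, with a sine wave standing in for real model output; `save_wav` is an illustrative helper, not part of Dia:

```python
import math
import struct
import wave

def save_wav(path, samples, rate=44100):
    """Write float samples in [-1, 1] as 16-bit mono PCM using only the stdlib."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)  # mono
        f.setsampwidth(2)  # 16-bit samples
        f.setframerate(rate)
        # Clamp each sample and pack it as a little-endian signed 16-bit int.
        pcm = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
        )
        f.writeframes(pcm)

# Stand-in for model output: one second of a 440 Hz tone at 44.1 kHz.
tone = [0.5 * math.sin(2 * math.pi * 440 * t / 44100) for t in range(44100)]
save_wav("demo.wav", tone)
```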
PyPI package and CLI tools will be released soon.
Recommended Use Cases:
- Audiobooks / Novel Narration: Give each character its own voice, with emotional cues that preserve the feel of the original text.
- Podcast Voiceovers: Quickly synthesize emotive, stylized speech for interview-style podcasts.
- AI Role-Playing: Combine with an Agent for multi-character simulation systems.
- TTS Research and Fine-Tuning: Voice cloning, emotion control, non-verbal expression.
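For the role-playing case, the main integration work is mapping each agent's messages onto Dia's speaker tags. A minimal sketch; the character names and the `to_dia_script` helper are hypothetical, for illustration only:

```python
def to_dia_script(turns, max_speakers=2):
    """Map (character, line) pairs to Dia's [S1]/[S2] tag format.

    Hypothetical helper for illustration: assigns tags in order of first
    appearance, assuming the script uses at most `max_speakers` voices.
    """
    tags = {}
    pieces = []
    for character, line in turns:
        if character not in tags:
            if len(tags) >= max_speakers:
                raise ValueError("more characters than available speaker tags")
            tags[character] = f"[S{len(tags) + 1}]"
        pieces.append(f"{tags[character]} {line}")
    return " ".join(pieces)

script = to_dia_script([
    ("Narrator", "Dia is amazing!"),
    ("Fan", "Yeah, it generates laughs too! (laughs)"),
])
# → "[S1] Dia is amazing! [S2] Yeah, it generates laughs too! (laughs)"
```

The resulting string can then be passed to `model.generate` as in the Python example above.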
In Conclusion:
Dia-1.6B is the latest achievement in open-source TTS, impressing users with realistic dialogue and low resource requirements. Despite its small size (1.6B parameters), it generates high-fidelity speech comparable to ElevenLabs and Sesame, distinguishing roles and even simulating non-verbal sounds such as (coughs), (sighs), and (laughs). Free and open source, highly realistic, and with full support for multi-character and non-verbal expression, it is currently one of the most noteworthy TTS projects in the open-source domain. However, it only supports English for now; we look forward to future support for Chinese and more languages.
GitHub Project Address: https://github.com/nari-labs/dia
HF Model Address: https://huggingface.co/nari-labs/Dia-1.6B
Online Demo: https://huggingface.co/spaces/nari-labs/Dia-1.6B
If you have an AI character that you want to “speak” – and even “laugh” – Dia-1.6B is a perfect fit.
Reproduction without permission is prohibited: AI LAB » A Rising Star in Open-Source TTS! Dia-1.6B: Ultra-Realistic Dialogue Generation, Garners 6.5K Stars in Just 2 Days After Release!