
Developed by Nari Labs, Dia-1.6B is generating buzz with its ultra-realistic dialogue generation capabilities, quickly gaining over 6.5K stars on GitHub in just two days after being open-sourced! It’s reported to outperform ElevenLabs and Sesame, achieving emotional control, non-verbal sounds (like laughter and coughs), and zero-shot voice cloning with only 1.6B parameters – all while maintaining impressive running efficiency. It supports generating multi-character dialogues from text scripts, distinguishing roles using tags like [S1] and [S2], producing natural speech with non-verbal expressions and voice cloning, currently limited to English. Model weights and a Gradio Demo are available on Hugging Face for testing.

Key Features:
- Multi-Character Dialogue Generation: Use tags like [S1], [S2] to distinguish roles, generating multi-character dialogues with natural pacing and emotional transitions.
- Human-like Expression: Supports non-verbal emotions such as laughter (laugh), sighs (sigh), and coughs (cough).
- Zero-Shot Voice Cloning: Provide a short audio prompt to clone a user's or a character's voice, with no fine-tuning required.
- High-Quality Speech Synthesis: Sound quality rivals ElevenLabs and Sesame, with natural details and realistic emotional changes.
- Real-Time Inference Speed: Approximately 40 tokens/s on an A4000 GPU for a smooth experience without waiting.
- Gradio Interface Support: Includes a usable Web UI for immediate text-to-speech testing.
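The speaker-tag format above is simple enough to work with directly in Python. A minimal sketch of splitting a script into (speaker, text) turns; `split_turns` is an illustrative helper, not part of the Dia API:

```python
import re

def split_turns(script: str):
    """Split a Dia-style script into (speaker, text) turns.

    Illustrative helper, not part of the Dia API: assumes each turn is
    introduced by an [S1], [S2], ... tag, as in Dia's example scripts.
    """
    # Capture each [Sn] tag, then pair it with the text up to the next tag.
    parts = re.split(r"(\[S\d+\])", script)
    turns = []
    for i in range(1, len(parts) - 1, 2):
        speaker = parts[i].strip("[]")
        text = parts[i + 1].strip()
        if text:
            turns.append((speaker, text))
    return turns

turns = split_turns("[S1] Dia is amazing! [S2] Yeah, it generates laughs too! (laughs)")
# → [('S1', 'Dia is amazing!'), ('S2', 'Yeah, it generates laughs too! (laughs)')]
```

This kind of splitting is handy for validating scripts before synthesis, or for routing turns to different downstream voices.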
Getting Started with Dia-1.6B:
The official Dia repository provides a detailed installation guide and a Gradio demo.
Online Experience: No configuration required – simply open the Hugging Face Demo to input scripts or audio for testing: https://huggingface.co/spaces/nari-labs/Dia-1.6B
Installation and Deployment Steps:
```shell
# Clone the project and enter the directory
git clone https://github.com/nari-labs/dia.git
cd dia

# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -e .

# Start the Gradio UI
python app.py
```
Then open http://localhost:7860 to enter a script or upload audio and generate dialogue.
Example script: `[S1] Dia is amazing! [S2] Yeah, it generates laughs too! (laughs)`
You can also install Dia as a Python package for API access:

```shell
# Install directly from GitHub
pip install git+https://github.com/nari-labs/dia.git
```
Python example:

```python
import soundfile as sf
from dia.model import Dia

# Load the pretrained model from Hugging Face
model = Dia.from_pretrained("nari-labs/Dia-1.6B")

text = "[S1] Dia is an open weights text to dialogue model. [S2] You get full control over scripts and voices. [S1] Wow. Amazing. (laughs) [S2] Try it now on Git hub or Hugging Face."

# Generate the waveform and save it at 44.1 kHz
output = model.generate(text)
sf.write("simple.mp3", output, 44100)
```
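The example above writes the generated waveform with soundfile, a third-party library. If it isn't available, Python's standard-library `wave` module can write the samples as 16-bit PCM WAV instead. A minimal sketch, with a sine wave standing in for real model output; `save_wav` is an illustrative helper, not part of Dia:

```python
import math
import struct
import wave

def save_wav(path, samples, rate=44100):
    """Write float samples in [-1, 1] as 16-bit mono PCM using only the stdlib."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)  # mono
        f.setsampwidth(2)  # 16-bit samples
        f.setframerate(rate)
        # Clamp each sample and pack it as a little-endian signed 16-bit int.
        pcm = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
        )
        f.writeframes(pcm)

# Stand-in for model output: one second of a 440 Hz tone at 44.1 kHz.
tone = [0.5 * math.sin(2 * math.pi * 440 * t / 44100) for t in range(44100)]
save_wav("demo.wav", tone)
```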
PyPI package and CLI tools will be released soon.
Recommended Use Cases:
- Audiobooks / Novel Narration: Give each character its own voice, with emotional cues that preserve the feel of the original text.
- Podcast Voiceovers: Quickly synthesize emotive, stylized speech for interview-style podcasts.
- AI Role-Playing: Combine with an Agent for multi-character simulation systems.
- TTS Research and Fine-Tuning: Voice cloning, emotion control, non-verbal expression.
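For the role-playing case, the main integration work is mapping each agent's messages onto Dia's speaker tags. A minimal sketch; the character names and the `to_dia_script` helper are hypothetical, for illustration only:

```python
def to_dia_script(turns, max_speakers=2):
    """Map (character, line) pairs to Dia's [S1]/[S2] tag format.

    Hypothetical helper for illustration: assigns tags in order of first
    appearance, assuming the script uses at most `max_speakers` voices.
    """
    tags = {}
    pieces = []
    for character, line in turns:
        if character not in tags:
            if len(tags) >= max_speakers:
                raise ValueError("more characters than available speaker tags")
            tags[character] = f"[S{len(tags) + 1}]"
        pieces.append(f"{tags[character]} {line}")
    return " ".join(pieces)

script = to_dia_script([
    ("Narrator", "Dia is amazing!"),
    ("Fan", "Yeah, it generates laughs too! (laughs)"),
])
# → "[S1] Dia is amazing! [S2] Yeah, it generates laughs too! (laughs)"
```

The resulting string can then be passed to `model.generate` as in the Python example above.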
In Conclusion:
Dia-1.6B is the latest achievement in open-source TTS, impressing users with realistic dialogue and low resource requirements. Despite its small size (1.6B parameters), it generates high-fidelity speech comparable to ElevenLabs and Sesame, distinguishing roles and even simulating non-verbal sounds such as (coughs), (sighs), and (laughs). Free and open source, highly realistic, and with full support for multi-character and non-verbal expression, it is currently one of the most noteworthy TTS projects in the open-source domain. However, it only supports English for now; we look forward to future support for Chinese and more languages.
GitHub Project Address: https://github.com/nari-labs/dia
HF Model Address: https://huggingface.co/nari-labs/Dia-1.6B
Online Demo: https://huggingface.co/spaces/nari-labs/Dia-1.6B
If you have an AI character that you want to “speak” – and even “laugh” – Dia-1.6B is a perfect fit.
Reproduction without permission is prohibited: AI LAB » A Rising Star in Open-Source TTS! Dia-1.6B: Ultra-Realistic Dialogue Generation, Garners 6.5K Stars in Just 2 Days After Release!