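Introduction

A new TTS and voice cloning model has been released, so I'm going to try it out. At the moment, it looks like nothing on the training side is supported yet.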

Today, we're excited to announce a beta release of Zonos, a highly expressive TTS model with high fidelity voice cloning.

We release both transformer and SSM-hybrid models under an Apache 2.0 license.

Zonos performs well vs leading TTS providers in quality and expressiveness. pic.twitter.com/jaliZNJecm
— Zyphra (@ZyphraAI) February 10, 2025

Issue about the training code

github.com

Development environment

Windows 11
Zonos (de8d4d84f3fc83da7635b4741e5f1c3f1bf233aa)

Running it

The code did not work for me as cloned, so I modified docker-compose.yml as follows.

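version: '3.8'

services:
  zonos:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: zonos_container
    # Use the NVIDIA container runtime so the GPU is visible inside the container
    runtime: nvidia
    # Publish the Gradio UI on host port 7860
    ports:
      - "7860:7860"
    stdin_open: true
    tty: true
    command: ["python3", "gradio_interface.py"]
    environment:
      # Expose only GPU 0 to the container
      - NVIDIA_VISIBLE_DEVICES=0
      # Keep Gradio's public share link disabled
      - GRADIO_SHARE=False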
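
Since the docker-compose.yml above maps container port 7860 to host port 7860, the Gradio UI should be reachable on the host at that port once the container is up. The following is a minimal sketch for checking this from the host side. The helper name `is_port_open` and the use of `127.0.0.1` are my own assumptions for illustration and are not part of Zonos. It only checks that something accepts TCP connections on the port, not that the UI itself is healthy.

```python
import socket


def is_port_open(host: str = "127.0.0.1", port: int = 7860, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper for illustration; 7860 is the port published in docker-compose.yml.
    """
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if is_port_open():
        print("Something is listening on port 7860; try http://localhost:7860 in a browser")
    else:
        print("Nothing is listening on port 7860 yet")
```

If this reports that nothing is listening even though the container is running, the `ports` entry is the first thing I would check.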

The command to run it is as follows.