Development environment

Windows 11
Python 3.11
GeForce RTX 4070 Ti SUPER

Setup

First, clone the stable-audio-tools repository.

Next, install its dependencies from the repository root:

pip install .

With only this install, CUDA is not recognized, so install the CUDA-enabled builds of the PyTorch libraries (here, the CUDA 12.1 wheels):

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
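
To confirm that the reinstall took effect, a quick check like the one below can help. It is a minimal sketch rather than part of the original procedure: `describe_cuda` is a hypothetical helper name, and it only relies on `torch.cuda.is_available()`, `torch.cuda.get_device_name()`, and `torch.version.cuda`.

```python
import importlib.util


def describe_cuda() -> str:
    """Return a one-line summary of whether PyTorch can see a CUDA GPU.

    Hypothetical helper for illustration; not part of stable-audio-tools.
    """
    # Avoid an ImportError when PyTorch is not installed at all.
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"

    import torch

    if not torch.cuda.is_available():
        # torch.version.cuda is None for CPU-only builds of PyTorch.
        return (
            f"CUDA not available (torch {torch.__version__}, "
            f"built for CUDA {torch.version.cuda})"
        )

    return (
        f"CUDA available: {torch.cuda.get_device_name(0)} "
        f"(torch {torch.__version__}, CUDA {torch.version.cuda})"
    )


print(describe_cuda())
```

If the message reports a CPU-only build, the CUDA wheels were probably not the ones that ended up installed in the active environment.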

To download the model, log in to Hugging Face (a token with read permission is sufficient):

huggingface-cli login

Running

Launch the Gradio interface with the following command:

python ./run_gradio.py --pretrained-name stabilityai/stable-audio-open-1.0

Enter a description of the audio you want to generate in the prompt field, then run generation.
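
The same model can also be driven from a script instead of the Gradio UI. The sketch below is based on my recollection of the usage example published for stabilityai/stable-audio-open-1.0, so treat the sampler settings (`steps`, `cfg_scale`, `sigma_min`, `sigma_max`, `sampler_type`) as assumptions to verify against the version you installed. `build_conditioning` and `generate_to_wav` are hypothetical helper names introduced here for illustration.

```python
def build_conditioning(prompt: str, seconds_total: int = 30) -> list:
    """Build the text and timing conditioning for one generated clip."""
    return [{
        "prompt": prompt,
        "seconds_start": 0,
        "seconds_total": seconds_total,
    }]


def generate_to_wav(prompt: str, out_path: str = "output.wav",
                    seconds_total: int = 30, steps: int = 100) -> str:
    """Generate audio for `prompt` and save it as a 16-bit WAV file."""
    # Heavy imports stay inside the function so that importing this
    # module does not require a GPU or trigger a model download.
    import torch
    import torchaudio
    from einops import rearrange
    from stable_audio_tools import get_pretrained_model
    from stable_audio_tools.inference.generation import generate_diffusion_cond

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Downloads the weights on first use (requires the Hugging Face login).
    model, model_config = get_pretrained_model("stabilityai/stable-audio-open-1.0")
    model = model.to(device)

    # Sampler settings below are assumed defaults, not tuned values.
    output = generate_diffusion_cond(
        model,
        steps=steps,
        cfg_scale=7,
        conditioning=build_conditioning(prompt, seconds_total),
        sample_size=model_config["sample_size"],
        sigma_min=0.3,
        sigma_max=500,
        sampler_type="dpmpp-3m-sde",
        device=device,
    )

    # Merge the batch dimension into one (channels, samples) tensor.
    output = rearrange(output, "b d n -> d (b n)")

    # Peak-normalize, clip, and convert to int16 before saving.
    output = output.to(torch.float32)
    output = output.div(output.abs().max()).clamp(-1, 1).mul(32767).to(torch.int16).cpu()
    torchaudio.save(out_path, output, model_config["sample_rate"])
    return out_path
```

Calling `generate_to_wav("128 BPM tech house drum loop")` would then write `output.wav` to the current directory, assuming the model download and the CUDA setup above both succeeded.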

Notes

When I ran this on Google Colab (L4 GPU), the installation did not get past the following output:

  Building editable for stable-audio-tools (pyproject.toml) ... done
  Created wheel for stable-audio-tools: filename=stable_audio_tools-0.0.16-0.editable-py3-none-any.whl size=4116 sha256=acea626d79508289c34cf0c48f52ea72b2c1bdd04b957dee333284faaf722f5b
  Stored in directory: /tmp/pip-ephem-wheel-cache-axzo8rji/wheels/ec/b0/ad/af15732c5c021a13bcb6f3df8110ac75670c0cad6050ee76b3
Successfully built stable-audio-tools
Installing collected packages: argparse, stable-audio-tools
  Attempting uninstall: stable-audio-tools
    Found existing installation: stable-audio-tools 0.0.16
    Uninstalling stable-audio-tools-0.0.16:
      Successfully uninstalled stable-audio-tools-0.0.16
Successfully installed argparse-1.4.0 stable-audio-tools-0.0.16
WARNING: The following packages were previously imported in this runtime:
  [argparse]
You must restart the runtime in order to use newly installed versions.

yousan's notes

Running inference for stabilityai/stable-audio-open-1.0 with stable-audio-tools
