Introduction

Qwen/Qwen-72B has been released, but I don't have enough GPU RAM to run it, so I'll try the 7B version below instead.
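To put rough numbers on the GPU RAM problem, here is a back-of-the-envelope sketch. It assumes nominal parameter counts of 72 billion and 7 billion, counts the weights only (no activations, KV cache, or framework overhead), and the helper name `weight_memory_gb` is just something I made up for illustration. The 4-bit row corresponds to the Int4 checkpoint loaded later in this post.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed for the weights alone, in GB (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Nominal parameter counts (assumption); real checkpoints differ slightly.
for name, params, bits in [
    ("Qwen-72B, 16-bit", 72e9, 16),
    ("Qwen-7B, 16-bit", 7e9, 16),
    ("Qwen-7B, 4-bit", 7e9, 4),
]:
    print(f"{name}: ~{weight_memory_gb(params, bits):.1f} GB")
```

Even as a lower bound, the 72B weights at 16 bits are far beyond a single consumer or free-tier GPU, while a 4-bit 7B model is small enough to be worth trying.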

Environment

Google Colab
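This post doesn't record which Colab runtime was used, so before going further it may be worth checking what GPU, if any, is attached. The following is a minimal sketch that assumes the `nvidia-smi` command is on the PATH when a GPU runtime is selected; `describe_gpu` is a hypothetical helper name.

```python
import shutil
import subprocess


def describe_gpu() -> str:
    """Return the GPU name and total memory from nvidia-smi, or a note if none is found."""
    fallback = "no NVIDIA GPU detected"
    if shutil.which("nvidia-smi") is None:
        return fallback
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return fallback
    return result.stdout.strip() or fallback


print(describe_gpu())
```

If this reports no GPU, switching the Colab runtime type to a GPU is probably the first thing to try.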

Setup

Install the libraries.

!pip install transformers==4.32.0 accelerate tiktoken einops scipy transformers_stream_generator==0.0.4 peft deepspeed
!pip install auto-gptq optimum

Inference

Load the tokenizer and the model.

from transformers import AutoTokenizer, AutoModelForCausalLM

# Note: The default behavior now has injection attack prevention off.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat-Int4", trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat-Int4",
    device_map="auto",
    trust_remote_code=True
).eval()

First, let's start with a greeting ("こんにちは", "Hello").

response, history = model.chat(tokenizer, "こんにちは", history=None)
print(response)

Next, I'll ask my usual question: "Who is the cutest character in Madoka Magica?"

response, history = model.chat(tokenizer, "まどマギで一番可愛いキャラはなんですか？", history=None)
print(response)

It doesn't seem to have learned much about Japanese culture.
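Both calls above pass `history=None`, so each question is asked in a fresh conversation. Since `model.chat` returns the updated history along with the response, a follow-up question can be asked in the same conversation by passing that history back in. Here is a minimal sketch; `run_conversation` is a hypothetical helper of my own, not part of Qwen or transformers.

```python
def run_conversation(model, tokenizer, prompts):
    """Send prompts in order, feeding each turn's history into the next call."""
    history = None  # None starts a new conversation
    responses = []
    for prompt in prompts:
        # chat() returns the reply and the history including this turn
        response, history = model.chat(tokenizer, prompt, history=history)
        responses.append(response)
    return responses, history
```

With the model and tokenizer loaded above, this could be called as `run_conversation(model, tokenizer, ["こんにちは", "まどマギで一番可愛いキャラはなんですか？"])` to ask both questions in a single conversation.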

yousan's notes

Running Qwen/Qwen-7B-Chat-Int4 on Google Colab
