Introduction
It's a bit late to the party, but let's try running the AWQ version of CALM2.
Environment
Google Colab (GPU runtime, Python 3.10)
Preparation
Install the required libraries:
```
!pip -q install --upgrade accelerate autoawq
!pip install torch==2.1.0+cu121 torchtext==0.16.0+cpu torchdata==0.7.0 --index-url https://download.pytorch.org/whl/cu121
```
Installing autoawq on its own failed with the error below, so I followed the workaround from an issue.
```
ImportError                               Traceback (most recent call last)
<ipython-input-2-e1b236244288> in <cell line: 1>()
----> 1 from awq import AutoAWQForCausalLM
      2 from transformers import AutoTokenizer
      3
      4 model_name_or_path = "TheBloke/calm2-7B-chat-AWQ"
      5

5 frames
/usr/local/lib/python3.10/dist-packages/awq/modules/linear.py in <module>
      2 import torch
      3 import torch.nn as nn
----> 4 import awq_inference_engine  # with CUDA kernels
      5
      6

ImportError: libcudart.so.12: cannot open shared object file: No such file or directory

---------------------------------------------------------------------------
NOTE: If your import is failing due to a missing package, you
can manually install dependencies using either !pip or !apt.

To view examples of installing some common dependencies, click the
"Open Examples" button below.
---------------------------------------------------------------------------
```
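The error says the AWQ CUDA kernels can't find libcudart.so.12, i.e. the CUDA 12 runtime, which is why the workaround reinstalls the cu121 build of torch. As a minimal sanity check after the reinstall (this is my addition, not part of the issue's instructions), you can confirm the CUDA 12.1 build is the one actually loaded:

```python
# Sanity check: the ImportError above comes from awq_inference_engine
# failing to find libcudart.so.12, so torch should report a cu121 build.
import torch

print(torch.__version__)          # expect "2.1.0+cu121"
print(torch.version.cuda)         # expect "12.1"
print(torch.cuda.is_available())  # expect True on a GPU runtime
```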
Inference
```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

print("start")

model_name_or_path = "TheBloke/calm2-7B-chat-AWQ"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=False)

# Load model
model = AutoAWQForCausalLM.from_quantized(model_name_or_path, fuse_layers=True,
                                          trust_remote_code=False, safetensors=True)

prompt = "Tell me about AI"
prompt_template = f'''USER: {prompt}
ASSISTANT: '''

print("*** Running model.generate:")

token_input = tokenizer(
    prompt_template,
    return_tensors='pt'
).input_ids.cuda()

# Generate output
generation_output = model.generate(
    token_input,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    max_new_tokens=512
)

# Get the tokens from the output, decode them, print them
token_output = generation_output[0]
text_output = tokenizer.decode(token_output)
print("LLM output: ", text_output)

"""
# Inference should be possible with transformers pipeline as well in future
# But currently this is not yet supported by AutoAWQ (correct as of September 25th 2023)
from transformers import pipeline

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1
)

print(pipe(prompt_template)[0]['generated_text'])
"""
```
Let me also ask my usual question, as sketched below.
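Swapping a Japanese prompt into the same USER/ASSISTANT template looks like this. The question below is a placeholder standing in for the real one, and the snippet assumes the model and tokenizer from the cell above are still loaded:

```python
# Reuses `model` and `tokenizer` from the inference cell above.
# The question is a placeholder, not the author's actual "usual question".
prompt = "日本の首都はどこですか？"
prompt_template = f'''USER: {prompt}
ASSISTANT: '''

token_input = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()

generation_output = model.generate(
    token_input,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    max_new_tokens=512
)

print(tokenizer.decode(generation_output[0]))
```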
Resources required for inference
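As a rough way to see what the 4-bit AWQ model actually consumes, you can read torch's CUDA memory counters after generation (a minimal sketch; the exact numbers depend on the runtime and prompt length):

```python
# Rough GPU memory measurement; run in the same session as the cells above.
import torch

print(f"allocated: {torch.cuda.memory_allocated() / 1024**3:.2f} GiB")
print(f"peak:      {torch.cuda.max_memory_allocated() / 1024**3:.2f} GiB")

# Alternatively, in Colab:
# !nvidia-smi
```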