Introduction
The following multimodal model has been released, so I'm going to try it out.
Environment
- Google Colab
Setup
!git clone https://github.com/qnguyen3/hermes-llava.git
%cd hermes-llava
!pip install --upgrade pip  # enable PEP 660 support
!pip install -e .
!pip install -e ".[train]"
!pip install flash-attn --no-build-isolation
!pip install transformers==4.34.1
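Because `pip install -e ".[train]"` may pull in its own `transformers` before the explicit `transformers==4.34.1` pin, it can be worth confirming which version actually ended up installed. A minimal sketch using only the standard library's `importlib.metadata`; the `check_pins` helper and the `PINNED` dict are hypothetical names for illustration, not part of hermes-llava:

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical: the pins worth double-checking after the setup cell.
PINNED = {"transformers": "4.34.1"}


def check_pins(pins):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for pkg, expected in pins.items():
        try:
            installed = version(pkg)  # installed distribution version string
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[pkg] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    # An empty dict means every pin matches what is installed.
    print(check_pins(PINNED))
```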
Running it
I'll test it with the following image.
It looks like it can be run from both a CLI and a GUI, so I'll try both.
First, load the model.
from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model

# Hugging Face model ID of the vision model being tested
model_path = "NousResearch/Nous-Hermes-2-Vision"

# Returns the tokenizer, the model, the image preprocessor, and the context length
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=None,  # full weights, no separate base model
    model_name=get_model_name_from_path(model_path)
)
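The cell above imports `eval_model` but never calls it. In upstream LLaVA, the README runs `eval_model` on an ad-hoc args object; the sketch below builds one with `types.SimpleNamespace`, assuming the hermes-llava fork's `eval_model` reads the same fields as upstream LLaVA v1.5 (an unverified assumption). `make_eval_args` is a hypothetical helper, and the prompt is illustrative. As far as I know, upstream `eval_model` loads the model again from `model_path` itself, so it would not reuse the `model` object loaded above.

```python
from types import SimpleNamespace


def make_eval_args(model_path, image_file, query, *, model_name=None,
                   temperature=0.0, max_new_tokens=512):
    """Hypothetical helper: an args object shaped like the one in upstream
    LLaVA's README example (assumes hermes-llava reads the same fields)."""
    return SimpleNamespace(
        model_path=model_path,
        model_base=None,
        model_name=model_name,
        query=query,
        conv_mode=None,  # let eval_model infer the conversation template
        image_file=image_file,
        sep=",",  # separator used when image_file lists several images
        temperature=temperature,
        top_p=None,
        num_beams=1,
        max_new_tokens=max_new_tokens,
    )


args = make_eval_args(
    "NousResearch/Nous-Hermes-2-Vision",
    "https://llava-vl.github.io/static/images/view.jpg",
    "What is shown in this image?",
)

# In the Colab session set up above (not executed here):
# from llava.mm_utils import get_model_name_from_path
# from llava.eval.run_llava import eval_model
# args.model_name = get_model_name_from_path(args.model_path)
# eval_model(args)
```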
CLI
!python -m llava.serve.cli \
    --model-path liuhaotian/llava-v1.5-7b \
    --image-file "https://llava-vl.github.io/static/images/view.jpg" \
    --load-4bit
USER :
what is look it?
ASSISTANT:
The image features a pier extending over a body of water, with a mountain in the background. The pier is made of wood and appears to be a docking area for boats. The scene is serene and picturesque, with the mountain and the water creating a beautiful natural setting.
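Note that the CLI command above points at `liuhaotian/llava-v1.5-7b`, not `NousResearch/Nous-Hermes-2-Vision`. To run the same CLI outside a notebook `!` cell, or to swap in another model path, the argument list can be built in Python. A minimal sketch using only the three flags shown in the command above; `build_cli_argv` is a hypothetical helper, and whether the Nous model works through this CLI is an untested assumption. Because the CLI prompts interactively (`USER :`), running it is best done from a terminal.

```python
import shlex
import sys


def build_cli_argv(model_path, image_file, load_4bit=True):
    """Hypothetical helper: argv for `python -m llava.serve.cli`,
    using only the flags that appear in the command above."""
    argv = [sys.executable, "-m", "llava.serve.cli",
            "--model-path", model_path,
            "--image-file", image_file]
    if load_4bit:
        argv.append("--load-4bit")  # load weights in 4-bit to save GPU memory
    return argv


argv = build_cli_argv("liuhaotian/llava-v1.5-7b",
                      "https://llava-vl.github.io/static/images/view.jpg")
print(shlex.join(argv))  # shell-quoted command line, for copy-pasting

# To start the interactive session (needs the environment set up above):
# import subprocess
# subprocess.run(argv, check=True)
```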