Running Nous-Hermes-2 Vision on Google Colab

Introduction

The following multimodal model was recently released, so let's try it out:

https://github.com/qnguyen3/hermes-llava

Environment

Google Colab (a GPU runtime is assumed, since flash-attn requires CUDA).

Setup

!git clone https://github.com/qnguyen3/hermes-llava.git
%cd hermes-llava
!pip install --upgrade pip  # enable PEP 660 support
!pip install -e .
!pip install -e ".[train]"
!pip install flash-attn --no-build-isolation
!pip install transformers==4.34.1
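After the installs finish (and the runtime restarts, if Colab asks for one), it's worth a quick sanity check that the packages are actually importable. This is a minimal sketch; the `check_installed` helper is my own, not part of the repo:

```python
import importlib.util

def check_installed(packages):
    """Return a dict mapping each package name to whether it is importable."""
    return {pkg: importlib.util.find_spec(pkg) is not None for pkg in packages}

# In Colab, after the installs above, both should map to True:
print(check_installed(["llava", "transformers"]))
```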

Running the model

I'll test with the following image (view.jpg, the pier scene from the LLaVA demo page).

It looks like the model can be run both from Python and from the CLI, so I'll try both.

First, load the model:

from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model

model_path = "NousResearch/Nous-Hermes-2-Vision"

tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=None,
    model_name=get_model_name_from_path(model_path)
)
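With the model loaded, `eval_model` can run a single query. As a sketch, the argument bundle below follows the upstream LLaVA README's `eval_model` example; the field names may differ in the hermes-llava fork, and `build_eval_args` is a hypothetical helper of my own:

```python
from types import SimpleNamespace

def build_eval_args(model_path, model_name, query, image_file):
    """Bundle the arguments LLaVA's eval_model expects into one namespace.

    Field names follow the upstream LLaVA README example; the
    hermes-llava fork may differ.
    """
    return SimpleNamespace(
        model_path=model_path,
        model_base=None,
        model_name=model_name,
        query=query,
        conv_mode=None,       # let the library pick a conversation template
        image_file=image_file,
        sep=",",
        temperature=0.0,      # greedy decoding
        top_p=None,
        num_beams=1,
        max_new_tokens=512,
    )

args = build_eval_args(
    model_path="NousResearch/Nous-Hermes-2-Vision",
    model_name="Nous-Hermes-2-Vision",  # i.e. get_model_name_from_path(model_path)
    query="What is in this image?",
    image_file="https://llava-vl.github.io/static/images/view.jpg",
)
# In Colab, with the repo installed:
# eval_model(args)
```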

CLI

!python -m llava.serve.cli \
    --model-path liuhaotian/llava-v1.5-7b \
    --image-file "https://llava-vl.github.io/static/images/view.jpg" \
    --load-4bit

USER:

what is look it?

ASSISTANT:

The image features a pier extending over a body of water, with a mountain in the background. The pier is made of wood and appears to be a docking area for boats. The scene is serene and picturesque, with the mountain and the water creating a beautiful natural setting.
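Note that the CLI command above points at the stock `liuhaotian/llava-v1.5-7b` weights. To run the Nous checkpoint itself through the same CLI, the invocation would presumably look like this (assuming the fork's CLI accepts the same flags as upstream LLaVA; I have not verified this run):

```shell
python -m llava.serve.cli \
    --model-path NousResearch/Nous-Hermes-2-Vision \
    --image-file "https://llava-vl.github.io/static/images/view.jpg" \
    --load-4bit
```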