NousResearch/Nous-Hermes-2-Vision-AlphaをGoogleColobで動かす

初めに

マルチモーダルのモデルが出たみたいなので、触っていきます

準備

基本的には、以下と同じ動かし方で大丈夫みたいでした

ayousanz.hatenadiary.jp

!git clone https://github.com/qnguyen3/hermes-llava.git
%cd hermes-llava
!pip install --upgrade pip  # enable PEP 660 support
!pip install -e .
!pip install -e ".[train]"
!pip install flash-attn --no-build-isolation
!pip install transformers==4.34.1

実行

推論で使い画像は、以下を使用します

モデルのロード

from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model

model_path = "NousResearch/Nous-Hermes-2-Vision-Alpha"

tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=None,
    model_name=get_model_name_from_path(model_path)
)

推論の実行

!python -m llava.serve.cli \
    --model-path NousResearch/Nous-Hermes-2-Vision-Alpha \
    --image-file "https://llava-vl.github.io/static/images/view.jpg" \
    --load-4bit

Human:

what is look it?
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.

Assistant:

The image showcases a 
serene lakeside scene. A wooden dock extends into the calm waters of the lake. The lake is surrounded by a forested area, with tall trees lining the shore. In the distance, a mountain range can be seen, adding to the scenic beauty of the location. The sky is clear, suggesting a bright and sunny day