Introduction
A multimodal model has apparently been released, so let's try it out.
Announcing Nous Hermes 2.5 Vision! @NousResearch's latest release builds on my Hermes 2.5 model, adding powerful new vision capabilities thanks to @stablequan!
Download: https://t.co/AC2AKpRBY7
Prompt the LLM with an Image!
Function Calling on Visual Information!
SigLIP… pic.twitter.com/2XfDGLNsqH
— Teknium (e/λ) (@Teknium1) December 3, 2023
Setup
Basically, running it the same way as shown below seemed to work fine.
!git clone https://github.com/qnguyen3/hermes-llava.git
%cd hermes-llava
!pip install --upgrade pip  # enable PEP 660 support
!pip install -e .
!pip install -e ".[train]"
!pip install flash-attn --no-build-isolation
!pip install transformers==4.34.1
Running it
The following image will be used for inference.
Loading the model
from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model

model_path = "NousResearch/Nous-Hermes-2-Vision-Alpha"

tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=None,
    model_name=get_model_name_from_path(model_path)
)
Running inference
!python -m llava.serve.cli \
    --model-path NousResearch/Nous-Hermes-2-Vision-Alpha \
    --image-file "https://llava-vl.github.io/static/images/view.jpg" \
    --load-4bit
Human: what is look it?

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.
Assistant:
The image showcases a serene lakeside scene. A wooden dock extends into the calm waters of the lake. The lake is surrounded by a forested area, with tall trees lining the shore. In the distance, a mountain range can be seen, adding to the scenic beauty of the location. The sky is clear, suggesting a bright and sunny day