Running inference on video content locally with smollm

Introduction

Yesterday I ran the following LLM.

ayousanz.hatenadiary.jp

The video inference code was merged in a PR yesterday, so I will try it out.

github.com

Development environment

  • Windows
  • uv
  • Python 3.11
  • smollm[85a4eb2dd5dd0eb4e116264f1853ae259846a957]

Setup

Install the libraries.

uv pip install transformers
uv pip install torch --index-url https://download.pytorch.org/whl/cu121
uv pip install pillow opencv-python
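
The package names passed to pip differ from the module names used in Python: pillow is imported as PIL and opencv-python as cv2. A minimal sketch for confirming that the environment can see all four modules is shown below. The helper name check_modules and the REQUIRED_MODULES tuple are made up for this example and are not part of smollm.

```python
from importlib.util import find_spec
from typing import Dict, Iterable

# Import names (not pip names) of the packages installed above.
# This tuple is an assumption for illustration, not something smollm defines.
REQUIRED_MODULES = ("transformers", "torch", "PIL", "cv2")


def check_modules(names: Iterable[str] = REQUIRED_MODULES) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be located."""
    # find_spec returns None when a top-level module cannot be found,
    # and it does not import the module itself.
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

This only checks that the modules are installed. It does not confirm that the CUDA build of torch can actually see a GPU.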

Video inference

I use the following video.
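
The merged inference script is not reproduced in this post, so the following is only a rough sketch of one common way to turn a video into a handful of images for a vision-language model: pick evenly spaced frame indices, then decode those frames with OpenCV and convert them to PIL images. The function names sample_frame_indices and extract_frames and the default of 8 frames are assumptions for illustration. The actual smollm code may sample frames differently.

```python
from typing import List


def sample_frame_indices(total_frames: int, num_frames: int) -> List[int]:
    """Return up to num_frames evenly spaced indices in [0, total_frames)."""
    if total_frames <= 0 or num_frames <= 0:
        return []
    if num_frames >= total_frames:
        return list(range(total_frames))
    step = total_frames / num_frames
    # Take the midpoint of each of the num_frames equal-length segments.
    return [int(step * i + step / 2) for i in range(num_frames)]


def extract_frames(video_path: str, num_frames: int = 8):
    """Decode the sampled frames as RGB PIL images (hypothetical helper)."""
    # Imported lazily so the index helper above works without OpenCV installed.
    import cv2
    from PIL import Image

    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise IOError(f"could not open video: {video_path}")
    try:
        total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        frames = []
        for idx in sample_frame_indices(total, num_frames):
            cap.set(cv2.CAP_PROP_POS_FRAMES, idx)  # seek to the frame
            ok, frame = cap.read()
            if not ok:
                continue  # skip frames that fail to decode
            # OpenCV returns BGR arrays; PIL expects RGB.
            frames.append(Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
        return frames
    finally:
        cap.release()
```

For example, sample_frame_indices(100, 4) selects frames 12, 37, 62, and 87, so the samples cover the whole clip instead of clustering at the start.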

The result is as follows.

Question: Describe the video
Response: User: Answer briefly.<image>Describe the video
Assistant: A green and white airplane is parked on a runway. A group of people are walking towards the airplane. A person is walking away from the airplane. The airplane is parked on a runway. The airplane is parked on a runway. The airplane is parked on a runway. The airplane is parked on a runway. The airplane is parked on a runway. The airplane is parked on a runway. The airplane is parked on a runway. The airplane is parked on a runway. The airplane is parked on a