Introduction

It turns out you can build an MoE (Mixture of Experts) model yourself, so I tried it.

I built an MoE model with mergekit.
・rinna/youri-7b-instruction
・rinna/youri-7b-chat

I don't know how much is gained by joining a chat model with an instruction model, but I confirmed that it runs.
I'll try JGLUE if I have time. https://t.co/p9IPsy3yww
Posted by はち (@CurveWeb), January 13, 2024

References

Building an MoE with Mergekit (MergekitでMoEを作る)

note.com

Merging LLMs with Mergekit (MergekitでLLM同士をマージする)

ayousanz.hatenadiary.jp

The model I created is published here:

huggingface.co

Environment

L4 GPU
Ubuntu 22.04

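Setup

Cloning the repository and installing the libraries

At the time of writing, MoE support has not been merged into the main branch, so clone the mixtral branch explicitly.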

!git clone -b mixtral https://github.com/cg123/mergekit.git
%cd mergekit 
!python3 -m pip install --upgrade pip
!pip install -q -e .

Creating the required folders

# Create the output and config folders
!mkdir output
!mkdir config
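
Running !mkdir a second time fails once the folders exist, so a Python equivalent that tolerates reruns can be handy. The sketch below is only an illustration; ensure_dirs is a hypothetical helper name, and the demo runs inside a temporary directory so that it does not touch the working directory.

```python
import tempfile
from pathlib import Path


def ensure_dirs(base, names=("output", "config")):
    """Create each folder under base, doing nothing if it already exists."""
    created = []
    for name in names:
        path = Path(base) / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


# Demo in a throwaway directory: calling the helper twice raises no error.
with tempfile.TemporaryDirectory() as tmp:
    first = ensure_dirs(tmp)
    second = ensure_dirs(tmp)
    all_exist = all(p.is_dir() for p in second)
```

In the notebook this would be called as ensure_dirs(".") from inside the mergekit folder.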

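Creating the config file

gate_mode can be set to one of the following three values:

    "hidden": uses the hidden-state vectors computed for the given prompts at each layer.
    "cheap_embed": uses the average of the prompts' token embeddings.
    "random": initializes the gates randomly.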

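To make the "cheap_embed" idea more concrete, here is a toy sketch in plain Python. It is not mergekit's implementation: the embedding table, the token ids, and the assumption that each expert gets a single gate vector compared by dot product are all made up for illustration.

```python
# Toy illustration only: NOT mergekit's implementation.
# The embedding table and token ids below are invented for the example.

def mean_embedding(token_ids, embedding_table):
    """Average the embedding vectors of the given tokens."""
    dim = len(embedding_table[0])
    total = [0.0] * dim
    for tid in token_ids:
        for i, value in enumerate(embedding_table[tid]):
            total[i] += value
    return [x / len(token_ids) for x in total]


def route(hidden, gate_vectors):
    """Score each expert by a dot product with its gate vector; return the best index and all scores."""
    scores = [sum(h * g for h, g in zip(hidden, gate)) for gate in gate_vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Hypothetical 4-token vocabulary with 2-dimensional embeddings.
embedding_table = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8]]

# One gate vector per expert, built from that expert's tokenized positive prompt.
gate_vectors = [
    mean_embedding([0, 1], embedding_table),  # expert 0
    mean_embedding([2, 3], embedding_table),  # expert 1
]

# An input that resembles expert 1's prompt is routed to expert 1.
expert_index, scores = route([0.1, 0.9], gate_vectors)
```

The point is only that the positive prompts determine which inputs each expert is preferred for, which is why the prompts in the config should describe the tasks each expert is good at.
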
Run the following to create config.yaml.

import yaml

MODEL_NAME = "ca-youri-7B-merge-MoE-slerp"
yaml_config = """
base_model: cyberagent/calm2-7b-chat
gate_mode: hidden
dtype: bfloat16
experts:
  - source_model: cyberagent/calm2-7b-chat
    positive_prompts: 
      - "質問と回答の選択肢を入力として受け取り、選択肢から回答を選択してください。"
      - "前提と仮説の関係を含意、矛盾、中立の中から回答してください。"
      - "以下のテキストを、ポジティブまたはネガティブの感情クラスのいずれかに分類してください。"
      - "以下は、タスクを説明する指示と、文脈のある入力の組み合わせです。要求を適切に満たす応答を書きなさい。"
  - source_model: rinna/youri-7b-instruction
    positive_prompts: 
     - "質問に対する回答を題名と文章から一言で抽出してください。回答は名詞で答えてください。"
     - "与えられたニュース記事を要約してください。"
     - "与えられた文が文法的であるかを回答してください。"
"""

# Save config as yaml file
with open('config/config.yaml', 'w', encoding="utf-8") as f:
    f.write(yaml_config)
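
Before starting a long merge, it may be worth sanity-checking the config. The following is a minimal sketch; check_moe_config is a hypothetical helper (not part of mergekit), and the rule that every expert needs positive_prompts unless gate_mode is "random" is my assumption rather than something taken from mergekit's own validation.

```python
# Hypothetical helper, not part of mergekit: sanity-check a parsed config dict.
VALID_GATE_MODES = ("hidden", "cheap_embed", "random")


def check_moe_config(config):
    """Return a list of problems found in the config (empty if none)."""
    problems = []
    for key in ("base_model", "gate_mode", "experts"):
        if key not in config:
            problems.append(f"missing key: {key}")
    gate_mode = config.get("gate_mode")
    if gate_mode is not None and gate_mode not in VALID_GATE_MODES:
        problems.append(f"unknown gate_mode: {gate_mode}")
    for i, expert in enumerate(config.get("experts", [])):
        if "source_model" not in expert:
            problems.append(f"expert {i}: missing source_model")
        # Assumption: prompts only matter when the gate is derived from them.
        if gate_mode != "random" and not expert.get("positive_prompts"):
            problems.append(f"expert {i}: no positive_prompts")
    return problems


# Same shape as the config above, with placeholder prompts.
sample_config = {
    "base_model": "cyberagent/calm2-7b-chat",
    "gate_mode": "hidden",
    "dtype": "bfloat16",
    "experts": [
        {"source_model": "cyberagent/calm2-7b-chat", "positive_prompts": ["example prompt"]},
        {"source_model": "rinna/youri-7b-instruction", "positive_prompts": ["example prompt"]},
    ],
}
problems = check_moe_config(sample_config)
```

In the notebook it could be applied to the real config with check_moe_config(yaml.safe_load(yaml_config)).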

Merging the LLMs

Run the merge with the following command.

# Run the merge
!python mergekit/scripts/mixtral_moe.py config/config.yaml output -v

Options

Options:
  --load-in-4bit                  Load model in 4bit for computing hidden
                                  states
  --load-in-8bit                  Load model in 8bit for computing hidden
                                  states
  --device TEXT                   Device to use to compute embeddings
                                  [default: auto]
  -v, --verbose                   Verbose logging
  --allow-crimes / --no-allow-crimes
                                  Allow mixing architectures  [default: no-
                                  allow-crimes]
  --transformers-cache TEXT       Override storage path for downloaded models
  --lora-merge-cache TEXT         Path to store merged LORA models
  --cuda / --no-cuda              Perform matrix arithmetic on GPU  [default:
                                  no-cuda]
  --low-cpu-memory / --no-low-cpu-memory
                                  Store results and intermediate values on
                                  GPU. Useful if VRAM > RAM  [default: no-low-
                                  cpu-memory]
  --out-shard-size SIZE           Number of parameters per output shard
                                  [default: 5B]
  --copy-tokenizer / --no-copy-tokenizer
                                  Copy a tokenizer to the output  [default:
                                  copy-tokenizer]
  --clone-tensors / --no-clone-tensors
                                  Clone tensors before saving, to allow
                                  multiple occurrences of the same layer
                                  [default: no-clone-tensors]
  --trust-remote-code / --no-trust-remote-code
                                  Trust remote code from huggingface repos
                                  (danger)  [default: no-trust-remote-code]
  --random-seed INTEGER           Seed for reproducible use of randomized
                                  merge methods
  --lazy-unpickle / --no-lazy-unpickle
                                  Experimental lazy unpickler for lower memory
                                  usage  [default: no-lazy-unpickle]
  --write-model-card / --no-write-model-card
                                  Output README.md containing details of the
                                  merge  [default: write-model-card]
  --safe-serialization / --no-safe-serialization
                                  Save output in safetensors. Do this, don't
                                  poison the world with more pickled models.
                                  [default: safe-serialization]
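
If you switch options often, it can help to assemble the command in Python instead of editing the shell line by hand. This is a rough sketch: build_merge_command is a hypothetical helper, and it only covers --load-in-4bit, --cuda, and -v from the option list above.

```python
import shlex


def build_merge_command(config_path, out_dir, *, load_in_4bit=False, cuda=False, verbose=True):
    """Build the argument list for the mixtral_moe.py script used above."""
    cmd = ["python", "mergekit/scripts/mixtral_moe.py", config_path, out_dir]
    if load_in_4bit:
        cmd.append("--load-in-4bit")  # load models in 4bit when computing hidden states
    if cuda:
        cmd.append("--cuda")  # perform matrix arithmetic on the GPU
    if verbose:
        cmd.append("-v")
    return cmd


command = build_merge_command("config/config.yaml", "output", load_in_4bit=True, cuda=True)
command_line = shlex.join(command)  # a single string that can be pasted after "!"
```

The list form can be passed to subprocess.run(command, check=True), and command_line is the equivalent shell string.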

Publishing the model to Hugging Face

Creating the model card

Install huggingface_hub and run the following to create the model card.

!pip install -qU huggingface_hub

from huggingface_hub import ModelCard, ModelCardData
from jinja2 import Template

username = "user id"

template_text = """
---
license: llama2
tags:
- merge
- mergekit
- lazymergekit
{%- for model in models %}
- {{ model }}
{%- endfor %}
---

# {{ model_name }}

{{ model_name }} is a merge of the following models using [mergekit](https://github.com/cg123/mergekit):

{%- for model in models %}
* [{{ model }}](https://huggingface.co/{{ model }})
{%- endfor %}

## 🧩 Configuration

"""

# Create a Jinja template object
jinja_template = Template(template_text.strip())

# Get the list of expert models from the config
data = yaml.safe_load(yaml_config)
models = [expert["source_model"] for expert in data["experts"]]

# Fill the template
content = jinja_template.render(
    model_name=MODEL_NAME,
    models=models,
    yaml_config=yaml_config,
    username=username,
)

# Save the model card
card = ModelCard(content)
card.save('output/README.md')

Uploading the model

Upload the model with the following.

from huggingface_hub import HfApi

# Defined in the secrets tab in Google Colab
api = HfApi(token="api key")

api.create_repo(
    repo_id=f"{username}/{MODEL_NAME}",
    repo_type="model"
)
api.upload_folder(
    repo_id=f"{username}/{MODEL_NAME}",
    folder_path="output",
)
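
Rather than pasting the access token into the notebook, it could be read from an environment variable. A small sketch, assuming the token is stored in HF_TOKEN; get_hf_token is a hypothetical helper name.

```python
import os


def get_hf_token(env_var="HF_TOKEN", environ=None):
    """Read the access token from the environment and fail clearly if it is missing."""
    environ = os.environ if environ is None else environ
    token = environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to a Hugging Face access token."
        )
    return token
```

The upload code above would then use HfApi(token=get_hf_token()) instead of a hard-coded string.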

yousan's notes (yousanのメモ)

Building a Japanese MoE model from calm2-7b-chat and youri-7b-instruction with Mergekit
