yousan's Notes

Running cl-nagoya/shioriha-large-pt

AI

Introduction
Environment
Setup
Running

Introduction

The model has just been released, so let's try it out.

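We have released shioriha-large-pt, a model obtained by applying contrastive pre-training to Tohoku University's BERT-large with batch size 8192 and sequence length 256, using weakly supervised data such as Japanese Wikipedia and MMARCO.

There are still very few examples of Japanese contrastive pre-training for sentence embeddings, so we will keep working hard!! https://t.co/sBB9CiFhxT
— hpp (@hpp_ricecake) March 12, 2024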

Environment

L4 GPU
Ubuntu 22.04

Setup

Install the required libraries.

!pip install -U sentence-transformers
!pip install fugashi
!pip install unidic_lite
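
As a quick sanity check after installing, the sketch below reports which of the three packages Python cannot find. The helper name `missing_packages` is hypothetical (it is not part of any of these libraries); it relies only on `importlib.util.find_spec` from the standard library and does not actually import the packages.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level module names that Python cannot locate."""
    return [name for name in names if find_spec(name) is None]


# Import names of the packages installed above.
required = ["sentence_transformers", "fugashi", "unidic_lite"]

missing = missing_packages(required)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required packages were found.")
```

If anything is listed as not installed, rerunning the matching `pip install` line is the first thing to try.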

Running

Run the sample code.

from sentence_transformers import SentenceTransformer

# Input sentences to embed
sentences = ["This is an example sentence", "Each sentence is converted"]

# Load the model by name and encode each sentence into a vector
model = SentenceTransformer('cl-nagoya/shioriha-large-pt')
embeddings = model.encode(sentences)
print(embeddings)

The output looks like this:

[[ 0.239826    0.48606512 -0.4585114  ...  0.12551573  0.37342685
   0.6000814 ]
 [-0.33355263  0.3741393  -0.8048501  ...  0.20616896  0.27591145
  -0.36157683]]
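
Each row of the printed array is the embedding of one input sentence. A common next step is to compare two rows by cosine similarity. Here is a minimal sketch using only the standard library; the toy vectors `u` and `v` are made-up stand-ins for rows of `embeddings`, not real model output, and the function assumes neither vector is all zeros.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for two rows of `embeddings` (illustrative only).
u = [1.0, 0.0, 1.0]
v = [1.0, 1.0, 0.0]

print(cosine_similarity(u, v))
```

With the real output you would call `cosine_similarity(embeddings[0], embeddings[1])`. The `sentence_transformers.util.cos_sim` function computes the same quantity.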