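Introduction
Microsoft has released OmniParser, a model and library that understands the content of screenshots, so I will try it out here. To set up the environment, I will build a Docker environment.
I have published the Docker environment in the repository below.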
Development environment
- Windows
- Docker
Details
The master branch did not include a Docker environment, so I create one with reference to the following.
Create a Dockerfile and download.py.
# Dockerfile for OmniParser with GPU and OpenGL support.
#
# Base: nvidia/cuda:12.3.1-devel-ubuntu22.04
# Features:
# - Python 3.12 with Miniconda environment.
# - Git LFS for large file support.
# - Required libraries: OpenCV, Hugging Face, Gradio, OpenGL.
# - Gradio server on port 7861.
#
# 1. Build the image with CUDA support.
# ```
# sudo docker build -t omniparser .
# ```
#
# 2. Run the Docker container with GPU access and port mapping for Gradio.
# ```bash
# sudo docker run -d -p 7861:7861 --gpus all --name omniparser-container omniparser
# ```
#
# Author: Richard Abrich (richard@openadapt.ai)
FROM nvidia/cuda:12.3.1-devel-ubuntu22.04
# Install system dependencies with explicit OpenGL libraries
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y \
git-lfs \
wget \
libgl1 \
libglib2.0-0 \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* \
&& git lfs install
# Install Miniconda for Python 3.12
RUN wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh && \
bash miniconda.sh -b -p /opt/conda && \
rm miniconda.sh
ENV PATH="/opt/conda/bin:$PATH"
# Create and activate Conda environment with Python 3.12, and set it as the default
RUN conda create -y -n omni python=3.12 && \
echo "source activate omni" > ~/.bashrc
ENV CONDA_DEFAULT_ENV=omni
ENV PATH="/opt/conda/envs/omni/bin:$PATH"
# Set the working directory in the container
WORKDIR /usr/src/app
# Copy project files and requirements
COPY . .
COPY requirements.txt /usr/src/app/requirements.txt
# Initialize Git LFS and pull LFS files
RUN git lfs install && \
git lfs pull
# Install dependencies from requirements.txt with specific opencv-python-headless version
RUN . /opt/conda/etc/profile.d/conda.sh && conda activate omni && \
pip uninstall -y opencv-python opencv-python-headless && \
pip install --no-cache-dir opencv-python-headless==4.8.1.78 && \
pip install -r requirements.txt && \
pip install huggingface_hub
# Run download.py to fetch model weights and convert safetensors to .pt format
RUN . /opt/conda/etc/profile.d/conda.sh && conda activate omni && \
python download.py && \
echo "Contents of weights directory:" && \
ls -lR weights && \
python weights/convert_safetensor_to_pt.py
# Expose the default Gradio port
EXPOSE 7861
# Configure Gradio to be accessible externally
ENV GRADIO_SERVER_NAME="0.0.0.0"
# download.py
import os

from huggingface_hub import snapshot_download

# Set the repository name
repo_id = "microsoft/OmniParser"

# Set the local directory where you want to save the files
local_dir = "weights"

# Create the local directory if it doesn't exist
os.makedirs(local_dir, exist_ok=True)

# Download the entire repository
snapshot_download(repo_id, local_dir=local_dir, ignore_patterns=["*.md"])

print(f"All files and folders have been downloaded to {local_dir}")
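Before the conversion step runs, it can help to confirm what actually landed in the weights directory. The following is a minimal sketch and not part of the OmniParser repository: the helper name `list_downloaded_files` is hypothetical, and the only assumption taken from the article is the `weights` directory name.

```python
import os


def list_downloaded_files(local_dir):
    """Return sorted (relative_path, size_in_bytes) pairs for files under local_dir."""
    results = []
    for root, _dirs, files in os.walk(local_dir):
        for name in files:
            full_path = os.path.join(root, name)
            rel_path = os.path.relpath(full_path, local_dir)
            results.append((rel_path, os.path.getsize(full_path)))
    return sorted(results)


if __name__ == "__main__":
    # "weights" matches local_dir in download.py; nothing is printed
    # if the directory does not exist yet.
    for rel_path, size in list_downloaded_files("weights"):
        print(f"{size:>12}  {rel_path}")
```

Running it inside the container gives a flat listing similar to what `ls -lR weights` prints during the build, but in a form that is easy to compare between runs.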
Build the image and run the container.
docker build -t omniparser .
docker run -it -p 7860:7860 omniparser
docker run -it --gpus all -p 7861:7861 --name omniparser-container omniparser /bin/bash
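To make the role of each flag in the GPU-enabled run command explicit, the command can be composed from its parts. This is only an illustrative sketch: `build_docker_run_args` is a hypothetical helper, not part of Docker or OmniParser, and the image name, port, and container name are the ones used in this article.

```python
import shlex


def build_docker_run_args(image, port, use_gpu=False, name=None, command=None):
    """Compose a `docker run` argument list (hypothetical helper for illustration)."""
    args = ["docker", "run", "-it"]
    if use_gpu:
        # --gpus all needs the NVIDIA Container Toolkit on the host.
        args += ["--gpus", "all"]
    # -p HOST:CONTAINER publishes the container port on the same host port.
    args += ["-p", f"{port}:{port}"]
    if name:
        args += ["--name", name]
    args.append(image)
    if command:
        # Command to run inside the container instead of the image default.
        args.append(command)
    return args


if __name__ == "__main__":
    args = build_docker_run_args(
        "omniparser", 7861, use_gpu=True,
        name="omniparser-container", command="/bin/bash",
    )
    print(shlex.join(args))
```

The resulting list can be passed to `subprocess.run` as-is. Dropping `use_gpu=True` gives a CPU-only variant of the same command.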
Once inside the Docker container, start the Gradio demo.
python gradio_demo.py
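If the page does not open in the browser, a quick check from the host is whether anything is listening on the published port. The sketch below assumes the Gradio server is reachable on port 7861, as in the Dockerfile comments and the `-p 7861:7861` mapping. The helper `is_port_open` is hypothetical and uses only the standard library.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 7861 is assumed to be the port published by the docker run command.
    print(is_port_open("127.0.0.1", 7861))
```

A result of `False` usually points to a missing port mapping or to the demo not having finished loading the model yet, rather than to a problem in the model itself.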
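When you pass in an arbitrary image (the one below is a sample image from the repository), the demo identifies the elements in the image and presents them.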
