Intel® OpenVINO™ Hands-On: Optimizing YOLOv10

June 18, 2024

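Real-time object detection aims to predict the classes and locations of objects in an image accurately and with low latency. The YOLO series has stayed at the forefront of this research thanks to its balance between performance and efficiency. However, its reliance on NMS and inefficiencies in its architecture have held back optimal performance. YOLOv10 addresses these issues by introducing consistent dual assignments for NMS-free training and a holistic efficiency-accuracy driven model design strategy.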

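YOLOv10 was developed by researchers at Tsinghua University on top of the Ultralytics Python package. It introduces a new approach to real-time object detection that addresses the post-processing and model architecture deficiencies of earlier YOLO versions. By eliminating non-maximum suppression (NMS) and optimizing various model components, YOLOv10 achieves state-of-the-art performance while significantly reducing computational overhead. Extensive experiments show that it offers an excellent accuracy-latency trade-off across multiple model scales.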
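To make concrete what the NMS post-processing step does (the step YOLOv10 is designed to avoid), here is a minimal pure-Python sketch of greedy NMS. The box format `(x1, y1, x2, y2)`, the function names `iou` and `greedy_nms`, and the sample boxes are illustrative assumptions; this is not code from YOLOv10 or Ultralytics.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.45):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap an already kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Made-up boxes: the first two overlap heavily, the third is far away.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
print(kept)
```

Every candidate box is compared against the boxes already kept, so the cost grows with the number of candidates. That extra pass after the network is the overhead an NMS-free design removes.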

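Tutorial overview

This tutorial walks step by step through running and optimizing the PyTorch YOLOv10 model with OpenVINO.

It consists of the following steps:

Prepare the PyTorch model
Convert the PyTorch model to OpenVINO IR
Run model inference with OpenVINO
Prepare and run the optimization pipeline with NNCF
Compare the performance of the FP16 and quantized models
Run inference with the optimized model on video
Launch an interactive Gradio demo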

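Usage notes

This is a self-contained example that relies solely on its own code.
For a development environment with the latest OpenVINO release and Jupyter Notebook, see the article on this site listed below. From it you can download a docker image that bundles OpenVINO 2024.2 and a Jupyter Notebook environment, and run it directly.

Intel® OpenVINO™ 2024.2 Officially Released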

Optimization steps

1. Prerequisites

import os

os.environ["GIT_CLONE_PROTECTION_ACTIVE"] = "false"

%pip install -q "nncf>=2.11.0"
%pip install --pre -Uq openvino --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly
%pip install -q "git+https://github.com/THU-MIG/yolov10.git" --extra-index-url https://download.pytorch.org/whl/cpu
%pip install -q "torch>=2.1" "torchvision>=0.16" tqdm opencv-python "gradio>=4.19" --extra-index-url https://download.pytorch.org/whl/cpu
%pip install ipywidgets

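If you use the prebuilt docker image from the article above, comment out the fourth command (%pip install --pre -Uq openvino --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly) and do not run it. Otherwise a version conflict error occurs, because the docker image already has version 2024.2 installed.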
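One way to decide whether the OpenVINO install command can be skipped is to check what is already installed. The sketch below uses only the standard library. The helper names and the `(2024, 2)` minimum (taken from the docker image version mentioned above) are assumptions for illustration.

```python
from importlib import metadata


def parse_version(text):
    """Turn '2024.2.0' into (2024, 2, 0); stops at the first non-numeric chunk."""
    parts = []
    for chunk in text.replace("-", ".").split("."):
        if not chunk.isdigit():
            break
        parts.append(int(chunk))
    return tuple(parts)


def installed_version(package):
    """Return the installed version string, or None when the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def needs_install(installed, minimum=(2024, 2)):
    """True when nothing is installed or the installed version is older than `minimum`."""
    return installed is None or parse_version(installed) < minimum


current = installed_version("openvino")
print("openvino:", current, "-> install needed:", needs_install(current))
```

If `needs_install` returns False, the nightly install line can stay commented out.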

After the commands finish successfully, the results are displayed in the cell output.

%pip install ipywidgets

Note: after this step completes, restart the Python kernel, or simply restart the OpenVINO container.

from pathlib import Path

# Fetch `notebook_utils` module
import requests

r = requests.get(
    url="https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/notebook_utils.py",
)

# Write the helper module next to the notebook; the context manager closes the file.
with open("notebook_utils.py", "w") as f:
    f.write(r.text)

from notebook_utils import download_file, VideoPlayer

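2. Download the PyTorch model

The model authors provide several versions of the YOLOv10 model. They differ in parameter count, performance, and accuracy. For demonstration purposes we use yolov10n, but the same steps apply to the other models in the YOLOv10 family.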

models_dir = Path("./models")
models_dir.mkdir(exist_ok=True)

model_weights_url = "https://github.com/jameslahm/yolov10/releases/download/v1.0/yolov10n.pt"
file_name = model_weights_url.split("/")[-1]
model_name = file_name.replace(".pt", "")

download_file(model_weights_url, directory=models_dir)
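To try another model size, only the URL needs to change. The sketch below derives the file and model names for each variant. It assumes the other variants are published under the same release tag with the same naming pattern as yolov10n.pt, which is not confirmed here. `weights_url` and `names_from_url` are hypothetical helpers, and the loop uses its own variable names so it does not overwrite `file_name` or `model_name` from the cell above.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumption: every variant lives under the same release tag as yolov10n.pt.
BASE_URL = "https://github.com/jameslahm/yolov10/releases/download/v1.0"
VARIANTS = ["n", "s", "m", "b", "l", "x"]


def weights_url(variant):
    """Build the download URL for one model variant (hypothetical helper)."""
    return f"{BASE_URL}/yolov10{variant}.pt"


def names_from_url(url):
    """Return (file name, model name), e.g. ('yolov10n.pt', 'yolov10n')."""
    path = PurePosixPath(urlparse(url).path)
    return path.name, path.stem


for variant in VARIANTS:
    fname, mname = names_from_url(weights_url(variant))
    print(f"{mname}: {fname}")
```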

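3. Export the PyTorch model to OpenVINO IR format

As mentioned earlier, the YOLOv10 code is built on the Ultralytics library and has an interface similar to YOLOv8 (see the YOLOv8 notebook for a more detailed explanation of the Ultralytics API). Ultralytics supports exporting to OpenVINO through the export method of the model class. We can also specify parameters that control the target input size, static or dynamic input shapes, and model precision (FP32/FP16/INT8). INT8 quantization can additionally be performed at export time, but to keep the approach more flexible, we will look at how to perform quantization with NNCF instead.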

import types
from ultralytics.utils import ops, yaml_load, yaml_save
from ultralytics import YOLOv10
import torch

detection_labels = {
    0: "person",
    1: "bicycle",
    2: "car",
    3: "motorcycle",
    4: "airplane",
    5: "bus",
    6: "train",
    7: "truck",
    8: "boat",
    9: "traffic light",
    10: "fire hydrant",
    11: "stop sign",
    12: "parking meter",
    13: "bench",
    14: "bird",
    15: "cat",
    16: "dog",
    17: "horse",
    18: "sheep",
    19: "cow",
    20: "elephant",
    21: "bear",
    22: "zebra",
    23: "giraffe",
    24: "backpack",
    25: "umbrella",
    26: "handbag",
    27: "tie",
    28: "suitcase",
    29: "frisbee",
    30: "skis",
    31: "snowboard",
    32: "sports ball",
    33: "kite",
    34: "baseball bat",
    35: "baseball glove",
    36: "skateboard",
    37: "surfboard",
    38: "tennis racket",
    39: "bottle",
    40: "wine glass",
    41: "cup",
    42: "fork",
    43: "knife",
    44: "spoon",
    45: "bowl",
    46: "banana",
    47: "apple",
    48: "sandwich",
    49: "orange",
    50: "broccoli",
    51: "carrot",
    52: "hot dog",
    53: "pizza",
    54: "donut",
    55: "cake",
    56: "chair",
    57: "couch",
    58: "potted plant",
    59: "bed",
    60: "dining table",
    61: "toilet",
    62: "tv",
    63: "laptop",
    64: "mouse",
    65: "remote",
    66: "keyboard",
    67: "cell phone",
    68: "microwave",
    69: "oven",
    70: "toaster",
    71: "sink",
    72: "refrigerator",
    73: "book",
    74: "clock",
    75: "vase",
    76: "scissors",
    77: "teddy bear",
    78: "hair drier",
    79: "toothbrush",
}


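def v10_det_head_forward(self, x):
    # Replacement forward for the YOLOv10 detection head. In export mode it returns a
    # single tensor with boxes, scores and labels concatenated along the last axis.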
    one2one = self.forward_feat([xi.detach() for xi in x], self.one2one_cv2, self.one2one_cv3)
    if not self.export:
        one2many = super().forward(x)

    if not self.training:
        one2one = self.inference(one2one)
        if not self.export:
            return {"one2many": one2many, "one2one": one2one}
        else:
            assert self.max_det != -1
            boxes, scores, labels = ops.v10postprocess(one2one.permute(0, 2, 1), self.max_det, self.nc)
            return torch.cat(
                [boxes, scores.unsqueeze(-1), labels.unsqueeze(-1).to(boxes.dtype)],
                dim=-1,
            )
    else:
        return {"one2many": one2many, "one2one": one2one}


ov_model_path = models_dir / f"{model_name}_openvino_model/{model_name}.xml"
if not ov_model_path.exists():
    model = YOLOv10(models_dir / file_name)
    model.model.model[-1].forward = types.MethodType(v10_det_head_forward, model.model.model[-1])
    model.export(format="openvino", dynamic=True, half=True)
    config = yaml_load(ov_model_path.parent / "metadata.yaml")
    config["names"] = detection_labels
    yaml_save(ov_model_path.parent / "metadata.yaml", config)
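In export mode the patched head above concatenates boxes, scores, and labels along the last axis, so each detection is a row of six numbers. The sketch below shows one way to read such rows and map label ids to names. The rows are made up, `decode_rows` is a hypothetical helper, and the box coordinates are passed through as-is because their exact convention is not stated here.

```python
# A tiny subset of the label map above, repeated so the sketch is self-contained.
LABELS = {0: "person", 1: "bicycle", 16: "dog"}


def decode_rows(rows, labels, conf_threshold=0.2):
    """Split [b0, b1, b2, b3, score, label] rows into readable detections."""
    detections = []
    for row in rows:
        box, score, label_id = row[:4], row[4], int(row[5])
        if score < conf_threshold:
            continue  # Drop low-confidence rows.
        detections.append(
            {"box": box, "score": score, "label": labels.get(label_id, str(label_id))}
        )
    return detections


# Made-up rows for illustration only, not real model output.
sample = [
    [10.0, 20.0, 110.0, 220.0, 0.91, 16.0],
    [30.0, 40.0, 90.0, 200.0, 0.55, 1.0],
    [0.0, 0.0, 0.0, 0.0, 0.01, 0.0],
]
decoded = decode_rows(sample, LABELS)
for det in decoded:
    print(det)
```

No overlap-based filtering is needed in this sketch; a score threshold is the only post-processing applied to the rows.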

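4. Run OpenVINO inference on the AUTO device with the Ultralytics API

Now that the model has been exported to OpenVINO, we can load it directly into the YOLOv10 class, where the automatic inference backend lets us run the OpenVINO YOLOv10 model as easily as the original PyTorch model. The code below shows how to run inference with the exported OpenVINO model on a single image using the Ultralytics API. The AUTO device is used to launch the model.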

ov_yolo_model = YOLOv10(ov_model_path.parent, task="detect")

from PIL import Image

IMAGE_PATH = Path("./data/coco_bike.jpg")
download_file(
    url="https://storage.openvinotoolkit.org/repositories/openvino_notebooks/data/data/image/coco_bike.jpg",
    filename=IMAGE_PATH.name,
    directory=IMAGE_PATH.parent,
)

res = ov_yolo_model(IMAGE_PATH, iou=0.45, conf=0.2)
Image.fromarray(res[0].plot()[:, :, ::-1])

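5. Run OpenVINO inference on a selected device with the Ultralytics API

In this part, you can choose an inference device to run the model on, so that you can compare the results with the AUTO device.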

import openvino as ov

import ipywidgets as widgets

core = ov.Core()

device = widgets.Dropdown(
    options=core.available_devices + ["AUTO"],
    value="CPU",
    description="Device:",
    disabled=False,
)

device

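As the dropdown shows, my machine offers three device settings: CPU, GPU (integrated graphics), and AUTO. On a 14th-generation Meteor Lake (Intel Core Ultra) platform, an NPU option should also be available.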
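When the notebook runs without a UI, the dropdown can be replaced by a small selection rule. The sketch below picks the first available device from a preference list and falls back to AUTO. The preference order, the `pick_device` name, and the sample device lists are assumptions; in the notebook the list would come from `core.available_devices`.

```python
def pick_device(available, preferred=("NPU", "GPU", "CPU"), fallback="AUTO"):
    """Return the first preferred device found in `available`, else `fallback`.

    Entries such as "GPU.0" are matched by their prefix before the dot.
    """
    for wanted in preferred:
        for name in available:
            if name.split(".")[0] == wanted:
                return name
    return fallback


# Hypothetical device lists, standing in for `core.available_devices`.
print(pick_device(["CPU", "GPU"]))
print(pick_device(["CPU"]))
print(pick_device([]))
```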

ov_model = core.read_model(ov_model_path)

# load model on selected device
if "GPU" in device.value or "NPU" in device.value:
    ov_model.reshape({0: [1, 3, 640, 640]})
ov_config = {}
if "GPU" in device.value:
    ov_config = {"GPU_DISABLE_WINOGRAD_CONVOLUTION": "YES"}
det_compiled_model = core.compile_model(ov_model, device.value, ov_config)

ov_yolo_model.predictor.model.ov_compiled_model = det_compiled_model

res = ov_yolo_model(IMAGE_PATH, iou=0.45, conf=0.2)

Image.fromarray(res[0].plot()[:, :, ::-1])

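6. Optimize the model with the NNCF post-training quantization API

NNCF provides a suite of advanced algorithms for optimizing neural network inference in OpenVINO with minimal accuracy drop. We will use 8-bit quantization in post-training mode (without a fine-tuning pipeline) to optimize YOLOv10.

The optimization process consists of the following steps:

Create a dataset for quantization.
Run nncf.quantize to obtain the optimized model.
Serialize the OpenVINO IR model with the openvino.save_model function.

Quantization is time- and memory-intensive. You can skip this step using the checkbox below: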

import ipywidgets as widgets

int8_model_det_path = models_dir / "int8" / f"{model_name}_openvino_model/{model_name}.xml"
ov_yolo_int8_model = None

to_quantize = widgets.Checkbox(
    value=True,
    description="Quantization",
    disabled=False,
)

to_quantize

# Fetch skip_kernel_extension module
r = requests.get(
    url="https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/skip_kernel_extension.py",
)
with open("skip_kernel_extension.py", "w") as f:
    f.write(r.text)

%load_ext skip_kernel_extension

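7. Prepare the quantization dataset

To start quantization, we need to prepare a dataset. We will use the validation subset of the MS COCO dataset for model quantization and the Ultralytics validation data loader to prepare the input data.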

%%skip not $to_quantize.value

from zipfile import ZipFile

from ultralytics.data.utils import DATASETS_DIR

if not int8_model_det_path.exists():

    DATA_URL = "http://images.cocodataset.org/zips/val2017.zip"
    LABELS_URL = "https://github.com/ultralytics/yolov5/releases/download/v1.0/coco2017labels-segments.zip"
    CFG_URL = "https://raw.githubusercontent.com/ultralytics/ultralytics/v8.1.0/ultralytics/cfg/datasets/coco.yaml"
    
    OUT_DIR = DATASETS_DIR
    
    DATA_PATH = OUT_DIR / "val2017.zip"
    LABELS_PATH = OUT_DIR / "coco2017labels-segments.zip"
    CFG_PATH = OUT_DIR / "coco.yaml"
    
    download_file(DATA_URL, DATA_PATH.name, DATA_PATH.parent)
    download_file(LABELS_URL, LABELS_PATH.name, LABELS_PATH.parent)
    download_file(CFG_URL, CFG_PATH.name, CFG_PATH.parent)
    
    if not (OUT_DIR / "coco/labels").exists():
        with ZipFile(LABELS_PATH, "r") as zip_ref:
            zip_ref.extractall(OUT_DIR)
        with ZipFile(DATA_PATH, "r") as zip_ref:
            zip_ref.extractall(OUT_DIR / "coco/images")

The warning shown in the output does not affect usage. If it bothers you, adjust the corresponding parameter in ~/jupyter_notebook_config.py.

%%skip not $to_quantize.value

from ultralytics.utils import DEFAULT_CFG
from ultralytics.cfg import get_cfg
from ultralytics.data.converter import coco80_to_coco91_class
from ultralytics.data.utils import check_det_dataset

if not int8_model_det_path.exists():
    args = get_cfg(cfg=DEFAULT_CFG)
    args.data = str(CFG_PATH)
    det_validator = ov_yolo_model.task_map[ov_yolo_model.task]["validator"](args=args)
    
    det_validator.data = check_det_dataset(args.data)
    det_validator.stride = 32
    det_data_loader = det_validator.get_dataloader(OUT_DIR / "coco", 1)

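NNCF provides the nncf.Dataset wrapper for using native framework data loaders in the quantization pipeline. In addition, we specify a transform function that is responsible for preparing the input data in the format the model expects.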

%%skip not $to_quantize.value

import nncf
from typing import Dict


def transform_fn(data_item: Dict):
    """
    Quantization transform function. Extracts and preprocesses input data from a dataloader item for quantization.
    Parameters:
       data_item: Dict with data item produced by DataLoader during iteration
    Returns:
        input_tensor: Input data for quantization
    """
    input_tensor = det_validator.preprocess(data_item)['img'].numpy()
    return input_tensor

if not int8_model_det_path.exists():
    quantization_dataset = nncf.Dataset(det_data_loader, transform_fn)
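The wrapper idea can be pictured as a thin layer that applies the transform function to each item as it is read. The class below is a hypothetical stand-in written for illustration; it is not NNCF's implementation. The `iter_inputs` method and the fake loader items are assumptions.

```python
class LazyDataset:
    """Hypothetical stand-in for the wrapper idea; not NNCF's implementation."""

    def __init__(self, data_source, transform_func=None):
        self._source = data_source
        self._transform = transform_func or (lambda item: item)

    def iter_inputs(self, subset_size=None):
        """Yield transformed items, stopping after `subset_size` if given."""
        for index, item in enumerate(self._source):
            if subset_size is not None and index >= subset_size:
                break
            yield self._transform(item)


# Fake loader items shaped like dicts with an "img" entry plus extra fields.
fake_loader = [{"img": [i, i + 1], "im_file": f"{i}.jpg"} for i in range(5)]


def pick_image(item):
    """Keep only the model input, mirroring the role of transform_fn."""
    return item["img"]


dataset = LazyDataset(fake_loader, pick_image)
first_three = list(dataset.iter_inputs(subset_size=3))
print(first_three)
```

Because the transform runs per item, the data loader can keep producing full validation batches (images, labels, file names) while the quantizer only ever sees the model input.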

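8. Quantize and save the INT8 model

The nncf.quantize function provides the interface for model quantization. It requires an instance of the OpenVINO model and the quantization dataset. Optionally, additional parameters can be supplied to configure the quantization process (number of samples for quantization, preset, ignored scope, and so on). The YOLOv10 model contains non-ReLU activation functions, which call for asymmetric quantization of activations. For better results we will use the mixed quantization preset, which applies symmetric quantization to weights and asymmetric quantization to activations.

Note: post-training quantization is time-consuming. Please be patient; it may take several minutes depending on your hardware.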
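The difference between symmetric and asymmetric 8-bit quantization can be seen with a few lines of arithmetic. The sketch below is a simplified per-tensor illustration, not NNCF's algorithm. The activation values are invented to mimic a non-ReLU activation with a small negative tail, and all function names are hypothetical.

```python
def quantize_asymmetric(values, num_bits=8):
    """Map floats onto [0, 2^bits - 1] using a scale and a zero point."""
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    levels = 2**num_bits - 1
    scale = (hi - lo) / levels or 1.0
    zero_point = round(-lo / scale)
    q = [min(levels, max(0, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def quantize_symmetric(values, num_bits=8):
    """Map floats onto [-(2^(bits-1) - 1), 2^(bits-1) - 1] around zero."""
    limit = 2 ** (num_bits - 1) - 1
    scale = max(abs(v) for v in values) / limit or 1.0
    q = [min(limit, max(-limit, round(v / scale))) for v in values]
    return q, scale


# Invented activation values with a small negative tail; not from the real model.
acts = [-0.25, 0.0, 1.0, 3.0, 6.0]

q_a, scale_a, zp = quantize_asymmetric(acts)
q_s, scale_s = quantize_symmetric(acts)

# Dequantize to see how closely each scheme reproduces the inputs.
deq_a = [(q - zp) * scale_a for q in q_a]
deq_s = [q * scale_s for q in q_s]

print("asymmetric step:", scale_a, "symmetric step:", scale_s)
```

With a skewed range like this one, the symmetric scheme spends almost half of its integer codes on negative values that never occur, so its step size is roughly twice as coarse. Weights are usually centered around zero, which is why symmetric quantization tends to suit them.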

%%skip not $to_quantize.value

import shutil

if not int8_model_det_path.exists():
    quantized_det_model = nncf.quantize(
        ov_model,
        quantization_dataset,
        preset=nncf.QuantizationPreset.MIXED,
    )

    ov.save_model(quantized_det_model,  int8_model_det_path)
    shutil.copy(ov_model_path.parent / "metadata.yaml", int8_model_det_path.parent / "metadata.yaml")

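9. Run inference with the optimized model

The INT8 quantized model is used in the same way as the model before quantization. Let's check the quantized model's inference result on a single image.

9.1 Run the optimized model on the AUTO device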

%%skip not $to_quantize.value
ov_yolo_int8_model = YOLOv10(int8_model_det_path.parent, task="detect")

%%skip not $to_quantize.value
res = ov_yolo_int8_model(IMAGE_PATH, iou=0.45, conf=0.2)

Image.fromarray(res[0].plot()[:, :, ::-1])

9.2 Run the optimized model on a selected device

%%skip not $to_quantize.value

device

%%skip not $to_quantize.value

quantized_det_model = core.read_model(int8_model_det_path)

# Use a static input shape on GPU/NPU. Reshape the quantized model itself
# (not the FP16 ov_model) before compiling it.
if "GPU" in device.value or "NPU" in device.value:
    quantized_det_model.reshape({0: [1, 3, 640, 640]})
ov_config = {}
if "GPU" in device.value:
    ov_config = {"GPU_DISABLE_WINOGRAD_CONVOLUTION": "YES"}

quantized_det_compiled_model = core.compile_model(quantized_det_model, device.value, ov_config)

ov_yolo_int8_model.predictor.model.ov_compiled_model = quantized_det_compiled_model

res = ov_yolo_int8_model(IMAGE_PATH,  iou=0.45, conf=0.2)

Image.fromarray(res[0].plot()[:, :, ::-1])

10. Compare the original and optimized models

10.1 Model size

ov_model_weights = ov_model_path.with_suffix(".bin")
print(f"Size of FP16 model is {ov_model_weights.stat().st_size / 1024 / 1024:.2f} MB")
if int8_model_det_path.exists():
    ov_int8_weights = int8_model_det_path.with_suffix(".bin")
    print(f"Size of model with INT8 compressed weights is {ov_int8_weights.stat().st_size / 1024 / 1024:.2f} MB")
    print(f"Compression rate for INT8 model: {ov_model_weights.stat().st_size / ov_int8_weights.stat().st_size:.3f}")
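As a rough sanity check on the numbers printed above, the ideal compression rate follows from the bytes per weight: 2 bytes for FP16 and 1 byte for INT8. The sketch below works through that arithmetic with a hypothetical parameter count chosen for round numbers; it is not YOLOv10n's real count, and the measured rate can differ because the .bin file holds more than quantized weights.

```python
def size_mb(num_bytes):
    """Convert a byte count to MB using the same /1024/1024 as the cell above."""
    return num_bytes / 1024 / 1024


def weights_bytes(num_params, bits_per_weight):
    """Idealized storage for weights alone, ignoring any other file content."""
    return num_params * bits_per_weight // 8


# Hypothetical parameter count chosen for round numbers.
params = 2_000_000

fp16 = weights_bytes(params, 16)
int8 = weights_bytes(params, 8)

print(f"FP16: {size_mb(fp16):.2f} MB, INT8: {size_mb(int8):.2f} MB")
print(f"Ideal compression rate: {fp16 / int8:.3f}")
```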

10.2 Performance

10.2.1 FP16 performance

!benchmark_app -m $ov_model_path -d $device.value -api async -shape "[1,3,640,640]" -t 15

10.2.2 INT8 performance

if int8_model_det_path.exists():
    !benchmark_app -m $int8_model_det_path -d $device.value -api async -shape "[1,3,640,640]" -t 15

In this run, INT8 performance is roughly twice that of FP16.
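To compare the two runs numerically rather than by eye, the throughput figure can be pulled out of the captured logs. The sketch below assumes the tool reports a line of the form `Throughput: <number> FPS`. The two log strings use invented placeholder numbers purely to exercise the parser; they are not measurements.

```python
import re

# Assumed log format: "... Throughput:   <number> FPS"
THROUGHPUT_RE = re.compile(r"Throughput:\s*([0-9]+(?:\.[0-9]+)?)\s*FPS")


def parse_throughput(log_text):
    """Return the last reported throughput in FPS, or None if absent."""
    matches = THROUGHPUT_RE.findall(log_text)
    return float(matches[-1]) if matches else None


# Placeholder logs with invented numbers, only to exercise the parser.
fp16_log = "[ INFO ] Throughput:   100.00 FPS"
int8_log = "[ INFO ] Throughput:   200.00 FPS"

speedup = parse_throughput(int8_log) / parse_throughput(fp16_log)
print(f"INT8 speedup over FP16: {speedup:.2f}x")
```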

11. Video demo

The following code runs model inference on a video:

import collections
import time
from IPython import display
import cv2
import numpy as np


# Main processing function to run object detection.
def run_object_detection(
    source=0,
    flip=False,
    use_popup=False,
    skip_first_frames=0,
    det_model=ov_yolo_int8_model,
    device=device.value,
):
    player = None
    try:
        # Create a video player to play with target fps.
        player = VideoPlayer(source=source, flip=flip, fps=30, skip_first_frames=skip_first_frames)
        # Start capturing.
        player.start()
        if use_popup:
            title = "Press ESC to Exit"
            cv2.namedWindow(winname=title, flags=cv2.WINDOW_GUI_NORMAL | cv2.WINDOW_AUTOSIZE)

        processing_times = collections.deque()
        while True:
            # Grab the frame.
            frame = player.next()
            if frame is None:
                print("Source ended")
                break
            # If the frame is larger than full HD, reduce size to improve the performance.
            scale = 1280 / max(frame.shape)
            if scale < 1:
                frame = cv2.resize(
                    src=frame,
                    dsize=None,
                    fx=scale,
                    fy=scale,
                    interpolation=cv2.INTER_AREA,
                )
            # Get the results.
            input_image = np.array(frame)

            start_time = time.time()
            detections = det_model(input_image, iou=0.45, conf=0.2, verbose=False)
            stop_time = time.time()
            frame = detections[0].plot()

            processing_times.append(stop_time - start_time)
            # Use processing times from last 200 frames.
            if len(processing_times) > 200:
                processing_times.popleft()

            _, f_width = frame.shape[:2]
            # Mean processing time [ms].
            processing_time = np.mean(processing_times) * 1000
            fps = 1000 / processing_time
            cv2.putText(
                img=frame,
                text=f"Inference time: {processing_time:.1f}ms ({fps:.1f} FPS)",
                org=(20, 40),
                fontFace=cv2.FONT_HERSHEY_COMPLEX,
                fontScale=f_width / 1000,
                color=(0, 0, 255),
                thickness=1,
                lineType=cv2.LINE_AA,
            )
            # Use this workaround if there is flickering.
            if use_popup:
                cv2.imshow(winname=title, mat=frame)
                key = cv2.waitKey(1)
                # escape = 27
                if key == 27:
                    break
            else:
                # Encode numpy array to jpg.
                _, encoded_img = cv2.imencode(ext=".jpg", img=frame, params=[cv2.IMWRITE_JPEG_QUALITY, 100])
                # Create an IPython image.
                i = display.Image(data=encoded_img)
                # Display the image in this notebook.
                display.clear_output(wait=True)
                display.display(i)
    # ctrl-c
    except KeyboardInterrupt:
        print("Interrupted")
    # any different error
    except RuntimeError as e:
        print(e)
    finally:
        if player is not None:
            # Stop capturing.
            player.stop()
        if use_popup:
            cv2.destroyAllWindows()

use_int8 = widgets.Checkbox(
    value=ov_yolo_int8_model is not None,
    description="Use int8 model",
    disabled=ov_yolo_int8_model is None,
)

use_int8

WEBCAM_INFERENCE = False

if WEBCAM_INFERENCE:
    VIDEO_SOURCE = 0  # Webcam
else:
    download_file(
        "https://storage.openvinotoolkit.org/repositories/openvino_notebooks/data/data/video/people.mp4",
        directory="data",
    )
    VIDEO_SOURCE = "data/people.mp4"

run_object_detection(
    det_model=ov_yolo_model if not use_int8.value else ov_yolo_int8_model,
    source=VIDEO_SOURCE,
    flip=True,
    use_popup=False,
)
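The FPS overlay in the function above averages the inference time over the last 200 frames. The same bookkeeping can be written more compactly with a bounded deque, as in the sketch below. `RollingFps` is a hypothetical helper, and the timings fed into it are invented.

```python
import collections


class RollingFps:
    """Mean latency and FPS over the most recent `window` measurements."""

    def __init__(self, window=200):
        # A bounded deque discards the oldest entry automatically.
        self._times = collections.deque(maxlen=window)

    def add(self, seconds):
        self._times.append(seconds)

    @property
    def mean_ms(self):
        return sum(self._times) / len(self._times) * 1000

    @property
    def fps(self):
        return 1000 / self.mean_ms


meter = RollingFps(window=3)
for seconds in (0.5, 0.02, 0.02, 0.02):  # The slow first frame falls out of the window.
    meter.add(seconds)
print(f"{meter.mean_ms:.1f} ms ({meter.fps:.1f} FPS)")
```

Using a window instead of a running total keeps a slow warm-up frame from skewing the displayed FPS for the rest of the video. At least one measurement must be added before reading `mean_ms`.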

12. Interactive Gradio demo

import gradio as gr


def yolov10_inference(image, int8, conf_threshold, iou_threshold):
    model = ov_yolo_model if not int8 else ov_yolo_int8_model
    results = model(source=image, iou=iou_threshold, conf=conf_threshold, verbose=False)[0]
    annotated_image = Image.fromarray(results.plot())

    return annotated_image


with gr.Blocks() as demo:
    gr.HTML(
        """
    <h1 style='text-align: center'>
    YOLOv10: Real-Time End-to-End Object Detection using OpenVINO
    </h1>
    """
    )
    with gr.Row():
        with gr.Column():
            image = gr.Image(type="numpy", label="Image")
            conf_threshold = gr.Slider(
                label="Confidence Threshold",
                minimum=0.1,
                maximum=1.0,
                step=0.1,
                value=0.2,
            )
            iou_threshold = gr.Slider(
                label="IoU Threshold",
                minimum=0.1,
                maximum=1.0,
                step=0.1,
                value=0.45,
            )
            use_int8 = gr.Checkbox(
                value=ov_yolo_int8_model is not None,
                visible=ov_yolo_int8_model is not None,
                label="Use INT8 model",
            )
            yolov10_infer = gr.Button(value="Detect Objects")

        with gr.Column():
            output_image = gr.Image(type="pil", label="Annotated Image")

        yolov10_infer.click(
            fn=yolov10_inference,
            inputs=[
                image,
                use_int8,
                conf_threshold,
                iou_threshold,
            ],
            outputs=[output_image],
        )
    examples = gr.Examples(
        [
            "data/coco_bike.jpg",
        ],
        inputs=[
            image,
        ],
    )


try:
    demo.launch(debug=True)
except Exception:
    demo.launch(debug=True, share=True)
