An Open-Source AI Project from Google

Today I want to introduce an open-source AI project from Google: MediaPipe.

MediaPipe

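MediaPipe is a graph-based data processing pipeline framework for building applications that consume many kinds of data sources, such as video, audio, sensor data, and any other time-series data.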

Below is the list of solutions the project provides:

Face Detection	Face Mesh	Iris	Hands	Pose	Holistic

Hair Segmentation	Object Detection	Box Tracking	Instant Motion Tracking	Objectron	KNIFT

This makes the framework well suited to building your own projects on top of it. See the official website for the details.

Hands

I have used this framework in one of my own projects, but only the Hands solution, so the rest of this post covers how to use that one.

Introduction

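The solution uses two networks: one for object detection and one for pose regression. The pose regression network can also predict directly whether what it sees is a "hand", so although there are two networks, the detector does not have to run on every frame.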

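In video mode, once the detector has found a hand for the first time, the "hand" can be tracked from frame to frame by repeatedly running pose prediction alone, which saves further time.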
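The detect-then-track idea described above can be sketched roughly as follows. This is only my illustration of the control flow, not MediaPipe's actual implementation: `run_pipeline`, `detect`, and `regress` are hypothetical names, and the two callables stand in for the two networks.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical types: a region of interest and a list of (x, y, z) points.
Roi = Tuple[int, int, int, int]
Landmarks = List[Tuple[float, float, float]]


def run_pipeline(
    frames: Iterable[object],
    detect: Callable[[object], Optional[Roi]],
    regress: Callable[[object, Roi], Tuple[Landmarks, float]],
    min_tracking_confidence: float = 0.5,
) -> Tuple[List[Optional[Landmarks]], int]:
    """Return per-frame landmarks and how many times the detector ran."""
    roi: Optional[Roi] = None
    detector_calls = 0
    outputs: List[Optional[Landmarks]] = []
    for frame in frames:
        if roi is None:
            # No hand is being tracked, so fall back to the detector.
            detector_calls += 1
            roi = detect(frame)
            if roi is None:
                outputs.append(None)
                continue
        # The regression network returns landmarks plus a "is this a hand" score.
        landmarks, presence = regress(frame, roi)
        if presence < min_tracking_confidence:
            # Tracking lost: drop the region so the next frame re-detects.
            roi = None
            outputs.append(None)
            continue
        outputs.append(landmarks)
        # Simplification: the region is reused as-is for the next frame.
        # A real tracker would derive the next region from the landmarks.
    return outputs, detector_calls
```

With stub functions plugged in, the detector only runs on the first frame and again after tracking is lost, which is where the time saving comes from.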

21 Landmarks

Because the hand pose is obtained by regression, the model needs a fixed set of 21 hand landmarks, as shown in the figure below:

Depending on what you need, you can look up the coordinates of the specific landmarks you want directly.
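As a quick reference for the index of each landmark, here is a small standalone sketch that rebuilds the numbering without importing MediaPipe. The names follow the order of `mp_hands.HandLandmark` as I understand it (that enum is also used in the official program later in this post). The local `HandLandmark` and `FINGERTIPS` are my own illustration, so check them against your installed version.

```python
from enum import IntEnum

# Build the 21 names in order: wrist, 4 thumb joints, then 4 joints per finger.
names = ["WRIST"]
names += [f"THUMB_{joint}" for joint in ("CMC", "MCP", "IP", "TIP")]
for finger in ("INDEX_FINGER", "MIDDLE_FINGER", "RING_FINGER", "PINKY"):
    names += [f"{finger}_{joint}" for joint in ("MCP", "PIP", "DIP", "TIP")]

# Local stand-in for mp_hands.HandLandmark, numbered from 0.
HandLandmark = IntEnum("HandLandmark", names, start=0)

# Indices of the five fingertips, handy for simple gesture logic.
FINGERTIPS = [int(m) for m in HandLandmark if m.name.endswith("_TIP")]

for member in HandLandmark:
    print(int(member), member.name)
```

With real results, the same index can be used to read a point, for example `hand_landmarks.landmark[HandLandmark.INDEX_FINGER_TIP]`.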

Program

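Parameters

Hands provides four parameters:

STATIC_IMAGE_MODE: the detection mode. For video input, set it to False so that hands are tracked across frames, which further improves speed.
MAX_NUM_HANDS: the maximum number of hands to detect.
MIN_DETECTION_CONFIDENCE: the minimum confidence for object detection.
MIN_TRACKING_CONFIDENCE: the minimum confidence for pose (landmark) tracking.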

Output

The result contains the coordinates of 21 points, each with three components: x, y, and z.

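Surprisingly, the solution also predicts some depth information, which I find the most interesting part. Depth is predicted relative to point 0 (the wrist). Note that in the Python API, x and y are normalized to [0, 1] by the image width and height rather than given in pixels, and z uses roughly the same scale as x. I did not find the depth information useful, however; it does not seem very accurate to me.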
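Since the official program later in this post multiplies `x` by the image width and `y` by the image height, here is a minimal sketch of that conversion. The `Landmark` tuple is a hypothetical stand-in for one entry of `hand_landmarks.landmark`, and `to_pixels` and `approx_depth` are my own helper names. Clamping to the image bounds is my choice, and scaling `z` by the image width is an assumption based on z sharing roughly the same scale as x.

```python
from typing import NamedTuple, Tuple


class Landmark(NamedTuple):
    """Hypothetical stand-in for one entry of hand_landmarks.landmark."""
    x: float
    y: float
    z: float


def to_pixels(lm, image_width: int, image_height: int) -> Tuple[int, int]:
    """Convert normalized x, y to integer pixel coordinates inside the image."""
    px = int(lm.x * image_width)
    py = int(lm.y * image_height)
    # Predicted points can fall slightly outside the frame, so clamp them.
    px = max(0, min(px, image_width - 1))
    py = max(0, min(py, image_height - 1))
    return px, py


def approx_depth(lm, image_width: int) -> float:
    """Rough depth relative to the wrist, scaled like x (an assumption)."""
    return lm.z * image_width


point = Landmark(x=0.5, y=0.25, z=-0.05)
print(to_pixels(point, 640, 480), approx_depth(point, 640))
```

The function works on any object with `x`, `y`, and `z` attributes, so it can be applied to the landmarks returned by `hands.process` as well.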

It can also distinguish between left and right hands.
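To read the left/right result, something like the sketch below should work. It assumes each entry of `results.multi_handedness` carries a `classification` list whose first item has `label` and `score` fields. The helper name `handedness_labels` is mine, and the fake `results` object built with `SimpleNamespace` exists only so the snippet runs without a camera or the library.

```python
from types import SimpleNamespace
from typing import List, Tuple


def handedness_labels(results) -> List[Tuple[str, float]]:
    """Return (label, score) for every detected hand, or [] if none."""
    if not results.multi_handedness:
        return []
    return [
        (hand.classification[0].label, hand.classification[0].score)
        for hand in results.multi_handedness
    ]


# Fake results object mimicking the assumed structure, for demonstration only.
fake_results = SimpleNamespace(
    multi_handedness=[
        SimpleNamespace(classification=[SimpleNamespace(label="Left", score=0.97)]),
        SimpleNamespace(classification=[SimpleNamespace(label="Right", score=0.91)]),
    ]
)
print(handedness_labels(fake_results))
```

Keep in mind that the labels assume a mirrored (selfie-view) image, which is why the official program flips static images before processing them.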

Code

I copied the official example program, so I will not go through it in detail here.

import cv2
import mediapipe as mp
mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles
mp_hands = mp.solutions.hands

# For static images:
IMAGE_FILES = []
with mp_hands.Hands(
    static_image_mode=True,
    max_num_hands=2,
    min_detection_confidence=0.5) as hands:
  for idx, file in enumerate(IMAGE_FILES):
    # Read an image and flip it around the y-axis: the handedness output
    # assumes a mirrored (selfie-view) input image.
    image = cv2.flip(cv2.imread(file), 1)
    # Convert the BGR image to RGB before processing.
    results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

    # Print handedness and draw hand landmarks on the image.
    print('Handedness:', results.multi_handedness)
    if not results.multi_hand_landmarks:
      continue
    image_height, image_width, _ = image.shape
    annotated_image = image.copy()
    for hand_landmarks in results.multi_hand_landmarks:
      print('hand_landmarks:', hand_landmarks)
      print(
          f'Index finger tip coordinates: (',
          f'{hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP].x * image_width}, '
          f'{hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP].y * image_height})'
      )
      mp_drawing.draw_landmarks(
          annotated_image,
          hand_landmarks,
          mp_hands.HAND_CONNECTIONS,
          mp_drawing_styles.get_default_hand_landmarks_style(),
          mp_drawing_styles.get_default_hand_connections_style())
    cv2.imwrite(
        '/tmp/annotated_image' + str(idx) + '.png', cv2.flip(annotated_image, 1))

# For webcam input:
cap = cv2.VideoCapture(0)
with mp_hands.Hands(
    min_detection_confidence=0.5,
    min_tracking_confidence=0.5) as hands:
  while cap.isOpened():
    success, image = cap.read()
    if not success:
      print("Ignoring empty camera frame.")
      # If loading a video, use 'break' instead of 'continue'.
      continue

    # To improve performance, optionally mark the image as not writeable to
    # pass by reference.
    image.flags.writeable = False
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    results = hands.process(image)

    # Draw the hand annotations on the image.
    image.flags.writeable = True
    image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
    if results.multi_hand_landmarks:
      for hand_landmarks in results.multi_hand_landmarks:
        mp_drawing.draw_landmarks(
            image,
            hand_landmarks,
            mp_hands.HAND_CONNECTIONS,
            mp_drawing_styles.get_default_hand_landmarks_style(),
            mp_drawing_styles.get_default_hand_connections_style())
    # Flip the image horizontally for a selfie-view display.
    cv2.imshow('MediaPipe Hands', cv2.flip(image, 1))
    if cv2.waitKey(5) & 0xFF == 27:
      break
cap.release()