Politecnico di Torino | Servizi per la didattica

Automatic generation of ground-truth data for hand-pose estimation algorithms

keywords COMPUTER VISION, DIGITAL SYSTEM DESIGN TEST AND VERIFICATION, FINGER TRACKING, HAND-POSE ESTIMATION

Reference persons PAOLO ERNESTO PRINETTO

External reference persons INDACO Marco (PhD Candidate)

Thesis type EXPERIMENTAL

Description Machine learning algorithms have been successfully applied to the problem of human body and hand pose reconstruction, usually under the name of appearance-based techniques. In particular, random forest classifiers have been shown to deliver real-time performance when estimating the human body or hand pose from depthmaps (Image 1) generated by consumer range cameras (e.g. the Microsoft Kinect). Like other supervised machine learning algorithms, random forest classifiers require a set of pre-labeled images (depthmaps in our case) that form the ground truth needed for both training and testing. In a pre-labeled image, each pixel is labeled (i.e. colored) according to the area of the hand it belongs to (Image 2). For training purposes, synthetic data can be used as a replacement for real hand-labeled data. For the testing phase, real data, captured from range cameras and labeled manually by human operators, provide a better way to estimate the accuracy of the classifier. Building the ground truth manually (e.g. using image editing programs) is time-consuming and error-prone, and it is difficult for a single operator to carry out on a large number of captures.

The goal of the thesis is to investigate and implement tools and algorithms that can be employed to generate ground truth from real data. The suggested approach consists of:

• the design and development of a colored glove, similar to the one in Image 3. The glove must be comfortable to wear. Furthermore, the distribution of color labels across the glove surface must minimize mislabeling problems during capture.

• the development of tools for automating the process of ground-truth generation using the colored glove: given a sequence of frames from the RGB camera of a Kinect-like device, the tools must automatically segment the glove labels.

• (possibly) the development of a metric to compare the resulting label segmentation with those produced by other labeling processes. The metric can be based on a pixel-by-pixel comparison or on the estimated positions of the hand joints.
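
As a rough illustration of the last two points, the sketch below labels each RGB pixel with the nearest reference glove color and then computes a simple pixel-by-pixel agreement score between two label maps. It is only a minimal sketch under stated assumptions, not the method the thesis is expected to produce: the palette colors, the label names, the `max_dist` rejection threshold and the function names `nearest_label`, `segment` and `pixel_agreement` are all hypothetical, and frames are represented as plain nested lists of (R, G, B) tuples rather than real camera captures.

```python
# Hypothetical glove palette: label name -> reference RGB color (assumed values).
PALETTE = {
    "palm": (255, 0, 0),
    "thumb": (0, 255, 0),
    "index": (0, 0, 255),
}
BACKGROUND = "background"


def nearest_label(pixel, palette=PALETTE, max_dist=100.0):
    """Return the palette label closest to `pixel` in RGB space.

    Pixels farther than `max_dist` from every reference color are
    treated as background (an assumed rejection rule).
    """
    best_label, best_d2 = BACKGROUND, max_dist ** 2
    for label, ref in palette.items():
        # Squared Euclidean distance between the pixel and the reference color.
        d2 = sum((p - r) ** 2 for p, r in zip(pixel, ref))
        if d2 <= best_d2:
            best_label, best_d2 = label, d2
    return best_label


def segment(frame):
    """Label every pixel of a frame given as rows of (R, G, B) tuples."""
    return [[nearest_label(px) for px in row] for row in frame]


def pixel_agreement(labels_a, labels_b):
    """Fraction of pixels on which two label maps of equal size agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label maps differ in height")
    total = same = 0
    for row_a, row_b in zip(labels_a, labels_b):
        if len(row_a) != len(row_b):
            raise ValueError("label maps differ in width")
        for a, b in zip(row_a, row_b):
            total += 1
            same += a == b
    if total == 0:
        raise ValueError("label maps are empty")
    return same / total


# Tiny 2x2 example frame and a hand-made reference labeling.
frame = [[(250, 10, 5), (10, 240, 20)],
         [(128, 128, 128), (5, 5, 250)]]
manual = [["palm", "thumb"],
          ["background", "thumb"]]
automatic = segment(frame)
score = pixel_agreement(automatic, manual)
```

Plain RGB distance is sensitive to lighting changes, so a real tool would probably work in a more illumination-robust color space and on array-based images; the nested-list form is used here only to keep the example self-contained.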