Automatic generation of ground-truth data for hand-pose estimation algorithms
Reference persons PAOLO ERNESTO PRINETTO
External reference persons INDACO Marco (PhD Candidate)
Research Groups TESTGROUP - TESTGROUP
Thesis type EXPERIMENTAL
Description Machine learning algorithms have been successfully applied to the problem of human body and hand pose reconstruction, usually under the name of appearance-based techniques. In particular, random forest classifiers have been shown to estimate the human body or hand pose in real time from depth maps (Image 1) generated by consumer range cameras (e.g. the Microsoft Kinect). Like other supervised machine learning algorithms, random forest classifiers require a set of pre-labeled images (depth maps in our case), which form the ground truth needed for both training and testing. In a pre-labeled image, each pixel is labeled (i.e. colored) according to the area of the hand it belongs to (Image 2). For training purposes, synthetic data can be used as a replacement for real hand-labeled data. For the testing phase, real data, captured from range cameras and labeled manually by human operators, provide a better way to estimate the accuracy of the classifier. Building the ground truth manually (e.g. using image editing programs) is time-consuming, error-prone, and difficult for a single operator to carry out on a large number of captures.
The goal of the thesis is to investigate and implement tools and algorithms that can be employed in the process of ground-truth generation from real data. The suggested approach consists of:
• the design and development of a colored glove, similar to the one in Image 3. The glove must be developed with comfort of use in mind. Furthermore, the distribution of color labels across the glove surface must minimize mislabeling problems during capture.
• the development of tools for automating the process of ground-truth generation using the colored glove: given a sequence of frames from the RGB camera of a Kinect-like device, the tools must automatically segment the glove labels.
• (Possibly) the development of a metric to compare the label segmentation with other label segmentation processes. The metric can be based on a pixel-by-pixel comparison or on the estimation of hand joint positions.
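As a rough illustration of the last two points, the sketch below labels each pixel of an RGB frame with the nearest glove color and then computes a pixel-by-pixel agreement score between two label maps. It is only a minimal sketch under stated assumptions, not the method the thesis is expected to deliver: the region names, the reference colors in GLOVE_COLORS, the max_distance threshold, and the function names segment_frame and pixel_agreement are all hypothetical, and plain nearest-color matching in RGB space stands in for whatever segmentation technique is eventually chosen. Frames are represented as nested lists of RGB tuples so that the example needs only the Python standard library.

```python
from typing import Dict, List, Optional, Tuple

RGB = Tuple[int, int, int]
Frame = List[List[RGB]]                    # rows of RGB pixels
LabelMap = List[List[Optional[str]]]       # rows of region names (None = unlabeled)

# Hypothetical reference colors for three glove regions (assumed values).
GLOVE_COLORS: Dict[str, RGB] = {
    "palm": (220, 30, 30),
    "thumb": (30, 200, 30),
    "index": (30, 30, 220),
}


def squared_distance(a: RGB, b: RGB) -> int:
    """Squared Euclidean distance between two RGB colors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def segment_frame(
    frame: Frame,
    colors: Dict[str, RGB] = GLOVE_COLORS,
    max_distance: float = 80.0,  # assumed tolerance, to be tuned on real captures
) -> LabelMap:
    """Label each pixel with the nearest reference color, or None if too far."""
    threshold = max_distance ** 2
    labels: LabelMap = []
    for row in frame:
        label_row: List[Optional[str]] = []
        for pixel in row:
            # Pick the region whose reference color is closest to this pixel.
            nearest = min(colors, key=lambda name: squared_distance(pixel, colors[name]))
            close_enough = squared_distance(pixel, colors[nearest]) <= threshold
            label_row.append(nearest if close_enough else None)
        labels.append(label_row)
    return labels


def pixel_agreement(a: LabelMap, b: LabelMap) -> float:
    """Fraction of pixels that carry the same label in both label maps."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("label maps must have the same shape")
    total = sum(len(row) for row in a)
    if total == 0:
        raise ValueError("label maps must not be empty")
    same = sum(x == y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return same / total
```

For example, calling segment_frame on a 2x2 frame and passing the result to pixel_agreement together with a manually labeled map of the same size yields a score between 0 and 1, where 1 means the two segmentations agree on every pixel. A real tool would probably need a color space that is less sensitive to lighting, and could replace the pixel-wise score with one based on estimated hand joint positions, as suggested above.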
See also thesis description.docx
Required skills Attendance of a computer vision class is suggested but not required. Good programming skills and knowledge of computer vision libraries, as well as previous working experience with consumer range cameras (Microsoft Kinect, PrimeSense, Asus) and their SDKs, are welcome.
Notes This thesis is part of the PARLOMA project. As such, the thesis will be developed in cooperation with the CNR – Istituto di Elettronica e di Ingegneria Informatica e delle Telecomunicazioni (IEIIT).
Deadline 27/09/2015