
Is machine learning suitable to evaluate GPU reliability?


Reference persons: STEFANO DI CARLO

External reference persons: Alessandro Vallero (alessandro.vallero@polito.it)


Thesis type: RESEARCH

Description

What is this thesis about?

Recent years have witnessed a growing demand for computational power in several application domains. General Purpose computing on Graphics Processing Units (GPGPU) has gained a primary role in delivering this computational power, leveraging the inherently parallel architecture of GPUs to accelerate complex tasks. In this scenario, GPUs are no longer employed just for graphics. They are increasingly used in areas where reliability is a primary concern (e.g., advanced driver assistance systems, aviation, medicine, supercomputing). This trend is, however, threatened by technology scaling, which makes new devices increasingly susceptible to faults. Characterizing the reliability of GPGPU systems is therefore becoming mandatory.
One of the main open challenges in evaluating the reliability of GPGPU systems is the development of fast and accurate reliability assessment tools. Such tools must properly trade off simulation time against accuracy, and they must provide information that guides system designers in choosing the architectural parameters and error protection mechanisms needed to meet the target reliability and performance requirements.
This thesis is devoted to the study of a new methodology for evaluating the reliability of GPUs through machine learning.

What will I do?
You will evaluate the reliability of GPUs by means of machine learning. The new methodology will:
- combine machine learning with state-of-the-art reliability evaluation techniques
- take into account both the executed software and the performance indicators of the GPU

Experimental results will be obtained using a micro-architecture simulator.

See also: gpu reliability machine learning.pdf

Required skills:
- strong skills in programming languages (C/C++, Python, Bash)
- basic knowledge of machine learning
- basic knowledge of computer architecture and/or GPU architecture
- experience with Linux environment

Deadline: 18/09/2018

© Politecnico di Torino
Corso Duca degli Abruzzi, 24 - 10129 Torino, ITALY