Is machine learning suitable to evaluate GPU reliability?
keywords DIGITAL SYSTEM DESIGN TEST AND VERIFICATION
Reference persons STEFANO DI CARLO
External reference persons Alessandro Vallero (firstname.lastname@example.org)
Research Groups TESTGROUP - TESTGROUP
Thesis type RESEARCH
Description What is this thesis about?
Recent years have witnessed an increasing demand for computational power in several application domains. General-Purpose computing on Graphics Processing Units (GPGPU) has gained a primary role in delivering high computational power, leveraging the inherently highly parallel architecture of GPUs to accelerate complex tasks. In this scenario, GPUs are no longer employed just for graphics. They have increasingly found application in areas where reliability is a primary concern (e.g., advanced driver assistance systems, aviation, medicine, supercomputing). This trend is, however, threatened by technology scaling, which has a detrimental effect on the susceptibility of new devices to faults. Characterizing the reliability of GPGPU systems is therefore becoming a mandatory task.
One of the main open challenges in evaluating the reliability of GPGPU systems is the development of fast and accurate reliability assessment tools. Such tools must properly trade off simulation time against accuracy, and provide information that guides system designers in choosing the architectural parameters and error protection mechanisms needed to achieve the target reliability and performance requirements.
This thesis is devoted to the study of a new methodology to evaluate the reliability of GPUs through machine learning.
What will I do?
You will evaluate the reliability of GPUs by means of machine learning. The new methodology will:
- combine machine learning with state-of-the-art reliability evaluation techniques
- take into account the executed software as well as the performance indicators of the GPU
Experimental results will be obtained using a micro-architectural simulator.
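To give a feel for the kind of work involved, here is a minimal sketch (not the methodology of the thesis, which is still to be defined). It assumes that a reliability figure measured by a conventional technique such as fault injection on the simulator can be used as a training label, and that a single performance indicator can serve as the input feature. The names `fit_line`, `predict`, `occupancy` and `failure_rate` are hypothetical, and the numbers are synthetic placeholders generated from a made-up linear relation, not measurements.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict(model, x):
    """Evaluate the fitted line at x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical feature: one performance indicator per workload.
# Synthetic placeholder values, NOT measurements.
occupancy = [0.2, 0.4, 0.6, 0.8]

# Hypothetical label: failure rate that a fault-injection campaign
# would report. Generated here from a made-up linear relation.
failure_rate = [0.5 * x + 0.1 for x in occupancy]

model = fit_line(occupancy, failure_rate)

# Estimate the label for an unseen workload without running a new campaign.
estimate = predict(model, 0.5)
```

A real study would use many indicators, a richer model, and measured labels. The sketch only shows the split between slow measurements used for training and fast predictions used afterwards.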
See also gpu reliability machine learning.pdf
Required skills - strong skills in programming languages (C/C++, Python, Bash)
- basic knowledge of machine learning
- basic knowledge of computer architecture and/or GPU architecture
- experience with Linux environment
Deadline 18/09/2018