Effective reliability evaluation of Neural Networks
keywords GPUS, NEURAL NETWORKS, RELIABILITY
Reference persons MATTEO SONZA REORDA
External reference persons Esteban Rodriguez, Juan David Guerrero
Research Groups DAUIN - GR-05 - ELECTRONIC CAD & RELIABILITY GROUP - CAD
Description Neural Networks (NNs) are increasingly used in applications (e.g., in the automotive and robotic domains) where reliability and safety are crucial. Faults affecting the hardware that executes the NN (often a GPU) may therefore cause failures with critical consequences. In all these applications, standards and regulations (e.g., ISO 26262 for automotive systems) mandate the adoption of procedures to estimate the probability that the system produces a critical failure. Unfortunately, evaluating the effects of all possible faults in complex devices such as GPUs is computationally infeasible. For this reason, new solutions are required that trade off computational complexity against result accuracy. The CAD Group of the Dept. of Control and Computer Engineering has been working on this topic for several years, in cooperation with semiconductor and system companies (e.g., STMicroelectronics and NVIDIA). To support its activities, the group developed a VHDL model of an NVIDIA GPU (named FlexGripPlus) and used it to assess the advantages and disadvantages of each solution. The current activities, where Master thesis students could play a key role, are based on combining RTL fault simulation (which is accurate but too slow) with microarchitectural simulation (e.g., based on the NVBitFI tool). Students will be asked to perform different tasks, such as:
1) setting up an environment to simulate an NN running on a GPU in the presence of permanent and transient faults;
2) performing the fault simulation experiments, thus estimating the percentage of possible faults that could produce a critical failure. Since NNs are often used to classify images, a critical failure may correspond to the NN misclassifying an image because of the fault;
3) analyzing the results and possibly proposing solutions based on modifying the HW and/or the SW to reduce the probability of a critical failure.
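The flow of tasks 1) to 3) can be illustrated with a minimal software-level sketch. It is not the group's RTL or NVBitFI-based flow: it assumes a toy linear classifier, models a fault as a single bit flip in a stored float32 weight, and counts a fault as critical when any prediction differs from the fault-free (golden) one. All names (`flip_bit`, `classify`, `is_critical`, `run_campaign`) and all weights and inputs are hypothetical and chosen only for illustration.

```python
import math
import random
import struct


def flip_bit(value, bit):
    """Flip one bit (0 = LSB, 31 = sign) of the float32 encoding of value."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


def classify(weights, x):
    """Return the index of the highest class score, or None if a score is not finite."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    if not all(math.isfinite(s) for s in scores):
        return None
    return max(range(len(scores)), key=scores.__getitem__)


def is_critical(weights, inputs, golden, row, col, bit):
    """Inject one bit flip into weights[row][col]; True if any prediction changes."""
    faulty = [list(r) for r in weights]  # copy, so the golden model stays intact
    faulty[row][col] = flip_bit(faulty[row][col], bit)
    return any(classify(faulty, x) != g for x, g in zip(inputs, golden))


def run_campaign(weights, inputs, n_faults, seed=0):
    """Sample n_faults (weight, bit) pairs at random; return the fraction that is critical."""
    rng = random.Random(seed)
    golden = [classify(weights, x) for x in inputs]
    fault_list = [
        (r, c, b)
        for r in range(len(weights))
        for c in range(len(weights[0]))
        for b in range(32)
    ]
    sample = rng.sample(fault_list, min(n_faults, len(fault_list)))
    critical = sum(is_critical(weights, inputs, golden, *fault) for fault in sample)
    return critical / len(sample)


if __name__ == "__main__":
    # Hypothetical 2-class, 2-input model and test inputs (assumptions for illustration).
    W = [[1.0, -1.0], [-1.0, 1.0]]
    X = [(2.0, 0.5), (0.5, 2.0)]
    print(f"critical fault rate: {run_campaign(W, X, n_faults=64):.2%}")
```

Sampling a subset of the fault list, instead of injecting every fault, mirrors the complexity/accuracy trade-off described above: a smaller sample runs faster but yields a less precise estimate of the critical failure rate.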
Required skills The thesis may be carried out by students enrolled in either the Computer Engineering or the Electronics curricula. The possible involvement of students enrolled in other curricula will be evaluated case by case.
Deadline 16/01/2024