Reliability evaluation of AI accelerators
keywords ARTIFICIAL NEURAL NETWORKS, NEURAL NETWORKS, RELIABILITY
Reference persons MATTEO SONZA REORDA
External reference persons Esteban Rodriguez, Juan David Guerrero
Research Groups DAUIN - GR-05 - ELECTRONIC CAD & RELIABILITY GROUP - CAD
Description Neural Networks (NNs) are increasingly used in applications (e.g., in the automotive and robotic domains) where reliability and safety are crucial. To speed up their execution, hardware accelerators (e.g., Google Tensor) are increasingly common. As a result, faults affecting the accelerator may produce failures with potentially critical consequences. In all these applications, standards and regulations (e.g., ISO 26262 for automotive systems) mandate the adoption of procedures to estimate the probability that the system produces a critical failure. Unfortunately, exactly evaluating the effects of all possible faults in complex devices such as AI accelerators is computationally infeasible. For this reason, new solutions are required that trade off computational complexity against the accuracy of the results. The CAD Group, part of the Dept. of Control and Computer Engineering, has been working on this topic for several years, in cooperation with semiconductor and system companies (e.g., STMicroelectronics and NVIDIA). The current activities, in which Master's thesis students could play a key role, combine different tools and approaches, ranging from RTL fault simulation (which is accurate but too slow, and requires an HDL model of the target module) up to microarchitectural simulation (e.g., based on tools such as GPGPU-Sim). Students will be asked to perform different tasks, such as:
1) setting up an environment for simulating a HW AI accelerator in the presence of permanent and transient faults, and selecting suitable NN models and datasets for the experiments;
2) performing the fault simulation experiments, thus estimating the percentage of possible faults that could produce a critical failure. Since NNs are often used to classify images, a critical failure may correspond to the NN misclassifying an input because of the fault;
3) analyzing the results and possibly proposing solutions, based on modifying the HW and/or the SW, to reduce the probability of a critical failure.
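As a rough illustration of tasks 1) and 2), the sketch below runs a small fault injection campaign in software. It is not the group's tool flow: the toy linear classifier, the random inputs, the fault model (a single bit flip in one float32 weight, active for the whole set of inputs) and all function names are assumptions made for this example. A fault is counted as critical when it changes the predicted class of at least one input with respect to the fault-free run. Sampling only n_faults faults, rather than injecting all of them, is one simple way to trade accuracy for computation time.

```python
import random
import struct


def to_float32(value):
    """Round a Python float to the nearest float32-representable value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def flip_bit(value, bit):
    """Invert one bit (0 = LSB, 31 = sign) of the float32 encoding of `value`."""
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    return struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))[0]


def classify(weights, x):
    """Toy linear classifier (assumption): index of the largest weighted sum."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return max(range(len(scores)), key=lambda i: scores[i])


def run_campaign(weights, inputs, n_faults, seed=0):
    """Inject `n_faults` random single-bit flips, one at a time, and return
    the fraction that change the predicted class of at least one input."""
    rng = random.Random(seed)
    golden = [classify(weights, x) for x in inputs]  # fault-free reference
    critical = 0
    for _ in range(n_faults):
        r = rng.randrange(len(weights))
        c = rng.randrange(len(weights[0]))
        bit = rng.randrange(32)
        original = weights[r][c]
        weights[r][c] = flip_bit(original, bit)  # inject the fault
        faulty = [classify(weights, x) for x in inputs]
        weights[r][c] = original  # restore the fault-free state
        if faulty != golden:
            critical += 1
    return critical / n_faults


def make_toy_setup(seed=1, classes=4, features=8, samples=20):
    """Build random float32 weights and random inputs (stand-ins for a real
    NN model and dataset)."""
    rng = random.Random(seed)
    weights = [[to_float32(rng.uniform(-1, 1)) for _ in range(features)]
               for _ in range(classes)]
    inputs = [[rng.random() for _ in range(features)] for _ in range(samples)]
    return weights, inputs


if __name__ == "__main__":
    weights, inputs = make_toy_setup()
    rate = run_campaign(weights, inputs, n_faults=500)
    print(f"critical faults: {100 * rate:.1f}% of 500 injected")
```

A stuck-at (permanent) fault could be modeled in the same loop by forcing the selected bit to 0 or 1 instead of inverting it. In the actual thesis work the faults would be injected into a model of the accelerator (RTL or microarchitectural) rather than directly into the weights.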
Required skills The thesis may be carried out by students enrolled in either the Computer Engineering or the Electronics curricula. The possible involvement of students enrolled in other curricula will be evaluated case by case.
Deadline 31/03/2024