Exploration and mapping on ultra-parallel heterogeneous architectures: A performance analysis and exploration of a NAS algorithm for deep neural networks
keywords ARTIFICIAL INTELLIGENCE, C, COMPILERS, CONVOLUTIONAL NEURAL NETWORKS, DEEP LEARNING, DEEP NEURAL NETWORKS, DESIGN SPACE EXPLORATION, EMBEDDED SYSTEMS, ENERGY EFFICIENCY, FIRMWARE, HARDWARE ACCELERATORS, LOW POWER, MICROCONTROLLERS, SOFTWARE, SOFTWARE ACCELERATION
Reference persons DANIELE JAHIER PAGLIARI
External reference persons Matteo Risso (Politecnico di Torino)
Alessio Burrello (Politecnico di Torino)
Angelo Garofalo (Università di Bologna)
Thesis type EXPERIMENTAL, SOFTWARE DEVELOPMENT
Description The evolution of computing architectures has led to the adoption of heterogeneous ultra-parallel systems that combine clusters of digital cores with analog accelerators. These scalable architectures offer several opportunities to accelerate complex algorithms such as neural networks. However, designing and implementing an optimal mapping of neural networks onto such architectures requires accurate knowledge of the underlying hardware, such as the latency of the interconnect between components and the performance of the different processing units.
In this context, two main problems arise: the optimal mapping of a fixed neural network architecture onto the hardware, and the search for a new topology that best exploits the underlying hardware through Neural Architecture Search techniques.
This thesis aims to explore and develop an optimal mapping method for neural networks on ultra-parallel heterogeneous architectures. The main target is the scalable hardware architecture shown in Fig. 1, built from a basic unit composed of 32 digital cores, an analog accelerator, and a shared memory. The goal is to map benchmark neural networks onto an architecture composed of N such units. Furthermore, we intend to explore Neural Architecture Search (NAS) to identify the neural network best suited to the hardware specifications as the number of basic units varies.
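To make the mapping problem concrete, the following is a minimal sketch, entirely under assumed numbers: each layer can run on any of the N basic units with a different compute latency (e.g., depending on whether it fits the analog accelerator), and moving activations between units costs an extra hop latency. The mapper then searches for the layer-to-unit assignment with the lowest end-to-end latency. All latencies, unit counts, and the exhaustive-search strategy here are illustrative, not the method the thesis will develop.

```python
from itertools import product

# Toy model (all numbers hypothetical): N basic units, each with its own
# per-layer compute latency; inter-unit communication adds a fixed hop cost.
UNITS = 2                      # N basic units
COMPUTE_LAT = [                # latency (us) of each layer on each unit
    [10, 12],                  # layer 0 on unit 0 / unit 1
    [20, 8],                   # layer 1 (e.g., analog-friendly on unit 1)
    [15, 15],                  # layer 2
]
LINK_LAT = 5                   # latency (us) to move activations between units

def total_latency(assignment):
    """Sum compute latency, plus one hop cost whenever two consecutive
    layers are mapped to different units."""
    lat = sum(COMPUTE_LAT[l][u] for l, u in enumerate(assignment))
    lat += sum(LINK_LAT for a, b in zip(assignment, assignment[1:]) if a != b)
    return lat

# Exhaustive search over all assignments: feasible only for tiny networks;
# a real mapper would rely on heuristics or ILP formulations.
best = min(product(range(UNITS), repeat=len(COMPUTE_LAT)), key=total_latency)
print(best, total_latency(best))
```

Even in this toy instance the trade-off appears: keeping every layer on one unit avoids hop costs, while splitting layers across units exploits the unit each layer runs fastest on.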
● Study of the heterogeneous architecture, including the structure of the basic unit: the digital cores, the analog accelerator, and the L1 memory.
● Analysis of the performance and latency factors associated with the various processing units.
● Exploration of existing mapping algorithms for neural networks on heterogeneous architectures, with a performance evaluation and identification of the specific challenges posed by the considered hardware.
● Development of an optimal mapping method for the specific hardware architecture. The distinguishing factor will be deciding where to allocate each layer, given the differences in communication latency between units.
● Exploration of a Neural Architecture Search (NAS) aimed at finding the best neural network for the considered hardware, taking into account the mapping constraints and the hardware performance.
● Collection of experimental data on different public benchmarks, with different mapping configurations and neural networks.
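The hardware-aware NAS step above can be sketched in miniature: sample candidate network shapes, estimate their latency on the target with a cost model, and keep the best accuracy/latency trade-off. Everything below is assumed for illustration: the channel-width search space, the linear MAC-based latency proxy, the placeholder accuracy predictor, and the random-search strategy (real NAS would use a trained predictor or differentiable search).

```python
import random

random.seed(0)

# Hypothetical search space: a channel width per layer.
CANDIDATE_WIDTHS = [16, 32, 64]
N_LAYERS = 4
LAMBDA = 0.01                  # latency penalty weight (assumed)

def latency_estimate(widths):
    # Toy proxy: latency grows with MACs ~ width_in * width_out per layer.
    return sum(a * b for a, b in zip(widths, widths[1:])) / 1000.0

def accuracy_proxy(widths):
    # Placeholder for a trained accuracy predictor: wider layers score
    # higher, with diminishing returns.
    return sum(w ** 0.5 for w in widths)

def score(widths):
    # Hardware-aware objective: reward accuracy, penalize estimated latency.
    return accuracy_proxy(widths) - LAMBDA * latency_estimate(widths)

# Random search: sample 200 candidate networks, keep the best-scoring one.
best = max(
    (tuple(random.choice(CANDIDATE_WIDTHS) for _ in range(N_LAYERS))
     for _ in range(200)),
    key=score,
)
print(best, round(score(best), 2))
```

Changing the latency model (e.g., to reflect a different number of basic units) shifts which candidates win, which is exactly the hardware dependence the NAS exploration would study.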
See also picture 1.png
Required skills Required skills include C and Python programming. Furthermore, basic knowledge of computer architectures and embedded systems is necessary, as is knowledge of deep learning and the corresponding models.
Notes Thesis in collaboration with Prof. Luca Benini’s research group at the University of Bologna and ETH Zurich. The thesis will be carried out in Turin with the addition of supervisors from the other two universities.
Deadline 31/12/2023 SUBMIT YOUR APPLICATION