
Optimized execution of neural networks at the edge

01DNMIU

Academic Year 2021/22

Course Language

English

Course degree

Doctorate Research in Ingegneria Informatica e dei Sistemi - Torino

Course structure
Teaching Hours
Lectures 25
Teachers
Teacher: Jahier Pagliari Daniele, Researcher (L. 240/10), SSD ING-INF/05
Hours: 15 lectures, 0 exercises, 0 lab, 0 tutoring; years teaching: 1

In an era where the market demand for increasingly "smart" devices clashes with the diminishing returns of pure technology scaling, the design of digital systems has become increasingly challenging. The only way to simultaneously respect all non-functional constraints (such as latency, memory occupation, energy consumption, power envelope, etc.) is through the combined optimization of both software and hardware at multiple abstraction levels. The course will introduce some of these optimizations, focusing in particular on the execution of neural networks, which are at the core of many emerging applications across different domains, ranging from healthcare to manufacturing.

The first part of the course will focus on artificial neural networks and deep learning. Deep neural networks will be analyzed from a computational standpoint, identifying critical operations in terms of time, memory and energy. We will then survey the main techniques to optimize the execution of these models. Specifically, we will first introduce so-called "static" optimizations, i.e., those performed before deploying the model on the target hardware, either at training time or post-training. These include data quantization, weight and activation pruning, knowledge distillation, neural architecture search, and others. Then, we will describe "dynamic" optimizations, which adapt the execution complexity at runtime, based on external conditions (e.g., the battery state of charge) or on the processed data. Many of the optimizations described can be seen as instances of Approximate Computing, a design paradigm for error-tolerant applications that trades off output quality for complexity. Students will have the opportunity to try some of the optimizations covered in class in a practical session, based on popular deep learning frameworks (TensorFlow, PyTorch, etc.), attempting to deploy a complex deep neural network onto a real edge device.
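To give a flavor of the static optimizations mentioned above, the sketch below shows symmetric per-tensor 8-bit post-training quantization, one of the simplest techniques in this family. It is a minimal illustration in plain Python; the function names and the per-tensor symmetric scheme are illustrative choices, not the API of any specific framework.

```python
# Sketch of post-training uniform 8-bit quantization (a "static"
# optimization): each real-valued weight w is mapped to a signed
# integer q = round(w / scale), and approximately recovered as
# w' = q * scale. A single scale is shared by the whole tensor.

def quantize_int8(weights):
    """Symmetric per-tensor quantization of a list of float weights."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to approximate floats."""
    return [qi * scale for qi in q]

weights = [0.81, -0.42, 0.05, -1.27]
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)
# Each recovered weight differs from the original by at most scale/2,
# while storage drops from 32-bit floats to 8-bit integers.
```

In practice, frameworks apply the same idea per channel and also quantize activations, but the quality/complexity trade-off is already visible in this toy version.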
The second part of the course will focus instead on spiking neural networks and neuromorphic computing, which have emerged in recent years to cope with the increasing need for energy efficiency and high performance in modern applications. Neuromorphic computational paradigms and hardware architectures are now mature enough to play an important role in IoT applications running at the edge, thanks to their ability to learn and adapt to ever-changing conditions and tasks while respecting tight power budgets. This second part will provide an introductory overview of the main neuromorphic computing techniques proposed by researchers in academia and industry, focusing both on hardware architectures and on software stacks.

The overall goal of the course is to provide the audience with new tools that they can use in their own research whenever they need to execute applications based on neural networks in an optimized and highly efficient way. Importantly, while the main focus of the course is on edge devices, most of the concepts explained also apply to high-performance systems.
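As an illustration of the computational model behind spiking neural networks, here is a minimal leaky integrate-and-fire (LIF) neuron, the basic unit used by most neuromorphic platforms. This is a sketch under simplifying assumptions (discrete time, zero reset potential); the leak factor and threshold values are arbitrary example parameters.

```python
# Minimal leaky integrate-and-fire (LIF) neuron: the membrane
# potential leaks toward zero, integrates the input current, and
# emits a spike (then resets) when it crosses a threshold.

def lif_neuron(inputs, leak=0.9, threshold=1.0):
    """Simulate one LIF neuron over a sequence of input currents.
    Returns the binary spike train (one entry per time step)."""
    v = 0.0                      # membrane potential
    spikes = []
    for current in inputs:
        v = leak * v + current   # leaky integration
        if v >= threshold:       # fire...
            spikes.append(1)
            v = 0.0              # ...and reset
        else:
            spikes.append(0)
    return spikes

# A constant sub-threshold input makes the neuron fire periodically:
spikes = lif_neuron([0.4] * 10)
```

Note how information is carried by spike timing and rate rather than by dense multiply-accumulate operations, which is the source of the energy savings on neuromorphic hardware.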
A basic knowledge of the following fields is required to benefit from the course:
- Computer architecture
- Embedded systems
- Embedded software/firmware
- Parallel computing
- Machine learning
- Introduction: post-Moore’s Law computing
- Deep learning basics
- Static optimizations for deep learning: quantization, pruning, approximate computing, NAS, etc.
- Dynamic optimizations for deep learning: big/little, N-width networks, dynamic precision scaling, etc.
- Optimizing deep learning models: practical examples
- Computing with Spiking Neurons: general concepts
- Computing with Spiking Neurons: example applications
- Future trends and challenges

The final examination consists of a presentation in which the student shows how one or more of the techniques introduced in the course can be applied to his/her own research.
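The big/little scheme listed among the dynamic optimizations can be sketched as follows: a cheap "little" model runs on every input, and the expensive "big" model is invoked only when the little model's confidence is too low. Both model functions and the confidence threshold below are hypothetical stand-ins for the example, not a real framework API.

```python
# Sketch of "big/little" dynamic inference: adapt execution cost at
# runtime based on the processed data (easy inputs stop early).

def little_model(x):
    # Hypothetical fast, small classifier: returns (label, confidence).
    return ("cat", 0.95) if x > 0.5 else ("dog", 0.55)

def big_model(x):
    # Hypothetical accurate but costly classifier.
    return ("dog", 0.99)

def big_little_infer(x, conf_threshold=0.8):
    """Run the little model; escalate to the big one only if unsure.
    Returns (label, which_model_produced_it)."""
    label, conf = little_model(x)   # always run the cheap model
    if conf >= conf_threshold:      # confident enough: stop early
        return label, "little"
    return big_model(x)[0], "big"   # otherwise pay for the big model
```

Since most real-world inputs are "easy", the average cost approaches that of the little model while accuracy on hard inputs is preserved by the fallback.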
On site
Oral presentation
P.D.2-2 - June


© Politecnico di Torino
Corso Duca degli Abruzzi, 24 - 10129 Torino, ITALY