The course aims to provide methodologies for hardware/software co-design solutions that enable the execution of AI models on area- and power-constrained edge devices. In the first part, following an introduction to quantization strategies for reducing the footprint of AI models to make them deployable on resource-limited devices, the course will focus on the open-source and extensible RISC-V Instruction Set Architecture (ISA) as a method for creating energy-efficient, domain-specialized, yet software-programmable edge AI processors.
In the core section, leveraging the massive parallelism inherent in AI kernel operations, as well as their tolerance to low-precision integer arithmetic, the course will focus on parallel optimization strategies to execute low-precision integer AI kernels on multi-core, low-power edge platforms with high computational efficiency. To address the limitations of fetch-execute architectures highlighted by Amdahl's Law, the course will introduce tightly-coupled accelerators as cooperative coprocessors, designed to speed up specific sections of AI kernels and further reduce end-to-end execution latency. Each lecture on hardware blocks will be complemented by an introduction to the related computing paradigms and by practical programming examples.
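The Amdahl's Law argument above can be stated in a few lines of C (an illustrative sketch, not course code): if only a fraction p of a kernel is parallelizable across n cores, the serial remainder caps the achievable speedup at 1/(1-p), which is why accelerating the remaining serial sections matters.

```c
#include <assert.h>
#include <math.h>

/* Amdahl's Law: speedup of a workload where fraction p runs in
 * parallel on n cores and the remaining (1 - p) stays serial. */
static double amdahl_speedup(double p, int n) {
    return 1.0 / ((1.0 - p) + p / (double)n);
}
```

For example, with 90% of a kernel parallelized, even an 8-core cluster reaches only about a 4.7x speedup, and no number of cores can exceed 10x; that residual serial fraction is the target of the tightly-coupled accelerators discussed in the lectures.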
In the final section, the course will explore the impact of data movement on the overall execution performance of AI models and introduce tiling strategies to mitigate the "memory wall" effect. To conclude, all concepts will be integrated into an end-to-end execution of a DNN model on a multi-core edge device.
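As a hypothetical illustration of the tiling idea (the TILE size here is an assumption for the sketch, not a PULP parameter), the loop nest below blocks an integer matrix multiply so that a small working set stays resident in fast local memory, reducing traffic to slower, larger memories — the essence of mitigating the "memory wall".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TILE 4  /* assumed tile edge; chosen to fit the fast local memory */

/* Tiled int8 matrix multiply C = A * B for n x n matrices, accumulating
 * in int32. The three outer loops walk TILE x TILE blocks so that each
 * block of A, B, and C is reused many times while it is "hot". */
static void matmul_tiled(const int8_t *A, const int8_t *B, int32_t *C, int n) {
    memset(C, 0, sizeof(int32_t) * (size_t)n * (size_t)n);
    for (int ii = 0; ii < n; ii += TILE)
        for (int kk = 0; kk < n; kk += TILE)
            for (int jj = 0; jj < n; jj += TILE)
                for (int i = ii; i < ii + TILE && i < n; i++)
                    for (int k = kk; k < kk + TILE && k < n; k++)
                        for (int j = jj; j < jj + TILE && j < n; j++)
                            C[i * n + j] += (int32_t)A[i * n + k] * B[k * n + j];
}
```

On edge platforms the same blocking decides how a layer's weights and activations are staged into scratchpad memory by a DMA engine, which is the setting the course examines for end-to-end DNN execution.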
The course will close with a discussion of emerging computing paradigms for AI, such as Analog-in-Memory Computing (AIMC), and the system-level challenges of integrating AIMC into heterogeneous analog/digital systems.
Tentative syllabus (subject to adjustment based on class needs):
1. Introduction to DNNs, CNNs, and Quantized CNNs for deployment on resource-constrained edge devices (2h)
2. Introduction to the Parallel Ultra Low Power (PULP) RISC-V compute platform and the RISC-V ISA (2h)
3. RISC-V ISA extensions for edge AI and parallel execution of optimized AI kernels on PULP (3h)
4. Tightly-coupled AI accelerators and how to program them (3h)
5. Efficient data movement: introduction to tiling strategies and execution of end-to-end DNN models on PULP (3h)
6. Outlook: energy efficiency promises of Analog-in-Memory Computing and introduction to system-level challenges in heterogeneous integration (2h)
Guest lecturer:
Angelo Garofalo (Post-doctoral Researcher at ETH Zurich, Switzerland): Angelo Garofalo is a Junior Assistant Professor (RTD-A) at the University of Bologna, Italy, and a post-doctoral researcher at ETH Zurich, Switzerland. His research interests include heterogeneous compute architectures for mixed-criticality edge systems, AI/ML acceleration, custom RISC-V ISA extensions, time-predictable hardware, safety solutions for critical processors, and hardware for security. During his career he has gained experience in ASIC SoC design (front-end, back-end, and sign-off), hardware/software co-design of heterogeneous systems, design of digital accelerators, the RISC-V Instruction Set Architecture, and RISC-V processor design. He has published more than 30 contributions at relevant international conference and journal venues.