The course aims to introduce the challenges related to the deployment and efficient execution of artificial intelligence (AI) applications, especially those based on deep neural networks (DNNs).
It discusses how the various steps of an AI application’s lifecycle can be distributed among edge and cloud devices (from data collection to model training and inference), focusing in particular on how to efficiently execute the inference phase on hardware different from the high-performance servers available in the cloud, such as mobile or embedded Internet of Things (IoT) devices.
This encompasses three main objectives:
1) Understanding the complex ecosystem behind AI applications deployed in the real world, and the characteristics of the compute devices available at the edge and in the cloud. Understanding the challenges arising from the use of portable, low-power devices as (i) the main gateway to collect, pre-process, and exchange raw data, and as (ii) the main hosting node for the execution of DNN models.
2) Introducing techniques (such as quantization, pruning, neural architecture search, etc.) to train and optimize efficient DNNs, carefully considering their memory occupation, execution latency, and energy consumption (or CO2 emissions). Describing how efficient inference engines optimize the execution of DNN workloads. All the techniques presented are currently applied in real-world AI deployments at any scale, from tiny models executing at the edge to LLMs in the cloud.
3) Introducing the software stacks, communication protocols and data formats employed in distributed platforms for AI-based applications, to let multiple inference nodes communicate with each other and/or with a private server.
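As an illustration of the quantization technique named in objective 2, the following is a minimal sketch of symmetric, per-tensor 8-bit quantization of a list of weights. It is not taken from the course material: the function names `quantize_int8` and `dequantize` and the choice of a symmetric scheme with range [-127, 127] are assumptions made only for this example.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale (hypothetical helper)."""
    max_abs = max(abs(v) for v in values)
    # One scale for the whole tensor; fall back to 1.0 if all values are zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 symmetric range.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the integers and the scale."""
    return [x * scale for x in q]


weights = [0.5, -1.0, 0.25, 0.0]
q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)
# Each restored value differs from the original by at most half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Storing `q_weights` as 8-bit integers instead of 32-bit floats reduces memory occupation by roughly a factor of four, at the cost of the rounding error measured by `max_error`.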
The course aims to introduce the challenges related to the deployment and efficient execution of artificial intelligence (AI) applications, especially those based on deep neural networks (DNNs).
It discusses how the various steps of an AI application’s lifecycle can be distributed among edge and cloud devices (from data collection to model training and inference), focusing in particular on how to efficiently execute the inference phase on hardware different from the high-performance servers available in the cloud, such as mobile or embedded Internet of Things (IoT) devices.
This encompasses two main objectives:
1) Understanding the complex ecosystem behind AI applications deployed in the real world, and the characteristics of the compute devices available at the edge and in the cloud. Understanding the challenges arising from the use of portable, low-power devices as (i) the main gateway to collect, pre-process, and exchange raw data, and as (ii) the main hosting node for the execution of DNN models.
2) Introducing techniques (such as quantization, pruning, neural architecture search, etc.) to train and optimize efficient DNNs, carefully considering their memory occupation, execution latency, and energy consumption (or CO2 emissions). Describing how efficient inference engines optimize the execution of DNN workloads. All the techniques presented are currently applied in real-world AI deployments at any scale, from tiny models executing at the edge to LLMs in the cloud.
The contents and thus the skills acquired at the end of the course include both the hardware aspects of the problem (architectures of "edge" devices) and the software aspects (programming models, protocols, and related APIs).
The skills acquired will allow a correct understanding of complex AI deployments (such as those in the IoT domain), in which the flow of data is processed not only on servers but also, partially or entirely, on local devices with reduced computational resources and energy budget. At the end of the course, each student will be able to train, optimize, and deploy an AI system on a low-power commercial board, and to implement the data-exchange protocols needed to communicate raw data and distilled information among distributed end-nodes.
Specific Learning Outcomes:
- Understanding of the topics covered and specifically of the hardware and software technologies involved in real-world deployments of AI applications, with a focus on the IoT domain.
- Understanding and use of AI model design and optimization tools, with particular attention to non-functional metrics (such as latency, memory, energy, and power consumed) and to DNN models.
- Being able to recognize and use adequate programming tools and communication protocols to share and distribute data across connected devices.
- Being able to design, integrate, and evaluate the main components of a distributed IoT solution using the appropriate programming tools.
The contents and thus the skills acquired at the end of the course include both the hardware aspects of the problem (architectures of "edge" devices) and the software aspects (programming models, protocols, and related APIs).
The skills acquired will allow a correct understanding of complex AI deployments (such as those in the IoT domain), in which the flow of data is processed not only on servers but also, partially or entirely, on local devices with reduced computational resources and energy budget. At the end of the course, each student will be able to train, optimize, and deploy an AI system on a low-power commercial board.
Specific Learning Outcomes:
- Understanding of the topics covered and specifically of the hardware and software technologies involved in real-world deployments of AI applications, with a focus on the IoT domain.
- Understanding and use of AI model design and optimization tools, with particular attention to non-functional metrics (such as latency, memory, energy, and power consumed) and to DNN models.
- Theory and basic concepts of machine learning and deep learning
- Software programming theories and tools
- Object-oriented programming
- Basic concepts on computer architectures and networks
- Theory and basic concepts of machine learning and deep learning
- Software programming theories and tools
- Basics of Object-oriented programming
- Basic concepts on computer architectures and networks
The course topics are organized in three main parts:
1. HW and SW technologies adopted in a typical AI-based application; computer architectures used for running and developing machine learning algorithms at the edge and in the cloud; basic knowledge about sensors as data sources; modeling of non-functional metrics (performance, memory occupation, energy consumption).
2. Efficient AI: resource-driven model optimization of deep neural networks (quantization, pruning, NAS, etc), industrial frameworks and engines for efficient model training, optimization and deployment.
3. Data exchange: distributed software platforms for edge computing; management of edge-fog-cloud interfaces (web and network programming of IoT protocols: REST request/response, publish/subscribe, MQTT); microservices design patterns; cloud/edge workload balancing.
Lab practices will touch upon the above topics, teaching students to implement their own optimization and communication applications, using as benchmarks real-life use cases in which IoT data are sampled, pre-processed, distilled, and communicated to a dedicated server node.
The course topics are organized in two main parts:
1. HW and SW technologies adopted in a typical AI-based application; computer architectures used for running and developing machine learning algorithms at the edge and in the cloud; basic knowledge about sensors as data sources; modeling of non-functional metrics (performance, memory occupation, energy consumption).
2. Efficient AI: resource-driven model optimization of deep neural networks (quantization, pruning, NAS, etc), industrial frameworks and engines for efficient model training, optimization and deployment.
Lab practices will touch upon the above topics, teaching students to implement their own optimization applications, using as benchmarks real-life use cases in which IoT data are sampled, pre-processed, distilled, and communicated to a dedicated server node.
This course corresponds to the first part of the longer version (8 CFU) offered to Data Science and Engineering students (code: 02VRVSM), which also covers aspects related to communication protocols, data formats, etc, for distributed applications.
The structure of the course reflects the organization of the main topics, with three main teaching blocks organized as follows:
Part 1 [12h] – Deployment of AI-based Applications (definitions, architectures, challenges, and technologies);
Part 2 [24h] – Efficient AI (commercial training, optimization and inference frameworks, resource-driven optimization techniques for deep neural networks);
Part 3 [12h] - Communication protocols and their implementation (how to send/receive data from/to edge/remote servers)
Within each block, there are lab sessions [30h] during which students (in groups of 2-3 people) can practice with a real software implementation of the theoretical concepts and strategies introduced during the regular classes.
The structure of the course reflects the organization of the main topics, with two main teaching blocks organized as follows:
Part 1 [15h] – Deployment of AI-based Applications (definitions, architectures, challenges, and technologies);
Part 2 [25h] – Efficient AI (commercial training, optimization and inference frameworks, resource-driven optimization techniques for deep neural networks);
Within each block, there are lab sessions [20h] during which students (in groups of 2-3 people) can practice with a real software implementation of the theoretical concepts and strategies introduced during the regular classes.
Class handouts and additional material will be made available on the course webpage. User guides and tutorials for lab sessions will be made available as well, including code templates and all the needed tools and libraries.
Lecture slides; Lab exercises;
Exam: Group project; Computer-based written test in class using POLITO platform;
...
The exam consists of two mandatory parts:
1. the evaluation of three (3) group projects assigned during the course, one assignment for each of the three main parts of the course; the maximum score for each delivered project is 6 points (total: 18 points).
2. a written test on the theoretical aspects introduced during the course, including numerical exercises and open-ended questions. The time allowed for the test is 1 hour, closed books; the maximum score for this part is 14 points.
The final score is the sum of the scores obtained in the two parts. A minimum of 6 points in the written test is required for passing the exam. A score of 30 cum laude is awarded to students obtaining more than 31 total points (written test + labs).
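The scoring rule above can be sketched as a small function. This is only an illustration under stated assumptions: the function name `final_score` and its parameters are hypothetical, and the rule for converting totals other than the cum laude case into a final grade is not stated here, so the sketch does not attempt it.

```python
def final_score(project_scores, written_score,
                max_project=6, max_written=14, min_written=6):
    """Combine project and written-test points (hypothetical helper)."""
    # Each project and the written test are capped at their stated maximum.
    for s in project_scores:
        if not 0 <= s <= max_project:
            raise ValueError("project score out of range")
    if not 0 <= written_score <= max_written:
        raise ValueError("written score out of range")

    total = sum(project_scores) + written_score
    written_ok = written_score >= min_written
    return {
        "total": total,
        # The written test has its own minimum threshold.
        "written_ok": written_ok,
        # Cum laude requires more than 31 total points.
        "cum_laude": written_ok and total > 31,
    }


# Three projects at full marks plus a full written test: 18 + 14 = 32 points.
result = final_score([6, 6, 6], 14)
```

With three projects worth up to 6 points each and a written test worth up to 14, the maximum total is 32, so cum laude is reachable only with full marks in both parts.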
In addition to reporting through the online procedure, students with disabilities or Specific Learning Disorders (SLD) are invited to inform the course instructor directly, at least one week before the start of the exam session, of the compensatory tools agreed with the Special Needs Unit, so that the instructor can apply them in the way best suited to the specific type of exam.
Exam: Group project; Computer-based written test in class using POLITO platform;
The exam consists of two mandatory parts:
1. the evaluation of two (2) group projects assigned during the course, one assignment for each of the two main parts of the course; the maximum score for each delivered project is 9 points (total: 18 points).
2. a written test on the theoretical aspects introduced during the course, including numerical exercises and open questions. The time allowed for the test is 1 hour, closed books; the maximum score for this part is 14 points.
The final score is the sum of the scores obtained in the two parts. A minimum of 6 points in the written test is required for passing the exam. A score of 30 cum laude is awarded to students obtaining more than 31 total points (written test + labs).
In addition to the message sent by the online system, students with disabilities or Specific Learning Disorders (SLD) are invited to directly inform the professor in charge of the course about the special arrangements for the exam that have been agreed with the Special Needs Unit. The professor has to be informed at least one week before the beginning of the examination session in order to provide students with the most suitable arrangements for each specific type of exam.