The course is taught in Italian.
The course covers the acquisition, processing, analysis, and understanding of the content of digital images and image sequences of 2D and 3D objects (computer vision). Applications include industrial inspection, surveillance, biometric identification (fingerprints, retinal scans, face images, iris), human motion analysis for entertainment, medical, or sports purposes, land analysis from aerial or satellite images, 3D scanning, and robot navigation. The course will present the fundamental techniques and their use in some of the main practical applications.
The growing demand for efficient information processing is well supported by hardware accelerators, particularly Graphics Processing Units (GPUs), which offer high programming flexibility and can significantly accelerate a wide range of algorithms and applications. In an era defined by data-intensive applications and computational challenges, learning GPU programming makes it possible to solve problems at much greater speed and scale. From accelerating scientific simulations and powering artificial intelligence to enabling real-time graphics and financial modeling, GPUs have become essential across many industries. By mastering GPU programming, students will gain the ability to harness these devices to create efficient, high-performance solutions.
This course is designed to enable students to work with parallel hardware accelerators, specifically GPUs and General-Purpose Graphics Processing Units (GPGPUs), whose utility and popularity have risen continuously over the past fifteen years (as of 2025) in applications ranging from embedded and edge computing to High Performance Computing (HPC). To this end, the course explores the fundamentals of parallel computing and its core paradigms, a comparative study of GPU and CPU architectures and their interaction, the CUDA programming model, memory models, debugging and profiling strategies, current performance analysis models, and real-world applications of GPU-accelerated computing.
During the laboratory sessions, where possible, students will develop and adapt several algorithms for embedded and HPC-class GPUs, using parallelism strategies, specialized libraries, and frameworks such as TensorFlow together with the Python programming language for artificial intelligence algorithms. These competencies are highly valued in today's job market, making them a strong asset for students pursuing careers in embedded, high-performance, and parallel computing.
Students will acquire fundamental knowledge of:
- sensors and systems for image acquisition
- modeling of optical systems and their transfer functions
- frequency analysis of images
- techniques for enhancing and restoring images affected by various kinds of noise and degradation (sensor thermal noise, optical imperfections, relative motion, etc.)
- techniques for segmentation and for extracting the characteristic features of an image
- techniques for recognizing 2D and 3D objects
- techniques for motion analysis
Through this knowledge and numerous application examples, students will be enabled to design a computer vision system.
At the end of the course, students will be able to master:
- The fundamentals and rules of parallel computing
- GPU and CPU architectures (comparison)
- The CUDA programming model, strategies for debugging, executing, and profiling on GPUs
- Performance analysis models for parallel programs
- Applications of GPU programming
- Realistic applications on embedded GPU cards
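As a rough illustration of the performance analysis models mentioned in the list above, the sketch below evaluates two models that this syllabus names elsewhere: Amdahl's law and the roofline model. The function names and the hardware figures (a 10,000 GFLOP/s peak and 500 GB/s of memory bandwidth) are hypothetical values chosen only for the example, not data for any specific GPU used in the course.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's law: speedup bound when only part of the work parallelizes.

    parallel_fraction: share of the runtime that can run in parallel (0..1).
    workers: number of parallel processing units (>= 1).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def roofline_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Roofline model: attainable throughput is the lower of two roofs.

    The compute roof is the peak arithmetic rate; the memory roof is the
    memory bandwidth multiplied by the kernel's arithmetic intensity.
    """
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


if __name__ == "__main__":
    # With 90% parallel work, the speedup saturates near 1 / 0.1 = 10.
    for workers in (1, 10, 100, 1000):
        print(workers, round(amdahl_speedup(0.9, workers), 2))

    # Hypothetical device: 10,000 GFLOP/s peak, 500 GB/s bandwidth.
    print(roofline_gflops(10_000.0, 500.0, 0.25))  # low intensity: memory-bound
    print(roofline_gflops(10_000.0, 500.0, 40.0))  # high intensity: compute-bound
```

Both functions are deliberately simple: they give upper bounds that ignore overheads such as kernel launch latency and host-device transfers, which a profiler would reveal in practice.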
Basics of mathematical analysis, one-dimensional signal analysis, linear algebra, probability theory.
C/C++ programming;
Python programming (suggested, but not required);
Distributed computing (suggested, but not required);
Computer architectures;
Operating systems.
Main topics and credit weights
- Image acquisition systems (1 cr)
- Image processing. 2D transforms and transfer functions (1 cr)
- Image enhancement and restoration (0.5 cr)
- Segmentation and extraction of significant data (0.5 cr)
- 2D and 3D recognition (1 cr)
- Motion analysis (0.5 cr)
- Case studies (1.5 cr)
- Introduction to parallel computing: (1 cfu)
Classification of Parallel Computers.
Amdahl's law, Flynn's taxonomy: SISD, SIMD, MISD, and MIMD.
General architecture of classical and modern GPUs.
Comparison and interaction of CPU with GPU architectures.
- GPGPU concurrency and organization: (1 cfu)
Multiprocessors, streaming processors, and SIMT cores; internal general-purpose cores (CUDA cores); and specialized on-chip hardware accelerators (Tensor Cores, matrix cores, and Special Function Units).
GPU memory hierarchy advantages and constraints: Global, local, shared, constant, and cache memories. Memory models for GPU-accelerated computing programs (Pageable, Unified, Pinned, and Mapped). Brief overview and comparison of the architecture of embedded and HPC-class GPUs.
Fundamentals of the GPGPU programming model: The concepts of Grids, Blocks, Warps, and Threads.
GPGPU profiling.
- GPGPU-programming fundamentals: (2 cfu)
The CUDA programming model.
Threading optimization and the trade-offs between memory and computing bottlenecks.
The roofline model: a practical performance model for GPU-accelerated programs.
GPGPU convolution and memory management.
GPGPU task parallelism (Streams).
Debugging, tiling, ray tracing, and libraries.
- Classical and modern applications: (2 cfu)
Reduction, Scan, Sorting, Matrix Multiplication, Convolution, and Stencil-based applications, including image filtering.
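To give a flavor of the first application in the topic list above, here is a minimal sketch of a tree-based (pairwise) sum reduction, written in plain Python for readability. It only simulates the access pattern sequentially: the function name `tree_reduce_sum` is hypothetical, and on a GPU each iteration of the inner loop would typically be handled by a separate thread, with a synchronization barrier between strides.

```python
def tree_reduce_sum(values):
    """Sum a sequence using the pairwise pattern of a parallel reduction.

    At each step, element i accumulates element i + stride. All additions
    within one step touch disjoint pairs, so they are independent and could
    run in parallel. The stride doubles after every step, so about log2(n)
    steps are needed instead of n - 1 sequential additions.
    """
    data = list(values)  # work on a copy, like a kernel's scratch buffer
    n = len(data)
    if n == 0:
        return 0

    stride = 1
    while stride < n:
        # Independent pairwise additions: one simulated "thread" per index i.
        for i in range(0, n - stride, 2 * stride):
            data[i] += data[i + stride]
        stride *= 2  # a barrier would separate steps on real hardware

    return data[0]


if __name__ == "__main__":
    print(tree_reduce_sum([1, 2, 3, 4, 5]))
    print(tree_reduce_sum(range(100)) == sum(range(100)))
```

This interleaved-addressing scheme is only one of several reduction layouts; it is used here because it is the easiest to follow, not because it is the fastest on real hardware.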
The laboratory sessions involve the use of software for image processing and analysis. They are preparatory to the development of an individual or group project, which will contribute to the final mark.
The laboratory sessions will build on the material covered in the class and aim to solidify your understanding of concepts through hands-on experimentation.
During the laboratory sessions, the experiments will be carried out on specific GPU development cards programmable with CUDA and controlled through scripts, including sbatch job scripts, bash, and Python.
A project will be assigned individually or to small groups. The results will be evaluated and will contribute to the final mark.
A PC-based exam will assess the general aspects of parallel programming with GPUs and the fundamental concepts of GPGPU architectures.
Course slides and other material at: http://didattica.polito.it
Suggested textbooks:
- R.C. Gonzalez and R.E. Woods: Digital Image Processing, Pearson International Edition, 2008
- C. Steger, M. Ulrich, C. Wiedemann: Machine Vision Algorithms and Applications, Wiley-VCH, 2008
- G.C. Holst and T.S. Lomheim: CMOS/CCD Sensors and Camera Systems, SPIE Press, 2007
- E.R. Davies: Machine Vision, Elsevier, 2005
Course transparencies and other material at http://didattica.polito.it
Supporting material:
- Cook, Shane. CUDA programming: a developer's guide to parallel computing with GPUs. Newnes, 2012.
- Tuomanen, Brian. Hands-On GPU Programming with Python and CUDA: Explore high-performance parallel computing with CUDA. Packt Publishing Ltd, 2018.
- Hwu, Wen-mei W., David B. Kirk, and Izzat El Hajj. Programming massively parallel processors: a hands-on approach. Morgan Kaufmann, 2022.
- Sanders, Jason, and Edward Kandrot. CUDA by example: an introduction to general-purpose GPU programming. Addison-Wesley Professional, 2010.
- Thomas, Gareth Morgan. Advanced CUDA Programming: High Performance Computing with GPUs. 2025.
- Aamodt, Tor M., Wilson Wai Lun Fung, Timothy G. Rogers, and Margaret Martonosi. General-purpose graphics processor architectures. Morgan & Claypool Publishers, 2018.
Complementary (optional) material:
- https://docs.nvidia.com/cuda/cuda-c-programming-guide/
- https://developer.nvidia.com/
Lecture slides; Exercises; Exercises with solutions; Lab exercises; Lab exercises with solutions; Video lectures (current year); Video lectures (previous years);
Exam: Compulsory oral exam; Individual project; Computer-based written test in class using the POLITO platform;
...
The exam consists of a written test lasting approximately 80 minutes, in which students are asked to answer a series of questions, normally 5. At the teacher's discretion, an oral exam may also be held, either supplementing or replacing the written test.
Students must book the exam and bring an identity document.
During the exam, the use of computers, mobile phones, or smartphones is not allowed, nor is consulting books and notes.
In addition, a mandatory individual or group assignment is required, aimed at building a graphics application using the concepts acquired during the laboratory sessions. The correctness of the answers in the written and/or oral exam and the correct execution of the assignment will contribute to the final mark.
In addition to reporting through the online procedure, students with disabilities or Specific Learning Disorders (SLD) are invited to inform the teacher in charge of the course directly, at least one week before the start of the exam session, of the compensatory tools agreed with the Special Needs Unit, so that the teacher can apply them in the way best suited to the specific type of exam.
The exam is composed of three parts.
- The first part consists of a PC-based written exam to evaluate the fundamental concepts of parallel programming with GPUs and the basics of GPU architectures.
- The second part consists of a mandatory assignment for one to three students, focused on creating an application using the knowledge gained during the course.
- The third part consists of an oral presentation of the assignment to verify the knowledge acquired during the course.
The exam proceeds as follows. Students who intend to be evaluated must book the exam. On the exam day, students take an individual PC-based exam on the fundamental concepts of parallel programming with GPUs and the basics of GPU architectures. This exam yields a mark of up to 30 and accounts for 30% of the final mark.
Before the exam date, students must submit the project files and related documentation. In the following days, the professors will communicate the oral examination dates. The correctness and accuracy of the assignment, the completeness of the presentation, and the correctness of the answers in the oral exam yield a mark of up to 30L, which accounts for 70% of the final mark.
The final mark is calculated as the weighted sum of the scores obtained in the PC-based exam (30%) and the project assignment with its oral presentation (70%).
The mandatory assignment is communicated to students during the first weeks of the course.
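As a worked illustration of the weighting described above, the sketch below combines the two marks under the assumption that the final mark is a weighted sum with the stated 30% and 70% shares. The function name `final_mark` is hypothetical, and the text does not specify how rounding or honours (the "L" in 30L) are handled, so neither is modeled here.

```python
def final_mark(exam_mark, project_mark, exam_weight_pct=30, project_weight_pct=70):
    """Combine the PC-based exam mark and the project/oral mark.

    Assumption: both marks are on a 0-30 scale and are combined as a
    weighted sum using the percentages stated in the exam rules.
    """
    if exam_weight_pct + project_weight_pct != 100:
        raise ValueError("weights must add up to 100")
    for mark in (exam_mark, project_mark):
        if not 0 <= mark <= 30:
            raise ValueError("marks must be between 0 and 30")
    # Integer percentages keep the arithmetic exact until the final division.
    return (exam_weight_pct * exam_mark + project_weight_pct * project_mark) / 100


if __name__ == "__main__":
    # Example: 24/30 in the PC-based exam and 28/30 for the project and oral.
    print(final_mark(24, 28))  # 0.30 * 24 + 0.70 * 28 = 7.2 + 19.6 = 26.8
```

Because the project carries more than twice the weight of the PC-based exam, a one-point change in the project mark moves the final mark by 0.7 points, against 0.3 points for the exam.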
In addition to the message sent by the online system, students with disabilities or Specific Learning Disorders (SLD) are invited to directly inform the professor in charge of the course about the special arrangements for the exam that have been agreed with the Special Needs Unit. The professor has to be informed at least one week before the beginning of the examination session in order to provide students with the most suitable arrangements for each specific type of exam.