GPU programming

01URVOV, 01URVYG

Academic year 2025/26

Course Language

English

Degree programme(s)

Master of science-level of the Bologna process in Ingegneria Informatica (Computer Engineering) - Torino

Course structure

Teaching	Hours
Lectures	30
Classroom exercises	10
Laboratory exercises	20

Lecturers

Teacher	Status	SSD	Lecture hours	Exercise hours	Lab hours	Tutoring hours	Years teaching
Rodriguez Condia Josie Esteban	Researcher (Law 240/10)	IINF-05/A	30	10	10	0	1

Co-lecturers

Teacher	Status	SSD	Lecture hours	Exercise hours	Lab hours	Tutoring hours
Limas Sierra Robert Alexander	External lecturer and/or collaborator		0	0	10	0

Context

SSD	CFU	Activities	Area context
ING-INF/05	6	B - Characterizing	Computer engineering

First academic year of validity

2025/26

Course description

The growing demand for efficient information processing is met by advanced hardware accelerators, particularly Graphics Processing Units (GPUs), which offer high programming flexibility and can significantly accelerate a wide range of algorithms and applications. In an era defined by data-intensive applications and computational challenges, GPU programming lets students tackle problems at much greater speed and scale. From accelerating scientific simulations and powering artificial intelligence to enabling real-time graphics and financial modeling, GPUs have become essential across many industries. By mastering GPU programming, students gain the ability to harness these devices to create efficient, high-performance solutions.

This course is designed to enable students to work with parallel hardware accelerators, specifically GPUs and General-Purpose Graphics Processing Units (GPGPUs), whose utility and popularity have risen continuously over the past fifteen years (as of 2025) in applications ranging from embedded and edge computing to High Performance Computing (HPC). To this end, the course covers the fundamentals of parallel computing and its core paradigms, a comparative study of GPU and CPU architectures and their interaction, the CUDA programming model, memory models, debugging and profiling strategies, current performance analysis models, and real-world applications of GPU-accelerated computing.

During the laboratory sessions, where possible, students will develop and adapt several algorithms for embedded and HPC-class GPUs using parallelism strategies, specialized libraries, and frameworks such as TensorFlow with the Python programming language for artificial intelligence algorithms. These competencies are highly valued in today's job market, making them a strong asset for students pursuing careers in embedded, high-performance, and parallel computing.

Expected Learning Outcomes

At the end of the course, students will have mastered:
- The fundamentals and rules of parallel computing
- GPU and CPU architectures, and how they compare
- The CUDA programming model, together with strategies for debugging, executing, and profiling programs on GPUs
- Performance analysis models for parallel programs
- Applications of GPU programming
- Realistic applications on embedded GPU cards

Prerequisites

C/C++ programming; Python programming (suggested, but not required); Distributed computing (suggested, but not required); Computer architectures; Operating systems.

Course topics

- Introduction to parallel computing (1 CFU): Classification of parallel computers. Amdahl's law. Flynn's taxonomy: SISD, SIMD, MISD, and MIMD. General architecture of classical and modern GPUs. Comparison and interaction of CPU and GPU architectures.
- GPGPU concurrency and organization (1 CFU): Multiprocessors, streaming processors, and SIMT cores; internal general-purpose cores (CUDA cores); specialized on-chip hardware accelerators (Tensor Cores, matrix cores, and Special Function Units). Advantages and constraints of the GPU memory hierarchy: global, local, shared, constant, and cache memories. Memory models for GPU-accelerated programs (pageable, unified, pinned, and mapped). Brief overview and comparison of the architectures of embedded and HPC-class GPUs. Fundamentals of the GPGPU programming model: grids, blocks, warps, and threads. GPGPU profiling.
- GPGPU programming fundamentals (2 CFU): The CUDA programming model. Threading optimization and the trade-offs between memory and compute bottlenecks. The roofline model: a practical performance model for GPU-accelerated programs. GPGPU convolution and memory management. GPGPU task parallelism (streams). Debugging, tiling, ray tracing, and libraries.
- Classical and modern applications (2 CFU): Reduction, scan, sorting, matrix multiplication, convolution, and stencil-based applications, including image filtering.
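Two of the performance models named in the topics, Amdahl's law and the roofline model, reduce to one-line formulas. The sketch below is illustrative only and is not taken from the course material: the function names `amdahl_speedup` and `roofline_gflops` and the device figures (10000 GFLOP/s peak compute, 500 GB/s peak bandwidth) are hypothetical values chosen for the example.

```python
def amdahl_speedup(parallel_fraction: float, n_processors: int) -> float:
    """Amdahl's law: upper bound on speedup when only a fraction of the
    run time can be spread over n_processors."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


def roofline_gflops(arithmetic_intensity: float,
                    peak_gflops: float,
                    peak_bandwidth_gbs: float) -> float:
    """Roofline model: attainable performance is capped either by peak
    compute or by memory traffic (FLOP/byte times GB/s gives GFLOP/s)."""
    if arithmetic_intensity < 0:
        raise ValueError("arithmetic_intensity must be non-negative")
    return min(peak_gflops, arithmetic_intensity * peak_bandwidth_gbs)


if __name__ == "__main__":
    # A program that is 95% parallel can never exceed a 20x speedup,
    # however many processors are used.
    for n in (8, 64, 1024):
        print(f"Amdahl, p=0.95, n={n}: {amdahl_speedup(0.95, n):.2f}x")

    # Hypothetical device: 10000 GFLOP/s peak, 500 GB/s memory bandwidth.
    # Kernels below 20 FLOP/byte (the ridge point) are memory-bound.
    for intensity in (0.25, 20.0, 100.0):
        perf = roofline_gflops(intensity, 10000.0, 500.0)
        print(f"Roofline, {intensity} FLOP/byte: {perf:.0f} GFLOP/s")
```

In this simplified form, the roofline function only tells whether a kernel is limited by memory bandwidth or by compute; profiling tools report the measured point relative to these two ceilings.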

Sustainable development goals

Build resilient infrastructure and promote innovation and fair, responsible, and sustainable industrialization

Provide quality, equitable, and inclusive education and learning opportunities for all

Additional information

Course structure

The laboratory sessions build on the material covered in class and aim to consolidate students' understanding of the concepts through hands-on experimentation. The experiments run on dedicated GPU development cards that are programmable with CUDA and driven through sbatch job scripts, bash, and Python. A project will be assigned to individual students or to small groups; its results will be evaluated and will contribute to the final mark. A PC-based exam assesses the general aspects of parallel programming with GPUs and the fundamental concepts of GPGPU architectures.

Reading materials

Course slides and other material are available at http://didattica.polito.it

Supporting material:
- Cook, Shane. CUDA programming: a developer's guide to parallel computing with GPUs. Newnes, 2012.
- Tuomanen, Brian. Hands-On GPU Programming with Python and CUDA: Explore high-performance parallel computing with CUDA. Packt Publishing Ltd, 2018.
- Hwu, Wen-mei W., David B. Kirk, and Izzat El Hajj. Programming massively parallel processors: a hands-on approach. Morgan Kaufmann, 2022.
- Sanders, Jason, and Edward Kandrot. CUDA by example: an introduction to general-purpose GPU programming. Addison-Wesley Professional, 2010.
- Thomas, Gareth Morgan. Advanced CUDA Programming: High Performance Computing with GPUs. 2025.
- Aamodt, Tor M., Wilson Wai Lun Fung, Timothy G. Rogers, and Margaret Martonosi. General-purpose graphics processor architectures. Morgan & Claypool Publishers, 2018.

Complementary (optional) material:
- https://docs.nvidia.com/cuda/cuda-c-programming-guide/
- https://developer.nvidia.com/

Study materials

Lecture slides; Exercises; Exercises with solutions; Lab exercises; Lab exercises with solutions; Video lectures (current year); Video lectures (previous years)

Assessment and grading criteria

Exam: Compulsory oral exam; Individual project; Computer-based written test in class using POLITO platform;

The exam is composed of three parts:
- The first part is a PC-based written exam that evaluates the fundamental concepts of parallel programming with GPUs and the basics of GPU architectures.
- The second part is a mandatory assignment, carried out by one to three students, focused on creating an application using the knowledge gained during the course.
- The third part is an oral presentation of the assignment, used to verify the knowledge acquired during the course.

The exam proceeds as follows. Students who intend to be evaluated must book the exam. The mandatory assignment is communicated to the students during the first weeks of lessons, and the project files and related documentation must be submitted before the exam date. On the exam day, each student takes the individual PC-based exam on the fundamental concepts of parallel programming with GPUs and the basics of GPU architectures; it yields a mark of up to 30 and accounts for 30% of the final mark. In the following days, the professors communicate the oral examination dates. The correctness and accuracy of the assignment, the completeness of the presentation, and the correctness of the answers in the oral exam yield a mark of up to 30L (30 with honors) and account for 70% of the final mark. The final mark is the sum of the two weighted scores: 30% from the PC-based exam and 70% from the project assignment.
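The 30%/70% weighting described above can be sketched as a short calculation. This is only one reading of the rule, assuming that the final mark is the weighted sum of two marks on a 30-point scale; the function name `final_mark` is hypothetical, and the rounding policy and the handling of honors (30L) are not specified in the text, so they are left out.

```python
WRITTEN_WEIGHT_PERCENT = 30  # PC-based written exam
PROJECT_WEIGHT_PERCENT = 70  # assignment plus oral presentation


def final_mark(written_mark: float, project_mark: float) -> float:
    """Weighted sum of the two marks, each expressed on a 0-30 scale.

    Assumption: no rounding and no honors handling, because the
    course page does not state how either is applied.
    """
    for mark in (written_mark, project_mark):
        if not 0 <= mark <= 30:
            raise ValueError("marks must be between 0 and 30")
    weighted = (WRITTEN_WEIGHT_PERCENT * written_mark
                + PROJECT_WEIGHT_PERCENT * project_mark)
    return weighted / 100


if __name__ == "__main__":
    # Example: 24/30 in the written exam and 28/30 for the project.
    print(final_mark(24, 28))
```

Under this reading, the project and oral presentation dominate the result: two extra points on the project are worth more than four extra points on the written exam.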

In addition to the message sent by the online system, students with disabilities or Specific Learning Disorders (SLD) are invited to directly inform the professor in charge of the course about the special arrangements for the exam that have been agreed with the Special Needs Unit. The professor has to be informed at least one week before the beginning of the examination session in order to provide students with the most suitable arrangements for each specific type of exam.