01DSKBG

Academic year 2023/24

Signal, image and video processing and learning (Image and video processing and learning)

The medium of instruction is English. This course addresses a few foundational aspects related to vision, and particularly compression of data, image and video sequences, including the related signal models, and processing of visual information for image interpretation and authentication purposes. The course will deal with compression starting from theoretical fundamentals and moving on to its application to the most important international standards, covering different “media” such as images and video. Image processing concepts will then be used to introduce more complex vision systems employing “deep” neural networks to extract and describe important image properties. Image classification will be considered as a use case of a realistic scenario. Finally, the image forensics problem will be introduced, aiming at authenticating the origin of an image.

Signal, image and video processing and learning (Signal processing: methods and algorithms)

The course lays the foundations for processing random signals (random processes), the most common type of signal in Communications and Computer Networks Engineering and, more generally, in any field of engineering where random quantities are measured. We consider both deterministic signals affected by noise, generated for instance by the measurement system, and signals whose nature is inherently random, such as 1/f noise. The course begins by reviewing the foundations of discrete-time random processes, focusing on the quantities that describe them: the autocorrelation function, the power spectrum, and the time-frequency spectrum, which is useful for signals whose frequency content changes with time. We consider both stationary and nonstationary random processes commonly encountered in nature. We then present the basics of estimation theory, and derive and discuss the main estimators for the mean, variance, autocorrelation function, and power spectrum of stationary and nonstationary random processes. Furthermore, we introduce random dynamical systems and derive the Kalman filter, which allows their optimal estimation. Finally, we introduce the basics of detection theory and show how to design a detector according to the Neyman-Pearson criterion. Half of the course takes place in the LAIB laboratories, where students implement and characterize in the Matlab environment all of the methods discussed during the lectures.

Signal, image and video processing and learning (Image and video processing and learning)

The medium of instruction is English. This course addresses a few foundational aspects related to vision, in particular the processing and compression of data, images and video sequences, and deep learning for data analysis and image interpretation. The course first addresses multidimensional image processing, followed by quantization and compression of images and video sequences. It then covers deep neural networks and their applications, including the design and training of a neural network, with a focus on convolutional neural networks and generative adversarial networks. Applications include image classification, image segmentation and object detection, as well as inverse problems such as image denoising and superresolution.

Signal, image and video processing and learning (Signal processing: methods and algorithms)

This course lays the foundations for processing random signals (random processes), the most common type of signal in Communications Engineering and in all fields of engineering where random quantities are measured. We consider both deterministic signals affected by noise, generated for instance by the measurement system, and signals whose nature is inherently random, such as 1/f noise. We begin by reviewing the foundations of discrete-time random processes, focusing on the quantities that describe them: the autocorrelation function, the power spectrum, and the time-frequency spectrum, which is useful for signals whose frequency content changes with time. We consider both stationary and nonstationary random processes commonly encountered in nature. We then present the basics of estimation theory, and derive and discuss the main estimators for the mean, variance, autocorrelation function, and power spectrum of stationary and nonstationary random processes. Furthermore, we introduce random dynamical systems and derive the Kalman filter, which allows their optimal estimation. Finally, we introduce the basics of detection theory and show how to design a detector according to the Neyman-Pearson criterion. Half of the course takes place in the LAIB laboratories, where students implement and characterize in the Matlab environment all of the methods discussed during the lectures.
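As a rough illustration of the estimators mentioned above, the following sketch computes the sample mean, the sample variance and a sample autocorrelation of a finite record. The function names are hypothetical, and the choice of the unbiased variance and of the biased autocorrelation estimator (division by N) is an assumption, not necessarily the variant derived in the lectures.

```python
import random


def sample_mean(x):
    """Sample mean of a finite record."""
    return sum(x) / len(x)


def sample_variance(x):
    """Unbiased sample variance (divides by N - 1)."""
    m = sample_mean(x)
    return sum((v - m) ** 2 for v in x) / (len(x) - 1)


def sample_autocorrelation(x, max_lag):
    """Biased estimator r[k] = (1/N) * sum_n x[n] * x[n + k], for k = 0..max_lag."""
    n = len(x)
    return [
        sum(x[i] * x[i + k] for i in range(n - k)) / n
        for k in range(max_lag + 1)
    ]


# Usage: for unit-variance white Gaussian noise, r[0] should be close to 1
# and r[k] close to 0 for k > 0.
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(5000)]
r = sample_autocorrelation(noise, 3)
```

The biased estimator is used here because it is simple to write; other normalizations are possible.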

Signal, image and video processing and learning (Image and video processing and learning)

In general, the course is expected to provide the student with a solid background in three areas, namely signal compression, authentication and analysis, with an approach such that this information is “usable” in practical applications. For this purpose, each area is coupled with 6 hours of computer lab where students are expected to implement and test algorithms and learn the effect of various parameters. In particular, the course will allow the student to achieve the following results.
1. Knowledge of the fundamentals of lossless and lossy compression and of data compression techniques.
2. Knowledge of transform coding.
3. Knowledge of motion estimation techniques and their applications to video compression.
4. Knowledge of a few reference standards for audio, image and video compression.
5. Knowledge of feedforward and convolutional deep neural networks, and their application to image classification.
6. Knowledge of the fundamentals of image forensics.

Signal, image and video processing and learning (Signal processing: methods and algorithms)

1. Knowledge of the foundations of discrete-time random processes
2. Knowledge of the basics of time-frequency analysis
3. Knowledge of the basics of estimation theory
4. Knowledge of the basics of Kalman filtering
5. Knowledge of the basics of detection theory
6. Ability to classify stationary and nonstationary random processes
7. Ability to design estimation algorithms for signals affected by noise
8. Ability to use the Kalman filter for the estimation of random processes and systems
9. Ability to design a detector

Judgment and communication skills are strengthened during the laboratories thanks to the continual interaction with the teacher. To improve learning skills, we teach how to search for scientific and tutorial references on the main online search engines, such as IEEE Xplore.

Signal, image and video processing and learning (Image and video processing and learning)

In general, the course is expected to provide the student with a solid background in the areas of multidimensional image processing, image and video compression, and deep learning, with an approach such that this information is “usable” in practical applications. For this purpose, each area is coupled with computer labs where students are expected to implement and test algorithms and learn the effect of various parameters. A project will also be carried out by the students. In particular, the course will allow the student to achieve the following results.
1. Knowledge of multidimensional image processing.
2. Knowledge of transform coding, data and image compression.
3. Knowledge of motion estimation techniques and their applications to video compression.
4. Knowledge of feedforward and convolutional deep neural networks, and their training methods.
5. Knowledge of generative adversarial networks.
6. Knowledge of deep learning methods for image classification, image segmentation, object detection and various inverse problems.

Signal, image and video processing and learning (Signal processing: methods and algorithms)

1. Knowledge of the foundations of discrete-time random processes
2. Knowledge of the basics of time-frequency analysis
3. Knowledge of the basics of estimation theory
4. Knowledge of the basics of Kalman filtering
5. Knowledge of the basics of detection theory
6. Ability to classify stationary and nonstationary random processes
7. Ability to design estimation algorithms for signals affected by noise
8. Ability to use the Kalman filter for the estimation of random processes and systems
9. Ability to design a detector

Judgment and communication skills are strengthened during the laboratories thanks to the continual interaction with the teacher. To improve learning skills, we teach how to search for scientific and tutorial references on the main online search engines, such as IEEE Xplore.
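To give a flavor of the detector design listed among the outcomes above, here is a minimal sketch of a Neyman-Pearson detector for a textbook case chosen purely for illustration: deciding whether a known positive DC level is present in white Gaussian noise, using the sample mean of n observations as the test statistic. The scenario, function names and parameters are assumptions, not the specific problem treated in the course.

```python
from math import sqrt
from statistics import NormalDist


def np_threshold(pfa, sigma, n):
    """Threshold on the sample mean that yields false-alarm probability pfa.

    Under H0 (noise only) the sample mean is Gaussian with zero mean and
    standard deviation sigma / sqrt(n).
    """
    return sigma / sqrt(n) * NormalDist().inv_cdf(1.0 - pfa)


def detection_probability(threshold, amplitude, sigma, n):
    """Probability that the sample mean exceeds the threshold under H1.

    Under H1 (DC level plus noise) the sample mean is Gaussian with mean
    `amplitude` and standard deviation sigma / sqrt(n).
    """
    return 1.0 - NormalDist(mu=amplitude, sigma=sigma / sqrt(n)).cdf(threshold)


# Usage: fix the false-alarm probability, then read off the detection probability.
gamma = np_threshold(pfa=0.05, sigma=1.0, n=10)
pd = detection_probability(gamma, amplitude=1.0, sigma=1.0, n=10)
```

The design follows the Neyman-Pearson logic of fixing the false-alarm probability first and accepting whatever detection probability results.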

Signal, image and video processing and learning (Image and video processing and learning)

Students are expected to have some knowledge of basic continuous-time and discrete-time signals and systems, as well as random processes. In particular, the following concepts will be used during the course: Fourier transforms, signals and systems in the time and frequency domains, and LTI filters; discrete-time systems, their spectrum and its relation to the Fourier transform of continuous-time signals, the z-transform, and discrete-time LTI filters of FIR and IIR type; Gaussian processes, wide-sense stationary processes, autocorrelation function and covariance, power spectral density of random processes, and white noise. The course will also use concepts presented in the “Elaborazione di immagine e video” course (3rd year of the B.Sc. degree), which will be briefly reviewed in this course. Moreover, in order to work in the computer labs, students are expected to be able to write programs in the C and Matlab languages.

Signal, image and video processing and learning (Signal processing: methods and algorithms)

The student must know the following concepts of probability theory and signal processing:
1. Random variable
2. Probability density function
3. Mean
4. Variance
5. Frequency analysis
6. Linear time-invariant (LTI) systems

However, at the beginning of the course these notions are reviewed with an intuitive approach.

Signal, image and video processing and learning (Image and video processing and learning)

Students are expected to have some knowledge of basic continuous-time and discrete-time signals and systems, as well as random processes. In particular, the following concepts will be used during the course: Fourier transforms, signals and systems in the time and frequency domains, and LTI filters; discrete-time systems, their spectrum and its relation to the Fourier transform of continuous-time signals, the z-transform, and discrete-time LTI filters of FIR and IIR type; Gaussian processes, wide-sense stationary processes, autocorrelation function and covariance, power spectral density of random processes, and white noise. Moreover, in order to work in the computer labs, students are expected to be able to write programs in the Python language.

Signal, image and video processing and learning (Signal processing: methods and algorithms)

The student must know the following concepts of probability theory and signal processing:
1. Random variable
2. Probability density function
3. Mean
4. Variance
5. Frequency analysis
6. Linear time-invariant (LTI) systems

However, at the beginning of the course these notions are reviewed with an intuitive approach.

Signal, image and video processing and learning (Image and video processing and learning)

Fundamentals of information theory and compression, data compression techniques (1.2 CFU). This includes:
• Basics of information theory for lossless/lossy compression
• Data compression techniques such as Huffman coding, arithmetic coding, dictionary coding and run-length coding

Transform coding and prediction; the JPEG standard (1.5 CFU). This includes:
• Separable extension of multidimensional transforms, the 2-dimensional Fourier transform, discrete cosine transform, Karhunen-Loeve transform, signals over graphs and the graph Fourier transform
• Quantization techniques such as scalar quantization, robust quantization, Lloyd-Max quantization, entropy-constrained quantization and vector quantization
• Application to the JPEG image compression standard

Motion estimation and compensation techniques for video compression, and the H.264/AVC and H.265/HEVC standards (0.8 CFU). This includes:
• 3D transforms versus temporal prediction for multidimensional data compression
• Motion models
• Definition of a prototype video coder
• Scalable video coding
• The H.264/AVC video compression standard
• The H.265/HEVC video compression standard

Image forensics (1 CFU). This includes:
• Photo response non-uniformity
• Application to detection of picture origin

Feedforward and convolutional neural networks and their applications (1.5 CFU). This includes:
• Neural network architectures
• Backpropagation algorithm
• Cost functions, overfitting and regularization
• Convolutional neural networks and deep learning
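As an illustration of one of the lossless techniques named in the program above, the following is a minimal sketch of Huffman code construction using a binary heap. The function name and the example source are hypothetical and are not taken from the course material.

```python
import heapq
from itertools import count


def huffman_code(probabilities):
    """Return a dict mapping each symbol to its Huffman codeword (a bit string)."""
    tie = count()  # unique tie-breaker so the heap never has to compare dicts
    heap = [(p, next(tie), {sym: ""}) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single-symbol source still needs one bit per symbol.
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        # Merge the two least probable subtrees, prefixing their codewords.
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]


# Usage: a dyadic source, for which the average codeword length equals the entropy.
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = huffman_code(probs)
avg_len = sum(probs[s] * len(code[s]) for s in probs)
```

For this example source the codeword lengths are 1, 2, 3 and 3 bits, giving an average length of 1.75 bits per symbol.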

Signal, image and video processing and learning (Signal processing: methods and algorithms)

Introduction. Discrete-time random processes (15 hours)
Nonstationary random processes (9 hours)
Introduction to estimation theory (9 hours)
Spectral estimation (6 hours)
Time-frequency analysis (6 hours)
The Kalman filter (9 hours)
Introduction to detection theory (6 hours)

Signal, image and video processing and learning (Image and video processing and learning)

Image processing and data compression (3.3 CFU including labs). This includes:
- Basics of multidimensional signal and image processing (0.6 CFU)
- Multidimensional transforms (0.2 CFU)
- Scalar quantization (0.2 CFU)
- Lossless compression (0.4 CFU)
- JPEG image compression standard (0.1 CFU)
- Video compression (0.3 CFU)
- Labs (1.5 CFU)

Deep neural networks and their applications (2.7 CFU including labs). This includes:
- Introduction to neural networks (0.2 CFU)
- Backpropagation algorithm (0.2 CFU)
- Cost functions, overfitting and regularization (0.2 CFU)
- Convolutional neural networks and deep learning (0.3 CFU)
- Classification, segmentation and object detection (0.3 CFU)
- Inverse problems (denoising, superresolution) (0.2 CFU)
- Neural network quantization (0.1 CFU)
- Transformers and generative adversarial networks (0.3 CFU)
- Labs (0.9 CFU)

Lab hours include 6 hours for the course project (to be completed at home).
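The scalar quantization item in the program above can be illustrated with a minimal sketch of a uniform mid-tread quantizer. The function names and the step size are hypothetical, and the course may present other quantizer designs.

```python
from math import floor


def quantize(x, step):
    """Index of the reconstruction level nearest to x (uniform, mid-tread)."""
    return floor(x / step + 0.5)


def dequantize(index, step):
    """Reconstruction value associated with a quantization index."""
    return index * step


# Usage: away from overload, the reconstruction error never exceeds half a step.
step = 0.1
samples = [-0.37, -0.05, 0.0, 0.26, 0.94]
reconstructed = [dequantize(quantize(v, step), step) for v in samples]
```

A larger step lowers the number of distinct indices to encode at the cost of a larger reconstruction error.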

Signal, image and video processing and learning (Signal processing: methods and algorithms)

Discrete-time signals and systems (15 hours)
Nonstationary random processes (9 hours)
Introduction to estimation theory (9 hours)
Spectral estimation (6 hours)
Time-frequency analysis (6 hours)
The Kalman filter (9 hours)
Introduction to detection theory (6 hours)
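As an informal preview of the Kalman filter topic listed above, here is a minimal scalar sketch. It assumes a random-walk state model x[n] = x[n-1] + w[n] observed as z[n] = x[n] + v[n], with process noise variance q and measurement noise variance r. The model and all names are illustrative assumptions, not the general formulation derived in the course.

```python
def kalman_1d(measurements, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a random-walk state observed in additive noise.

    q: process noise variance, r: measurement noise variance,
    x0, p0: initial state estimate and its error variance.
    Returns the list of filtered state estimates.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Prediction: the state estimate is unchanged, its uncertainty grows by q.
        p = p + q
        # Update: blend prediction and measurement using the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates


# Usage: with q = 0 the filter behaves like a recursive average that
# gradually moves from the initial guess towards the measured value.
estimates = kalman_1d([5.0] * 50, q=0.0, r=1.0)
```

With q = 0, p0 = 1 and r = 1, the n-th estimate for a constant measurement of 5 equals 5n/(n+1).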

Signal, image and video processing and learning (Image and video processing and learning)

The course will be based on lectures (42 hours). Three computer labs will be organized, respectively in the areas of data/image compression, image forensics, and neural networks. Each computer lab will last 6 hours over the span of two weeks. For each computer lab, a short report will have to be prepared (up to 5 pages). Reports of the 3 computer labs (up to 15 pages overall) will contribute to the final exam grade. Students will also be asked to provide a self-assessment of the work performed during the computer labs. Computer labs are a mandatory part of the course; students are expected to attend a minimum number of hours that will be specified at the beginning of the course. Students who have a problem attending the labs must provide proof of a concurrent official work activity that they must attend to, and they have to contact the course instructor by the first week of the course in order to address this issue.

Signal, image and video processing and learning (Signal processing: methods and algorithms)

Half of the course takes place in the LAIB laboratories, where students implement and characterize in the Matlab environment all of the methods discussed during the lectures.

Signal, image and video processing and learning (Image and video processing and learning)

The course will be based on lectures (36 hours). Computer labs will be organized in the areas of data/image compression, image processing, and neural networks. Each computer lab will last 3-6 hours over the span of one or two weeks. Students will carry out a course project in groups; the project will contribute to the final exam score.

Signal, image and video processing and learning (Signal processing: methods and algorithms)

Half of the course takes place in the LAIB laboratories, where students implement and characterize in the Matlab environment all of the methods discussed during the lectures.

Signal, image and video processing and learning (Image and video processing and learning)

Regarding compression techniques, the reference book is: K. Sayood, “Introduction to data compression, 3rd edition”, Kluwer.
Regarding neural networks, the reference book is the online book “Neural networks and deep learning” (2017), available at: http://neuralnetworksanddeeplearning.com

Signal, image and video processing and learning (Signal processing: methods and algorithms)

[1] D. G. Manolakis, V. K. Ingle, and S. M. Kogon, Statistical and Adaptive Signal Processing, Artech House, 2011.
[2] L. Cohen, Time-Frequency Analysis, Prentice Hall, 1995.
[3] A. Gelb (Editor), Applied Optimal Estimation, The MIT Press, 1974.
[4] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory, Prentice Hall, 1993.
[5] S. M. Kay, Fundamentals of Statistical Signal Processing: Detection Theory, Prentice Hall, 1993.

Signal, image and video processing and learning (Image and video processing and learning)

Regarding compression techniques, the reference book is: K. Sayood, “Introduction to data compression, 3rd edition”, Kluwer.
Regarding neural networks, there is no reference book and all topics will be covered using slides. The following online book can be useful for some of the topics: “Neural networks and deep learning” (2017), available at: http://neuralnetworksanddeeplearning.com

Signal, image and video processing and learning (Signal processing: methods and algorithms)

[1] D. G. Manolakis, V. K. Ingle, and S. M. Kogon, Statistical and Adaptive Signal Processing, Artech House, 2011.
[2] L. Cohen, Time-Frequency Analysis, Prentice Hall, 1995.
[3] A. Gelb (Editor), Applied Optimal Estimation, The MIT Press, 1974.
[4] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory, Prentice Hall, 1993.
[5] S. M. Kay, Fundamentals of Statistical Signal Processing: Detection Theory, Prentice Hall, 1993.

Signal, image and video processing and learning (Image and video processing and learning)

Lecture slides; Video lectures (previous years);

Signal, image and video processing and learning (Signal processing: methods and algorithms)

Lecture slides; Lab exercises with solutions; Video lectures (current year);

Signal, image and video processing and learning (Image and video processing and learning)

**Exam:** Compulsory oral exam; Optional oral exam; Computer-based written test in class using POLITO platform;

Signal, image and video processing and learning (Signal processing: methods and algorithms)

**Exam:** Computer-based written test in class using POLITO platform;

Signal, image and video processing and learning (Image and video processing and learning)

The exam aims at verifying the knowledge and understanding of the topics treated during the course, and the ability of the students to critically discuss such topics. The final exam will be a written exam. It lasts 1 hour and consists of discussing up to 2 topics, with each discussion limited to one page of text. The written exam contributes up to 24 points to the final score, and the discussion of each topic usually contributes equally to the score. The text of the questions may be provided through the Exam platform, but the answers have to be written on paper. During the written exam, students are not allowed to use any books, lecture notes or any material other than a calculator, and must not have any active cell phone, tablet or other electronic device.

The exam grade will be a weighted average of the score of the written exam and the score of the computer lab reports (up to 6 points). The exam is passed if the score is equal to or above 18/30. The computer lab reports will be scored according to the technical quality of the work done and the understanding of the concepts learned during the course. The reports must be delivered to the course instructor by a deadline that will be communicated during the course; it will typically be the end of the course or a few days later. Student self-assessment will also be used to determine the computer lab score: students score the contribution of the other students in their group towards achieving the objectives of the computer labs. Sending self-assessments to the course instructor is mandatory in order to obtain a score for the computer labs.

While the exam is typically written, the course instructor reserves the right to perform an oral examination in specific cases. The grade of this exam will be averaged with the grade of the first module of the course (Signal processing: methods and algorithms), yielding the final exam score.

Signal, image and video processing and learning (Signal processing: methods and algorithms)

The written exam lasts one hour and consists of approximately 12-15 multiple-choice questions spanning the content of both the lectures and the laboratories. Every correct answer earns the same positive score, and the final mark is the sum of these scores. No support material, such as notes or books, may be used during the exam. The highest mark that can be obtained in the written exam is 30 cum laude. If the number of students booked for the exam is less than or equal to 10, the written exam can be replaced by an oral exam of approximately 30 minutes, focused on the topics taught during both the lectures and the laboratories. The highest mark that can be obtained in the oral exam is 30 cum laude. The final mark for the Signal, image and video processing and learning course is the arithmetic average of the mark for this module (Signal processing: methods and algorithms) and the mark for the Image and video processing and learning module. The highest final mark is 30 cum laude.

Students with disabilities or Specific Learning Disorders (SLD), in addition to reporting through the online procedure, are invited to inform the professor in charge of the course directly, at least one week before the start of the exam session, of the compensatory tools agreed with the Special Needs Unit, so that the professor can make the most suitable arrangements for the specific type of exam.

Signal, image and video processing and learning (Image and video processing and learning)

The exam aims at verifying the knowledge and understanding of the topics treated during the course, and the ability of the students to critically discuss such topics. The final exam will consist of two parts.

The first part will be a written exam. It lasts 1 hour and consists of discussing up to 2 topics, with each discussion limited to one page of text. The written exam contributes up to 20 points to the final score, and the discussion of each topic usually contributes equally to the score. The text of the questions may be provided through the Exam platform, but the answers have to be written on paper. During the written exam, students are not allowed to use any books, lecture notes or any material other than a calculator, and must not have any active cell phone, tablet or other electronic device. While the first part of the exam is typically written, the course instructor reserves the right to perform an oral examination in specific cases (including but not limited to the case in which very few students have signed up for the exam).

The second part will be the evaluation of the course project. The scoring will be based on the technical merit of the work done, as well as the quality of the project presentation (including the understanding of the concepts learned during the course as applied to the project). The exact details of the project and its scoring rules, as well as the project deadline and presentation schedule, may change from year to year and will be explained during the course.

The exam grade will be the sum of the score of the written exam (up to 20 points) and the project score (up to 13 points). The exam is passed if the grade is equal to or above 18/30 and the score of the written exam is at least 10 points. The grade of this exam will be averaged with the grade of the first module of the course (Signal processing: methods and algorithms), yielding the final exam score.
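The scoring rules above can be summarized with a short sketch. The function names are hypothetical, the averaging with the first module is assumed to be a plain arithmetic mean, and the text does not say how sums above 30 or rounding are handled, so neither is modeled here.

```python
def image_module_grade(written, project):
    """Return (grade, passed) for the Image and video processing and learning module.

    written: score of the written exam, from 0 to 20 points.
    project: score of the course project, from 0 to 13 points.
    """
    if not 0 <= written <= 20:
        raise ValueError("written score must be between 0 and 20")
    if not 0 <= project <= 13:
        raise ValueError("project score must be between 0 and 13")
    grade = written + project
    # Both conditions must hold: overall grade of at least 18 and written score of at least 10.
    passed = grade >= 18 and written >= 10
    return grade, passed


def final_course_mark(image_module, signal_module):
    """Arithmetic mean of the two module grades (assumed unweighted, no rounding)."""
    return (image_module + signal_module) / 2


# Usage: 12 points in the written exam and 8 in the project give 20 and a pass;
# 9 points in the written exam fail the module even with a full project score.
print(image_module_grade(12, 8))
print(image_module_grade(9, 13))
print(final_course_mark(20, 26))
```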

Signal, image and video processing and learning (Signal processing: methods and algorithms)

The written exam lasts one hour and consists of approximately 12-15 multiple-choice questions spanning the content of both the lectures and the laboratories. Every correct answer earns the same positive score, and the final mark is the sum of these scores. No support material, such as notes or books, may be used during the exam. The highest mark that can be obtained in the written exam is 30 cum laude. If the number of students booked for the exam is less than or equal to 10, the written exam can be replaced by an oral exam of approximately 30 minutes, focused on the topics taught during both the lectures and the laboratories. The highest mark that can be obtained in the oral exam is 30 cum laude. The final mark for the Signal, image and video processing and learning course is the arithmetic average of the mark for this module (Signal processing: methods and algorithms) and the mark for the Image and video processing and learning module. The highest final mark is 30 cum laude.

In addition to the message sent by the online system, students with disabilities or Specific Learning Disorders (SLD) are invited to directly inform the professor in charge of the course about the special arrangements for the exam that have been agreed with the Special Needs Unit. The professor has to be informed at least one week before the beginning of the examination session in order to provide students with the most suitable arrangements for each specific type of exam.

© Politecnico di Torino

Corso Duca degli Abruzzi, 24 - 10129 Torino, ITALY