Portale della Didattica

Signal, image and video processing and learning

01DSKBG

Academic year 2023/24

Signal, image and video processing and learning (Image and video processing and learning)

Teaching	Hours
Lectures	36
Laboratory exercises	24

Teacher	Status	SSD	Lecture hours	Exercise hours	Lab hours	Tutoring hours	Years teaching
Magli Enrico	Full Professor	IINF-03/A	15	0	0	0	2

Teacher	Status	SSD	Lecture hours	Lab hours
Montanaro Antonio	External lecturer and/or collaborator		0	12
Savant Aira Luca	PhD student		0	12
Valsesia Diego	Fixed-term researcher (Law 240/10, art. 24-B)	IINF-03/A	21	0

SSD	CFU	Activities	Area context
ING-INF/03	6	B - Characterizing	Telecommunications engineering

Signal, image and video processing and learning (Signal processing: methods and algorithms)

Teaching	Hours
Lectures	39
Classroom exercises	6
Laboratory exercises	15

Teacher	Status	SSD	Lecture hours	Exercise hours	Lab hours	Tutoring hours	Years teaching
Galleani Lorenzo	Associate Professor	IINF-03/A	39	6	15	0	4

SSD	CFU	Activities	Area context
ING-INF/03	6	B - Characterizing	Telecommunications engineering

Valid from academic year

2023/24

Course description

Signal, image and video processing and learning (Image and video processing and learning)

The medium of instruction is English. This course addresses a few foundational aspects of vision, in particular the compression of data, images and video sequences, including the related signal models, and the processing of visual information for image interpretation and authentication. Compression is treated starting from its theoretical fundamentals and moving on to its application in the most important international standards, covering different media such as images and video. Image processing concepts are then used to introduce more complex vision systems that employ deep neural networks to extract and describe important image properties, with image classification considered as a realistic use case. Finally, the image forensics problem is introduced, which aims at authenticating the origin of an image.

Signal, image and video processing and learning (Signal processing: methods and algorithms)

The course provides the basics of processing random signals (random processes), which are the most common type of signals in Communications and Computer Networks Engineering and, more generally, in every engineering field where random quantities are measured. We consider both deterministic signals affected by noise, generated for instance by the measurement system, and signals that are inherently random, such as 1/f noise. The course begins by reviewing the foundations of discrete-time random processes, in particular the quantities that describe them, such as the autocorrelation function, the power spectrum, and the time-frequency spectrum, which is useful for signals whose frequency content changes with time. We consider both stationary and nonstationary random processes commonly encountered in nature. We then introduce the basics of estimation theory, and derive and discuss the main estimators for the mean, variance, autocorrelation function, and power spectrum of stationary and nonstationary random processes. Furthermore, we introduce random dynamical systems and derive the Kalman filter, which provides the optimal estimate of their state (in the minimum mean-square error sense, for linear Gaussian models). Finally, we introduce the basics of detection theory and illustrate how to design a detector according to the Neyman-Pearson criterion. Half of the course takes place in the LAIB laboratories, where students implement and characterize in the MATLAB environment all of the methods discussed during the lectures.
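The estimators named above can be illustrated with a minimal sketch of their classical forms for a stationary discrete-time process: sample mean, unbiased sample variance, the biased autocorrelation estimate and the periodogram. It is written in Python with NumPy purely for illustration (the laboratories of this module use MATLAB), and the function names are hypothetical, not taken from the course material.

```python
import numpy as np


def sample_mean(x):
    """Sample mean: (1/N) * sum_n x[n]."""
    return np.mean(x)


def sample_variance(x):
    """Unbiased sample variance: 1/(N-1) * sum_n (x[n] - mean)^2."""
    return np.var(x, ddof=1)


def autocorr_biased(x, max_lag):
    """Biased autocorrelation estimate for lags k = 0..max_lag:
    r[k] = (1/N) * sum_{n=0}^{N-1-k} x[n+k] * conj(x[n])."""
    x = np.asarray(x)
    n = len(x)
    return np.array([np.sum(x[k:] * np.conj(x[: n - k])) / n for k in range(max_lag + 1)])


def periodogram(x):
    """Periodogram |X[k]|^2 / N evaluated on the N-point DFT frequency grid."""
    x = np.asarray(x)
    return np.abs(np.fft.fft(x)) ** 2 / len(x)


# Usage: white Gaussian noise with variance 4 (standard deviation 2).
rng = np.random.default_rng(0)
x = rng.normal(0.0, 2.0, size=10_000)
print(sample_mean(x), sample_variance(x))  # close to 0 and 4
print(autocorr_biased(x, 3))               # r[0] close to 4, other lags close to 0
print(np.mean(periodogram(x)))             # equals r[0] by Parseval's relation
```

The biased (1/N) normalization is used here because it keeps the estimated autocorrelation sequence positive semidefinite and makes the average of the periodogram equal to r[0]; an unbiased 1/(N-k) version is the usual alternative.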

Signal, image and video processing and learning (Image and video processing and learning)

This course addresses a few foundational aspects of vision, in particular the processing and compression of data, images and video sequences, and deep learning for data analysis and image interpretation. The course first addresses multidimensional image processing, and then the quantization and compression of images and video sequences. It then covers deep neural networks and their applications: the design and training of a neural network, with a focus on convolutional neural networks and generative adversarial networks, and applications to image classification, image segmentation and object detection, as well as inverse problems such as image denoising and super-resolution.

Signal, image and video processing and learning (Signal processing: methods and algorithms)

This course provides the basics of processing random signals (random processes), which are the most common type of signals in Communications Engineering and in all engineering fields where random quantities are measured. We consider both deterministic signals affected by noise, generated for instance by the measurement system, and signals that are inherently random, such as 1/f noise. We begin by reviewing the foundations of discrete-time random processes, in particular the quantities that describe them, such as the autocorrelation function, the power spectrum, and the time-frequency spectrum, which is useful for signals whose frequency content changes with time. We consider both stationary and nonstationary random processes commonly encountered in nature. We then introduce the basics of estimation theory, and derive and discuss the main estimators for the mean, variance, autocorrelation function, and power spectrum of stationary and nonstationary random processes. Furthermore, we introduce random dynamical systems and derive the Kalman filter, which provides the optimal estimate of their state (in the minimum mean-square error sense, for linear Gaussian models). Finally, we introduce the basics of detection theory and illustrate how to design a detector according to the Neyman-Pearson criterion. Half of the course takes place in the LAIB laboratories, where students implement and characterize in the MATLAB environment all of the methods discussed during the lectures.
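As a minimal sketch of the Kalman filter mentioned above, the code below implements the scalar case for an assumed random-walk state observed in additive white Gaussian noise. The model, the parameter names (q, r, x0, p0) and the use of Python with NumPy are illustrative assumptions, not the formulation used in the course, whose labs are carried out in MATLAB.

```python
import numpy as np


def kalman_random_walk(y, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for the (assumed) random-walk model
        state:       x[n] = x[n-1] + w[n],  w[n] ~ N(0, q)
        measurement: y[n] = x[n] + v[n],    v[n] ~ N(0, r)
    with w and v white, Gaussian and mutually independent.
    Returns the filtered estimates and their error variances."""
    x_est, p = x0, p0
    estimates, variances = [], []
    for yn in y:
        # Prediction: the random walk keeps the mean; the error variance grows by q.
        x_pred = x_est
        p_pred = p + q
        # Update: the Kalman gain weights the innovation y[n] - x_pred.
        k = p_pred / (p_pred + r)
        x_est = x_pred + k * (yn - x_pred)
        p = (1.0 - k) * p_pred
        estimates.append(x_est)
        variances.append(p)
    return np.array(estimates), np.array(variances)


# Usage: simulate the model and compare the filter with the raw measurements.
rng = np.random.default_rng(1)
n, q, r = 5000, 1.0, 1.0
x = np.cumsum(rng.normal(0.0, np.sqrt(q), n))
y = x + rng.normal(0.0, np.sqrt(r), n)
x_hat, p = kalman_random_walk(y, q, r)
print(np.mean((y - x) ** 2), np.mean((x_hat - x) ** 2))  # the filtered MSE is smaller
print(p[-1])  # steady-state error variance, (sqrt(5) - 1) / 2 for q = r = 1
```

With q = 0 the same recursion reduces to a running average of the measurements, which is a quick sanity check of the update equations.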

Expected Learning Outcomes

Signal, image and video processing and learning (Image and video processing and learning)

In general, the course aims to provide the student with a solid background in multidimensional image processing, image and video compression, and deep learning, with an approach that makes this knowledge usable in practical applications. For this purpose, each area is coupled with computer labs where students implement and test algorithms and learn the effect of the various parameters. Students will also carry out a project. In particular, the course will allow the student to achieve the following results:
1. Knowledge of multidimensional image processing.
2. Knowledge of transform coding, data and image compression.
3. Knowledge of motion estimation techniques and their applications to video compression.
4. Knowledge of feedforward and convolutional deep neural networks, and their training methods.
5. Knowledge of generative adversarial networks.
6. Knowledge of deep learning methods for image classification, image segmentation, object detection and various inverse problems.

Signal, image and video processing and learning (Signal processing: methods and algorithms)

1. Knowledge of the foundations of discrete-time random processes
2. Knowledge of the basics of time-frequency analysis
3. Knowledge of the basics of estimation theory
4. Knowledge of the basics of Kalman filtering
5. Knowledge of the basics of detection theory
6. Ability to classify stationary and nonstationary random processes
7. Ability to design estimation algorithms for signals affected by noise
8. Ability to use the Kalman filter for the estimation of random processes and systems
9. Ability to design a detector
Judgment and communication skills are strengthened during the laboratories thanks to the continuous interaction with the teacher. To improve learning skills, students are taught how to search for scientific and tutorial references on the main online search engines, such as IEEE Xplore.

Pre-requirements

Signal, image and video processing and learning (Image and video processing and learning)

Students are expected to have some knowledge of basic continuous-time and discrete-time signals and systems, as well as random processes. In particular, the following concepts will be used during the course: Fourier transforms, signals and systems in the time and frequency domains, LTI filters; discrete-time systems, their spectrum and its relation to the Fourier transform of continuous-time signals, the z-transform, discrete-time LTI filters of FIR and IIR type; Gaussian processes, wide-sense stationary processes, autocorrelation and covariance functions, power spectral density of random processes, white noise. Moreover, to work in the computer labs, students are expected to be able to write programs in Python.

Signal, image and video processing and learning (Signal processing: methods and algorithms)

The student must know the following concepts of probability theory and signal processing:
1. Random variable
2. Probability density function
3. Mean
4. Variance
5. Frequency analysis
6. Linear time-invariant (LTI) systems
However, at the beginning of the course these notions are reviewed with an intuitive approach.

Course topics

Signal, image and video processing and learning (Image and video processing and learning)

Fundamentals of information theory and compression, data compression techniques (1.2 CFU). This includes:
• Basics of information theory for lossless/lossy compression
• Data compression techniques such as Huffman coding, arithmetic coding, dictionary coding and run-length coding
Transform coding and prediction; the JPEG standard (1.5 CFU). This includes:
• Separable extension of multidimensional transforms, the two-dimensional Fourier transform, the discrete cosine transform, the Karhunen-Loève transform, signals over graphs and the graph Fourier transform
• Quantization techniques such as scalar quantization, robust quantization, Lloyd-Max quantization, entropy-constrained quantization and vector quantization
• Application to the JPEG image compression standard
Motion estimation and compensation techniques for video compression, and the H.264/AVC and H.265/HEVC standards (0.8 CFU). This includes:
• 3D transforms versus temporal prediction for multidimensional data compression
• Motion models
• Definition of a prototype video coder
• Scalable video coding
• The H.264/AVC video compression standard
• The H.265/HEVC video compression standard
Image forensics (1 CFU). This includes:
• Photo-response non-uniformity (PRNU)
• Application to detection of picture origin
Feedforward and convolutional neural networks and their applications (1.5 CFU). This includes:
• Neural network architecture
• Backpropagation algorithm
• Cost functions, overfitting and regularization
• Convolutional neural networks and deep learning
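To make one of the listed lossless techniques concrete, here is a minimal Python sketch of binary Huffman code construction: the two least probable subtrees are repeatedly merged, prepending 0 to the codewords of one and 1 to the other. The function names and the dictionary-based representation are illustrative choices, not the course's reference implementation.

```python
import heapq
from itertools import count


def huffman_code(probs):
    """Build a binary Huffman code for a source given as {symbol: probability}."""
    if len(probs) == 1:
        # A single-symbol source still needs a (one-bit) codeword.
        return {s: "0" for s in probs}
    tiebreak = count()  # unique counter, so the heap never has to compare dicts
    heap = [(p, next(tiebreak), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees; prepend 0 to one and 1 to the other.
        p0, _, code0 = heapq.heappop(heap)
        p1, _, code1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in code0.items()}
        merged.update({s: "1" + w for s, w in code1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]


def average_length(probs, code):
    """Expected codeword length in bits per symbol."""
    return sum(p * len(code[s]) for s, p in probs.items())


# Usage: a dyadic source, for which the Huffman code reaches the entropy (1.75 bits).
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = huffman_code(probs)
print(code)                         # {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
print(average_length(probs, code))  # 1.75
```

When probabilities tie, different merge orders give different codewords but the same (optimal) average length; the counter only makes the result deterministic.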

Signal, image and video processing and learning (Signal processing: methods and algorithms)

Introduction; discrete-time random processes (15 hours)
Nonstationary random processes (9 hours)
Introduction to estimation theory (9 hours)
Spectral estimation (6 hours)
Time-frequency analysis (6 hours)
The Kalman filter (9 hours)
Introduction to detection theory (6 hours)

Signal, image and video processing and learning (Image and video processing and learning)

Image processing and data compression (3.3 CFU including labs). This includes:
- Basics of multidimensional signal and image processing (0.6 CFU)
- Multidimensional transforms (0.2 CFU)
- Scalar quantization (0.2 CFU)
- Lossless compression (0.4 CFU)
- JPEG image compression standard (0.1 CFU)
- Video compression (0.3 CFU)
- Labs (1.5 CFU)
Deep neural networks and their applications (2.7 CFU including labs). This includes:
- Introduction to neural networks (0.2 CFU)
- Backpropagation algorithm (0.2 CFU)
- Cost functions, overfitting and regularization (0.2 CFU)
- Convolutional neural networks and deep learning (0.3 CFU)
- Classification, segmentation and object detection (0.3 CFU)
- Inverse problems (denoising, super-resolution) (0.2 CFU)
- Neural network quantization (0.1 CFU)
- Transformers and generative adversarial networks (0.3 CFU)
- Labs (0.9 CFU)
The course project is presented during the labs and is to be completed at home.
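Lloyd-Max quantization is one of the scalar quantization techniques covered in the course. The sketch below designs such a quantizer empirically from training samples by alternating the two optimality conditions (thresholds at midpoints between levels, levels at the centroids of their cells), i.e. Lloyd's iteration on data rather than on an analytic pdf. The function names, the quantile initialization and the stopping rule are assumptions made for this illustration. For a unit-variance Gaussian source, the tabulated 4-level Lloyd-Max quantizer has thresholds 0 and ±0.9816, levels ±0.4528 and ±1.510, and mean-square error about 0.1175, which the empirical design should approach.

```python
import numpy as np


def lloyd_max(samples, levels, iters=200, tol=1e-9):
    """Design a scalar quantizer with `levels` reconstruction values from training
    samples by alternating the two Lloyd-Max optimality conditions.
    Returns (decision thresholds, reconstruction levels)."""
    samples = np.asarray(samples, dtype=float)
    # Initial reconstruction levels at evenly spaced quantiles of the data.
    recon = np.quantile(samples, (np.arange(levels) + 0.5) / levels)
    for _ in range(iters):
        # Nearest-neighbour condition: thresholds at midpoints between levels.
        thresholds = (recon[:-1] + recon[1:]) / 2
        cell = np.searchsorted(thresholds, samples)
        # Centroid condition: each level becomes the mean of the samples in its cell
        # (an empty cell keeps its previous level).
        sums = np.bincount(cell, weights=samples, minlength=levels)
        counts = np.bincount(cell, minlength=levels)
        new_recon = np.where(counts > 0, sums / np.maximum(counts, 1), recon)
        shift = np.max(np.abs(new_recon - recon))
        recon = new_recon
        if shift < tol:
            break
    return (recon[:-1] + recon[1:]) / 2, recon


def quantize(x, thresholds, recon):
    """Map every input value to the reconstruction level of its cell."""
    return recon[np.searchsorted(thresholds, x)]


# Usage: 4-level quantizer for a zero-mean, unit-variance Gaussian source.
rng = np.random.default_rng(2)
train = rng.normal(size=200_000)
thr, rec = lloyd_max(train, levels=4)
mse = np.mean((train - quantize(train, thr, rec)) ** 2)
print(thr, rec, mse)
```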

Signal, image and video processing and learning (Signal processing: methods and algorithms)

Discrete-time signals and systems (15 hours)
Nonstationary random processes (9 hours)
Introduction to estimation theory (9 hours)
Spectral estimation (6 hours)
Time-frequency analysis (6 hours)
The Kalman filter (9 hours)
Introduction to detection theory (6 hours)
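As an example of designing a detector according to the Neyman-Pearson criterion, consider the classic textbook case of a known DC level A > 0 in white Gaussian noise of variance sigma2, observed over N samples. The likelihood-ratio test reduces to comparing the sample mean with a threshold chosen to meet the false-alarm probability P_FA. The Python sketch below (function names are hypothetical) computes that threshold, the theoretical detection probability, and a Monte Carlo check.

```python
import math
from statistics import NormalDist

import numpy as np

STD_NORMAL = NormalDist()  # zero-mean, unit-variance Gaussian


def q_inv(p):
    """Inverse of the Gaussian right-tail probability Q(x) = 1 - Phi(x)."""
    return STD_NORMAL.inv_cdf(1.0 - p)


def np_threshold(n, sigma2, p_fa):
    """Threshold gamma on the sample mean such that P(mean(x) > gamma | H0) = p_fa."""
    return math.sqrt(sigma2 / n) * q_inv(p_fa)


def detect_dc(x, sigma2, p_fa):
    """Neyman-Pearson test of H0: x[n] = w[n] versus H1: x[n] = A + w[n] with A > 0,
    w[n] white Gaussian noise of variance sigma2. Decides H1 if mean(x) > gamma;
    gamma depends only on N, sigma2 and P_FA, not on A."""
    return bool(np.mean(x) > np_threshold(len(x), sigma2, p_fa))


def theoretical_pd(a, n, sigma2, p_fa):
    """Detection probability P_D = Q(Q^{-1}(P_FA) - sqrt(N A^2 / sigma2))."""
    d = math.sqrt(n * a * a / sigma2)
    return 1.0 - STD_NORMAL.cdf(q_inv(p_fa) - d)


# Usage: Monte Carlo check with A = 0.5, N = 10, sigma2 = 1, P_FA = 0.1.
rng = np.random.default_rng(3)
a, n, sigma2, p_fa, trials = 0.5, 10, 1.0, 0.1, 20_000
h0 = rng.normal(0.0, math.sqrt(sigma2), size=(trials, n))      # noise only
h1 = a + rng.normal(0.0, math.sqrt(sigma2), size=(trials, n))  # DC level plus noise
pfa_hat = np.mean([detect_dc(x, sigma2, p_fa) for x in h0])
pd_hat = np.mean([detect_dc(x, sigma2, p_fa) for x in h1])
print(pfa_hat, pd_hat, theoretical_pd(a, n, sigma2, p_fa))  # empirical vs. theoretical
```

Because the threshold does not depend on A, the same test is optimal for every positive A, which is why only sigma2, N and P_FA enter np_threshold.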

Sustainable development goals

Build resilient infrastructure and promote innovation and equitable, responsible and sustainable industrialization

Make cities and human settlements inclusive, safe, resilient and sustainable

Additional information

Signal, image and video processing and learning (Image and video processing and learning)

Signal, image and video processing and learning (Signal processing: methods and algorithms)

Course structure

Signal, image and video processing and learning (Image and video processing and learning)

The course is based on lectures (37.5 hours). Computer labs are organized in the areas of data/image compression, image processing, and neural networks. Each computer lab lasts 3-6 hours over the span of one or two weeks. Students carry out a course project in groups; the project contributes to the final exam score.

Signal, image and video processing and learning (Signal processing: methods and algorithms)

Half of the course takes place in the LAIB laboratories, where students implement and characterize in the Matlab environment all of the methods discussed during the lectures.

Reading materials

Signal, image and video processing and learning (Image and video processing and learning)

Regarding compression techniques, the reference book is K. Sayood, “Introduction to Data Compression”, 3rd edition, Morgan Kaufmann. Regarding neural networks, there is no reference book and all topics will be covered using slides. The following online book can be useful for some of the topics: M. Nielsen, “Neural Networks and Deep Learning” (2017), available at http://neuralnetworksanddeeplearning.com

Signal, image and video processing and learning (Signal processing: methods and algorithms)

[1] D. G. Manolakis, V. K. Ingle, and S. M. Kogon, Statistical and Adaptive Signal Processing, Artech House, 2011.
[2] L. Cohen, Time-Frequency Analysis, Prentice Hall, 1995.
[3] A. Gelb (Ed.), Applied Optimal Estimation, The MIT Press, 1974.
[4] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory, Prentice Hall, 1993.
[5] S. M. Kay, Fundamentals of Statistical Signal Processing: Detection Theory, Prentice Hall, 1998.

Study materials

Signal, image and video processing and learning (Image and video processing and learning)

Lecture slides; Video lectures (previous years);

Signal, image and video processing and learning (Signal processing: methods and algorithms)

Lecture slides; Lab exercises with solutions; Video lectures (current year);

Assessment and grading criteria

...

Signal, image and video processing and learning (Image and video processing and learning)

The exam aims at verifying the knowledge and understanding of the topics treated during the course, and the ability of the students to critically discuss such topics. The final exam is a written exam. It lasts 1 hour and consists of discussing up to 2 topics, each discussion being limited in size (one page of text). The written exam contributes up to 24 points to the final score, and the discussion of each topic usually contributes equally. The text of the questions may be provided through the Exam platform, but the answers must be written on paper. During the written exam, students are not allowed to use any books, lecture notes or any material other than a calculator, and must not have any active cell phone, tablet or other electronic device. The exam grade combines the score of the written exam (up to 24 points) with the score of the computer-lab reports (up to 6 points). The exam is passed if the score is at least 18/30. The computer-lab reports are scored according to the technical quality of the work done and the understanding of the concepts learned during the course. The reports must be delivered to the course instructor by a deadline, which will be communicated during the course and will typically be the end of the course or a few days later. Student self-assessment is also used to determine the computer-lab score, i.e., students score the contribution of the other members of their group towards achieving the objectives of the computer labs; sending these self-assessments to the course instructor is mandatory to obtain a score for the computer labs. While the exam is typically written, the course instructor reserves the right to hold an oral examination in specific cases. The grade of this exam is averaged with the grade of the first module of the course (Signal processing: methods and algorithms), yielding the final exam score.

Signal, image and video processing and learning (Signal processing: methods and algorithms)

The written exam lasts one hour and is based on approximately 12-15 multiple-choice questions covering the content of both the lectures and the laboratories. Every correct answer gives the same positive score, and the final mark is the sum of these scores. No support material, such as notes or books, may be used during the exam. The highest mark that can be obtained in the written exam is 30 cum laude. If 10 or fewer students are booked for the exam, the written exam can be replaced by an oral exam of approximately 30 minutes, focused on the topics taught both during the lectures and in the laboratories. The highest mark that can be obtained in the oral exam is 30 cum laude. The final mark for the Signal, image and video processing and learning course is the arithmetic average of the mark for this course (Signal processing: methods and algorithms) and the mark for the Image and video processing and learning course. The highest final mark is 30 cum laude.

Signal, image and video processing and learning (Image and video processing and learning)

Exam: Compulsory oral exam; Optional oral exam; Group project; Computer-based written test in class using POLITO platform;

Signal, image and video processing and learning (Signal processing: methods and algorithms)

Exam: Computer-based written test in class using POLITO platform;

Signal, image and video processing and learning (Image and video processing and learning)

The exam aims at verifying the knowledge and understanding of the topics treated during the course, and the ability of the students to critically discuss such topics. The final exam consists of two parts. The first part is a written exam. It lasts 1 hour and consists of discussing up to 2 topics, each discussion being limited in size (one page of text). The written exam contributes up to 24 points to the final score, and the discussion of each topic usually contributes equally. The text of the questions may be provided through the Exam platform, but the answers must be written on paper. During the written exam, students are not allowed to use any books, lecture notes or any material other than a calculator, and must not have any active cell phone, tablet or other electronic device. While the first part of the exam is typically written, the course instructor reserves the right to hold an oral examination in specific cases (including, but not limited to, when very few students have signed up for the exam). The second part is the evaluation of the course project. It is worth up to 9 points, awarded on the basis of the technical merit of the work done and the quality of the project presentation (including the understanding of the concepts learned during the course as applied to the project). The exact details of the project and its scoring rules, as well as the project deadline and presentation schedule, may change from year to year and will be explained during the course. The exam grade is the sum of the written-exam score (up to 24 points) and the project score (up to 9 points). The exam is passed if the total score is at least 18/30 and the written-exam score is at least 11 points. It is possible to take the exam without doing the course project, in which case the project score is 0.
The grade of this exam will be averaged with the grade of the first module of the course (Signal processing: methods and algorithms), yielding the final exam score.
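The scoring rule for this module combines a threshold on the total with a minimum on the written part, and the resulting grade is then averaged with the other module. The hypothetical Python helpers below only restate that arithmetic to show how the two pass conditions interact. The text does not say how totals above 30 (possible, since 24 + 9 = 33) or fractional averages are rounded, so the sketch leaves both untouched.

```python
def image_module_score(written, project=0.0):
    """Score of the Image and video processing and learning module.
    written: written-exam score in [0, 24]; project: project score in [0, 9]
    (0 if no project is submitted). Returns (total, passed)."""
    if not (0 <= written <= 24 and 0 <= project <= 9):
        raise ValueError("written must lie in [0, 24] and project in [0, 9]")
    total = written + project
    # Both conditions are required: total of at least 18 AND written part of at least 11.
    passed = total >= 18 and written >= 11
    return total, passed


def course_mark(image_mark, signal_mark):
    """Final mark: arithmetic average of the two module marks (rounding not specified)."""
    return (image_mark + signal_mark) / 2


print(image_module_score(10, 9))  # (19, False): total >= 18 but written part below 11
print(image_module_score(12, 6))  # (18, True)
print(course_mark(24, 28))        # 26.0
```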

In addition to the message sent by the online system, students with disabilities or Specific Learning Disorders (SLD) are invited to directly inform the professor in charge of the course about the special arrangements for the exam that have been agreed with the Special Needs Unit. The professor has to be informed at least one week before the beginning of the examination session in order to provide students with the most suitable arrangements for each specific type of exam.