Portale della Didattica

Machine learning for networking

01DSMUV, 01DSMBG, 01DSMUW

Academic year 2024/25

Course Language

English

Degree programme(s)

Master of science-level of the Bologna process in Cybersecurity - Torino
Master of science-level of the Bologna process in Communications Engineering - Torino
Master of science-level of the Bologna process in Cybersecurity - Torino

Course structure

Teaching	Hours
Lectures	40
Laboratory sessions	40

Lecturers

Teacher	Status	SSD	Lecture hours	Exercise hours	Lab hours	Tutoring hours	Years teaching
Vassio Luca	Fixed-term researcher (L. 240/10, art. 24-B)	IINF-05/A	35	0	15	0	2

Co-lecturers


Teacher	Status	Lecture hours	Lab hours
Ciravegna Gabriele	Research fellow	5	14
Song Tailai	PhD student	0	25.5
Wang Zhihao	External collaborator	0	25.5

Context

SSD	CFU	Activities	Area context
ING-INF/05	8	D - Student's choice	Student's choice


Valid from academic year

2024/25

Course description

The course aims to provide a solid introduction to machine learning, a branch of artificial intelligence that deals with the development of algorithms able to extract knowledge from data, with a focus on pattern recognition and classification problems. The course will cover the basic concepts of statistical machine learning, from both the frequentist and the Bayesian perspectives, and will focus on the broad class of generative linear Gaussian models and on discriminative classifiers based on logistic regression and support vector machines. The objective of the course is to provide the students with a solid theoretical foundation that will allow them to select, apply and evaluate different classification methods on real tasks. The students will also acquire the competencies required to devise novel approaches based on the frameworks presented during the classes. The course will include laboratory activities that will allow the students to practice the theoretical notions on real data using modern programming frameworks widely employed by both research communities and companies.

This course explores how machine learning can help engineers solve problems in networking. The course introduces the data science process and then provides theoretical and practical knowledge of the machine learning approach and of the algorithms commonly used to analyze large and heterogeneous datasets. The students will also acquire Python programming skills and learn how to use its main ML libraries. Many practical examples will focus on inference problems in communication networks and cybersecurity. A significant part of the course is devoted to laboratory activities in which the students practice the theoretical notions on real problems, from traffic classification to anomaly detection. The laboratory sessions follow a learning-by-doing approach and cover all the phases of a machine learning pipeline (e.g., data preparation and cleaning, data visualization and characterization, ML algorithm selection, tuning, and result evaluation).
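The pipeline phases listed above can be sketched end to end with the Python standard library alone. This is a toy illustration, not the course's laboratory material: the flow features, the class names ("video", "web"), the nearest-centroid model and the alternating train/test split are all hypothetical choices made to keep the example short.

```python
import math
import statistics

# Hypothetical toy flow records: (mean packet size [bytes], flow duration [s], label).
# None marks a missing measurement that the cleaning step has to handle.
RAW = [
    (1200.0, 30.0, "video"), (1350.0, 45.0, "video"), (1100.0, None, "video"),
    (1280.0, 38.0, "video"), (300.0, 2.0, "web"), (250.0, 1.5, "web"),
    (400.0, 3.0, "web"), (None, 2.5, "web"), (350.0, 2.2, "web"),
]


def clean(records):
    """Data preparation and cleaning: drop records with a missing feature."""
    return [r for r in records if None not in r]


def fit_scaler(rows):
    """Characterization: per-feature mean and (population) standard deviation."""
    columns = zip(*(r[:-1] for r in rows))
    return [(statistics.mean(c), statistics.pstdev(c)) for c in columns]


def transform(x, scaler):
    """Pre-processing: z-score standardization using training statistics only."""
    return [(v - m) / s for v, (m, s) in zip(x, scaler)]


def fit_centroids(rows, scaler):
    """Model training: a nearest-centroid classifier (one mean vector per class)."""
    by_class = {}
    for *x, y in rows:
        by_class.setdefault(y, []).append(transform(x, scaler))
    return {y: [statistics.mean(c) for c in zip(*zs)] for y, zs in by_class.items()}


def predict(x, centroids, scaler):
    """Assign the class whose centroid is closest in the standardized space."""
    z = transform(x, scaler)
    return min(centroids, key=lambda y: math.dist(z, centroids[y]))


def accuracy(rows, centroids, scaler):
    """Result evaluation: fraction of correctly classified records."""
    return sum(predict(x, centroids, scaler) == y for *x, y in rows) / len(rows)


data = clean(RAW)
train, test = data[::2], data[1::2]  # hypothetical alternating train/test split
scaler = fit_scaler(train)
centroids = fit_centroids(train, scaler)
print(f"test accuracy: {accuracy(test, centroids, scaler):.2f}")
```

Fitting the scaler on the training split only, and reusing its statistics on the test split, is the design choice that keeps the evaluation honest; in practice the same structure maps onto library pipelines such as those in Scikit-learn.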

Expected Learning Outcomes

At the end of the course, the students will:
- know and understand the basic principles of statistical machine learning applied to pattern recognition and classification;
- know the principal classification techniques, including generative linear Gaussian models and discriminative approaches based on logistic regression and support vector machines, among others;
- understand the theoretical motivations behind different classification approaches, their main properties, domains of application, and limitations;
- be able to implement the different algorithms using widespread programming frameworks (Python);
- be able to apply different methods to real tasks, critically evaluate their effectiveness, and analyze which strategies are better suited to different applications;
- be able to transfer the acquired knowledge and capabilities to novel classification problems, developing new methods based on the frameworks discussed during classes.

Knowledge and abilities:
- Knowledge of the Python programming language and its main machine learning libraries;
- Knowledge of the main phases of a data science and ML process;
- Knowledge of the different data exploration, visualization and pre-processing techniques;
- Knowledge of the basic theoretical principles of machine learning;
- Knowledge of the principal models for supervised and unsupervised learning;
- Knowledge of the main theoretical properties, domains of application, and limitations of different machine learning approaches;
- Knowledge of networking problems that can be approached with ML;
- Ability to design, implement and evaluate analytics scripts in Python;
- Ability to manage large datasets, from pre-processing to visualization;
- Ability to use the Python machine learning libraries to devise complete solutions for inference problems;
- Ability to design, implement and evaluate a machine learning pipeline;
- Ability to apply different methods to real networking and cybersecurity tasks, critically evaluate their effectiveness, and analyze which strategies are better suited to different applications.

Pre-requirements

The students should have basic knowledge of probability and statistics, linear algebra and calculus.

The students should have basic knowledge of:
- Programming (in any language)
- Communication networks
- Probability theory and statistics
- Linear algebra
- Calculus
- Operational research

Course topics

Machine learning and pattern recognition
- Introduction and definitions
Probability theory concepts
- Random variables
- Estimators
- The Bayesian framework
Introduction to Python
- The language
- Main numerical libraries
Decision theory
- Inference, expected loss
- Model taxonomy: generative and discriminative approaches
- Model optimization, hyperparameter selection, cross-validation
Model evaluation
- Classification scores and log-likelihood ratios
- Detection cost functions and optimal Bayes decisions
Dimensionality reduction
- Principal Component Analysis (PCA)
- Linear Discriminant Analysis (LDA)
Generative Gaussian models
- Generative Gaussian classifiers: univariate Gaussian, Naive Bayes, multivariate Gaussian (MVG)
- Tied-covariance MVG and LDA
Logistic Regression (LR)
- From tied MVG to LR
- LR as the maximum likelihood solution for class labels
- Binary and multiclass cross-entropy
- From MVG to quadratic LR
- LR as empirical risk minimization
- Overfitting and regularization
Support Vector Machines (SVM)
- Optimal classification hyperplane: the maximum-margin definition
- Margin maximization and L2 regularization
- SVM as minimization of classification errors
- Primal and dual SVM formulations
- Non-linear extension: brief introduction to kernels
Density estimation and latent variable models
- Gaussian mixture models (GMM)
- The Expectation-Maximization algorithm
Continuous latent variable models: linear-Gaussian models
- Linear regression
- Linear regression and tied MVG
- MVG with unknown class means: Probabilistic LDA (PLDA)
- Bayesian MVG
- Factor analysis: PLDA, probabilistic PCA
Approximate inference basics
- Variational Bayes
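The "Model evaluation" items in this list (log-likelihood ratios, detection cost functions, optimal Bayes decisions) can be illustrated with a short sketch. It assumes a binary task with prior π1 for class H1 and costs C_fn and C_fp for missed detections and false alarms; under these assumptions the Bayes-optimal rule accepts H1 when the LLR exceeds t = -log(π1 C_fn / ((1 - π1) C_fp)). The Gaussian class models, the scores and all function names are hypothetical, chosen only for illustration; this is not the course's laboratory code.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log-density of a univariate Gaussian N(mean, var) at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def llr(x, params_h1, params_h0):
    """Log-likelihood ratio log p(x|H1) - log p(x|H0) of two Gaussian class models."""
    return gaussian_logpdf(x, *params_h1) - gaussian_logpdf(x, *params_h0)


def bayes_threshold(prior_h1, c_fn, c_fp):
    """Optimal Bayes threshold on the LLR: decide H1 when llr > t."""
    return -math.log((prior_h1 * c_fn) / ((1 - prior_h1) * c_fp))


def normalized_dcf(llrs, labels, prior_h1, c_fn, c_fp):
    """Empirical detection cost of the Bayes decisions, normalized by the cost
    of the best dummy system (always accept or always reject).
    Labels are 1 for H1 and 0 for H0; both classes must be present."""
    t = bayes_threshold(prior_h1, c_fn, c_fp)
    decisions = [1 if s > t else 0 for s in llrs]
    n1 = sum(labels)
    n0 = len(labels) - n1
    p_fn = sum(1 for d, l in zip(decisions, labels) if l == 1 and d == 0) / n1
    p_fp = sum(1 for d, l in zip(decisions, labels) if l == 0 and d == 1) / n0
    dcf = prior_h1 * c_fn * p_fn + (1 - prior_h1) * c_fp * p_fp
    return dcf / min(prior_h1 * c_fn, (1 - prior_h1) * c_fp)


# Illustrative usage with made-up class models N(1, 1) vs N(-1, 1) and samples.
scores = [llr(x, (1.0, 1.0), (-1.0, 1.0)) for x in (1.0, 0.25, -0.5, -0.1)]
print(normalized_dcf(scores, [1, 1, 0, 1], prior_h1=0.5, c_fn=1.0, c_fp=1.0))
```

Raising C_fn or π1 lowers the threshold, so more samples are assigned to H1; the normalization makes a value of 1 correspond to a system no better than always giving the same answer.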

Introduction to Machine Learning and its application to networking (0.5 CFU)
- Definition of the ML pipeline and taxonomy of machine learning tasks
- Problems in networking: from traffic classification to anomaly detection
Python usage and libraries (2.0 CFU)
- The Python language
- Numerical libraries: NumPy, Pandas and Matplotlib
- ML libraries (Scikit-learn, PyTorch)
Data exploration and preprocessing (1.5 CFU)
- Data visualization
- Data transformation and feature extraction
- Dimensionality reduction techniques
Basics of ML (1 CFU)
- Empirical risk minimization
- Loss functions and performance metrics
- Gradient-based learning
- Model selection and validation
Supervised and unsupervised ML (3 CFU)
- Classification
- Regression
- Clustering
- Algorithms: from linear models to deep neural networks
- Regularization
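As a sketch of the "Basics of ML" items (empirical risk minimization, loss functions, gradient-based learning, regularization), the following trains a binary logistic regression by batch gradient descent on the L2-regularized average cross-entropy. Plain Python is used only to keep the example dependency-free; the libraries listed above (e.g., Scikit-learn) provide ready-made implementations. The learning rate, regularization weight, epoch count and toy data are arbitrary illustrative values.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.5, lam=0.01, epochs=500):
    """Minimize (1/n) * sum_i CE(y_i, sigmoid(w.x_i + b)) + (lam/2) * ||w||^2
    by batch gradient descent. Labels y are in {0, 1}; the bias b is not regularized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [lam * wj for wj in w], 0.0  # gradient of the L2 term
        for xi, yi in zip(X, y):
            s = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = sigmoid(s) - yi  # derivative of the cross-entropy w.r.t. the score s
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Decide class 1 when the linear score is positive (posterior > 0.5)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0


X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
w, b = train_logreg(X, y)
print(w, b, [predict(w, b, x) for x in X])
```

The regularization weight lam keeps the weights bounded on separable data like this toy set; without it, gradient descent would keep increasing ||w|| indefinitely.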

Sustainable development goals

Quality education: provide inclusive and equitable quality education and learning opportunities for all

Additional information

Course structure

The course will include 3 hours of lectures and 1.5 hours of laboratory per week. The lectures will focus on both theoretical and practical aspects, and will include open discussions aimed at developing suitable solutions for different problems. The laboratories will allow the students to implement most of the techniques presented during the lectures, and to apply the learned methods to real data.

The course will include 40 hours of lectures and 40 hours of laboratory activities. The lectures will focus both on theoretical and practical aspects of the course topics and will include open discussions aimed at developing suitable solutions for different problems. Some simple practical exercises will be solved in the classroom. The course includes laboratory sessions on data science processes and machine learning algorithms for engineering applications. The laboratories will allow the students to apply the methods presented during lectures to real data and tasks, with a particular focus on networking and cybersecurity applications. Students will prepare a written report on a group project assigned during the course.

Reading materials

[1] Christopher M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), Springer, 2006
[2] Kevin P. Murphy, Machine Learning: A Probabilistic Perspective, The MIT Press, 2012
Additional material, including slides and code fragments, will be made available on the course website.

Copies of the slides used during the lectures, exercises, and manuals for the laboratory activities will be made available. All teaching material can be downloaded from the course website or the teaching Portal. Suggested books:
[1] A. Jung, Machine Learning: The Basics, Springer, 2022
[2] Jake VanderPlas, Python Data Science Handbook: Essential Tools for Working with Data, O'Reilly, 2016
[3] Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
[4] Kent D. Lee, Python Programming Fundamentals, Springer, 2015

Study materials


Lecture slides; Lab exercises; Video lectures (current year);

Assessment and grading criteria


Exam: Written test; Individual project; Group project;

The exam will assess knowledge of the course topics and the candidate's ability to apply that knowledge and the developed skills to solve specific problems. The exam consists of two parts:
- A project developed during the course. The students can choose individual or small-group projects from a set of possible choices (max. 18 points).
- A written examination (max. 12 points).
The final mark is the sum of the report and written exam marks. To pass the exam, the report mark must be at least 9/18, the written exam mark at least 6/12, and the final mark at least 18/30.
The projects will address machine learning tasks. For each project, a dataset will be provided, and the students will have to develop suitable models for the specific task based on the topics and tools presented during lectures and laboratories. Each candidate will have to provide a technical report detailing the methodology employed and critically analyzing the results obtained. The report will assess:
- the degree of understanding of the theoretical principles of different machine learning approaches;
- the student's ability to analyze a specific problem, assessing which of the presented approaches are best suited to the task;
- the student's ability to apply the studied methods to devise suitable solutions for the specific case study;
- the student's ability to critically evaluate the effectiveness of the proposed approaches.
The written examination will consist of open questions covering the topics presented during the lectures. It will assess:
- the theoretical understanding of the basic principles of the presented machine learning approaches;
- the knowledge and understanding of the different approaches presented during the lectures;
- the student's ability to critically analyze and evaluate the different approaches.
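The pass rule of this 18 + 12 point scheme combines three thresholds, which a small helper makes explicit. This is only an illustrative sketch; the function name is hypothetical, and it assumes integer or fractional marks within the stated maxima.

```python
def project_plus_written_mark(report, written):
    """Return (passed, final_mark) for the scheme: report out of 18, written out of 12.
    Passing requires report >= 9, written >= 6 and report + written >= 18."""
    if not (0 <= report <= 18 and 0 <= written <= 12):
        raise ValueError("report must be in [0, 18] and written in [0, 12]")
    final = report + written
    passed = report >= 9 and written >= 6 and final >= 18
    return passed, final


print(project_plus_written_mark(15, 8))
print(project_plus_written_mark(9, 6))   # both minima met, but the total is below 18
print(project_plus_written_mark(18, 5))  # high report mark cannot compensate a written mark below 6
```

The second call shows why the overall 18/30 threshold matters: meeting both per-part minima (9 + 6 = 15) is not sufficient on its own.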



The exam consists of two mandatory parts: (i) a written exam and (ii) the evaluation of a group project. The final score combines the evaluations of both parts. The teacher may request a supplementary oral test to confirm the marks obtained. The written examination lasts 90 minutes and consists of open and closed questions and exercises covering the topics presented during the lectures. A single-sided page of notes is allowed; textbooks and electronic devices of any kind are not. The written examination will assess:
- the theoretical understanding of the basic principles of the presented machine learning approaches;
- the knowledge and understanding of the different approaches presented during the lectures;
- the students' ability to apply ML techniques to a simple numerical case study;
- the students' ability to design, implement and evaluate code in Python and its ML libraries.
The projects will address machine learning tasks. For each project, a dataset will be provided, and the students will have to develop a pipeline using suitable models for the specific task, based on the topics and tools presented during lectures and laboratories. Each group will have to provide a technical report detailing the methodology employed and critically analyzing the results.
The report will assess:
- the degree of understanding of the theoretical principles of different machine learning approaches;
- the students' ability to analyze a specific problem, assessing which of the presented approaches are best suited to the task;
- the working knowledge of Python and the major data mining and machine learning libraries;
- the students' ability to apply the studied methods to devise suitable solutions for the specific case study;
- the students' ability to critically evaluate the effectiveness of the proposed approaches.
After submitting their report, the students can peer-review other groups' reports to earn bonus points. The final grade is the weighted average of the written exam grade (40%) and the project report grade (60%). Each part is graded from 0 to 30 cum laude, and both parts must be passed:
- Individual written exam: 40%, at least 18/30
- Project: 60%, at least 18/30
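The weighted-average rule with per-part minimum grades can be written out as a short sketch. The function name is hypothetical, and the peer-review bonus, cum laude and rounding are deliberately not modeled, because the text does not say how many points they are worth or how the result is rounded.

```python
def weighted_final_grade(written, project):
    """Final grade = 0.4 * written + 0.6 * project, both on the 0-30 scale.
    Returns None when either part is below 18/30 (exam not passed).
    Bonus points, cum laude and rounding are NOT modeled (unspecified)."""
    if not (0 <= written <= 30 and 0 <= project <= 30):
        raise ValueError("grades must be on the 0-30 scale")
    if written < 18 or project < 18:
        return None
    return 0.4 * written + 0.6 * project


print(weighted_final_grade(20, 28))  # the project weighs more than the written exam
print(weighted_final_grade(17, 30))  # a written grade below 18 fails regardless of the project
```

Because each part has its own 18/30 minimum, a strong project cannot compensate for an insufficient written exam, even though the weighted average alone (0.4 * 17 + 0.6 * 30 = 24.8) would be above 18.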

In addition to the message sent by the online system, students with disabilities or Specific Learning Disorders (SLD) are invited to directly inform the professor in charge of the course about the special arrangements for the exam that have been agreed with the Special Needs Unit. The professor has to be informed at least one week before the beginning of the examination session in order to provide students with the most suitable arrangements for each specific type of exam.