PORTALE DELLA DIDATTICA

Mathematics in Machine Learning

01TXGSM

Academic year 2024/25

Course Language

English

Degree programme(s)

Master of science-level of the Bologna process in Data Science And Engineering - Torino

Course structure
Teaching hours
Lectures: 60
Classroom exercises: 20
Lecturers
Vaccarino Francesco, Associate Professor (SSD MATH-02/B): 30 h of lectures, 10 h of exercises, 0 h of lab, 0 h of tutoring; 6 years teaching.
Co-lecturers

Context
SSD: MAT/03; CFU: 4; Activities: C - Related or supplementary; Area context: Related or supplementary educational activities
SSD: SECS-S/01; CFU: 4; Activities: F - Other activities (art. 10); Area context: Other knowledge useful for entering the job market
This course gives students a solid mathematical foundation in Machine Learning (ML) by blending learning theory, geometry, topology, and statistics. Starting from the algebraic and geometric structures used to represent and manipulate data, we will move on to the more geometric aspects of ML, paying particular attention to the various notions of dimension and learnability. Linear-algebra-based methods will be presented thoroughly. In parallel, (generalized) linear models, together with their selection, regularization, and validation, will be presented in full detail from a rigorous statistical point of view, along with some Bayesian methods. These two aspects of the theory, the geometric and the statistical, will be brought together through case studies on data.
Students will learn the basic concepts of machine and statistical learning from both the frequentist and the Bayesian viewpoint, the main techniques for multivariate data, and the critical use of specialized software (R, SAS, BUGS, Stan, MATLAB, Orange, Python, RapidMiner, and the like), including the ability to weigh the pros and cons of each tool.
The prerequisites for this course are basic probability theory and statistics; linear algebra (in particular, the singular value decomposition, SVD); basic metric geometry; and calculus.
• Mathematical representations of data: spaces (including Hilbert spaces), metrics, distances, dissimilarities, and kernels. The geometry of very high-dimensional spaces and the curse of dimensionality.
• Learning theory, PAC learning, and VC dimension. The bias-variance trade-off and model complexity.
• Cross-validation, bootstrap, and applications. Ensemble methods: bagging, random forests, and boosting.
• Linear-algebra-based methods: Principal Component Analysis, Linear Discriminant Analysis, stochastic projections (Johnson-Lindenstrauss transform); Support Vector Machines and kernel methods.
• Linear models (regression, ANOVA, DOE).
• Generalized linear models (categorical data, logistic and multinomial regression).
• Model and feature selection (e.g., lasso, ridge, AIC, BIC).
• Bayesian networks (basic concepts, exact and MCMC-based computations).
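The curse of dimensionality listed in the first topic can be made concrete with a small numerical experiment. The sketch below is only an illustration, not taken from the course materials, and assumes Python with NumPy; the function name `relative_contrast`, the use of uniform points in the unit hypercube, and the sample size are hypothetical choices made for this example. It computes the relative contrast (max - min) / min of the Euclidean distances from one random query point to a random sample: as the dimension grows, this ratio shrinks, meaning all sample points become roughly equidistant from the query and "nearest" loses much of its meaning.

```python
import numpy as np


def relative_contrast(dim, n_points=500, seed=0):
    """Relative contrast (max - min) / min of the Euclidean distances from one
    random query point to n_points random points, all drawn uniformly from the
    unit hypercube [0, 1]^dim (an illustrative assumption, not course data)."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))   # sample, shape (n_points, dim)
    query = rng.random(dim)                # query point, shape (dim,)
    # Broadcasting subtracts the query from every row; norm is taken per row.
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()


if __name__ == "__main__":
    for dim in (2, 10, 100, 1000):
        print(f"dim={dim:5d}  relative contrast={relative_contrast(dim):.3f}")
```

With a fixed seed the contrast should fall sharply between 2 and 1000 dimensions; the exact printed values depend on the random draw, so none are quoted here. A uniform distribution is used only because it keeps the example simple; the concentration of distances appears for many other distributions as well.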
The course comprises 60 hours of lectures and 20 hours of classroom exercises. Exercises are presented and solved in class.
The course website will provide lecture slides, example R and Python scripts, and exercises with solutions.
Suggested books:
Main texts
- Dirk P. Kroese, Zdravko I. Botev, Thomas Taimre, Radislav Vaisman. Data Science and Machine Learning: Mathematical and Statistical Methods. CRC Press, 2019 (510 pages).
- Shai Shalev-Shwartz, Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
Further reading
- Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. The MIT Press. ISBN 9780262018029.
Learning materials: lecture slides; textbook; practice book; exercises; exercises with solutions; video lectures (current year); video lectures (previous years).
You can take this exam before attending the course
Exam: written test; optional oral exam.
The exam assesses the candidate's knowledge of the topics in the official program and their ability to analyze data using the methods presented in the course. It consists of a written examination and an optional oral examination. The written examination lasts two hours and consists of 6 exercises, similar to those presented during the lectures: modeling practical problems, answering multiple-choice questions, and solving conceptual exercises. During the test, textbooks, personal notes, and the formula sheets provided by the teacher during the year are allowed. The maximum score is 30/30. Students who obtain a passing mark (at least 18/30) in the written exam may take the oral exam at their own request or at the professor's request. In the oral exam, students are asked methodological and theoretical questions about the course contents. After the oral exam, the mark obtained in the written exam can be increased or decreased by at most 6 points.
In addition to the message sent by the online system, students with disabilities or Specific Learning Disorders (SLD) are invited to inform the professor in charge of the course directly about the special exam arrangements agreed with the Special Needs Unit. The professor must be informed at least one week before the beginning of the examination session, so that the most suitable arrangements can be provided for each specific type of exam.