PORTALE DELLA DIDATTICA


Data science and machine learning lab

01VSCWS, 01VSCSM, 01VSCYH

A.A. 2025/26

Course Language

English

Degree programme(s)

Master of Science-level of the Bologna process in Data Science and Engineering - Torino
Master of Science-level of the Bologna process in Mathematical Engineering (Ingegneria Matematica) - Torino

Course structure
Teaching Hours
Lectures: 35
Classroom exercises: 15
Laboratory exercises: 30
Tutoring: 60
Lecturers
Teacher | Status | SSD | h.Les | h.Ex | h.Lab | h.Tut | Years teaching
Giobergia Flavio | Researcher (L. 240/10) | IINF-05/A | 35 | 15 | 0 | 0 | 1

Context
SSD | CFU | Activities | Area context
ING-INF/05 | 8 | B - Characterizing | Computer engineering
2025/26
The course, compulsory for the Master's degree in Data Science and Engineering, is offered in the 1st semester of the 1st year. It focuses on the design and implementation of data-driven processes, which are commonly used today to extract knowledge from data and support decision making. The course first introduces the data science process and its main phases, then provides theoretical and practical knowledge of the data mining and basic machine learning algorithms commonly used to analyze large and heterogeneous data. It also introduces state-of-the-art data mining and machine learning libraries in Python. Through laboratory sessions, all the phases of a standard data science process (e.g., data preparation and cleaning, data exploration and characterization, data mining algorithm selection and tuning, result evaluation) will be explored in a learning-by-doing fashion.
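The phases listed above can be illustrated with a minimal sketch. It is not course material: the synthetic dataset, the function names (`make_dataset`, `run_process`, and so on), the 60/20/20 split, and the choice of a hand-written k-nearest-neighbours classifier are all assumptions made for illustration. In practice the libraries introduced in the course would replace most of these hand-written steps.

```python
import math
import random
import statistics


def make_dataset(n_per_class=60, seed=0):
    """Synthetic two-class data with a few missing values (None). Hypothetical data."""
    rng = random.Random(seed)
    rows = []
    for label, center in ((0, (0.0, 0.0)), (1, (3.0, 3.0))):
        for _ in range(n_per_class):
            x = [rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)]
            if rng.random() < 0.05:
                x[rng.randrange(2)] = None  # simulate a missing measurement
            rows.append((x, label))
    rng.shuffle(rows)
    return rows


def column_means(rows):
    """Mean of each feature, ignoring missing values."""
    return [statistics.mean(x[j] for x, _ in rows if x[j] is not None)
            for j in range(2)]


def impute(rows, means):
    """Replace missing values with the given per-feature means."""
    return [([means[j] if v is None else v for j, v in enumerate(x)], y)
            for x, y in rows]


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (k is odd)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    votes_for_1 = sum(y for _, y in nearest)
    return 1 if votes_for_1 * 2 > k else 0


def accuracy(train, test, k):
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


def run_process(seed=0):
    rows = make_dataset(seed=seed)
    n = len(rows)
    train = rows[: n * 6 // 10]
    valid = rows[n * 6 // 10: n * 8 // 10]
    test = rows[n * 8 // 10:]

    # 1. Data preparation and cleaning: impute using training statistics only.
    means = column_means(train)
    train, valid, test = (impute(part, means) for part in (train, valid, test))

    # 2. Data exploration: a very small characterization (class balance).
    class_counts = {c: sum(y == c for _, y in train) for c in (0, 1)}

    # 3. Algorithm tuning: choose k on the validation split.
    best_k = max((1, 3, 5, 7), key=lambda k: accuracy(train, valid, k))

    # 4. Result evaluation: accuracy on the held-out test split.
    test_accuracy = accuracy(train + valid, test, best_k)
    return {"class_counts": class_counts, "best_k": best_k,
            "test_accuracy": test_accuracy}


if __name__ == "__main__":
    print(run_process(seed=0))
```

The imputation means are computed on the training split only, so no information from the validation or test data leaks into the preparation step.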
• Knowledge of the main phases characterizing a data science process.
• Knowledge of the major data mining algorithms for classification, regression, clustering, association rule mining and anomaly detection.
• Knowledge of the major data mining and machine learning libraries.
• Ability to design, implement and evaluate a data science process.
• Ability to design, implement and evaluate analytics scripts in the Python language.
• Ability to use and tune data mining and machine learning algorithms.
• Basic programming skills in Python.
• Data science process: main phases (0.4 CFU)
• Data collection, cleaning, transformation and enrichment, and feature engineering (0.5 CFU)
• Data mining algorithms: classification, regression, clustering, association rule mining, anomaly detection (1.8 CFU)
• Introduction to data mining and machine learning libraries in Python (e.g., scikit-learn) (1.2 CFU)
• Case study analysis (0.6 CFU)
• Data science process design in the lab (3.5 CFU)
The course includes lectures and exercise sessions on the lecture topics, in particular data science process design, data preprocessing and data mining algorithms (4.5 CFU). Students will prepare a written report on a group project assigned during the course. The course also includes laboratory sessions on data science process design and data analytics (3.5 CFU). Laboratory sessions involve hands-on activities using Python and its ecosystem of libraries for machine learning and data mining.
Copies of the slides used during the lectures, examples of written exams and exercises, and manuals for the laboratory activities will be made available. All teaching material is available on the course website.
Book (only a few chapters needed):
- Tan, Steinbach, Karpatne, Kumar, 'Introduction to Data Mining', 2nd ed., Pearson, 2019
Lecture slides; Textbook; Exercises; Exercises with solutions; Lab exercises; Lab exercises with solutions; Video lectures (current year); Video lectures (previous years)
You can take this exam before attending the course.
Exam: Group project; computer-based written test in class using the POLITO platform
The exam includes a group project and a written part. The final score is defined by considering both the evaluation of the group project and the written part. The teacher may request an additional oral test to confirm the obtained evaluation.

* Learning objectives assessment *
The group project will assess:
- the ability to design, implement and evaluate a complete data science process, including the evaluation and tuning of machine learning algorithms,
- the ability to design, implement and evaluate analytics scripts in the Python language,
- the ability to effectively communicate the adopted methodology and experimental results.
The written part will assess:
- the knowledge of the data preparation techniques and the major data mining algorithms for classification, regression, clustering, anomaly detection and association rule mining,
- the working knowledge of the Python language and the major data mining and machine learning libraries,
- the ability to autonomously assess a simple dataset and to correctly apply a data science pipeline.

* Exam structure and grading criteria *
The group project consists of designing and implementing a data science process for solving a data analytics task. The project is assigned before the start of the exam session and its score is valid for the entire exam session. The evaluation of the group project is based on the performance of the proposed solution, in terms of standard quality measures (e.g., prediction accuracy), and on its completeness (i.e., in-depth analysis of each phase of the designed process and motivation for selecting given techniques and algorithms).
The written part covers theoretical and practical aspects of the course. The written test includes:
(a) multiple choice questions (a penalty is applied for wrong answers),
(b) box-to-fill questions, based on solving exercises related to the theoretical part of the course (data cleaning, data pre-processing, data mining algorithms, data science process),
(c) a practical exercise where a simple data science pipeline must be implemented.
The score of each question will be specified in the exam text. The written exam lasts 2.5 hours. Textbooks, notes and electronic devices of any kind are not allowed.
The maximum grade for the group project is 10. The maximum grade for the written part is 22. The final grade is given by the sum of the two parts. The exam is passed if the grade of the group project is greater than or equal to 6, the grade of the written part is greater than or equal to 12, and the overall grade is greater than or equal to 18. If the final score (before rounding) is greater than or equal to 31, the registered score will be 30 with honor.
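The pass conditions and the honor rule above can be checked with a short sketch. The function and constant names are hypothetical. The syllabus does not state how a final score below 31 is rounded, so the code assumes rounding half up to the nearest integer, capped at 30; that step is an assumption, not the official rule.

```python
import math
from dataclasses import dataclass

PROJECT_MAX = 10.0      # maximum grade for the group project
WRITTEN_MAX = 22.0      # maximum grade for the written part
PROJECT_MIN = 6.0       # minimum project grade to pass
WRITTEN_MIN = 12.0      # minimum written grade to pass
OVERALL_MIN = 18.0      # minimum overall grade to pass
HONOR_THRESHOLD = 31.0  # unrounded total that yields "30 with honor"


@dataclass
class ExamResult:
    passed: bool
    total: float     # unrounded sum of the two parts
    registered: str  # grade as it would be registered, or "fail"


def final_grade(project: float, written: float) -> ExamResult:
    """Apply the grading criteria stated in the syllabus to the two partial grades."""
    if not 0 <= project <= PROJECT_MAX:
        raise ValueError("project grade must be between 0 and 10")
    if not 0 <= written <= WRITTEN_MAX:
        raise ValueError("written grade must be between 0 and 22")

    total = project + written
    passed = (project >= PROJECT_MIN
              and written >= WRITTEN_MIN
              and total >= OVERALL_MIN)
    if not passed:
        return ExamResult(False, total, "fail")
    if total >= HONOR_THRESHOLD:
        return ExamResult(True, total, "30 with honor")

    # Assumption: round half up and cap at 30 (not specified in the syllabus).
    registered = min(30, math.floor(total + 0.5))
    return ExamResult(True, total, str(registered))


if __name__ == "__main__":
    print(final_grade(8, 17.5))   # passes both minimums and the overall threshold
    print(final_grade(5.5, 22))   # fails: project grade below 6
```

Since the two minimums (6 and 12) already add up to 18, the overall threshold is satisfied whenever both parts are passed; the sketch keeps the check to mirror the stated rule.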
In addition to the message sent by the online system, students with disabilities or Specific Learning Disorders (SLD) are invited to directly inform the professor in charge of the course about the special arrangements for the exam that have been agreed with the Special Needs Unit. The professor has to be informed at least one week before the beginning of the examination session in order to provide students with the most suitable arrangements for each specific type of exam.