Portale della Didattica

Data science lab: process and methods

01TWZSM

Academic year 2020/21

Course Language

English

Degree programme(s)

Master of science-level of the Bologna process in Data Science And Engineering - Torino

Course structure

Teaching	Hours
Lectures	35
Classroom exercises	15
Laboratory exercises	30
Tutoring	57

Lecturers

Teacher	Status	SSD	h.Les	h.Ex	h.Lab	h.Tut	Years teaching
Baralis Elena Maria	Full Professor	IINF-05/A	20.5	0	0	4	5

Co-lecturers

Teacher	Status	SSD	h.Les	h.Ex	h.Lab	h.Tut
Cerquitelli Tania	Full Professor	IINF-05/A	7.5	0	0	0
Giobergia Flavio	Researcher (Law 240/10)	IINF-05/A	3	0	21	8

Context

SSD	CFU	Activities	Area context
ING-INF/05	8	B - Core (characterizing) activities	Computer engineering

First academic year of validity

2020/21

Course description

The course, compulsory for the Master's degree in Data Science and Engineering, is offered in the first semester of the first year. It focuses on the design and implementation of data-driven processes, which are commonly used to extract knowledge from data and support decision-making. The course first introduces the data science process and its main phases, then provides the theoretical and practical knowledge of the data mining and basic machine learning algorithms commonly used to analyze large and heterogeneous data. It also introduces the Python language and state-of-the-art data mining and machine learning libraries. Laboratory sessions, based on a learning-by-doing approach, cover all the phases of a standard data science process (e.g., data preparation and cleaning, data exploration and characterization, data mining algorithm selection and tuning, result evaluation) using the most widespread commercial and open-source products.
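As an illustration of the phases listed above, here is a minimal sketch in Python with scikit-learn. The dataset (Iris), the k-nearest-neighbours classifier, and the parameter grid are assumptions chosen for brevity; they are not taken from the course material.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Data collection: a small built-in dataset stands in for real data (assumption).
X, y = load_iris(return_X_y=True)

# Hold out a test set so that the final evaluation uses unseen samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Data preparation and algorithm selection, chained in a single pipeline.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Algorithm tuning: 5-fold cross-validated search over the number of neighbours.
search = GridSearchCV(
    pipeline,
    param_grid={"knn__n_neighbors": [1, 3, 5, 7]},
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Result evaluation on the held-out test set.
test_accuracy = accuracy_score(y_test, search.predict(X_test))
print("best parameters:", search.best_params_)
print("test accuracy:", round(test_accuracy, 3))
```

Keeping the scaler inside the pipeline means it is refitted on each training fold during the search, so no information from the validation folds leaks into preprocessing.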

Expected Learning Outcomes

- Knowledge of the main phases characterizing a data science process.
- Knowledge of the major data mining algorithms for classification, regression, clustering, and association rule mining.
- Knowledge of the Python language.
- Knowledge of the major data mining and machine learning libraries.
- Ability to design, implement and evaluate a data science process.
- Ability to design, implement and evaluate analytics scripts in the Python language.
- Ability to use and tune data mining and machine learning algorithms.

Prerequisites

- Basic programming skills.

Course topics

- Data science process: main phases (0.4 cr.)
- Data collection, cleaning, transformation and enrichment, and feature engineering (0.5 cr.)
- Data mining algorithms: classification, regression, clustering, and association rule mining (1.5 cr.)
- Introduction to Python and data mining and machine learning libraries (e.g., scikit-learn) (1.5 cr.)
- Case study analysis (0.6 cr.)
- Data science process design in the lab (3.5 cr.)

Sustainable development goals

Provide quality, equitable and inclusive education, and learning opportunities for all

Course structure

The course includes lectures and practice sessions on the lecture topics, in particular data science process design, data preprocessing and data mining algorithms (4.5 cr.). Students prepare an individual written report on an individual project assigned during the course. The course also includes laboratory sessions on data science process design and data analytics (3.5 cr.), with experimental activities on the most widespread commercial and open-source products.
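The credit split above can be cross-checked against the per-topic credits in the course topics list. The short sketch below does so with exact decimal arithmetic. Grouping the first five topics as the lecture and practice part is an inference from the totals, not something the page states explicitly.

```python
from decimal import Decimal

# Credits per topic, copied from the course topics list.
# Assumption: these five topics make up the lecture/practice part.
lecture_topics = {
    "Data science process: main phases": Decimal("0.4"),
    "Data collection, cleaning, transformation, enrichment, feature engineering": Decimal("0.5"),
    "Data mining algorithms": Decimal("1.5"),
    "Introduction to Python and libraries": Decimal("1.5"),
    "Case study analysis": Decimal("0.6"),
}
lab_credits = Decimal("3.5")  # "Data science process design in the lab"

# Decimal avoids binary floating-point rounding when summing tenths.
lecture_credits = sum(lecture_topics.values(), Decimal("0"))
total_credits = lecture_credits + lab_credits

print(lecture_credits, lab_credits, total_credits)  # 4.5 3.5 8.0
```

The total of 8 matches the CFU value reported in the Context table.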

Reading materials

Copies of the lecture slides, sample written exams and exercises, and manuals for the laboratory activities will be made available. All teaching material can be downloaded from the course website or the teaching portal.

Book (only a few chapters needed):
- Tan, Steinbach, Kumar, 'An introduction to data mining', 2 ed., Addison Wesley, 2005.

Assessment and grading criteria for ONLINE exam

Expected learning outcomes

Knowledge and understanding
- Knowledge of the techniques for the digital representation of audio signals; ability to analyze and compute the characteristic parameters of the various types of audio signals, and to evaluate the advantages of each technique in different contexts.
- Knowledge of the fundamental notions of psychoacoustics needed to understand sound processing and analysis techniques in the time and frequency domains.
- Knowledge of the algorithms and standards for coding and compressing speech and audio signals (ITU, MPEG, Dolby standards); ability to use, modify and appropriately configure the different standards according to the application scenario.
- Knowledge of the technologies for the synthesis and modelling of audio signals, including the description of synthetic audio and musical scores.

Ability to apply knowledge and understanding
- Ability to apply processing filters to digital audio signals using numerical computing programs or programming languages.
- Ability to apply audio processing techniques to create digital audio effects.
- Understanding of the figures of merit in the choice and design of an audio and speech coder; ability to address the main issues related to the use of different formats for multimedia content.

The exam includes an individual project assigned during the course and a written part. The final score takes into account the evaluation of both the individual project and the written part.

The individual project consists of designing and implementing a data science process to solve a data analytics task. The project is evaluated on the performance and accuracy of the proposed solution, in terms of standard quality measures (e.g., prediction accuracy), and on its completeness (i.e., in-depth analysis of each phase of the designed process and motivation for the selected techniques and algorithms).

The written part covers the theoretical part of the course and lasts 1 hour. It includes multiple-choice questions based on solving exercises related to the theory (data cleaning, data preprocessing, data mining algorithms, data science process). Students can use textbooks or notes during the written part.

The maximum grade for the individual project is 20. The maximum grade for the written part is 12. The final grade is the sum of the two parts. The exam is passed if the grade of the individual project is at least 12, the grade of the written part is at least 7, and the overall grade is at least 18.

Exam: Computer-based written test using the PoliTo platform; Individual project;

The exam includes an individual project and a written part. The final score takes into account the evaluation of both the individual project and the written part.

Learning objectives assessment

The written part will assess:
- knowledge of data preparation techniques and of the major data mining algorithms for classification, regression, clustering, and association rule mining;
- working knowledge of the Python language and of the major data mining and machine learning libraries.

The individual project will assess:
- the ability to design, implement and evaluate a complete data science process, including the evaluation and tuning of machine learning algorithms;
- the ability to design, implement and evaluate analytics scripts in the Python language.

Exam structure and grading criteria

The individual project consists of designing and implementing a data science process to solve a data analytics task. The project is assigned before the start of the exam session and its score is valid for the entire exam session. The project is evaluated on the performance and accuracy of the proposed solution, in terms of standard quality measures (e.g., prediction accuracy), and on its completeness (i.e., in-depth analysis of each phase of the designed process and motivation for the selected techniques and algorithms).

The written part covers the theoretical part of the course. It includes multiple-choice and fill-in-the-box questions based on solving exercises related to the theory (data cleaning, data preprocessing, data mining algorithms, data science process). For multiple-choice questions, wrong answers are penalized. The score of each question is specified in the exam text. The written exam lasts 80 minutes. Textbooks, notes, and electronic devices of any kind are not allowed.

The maximum grade for the individual project is 16. The maximum grade for the written part is 16. The final grade is the sum of the two parts. The exam is passed if the grade of the individual project is at least 10, the grade of the written part is at least 8, and the overall grade is at least 18.
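The pass rule stated above can be written as a small function. This is only an illustrative sketch: the function name `exam_outcome` is hypothetical, and the code implements nothing beyond the three thresholds given in the text.

```python
def exam_outcome(project, written):
    """Return (final grade, passed) using the thresholds stated in the text."""
    final = project + written  # the final grade is the sum of the two parts
    passed = project >= 10 and written >= 8 and final >= 18
    return final, passed


print(exam_outcome(12, 9))   # (21, True)
print(exam_outcome(10, 8))   # (18, True): both minimums met exactly
print(exam_outcome(16, 7))   # (23, False): written part below 8
print(exam_outcome(9, 16))   # (25, False): project below 10
```

With these particular thresholds, meeting the two per-part minimums already guarantees an overall grade of at least 18, since 10 + 8 = 18.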

Assessment and grading criteria for BLENDED exam (online and onsite)

Exam: Computer lab-based test; Computer-based written test using the PoliTo platform; Individual project;

The exam includes an individual project and a written part. The final score takes into account the evaluation of both the individual project and the written part.

Learning objectives assessment

The written part will assess:
- knowledge of data preparation techniques and of the major data mining algorithms for classification, regression, clustering, and association rule mining;
- working knowledge of the Python language and of the major data mining and machine learning libraries.

The individual project will assess:
- the ability to design, implement and evaluate a complete data science process, including the evaluation and tuning of machine learning algorithms;
- the ability to design, implement and evaluate analytics scripts in the Python language.

Exam structure and grading criteria

The individual project consists of designing and implementing a data science process to solve a data analytics task. The project is assigned before the start of the exam session and its score is valid for the entire exam session. The project is evaluated on the performance and accuracy of the proposed solution, in terms of standard quality measures (e.g., prediction accuracy), and on its completeness (i.e., in-depth analysis of each phase of the designed process and motivation for the selected techniques and algorithms).

The written part covers the theoretical part of the course. It includes multiple-choice and fill-in-the-box questions based on solving exercises related to the theory (data cleaning, data preprocessing, data mining algorithms, data science process). For multiple-choice questions, wrong answers are penalized. The score of each question is specified in the exam text. The written exam lasts 80 minutes. Textbooks, notes, and electronic devices of any kind are not allowed.

The maximum grade for the individual project is 16. The maximum grade for the written part is 16. The final grade is the sum of the two parts. The exam is passed if the grade of the individual project is at least 10, the grade of the written part is at least 8, and the overall grade is at least 18.
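The project evaluation criteria above refer to standard quality measures such as prediction accuracy. As a hedged illustration of what such measures compute, the sketch below derives accuracy, precision, recall and F1 from two label lists in plain Python. The toy labels and function names are invented for the example, and the page does not say which measures are actually used for grading.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for one class treated as the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels, invented for the example.
y_true = ["spam", "ham", "spam", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "ham", "ham", "spam", "spam"]

acc = accuracy(y_true, y_pred)                      # 4 correct out of 6
prec, rec, f1 = precision_recall_f1(y_true, y_pred, positive="spam")
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))
```

Accuracy alone can be misleading on imbalanced classes, which is one reason per-class measures such as precision and recall are often reported alongside it.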