PORTALE DELLA DIDATTICA

Data Science and Database Technology

01SQJOV

Academic Year 2023/24

Course Language

English

Degree programme(s)

Master of science-level of the Bologna process in Ingegneria Informatica (Computer Engineering) - Torino

Course structure
Teaching Hours
Lectures: 46
Classroom exercises: 25
Laboratory exercises: 9
Tutoring: 60
Lecturers
Teacher: Chiusano Silvia Anna
Status: Full Professor (Professore Ordinario)
SSD: IINF-05/A
Lecture hours: 31; exercise hours: 0; laboratory hours: 0; tutoring hours: 0
Years teaching: 7

Context
SSD: ING-INF/05; CFU: 3; Activities: F - Other activities (art. 10); Area context: Computer and telematic skills
SSD: ING-INF/05; CFU: 5; Activities: B - Characterizing activities; Area context: Computer engineering
2023/24
The course is taught in English. It is compulsory for the Master's degree in Computer Engineering and is offered in the first semester of the first year. The course addresses the fundamental issues in the technology of database management systems and introduces database management techniques for data warehouses (database systems specialized in strategic decision support), which typically need to manage very large databases. Both traditional OLAP (On-Line Analytical Processing) analysis techniques and complex data mining techniques will be addressed. Laboratory sessions allow experimental activities, on both technological characteristics and data analysis, using the most widespread commercial and open-source products.
- Knowledge of the main technological characteristics of a database management system: concurrent data access management, reliability, physical level structures, data access optimization.
- Ability to design the physical data structures for a relational database.
- Knowledge of distributed database system architecture and replication management.
- Knowledge of active database systems and SQL statements for trigger definition.
- Ability to write triggers in the SQL language.
- Knowledge of data warehouse architecture and of the methodology for conceptual, logical, and physical design of a data warehouse.
- Ability to design a data warehouse.
- Knowledge of the SQL statements for OLAP queries in a data warehouse.
- Ability to write OLAP queries in the SQL language.
- Knowledge of the major data mining algorithms for classification, clustering, and association rule mining.
- Knowledge of the main technological characteristics of a database management system: concurrent data access management, reliability, physical level structures, data access optimization.
- Ability to design the physical data structures for a relational database.
- Knowledge of distributed database system architecture and replication management.
- Knowledge of non-relational databases (NoSQL).
- Ability to write queries on non-relational databases (NoSQL).
- Knowledge of data warehouse architecture and of the methodology for conceptual, logical, and physical design of a data warehouse.
- Ability to design a data warehouse.
- Knowledge of the SQL statements for OLAP queries in a data warehouse.
- Ability to write OLAP queries in the SQL language.
- Knowledge of the main machine learning algorithms for classification, clustering, and association rule mining.
Knowledge of the relational model and SQL language and basic programming skills.
Knowledge of the relational model. Ability to design queries using relational algebra. Ability to design complex instructions using the SQL language. Basic programming skills.
- Technological characteristics of a database management system: concurrent data access management, reliability, physical level structures, data access optimization (1.8 cr.)
- Active database systems and SQL statements for trigger definition (0.4 cr.)
- Distributed database system architecture and replication management (0.4 cr.)
- Data warehouses: architecture, methodology for conceptual, logical, and physical design, SQL statements for OLAP queries (1.4 cr.)
- Data mining algorithms: classification, clustering, and association rule mining (1.6 cr.)
- Technological characteristics of a database management system: concurrent data access management, reliability, physical level structures, data access optimization (1.6 cfu)
- NoSQL databases (MongoDB, Elastic): data model and query language (0.6 cfu)
- Distributed database system architecture and replication management (0.4 cfu)
- Data warehouses: architecture, methodology for conceptual, logical, and physical design, data preparation, SQL statements for OLAP queries (1.4 cfu)
- Data science process (0.4 cfu)
- Machine learning algorithms: classification, clustering, and association rule mining (1.2 cfu)
- Analysis of case studies through classroom exercises and laboratories on the topics covered during lessons (2.4 cfu)
The course includes exercise sessions on the lecture topics, in particular the SQL language, physical database design, and conceptual, logical, and physical data warehouse design (1.8 cr.). Students will prepare an individual written report on the exercises proposed during the course. The report will contribute to the final exam grade. The course also includes laboratory sessions on the SQL language (including database physical design) and data warehouse design (1.2 cr.). Laboratory sessions allow experimental activities on the most widespread commercial and open-source products.
The course includes lessons, classroom exercises, and laboratories on the covered topics. Classroom exercises focus on the lecture topics, in particular the physical design of a relational database; the design and querying of NoSQL databases; and conceptual, logical, and physical data warehouse design, with the related SQL-based processes for data querying and preparation (1.2 cfu). Students will carry out individual exercises during the course. The course includes laboratory sessions on the SQL language (including database physical design), data warehouse design, the design of NoSQL databases, and the data analysis process (1.2 cfu). Laboratory sessions allow experimental activities on the most widespread commercial and open-source products. Four individual homework assignments will be proposed, mainly aimed at encouraging critical thinking and problem-solving skills on the course topics. Written reports must be submitted for the homework. Carrying out the homework is optional. If delivered, the homework will contribute to the final grade.
Reference books:
- Atzeni, Ceri, Paraboschi, Torlone, 'Database systems', 1 ed., McGraw Hill, 1999.
- Golfarelli, Rizzi, 'Data warehouse: teoria e pratica della progettazione', 2 ed., McGraw Hill, 2006.
- Tan, Steinbach, Kumar, 'An introduction to data mining', Addison Wesley, 2005.
Copies of the slides used during the lectures, examples of written exams and exercises, and manuals for the activities in the laboratory will be made available. All teaching material is downloadable from the course website or the Portal.
Reference books:
- Atzeni, Ceri, Fraternali, Paraboschi, Torlone, 'Basi di dati', 5 ed., McGraw Hill, 2018.
- Golfarelli, Rizzi, 'Data Warehouse Design: modern principles and methodologies', McGraw Hill, 2021.
- Tan, Steinbach, Karpatne, Kumar, 'Introduction to data mining', 2 ed., Pearson, 2019.
- Ramakrishnan, Gehrke, 'Database Management Systems', 3 ed., McGraw Hill, 2003.
- Sullivan, 'NoSQL for Mere Mortals', Addison-Wesley Professional, 2015.
- Chodorow, Bradshaw, 'MongoDB: The Definitive Guide (Powerful and Scalable Data Storage)', 3 ed., O'Reilly Media, 2018.
- Gormley, Tong, 'Elasticsearch: The Definitive Guide', O'Reilly, 2015.
Copies of the slides used during the lectures, examples of written exams and exercises, and manuals for the activities in the laboratory will be made available. All teaching material is downloadable from the course website or the Portal.
Lecture slides; Textbook; Exercises; Exercises with solutions; Lab exercises; Lab exercises with solutions; Video lectures (current year); Video lectures (previous years); Self-assessment tools
You can take this exam before attending the course
Exam: Individual project; Computer-based written test in class using POLITO platform;
... The exam includes a written part, the evaluation of the reports on the individual practices assigned during the course, and an oral part. The individual practices and the oral part are optional. The written part lasts 2 hours. The final score is based on the evaluation of the written part and, optionally, of the individual practices and the oral part. The individual practices are considered only if the grade of the written part is 18 or above. Without the oral part, the maximum final grade given by the written part and the evaluation of the reports on the individual practices is 26. Otherwise, the final grade is the (approximated) average computed on the grade of the written part, the evaluation of the report on the individual practices, and the grade of the oral part.
The written part includes:
- 2 multiple choice theory questions on the main course topics (technological characteristics of a database management system, SQL language, physical database design, conceptual, logical, and physical data warehouse design, data mining algorithms) (max 2 points)
- 1 exercise on physical design (max 7 points)
- 1 exercise on trigger design (max 8 points)
- 1 exercise on data warehousing, including the design of a data warehouse and SQL queries for data access (max 13 points)
Students can use textbooks or notes during the exam. Exercises are evaluated according to the correctness of the proposed solution and to the appropriateness of the adopted resolution methodologies. The oral part includes questions on the main topics of the lectures (max 30 points). Reports on the individual practices assigned during the course are on the main topics of the lectures (max 2 points).
Exam: Individual project; Computer-based written test in class using POLITO platform;
The exam includes a written part and the evaluation of the reports on the individual homework assigned during the course. The homework is optional. The teacher may request an integrative test to confirm the obtained evaluation.
Learning objectives assessment
By means of design exercises, the written part will assess:
- the ability to design the physical structure of a database
- the ability to design a data warehouse
- the ability to define operations for data preparation in the SQL language
- the ability to write OLAP queries in the SQL language
By means of theory questions and exercises, the written part will assess:
- the knowledge of the main technological characteristics of a database management system (concurrent data access, reliability)
- the ability to write queries for non-relational databases
- the knowledge of the main technological characteristics of distributed database systems
- the knowledge of the main machine learning algorithms for classification, clustering, and association rule mining
Exam structure and grading criteria
The written part lasts 95 minutes. The final score is based on the evaluation of the written part and, optionally, of the reports on the homework. If the final score is strictly greater than 31, the registered score will be 30 with honors. The reports on the homework are considered only if the grade of the written part is 18 or above. The written part includes box-to-fill and multiple choice questions. For multiple choice questions, wrong answers are penalized. Missing answers score zero. Textbooks, notes, and electronic devices of any kind are not allowed during the written part.
Structure and topics of the written part
- 6-8 questions on the main topics of the course (technological characteristics of a database management system, distributed database systems, data preparation, data analytics algorithms) (max 8 points)
- 1-3 exercises on physical design (max 5 points)
- 1-3 exercises on data warehouse design (max 3 points)
- 1 exercise on data preparation for a data warehouse (max 5 points)
- 3 exercises on the design of SQL instructions for data access in a data warehouse (max 11 points)
The score of each question will be specified in the exam text. Exercises are evaluated according to the correctness of the proposed solution and to the appropriateness of the adopted resolution methodologies. Reports on the optional homework are assigned and must be delivered at predefined deadlines during the course. They deal with the main topics of the lectures (max 2 points).
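The grading rules above can be sketched as a short Python function. This is only an illustrative reading of the text, not the official procedure: it assumes that the homework points are added to the written grade, that fractional totals are rounded half-up, and that a total below 18 counts as not passed (the usual Italian threshold, which the text does not state). The function name `registered_grade` and the constants are hypothetical.

```python
import math
from typing import Optional

# Maximum points per block of the written part, as listed in the text.
WRITTEN_MAX_POINTS = {
    "course topic questions": 8,
    "physical design": 5,
    "data warehouse design": 3,
    "data preparation": 5,
    "SQL for data access": 11,
}

PASS_THRESHOLD = 18      # homework counts only if the written grade is 18 or above
MAX_HOMEWORK_POINTS = 2  # maximum contribution of the optional homework reports
HONORS_THRESHOLD = 31    # strictly above this, the registered score is 30 with honors
MAX_REGISTERED = 30


def registered_grade(written: float, homework: Optional[float] = None) -> str:
    """Return the registered grade under the assumptions stated in the lead-in."""
    total = written
    # Assumption: homework points are simply added to the written grade.
    if homework is not None and written >= PASS_THRESHOLD:
        total += min(homework, MAX_HOMEWORK_POINTS)
    if total > HONORS_THRESHOLD:
        return "30 with honors"
    if total < PASS_THRESHOLD:
        # Assumption: below 18 the exam is not passed.
        return "not passed"
    # Assumption: round half-up, then cap at 30.
    return str(min(math.floor(total + 0.5), MAX_REGISTERED))


if __name__ == "__main__":
    print(sum(WRITTEN_MAX_POINTS.values()))  # total points available in the written part
    print(registered_grade(27, 2))
    print(registered_grade(30, 2))
```

Under this reading, the written part alone offers up to 32 points, so honors can be reached even without delivering the homework.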
In addition to the message sent by the online system, students with disabilities or Specific Learning Disorders (SLD) are invited to directly inform the professor in charge of the course about the special arrangements for the exam that have been agreed with the Special Needs Unit. The professor has to be informed at least one week before the beginning of the examination session in order to provide students with the most suitable arrangements for each specific type of exam.