The course, compulsory for the Master degree in Data science and Engineering, is offered on the 1st semester of the 1st year. The course introduces database management techniques for data warehouses (database systems specialized in strategic decision support), typically characterized by the need of managing very large databases. It addresses the fundamental issues in the technology of relational and NoSQL database management systems for very large data collections. The ability to visually present information correctly and effectively is a fundamental aspect not only in the engineering and scientific fields, but it represents an essential skill in communication in general. The course covers also the fundamental theoretical and methodological techniques for the effective management of quantitative (mainly numerical) information, which are used to effectively visualize and inspect input data and compute the KPIs by means of data warehouses. Laboratory sessions allow experimental activities, both on technological characteristics and data analysis and visualization, on widespread commercial and open-source products.
The course is offered for the Master degree in Data science and Engineering, 1st semester of the 1st year.
The course covers three core topics in Data Management:
(i) it introduces database management techniques for data warehouses: database systems specialized in strategic decision support, typically characterized by the need of managing very large historical data sets and efficiently extracting KPIs (Key Performance Indicators);
(ii) it addresses the fundamental issues in the technology of relational and NoSQL database management systems for very large data collections, from NoSQL storage solutions to non-relational database design;
(iii) it provides the ability to visually present information correctly and effectively, as a fundamental aspect not only in the engineering and scientific fields, but as an essential skill in scientific and professional communication.
The course provides the fundamental theoretical and methodological techniques for the effective management of quantitative (mainly numerical) information, which are used to effectively visualize and inspect input data and compute the KPIs by means of data warehouses.
Laboratory sessions allow experimental activities, both on technological characteristics and data analysis and visualization, on widespread commercial and open-source products (e.g., Oracle, MongoDB, Tableau).
- Knowledge of data warehouse architecture and of the methodology for conceptual, logical, and physical design of a data warehouse.
- Ability to design a data warehouse.
- Knowledge of the SQL statements for OLAP queries in a data warehouse.
- Ability to write OLAP queries by means of the SQL language.
- Knowledge of the main technological characteristics of NoSQL databases.
- Ability to design the conceptual model and define the physical data structures for NoSQL databases.
- Ability to design dashboards and KPIs
- Knowledge of the basic principles of cognitive and perceptive aspects related to visualization, and knowledge of the main visualization techniques.
- Ability to design and develop simple systems for visualizing quantitative information.
- Knowledge of data warehouse architecture and of the methodology for conceptual, logical, and physical design of a data warehouse.
- Ability to design a data warehouse.
- Knowledge of the SQL statements for OLAP queries in a data warehouse.
- Ability to write OLAP queries by means of the SQL language.
- Knowledge of the main technological characteristics of NoSQL databases.
- Ability to design the conceptual model and define the physical data structures for NoSQL databases.
- Knowledge for querying NoSQL databases.
- Ability to design dashboards and KPIs
- Knowledge of the basic principles of cognitive and perceptive aspects related to visualization, and knowledge of the main visualization techniques.
- Ability to design and develop simple systems for visualizing quantitative information.
• Knowledge of the relational model and SQL language and basic programming skills.
- Knowledge of the relational model, SQL language and basic programming skills.
- Basic knowledge of the JSON data format is preferred even though not strictly required.
• Data cleaning and data integration (0.6 cr.)
• Data warehouses: architecture, methodology for conceptual, logical, and physical design, SQL statements for OLAP queries (3 cr.)
• NoSQL databases: Conceptual modeling, technological characteristics, and query languages (2.4 cr.)
• Cognitive aspects of visualization and visual integrity principles (1 cr.)
• Data visualization tools (1 cr.)
• Data warehouses: architecture, methodology for conceptual, logical, and physical design, SQL statements for OLAP queries (3 credits)
• NoSQL databases: conceptual modeling, technological characteristics, data management issues in distributed (non-relational) databases and query languages (3 credits)
• Cognitive aspects of visualization and visual integrity principles (1 credit)
• Data visualization tools (1 credit)
The course includes practices on the lecture topics, and in particular conceptual, logical, and physical data warehouse design, extended SQL language, and NoSQL database design and query (6 cr.), and data visualization (2 cr.). Students will prepare individual written reports on exercises proposed during the course. Reports will contribute to the final exam grade. The course includes laboratory sessions on data warehouse design, extended SQL language, NoSQL database design and query, and data visualization. Laboratory sessions allow experimental activities on the most widespread commercial and open-source products.
The course includes practices on the lecture topics, and in particular conceptual, logical, and physical data warehouse design, extended SQL language, and NoSQL database design and query (6 cr.), and data visualization (2 cr.).
The course includes laboratory sessions for hands-on experience on data warehouse design, extended SQL language, NoSQL database design and query, and different data visualization aspects.
Laboratory practices allow experimental activities on the most widespread commercial and open-source products, such as Oracle, MongoDB, and Tableau, and they will be interleaved with lectures.
Copies of the slides used during the lectures, examples of written exams and exercises, and manuals for the activities in the laboratory will be made available. All teaching material is downloadable from the course website or the teaching Portal.
Reference books:
• Matteo Golfarelli, Stefano Rizzi. Data Warehouse Design: Modern Principles and Methodologies, McGraw-Hill Education, 2009
• Kristina Chodorow, Shannon Bradshaw. MongoDB: The Definitive Guide (Powerful and Scalable Data Storage), 3 ed. O'Reilly Media, 2018.
• Stephen Few. Show Me the Numbers: Designing Tables and Graphs to Enlighten, 2nd Edition. Analytics Press, 2012
• Edward R. Tufte. The Visual Display of Quantitative Information. Graphics Press, 1983.
Copies of the slides used during the lectures, examples of written exams and exercises, and manuals for the activities in the laboratory will be made available. All teaching material is downloadable from the course website or the teaching Portal.
Reference books:
- Matteo Golfarelli, Stefano Rizzi. Data Warehouse Design: Modern Principles and Methodologies, McGraw-Hill Education, 2009
- Dan Sullivan, NoSQL for Mere Mortals, Addison-Wesley Professional, 2015
- Kristina Chodorow, Shannon Bradshaw. MongoDB: The Definitive Guide (Powerful and Scalable Data Storage), 3 ed. O'Reilly Media, 2018
- Stephen Few. Show Me the Numbers: Designing Tables and Graphs to Enlighten, 2nd Edition. Analytics Press, 2012
- Edward R. Tufte. The Visual Display of Quantitative Information. Graphics Press, 1983.
Slides; Esercizi; Esercizi risolti; Esercitazioni di laboratorio; Esercitazioni di laboratorio risolte; Video lezioni dell’anno corrente;
Lecture slides; Exercises; Exercise with solutions ; Lab exercises; Lab exercises with solutions; Video lectures (current year);
Modalità di esame: Prova scritta in aula tramite PC con l'utilizzo della piattaforma di ateneo;
Exam: Computer-based written test in class using POLITO platform;
...
Exam: individual practice assignments; written test.
The exam includes a written part and the evaluation of the reports on the individual practices assigned during the course. The written part lasts 2 hours. The final score is defined by considering the evaluation of the written part and of the individual practices. The individual practices are considered only if the grade of the written part is 18 or above.
The written part includes
- theory questions on the main course topics (conceptual, logical, and physical data warehouse design, extended SQL language, technological characteristics of NoSQL databases, data visualization techniques)
- 1 exercise on data warehousing, including the design of a data warehouse and SQL queries for data access
- 1 exercise on NoSQL database design and queries for data access
- 1 exercise on visualization analysis and design
Students can use textbooks or notes during the exam.
Exercises are evaluated according to the correctness of the proposed solution and to the appropriateness of the adopted resolution methodologies.
Gli studenti e le studentesse con disabilità o con Disturbi Specifici di Apprendimento (DSA), oltre alla segnalazione tramite procedura informatizzata, sono invitati a comunicare anche direttamente al/la docente titolare dell'insegnamento, con un preavviso non inferiore ad una settimana dall'avvio della sessione d'esame, gli strumenti compensativi concordati con l'Unità Special Needs, al fine di permettere al/la docente la declinazione più idonea in riferimento alla specifica tipologia di esame.
Exam: Computer-based written test in class using POLITO platform;
The exam lasts 90 minutes and consists of theoretical questions and written exercises, as described in the following:
- [max 6 points] 3-6 multiple-choice questions on theoretical topics of the course, such as conceptual, logical, and physical data warehouse design, extended SQL language, technological characteristics of NoSQL databases and their usage, data management issues in distributed (non-relational) databases, data visualization techniques
- [max 12 points] exercises on data warehousing, including 2-4 open and/or multiple-choice questions on data warehouse design, and 2-3 queries for data access in extended SQL (open questions with answers to be provided in a text box)
- [max 9 points] 1-2 exercises on NoSQL database design and 1-2 queries for data access (open questions with answers to be provided in a text box)
- [max 5 points] 1 exercise on visualization analysis and design with open questions (answers to be provided in a text box)
Students are not allowed to use textbooks, notes, or additional electronic devices during the exam, besides their own notebook with Lockdown/Respondus.
Exercises are evaluated according to the correctness of the proposed solution and to the appropriateness of the adopted resolution methodologies.
Specific points for each exercise are indicated in the exam text.
Multiple-choice questions have a penalty for wrong answers, whereas no-penalty no-points in case no answer is provided.
Learning objectives assessment.
The exam will assess:
- the knowledge of data warehouse architectures and of their design methodologies (conceptual, logical, and physical)
- the ability to design a data warehouse in a provided use case
- the ability to write extended SQL queries to extract data of interest from a data warehouse
- the knowledge of the main technological characteristics of NoSQL databases
- the ability to design NoSQL databases and to query NoSQL databases
- the ability to design dashboards and KPIs
- the knowledge of the basic principles of cognitive and perceptive aspects related to visualization, and of the main visualization techniques
In addition to the message sent by the online system, students with disabilities or Specific Learning Disorders (SLD) are invited to directly inform the professor in charge of the course about the special arrangements for the exam that have been agreed with the Special Needs Unit. The professor has to be informed at least one week before the beginning of the examination session in order to provide students with the most suitable arrangements for each specific type of exam.