PORTALE DELLA DIDATTICA

Data management and visualization

01TXASM

Academic year 2020/21

Course Language

English

Degree programme(s)

Master of science-level of the Bologna process in Data Science And Engineering - Torino

Course structure
Teaching                Hours
Lectures                63.5
Laboratory exercises    16.5
Tutoring                40
Lecturers
Teacher            Status               SSD        Lecture h.  Exercise h.  Lab h.  Tutoring h.  Years teaching
Apiletti Daniele   Associate Professor  IINF-05/A  41          0            0       0            5

Context
SSD         CFU  Activities          Area context
ING-INF/05  8    B - Characterizing  Computer engineering
The course, compulsory for the Master's degree in Data Science and Engineering, is offered in the 1st semester of the 1st year. It introduces database management techniques for data warehouses (database systems specialized in strategic decision support), which typically need to manage very large databases. It addresses the fundamental issues in the technology of relational and NoSQL database management systems for very large data collections. The ability to present information visually, correctly and effectively, is fundamental not only in engineering and science but is also an essential communication skill in general. The course also covers the fundamental theoretical and methodological techniques for managing quantitative (mainly numerical) information effectively; these techniques are used to visualize and inspect input data and to compute KPIs by means of data warehouses. Laboratory sessions provide hands-on work with widely used commercial and open-source products, covering both their technological characteristics and data analysis and visualization.
The course is offered in the 1st semester of the 1st year of the Master's degree in Data Science and Engineering. It covers three core topics in data management: (i) database management techniques for data warehouses, that is, database systems specialized in strategic decision support, which typically need to manage very large historical data sets and to extract KPIs (Key Performance Indicators) efficiently; (ii) the fundamental issues in the technology of relational and NoSQL database management systems for very large data collections; (iii) the ability to present information visually, correctly and effectively, which is fundamental not only in engineering and science but is also an essential skill in scientific and professional communication. The course provides the fundamental theoretical and methodological techniques for managing quantitative (mainly numerical) information effectively; these techniques are used to visualize and inspect input data and to compute KPIs by means of data warehouses. Laboratory sessions provide hands-on work with widely used commercial and open-source products (e.g., Oracle, MongoDB, Tableau), covering both their technological characteristics and data analysis and visualization.
- Knowledge of data warehouse architecture and of the methodology for conceptual, logical, and physical design of a data warehouse.
- Ability to design a data warehouse.
- Knowledge of the SQL statements for OLAP queries in a data warehouse.
- Ability to write OLAP queries by means of the SQL language.
- Knowledge of the main technological characteristics of NoSQL databases.
- Ability to design the conceptual model and define the physical data structures for NoSQL databases.
- Ability to design dashboards and KPIs.
- Knowledge of the basic principles of cognitive and perceptive aspects related to visualization, and knowledge of the main visualization techniques.
- Ability to design and develop simple systems for visualizing quantitative information.
- Knowledge of data warehouse architecture and of the methodology for conceptual, logical, and physical design of a data warehouse.
- Ability to design a data warehouse.
- Knowledge of the SQL statements for OLAP queries in a data warehouse.
- Ability to write OLAP queries by means of the SQL language.
- Knowledge of the main technological characteristics of NoSQL databases.
- Ability to design the conceptual model and define the physical data structures for NoSQL databases.
- Knowledge of how to query NoSQL databases.
- Ability to design dashboards and KPIs.
- Knowledge of the basic principles of cognitive and perceptive aspects related to visualization, and knowledge of the main visualization techniques.
- Ability to design and develop simple systems for visualizing quantitative information.
• Knowledge of the relational model and SQL language and basic programming skills.
- Knowledge of the relational model and SQL language and basic programming skills.
- Basic knowledge of the JSON data format.
• Data cleaning and data integration (0.6 cr.)
• Data warehouses: architecture, methodology for conceptual, logical, and physical design, SQL statements for OLAP queries (3 cr.)
• NoSQL databases: conceptual modeling, technological characteristics, and query languages (2.4 cr.)
• Cognitive aspects of visualization and visual integrity principles (1 cr.)
• Data visualization tools (1 cr.)
• Data cleaning and data integration (0.6 cr.)
• Data warehouses: architecture, methodology for conceptual, logical, and physical design, SQL statements for OLAP queries (3 cr.)
• NoSQL databases: conceptual modeling, technological characteristics, data management issues in distributed (non-relational) databases, and query languages (2.4 cr.)
• Cognitive aspects of visualization and visual integrity principles (1 cr.)
• Data visualization tools (1 cr.)
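To give a rough idea of what the "SQL statements for OLAP queries" topic refers to, the sketch below runs a small analytic query through Python's standard `sqlite3` module. It is an illustration only, not course material: the `sales` table, its columns, the function name, and the sample rows are all invented here, and SQLite (version 3.25 or later, needed for window functions) merely stands in for the products used in the laboratories.

```python
import sqlite3

# Hypothetical sample data: (month, category, amount). Not from the course.
SAMPLE_SALES = [
    ("2020-10", "books", 100.0),
    ("2020-10", "games", 50.0),
    ("2020-11", "books", 70.0),
    ("2020-12", "games", 30.0),
]


def monthly_revenue_with_running_total(rows):
    """Return (month, revenue, running_total) tuples ordered by month."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE sales (month TEXT, category TEXT, amount REAL)"
        )
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        # Step 1 (CTE): aggregate the fact rows per month.
        # Step 2: a window function accumulates revenue across months,
        # which plain GROUP BY cannot express on its own.
        query = """
            WITH monthly AS (
                SELECT month, SUM(amount) AS revenue
                FROM sales
                GROUP BY month
            )
            SELECT month,
                   revenue,
                   SUM(revenue) OVER (ORDER BY month) AS running_total
            FROM monthly
            ORDER BY month
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in monthly_revenue_with_running_total(SAMPLE_SALES):
        print(row)
```

The running total is the kind of cumulative indicator that analytic (window) functions make easy to compute; the actual SQL dialect and the extensions used in the course may differ from what SQLite supports.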
The course includes exercises on the lecture topics, in particular conceptual, logical, and physical data warehouse design, the extended SQL language, and NoSQL database design and querying (6 cr.), as well as data visualization (2 cr.). Students will prepare individual written reports on exercises proposed during the course; these reports contribute to the final exam grade. The course also includes laboratory sessions on data warehouse design, the extended SQL language, NoSQL database design and querying, and data visualization. Laboratory sessions provide hands-on work with the most widespread commercial and open-source products.
The course includes exercises on the lecture topics, in particular conceptual, logical, and physical data warehouse design, the extended SQL language, and NoSQL database design and querying (6 cr.), as well as data visualization (2 cr.). The course includes many laboratory sessions for hands-on experience with data warehouse design, the extended SQL language, NoSQL database design and querying, and various aspects of data visualization. Laboratory sessions use the most widespread commercial and open-source products and are interleaved with lectures.
Copies of the slides used during the lectures, examples of written exams and exercises, and manuals for the activities in the laboratory will be made available. All teaching material is downloadable from the course website or the teaching Portal.
Reference books:
• Matteo Golfarelli, Stefano Rizzi. Data Warehouse Design: Modern Principles and Methodologies. McGraw-Hill Education, 2009.
• Kristina Chodorow, Shannon Bradshaw. MongoDB: The Definitive Guide (Powerful and Scalable Data Storage), 3rd ed. O'Reilly Media, 2018.
• Stephen Few. Show Me the Numbers: Designing Tables and Graphs to Enlighten, 2nd ed. Analytics Press, 2012.
• Edward R. Tufte. The Visual Display of Quantitative Information. Graphics Press, 1983.
Copies of the slides used during the lectures, examples of written exams and exercises, and manuals for the activities in the laboratory will be made available. All teaching material is downloadable from the course website or the teaching Portal.
Reference books:
- Matteo Golfarelli, Stefano Rizzi. Data Warehouse Design: Modern Principles and Methodologies. McGraw-Hill Education, 2009.
- Dan Sullivan. NoSQL for Mere Mortals. Addison-Wesley Professional, 2015.
- Kristina Chodorow, Shannon Bradshaw. MongoDB: The Definitive Guide (Powerful and Scalable Data Storage), 3rd ed. O'Reilly Media, 2018.
- Stephen Few. Show Me the Numbers: Designing Tables and Graphs to Enlighten, 2nd ed. Analytics Press, 2012.
- Edward R. Tufte. The Visual Display of Quantitative Information. Graphics Press, 1983.
Exam: Computer-based written test using the PoliTo platform.
- Theory: closed-answer questions
- Exercises: open-answer or closed-answer questions
- No books allowed
The exam lasts 90 minutes and consists of theoretical questions and written exercises, structured as follows:
- [max 5 points] 3 multiple-choice questions on theoretical topics of the course, such as conceptual, logical, and physical data warehouse design, the extended SQL language, technological characteristics of NoSQL databases and their usage, data management issues in distributed (non-relational) databases, and data visualization techniques
- [max 12 points] exercises on data warehousing, including 2 open and/or multiple-choice questions on data warehouse design, and 2 queries for data access in extended SQL (open questions with answers to be provided in a text box)
- [max 9 points] 1 exercise on NoSQL database design and 1 query for data access (open questions with answers to be provided in a text box)
- [max 5 points] 1 exercise on visualization analysis and design with open questions (answers to be provided in a text box)
Students are not allowed to use textbooks, notes, or additional electronic devices (besides the PC with Respondus) during the exam. Exercises are evaluated on the correctness of the proposed solution and the appropriateness of the method used to reach it. The points for each exercise are indicated in the exam text. Multiple-choice questions carry a penalty for wrong answers; unanswered questions earn no points and incur no penalty.
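As a quick check on the point allocation described above, the following sketch sums the section maxima and reports each section's share of the total. The dictionary name, its keys, and the helper functions are illustrative labels chosen here, not official names; only the point values come from the exam description.

```python
# Maximum points per exam section, copied from the exam description.
# The labels are illustrative, not official section names.
EXAM_PARTS = {
    "theory (multiple choice)": 5,
    "data warehousing": 12,
    "NoSQL": 9,
    "visualization": 5,
}


def max_total(parts):
    """Sum of the maximum points over all exam sections."""
    return sum(parts.values())


def shares(parts):
    """Percentage of the total carried by each section, to one decimal."""
    total = max_total(parts)
    return {name: round(100 * pts / total, 1) for name, pts in parts.items()}


if __name__ == "__main__":
    print("Maximum total:", max_total(EXAM_PARTS))
    for name, pct in shares(EXAM_PARTS).items():
        print(f"{name}: {pct}%")
```

The section maxima add up to 31 points, with data warehousing carrying the largest share. How that total maps to the final grade, and how the multiple-choice penalty is computed, are not stated in this description.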
Exam: Computer lab-based test; computer-based written test using the PoliTo platform.
- Theory: closed-answer questions
- Exercises: open-answer or closed-answer questions
- No books allowed