Politecnico di Torino
Academic Year 2017/18
Big data: architectures and data analytics
Master of science-level of the Bologna process in Computer Engineering - Torino
Master of science-level of the Bologna process in ICT for Smart Societies - Torino
Master of science-level of the Bologna process in Mathematical Engineering - Torino
Teacher | Status | SSD | Lecture hours | Exercise hours | Lab hours | Years teaching
Garza Paolo | RB | ING-INF/05 | 40 | 5 | 15 | 3
SSD | CFU | Activities | Area context
ING-INF/05 | 6 | B - Characterizing | Computer engineering
Subject fundamentals
In the big data era, traditional data management and analytic systems are no longer adequate. To manage and fruitfully exploit the huge amount of available heterogeneous data, novel data models, programming paradigms, information systems, and network architectures are needed.
The course addresses the challenges arising in the big data era. Specifically, it covers how to collect, store, retrieve, and analyze big data to mine useful knowledge and insights. The course covers not only data models and data analytics but also novel programming paradigms (e.g., MapReduce, Spark RDDs), and discusses how they can be exploited to help data scientists extract insights from data.
Expected learning outcomes
The course aims at providing:
Knowledge of the main problems and opportunities arising in the big data context and technological characteristics of the infrastructures and distributed systems used to deal with big data (e.g., Hadoop and Spark).
Ability to write distributed programs that process and analyze data by means of novel programming paradigms, namely MapReduce and Spark.
Knowledge of the (relational and non-relational) database systems that are used to store big data.
Prerequisites / Assumed knowledge
Object-oriented programming skills, Java language, and basic knowledge of traditional database concepts (relational model and SQL language).
Lectures (45 hours)
Introduction to Big data: characteristics, problems, opportunities (3 hours)
Hadoop and its ecosystem: infrastructure and basic components (3 hours)
MapReduce programming paradigm (10.5 hours)
Spark: architecture and RDD-based programming paradigm (13.5 hours)
Spark Streaming: streaming data analysis (4.5 hours)
Data mining and Machine learning libraries: MLlib (4.5 hours)
Databases for Big data: data models, design, and querying (e.g., HBase) (6 hours)

Laboratory activities (15 hours)
Development of applications by means of Hadoop and Spark (15 hours)