Politecnico di Torino
Academic Year 2015/16
Big data: architectures and data analytics
Master of science-level of the Bologna process in Electronic Engineering - Torino
Master of science-level of the Bologna process in ICT for Smart Societies - Torino
Master of science-level of the Bologna process in Mathematical Engineering - Torino
Teacher: Garza Paolo | Status: RB | SSD: ING-INF/05 | Les: 40 | Ex: 5 | Lab: 15 | Years teaching: 3
SSD: ING-INF/05 | CFU: 6 | Activities: D - Student's choice | Area context: Student's choice
Subject fundamentals
In the big data era, traditional data management and analytics systems are no longer adequate. Hence, to manage and fruitfully exploit the huge amount of available heterogeneous data, novel data models, programming paradigms, information systems, and network architectures are needed.
The course addresses the challenges arising in the Big Data era, mainly from a data perspective. Specifically, the course will cover how to collect, store, retrieve, and analyze big data to mine useful knowledge and insightful hints. The course covers not only data models and data analytics but also novel programming paradigms (e.g., Map Reduce, Spark RDDs), distributed systems (e.g., Hadoop), and cloud computing and network infrastructures, and discusses how they can be exploited to help big data scientists extract insights from data.
Laboratory sessions provide hands-on experience with the most widely used open-source products.
Expected learning outcomes
The course aims at providing:
Knowledge of the main problems and opportunities arising in the big data context, and of the technological characteristics of the infrastructures and distributed systems used to deal with big data (e.g., Hadoop).
Ability to write distributed programs to process and analyze data by means of the Map Reduce and Spark programming paradigms.
Knowledge of non-relational database systems (e.g., Hive and HBase) and ability to design databases based on non-relational data models.
Knowledge of the main characteristics of cloud computing platforms and network infrastructures for Big Data applications.
Prerequisites / Assumed knowledge
Basic programming skills (Java language) and basic knowledge of traditional database concepts (i.e., the relational model and the SQL language).
Lectures (51 hours)
Introduction to Big data: characteristics, problems, opportunities (3 hours)
Hadoop and its ecosystem: infrastructure and basic components (3 hours)
Map Reduce programming paradigm (12 hours)
Spark: Spark architecture and RDD-based programming paradigm (13.5 hours)
NoSQL databases: data models, design, and querying (Hive and HBase) (9 hours)
Data acquisition: Sqoop, Flume, ... (3 hours)
Data mining and machine learning libraries: MLlib and Mahout (3 hours)
Cloud computing platforms and network infrastructures for Big Data applications (4.5 hours)

Laboratory activities (9 hours)
Development of applications by means of Hadoop, Spark and NoSQL databases (9 hours)