Politecnico di Torino
Academic Year 2017/18
01QYDOV, 01QYDBH, 01QYDNG, 01QYDOQ, 01QYDPE
Big data: architectures and data analytics
Master of science-level of the Bologna process in Computer Engineering - Torino
Master of science-level of the Bologna process in Ict For Smart Societies - Torino
Master of science-level of the Bologna process in Mathematical Engineering - Torino
Teacher | Status | SSD | Lectures (h) | Exercises (h) | Lab (h) | Years teaching
Garza Paolo | RB | ING-INF/05 | 40 | 5 | 15 | 3
SSD | CFU | Activities | Area context
ING-INF/05 | 6 | B - Characterizing | Computer engineering
Subject fundamentals
In the big data era, traditional data management and analytics systems are no longer adequate. Hence, to manage and fruitfully exploit the huge amount of available heterogeneous data, novel data models, programming paradigms, information systems, and network architectures are needed.
The course addresses the challenges arising in the Big Data era. Specifically, it covers how to collect, store, retrieve, and analyze big data to mine useful knowledge and insights. The course covers not only data models and data analytics but also novel programming paradigms (e.g., MapReduce, Spark RDDs), and discusses how they can be exploited to help big data scientists extract insights from data.
Expected learning outcomes
The course aims at providing:
• Knowledge of the main problems and opportunities arising in the big data context, and of the technological characteristics of the infrastructures and distributed systems used to deal with big data (e.g., Hadoop and Spark).
• Ability to write distributed programs that process and analyze data by means of novel programming paradigms: the MapReduce and Spark programming paradigms.
• Knowledge of the (relational and non-relational) database systems used to store big data.
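As a rough illustration of what the MapReduce paradigm looks like, the following sketch simulates the map, shuffle, and reduce phases of a word count inside a single JVM. It is not Hadoop code and does not use the Hadoop API; the class name `WordCountSketch` and its methods are hypothetical, chosen only to make the three phases visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process simulation of MapReduce (not the Hadoop API).
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: combine all values emitted for the same key.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> lines) {
        // Shuffle phase: group intermediate values by key; the TreeMap keeps
        // keys sorted, mirroring the sort a framework performs before reducing.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = List.of("big data big ideas", "data analytics");
        // prints {analytics=1, big=2, data=2, ideas=1}
        System.out.println(run(input));
    }
}
```

In a real distributed framework the map and reduce calls run in parallel on different nodes and the shuffle moves data across the network; here everything runs sequentially so the data flow is easy to follow.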
Prerequisites / Assumed knowledge
Object-oriented programming skills, knowledge of the Java language, and basic knowledge of traditional database concepts (relational model and SQL language).
Contents
Lectures (45 hours)
• Introduction to Big data: characteristics, problems, opportunities (3 hours)
• Hadoop and its ecosystem: infrastructure and basic components (3 hours)
• MapReduce programming paradigm (10.5 hours)
• Spark: Spark architecture and RDD-based programming paradigm (13.5 hours)
• Spark Streaming: streaming data analysis (4.5 hours)
• Data mining and Machine learning libraries: MLlib (4.5 hours)
• Databases for Big data: data models, design, and querying (e.g., HBase) (6 hours)

Laboratory activities (15 hours)
• Development of applications by means of Hadoop and Spark (15 hours)
Delivery modes
The course consists of lectures (45 hours) and laboratory sessions (15 hours). The laboratory sessions focus on the main topics of the course (MapReduce, Spark, and MLlib) and allow hands-on experimentation with the most widespread open-source products.
Texts, readings, handouts and other learning resources
Reference books:
• Tom White. Hadoop: The Definitive Guide (3rd edition). O’Reilly, Yahoo Press, 2012.
• Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia. Learning Spark: Lightning-Fast Big Data Analysis. O’Reilly, 2015.
• Sandy Ryza, Uri Laserson, Sean Owen, Josh Wills. Advanced Analytics with Spark. O’Reilly, 2014.

Copies of the slides used during the lectures, examples of written exams and exercises, and manuals for the laboratory activities will be made available. All teaching material can be downloaded from the course website or the Portal.
Assessment and grading criteria
The exam is a written exam lasting 2 hours.

The written exam consists of two parts:
- 2 programming exercises (MapReduce-based and RDD-based programming) to be solved in Java (27 points)
- 2 multiple-choice questions covering all the topics addressed in the course (4 points)

The evaluation of the programming exercises is based on the correctness and efficiency of the proposed solutions.
The multiple-choice questions assess knowledge of the theoretical concepts of the course.

The exam is open book (notes and books can be used during the exam).

The exam is passed if the mark of the written exam is at least 18 points.

Final syllabus for the 2017/18 academic year



© Politecnico di Torino
Corso Duca degli Abruzzi, 24 - 10129 Torino, ITALY