Design, Implementation and Analysis of a Monitoring System for a Big Data Cluster
Keywords BIG DATA ANALYSIS, MONITORING, HADOOP, SPARK
Reference persons MARTINO TREVISAN
External reference persons Idilio Drago (Prof., Università di Torino)
Research groups SmartData@PoliTO
Description Large quantities of data are typically processed on Big Data clusters, sets of servers configured to operate in a coordinated fashion. These clusters run various frameworks to orchestrate their resources; among the most popular are Apache Hadoop, Spark and Kubernetes.
To ensure the proper operation of a Big Data cluster, it is fundamental to collect, process and analyze the large volume of log files generated by servers, agents and applications. The goal of the thesis is to design and implement a system to collect the telemetry of the Big Data cluster of Politecnico di Torino. Once the data are collected, they will be analyzed using state-of-the-art Data Science approaches to mine rules, with the goal of unveiling and understanding malfunctions and misconfigurations.
Required skills Linux systems
Data Science
Big Data
Basic Networking
Proposal valid until 04/10/2022