TEACHING PORTAL

Clustering of web pages for realistic experiments

Keywords CLUSTERING, MACHINE LEARNING, WEB

Reference persons DANILO GIORDANO, MARTINO TREVISAN, LUCA VASSIO

Research groups SmartData@PoliTO, Telecommunication Networks Group

Thesis type EXPERIMENTAL THESIS

Description Experimenting with networked systems is fundamental for developing novel techniques, assessing the impact of design choices, and improving users' Quality of Experience. Web testing typically relies on lists of popular websites, such as the Alexa rank (https://www.alexa.com/topsites), which, however, offer only the homepages of the target websites. This is a strong limitation, as the pages of a website are known to differ in structure depending, for example, on the subsection in which the content is organized. The goal of this thesis is to develop a system that selects a subset of a website's pages that is representative of the diversity of its internal structure. To this end, it is necessary to leverage Data Science and Machine Learning techniques, clustering above all, to group similar pages together and choose the right representatives, and the right number of them. Using open datasets, and collecting additional data if needed, the student will apply Machine Learning tools to achieve this goal, resorting to Big Data approaches if the dataset becomes large.
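
As a rough illustration of the selection step described above, the following dependency-free Python sketch groups pages by the similarity of their HTML tag counts and keeps one representative per group. It is not the method prescribed by this proposal: the tag-count feature, the cosine distance, the simple "leader" clustering pass, the 0.3 threshold, and all names (`tag_profile`, `select_representatives`, the toy `pages`) are assumptions made for the example. A real study would more likely rely on richer features and on a clustering library such as scikit-learn.

```python
from collections import Counter
from html.parser import HTMLParser
import math


class TagCounter(HTMLParser):
    """Counts opening tags; a crude, assumed proxy for page structure."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def tag_profile(html):
    """Return a Counter mapping tag name -> number of occurrences."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse tag-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty page as dissimilar to everything
    return 1.0 - dot / (norm_a * norm_b)


def select_representatives(pages, threshold=0.3):
    """Leader clustering: a page joins the first cluster whose
    representative is within `threshold`; otherwise it starts a new
    cluster and becomes its representative.
    Returns {representative_url: [member_urls]}."""
    profiles = {url: tag_profile(html) for url, html in pages.items()}
    clusters = {}
    for url, profile in profiles.items():
        for rep in clusters:
            if cosine_distance(profile, profiles[rep]) <= threshold:
                clusters[rep].append(url)
                break
        else:
            clusters[url] = [url]
    return clusters


# Hypothetical toy website: two article-like and two table-like pages.
pages = {
    "/news/1": "<html><body><h1>A</h1><p>x</p><p>y</p><p>z</p></body></html>",
    "/news/2": "<html><body><h1>B</h1><p>x</p><p>y</p></body></html>",
    "/shop/1": "<html><body><table><tr><td>1</td><td>2</td></tr>"
               "<tr><td>3</td><td>4</td></tr></table></body></html>",
    "/shop/2": "<html><body><table><tr><td>1</td><td>2</td></tr>"
               "</table></body></html>",
}

clusters = select_representatives(pages)
for representative, members in clusters.items():
    print(representative, members)
```

In this sketch the number of representatives is not fixed in advance; it follows from the distance threshold, which would need to be tuned on real data. The result also depends on the order in which pages are visited, one reason to prefer a more robust clustering algorithm in practice.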

Required knowledge Machine Learning and Data Science
Python programming, using the scikit-learn and pandas libraries
Networking fundamentals: HTTP, HTML, TCP


Proposal valid until 29/07/2021




© Politecnico di Torino
Corso Duca degli Abruzzi, 24 - 10129 Torino, ITALY