Empirical investigation of the usage of large pre-trained models in open source projects
Keywords ARTIFICIAL INTELLIGENCE, BIG DATA ANALYSIS, BIG TECH, DATA MINING, MACHINE LEARNING, NATURAL LANGUAGE PROCESSING, OPEN SOURCE
Contact person ANTONIO VETRO'
Research groups DAUIN - GR-16 - SOFTWARE ENGINEERING GROUP - SOFTENG, DAUIN - GR-22 - Nexa Center for Internet & Society - NEXA
Thesis type DATA ANALYSIS, DATA MINING, OPEN SOURCE SOFTWARE, RESEARCH, SOFTWARE DEVELOPMENT
Description Large-scale pre-trained models (such as BERT and GPT) have brought relevant achievements in the field of natural language processing (NLP). However, they have also created a high dependence on the few actors in the world that have the resources to build them, opening a large divide between big tech corporations and elite universities on one side and small and medium-sized companies and universities on the other. The main goal of the thesis is to mine open source repositories to understand the level of diffusion of (and thus dependence on) the most common large-scale pre-trained models.
Initial readings include: https://arxiv.org/pdf/2003.08271.pdf , https://doi.org/10.1016/j.aiopen.2021.08.002 , https://www.youtube.com/watch?v=Fmi3fq3Q3Bo
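As an illustration of the kind of repository mining the description refers to, the sketch below counts references to pre-trained models in Python source files. It is only one possible starting point, not the method prescribed by the proposal. It assumes that models are loaded through calls of the form `from_pretrained("model-id")` with a string literal argument, that repositories have already been cloned locally, and that matching the substrings `bert` and `gpt` is an acceptable way to group identifiers into families. The names `MODEL_FAMILIES`, `family_of`, `count_model_references`, and `scan_repository` are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# Assumption: a coarse, non-exhaustive list of model families to track.
# Substring matching means variants such as "distilbert" or "roberta"
# are counted under "bert".
MODEL_FAMILIES = ("bert", "gpt")

# Assumption: models are loaded with from_pretrained("<model-id>"),
# where the identifier is a plain string literal.
FROM_PRETRAINED = re.compile(r"from_pretrained\(\s*['\"]([^'\"]+)['\"]")


def family_of(model_id):
    """Map a model identifier to a tracked family name, or None."""
    name = model_id.lower()
    for family in MODEL_FAMILIES:
        if family in name:
            return family
    return None


def count_model_references(source_text):
    """Count from_pretrained(...) calls per model family in one source text."""
    counts = Counter()
    for model_id in FROM_PRETRAINED.findall(source_text):
        family = family_of(model_id)
        if family is not None:
            counts[family] += 1
    return counts


def scan_repository(repo_root):
    """Aggregate counts over every .py file under a locally cloned repository."""
    totals = Counter()
    for path in Path(repo_root).rglob("*.py"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        totals.update(count_model_references(text))
    return totals
```

Calling `scan_repository` on the root directory of a cloned repository returns a `Counter` keyed by family name. A real study would also need to cover other languages, configuration files, and model identifiers passed through variables, none of which this sketch handles.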
Required skills The thesis requires very good development skills, knowledge of fundamental NLP and ML techniques, and knowledge of how to interact with repositories (e.g., git). A grade point average of 26 or higher can play a relevant role in the selection.
Notes When sending your application, we kindly ask you to attach the following information:
- list of exams taken in your master's degree, with grades and grade point average
- a résumé or equivalent (e.g., LinkedIn profile), if you already have one
- by when you aim to graduate and an estimate of the time you can devote to the thesis in a typical week
Proposal valid until 30/11/2023