Empirical investigation of the usage of large pre-trained models in open source projects
Thesis in external company
Keywords ARTIFICIAL INTELLIGENCE, BIG DATA ANALYSIS, BIG TECH, DATA MINING, MACHINE LEARNING, NATURAL LANGUAGE PROCESSING, OPEN SOURCE
Reference persons ANTONIO VETRÒ
Research Groups DAUIN - GR-22 - Nexa Center for Internet & Society - NEXA
Thesis type DATA ANALYSIS, DATA MINING, OPEN SOURCE SOFTWARE, RESEARCH / EXPERIMENTAL, SOFTWARE DEVELOPMENT
Description Large-scale pre-trained models (such as BERT and GPT) have brought relevant achievements in the field of natural language processing (NLP). However, they have also created a high dependence on the few actors in the world that have the resources to build them, opening a large divide between big tech corporations and elite universities on one side, and companies and universities of medium and smaller size on the other. The main goal of the thesis is to mine open source repositories to understand the level of diffusion of (and thus dependence on) the most common large-scale pre-trained models.
Initial readings include: https://arxiv.org/pdf/2003.08271.pdf, https://doi.org/10.1016/j.aiopen.2021.08.002, https://www.youtube.com/watch?v=Fmi3fq3Q3Bo
Required skills The thesis requires very good development skills, knowledge of fundamental NLP and ML techniques, and knowledge of how to interact with repositories (e.g., git).
Notes When sending your application, we kindly ask you to attach the following information:
- the list of exams taken in your master's degree, with grades
- a résumé or equivalent (e.g., a LinkedIn profile), if you already have one
- by when you aim to graduate and an estimate of the time you can devote to the thesis in a typical week
Deadline 13/05/2023
SUBMIT YOUR APPLICATION