PORTALE DELLA DIDATTICA (Teaching Portal)


Empirical investigation of the usage of large pre-trained models in open source projects

keywords ARTIFICIAL INTELLIGENCE, BIG DATA ANALYSIS, BIG TECH, DATA MINING, MACHINE LEARNING, NATURAL LANGUAGE PROCESSING, OPEN SOURCE

Reference persons ANTONIO VETRÒ

Research Groups DAUIN - GR-16 - SOFTWARE ENGINEERING GROUP - SOFTENG, DAUIN - GR-22 - Nexa Center for Internet & Society - NEXA

Thesis type DATA ANALYSIS, DATA MINING, OPEN SOURCE SOFTWARE, RESEARCH / EXPERIMENTAL, SOFTWARE DEVELOPMENT

Description Large-scale pre-trained models (such as BERT and GPT) have brought significant advances in natural language processing (NLP). However, they have also created a strong dependence on the few actors in the world that have the resources to build them. This opens a wide divide between big tech corporations and elite universities on one side, and small and medium-sized companies and universities on the other. The main goal of the thesis is to mine open source repositories to measure the level of diffusion of the most common large-scale pre-trained models, and thus the level of dependence on them.
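As one possible starting point for the mining step, the sketch below counts which pre-trained model families a locally cloned repository loads. It is a minimal sketch and not the method prescribed by the proposal. It assumes that models are loaded through the Hugging Face `transformers` convention `SomeClass.from_pretrained("model-id")` with a string literal. The names `MODEL_FAMILIES`, `classify`, `find_model_references` and `scan_repository` are hypothetical, and so is the short family list. A regular expression cannot detect identifiers that are built dynamically or read from configuration files.

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical, illustrative list of model families; a real study would need
# a curated list (e.g. derived from a model hub). Checked in this order.
MODEL_FAMILIES = ("bert", "roberta", "distilbert", "gpt2", "gpt", "t5", "xlnet")

# Matches calls such as AutoModel.from_pretrained("bert-base-uncased") and
# captures the model identifier passed as a single- or double-quoted literal.
FROM_PRETRAINED = re.compile(r"""from_pretrained\(\s*['"]([^'"]+)['"]""")


def classify(model_id: str) -> str:
    """Map a model identifier to the first family it starts with, else 'other'."""
    name = model_id.lower().split("/")[-1]  # drop an optional "org/" prefix
    for family in MODEL_FAMILIES:
        if name.startswith(family):
            return family
    return "other"


def find_model_references(source: str) -> Counter:
    """Count model families referenced via from_pretrained(...) in one file's text."""
    return Counter(classify(m) for m in FROM_PRETRAINED.findall(source))


def scan_repository(root: Path) -> Counter:
    """Aggregate references over all Python files under a cloned repository."""
    totals = Counter()
    for path in Path(root).rglob("*.py"):
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # skip unreadable files (broken symlinks, permissions)
        totals.update(find_model_references(text))
    return totals
```

After `git clone`, a call such as `scan_repository(Path("path/to/clone"))` returns per-family counts for that repository. Aggregating these counts across many repositories gives a first rough estimate of diffusion. Matching on a prefix keeps the sketch short, but it is only a heuristic. For example, it assigns every identifier starting with "bert" to the BERT family. A more robust analysis would also inspect dependency manifests and resolve model identifiers against a model hub.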

Initial readings include:

- https://arxiv.org/pdf/2003.08271.pdf
- https://doi.org/10.1016/j.aiopen.2021.08.002
- https://www.youtube.com/watch?v=Fmi3fq3Q3Bo

Required skills The thesis requires very good development skills, knowledge of fundamental NLP and ML techniques, and familiarity with interacting with code repositories (e.g., git). A grade point average of 26/30 or higher may play a relevant role in the selection.

Notes When sending your application, we kindly ask you to attach the following information:

- the list of exams taken in your master's degree, with grades and grade point average
- a résumé or equivalent (e.g., LinkedIn profile), if you already have one
- the date by which you aim to graduate, and an estimate of the time you can devote to the thesis in a typical week


Deadline 30/11/2023




© Politecnico di Torino
Corso Duca degli Abruzzi, 24 - 10129 Torino, ITALY