Just learn the manual: teaching Bash to a Large Language Model
Keywords: AI, CYBERSECURITY, EXPLANATION, LANGUAGE MODELS, MACHINE LEARNING, PROGRAMMING LANGUAGES
Supervisors: MARCO MELLIA, LUCA VASSIO
External contacts: Matteo Boffa - PhD student - matteo.boffa@polito.it
Research groups: DATABASE AND DATA MINING GROUP - DBDM, SmartData@PoliTO, Telecommunication Networks Group
Thesis type: RESEARCH
Description: According to constructivist philosophy, humans learn by connecting new experiences to prior knowledge. This thesis investigates whether the same principle applies to Large Language Models (LLMs). Specifically, instead of relying on the massive training datasets common in current approaches, this work leverages the natural-language understanding that LLMs acquire during pre-training to guide fine-tuning on a smaller, high-quality Bash corpus.
Our goal is to mimic the process humans follow when learning a new programming language: associating Bash commands and flags with their explanations in the Bash manual. By enforcing this semantic grounding during training, we will then evaluate how well the model generalizes to entire Bash sessions.
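To make the pairing idea concrete, the minimal sketch below associates each flag of a Bash command with a description taken (paraphrased) from the tar manual. The flag table and helper name are hypothetical illustrations, not part of the proposal:

    import shlex

    # Hypothetical, paraphrased excerpt of flag descriptions from the tar manual page.
    TAR_MANUAL = {
        "-x": "extract files from an archive",
        "-z": "filter the archive through gzip",
        "-f": "use the following argument as the archive file",
    }

    def explain_command(command):
        """Pair each token of a Bash command with its manual description, if known."""
        explanations = []
        for token in shlex.split(command):
            # Split combined short options such as -xzf into -x, -z, -f.
            if token.startswith("-") and not token.startswith("--") and len(token) > 2:
                parts = ["-" + ch for ch in token[1:]]
            else:
                parts = [token]
            for part in parts:
                explanations.append((part, TAR_MANUAL.get(part, "no description found")))
        return explanations

    print(explain_command("tar -xzf archive.tar.gz"))
    # [('tar', 'no description found'),
    #  ('-x', 'extract files from an archive'),
    #  ('-z', 'filter the archive through gzip'),
    #  ('-f', 'use the following argument as the archive file'),
    #  ('archive.tar.gz', 'no description found')]

In the thesis, such descriptions would be parsed automatically from man pages rather than hand-written.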
The student will first create a dataset of Bash sessions paired with their parsed semantic descriptions. Using this dataset, they will train an encoder that aligns natural language with Bash code (see the sketch below). Finally, they will assess the trained model's ability to produce global, human-readable explanations of specific Bash sessions.
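One possible way to implement the alignment step is a CLIP-style contrastive objective. The sketch below is only an illustration under that assumption, using random tensors in place of real encoder outputs; the actual architecture and training setup are left to the student:

    import torch
    import torch.nn.functional as F

    # Toy embeddings standing in for the outputs of a Bash encoder and a text encoder;
    # row i of each batch is a matching (Bash session, natural-language description) pair.
    bash_emb = torch.randn(8, 256, requires_grad=True)   # 8 Bash session embeddings
    text_emb = torch.randn(8, 256, requires_grad=True)   # 8 description embeddings

    def contrastive_alignment_loss(bash_emb, text_emb, temperature=0.07):
        """Symmetric InfoNCE loss pulling matched Bash/text pairs together."""
        bash_emb = F.normalize(bash_emb, dim=-1)
        text_emb = F.normalize(text_emb, dim=-1)
        logits = bash_emb @ text_emb.t() / temperature    # scaled cosine similarities
        targets = torch.arange(logits.size(0))            # i-th description matches i-th session
        loss_b2t = F.cross_entropy(logits, targets)
        loss_t2b = F.cross_entropy(logits.t(), targets)
        return (loss_b2t + loss_t2b) / 2

    loss = contrastive_alignment_loss(bash_emb, text_emb)
    loss.backward()   # in a real setup, the gradients would update both encoders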
See, as example starting points, https://explainshell.com/ and NL2Bash (https://arxiv.org/abs/1802.08979).
Required knowledge: Python
Machine Learning
Basics of deep learning techniques or language models
Basics of shell scripting
Notes: Students with an average grade of at least 27/30 will be preferred
Proposal validity deadline: 14/01/2026