Just learn the manual: teaching Bash to a Large Language Model
Keywords: AI, CYBERSECURITY, EXPLANATION, LANGUAGE MODELS, MACHINE LEARNING, PROGRAMMING LANGUAGES
Supervisors: MARCO MELLIA, LUCA VASSIO
External contacts: Matteo Boffa - PhD student - matteo.boffa@polito.it
Research groups: DATABASE AND DATA MINING GROUP - DBDM, SmartData@PoliTO, Telecommunication Networks Group
Thesis type: RESEARCH
Description: According to constructivist philosophy, humans learn by connecting new experiences to prior knowledge. This thesis investigates whether the same principle applies to Large Language Models (LLMs). Specifically, instead of relying on the massive training datasets common in current approaches, this work leverages the natural-language understanding that LLMs acquire during pre-training to guide fine-tuning on a smaller, high-quality Bash corpus.
Our goal is to mimic the process humans follow when learning a new programming language: associating Bash commands and flags with their explanations in the Bash manual. By enforcing this semantic grounding during training, we will then evaluate how well the model generalizes to entire Bash sessions.
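To make the pairing idea concrete, the minimal sketch below associates each flag of a Bash command with a description taken (paraphrased) from the tar manual. The flag table and helper name are hypothetical illustrations, not part of the proposal:

    import shlex

    # Hypothetical, paraphrased excerpt of flag descriptions from the tar manual page.
    TAR_MANUAL = {
        "-x": "extract files from an archive",
        "-z": "filter the archive through gzip",
        "-f": "use the following argument as the archive file",
    }

    def explain_command(command):
        """Pair each token of a Bash command with its manual description, if known."""
        explanations = []
        for token in shlex.split(command):
            # Split combined short options such as -xzf into -x, -z, -f.
            if token.startswith("-") and not token.startswith("--") and len(token) > 2:
                parts = ["-" + ch for ch in token[1:]]
            else:
                parts = [token]
            for part in parts:
                explanations.append((part, TAR_MANUAL.get(part, "no description found")))
        return explanations

    print(explain_command("tar -xzf archive.tar.gz"))
    # [('tar', 'no description found'),
    #  ('-x', 'extract files from an archive'),
    #  ('-z', 'filter the archive through gzip'),
    #  ('-f', 'use the following argument as the archive file'),
    #  ('archive.tar.gz', 'no description found')]

In the thesis, such descriptions would be parsed automatically from man pages rather than hand-written.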
The student will first create a dataset of Bash sessions paired with their parsed semantic descriptions. Using this dataset, they will train an encoder that aligns natural language with Bash code (see the sketch below). Finally, they will assess the trained model's ability to produce global, human-readable explanations of specific Bash sessions.
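One possible way to implement the alignment step is a CLIP-style contrastive objective. The sketch below is only an illustration under that assumption, using random tensors in place of real encoder outputs; the actual architecture and training setup are left to the student:

    import torch
    import torch.nn.functional as F

    # Toy embeddings standing in for the outputs of a Bash encoder and a text encoder;
    # row i of each batch is a matching (Bash session, natural-language description) pair.
    bash_emb = torch.randn(8, 256, requires_grad=True)   # 8 Bash session embeddings
    text_emb = torch.randn(8, 256, requires_grad=True)   # 8 description embeddings

    def contrastive_alignment_loss(bash_emb, text_emb, temperature=0.07):
        """Symmetric InfoNCE loss pulling matched Bash/text pairs together."""
        bash_emb = F.normalize(bash_emb, dim=-1)
        text_emb = F.normalize(text_emb, dim=-1)
        logits = bash_emb @ text_emb.t() / temperature    # scaled cosine similarities
        targets = torch.arange(logits.size(0))            # i-th description matches i-th session
        loss_b2t = F.cross_entropy(logits, targets)
        loss_t2b = F.cross_entropy(logits.t(), targets)
        return (loss_b2t + loss_t2b) / 2

    loss = contrastive_alignment_loss(bash_emb, text_emb)
    loss.backward()   # in a real setup, the gradients would update both encoders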
See, as example starting points, https://explainshell.com/ and NL2Bash (https://arxiv.org/abs/1802.08979).
Required knowledge: Python
Machine Learning
Basics of deep learning techniques or language models
Basics of shell scripting
Notes: Students with an average grade of at least 27/30 will be preferred
Proposal validity deadline: 14/01/2026