Just learn the manual: teaching Bash to a Large Language Model
Keywords AI, CYBERSECURITY, EXPLANATION, LANGUAGE MODELS, MACHINE LEARNING, PROGRAMMING LANGUAGES
Reference persons MARCO MELLIA, LUCA VASSIO
External reference persons Matteo Boffa - PhD student - matteo.boffa@polito.it
Research Groups DATABASE AND DATA MINING GROUP - DBDM, SmartData@PoliTO, Telecommunication Networks Group
Thesis type RESEARCH
Description According to the constructivist philosophy, humans learn by connecting new experiences to prior knowledge. This thesis aims to investigate whether this principle also applies to Large Language Models (LLMs). Specifically, instead of relying on massive training datasets as is common in current approaches, this work focuses on leveraging the natural language understanding capabilities acquired by LLMs during their pre-training phase to guide fine-tuning on a smaller, high-quality Bash corpus.
Our goal is to mimic the process humans use when learning a new programming language: associating Bash commands and flags with their explanations from the Bash manual. By enforcing semantic understanding during learning, we will evaluate the model's ability to generalize its understanding to entire Bash sessions.
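This flag-to-explanation association can be illustrated with a minimal sketch. The manual snippets below are paraphrased from `man ls` and `man tar`; a real dataset would be parsed automatically from the installed man pages, and the `gloss` helper is hypothetical:

```python
# Hypothetical sketch: pairing Bash commands with manual-derived explanations.
# MAN_SNIPPETS is a hand-written stand-in for text parsed from real man pages.

MAN_SNIPPETS = {
    ("ls", "-l"): "use a long listing format",
    ("ls", "-a"): "do not ignore entries starting with .",
    ("tar", "-x"): "extract files from an archive",
    ("tar", "-f"): "use the following archive file",
}

def gloss(command: str) -> str:
    """Turn a single Bash command into a natural-language description
    by looking up each flag in the manual snippets."""
    tokens = command.split()
    prog, parts = tokens[0], []
    for tok in tokens[1:]:
        if tok.startswith("-") and len(tok) > 1:
            # Split combined short flags, e.g. -la -> -l -a
            for ch in tok[1:]:
                desc = MAN_SNIPPETS.get((prog, f"-{ch}"))
                if desc:
                    parts.append(desc)
        else:
            parts.append(f"on '{tok}'")
    return f"{prog}: " + "; ".join(parts) if parts else prog

print(gloss("ls -la"))
# ls: use a long listing format; do not ignore entries starting with .
```

Such (command, explanation) pairs are exactly the supervision signal the thesis proposes to use in place of a massive raw corpus.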
The student is required to first create a dataset of Bash sessions paired with their corresponding parsed semantic descriptions. Using this dataset, they will train an encoder that aligns natural language with Bash code. Finally, the student will assess the trained model's ability to produce global, human-readable explanations for specific Bash sessions.
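The first step (dataset creation) requires segmenting raw session logs into individual commands before each one can be paired with its parsed description. A minimal sketch using only the standard library, with a deliberately naive split on shell operators:

```python
import shlex

# Hypothetical sketch of session segmentation for dataset creation:
# split a raw Bash session into individual commands. shlex respects
# quoting; the operator split below is naive and ignores redirections,
# subshells, etc., which a real parser would have to handle.

def split_session(session: str) -> list[list[str]]:
    """Split a one-line Bash session into a list of tokenised commands."""
    commands, current = [], []
    for tok in shlex.split(session):
        if tok in {";", "|", "&&", "||"}:  # command separators
            if current:
                commands.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        commands.append(current)
    return commands

session = "cd /tmp && wget http://example.com/a.sh && chmod +x a.sh ; ./a.sh"
print(split_session(session))  # 4 commands: cd, wget, chmod, ./a.sh
```

Each tokenised command can then be glossed against the manual to produce the paired semantic description.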
See, as example starting points, https://explainshell.com/ and NL2Bash (https://arxiv.org/abs/1802.08979).
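The encoder-alignment step could follow a contrastive (CLIP-style, InfoNCE) objective: embeddings of a Bash session and of its description should be close, while mismatched pairs are pushed apart. A minimal NumPy sketch, where random vectors stand in for the outputs of real Bash and text encoders (the proposal does not fix a specific loss; this is one plausible choice):

```python
import numpy as np

def info_nce(code_emb: np.ndarray, text_emb: np.ndarray,
             temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss over a batch of (code, description) pairs:
    row i of code_emb is the positive match for row i of text_emb."""
    # L2-normalise so dot products are cosine similarities
    code_emb = code_emb / np.linalg.norm(code_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = code_emb @ text_emb.T / temperature   # (batch, batch) similarity
    labels = np.arange(len(logits))                # diagonal = matching pairs

    def xent(lg):
        lg = lg - lg.max(axis=1, keepdims=True)    # numerical stability
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()        # cross-entropy on diagonal

    # Average the code->text and text->code directions
    return (xent(logits) + xent(logits.T)) / 2

rng = np.random.default_rng(0)
batch = rng.normal(size=(4, 16))
# Perfectly aligned pairs give a near-zero loss; random pairings a higher one
aligned = info_nce(batch, batch)
random_pairs = info_nce(batch, rng.normal(size=(4, 16)))
```

Once trained, the text-side encoder can be queried with candidate explanations to produce the global, human-readable session descriptions mentioned above.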
Required skills Python
Machine Learning
Basics of deep learning techniques or language models
Basics of shell scripts
Notes Students with an average grade of at least 27/30 will be preferred
Deadline 14/01/2026
SUBMIT YOUR APPLICATION