Just learn the manual: teaching Bash to a Large Language Model
Keywords AI, CYBERSECURITY, EXPLANATION, LANGUAGE MODELS, MACHINE LEARNING, PROGRAMMING LANGUAGES
Reference persons MARCO MELLIA, LUCA VASSIO
External reference persons Matteo Boffa - PhD student - matteo.boffa@polito.it
Research Groups DATABASE AND DATA MINING GROUP - DBDM, SmartData@PoliTO, Telecommunication Networks Group
Thesis type RESEARCH
Description According to the constructivist philosophy, humans learn by connecting new experiences to prior knowledge. This thesis aims to investigate whether this principle also applies to Large Language Models (LLMs). Specifically, instead of relying on massive training datasets as is common in current approaches, this work focuses on leveraging the natural language understanding capabilities acquired by LLMs during their pre-training phase to guide fine-tuning on a smaller, high-quality Bash corpus.
Our goal is to mimic the process humans use when learning a new programming language: associating Bash commands and flags with their explanations from the Bash manual. By enforcing semantic understanding during learning, we will evaluate the model's ability to generalize its understanding to entire Bash sessions.
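This flag-to-explanation association can be illustrated with a minimal sketch. The manual snippets below are paraphrased from `man ls` and `man tar`; a real dataset would be parsed automatically from the installed man pages, and the `gloss` helper is hypothetical:

```python
# Hypothetical sketch: pairing Bash commands with manual-derived explanations.
# MAN_SNIPPETS is a hand-written stand-in for text parsed from real man pages.

MAN_SNIPPETS = {
    ("ls", "-l"): "use a long listing format",
    ("ls", "-a"): "do not ignore entries starting with .",
    ("tar", "-x"): "extract files from an archive",
    ("tar", "-f"): "use the following archive file",
}

def gloss(command: str) -> str:
    """Turn a single Bash command into a natural-language description
    by looking up each flag in the manual snippets."""
    tokens = command.split()
    prog, parts = tokens[0], []
    for tok in tokens[1:]:
        if tok.startswith("-") and len(tok) > 1:
            # Split combined short flags, e.g. -la -> -l -a
            for ch in tok[1:]:
                desc = MAN_SNIPPETS.get((prog, f"-{ch}"))
                if desc:
                    parts.append(desc)
        else:
            parts.append(f"on '{tok}'")
    return f"{prog}: " + "; ".join(parts) if parts else prog

print(gloss("ls -la"))
# ls: use a long listing format; do not ignore entries starting with .
```

Such (command, explanation) pairs are exactly the supervision signal the thesis proposes to use in place of a massive raw corpus.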
The student is required to first create a dataset of Bash sessions paired with their corresponding parsed semantic descriptions. Using this dataset, they will train an encoder that aligns natural language with Bash code. Finally, the student will assess the trained model's ability to produce global, human-readable explanations for specific Bash sessions.
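The first step (dataset creation) requires segmenting raw session logs into individual commands before each one can be paired with its parsed description. A minimal sketch using only the standard library, with a deliberately naive split on shell operators:

```python
import shlex

# Hypothetical sketch of session segmentation for dataset creation:
# split a raw Bash session into individual commands. shlex respects
# quoting; the operator split below is naive and ignores redirections,
# subshells, etc., which a real parser would have to handle.

def split_session(session: str) -> list[list[str]]:
    """Split a one-line Bash session into a list of tokenised commands."""
    commands, current = [], []
    for tok in shlex.split(session):
        if tok in {";", "|", "&&", "||"}:  # command separators
            if current:
                commands.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        commands.append(current)
    return commands

session = "cd /tmp && wget http://example.com/a.sh && chmod +x a.sh ; ./a.sh"
print(split_session(session))  # 4 commands: cd, wget, chmod, ./a.sh
```

Each tokenised command can then be glossed against the manual to produce the paired semantic description.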
See, as example starting points, https://explainshell.com/ and NL2Bash (https://arxiv.org/abs/1802.08979).
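The encoder-alignment step could follow a contrastive (CLIP-style, InfoNCE) objective: embeddings of a Bash session and of its description should be close, while mismatched pairs are pushed apart. A minimal NumPy sketch, where random vectors stand in for the outputs of real Bash and text encoders (the proposal does not fix a specific loss; this is one plausible choice):

```python
import numpy as np

def info_nce(code_emb: np.ndarray, text_emb: np.ndarray,
             temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss over a batch of (code, description) pairs:
    row i of code_emb is the positive match for row i of text_emb."""
    # L2-normalise so dot products are cosine similarities
    code_emb = code_emb / np.linalg.norm(code_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = code_emb @ text_emb.T / temperature   # (batch, batch) similarity
    labels = np.arange(len(logits))                # diagonal = matching pairs

    def xent(lg):
        lg = lg - lg.max(axis=1, keepdims=True)    # numerical stability
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()        # cross-entropy on diagonal

    # Average the code->text and text->code directions
    return (xent(logits) + xent(logits.T)) / 2

rng = np.random.default_rng(0)
batch = rng.normal(size=(4, 16))
# Perfectly aligned pairs give a near-zero loss; random pairings a higher one
aligned = info_nce(batch, batch)
random_pairs = info_nce(batch, rng.normal(size=(4, 16)))
```

Once trained, the text-side encoder can be queried with candidate explanations to produce the global, human-readable session descriptions mentioned above.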
Required skills Python
Machine Learning
Basics of deep learning techniques or language models
Basics of shell scripts
Notes Students with an average grade of at least 27/30 will be preferred
Deadline 14/01/2026
SUBMIT YOUR APPLICATION