Area Engineering
Enhancing Proficiency and Reasoning in Italian Vision Language Models
keywords ARTIFICIAL INTELLIGENCE, LARGE LANGUAGE MODEL, LARGE LANGUAGE MODELS, MULTIMODAL LANGUAGE MODELS, NATURAL LANGUAGE PROCESSING, NATURAL LANGUAGE UNDERSTANDING, VISION LANGUAGE MODELS
Reference persons FLAVIO GIOBERGIA, ELIANA PASTOR
External reference persons Bartolomeo Bocchino
Marco Bergero
Serena Gabbio
Matteo Abbamonte
Description The thesis addresses the need to enhance local Vision Language Models (VLMs) and Multimodal Large Language Models (MLLMs), with a specific focus on improving their proficiency and reasoning capabilities on vision-based tasks in Italian. This is particularly important because such models are trained predominantly on English data, which can lead to suboptimal performance and understanding in non-English contexts.
Local models are needed because relying solely on third-party APIs, which are often English-centric, may not adequately capture the linguistic nuances and cultural references of Italian or other non-English languages. Moreover, dependence on such APIs can be financially prohibitive and may raise concerns about data privacy and sovereignty.
Small, open MLLMs and VLMs offer a viable alternative, since they can be customized and adapted to specific linguistic and cultural requirements. By improving these models, the thesis aims to close the proficiency and reasoning gaps on vision-based tasks, so that they can effectively serve Italian-speaking communities in sectors such as education, healthcare, and local government. This approach not only makes AI technologies more accessible and relevant but also promotes a more inclusive and equitable distribution of these capabilities across linguistic communities.
To achieve this goal, several approaches can be explored: translating an existing dataset into Italian, creating a new Italian dataset, or adapting/fine-tuning an open model once a suitable dataset is available.
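As a rough illustration of the first option (translating an existing dataset), the sketch below assumes a visual question answering dataset stored as (image path, question, answer) records and a translation function supplied by the caller, for example a machine translation model or service. `VQARecord`, `translate_dataset` and the toy dictionary are hypothetical names used only for this example, not part of any existing project. Only the text fields are translated, since the images are language-independent, and a cache avoids re-translating strings that recur often, such as short answers like "yes" or "no".

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class VQARecord:
    image_path: str  # image reference, kept unchanged by translation
    question: str
    answer: str


def translate_dataset(
    records: Iterable[VQARecord], translate: Callable[[str], str]
) -> List[VQARecord]:
    """Translate question and answer of each record, reusing cached translations.

    `translate` is a caller-supplied (hypothetical) English-to-Italian function.
    """
    cache: Dict[str, str] = {}

    def cached(text: str) -> str:
        # Call the translator only the first time a given string is seen.
        if text not in cache:
            cache[text] = translate(text)
        return cache[text]

    return [
        VQARecord(r.image_path, cached(r.question), cached(r.answer))
        for r in records
    ]


if __name__ == "__main__":
    # Toy dictionary standing in for a real translation model.
    toy = {"What color is the car?": "Di che colore è l'auto?", "red": "rosso"}
    out = translate_dataset(
        [VQARecord("img/001.jpg", "What color is the car?", "red")],
        lambda s: toy.get(s, s),
    )
    print(out[0].question)  # Di che colore è l'auto?
```

In practice, machine-translated data would still need review by Italian speakers, since literal translations can miss cultural references or produce answers that no longer match the image.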
Required skills ML & AI Fundamentals: Basic understanding of machine learning, AI, and neural networks
NLP & CV Basics: Knowledge of natural language processing and computer vision principles
Programming Skills: Proficiency in Python and familiarity with AI-related libraries such as PyTorch, Hugging Face, and TensorFlow
Italian Language Competence: Good grasp of the Italian language and culture, to ensure dataset accuracy
Data Handling: Skills in data preprocessing and management for model training
Model Evaluation: Understanding of model evaluation techniques and biases
Research Skills: Ability to conduct literature reviews, formulate questions, and report findings
Deadline 06/11/2025