Politecnico di Torino | Servizi per la didattica

Integrating large-scale language models with knowledge graph to create natural language processing based biomedical tools

Thesis in an external company

keywords BIOMEDICAL TOOLS, KNOWLEDGE GRAPHS, LARGE-SCALE LANGUAGE MODELS, MACHINE LEARNING, NAMED ENTITY DISAMBIGUATION, RECOMMENDER SYSTEMS, SEMANTIC TECHNOLOGIES, SEMANTIC WEB

Reference persons ANTONIO VETRO'

External reference persons Giovanni Garifo (giovanni.garifo@polito.it), Giuseppe Futia (giuseppe.futia@gmail.com)

Research Groups DAUIN - GR-22 - Nexa Center for Internet & Society - NEXA

Thesis type EXPERIMENTAL / DEVELOPMENT, EXPERIMENTAL, IN COMPANY, SOFTWARE DEVELOPMENT

Description Knowledge Graphs (KGs) are receiving growing interest as a knowledge representation framework for modelling the interconnected nature of biomedical data, because they can represent heterogeneous relationships among diagnoses, related treatments, drugs, and associated effects.

Extracting information from textual content and mapping it to KGs is a critical Natural Language Processing (NLP) task for building intelligent advisory systems. More precisely, this task is known as Named Entity Disambiguation (NED): it maps named entities mentioned in the text, e.g., diagnoses and medicines, to the corresponding KG entities, e.g., “Type 2 Diabetes” and “Insulin”.
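
As an illustration of what NED involves, the following is a minimal sketch, not a proposed method. The three-entity KG, the alias lists, the descriptions, and the names `KGEntity`, `TOY_KG`, `candidates`, and `disambiguate` are all assumptions made up for this example. Candidates are found by alias lookup and ranked by word overlap between the sentence and each entity description, a deliberately naive stand-in for a real model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class KGEntity:
    entity_id: str
    label: str
    aliases: tuple[str, ...]
    description: str


# Hypothetical toy KG; real biomedical KGs contain far more entities.
TOY_KG = [
    KGEntity("E1", "Type 2 Diabetes", ("t2d", "type 2 diabetes", "diabetes"),
             "chronic metabolic disease with high blood sugar and insulin resistance"),
    KGEntity("E2", "Type 1 Diabetes", ("t1d", "type 1 diabetes", "diabetes"),
             "autoimmune disease destroying pancreatic beta cells in children"),
    KGEntity("E3", "Insulin", ("insulin",),
             "hormone drug that lowers blood sugar"),
]


def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(text.lower().replace(",", " ").replace(".", " ").split())


def candidates(mention: str) -> list[KGEntity]:
    """Candidate generation: every entity that lists the mention as an alias."""
    m = mention.lower()
    return [e for e in TOY_KG if m in e.aliases]


def disambiguate(mention: str, context: str) -> KGEntity | None:
    """Pick the candidate whose description shares the most words with the context."""
    cands = candidates(mention)
    if not cands:
        return None
    ctx = tokens(context)
    return max(cands, key=lambda e: len(ctx & tokens(e.description)))


sentence = "Adult patient with insulin resistance and high blood sugar"
match = disambiguate("diabetes", sentence)
print(match.label if match else None)
```

The ambiguous mention "diabetes" has two candidates in this toy KG; the surrounding words decide between them. A mention with no matching alias returns `None`.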

Despite the widespread success of Large-scale Language Models (LLMs) in NLP tasks, there are still many opportunities to leverage the structural knowledge encoded in KGs for NED. The thesis will allow the student to develop new approaches in this field, designing hybrid strategies (LLMs + KGs) to create useful NLP-based biomedical tools. In particular, the student will investigate which KG features have the most significant impact in this context.
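
One way to picture a hybrid strategy is the sketch below, which rests on several stated assumptions. The text score stands in for whatever a language model would assign to a candidate, the only KG feature used is neighbourhood overlap with entities already linked in the same document, and the two are mixed linearly with a weight `alpha`. The edge set `TOY_EDGES` and the functions `kg_coherence`, `hybrid_score`, and `rank` are hypothetical and only illustrate the idea; which KG features actually matter is the open question of the thesis.

```python
# Hypothetical neighbourhoods of two candidate entities in a toy KG.
TOY_EDGES = {
    "Type 2 Diabetes": {"Metformin", "Insulin", "Obesity"},
    "Type 1 Diabetes": {"Insulin", "Autoimmunity"},
}


def kg_coherence(candidate: str, linked_entities: set) -> float:
    """Fraction of already-linked entities that are KG neighbours of the candidate."""
    if not linked_entities:
        return 0.0
    neighbours = TOY_EDGES.get(candidate, set())
    return len(neighbours & linked_entities) / len(linked_entities)


def hybrid_score(text_score: float, candidate: str,
                 linked_entities: set, alpha: float = 0.5) -> float:
    """Linear mix of a text-based score and the KG structural feature."""
    return alpha * text_score + (1 - alpha) * kg_coherence(candidate, linked_entities)


def rank(text_scores: dict, linked_entities: set, alpha: float = 0.5) -> list:
    """Order candidates by hybrid score, best first."""
    return sorted(
        text_scores,
        key=lambda c: hybrid_score(text_scores[c], c, linked_entities, alpha),
        reverse=True,
    )


# Placeholder text scores: the text-only signal slightly prefers Type 1.
text_scores = {"Type 1 Diabetes": 0.55, "Type 2 Diabetes": 0.50}
linked = {"Metformin", "Obesity"}  # entities already linked in the same document

print(rank(text_scores, linked, alpha=1.0))  # text score only
print(rank(text_scores, linked, alpha=0.5))  # text score plus KG feature
```

With `alpha=1.0` the ranking follows the text score alone; with `alpha=0.5` the KG neighbourhood moves "Type 2 Diabetes" to the top, because both linked entities are its neighbours in the toy graph.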

The thesis is offered in collaboration with the company Graph Aware S.r.l. Interaction with the company will take place both remotely and in person at the Nexa Center for Internet & Society.