Information extraction from semi-structured documents
Keywords COMPUTER VISION, DEEP LEARNING, NATURAL LANGUAGE PROCESSING, TRANSFORMERS
Reference persons FABRIZIO LAMBERTI, LIA MORRA
Research Groups DAUIN - GR-09 - GRAphics and INtelligent Systems - GRAINS
Description The main goal of this thesis is to design and implement a proof of concept for extracting information items (such as date, gross amount, and tax code) from scanned semi-structured documents (invoices), in close collaboration with a leading insurance company. The proposed system will exploit recent advances in self-supervised vision-language-layout transformer models, which analyze each document element based on its content (graphical and textual) as well as its position and relationship to the rest of the document [1,2]. Starting from a generic model pre-trained on English-language data, the system will be fine-tuned for information extraction on Italian documents. Performance will be evaluated using standard and ad hoc information extraction metrics reflecting (a) the percentage of documents from which the required information can be extracted to a sufficient degree and (b) the accuracy of the extracted information. Experience with deep learning and PyTorch is a prerequisite. Good programming and analytical skills are required.
Suggested reading:
[1] Unifying Vision, Text, and Layout for Universal Document Processing. https://arxiv.org/pdf/2212.02623.pdf
[2] OCR-free document understanding transformer. https://arxiv.org/pdf/2111.15664.pdf
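As an illustration of the two evaluation criteria named in the description, the following is a minimal Python sketch, not the project's actual evaluation protocol. The function names (normalize, field_accuracy, document_coverage), the exact-match comparison after whitespace and case normalization, the min_fraction threshold used to decide when a document counts as "sufficiently extracted", and the sample invoices are all assumptions made for this example.

```python
from typing import Dict, List, Optional

# One document = mapping from field name to extracted value (None if missing).
Fields = Dict[str, Optional[str]]


def normalize(value: Optional[str]) -> Optional[str]:
    """Collapse whitespace and lower-case a value; None stays None."""
    return None if value is None else " ".join(value.split()).lower()


def field_accuracy(predictions: List[Fields], references: List[Fields]) -> float:
    """Criterion (b): fraction of reference fields predicted exactly (after normalization)."""
    correct = total = 0
    for pred, ref in zip(predictions, references):
        for name, expected in ref.items():
            total += 1
            if normalize(pred.get(name)) == normalize(expected):
                correct += 1
    return correct / total if total else 0.0


def document_coverage(predictions: List[Fields], references: List[Fields],
                      min_fraction: float = 1.0) -> float:
    """Criterion (a): share of documents with at least min_fraction of fields correct.

    min_fraction is a hypothetical threshold; the description does not define
    what "sufficiently extracted" means.
    """
    covered = 0
    for pred, ref in zip(predictions, references):
        hits = sum(normalize(pred.get(k)) == normalize(v) for k, v in ref.items())
        if ref and hits / len(ref) >= min_fraction:
            covered += 1
    return covered / len(references) if references else 0.0


# Made-up sample data for two invoices (values are not real).
refs = [
    {"date": "12/01/2023", "gross_amount": "122.00", "tax_code": "RSSMRA80A01H501U"},
    {"date": "03/02/2023", "gross_amount": "48.50", "tax_code": "VRDLGU75B02F205X"},
]
preds = [
    {"date": "12/01/2023", "gross_amount": "122.00", "tax_code": "rssmra80a01h501u"},
    {"date": "03/02/2023", "gross_amount": "45.80", "tax_code": None},
]

print(f"field accuracy: {field_accuracy(preds, refs):.2f}")        # 0.67
print(f"document coverage: {document_coverage(preds, refs):.2f}")  # 0.50
```

In this toy data the first invoice is fully correct and the second has one correct field out of three, so four of six fields match and one of two documents passes the strict threshold. Lowering min_fraction trades stricter document-level coverage for tolerance of partially extracted invoices.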
Required skills machine learning, deep learning, Python, PyTorch
Deadline 20/02/2024