Application of advanced data analysis techniques to the ContrattiPubblici.org portal
Thesis in external company
keywords DATA ANALYTICS, DATA VISUALIZATION, MACHINE LEARNING, NATURAL LANGUAGE PROCESSING, SEMANTIC TECHNOLOGIES
Reference persons ANTONIO VETRO'
Research Groups DAUIN - GR-22 - Nexa Center for Internet & Society - NEXA
Thesis type SOFTWARE DEVELOPMENT, START UP
Description ContrattiPubblici.org (https://contrattipubblici.org/) is the largest search engine and business intelligence portal for public contracts (including calls for tenders) issued by the Italian public administration. With more than 55 million contracts that connect more than 1 million suppliers and more than 30 thousand contracting authorities, it is the largest knowledge base on public procurement in Italy. A team of about 10 people works continuously on ContrattiPubblici.org, improving a variety of aspects: the data collection pipeline (e.g. crawling and scraping), data cleaning and data quality assessment, data enrichment through machine learning processes (e.g. classification, information extraction), and the front-end (both the search engine and the data visualizations).
The possible thesis activities (to be agreed with the candidate) are the following:
- Scraping and extracting information from government sites;
- Writing and optimizing ETL pipelines on our database;
- Analysis of the quality of data published by the public administration concerning public spending;
- Extraction of information from domain texts using state-of-the-art language models and machine learning techniques;
- Study and implementation of data visualizations to support our users.
The thesis will be carried out at Synapta S.r.l. (https://synapta.it/)
See also https://synapta.it/
Notes When sending your application, we kindly ask you to attach the following information:
- list of exams taken in your master's degree, with grades
- a résumé or equivalent (e.g., LinkedIn profile), if you already have one
- when you aim to graduate, and an estimate of the time you can devote to the thesis in a typical week
Deadline 18/01/2023