Trade-offs in accuracy, privacy and fairness with generative AI-based synthetic data production
keywords ALGORITHM FAIRNESS, ARTIFICIAL INTELLIGENCE, DATA ETHICS, DATA QUALITY, DATA SCIENCE, EXPLAINABLE AI, GENERATIVE AI, HUMAN-COMPUTER INTERACTION, SOFTWARE ENGINEERING, SYNTHETIC DATA
Reference persons ANTONIO VETRO'
External reference persons Marco Rondina (https://nexa.polito.it/people/mrondina)
Research Groups DAUIN - GR-22 - Nexa Center for Internet & Society - NEXA
Thesis type RESEARCH / EXPERIMENTAL, RESEARCH, INNOVATIVE
Description Synthetic data generation is fundamental in contexts where data are scarce or the economic resources to collect them are limited. However, several challenges remain open in this research field, the most important being the trade-off between privacy, fairness and accuracy. The goal of this thesis proposal is to design, develop and test new generative models for synthetic data production that preserve privacy, guarantee fairness and maintain good levels of accuracy. The focus will be mostly on tabular data, because this type of data is used in most applications where fairness and privacy are paramount (for example, allocating social benefits or economic resources).
The newly developed techniques will be compared with state-of-the-art generative models (e.g., language models, variational autoencoders, generative adversarial networks, diffusion models, self-supervised learning) and with traditional probabilistic methods for dataset generation (e.g., Bayesian networks, univariate kernel density estimation) on a variety of evaluation measures, such as: distance from the original population; differential privacy; imbalance; and fairness and accuracy of models trained and tested on generated data.
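As an illustration only, three of the evaluation measures listed above could be sketched as follows. This is a minimal sketch under stated assumptions, not the evaluation protocol of the thesis: the function names are hypothetical, total variation distance is just one possible choice for "distance from the original population", normalized Shannon entropy is one possible imbalance measure, and demographic parity difference is one possible fairness measure. Differential privacy and model accuracy are not covered here.

```python
from collections import Counter
import math


def total_variation_distance(real, synthetic):
    """Total variation distance between the empirical distributions of one
    categorical column in the real and in the synthetic data (0 = identical,
    1 = disjoint). Assumed here as a stand-in for 'distance from the
    original population'."""
    p, q = Counter(real), Counter(synthetic)
    n_p, n_q = len(real), len(synthetic)
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p[c] / n_p - q[c] / n_q) for c in categories)


def normalized_entropy(values):
    """Shannon entropy divided by log(k), where k is the number of observed
    categories: 1.0 means perfectly balanced, values near 0 mean strongly
    imbalanced. Assumed here as one possible imbalance measure."""
    counts = Counter(values)
    k = len(counts)
    if k < 2:
        return 0.0  # a single category carries no balance information
    n = len(values)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(k)


def demographic_parity_difference(predictions, groups, positive=1):
    """Largest gap in positive-prediction rate between the groups of a
    protected attribute (0 = equal rates). Assumed here as one possible
    fairness measure."""
    totals, positives = Counter(), Counter()
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        if pred == positive:
            positives[group] += 1
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)
```

Each function takes plain Python lists, so a column of a tabular dataset can be passed directly; in practice the same measures would be computed per column or per protected attribute and then aggregated.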
Considering the different (and mostly conflicting) evaluation dimensions mentioned above, the high-level research questions are:
RQ 1. How can state-of-the-art generative models for synthetic data generation be improved so that they preserve privacy and fairness while maintaining acceptable levels of accuracy?
RQ 2. What trade-off between privacy, fairness and accuracy do the newly developed techniques reach in comparison with:
• 2.1 state-of-the-art generative AI models?
• 2.2 traditional probabilistic methods?
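One possible way to frame the comparison in RQ 2 is through Pareto dominance: a technique is preferable to another only if it is at least as good on every dimension and strictly better on at least one. The sketch below is an assumption for illustration, not the method prescribed by the proposal; the function names are hypothetical, and it assumes every score has been scaled so that higher is better.

```python
from typing import Dict, List

# Scores per technique, e.g. {"privacy": ..., "fairness": ..., "accuracy": ...}.
# Assumption: all scores are oriented so that higher is better.
Scores = Dict[str, float]


def dominates(a: Scores, b: Scores) -> bool:
    """True if `a` is at least as good as `b` on every dimension and
    strictly better on at least one."""
    return all(a[k] >= b[k] for k in a) and any(a[k] > b[k] for k in a)


def pareto_front(candidates: Dict[str, Scores]) -> List[str]:
    """Names of the techniques that no other technique dominates."""
    return [
        name
        for name, scores in candidates.items()
        if not any(
            dominates(other, scores)
            for other_name, other in candidates.items()
            if other_name != name
        )
    ]
```

Techniques on the resulting front represent different trade-offs rather than a single winner, which matches the fact that the three dimensions are mostly conflicting.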
Required skills Good programming skills. Proficiency in data analysis techniques and tools. Basic knowledge of AI techniques. Research aptitude and curiosity to cross disciplinary boundaries. A grade point average equal to or higher than 26 will play a relevant role in the selection.
Notes When sending your application, we kindly ask you to attach the following information:
- a list of the exams taken in your master's degree, with grades and grade point average
- a résumé or equivalent (e.g., LinkedIn profile), if you already have one
- your target graduation date and an estimate of the time you can devote to the thesis in a typical week
Deadline 01/03/2025