Detecting the risk of discrimination in classifiers with imbalance measures
Keywords DATA QUALITY, DATA SCIENCE, OPEN DATA, OPEN GOVERNMENT DATA, SOFTWARE ENGINEERING
Reference persons MARCO TORCHIANO, ANTONIO VETRO'
Research groups DAUIN - GR-16 - SOFTWARE ENGINEERING GROUP - SOFTENG, DAUIN - GR-22 - Nexa Center for Internet & Society - NEXA
Thesis type EXPERIMENTAL, RESEARCH, APPLIED EXPERIMENTAL
Description Having imbalanced classes in a training set can impact the outcome of the classification model in several ways.
First, an imbalanced dataset can bias the classification model towards the majority class. If, for example, the training set consists mostly of negative examples, the model will be more likely to predict the negative class, even when presented with examples of the positive class. This can lead to poor performance on the minority class, such as low recall or precision.
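The majority-class bias can be illustrated with a minimal sketch. It assumes a hypothetical toy label set (95 negatives, 5 positives) and a deliberately extreme "classifier" that always predicts the most frequent training label. The function names are illustrative and not part of the thesis material.

```python
from collections import Counter


def majority_baseline(train_labels):
    """Return a predictor that always outputs the most frequent training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda _features: majority


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true positives that the predictor recovers."""
    actual_pos = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not actual_pos:
        return 0.0
    return sum(p == positive for p in actual_pos) / len(actual_pos)


# Hypothetical toy labels: 95 negatives (0) and 5 positives (1).
labels = [0] * 95 + [1] * 5
predict = majority_baseline(labels)
predictions = [predict(None) for _ in labels]

acc = accuracy(labels, predictions)
rec = recall(labels, predictions, positive=1)
print(f"accuracy={acc:.2f}, minority recall={rec:.2f}")
```

On this toy data the overall accuracy looks high while the minority class is never detected, which is the kind of gap that aggregate metrics can hide.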
Second, imbalanced data can also make it more difficult for the model to learn the underlying relationships between the features and the classes. With a large number of negative examples and a small number of positive examples, the model may have difficulty finding the signal in the data that distinguishes the positive class from the negative class. This can lead to suboptimal model performance on both classes.
In both cases, when the automated decisions concern individuals, such disparate performance of the algorithm means, in practice, systematically and unfairly discriminating against certain individuals or groups of individuals in favor of others [by denying] an opportunity for a good or [assigning] an undesirable outcome to an individual or groups of individuals on grounds that are unreasonable or inappropriate.
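One possible way to quantify disparate performance across groups is to compare recall per group. The sketch below is an assumption for illustration only: the choice of recall as the metric, the function names, and the toy data are hypothetical and are not taken from the proposal.

```python
from collections import defaultdict


def recall_by_group(y_true, y_pred, groups, positive=1):
    """Recall computed separately for each group that has at least one positive."""
    hits = defaultdict(int)
    positives = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == positive:
            positives[group] += 1
            if pred == positive:
                hits[group] += 1
    return {g: hits[g] / positives[g] for g in positives}


def recall_gap(y_true, y_pred, groups, positive=1):
    """Difference between the best and the worst per-group recall."""
    per_group = recall_by_group(y_true, y_pred, groups, positive)
    return max(per_group.values()) - min(per_group.values())


# Hypothetical toy data: group "A" has 6 individuals, group "B" has 4.
y_true = [1, 1, 1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 0, 1]
groups = ["A"] * 6 + ["B"] * 4

per_group = recall_by_group(y_true, y_pred, groups)
gap = recall_gap(y_true, y_pred, groups)
print(per_group, gap)
```

A gap close to zero indicates similar recall across groups; a larger gap means that positive cases in one group are missed more often than in another.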
The goal of the thesis is to test the capability of imbalance measures to predict unfair classifications.
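As a rough illustration of what an imbalance measure can look like, the sketch below computes three common indices over the values of a categorical attribute: normalized Shannon entropy, normalized Gini index, and the imbalance ratio. These are assumptions chosen for illustration, not necessarily the metrics provided with the thesis material, and the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_balance(values):
    """Normalized Shannon entropy: 1 = perfectly balanced, 0 = a single category.

    Only categories that actually occur in `values` are counted.
    """
    counts = Counter(values)
    k = len(counts)
    if k < 2:
        return 0.0
    n = len(values)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(k)


def gini_balance(values):
    """Normalized Gini index: 1 = perfectly balanced, 0 = a single category."""
    counts = Counter(values)
    k = len(counts)
    if k < 2:
        return 0.0
    n = len(values)
    gini = 1.0 - sum((c / n) ** 2 for c in counts.values())
    return gini * k / (k - 1)


def imbalance_ratio(values):
    """Size of the largest category divided by the size of the smallest one."""
    counts = Counter(values)
    return max(counts.values()) / min(counts.values())


# Hypothetical attribute columns: one balanced, one skewed 90/10.
balanced = ["a", "b"] * 50
skewed = ["a"] * 90 + ["b"] * 10

for name, column in [("balanced", balanced), ("skewed", skewed)]:
    print(name, shannon_balance(column), gini_balance(column), imbalance_ratio(column))
```

Scores of this kind could be computed on the training data before any model is fitted and then compared with fairness outcomes measured on the resulting classifier.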
Previous work on the topic and material (a few initial metrics, code, datasets) will be made available.
Required skills Good programming skills and basic knowledge of common data analytics tools and techniques. A grade point average of 26 or higher may be used as a criterion for selecting candidates.
Notes When sending your application, please attach the following information:
- list of exams taken in your master's degree, with grades and grade point average
- a résumé or equivalent (e.g., LinkedIn profile), if you already have one
- by when you aim to graduate and an estimate of the time you can devote to the thesis in a typical week
Proposal valid until 30/11/2024