Epigenetic mechanisms in cancer development
keywords COMPLEX NETWORKS, MACHINE LEARNING, STATISTICS, SYSTEMS BIOLOGY
Reference persons ALFREDO BENSO, GIANFRANCO MICHELE MARIA POLITANO
External reference persons Dr. Sandro Gambino - Ospedale di Rivoli
Research Groups DAUIN - GR-19 - SYSTEM BIOLOGY GROUP - SYSBIO
Thesis type APPLIED RESEARCH
Description Epigenetic mechanisms, such as DNA methylation (in particular, methylation of CpG islands and shores), have gained increasing attention in recent years as regulators of gene expression.
Loss of regulation of a molecular pathway is the critical event in the etiology of cancer.
The first issue to resolve is that the expression of any given gene in a subtype of cancer is likely to be affected in only a small fraction of individuals, since there are many potential genes that may drive pathway deregulation (Ochs, 2014, IEEE article).
This makes outlier statistics very useful for ranking genes in omics analyses (Wei, 2021).
An outlier can be generally defined as "an observation (or subset of observations) which appears to be inconsistent with the remainder of that set of data". Many methods have been proposed to identify such outliers. Classical statistical approaches are generally not applicable to data with more variables than observations, as is typical of high-dimensional data (Zimek, 2018).
Other approaches, including machine learning based methods, have been proposed to identify outliers.
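As an illustration of what a simple outlier statistic on methylation data might look like, the sketch below flags samples whose beta value at a CpG site lies far from the cohort median, using a robust z-score based on the median absolute deviation (MAD). This is only one possible baseline, not the method proposed for the thesis; the function name `mad_outliers`, the threshold of 3.5, and the toy data are assumptions made for the example.

```python
import numpy as np

def mad_outliers(beta, threshold=3.5):
    """Flag outlier samples per CpG site with a robust (median/MAD) z-score.

    beta: array of shape (n_samples, n_cpgs) with methylation beta values in [0, 1].
    Returns a boolean array of the same shape (True = outlier).
    """
    beta = np.asarray(beta, dtype=float)
    median = np.median(beta, axis=0)
    mad = np.median(np.abs(beta - median), axis=0)
    # 1.4826 makes the MAD comparable to a standard deviation under normality.
    # Sites with zero MAD get an infinite scale, so nothing is flagged there.
    scale = np.where(mad > 0, 1.4826 * mad, np.inf)
    z = (beta - median) / scale
    return np.abs(z) > threshold

# Toy cohort (hypothetical values): 9 samples, 3 CpG sites, all around 0.2
beta = np.tile(np.linspace(0.18, 0.22, 9)[:, None], (1, 3))
beta[4, 1] = 0.9  # one hypermethylated sample at the second site

flags = mad_outliers(beta)
print(np.argwhere(flags))  # (sample, site) indices of the flagged values
```

A per-site statistic like this ignores the dependence between neighbouring sites, which is one reason multivariate or machine learning based detectors may be preferable.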
A further issue is that the methylation of a single CpG site can rarely be considered, on its own, to influence the expression of a gene.
Co-methylation is defined as the similarity or the strong correlation of methylation signals between CpG sites. Within-sample co-methylation refers to methylation patterns between consecutive or nearby sites in one chromosome region (Sun A, 2022).
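One possible way to operationalize co-methylation, given here only as a sketch, is to compute the Pearson correlation of beta values across samples for pairs of CpG sites that lie within a fixed genomic distance of each other. The function name `comethylated_pairs`, the 1000 bp window, the 0.8 correlation cutoff, and the toy data are all assumptions chosen for illustration.

```python
import numpy as np

def comethylated_pairs(beta, positions, max_distance=1000, min_corr=0.8):
    """List nearby CpG pairs whose methylation is strongly correlated.

    beta: array of shape (n_samples, n_cpgs) with beta values.
    positions: sorted base-pair coordinates of the CpGs on one chromosome.
    Returns a list of (i, j, r) tuples with i < j.
    """
    beta = np.asarray(beta, dtype=float)
    positions = np.asarray(positions)
    # Pearson correlation between columns (CpG sites) across samples.
    # Sites with constant methylation yield NaN and are never selected.
    corr = np.corrcoef(beta, rowvar=False)
    pairs = []
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            if positions[j] - positions[i] > max_distance:
                break  # positions are sorted, so later sites are farther away
            if corr[i, j] >= min_corr:
                pairs.append((i, j, float(corr[i, j])))
    return pairs

# Toy data (hypothetical): 6 samples, 4 CpG sites
base = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
beta = np.column_stack([base, base + 0.05, base[::-1], base * 0.5 + 0.1])
positions = [100, 250, 400, 5000]

pairs = comethylated_pairs(beta, positions)
print(pairs)
```

In this toy example only the first two sites are both close together and positively correlated; the third site is anti-correlated with them, and the fourth is correlated but too distant to count as a neighbour.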
We propose to develop a model to define outliers in epigenetic datasets, based on a machine learning or statistical approach, and then to develop a machine learning approach based on the presence of outliers in gene promoter regions (CpG islands and shores), taking into account aspects such as co-methylation, in order to predict the risk of neoplastic transformation. A second aim is to identify the main pathways involved in neoplastic transformation.
Required skills Python
Machine Learning
Bioinformatics
Statistics
Deadline 30/11/2024