Statistical modeling of genomic sequences
keywords BIOINFORMATICS, COMPUTATIONAL BIOLOGY
Reference persons ANNA FILOMENA CARBONE, RENATO FERRERO
Research Groups DAUIN - GR-05 - ELECTRONIC CAD & RELIABILITY GROUP - CAD
Thesis type RESEARCH
Description The information content of a genomic sequence can be quantified with information-theoretic measures such as entropy, mutual information, and compression complexity. Entropy measures the uncertainty or randomness of the sequence: higher entropy indicates more randomness, lower entropy more predictability. Mutual information quantifies the statistical dependence between different regions of the genome. Compression complexity estimates the shortest possible description length of the sequence; since that length cannot be computed exactly in general, the size of the sequence after compression serves as an upper bound.
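As a rough illustration of the three measures named above, the following Python sketch computes them on a plain string of bases. It is not part of the proposal: the function names are hypothetical, probabilities are estimated from raw counts (which is biased for short sequences), mutual information is taken between bases a fixed number of positions apart rather than between arbitrary regions, and zlib stands in for any general-purpose compressor. Sequences are assumed non-empty and longer than the lag.

```python
import math
import zlib
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy, in bits per base, of the empirical base distribution."""
    n = len(seq)
    counts = Counter(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def lagged_mutual_information(seq: str, lag: int) -> float:
    """Mutual information, in bits, between bases `lag` positions apart.

    Assumption: plug-in estimate from pair counts, with no bias correction.
    """
    pairs = list(zip(seq, seq[lag:]))
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    # I(X;Y) = sum p(a,b) * log2( p(a,b) / (p(a) p(b)) )
    return sum(
        (c / n) * math.log2(c * n / (left[a] * right[b]))
        for (a, b), c in joint.items()
    )


def compression_ratio(seq: str) -> float:
    """Compressed size divided by original size, using zlib as a proxy.

    Assumption: the compressed length is only an upper bound on the
    true description length, and it includes a small fixed overhead.
    """
    data = seq.encode("ascii")
    return len(zlib.compress(data, 9)) / len(data)
```

On a strictly periodic sequence such as `"ACGT" * 1000`, the single-base entropy is at its maximum of 2 bits because all four bases are equally frequent, while the lag-1 mutual information is high and the compression ratio is small. This shows why the measures are complementary: entropy of single bases ignores order, whereas the other two are sensitive to it.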
The thesis activity concerns the study of information-theoretic measures that can provide insights into the functional and evolutionary properties of the genome. These measures will be applied to recognize patterns and regularities in the sequence, to identify functionally important regions of the genome, and to study functional relationships between different parts of the genome. The thesis aims to build a statistical model for analyzing and interpreting genomic sequence data.
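One simple way such a measure might be applied to locate regularities is to compute it over a sliding window and flag the windows that deviate. The sketch below does this for Shannon entropy. It is only an illustration under stated assumptions: the function names, window size, step, and threshold are hypothetical, and a low-entropy window signals a repetitive or low-complexity stretch, not necessarily a functionally important one.

```python
import math
from collections import Counter


def window_entropy(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """Return (start position, entropy in bits per base) for each window."""
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        counts = Counter(chunk)
        h = -sum((c / window) * math.log2(c / window) for c in counts.values())
        profile.append((start, h))
    return profile


def low_entropy_windows(profile: list[tuple[int, float]], threshold: float) -> list[int]:
    """Start positions of windows whose entropy falls below `threshold`."""
    return [start for start, h in profile if h < threshold]


# Toy input: a homopolymer run embedded between two mixed stretches.
toy = "ACGT" * 5 + "A" * 20 + "ACGT" * 5
flagged = low_entropy_windows(window_entropy(toy, window=20, step=20), threshold=1.0)
```

Here `flagged` contains only the start of the homopolymer run. In practice the window size and threshold would need to be chosen against a null model, which is the kind of statistical modeling the thesis is meant to address.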
Required skills programming, probability theory, statistical inference
Deadline 28/03/2025