Bioinformatics

05OVFSM

Academic year 2020/21

Course Language

English

Degree programme(s)

Master of science-level of the Bologna process in Data Science And Engineering - Torino

Course structure
Teaching Hours
Lectures 30
Laboratory exercises 50
Lecturers
Teacher | Status | SSD | h.Les | h.Ex | h.Lab | h.Tut | Years teaching
Ficarra Elisa | External PhD tutor | - | 30 | 0 | 0 | 0 | 8

Context
SSD | CFU | Activities | Area context
ING-INF/05 | 8 | B - Characterizing (Caratterizzanti) | Computer engineering
2020/21
The course studies hardware and software solutions for the analysis of genetic data produced by the latest generation of biotechnologies (e.g., next-generation DNA/RNA sequencers, nanotechnology). It describes the state of the art of these technologies and examines in depth the computational and algorithmic issues involved in developing tool flows for complex genetic analyses (e.g., of genetic mutations and aberrations). It explains the application of machine learning and deep learning techniques (e.g., Random Forest, Neural Networks, CNNs, RNNs, LSTM) to biological and medical problems. It also presents techniques for distributed computing on genetic data, in particular cluster programming. The basic concepts of molecular biology are introduced during the course. The purpose of the course is therefore to make students experts in biomolecular and genetic issues and in the most advanced technologies and processing techniques in the field of biotechnology and genetic analysis.
The course studies software solutions for the analysis of genetic and molecular data. It explores heterogeneous sources of data, ranging from the latest biotechnologies for DNA sequencing to molecular, cellular, and medical images. It examines in depth the computational and algorithmic challenges, and the innovative solutions, involved in complex genetic analyses and in reliable predictive models for prognosis and therapeutic response. Algorithmic issues related to heterogeneous data integration are also explored. For all of these purposes, machine learning and deep learning techniques are designed and applied. The course offers several examples of real applications that exploit techniques from disciplines such as statistics, text mining, pattern recognition, machine learning, and deep learning, drawing on a broad set of methods and models (e.g., CNNs, Bayesian CNNs, Graph Neural Networks, graph convolutional neural networks (GCNN), Attention, Autoencoders, GAN, multiplex communities, and multilayer networks). The purpose of the course is therefore twofold. First, it trains students to become experts in the most advanced technologies and processing techniques for predictive and personalized genetic and image analysis. Second, it provides a wide range of case studies that let students face general problems transcending genetic and medical applications, such as heterogeneous data integration, data cleaning and predictive model optimization, hidden feature discovery and characterization, and interpretability of predictive models.
The student should acquire: i) knowledge of the latest generation of biotechnologies for genetic and molecular screening; ii) knowledge of some of the most up-to-date genetic issues in the personalised medicine approach; iii) knowledge of the main SW solutions for complex bioinformatics analyses, and of computer science techniques such as machine learning, deep learning, text mining, and graph optimization; iv) the ability to design and implement effective and computationally efficient algorithmic solutions for biological problems; v) experience with SW optimization techniques on cluster infrastructures.
The student will acquire:
- an understanding of computational techniques for sequencing data analysis based on text mining and pattern matching, as well as on mathematical optimization and graph theory.
- an understanding of image processing and computer vision techniques for feature description and extraction, predictive model design, and reliability assessment.
- an understanding of new deep learning approaches, such as Graph Neural Networks, graph convolutional neural networks (GCNN), Attention, Transformer Neural Networks, and Bayesian CNNs.
- an understanding of heterogeneous data integration issues, and of data integration methodologies exploiting statistics, multiplex communities and multilayer networks, and deep learning. In this context, the student will learn not only how existing methodologies work but also how to design new architectures and algorithms.
- an understanding of the latest generation of biotechnologies for genetic and molecular screening and of the main SW approaches for complex bioinformatics analyses.
Moreover, the student:
- will learn how to adapt and transfer advanced AI methodologies to genetic studies, bioimage analysis, and clinical applications.
- will learn how to design and implement reliable algorithmic solutions for biological problems in the context of Complex Systems.
- will gain experience with SW optimization techniques on cluster infrastructures.
The course gives the student both theoretical competence and practical experience that can be exploited in general applications, also beyond the genetic and medical fields. Finally, computational biology for the understanding of complex diseases is a very challenging domain that lets the student appreciate the pros and cons of several AI, statistical, and computational approaches.
Computer programming in a high-level language (e.g., C, C++, or Java), and optionally scripting languages.
Computer programming in a high-level language (e.g., Python, C, C++, Java, or MATLAB), and optionally scripting languages. Basic machine learning and deep learning concepts.
- Introduction to bioinformatics: concepts of molecular biology; computational, technological, and efficacy requirements of the algorithms; relevant problems in research, industry, and business.
- DNA-, microRNA-, and RNA-sequencing: description of sequencing technologies; algorithmic and computational issues; main tools used for sequencing and data analysis; issues related to software development for advanced analyses; SW optimization on parallel and distributed infrastructures.
- Bioinformatics techniques for the study and prediction of regulatory processes: SW techniques for the prediction of molecular interactions; data integration and correlation; derivation of regulatory networks in Complex Systems.
- Machine learning and deep learning techniques: introduction to the application of some well-known and widely used methodologies (e.g., Neural Networks, Random Forest, Convolutional Neural Networks, Recurrent Neural Networks, LSTM) to genetic and biological studies.
- Cluster programming: implementation of analysis pipelines on computer clusters; job scheduling, decomposition, and parallelization; optimization of computational resources.
- Introduction to bioinformatics: concepts of molecular biology; computational, technological, and efficacy requirements of the algorithms; relevant problems in research, industry, and business.
- DNA- and RNA-sequencing: description of sequencing technologies; algorithmic and computational issues; methods and tools for DNA/RNA sequencing data analysis (exploiting text mining and pattern matching techniques, as well as mathematical optimization and graph theory).
- Radiomics and biological image characterization and analysis: 2D and 3D texture descriptors; shallow and deep learning methods for extracting distinctive and hidden features from fluorescent, histological, and medical images; design and optimization of predictive models; methods and architectures for predictive model interpretability (with general applications beyond the medical field).
- Deep learning techniques: translation of advanced AI methodologies (e.g., Convolutional Neural Networks, Bayesian CNNs, Graph Neural Networks, graph convolutional neural networks (GCNN), Attention, Autoencoders, GAN, LSTM) into genetic studies and clinical applications; design of new architectures for distinctive feature extraction, predictive models, interpretability, and knowledge retrieval.
- Techniques for the integration of heterogeneous data: heterogeneous data integration issues; analysis of heterogeneous genetic and omics data; models and architectures for data integration (also beyond medical applications).
- Statistics, multiplex communities, and multilayer networks for Complex Systems.
The course will include exercises and computer lab sessions on some of the studied tools and on the development of new SW solutions.
The course will consist half of theoretical lectures and half of laboratory sessions.
LECTURES:
- lessons will always be video recorded.
- they will be provided for both off-line and on-line study, according to the directives of the rector.
- the Virtual Classroom on the Polito portal, or other platforms such as Zoom or Teams, will be used for virtual learning.
- social networks, such as Telegram, will be used to keep students and teaching staff in touch and to share information.
LAB:
- the course will include exercises and computer lab sessions on some of the studied algorithms and on the development of new SW solutions and architectures.
- the most used language will be Python; MATLAB, C, and C++ may optionally be used as well.
- labs will be provided both onsite and on the online platforms provided by the Politecnico, according to the rectoral directives.
Course slides, scientific research papers, web documents and short educational movies.
- Course slides
- Scientific research papers
- Web documents
- Short educational movies
- Optionally, machine learning and deep learning books, such as: i) "Deep Learning" by Ian Goodfellow, Yoshua Bengio, and Aaron Courville, MIT Press; ii) "Deep Learning Illustrated: A Visual, Interactive Guide to Artificial Intelligence" (Addison-Wesley Data & Analytics Series) by Jon Krohn, Grant Beyleveld, and Aglaé Bassens, Addison-Wesley Professional, 1st edition (August 5, 2019); iii) others suggested during the course.
Exam: written test; individual or group project; oral discussion of projects.
Alternatively, a final written test or a research project. In both cases, the learning outcomes detailed above are evaluated (see in particular points i), ii), iii) and iv)).
In detail:
Written test on all the course topics (max score 30L/30):
- It consists of 2 questions about course topics and 2 exercises based on Python programming.
- Duration: 2 h.
- The use of notes, course slides, handbooks, Python examples, or exercises is forbidden.
Project (max score 30L/30):
- Project groups should have 3-4 members.
- There are no midterm deadlines for project completion.
- The oral presentation covers the project development, context, issues, and results (slides). Questions concern the project and related topics covered in the course. A demo is also required.
- A user manual and the developed code must be provided to the professor no later than two days before the project presentation.
Exam: Compulsory oral exam; Paper-based written test with video surveillance by the teaching staff; Individual project; Group project;
The exam can be taken in one of the following two modalities:
- written test (with both a theoretical and a programming part);
- individual or group project with an oral discussion of the project.
In detail:
1) Written test covering all the course topics (max score 30L/30):
- It consists of 2 open questions about course topics and 2 exercises based on Python programming.
- Duration: 2 h.
- The use of notes, course slides, handbooks, Python examples, or exercises is forbidden.
In the case of an online exam, the written test will be taken under video surveillance on the course Virtual Classroom platform. The session will be recorded to keep track of the exam.
2) Individual or group project (max score 30L/30):
- the description of the available projects will be provided by the teaching staff at the beginning of December.
- the available projects will have analytical, design, or implementation/optimization goals, depending on the specific project. Depending on the complexity of the project, the teaching staff will limit the maximum number of students in the group.
- there will be no midterm deadlines for project completion.
- projects can be developed without time constraints.
- the projects will be discussed in an oral presentation; the date of the presentation will be agreed with the student.
Preparation for the project presentation and discussion:
- the oral presentation will cover the project development, context, issues, and results (slides). Questions will concern the project development, the results, and related theoretical topics covered during the course. A demo will also be required.
- a user manual covering technical issues, together with the developed code and results, must be provided to the teaching staff no later than three days before the project presentation.
In the case of an online exam, the oral presentation will take place through the course Virtual Classroom or another platform such as Zoom or Teams. The session will be recorded to keep track of the exam.
Exam: Written test (in class); Compulsory oral exam; Paper-based written test with video surveillance by the teaching staff; Individual project; Group project;


© Politecnico di Torino
Corso Duca degli Abruzzi, 24 - 10129 Torino, ITALY