PORTALE DELLA DIDATTICA

Deep natural language processing

01VIXSM

Academic year 2024/25

Course Language

English

Degree programme(s)

Master of science-level of the Bologna process in Data Science And Engineering - Torino

Course structure
Teaching hours
Lectures: 41
Classroom exercises: 18
Laboratory exercises: 21
Lecturers
Teacher: Luca Cagliero
Status: Associate Professor
SSD: IINF-05/A
Hours: lectures 41, exercises 12, laboratory 0, tutoring 0
Years teaching: 4
Co-lectures

Context
SSD: ING-INF/05
CFU: 8
Activities: B - Characterizing activities
Area context: Computer Engineering
2024/25
The course aims to introduce the fundamentals of Natural Language Processing, the main Deep Learning solutions for learning word, sentence, and contextualized embeddings (e.g., Word2Vec, GloVe, BERT), and the main NLP applications (e.g., entity recognition, text categorization, intent detection, text summarization).
The course aims to introduce:
- the fundamentals of Natural Language Processing;
- the main Deep Learning solutions for learning word, sentence, and contextualized embeddings (e.g., Word2Vec, BERT);
- the fundamentals of recommender systems;
- the basics of Natural Language Generation and Large Language Models;
- the main NLP applications (e.g., entity recognition, text categorization, intent detection, text summarization).
- Knowledge of text preprocessing and transformation techniques.
- Knowledge of the main Deep Learning architectures for inferring vector representations of text.
- Knowledge of the key NLP tasks and application contexts (entity recognition, question answering, intent detection, text categorization, machine translation, sentiment analysis).
- Ability to design and implement a recommender system.
- Ability to study, design, implement, and test a text summarization algorithm.
- Ability to design a full NLP pipeline, including the requirement analysis, methodology design and implementation, performance assessment, and result visualization.
- Knowledge of text preprocessing and transformation techniques.
- Knowledge of the main Deep Learning architectures for learning vector representations of text.
- Knowledge of the key NLP tasks and application contexts (entity recognition, question answering, intent detection, text categorization, machine translation, sentiment analysis).
- Ability to design and implement a recommender system.
- Knowledge of the fundamentals of Generative AI applied to text and ability to use Large Language Models.
- Ability to study, design, implement, and test a text summarization algorithm.
- Ability to use NoSQL databases to store and query textual data.
- Ability to design a full NLP pipeline, including the requirement analysis, methodology design and implementation, performance assessment, and result visualization.
- Fundamentals of data science, machine learning, and deep learning.
- Basic knowledge of the Python language.
The course covers the following topics:
- Natural Language Processing fundamentals: text characteristics, text preparation, topic modelling, overview of the main NLP applications (1.25 cr.)
- Vector representations of text: word embedding architectures and shallow sentence embedding architectures (1.25 cr.)
- Contextualized embeddings and the attention mechanism (0.9 cr.)
- Entity Recognition, Intent Detection, and Question Answering (1.2 cr.)
- Text summarization (0.9 cr.)
- Machine Translation (0.45 cr.)
- Recommender Systems (0.45 cr.)
- Application of NoSQL databases to Information Retrieval: Elasticsearch (0.45 cr.)
- Text Categorization and Sentiment Analysis (0.6 cr.)
- NLP pipeline design: requirement analysis, methodology design and implementation, empirical assessment, outcome presentation (0.6 cr.)
The course covers the following topics:
- Natural Language Processing fundamentals: text characteristics, text preparation, topic modelling, overview of the main NLP applications (1.25 cr.)
- Vector representations of text: word embedding architectures and shallow sentence embedding architectures (1.25 cr.)
- Contextualized embeddings and the attention mechanism (0.9 cr.)
- Entity Recognition, Intent Detection, and Question Answering (1.2 cr.)
- Text summarization (0.6 cr.)
- Natural Language Generation and Large Language Models (0.6 cr.)
- Machine Translation (0.4 cr.)
- Recommender Systems (0.4 cr.)
- Application of NoSQL databases to Information Retrieval (0.4 cr.)
- Text Categorization and Sentiment Analysis (0.4 cr.)
- NLP pipeline design: requirement analysis, methodology design and implementation, empirical assessment, outcome presentation (0.6 cr.)
The course includes classroom lectures, covering the topics described above, and practice sessions on the lecture topics (1.8 cr.), in particular text preprocessing, word and contextualized embeddings, the Elasticsearch NoSQL database, text summarization, text categorization, sentiment analysis, and NLP pipeline design. Students will develop a team project and deliver a written report, which contributes to the final exam grade. The course also includes laboratory sessions (2.1 cr.) on text preparation, word and sentence embeddings, intent detection, entity recognition, recommender systems, text summarization, machine translation, and chatbot architectures. Laboratory sessions allow hands-on experimentation with the most widespread open-source products.
The course includes classroom lectures (4.1 cr.), covering the topics described above, and practice sessions on the lecture topics (1.8 cr.), in particular text preprocessing, word and sentence embeddings, Transformers, information retrieval, machine translation, Large Language Models, text summarization, text categorization, recommender systems, and NLP pipeline design. Students will develop a team project and deliver a written report, which contributes to the final exam grade. The course also includes laboratory sessions (2.1 cr.) on text preparation, word and sentence embeddings, Transformers, intent detection, recommender systems, text summarization, machine translation, Large Language Models, and chatbot architectures. Laboratory sessions allow hands-on experimentation with the most widespread open-source products.
- Neural Network Methods for Natural Language Processing. Yoav Goldberg.
Class handouts will be made available through the Portale della Didattica. Additional readings (covering the most relevant course topics):
- Embeddings in Natural Language Processing: Theory and Advances in Vector Representations of Meaning. Mohammad Taher Pilehvar, Jose Camacho-Collados. Morgan & Claypool. ISBN: 9781636390215
- Neural Network Methods for Natural Language Processing. Yoav Goldberg. Morgan & Claypool. ISBN: 9781627052955
- Deep Learning in Natural Language Processing. Li Deng, Yang Liu (Eds.). Springer. ISBN: 9789811052088
- Natural Language Processing with Transformers. Lewis Tunstall, Leandro von Werra, Thomas Wolf. O'Reilly Media, February 2022. ISBN: 9781098103248
Lecture slides; Textbook; Exercises; Exercises with solutions; Lab exercises; Lab exercises with solutions; Video lectures (previous years).
Exam: Written test (in class); Group project.
The exam comprises two mandatory parts:
1) a written test on the theoretical aspects introduced during the course, with closed and/or open-ended questions (max. 22 points);
2) the discussion and evaluation of the final report on a team project assigned during the course (max. 10 points).
The final score is the sum of the points achieved in the written test and in the evaluation of the final report. 30 with honors (30L) is assigned if the final score is 31 or higher.
Learning objectives assessment:
The written part will assess:
- NLP fundamentals
- Embedding models
- Recommender Systems
- Text categorization and sentiment analysis
- Elasticsearch
- Other NLP applications covered by the course (e.g., Entity Recognition, Intent Detection, Question Answering, Text Summarization, Machine Translation)
The team project will assess:
- the ability to define the requirements and the problem statement;
- the ability to study, design, and implement a full NLP pipeline;
- the ability to develop and test efficient and effective text analytics solutions;
- the ability to set up, run, and collect a sufficiently large set of empirical results;
- the ability to present the requirement analysis, the methodology design, and the outcomes of the empirical analyses.
In the written part, the points assigned to each question or exercise are clearly indicated next to it. Wrong answers may incur a score penalty; missing answers receive no penalty. The exam is closed-book. Electronic devices are not allowed.
The exam comprises two mandatory parts:
1) a written test on the theoretical aspects introduced during the course, with closed and/or open-ended questions (max. 22 points);
2) the discussion and evaluation of the final report on a team project assigned during the course (max. 10 points).
The final score is the sum of the points achieved in the written test and in the evaluation of the final report. 30 with honors (30L) is assigned if the final score is 31 or higher.
Learning objectives assessment:
The written part will assess:
- NLP fundamentals
- Embedding models
- Recommender Systems
- Text categorization and sentiment analysis
- Natural Language Generation and Large Language Models
- Information Retrieval
- Other NLP applications covered by the course (e.g., Entity Recognition, Intent Detection, Question Answering, Text Summarization, Machine Translation)
The team project will assess:
- the ability to define the requirements and the problem statement;
- the ability to study, design, and implement a full NLP pipeline;
- the ability to develop and test efficient and effective text analytics solutions;
- the ability to set up, run, and collect a sufficiently large set of empirical results;
- the ability to present the requirement analysis, the methodology design, and the outcomes of the empirical analyses.
In the written part, the points assigned to each question or exercise are clearly indicated next to it. Wrong answers may incur a score penalty; missing answers receive no penalty. The written exam lasts 1 hour and is closed-book. Electronic devices are not allowed.
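The scoring rule reduces to a sum of the two parts plus a single threshold for honors. The Python sketch below illustrates it. `final_grade` is a hypothetical helper, not an official tool, and it assumes integer points. The page states neither a minimum passing score nor any rounding rule, so the sketch models neither.

```python
def final_grade(written: int, project: int) -> str:
    """Hypothetical helper: combine the two exam parts into a final grade.

    Assumes integer points. Passing thresholds and rounding are not
    described on the course page, so they are not modelled here.
    """
    # Check each part against its stated maximum.
    if not 0 <= written <= 22:
        raise ValueError("written test score must be between 0 and 22")
    if not 0 <= project <= 10:
        raise ValueError("project report score must be between 0 and 10")
    total = written + project  # final score = written test + project report
    # A final score of 31 or more is recorded as 30 with honors (30L).
    return "30L" if total >= 31 else str(total)


# Usage examples
print(final_grade(20, 8))   # 28
print(final_grade(22, 8))   # 30
print(final_grade(21, 10))  # 30L
```

Because the two maxima add up to 32, honors require losing at most one point across both parts.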
In addition to notifying the university through the online procedure, students with disabilities or Specific Learning Disorders (SLD) are invited to inform the professor in charge of the course directly about the compensatory arrangements for the exam agreed with the Special Needs Unit. The professor must be informed at least one week before the beginning of the examination session, so that the most suitable arrangements can be made for the specific type of exam.