Stress testing chatbots: evaluating factuality, reasoning, abstraction, and other safety challenges
Reference persons ANTONIO VETRO'
Thesis type RESEARCH / EXPERIMENTAL, RESEARCH, INNOVATIVE
Chatbots have become increasingly popular in recent years, with many industries using them to provide customer support, automate conversations, and even offer therapy services. However, chatbots are still in their early stages of development, and there is a need to assess their capabilities and limitations. In this thesis, the student will conduct a series of stress tests that evaluate chatbots' factuality, reasoning, abstraction, and other safety challenges.
How accurate and reliable are chatbots in providing factual information?
To what extent can chatbots conduct deductive or inductive reasoning?
How well can chatbots simulate the abstraction of concepts and provide meaningful responses?
How effectively can chatbots engage in emotional conversations?
What other risks do chatbot conversations embed (e.g., privacy, disinformation, harms of representation, harmful content, cybersecurity, etc.)?
To conduct this study, the student will use a mixed-methods approach that includes both quantitative and qualitative data collection and analysis. The study will involve testing existing chatbots (e.g., ChatGPT) by simulating conversations in different scenarios. The chatbots will be asked questions belonging to different risk categories and prompted for information on a range of topics, including science, history, and current events.
To evaluate the chatbots' performance, the student will conduct a series of stress tests that assess factuality, reasoning, abstraction, and other safety challenges. The student might also conduct surveys and interviews to gather feedback from other people about their experience with the chatbots.
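As an illustration only, the quantitative part of such a stress test could be organised along the lines of the minimal Python sketch below. Everything in it is an assumption made for the example and not part of the proposal: the `TestCase` structure, the `run_stress_tests` function, the `stub_chatbot` stand-in (a real study would call an actual chatbot instead), the sample prompts, and the substring check, which is a deliberately crude proxy for correctness.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass
class TestCase:
    """One hypothetical stress-test item."""
    category: str   # e.g. "factuality" or "reasoning"
    prompt: str     # question sent to the chatbot
    expected: str   # substring that a correct answer should contain


def run_stress_tests(ask: Callable[[str], str],
                     cases: list[TestCase]) -> dict[str, float]:
    """Return, for each category, the fraction of cases answered correctly."""
    passed = defaultdict(int)
    total = defaultdict(int)
    for case in cases:
        answer = ask(case.prompt)
        total[case.category] += 1
        # Crude check: case-insensitive substring match (an assumption;
        # a real evaluation would likely need human or rubric-based scoring).
        if case.expected.lower() in answer.lower():
            passed[case.category] += 1
    return {category: passed[category] / total[category] for category in total}


def stub_chatbot(prompt: str) -> str:
    """Stand-in for a real chatbot so the sketch runs offline."""
    canned = {
        "What is the capital of France?": "The capital of France is Paris.",
        "All cats are mammals. Tom is a cat. Is Tom a mammal?": "I am not sure.",
    }
    return canned.get(prompt, "I don't know.")


CASES = [
    TestCase("factuality", "What is the capital of France?", "Paris"),
    TestCase("factuality", "In what year did World War II end?", "1945"),
    TestCase("reasoning",
             "All cats are mammals. Tom is a cat. Is Tom a mammal?", "yes"),
]

if __name__ == "__main__":
    for category, score in run_stress_tests(stub_chatbot, CASES).items():
        print(f"{category}: {score:.0%} passed")
```

Grouping results by category keeps the per-risk scores separate, so that weak performance in one area (for example reasoning) is not hidden by strong performance in another. The qualitative part of the study (surveys and interviews) is not covered by this sketch.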
This study aims to provide insights into the capabilities and limitations of chatbots and to contribute to the ongoing debate about their ethical implications.
Required skills Good programming skills and basic knowledge of common data analytics tools and techniques. Curiosity. A grade point average equal to or higher than 26 may be used as a criterion for selecting candidates.
Notes When sending your application, we kindly ask you to attach the following information:
- list of exams taken in your master's degree, with grades and grade point average
- a résumé or equivalent (e.g., LinkedIn profile), if you already have one
- by when you aim to graduate and an estimate of the time you can devote to the thesis in a typical week
Deadline 17/11/2024