Master Thesis Topic Bank
Welcome to the Master Thesis Topic Bank at GPT-Lab! This resource provides a curated list of thesis topics for master’s students at Tampere University who wish to conduct their thesis in the fields of artificial intelligence and software engineering. Here, you’ll find a variety of predefined topics that reflect the cutting-edge research happening at GPT-Lab. Each topic is designed to align with our core focus areas, providing you with opportunities to contribute to innovative projects.
You are encouraged to select a topic from the list or propose your own in the registration form, provided it falls within the scope of AI and software engineering. Whether you’re passionate about natural language processing, machine learning, or software development, you’ll find exciting opportunities that match your academic and research interests.
If you propose your own topic, please ensure it meets the criteria and is relevant to our ongoing research themes.
How It Works:
- Browse Topics: Review the list of available thesis topics, each with a brief description.
- Select or Propose: Choose a topic or propose your own idea during the application process.
- Get Started: Once your application is accepted, you will be paired with an advisor to begin your research journey.
Disclaimer: For the best viewing experience, please access this table on a desktop device.
Thesis topic | Description | Supervisor/Thesis advisor | Availability |
---|---|---|---|
Utilizing RAG by using Graph database in [real world application] implementation | TBA | Dr. Mika Saari | Yes |
Continuous Data Integration in RAG Systems | Additional information: Continuous Data Integration (CDI) in Retrieval-Augmented Generation (RAG) systems refers to the real-time, automatic update and synchronization of data sources used for retrieval, which is then leveraged to improve the generative model's responses. Prerequisites: Experience with Data Engineering and LLMs with an understanding of RAG | Ayman Khan | Yes |
Benchmarking ethics in code generated by LLMs: a literature review | Additional information: Research in this area primarily focuses on ensuring fairness, mitigating biases, and addressing the risks associated with hallucinations in generated code. Ethical benchmarking of code generated by LLMs is a complex and very new area of research that requires considering multiple dimensions, from fairness and bias mitigation to communication and contextual understanding. Students can conduct a comparative study of the existing literature on benchmarking ethics and possibly propose their own benchmark. Prerequisites: Understanding of the EU AI Act, LLMs, and benchmarking. | Ayman Khan | Yes |
How do open-source LLMs perform in real-life software engineering tasks? | Additional information: The student will explore the application of open-source LLMs in real-world software engineering tasks. The focus will be on benchmarking LLMs like Code Llama, WizardCoder, Phind-CodeLlama, Mistral, StarCoder, Yi-Coder, or DeepSeek Coder against closed-source models, analyzing their efficacy in tasks such as code and test generation, refactoring, and bug detection and fixing on benchmarks from the industry. Prerequisites: A solid foundation in software engineering and familiarity with machine learning concepts are needed. Basic knowledge of Python programming and access to computational resources (e.g., GPUs) are recommended, as well as previous experience with local large language models. | Prof. Pekka Abrahamsson | Yes |
Impact of generated PlantUML diagrams on the accuracy of LLM-generated code | Additional information: The performance of an LLM generating or fixing code depends on having the correct context in which the edits are made. Hence, the research question is: Can the performance of the coder AI be improved by enriching the prompt with generated UML diagrams? The diagrams will be generated from relevant project source files and given to the LLM using a suitable DSL such as PlantUML. The feasibility of the approach is tested on SWE-bench, a subset of it, or another coding benchmark. Prerequisites: The student has experience using coding LLMs such as GitHub Copilot, Qwen 2.5, Mistral, or DeepSeek Coder. The student should also be fluent in creating and reading UML, know how to write PlantUML, and be interested in empirical testing of software tools. | Dr. Jussi Rasku | Yes |
Reducing Hallucinations in LLMs Using Knowledge Graphs and RAG | Additional information: This thesis will focus on a problem that many large language models (LLMs) have, which is providing incorrect or confusing information, known as hallucinations. To solve this, the study will explore how to use knowledge graphs and retrieval-augmented generation (RAG) methods together. Knowledge graphs are models that store facts in a structured way, helping LLMs base their answers on real information. RAG allows LLMs to pull relevant information from external sources before generating text. This research will study how well different types of knowledge graphs and RAG techniques reduce hallucinations and may suggest new ways to combine these methods for better results. The goal is to make LLMs more reliable and trustworthy in areas like customer service, content creation, and information access by addressing the hallucination problem. Prerequisites: This thesis assumes familiarity with AI concepts, particularly LLMs. Basic understanding of knowledge graphs and experience with information retrieval techniques, especially RAG, are beneficial. While no deep expertise is required, an analytical mindset and willingness to engage with these methods are crucial. | Toufique Hasan | Yes |
Ethical issues in AI agents and multi-agent collaboration: a systematic literature review | Additional information: As AI continues to progress, hypothetical future risks become the reality of today. AI agents, both lone agents and AI agents working collaboratively, present various promises across a wide variety of industries, but also various risks. Discussion on these systems has accelerated with recent advances in Generative AI (GenAI) and Large Language Models (LLMs). A study systematically reviewing this research, so as to gain an overview of ethical risks and mitigation measures already acknowledged in literature, presents a timely contribution to the field of AI ethics. | Dr. Kai-Kristian Kemell | Yes |
Effective ways to integrate open-source LLMs into human-centric workflows | TBA | Md Mahade Hasan | Yes |
Assessment of transparency in multi-modal generative models | Additional information: Transparency in AI relates to approaches that make it possible for humans to understand the results of AI systems and the motivations or reasons behind such results. Research on transparency and explainability of generative AI solutions is in its infancy, with some work reported on the transparency of LLMs. A proposed thesis topic would focus on transparency approaches for multi-modal generative AI models. The preliminary study might survey what has been done concretely for such models and the main topic would then be to propose and explore/evaluate a good candidate approach. See this for similar focus but only on LLMs: https://hdsr.mitpress.mit.edu/pub/aelql9qy/release/2 Prerequisites: Strong foundation in machine learning and uses of deep learning, particularly with generative models such as GANs, LLMs, and multi-modal networks, as well as familiarity with AI ethics, transparency, and explainability in AI systems. | Prof. Niklas Lavesson | Yes |
RAG Technology for Addressing Data Memory Issues in LLMs | TBA | Zeeshan Rasheed | Yes |
Speech-to-text transcription of live audio by using an LLM | Additional information: Live captioning of audio and video streams is a common and challenging problem in many use cases, such as news and sports broadcasts. Many of the current automatic solutions have trouble transcribing terms, places, and names. Modern LLM technologies are among the more promising solutions to these challenges. Prerequisites: Knowledge of Python or a similar programming language used for implementing AI systems. Experience using REST APIs. Basic knowledge of how LLMs work. Knowledge of video and audio processing is an advantage. | Dr. Petri Rantanen | Yes |
What are the processes/techniques/tools to gamify LLM-based multi-agent systems? | TBA | José Siqueira de Cerqueira | Yes |
Multi-Agent RAG System for Requirements Traceability and Management | Additional information: This topic focuses on tracking and managing requirements throughout the software development process. Traceability links requirements to design documents, code, and tests, ensuring that changes are properly reflected across the project. While traditional traceability handles these tasks well, it struggles to keep up with fast-paced development where changes happen frequently and teams work in real time. To improve this, real-time integration with tools like Slack, GitHub, and CI/CD pipelines allows the system to react quickly when requirements or code change. Agents can monitor communication channels, track code updates, and assess how these changes impact other parts of the project. This ensures that the system is always up to date and can trace how a change in one area affects other areas, such as tests or design documents. For this approach to work effectively, the system needs to be connected to real-time data sources, built on an event-driven architecture, and able to coordinate updates across multiple agents. By doing this, the system can provide a more responsive way to manage requirements and changes in modern software projects. Prerequisites: Interest in Generative AI and experience with Python programming, open-source tools, and fine-tuning. | Malik Sami | Yes |
Automated Software Quality Assurance using LLMs | Additional information: This topic covers the use of Large Language Models to automate various aspects of Software Quality Assurance (SQA), such as code review, bug detection, and test case generation. The goal is to enhance the efficiency and accuracy of the SQA process by integrating LLMs to analyze codebases, detect vulnerabilities, and improve overall software quality. The project will involve experimenting with state-of-the-art LLMs like GPT to fine-tune their capabilities in understanding and validating code. The research will also focus on developing practical methods for integrating LLMs into existing software development pipelines, making the process more seamless and accessible for software engineers. The student will work with industry-standard tools and frameworks, collaborating closely with experts in software engineering and AI/ML to achieve innovative solutions for automating quality assurance tasks. Prerequisites: Strong programming skills, understanding of quality assurance, proficiency in software testing, knowledge of LLMs, and familiarity with AI/ML. | Shahbaz Siddeeq | Yes |
How to effectively incorporate LLM4SE in developer’s workflow | TBA | José Siqueira de Cerqueira | Yes |
Evaluating Local Language Models and their integrated IDE use cases | Additional information: This thesis will explore how to evaluate and integrate local LLMs into VS Code to function as a code assistant, similar to GitHub Copilot, without relying on external cloud services. The student will compare local models like DeepSeek-Coder-V2, Codestral, Llama Coder and others against cloud-based alternatives in terms of performance and utility for common coding tasks such as code completion, bug detection, and test generation. Prerequisites: Knowledge of software engineering, experience with VS Code plugins, and familiarity with Python and LLM concepts are recommended. Access to computational resources for running local models is necessary. | Dr. Jussi Rasku | Yes |
Empowering Companies to Integrate AI into their Business Processes using BPMN | Additional information: The student will develop a prototype low-code interface that uses LLMs to automate business processes modeled in BPMN. The goal is to investigate how LLM-based tooling, built with frameworks like LangChain or LangGraph, can be used to translate process models into Python code for automation and agent orchestration. Prerequisites: Understanding of UML, BPMN and AI, as well as experience with Python programming and REST APIs, is required. | Dr. Jussi Rasku | Yes |
Comparative Study of Finnish-capable Language Models over several NLP Tasks | Additional information: The thesis will focus on evaluating language models fluent in Finnish, like Poro and Viking, across various tasks such as text generation, translation, and question answering. They are compared against state-of-the-art commercial offerings such as the ChatGPT models. The student will benchmark the models in both CPU and GPU environments and assess their performance and accuracy. Prerequisites: Familiarity with large language models, natural language processing and machine learning concepts. Experience with benchmarking tools and Hugging Face models is recommended. | Dr. Jussi Rasku | Yes |
Fast-Slow LLM Chimera: Real-Time Language Model Switching for Interactive Low-Latency Applications | Additional information: This thesis will investigate a hybrid approach where an LLM with minimal latency initiates a response, and then a more powerful LLM continues the response after a few words. The goal is to provide quick initial feedback while maintaining the quality of the response in applications such as real-time communication systems. Prerequisites: Knowledge of LLM architectures and local LLMs, as well as experience with Python and latency optimization, is needed. Access to multiple LLMs and computational resources is recommended. | Dr. Jussi Rasku | Yes |
A Playful UI/UX for AI Roundtable - Extracting Insight from LLM Simulated Workshop Sessions | Additional information: The student will further develop a prototype AI-roundtable where LLMs with distinct personas engage in discussions on predefined topics. The aim is to demonstrate how LLMs can simulate human workshop sessions and highlight the role of human interaction in meaning-making. The task involves packaging an existing prototype into a user-friendly web interface where users can choose discussion topics, assign roles, generate discussions and extract insight. Additionally, there will be functionality for caching and replaying past sessions. Prerequisites: Experience with UI/UX design and web development (e.g., React or Vue), as well as basic knowledge of LLM APIs like OpenAI or Azure. Familiarity with Python and conversational AI is recommended. | Dr. Jussi Rasku | Yes |
Generating UI Code for Scientific Command Line Tools Using LLMs | Additional information: This thesis focuses on generating a minimalistic GUI for scientific tools using LLMs. The case study involves building a Tkinter-based interface for the existing VeRyPy vehicle route optimization library. Tasks include creating a low-dependency map-based visualization and improving the modularity of the code, particularly in handling objective functions and constraint calculations. The student will report on the effectiveness of LLM-assisted code generation for refactoring and GUI building. Prerequisites: Knowledge of Python, GUI programming (Tkinter or similar), and familiarity with operations research and optimization problems. Previous experience with LLM-based code generation tools would be beneficial. | Dr. Jussi Rasku | Yes |
Empirical Study on the Performance Hit when Reducing the VRAM Footprint of Open-Source LLM Models | Additional information: This thesis investigates methods for reducing the VRAM requirements of large open-source LLMs, specifically models like Mixtral and Llama3, for more accessible deployment on hardware such as RTX 3090 GPUs. The student will explore techniques like quantization and report on the trade-offs between model performance, accuracy, and hardware efficiency. Prerequisites: A solid understanding of machine learning and LLM architectures is required. The student should be a fluent user of *nix command line tools. Familiarity with model compression techniques such as quantization and experience with GPU-based training environments would be advantageous. | Dr. Jussi Rasku | Yes |
A Lightweight Command Line GPU Time Allocation System for a Multi-Organization Shared Server | Additional information: This project aims to design and implement a lightweight, single-server, command-line-based GPU allocation system used by multiple organizations. The system will enable users to reserve GPU time efficiently, show the reservations, and prevent the use of unallocated resources, ensuring fair usage across various projects. The student will also compare the solution against existing systems in terms of performance and ease of use. Prerequisites: Strong programming skills (preferably C), knowledge of Linux systems and command line interfaces, and experience with distributed computing environments. | TBD | Yes |