GR-06 - ELECTRONIC DESIGN AUTOMATION - EDA
End-to-End Deployment of Autoregressive Transformers on Heterogeneous RISC-V SoCs
Thesis abroad
keywords ARTIFICIAL INTELLIGENCE, DEEP LEARNING, DEEP NEURAL NETWORKS, DNN COMPILERS, EMBEDDED SYSTEMS, ENERGY EFFICIENCY, LLMS, LOW POWER, SOFTWARE
Reference persons ALESSIO BURRELLO, DANIELE JAHIER PAGLIARI
External reference persons Dr. Victor Jung (ETH, Zurich)
Prof. Luca Benini (ETH, Zurich)
Research Groups DAUIN - GR-06 - ELECTRONIC DESIGN AUTOMATION - EDA
Thesis type EMBEDDED SOFTWARE DEVELOPMENT, EXPERIMENTAL, RESEARCH, SOFTWARE DEVELOPMENT
Description Large Language Models (LLMs), such as GPT, BERT, and Llama, have revolutionized natural language processing by enabling human-like text understanding and generation. Most of these models feature a very large number of parameters and cannot run on ordinary consumer hardware. Hence, compressing LLMs has become a very important task, and in the past year, many Small Language Models (SLMs) have been proposed.
However, the number of parameters is not the only challenge in bringing Language Models to the edge. Efficient DNN compilers must be developed to make full use of the plethora of new DNN accelerators proposed in recent years.
This thesis will focus on the Siracusa [1] platform, a heterogeneous SoC with software-managed caches, developed in collaboration between ETH Zürich and Meta. You will use the Deeploy [2] compiler to build a demonstrator performing end-to-end autoregressive inference of a small Llama-based model.
Deeploy compiles ONNX [3] graphs to bare-metal C. It can currently compile any single step of the autoregressive process; your job is to add support for generating code for consecutive steps. Additionally, you will need to implement a simple tokenizer and token selector.
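To make the goal concrete, the consecutive-step generation and token selection described above can be sketched as a minimal C loop. This is only an illustration under stated assumptions: the function names (`run_model_step`, `greedy_select`, `generate`) and the stubbed model are hypothetical and are not part of the Deeploy API; in the real demonstrator, the model step would be the Deeploy-generated network invocation.

```c
// Hedged sketch of an autoregressive decode loop. All names here are
// hypothetical; only the overall structure (run one step, select a token,
// feed it back) reflects the task described in the thesis.
#include <stdint.h>
#include <stddef.h>

#define VOCAB_SIZE 8
#define EOS_TOKEN 0

// Greedy token selector: argmax over the logits.
static int32_t greedy_select(const int8_t *logits, size_t vocab_size) {
    size_t best = 0;
    for (size_t i = 1; i < vocab_size; i++) {
        if (logits[i] > logits[best]) best = i;
    }
    return (int32_t)best;
}

// Stub standing in for one compiled inference step; in the real system this
// would be the Deeploy-generated network call producing next-token logits.
static void run_model_step(int32_t last_token, int8_t *logits) {
    for (size_t i = 0; i < VOCAB_SIZE; i++) logits[i] = 0;
    logits[(last_token + 1) % VOCAB_SIZE] = 1;  // dummy: predict id + 1
}

// Autoregressive loop: each selected token becomes the next step's input.
static size_t generate(int32_t start_token, int32_t *out, size_t max_tokens) {
    int8_t logits[VOCAB_SIZE];
    int32_t token = start_token;
    size_t n = 0;
    while (n < max_tokens) {
        run_model_step(token, logits);
        token = greedy_select(logits, VOCAB_SIZE);
        if (token == EOS_TOKEN) break;
        out[n++] = token;
    }
    return n;
}
```

The thesis work would replace the stub with generated code for many consecutive steps (including state such as the KV cache, which this sketch omits).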
The main objectives of this thesis are:
1. Extend Deeploy to generate code for many steps of the autoregressive inference process.
2. Implement an efficient tokenizer.
3. Implement a token selector.
4. Extensively benchmark the developed solution.
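For objective 2, one simple starting point is a greedy longest-match tokenizer, sketched below. This is a hedged illustration: real Llama models use a BPE/SentencePiece tokenizer, and the tiny vocabulary here is purely made up for demonstration.

```c
// Hedged sketch of a greedy longest-match tokenizer. The vocabulary is a
// toy example; a real tokenizer for a Llama-based model would load the
// model's own vocabulary and merge rules.
#include <string.h>
#include <stddef.h>
#include <stdint.h>

static const char *VOCAB[] = {"hello", "he", "ll", "o", " ", "world"};
#define VOCAB_LEN (sizeof(VOCAB) / sizeof(VOCAB[0]))

// At each position, emit the ID of the longest matching vocabulary entry.
// Returns the number of tokens written, or -1 if the text cannot be encoded.
static int tokenize(const char *text, int32_t *ids, size_t max_ids) {
    size_t n = 0;
    while (*text) {
        int best = -1;
        size_t best_len = 0;
        for (size_t v = 0; v < VOCAB_LEN; v++) {
            size_t len = strlen(VOCAB[v]);
            if (len > best_len && strncmp(text, VOCAB[v], len) == 0) {
                best = (int)v;
                best_len = len;
            }
        }
        if (best < 0 || n >= max_ids) return -1;
        ids[n++] = best;
        text += best_len;
    }
    return (int)n;
}
```

An "efficient" version for the embedded target would replace the linear vocabulary scan with a trie or hash-based lookup and avoid repeated `strlen` calls.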
References:
[1] Siracusa: https://pulp-platform.org/siracusa/
[2] Deeploy: https://github.com/pulp-platform/Deeploy
[3] ONNX: https://onnx.ai/onnx/intro/concepts.html
Required skills • Proficiency in Python and C
• Familiarity with Language Models
• Knowledge of compiler concepts and bare-metal programming
Notes This thesis is conducted in close collaboration with ETH Zürich. The thesis can also be carried out in Zurich.
Deadline 03/02/2026
SUBMIT YOUR APPLICATION