Enabling Local Tightly, Global Loosely coupled Programmable Accelerators in Heterogeneous Systems
Thesis in external company
keywords COMPILERS, EMBEDDED SYSTEMS, FIRMWARE, MULTI-CORE, PARALLEL COMPUTING, SOFTWARE, SOFTWARE ACCELERATION
Reference persons ALESSIO BURRELLO, DANIELE JAHIER PAGLIARI
External reference persons Dr. Davide Schiavone (EPFL, Lausanne)
Prof. David Atienza (EPFL, Lausanne)
Prof. Luca Carloni (Columbia University, New York City)
Research Groups DAUIN - GR-06 - ELECTRONIC DESIGN AUTOMATION - EDA
Thesis type EXPERIMENTAL
Description Many domain-specific heterogeneous Systems-on-Chip (SoCs) equipped with multiple accelerators have recently been proposed to speed up Artificial Intelligence computations. Such accelerators are called “tightly coupled” when they are integrated close to the CPU, so that synchronisation and data exchange between them are lightweight and have low latency. In such systems, the accelerators and CPUs are usually connected to the same bus. Otherwise, they are called “loosely coupled”: the accelerators and CPUs do not share a bus but communicate over a network-on-chip (NoC). This allows for more scalable SoCs, but at the price of slower and more complex communication.
X-HEEP (eXtendable Heterogeneous Energy-Efficient Platform) is an open-source, configurable, and extensible single-core RISC-V 32-bit MCU developed at the Embedded Systems Laboratory (ESL), sponsored by the EcoCloud Sustainable Computing Center of the Swiss Federal Institute of Technology Lausanne (EPFL). It has been designed based on existing open-source IPs from the PULP, OpenHW Group, and OpenTitan projects. Its main advantage is that it eases the integration of tightly coupled accelerators into SoCs through the so-called eXtension-Accelerator interface (XAIF). So far, it has been integrated with accelerators such as CGRAs, Near-Memory Computing IPs, Systolic Arrays, POSIT datapaths, and GPUs.
ESP (Embedded Scalable Platform) is an open-source platform for heterogeneous SoC design that provides a flexible tile-based architecture built on a multi-plane NoC. It was developed at Columbia University by the System-Level Design (SLD) group and is also compatible with RISC-V IPs. ESP also provides a flow to develop accelerators, described either for High-Level Synthesis (SystemC, C++) or in RTL (Verilog/SystemVerilog/VHDL), and to integrate them as loosely coupled accelerators on the NoC through a specified tile-based interface.
In this project, we propose to build a Local-Tightly, Global-Loosely coupled Accelerator Heterogeneous System by integrating X-HEEP into an ESP SoC. We want to use ESP to build the main tile-based SoC and X-HEEP to build the internals of individual accelerator tiles.
This highly programmable and heterogeneous SoC will have several levels of accelerator integration. Inside a tile, the accelerator will be seen as tightly coupled by the local CPU (X-HEEP), while it will be seen as loosely coupled by the CPU(s) integrated into other tiles.
The final system can leverage programmable tiles based on flexible CPU+Accelerator architectures, where each tile is specialized for a given function. For example, an ESP system could be composed of one tile specialized in running Linux with a RISC-V CPU; one tile integrating X-HEEP with a GPU for highly parallel functions; one tile containing X-HEEP with a CGRA for highly spatial reconfigurable functions; and one final tile containing X-HEEP with near-memory computing IPs for efficient local processing and last-level cache functions. Extra tiles would provide memories and I/O.
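The example composition above, and the two views of an accelerator it implies, can be sketched in C. This is only an illustrative model: the type names, the tile table, and the function coupling_seen_from are hypothetical and are not part of the ESP or X-HEEP code bases. The rule it encodes is the one stated in the description: a CPU sees an accelerator in its own tile as tightly coupled, and an accelerator in any other tile as loosely coupled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptors; not taken from ESP or X-HEEP sources. */
typedef enum { TILE_LINUX_CPU, TILE_XHEEP, TILE_MEMORY, TILE_IO } tile_kind_t;
typedef enum { ACC_NONE, ACC_GPU, ACC_CGRA, ACC_NMC } accel_kind_t;
typedef enum { COUPLING_NONE, COUPLING_TIGHT, COUPLING_LOOSE } coupling_t;

typedef struct {
    tile_kind_t kind;    /* what the tile is built around */
    accel_kind_t accel;  /* accelerator inside the tile, if any */
} tile_t;

/* The example SoC from the text: one Linux tile, three X-HEEP tiles
 * (GPU, CGRA, near-memory computing), plus memory and I/O tiles. */
const tile_t example_soc[] = {
    { TILE_LINUX_CPU, ACC_NONE },
    { TILE_XHEEP,     ACC_GPU  },
    { TILE_XHEEP,     ACC_CGRA },
    { TILE_XHEEP,     ACC_NMC  },
    { TILE_MEMORY,    ACC_NONE },
    { TILE_IO,        ACC_NONE },
};
#define EXAMPLE_SOC_TILES (sizeof example_soc / sizeof example_soc[0])

/* Only the Linux tile and the X-HEEP tiles contain a CPU. */
int tile_has_cpu(const tile_t *t)
{
    return t->kind == TILE_LINUX_CPU || t->kind == TILE_XHEEP;
}

/* How the CPU in tile cpu_tile sees the accelerator in tile acc_tile:
 * tight inside the same tile, loose across the NoC, none if either
 * the CPU or the accelerator does not exist. */
coupling_t coupling_seen_from(const tile_t *soc, size_t n,
                              size_t cpu_tile, size_t acc_tile)
{
    if (cpu_tile >= n || acc_tile >= n)
        return COUPLING_NONE;
    if (!tile_has_cpu(&soc[cpu_tile]) || soc[acc_tile].accel == ACC_NONE)
        return COUPLING_NONE;
    return cpu_tile == acc_tile ? COUPLING_TIGHT : COUPLING_LOOSE;
}
```

In this model, the GPU in tile 1 is tightly coupled for the X-HEEP CPU in tile 1 and loosely coupled for the Linux CPU in tile 0.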
Project objectives:
- Understand the ESP and X-HEEP architectures, and emulate both on FPGA or simulate them using Questasim.
- Build an interface and the corresponding bridge to connect X-HEEP and ESP, and integrate X-HEEP into ESP in a minimal configuration to verify and test an application in Questasim or on FPGA. To improve compatibility test coverage, one tile must be X-HEEP, while the other must be a tile from the ESP IPs. The test should include a C program that enables bidirectional communication between the two tiles.
- Build an ESP SoC, based on an ESP example, capable of running Linux on the CVA6 and including an X-HEEP tile enhanced with a tightly coupled accelerator. The final system should leverage X-HEEP to accelerate one AI function.
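As a rough idea of what the bidirectional communication test in the second objective could look like, the following C sketch models two one-word mailboxes, one per direction, between an ESP tile and an X-HEEP tile. Everything here is an assumption made for illustration: the mailbox layout, the function names, and the "reply with request + 1" behaviour are hypothetical, and the mailboxes live in ordinary host memory rather than behind the bridge that the thesis would actually design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-word mailbox; the real bridge registers are
 * to be defined during the thesis. */
typedef struct {
    volatile uint32_t full;  /* 1 while a message is waiting to be read */
    volatile uint32_t data;  /* payload word */
} mailbox_t;

/* One mailbox per direction makes the link bidirectional. */
typedef struct {
    mailbox_t esp_to_xheep;
    mailbox_t xheep_to_esp;
} tile_link_t;

/* Post a word; fails if the previous message has not been consumed. */
bool mailbox_send(mailbox_t *mb, uint32_t word)
{
    if (mb->full)
        return false;
    mb->data = word;
    mb->full = 1;  /* publish only after the payload is written */
    return true;
}

/* Fetch a word; fails if the mailbox is empty. */
bool mailbox_recv(mailbox_t *mb, uint32_t *word)
{
    if (!mb->full)
        return false;
    *word = mb->data;
    mb->full = 0;  /* free the slot for the next message */
    return true;
}

/* X-HEEP side of the test: read one request, answer with request + 1. */
bool xheep_tile_step(tile_link_t *link)
{
    uint32_t req;
    if (!mailbox_recv(&link->esp_to_xheep, &req))
        return false;
    return mailbox_send(&link->xheep_to_esp, req + 1u);
}

/* ESP side of the test: send a request, let the X-HEEP side run one
 * step, then read the reply. On hardware the two sides would run
 * concurrently and poll or use interrupts instead. */
bool ping_pong(tile_link_t *link, uint32_t req, uint32_t *resp)
{
    if (!mailbox_send(&link->esp_to_xheep, req))
        return false;
    if (!xheep_tile_step(link))
        return false;
    return mailbox_recv(&link->xheep_to_esp, resp);
}
```

A round trip through ping_pong exercises both directions of the link, which is the property the test in the objective is meant to check.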
Required skills Required knowledge and skills:
- RTL design and FPGA implementation in SystemVerilog
- Good understanding of memory architectures and microcontrollers
- Good analytical skills
- Good background in computer architecture
- Teamwork and git
Appreciated skills:
- Scientific curiosity
- Good communication skills
- Advanced English
Notes Thesis in collaboration with the École Polytechnique Fédérale de Lausanne (EPFL) and Columbia University. The thesis can be carried out from Turin, Lausanne, or New York City, subject to each group's availability at the time of application and to the student's constraints.
Deadline 07/10/2025