Thesis BELLE2-MTHESIS-2021-066

FPGA-based Self-repairing Digital Circuits Design for the Belle II Experiment

Sara Massarotti ; Raffaele Giordano

University of Naples "Federico II" Naples (Italy)

Abstract: Belle II is a High-Energy Physics (HEP) experiment designed for measurements in the heavy flavor sector of the Standard Model (SM) and for New Physics (NP) searches. The experiment is at the rare/precision frontier and requires very selective triggering and real-time processing of the produced data. The Belle II detector is installed at the KEK laboratory in Tsukuba, Japan, and is built around the interaction point of the SuperKEKB e+e- high-luminosity (8×10^35 cm^-2 s^-1) collider. Belle II includes high-speed reconfigurable on-detector electronics for data processing and transfer, in which static RAM-based Field Programmable Gate Arrays (SRAM-based FPGAs) are key components. Due to the SuperKEKB operation, detectors and electronics operate in a radiation environment that may impact their functionality. The Beam Exorcism for a Stable Experiment II (BEAST II) commissioning detector has been designed to measure the radiation effects and to prevent radiation damage to detectors and electronics. It comprises a monitoring system for radiation effects in Xilinx Virtex-5 and Kintex-7 SRAM-based FPGAs, which have been selected because they are used in some of the Belle II sub-detectors. The main issue with using SRAM-based FPGAs in radiation environments is single event upsets (SEUs) in the configuration memory, which may cause failures of the firmware. In order to preserve the correct functionality, mitigation techniques based on triple modular redundancy (TMR) and configuration correction, i.e. “scrubbing”, have been developed. In this thesis work, I have designed and implemented a self-repairing circuit based on a novel combination of these techniques: the generation of redundant configuration frames. I implemented my work on a Xilinx Kintex-7 FPGA (XC7K325T) included in the BEAST II detector. The proposed self-repairing system, named the Configuration Consistency Corrector (C3), leverages both TMR and configuration scrubbing.
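As a rough illustration of how triple redundancy and scrubbing can work together, the sketch below takes a bitwise majority vote over three copies of a configuration frame and rewrites any copy that disagrees with the vote. It is a software toy model, not the C3 implementation: the names `majority_vote` and `scrub`, and the representation of a frame as a Python integer, are assumptions made for this example only.

```python
from typing import List, Tuple


def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit follows at least two inputs."""
    return (a & b) | (a & c) | (b & c)


def scrub(copies: List[int]) -> Tuple[List[int], int]:
    """Overwrite all three copies with the voted value.

    Returns the repaired copies and how many copies differed from the vote.
    """
    if len(copies) != 3:
        raise ValueError("expected exactly three redundant copies")
    voted = majority_vote(*copies)
    rewritten = sum(1 for word in copies if word != voted)
    return [voted] * 3, rewritten


# Hypothetical 8-bit "frame" and a single-bit upset in one of its copies.
golden = 0b10110010
upset = golden ^ (1 << 5)

frames, rewritten = scrub([golden, upset, golden])

# If the same bit is flipped in two copies, the vote settles on the wrong value.
bad_frames, _ = scrub([upset, upset, golden])
```

In this toy model a single upset is repaired because the two intact copies outvote it, whereas the same bit flipped in two copies defeats the vote, which is the kind of uncorrectable case a failure analysis has to count.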
In particular, the redundancy of the configuration allowed me to implement the self-repair without external memories, i.e. the resources needed to protect the FPGA firmware are all included within it. Moreover, an internal digitally controlled oscillator (DCO) generates the system clock for the scrubber, providing a fully self-contained system. I have studied a reliability-oriented design flow supported by Xilinx, the Isolation Design Flow (IDF). It is conceived to optimize the isolation between different modules within the same FPGA device and to enhance the reliability without using multiple devices. Thus, I have modified the C3 architecture in order to implement a second version of the self-repairing circuit according to the IDF requirements. I have designed and performed dedicated tests to evaluate the self-repair capability of the two versions. These tests aimed at simulating SEU effects on the circuits. In particular, they are able to identify the bits defining the circuit functionality and to drive the circuit itself to corrupt them and to attempt self-repair. Then, I have identified the configuration memory bits causing a circuit failure, i.e. those that cannot be self-corrected, and I have performed a statistical analysis of their distribution for the two circuits. My results show that the SEU-to-failure cross section for the plain version of the circuit is (5.6 ± 0.5)×10^-11 cm^2 and that it increases to (1.1 ± 0.1)×10^-10 cm^2 for the IDF version, i.e. the probability of observing a failure of the IDF implementation is almost twice that of the plain one. In the final part of the thesis work I have investigated the possible causes of this counterintuitive result, concluding that the present architecture of the circuit should be modified to fully benefit from IDF. The plain C3 version is now running on the XC7K325T FPGA of BEAST II.
Nevertheless, my work with the IDF C3 implementation shed light on the development of self-repair circuits based on the Xilinx IDF, and I believe it contributed to the state of the art in the use of FPGAs in HEP detectors.
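The "almost twice" comparison of the two SEU-to-failure cross sections quoted in the abstract can be checked with a short calculation. The sketch below is not taken from the thesis: it assumes the two uncertainties are independent and combines their relative values in quadrature (first-order propagation), and the helper name `ratio_with_uncertainty` is hypothetical.

```python
import math
from typing import Tuple


def ratio_with_uncertainty(
    num: float, num_err: float, den: float, den_err: float
) -> Tuple[float, float]:
    """Return num/den and its uncertainty, assuming independent errors."""
    ratio = num / den
    # Relative uncertainties add in quadrature for a quotient.
    relative = math.hypot(num_err / num, den_err / den)
    return ratio, ratio * relative


# Cross sections from the abstract, in cm^2.
plain = (5.6e-11, 0.5e-11)
idf = (1.1e-10, 0.1e-10)

ratio, err = ratio_with_uncertainty(idf[0], idf[1], plain[0], plain[1])
print(f"IDF / plain = {ratio:.2f} +/- {err:.2f}")
```

Under these assumptions the ratio comes out at about 1.96 with an uncertainty of about 0.25, consistent with the statement that the IDF version fails roughly twice as often.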

Note: Presented on 17 02 2021
Note: MSc

The record appears in these collections:
Books, Theses & Reports > Theses > Masters Theses

 Record created 2021-07-01, last modified 2021-07-19
