# **Highlights of TWEPP 2023**

Tom Williams (RAL)

RAL seminar, 23/11/2023

#### TWEPP = Topical Workshop on **Electronics** for Particle Physics

The workshop covers all aspects of electronic systems, components and instrumentation for particle and astro-particle physics such as: electronics for particle detection, triggering, data-acquisition systems, accelerator and beam instrumentation. Operational experience in electronic systems and R&D in electronics for LHC, High Luminosity LHC, FAIR, neutrino facilities and other present or future accelerator projects are the major focus of the workshop.

#### The purpose of the workshop is:

- Present original concepts and results of research and development for electronics relevant to particle physics experiments as well as accelerator and beam instrumentation at future facilities;

- Review the status of electronics for running experiments and accelerators;
- Identify and encourage common efforts for the development of electronics;
- Promote information exchange and collaboration in the relevant engineering and physics communities.



#### A few plenary sessions: invited talks

Two parallel sessions, talks grouped into tracks:

- ASICs
- Optoelectronics & links
- Power, grounding & shielding
- Production, testing & reliability
- Programmable logic, design & verification tools and methods
- Radiation tolerant components & systems
- Module, PCB & component design
- System design, description & operation
- Trigger and timing distribution



Two poster sessions

*Abstract submissions by topic*




*Topics I'll focus on today …*

# **Location, Location, Location**

#### Calaserena Village, near Cagliari (Sardinia)

#### Resort in Geremeas

- Very small village, mainly holiday homes & resorts
- 45 minute drive from Cagliari
- Surprisingly cheap
	- Out-of-season temperature 20-25 °C (*from my perspective, ideal conference weather!*)
- In previous years, hosted in a city or at least a large town. Potential downside of resort?
	- Stuck there all week
		- What if the food is bad?
		- *Even worse: What if the bar closes early …*
- On the other hand: Private beach!












### **Food & drinks**







### **Familiar faces**

Quite a few people from RAL and ex-PPD'ers

- Dave Newbold
- Luigi Calligaris, Luis Ardila
- Weiming Qian
- Marcus French, Mark Willoughby
- ASIC design team





## **Onto the physics electronics ...**

Many, many talks & posters ⇒ this summary necessarily focuses on what I found interesting (and could understand)

Mostly go chronologically:

- Current experiments
- HL-LHC upgrades
- R&D for future experiments
	- DRD / DRD-ish



# **Current experiments**



The CMS Level-1 Trigger selects 100 kHz of bunch crossings, making each decision within 3.8 µs

- FPGA-based boards connected by optical links
- Calorimeter & muon subsystems: Perform reconstruction of relevant objects
- Global trigger: Applies menu of trigger paths = single-/multi-object kinematic & quality cuts

Recent development: Detect 'anomalous events' by implementing an autoencoder neural network in the Global Trigger



#### **Why Anomaly Detection?**

#### Problem:

Traditional trigger strategies rely on a priori knowledge of signal or generic kinematic selections.

What if we miss new physics because we don't have the right trigger?

#### Solution:

Triggering on "anomalousness" offers an answer that is both

1. Signal agnostic: applicable to signatures that we have not had the foresight or person-power to target specifically
2. Highly sensitive: can boost signal efficiency for signatures limited by L1 trigger bandwidth





What is **AXOL1TL**? Anomaly eXtraction Online Level-1 Trigger aLgorithm

- Variational autoencoder (VAE) trained on real unbiased data to detect outliers
- Information bottleneck created by small-dimensional latent space enforces efficient encoding  $\Rightarrow$  learning
- Calculated from standard Global Trigger ( $\mu$ GT) quantities
	- $(p_T, \eta, \phi)$ hardware integer inputs from: 1 $p_T^{\text{miss}}$, 4 e/$\gamma$, 4 $\mu$, and 10 jets





#### **Model Design**

Level-1 Trigger constraints informed design









*(Loss function, annotated on the slide: reconstruction term + full regularization term)*

- Remove decoder network
	- Significant latency & resource savings, minimal performance degradation
- Remove latent $\sigma$ term from loss calculation
	- Saves even more on timing, negligible performance degradation
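
The two simplifications above can be illustrated with a small sketch. It assumes the textbook VAE regularization term (the KL divergence between a diagonal Gaussian and a unit Gaussian); the slides do not spell out the exact AXOL1TL loss, so the function names and the reduced score (sum of squared latent means) are assumptions for illustration only.

```python
import math

def kl_regularization(mu, sigma):
    """Textbook VAE regularization: KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return 0.5 * sum(m * m + s * s - math.log(s * s) - 1.0 for m, s in zip(mu, sigma))

def reduced_score(mu):
    """Assumed simplified score once the sigma terms are dropped: sum of mu^2 (encoder output only)."""
    return sum(m * m for m in mu)

# Hypothetical latent vector, not taken from the talk
mu = [0.5, -1.0, 2.0]
sigma = [1.0, 1.0, 1.0]

print(kl_regularization(mu, sigma))  # with sigma = 1 this is 0.5 * sum(mu^2)
print(reduced_score(mu))
```

Because the reduced score needs only the encoder output, neither the decoder nor the $\sigma$ branch has to be evaluated in firmware, which is where the latency and resource savings would come from.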





#### **Test Crate Validation**

- For certain runs, Test Crate decisions are recorded in 2023 data files
	- Use these bits to validate emulation and show rate agreement
- Minimal ( $\sim$ 1%) mismatches between trigger hardware and emulation
	- Mismatches clustered near decision boundaries, most likely due to a rounding issue







### **GNNs in an FPGA @ sPHENIX**



- Located at RHIC accelerator at BNL (USA)
- $\sim$56 MHz accelerator clock with $\sim$9.3 MHz BC
- Running period 2023-2025
- $\sim$4 m long, $\sim$5 m high, 1000 tons
- Tracking detectors (MVTX, INTT, TPC, TPOT) and calorimeters (EMCAL, HCAL)
- 1.4 T magnetic field, $|\eta| \leq 1.1$
- Tracking detectors capable of streaming readout, but unable to save all TPC data
- 15 kHz designed trigger rate





#### **Motivation - Heavy Flavour**

- Integrate the AI-based heavy flavour trigger system demonstrator into the sPHENIX experiment for the p+p run in 2024, to study its feasibility, requirements and constraints
	- Heavy-flavour (HF) events are very rare: $\sim$1% of Minimum Bias (MB) events at RHIC energy
	- RHIC collision rate is around 2-3 MHz, sPHENIX readout 15 kHz (DAQ 300 Gb/s)
		- Trackers are Streaming Readout (SRO) capable, but can't save all TPC data
	- 10% trigger-enhanced SRO increases HF MB rate to $\sim$300 kHz
	- ML HW tagging aims to sample the remaining 90% of the luminosity using tracklet reconstruction from the silicon trackers
- The aim is to deploy a future system at the Electron-Ion Collider (EIC)
	- AI-based electron tagging with streaming readout to identify the (non-)interesting Deep-Inelastic-Scattering (DIS) processes in e+p/A collisions
		- Based on the measured scattered-electron energy and direction



#### The ML algorithm - TrackGNN

- Based on Graph Neural Network (GNN)
	- Detector and physics knowledge improves prediction
	- Based on PyTorch and PyTorch Geometric
- Initial training on simulated data from MVTX and INTT
	- On GPU NVIDIA Titan RTX, A500, and A6000
- Topological selection of HF signals on FPGA
	- Tracking and clustering must be done on FPGA
- Beam-spot and anomaly detection on GPU based feed-back system
- We propose a novel method to treat the events as track graphs instead of hit graphs. This method is driven by the physics (transverse momentum)
	- Estimate momentum based on silicon hits -> 15% improvement on trigger decision



**ECML PKDD 2022. Sub 1256** 



#### Generation of the GNN IP core – two parallel efforts

- 1. Team led by the Georgia Institute of Technology (GIT)
	- Direct translation of the sPHENIX TrackGNN model to IP using HLS
	- Model
		- 5 layers, each layer: 64 dim 4 layers for node and 64 dim 4 layers for edge embedding
	- Goal: 100-200 nodes, 200-500 edges
	- Implementation
		- 100 nodes, 140 edges
		- Measured start-to-end latency
			- 150 µs @ 130 MHz, 130 µs @ 180 MHz
		- Still needs 10-20x speedup!



- Latency: fast-paced development, 380 µs (25<sup>th</sup> August) -> 150 µs (4<sup>th</sup> September) @ 130 MHz
	- Attempts to increase the clock to 300 MHz failed on timing constraints
- Target: 5 µs
- Detailed latency breakdown and parallelism exploration ongoing
- Might require model changes

Close discussion between model developers and FPGA engineers
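
To put the quoted latencies in context, here is a back-of-the-envelope sketch that converts them to clock cycles and compares them with the 5 µs target. The helper names are made up for illustration; the only inputs are the numbers quoted on the slides.

```python
def latency_cycles(latency_us, clock_mhz):
    """Clock cycles spanned by a latency in microseconds at a clock in MHz."""
    return latency_us * clock_mhz

def speedup_needed(measured_us, target_us):
    """Factor by which the measured latency exceeds the target."""
    return measured_us / target_us

print(latency_cycles(150, 130))  # 19500
print(latency_cycles(130, 180))  # 23400
print(speedup_needed(150, 5))    # 30.0
print(speedup_needed(130, 5))    # 26.0
```

Going from 130 MHz to 180 MHz shortens the wall-clock latency but increases the cycle count. Taking the 5 µs target at face value gives a factor somewhat above the 10-20x quoted on the slide, which may refer to a different baseline.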

#### Generation of the GNN IP core - two parallel efforts

arXiv:2112.02048

- 2. Team led by the Massachusetts Institute of Technology (MIT) and Fermilab (FNAL)
	- Based on High Level Synthesis for Machine Learning (hls4ml), a generalized Python framework for machine learning inference in FPGAs
- Third main upgrade underway, focusing on 3 examples
	- Example 1: Tri-muon reconstruction at the LHC (muon endcaps)
	- Example 2: Heavy flavor tracking at sPHENIX
	- Example 3: Silicon strip tracking at LHC



# **HL-LHC upgrades**



#### **A common solution**

- Single backend board design used in tracker, HGCal, level-1 trigger, MTD, BRIL. **> 700pcs**
- VU13P FPGA with 124x bidirectional optical links (25Gb/s for backend)

#### **Beyond the hardware …**

- Boards provided with common infrastructure FW & SW: everything apart from the system-specific algorithm firmware
	- Focus of effort at RAL

Collaboration of 8 institutes (UK, Germany, France, Italy, India, China) formalised: Serenity consortium













# Lessons learned while developing the Serenity-S1 ATCA card

Torben Mehner on behalf of the Serenity consortium and the CMS Tracker group

5th October 2023

#### Board Overview

*(Board layout diagram: payload and service areas, showing the KRIA SoM, FPGA, FireFlys (FF), clocks, PHY, storage, Ethernet switch, IPMC and power input.)*

- ZynqMP board management: https://iopscience.iop.org/article/10.1088/1748-0221/17/03/C03009
- OpenIPMC: https://ieeexplore.ieee.org/document/9465210



- Board Infrastructure
	- Xilinx KRIA SoM
	- Clock, power, PHY
	- SD, SSD
- ATCA Infrastructure
	- Backplane connectors
	- IPMC (OpenIPMC DIMM module)
	- Power input
	- Ethernet switch
- Payload
	- FireFly optical transceivers
	- VU13P FPGA
	- Clocks


#### Component Shortage Mitigation - PLL

- Zero-delay jitter cleaner phase-locked loop
- Skyworks Si5395A not available
- Evaluated ZL30274 (dual PLL) (P. Hazell, S. Baron)
- Accumulated jitter <1 ps (1 kHz to 10 MHz)
	- Virtex UltraScale+ requirements met with LDO power supply








#### Component Shortage Mitigation - ATCA connectors



- TE has stopped producing Zone 1 connectors
- EPT will stop production in May 2024
- Zone 1 connectors are still produced by
	- Positronic VPB series
	- Conec ATC22\* series
- Adapt footprint to be multi-vendor compliant
	- Mechanical alignment pins
- CERN has bought all connectors for Serenity production (Zone 1 and Zone 2) + 15%






# **Samtec FireFlys**

Used extensively in HEP backend electronics

- Custom module for frontend links (i.e. lpGBT)
- 28Gbps parts used for sending data between backend boards
	- 4-channel bidirectional modules
		- Worked fine for several years
	- 12-channel unidirectional modules
		- Same footprint as frontend parts
		- Issues encountered with beta TX parts (Jul 22)
			- Some channels died during qualification
			- Traced to VCSEL *(another company)*
		- **Samtec rep. gave comprehensive summary of source of problems at TWEPP**
			- VCSEL fixed; Samtec improved their QA.
		- Testing new parts on Serenitys right now!


### *Weiming Qian*



**Science and Technology Facilities Council** 

# **ATLAS Global Common Module (GCM)**



TWEPP2023, Sardinia Italy


# **CMS Level-1 Trigger**

- Decides which events should be read out
- Further selection downstream by HLT

### Challenges for phase-2

- Luminosity:  $1.5 \rightarrow 7.5 \times 10^{34} \text{ cm}^{-2} \text{s}^{-1}$
- Pile up:  $\sim 60 \rightarrow 200$
- Design goal: Retain the same thresholds
	- Extend acceptance for some final states (e.g. LLP)





# **CMS Level-1 Trigger (2)**

### Phase-2: Meeting the challenge

- Specification: A bit more breathing room ...
	- Latency: 3.8 $\rightarrow$ 12.5 microseconds
	- Max. accept rate: 100 $\rightarrow$ 750 kHz
- Inputs
	- Finer granularity
	- Particle trajectories from tracker *(NEW!)*
		- Identify primary vertex
		- Particle-flow-style reconstruction
- Re-build system using latest technology
	- UltraScale+ FPGAs
		- More flip flops, LUTs, RAM, I/O ...
	- Optical links: 10 $\rightarrow$ 25 Gbps
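
A quick sketch of what the new specification means in units of bunch crossings (BX). It uses the 40 MHz LHC bunch-crossing rate; the helper names are made up for illustration.

```python
BX_RATE_MHZ = 40.0  # nominal LHC bunch-crossing rate

def latency_in_bx(latency_us):
    """Number of bunch crossings that must be buffered while the trigger decides."""
    return round(latency_us * BX_RATE_MHZ)

def reduction_factor(accept_khz):
    """Rate reduction from the bunch-crossing rate down to the Level-1 accept rate."""
    return BX_RATE_MHZ * 1000.0 / accept_khz

print(latency_in_bx(3.8), latency_in_bx(12.5))  # 152 500
print(reduction_factor(100))                    # 400.0
print(f"{reduction_factor(750):.1f}")           # 53.3
```

The longer latency means the front-end pipelines hold roughly three times as many crossings, while the higher accept rate relaxes the required rejection by a factor of 7.5.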








# **CMS Level-1 Trigger: Phase-2 architecture**



**Phase-2 trigger project** 




# **CMS Level-1 Trigger: Integration tests**

The subject of my poster

- Final system: O(200) boards of 4 designs, implementing 20 algos, connected by several thousand links
- Factorise testing via well-defined I/O interfaces
	- Fully validate algorithms first in single-board tests
- ✓ Extensive tests of link protocol *(error injection)*
- ✓ Verified most algorithms in single-board tests
- ✓ Several multi-board slice tests performed
- ✓ Latency currently 8.6 µs, less than 9.5 µs target
- ✓ Online SW: Already very mature
	- Reliably controlling several boards, including for several of the slice tests







### **FPGA Design with High Level Synthesis** Methodology, gains, and pitfalls

### **Michalis Bachtis**

### **University of California, Los Angeles** On behalf of the CMS Collaboration

**TWEPP 2023** 

### Lookup tables



`return a + lookup[addr];`

- An adder that reads a constant from a LUT and adds it to the input
- A LUT of 512x72 bits instantiated
	- In UltraScale architecture = 1 BRAM
- **At 100 MHz**
	- Latency of 1 cycle [to read the ROM]
- **At 500 MHz**
	- Latency of 2 cycles [automatic register in the output of ROM]
- When increasing the width to more than 72 bits, or if we add more than 512 entries
	- Automatically instantiates more BRAMs
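
The BRAM scaling described above can be approximated with a rough estimate. This sketch assumes that only the 512 x 72 aspect ratio of a 36 Kb block RAM is used; real tools can pick other aspect ratios or LUT-based memory, so treat it as a lower-fidelity guide, not the compiler's actual mapping. The function name is hypothetical.

```python
import math

def bram36_estimate(depth, width, max_depth=512, max_width=72):
    """Rough number of 36 Kb block RAMs for a ROM of `depth` entries x `width` bits,
    assuming every BRAM is configured as 512 entries x 72 bits."""
    return math.ceil(depth / max_depth) * math.ceil(width / max_width)

print(bram36_estimate(512, 72))    # 1
print(bram36_estimate(512, 73))    # 2
print(bram36_estimate(513, 72))    # 2
print(bram36_estimate(1024, 144))  # 4
```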

### Playing with the pipeline, reusing logic

- In Time Multiplexed designs: implement one algorithm core and feed it several chunks of data. As an example let's assume:
	- 6 sets of three 16 bit numbers (a,b,c) are arriving in the system
	- For each set we need to calculate a+b\*c [with a DSP]
- We have a fully pipelined option and an option to reuse logic
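
The trade-off between the two options can be modelled in software. This is a behavioural sketch only: the DSP and pass counts are an assumed cost model (one DSP per multiply-add unit), not figures from the talk, and the function names are made up.

```python
def fully_parallel(sets):
    """One multiply-add unit per set: all results in a single pass."""
    results = [a + b * c for a, b, c in sets]
    return results, {"dsps": len(sets), "passes": 1}

def reuse_logic(sets):
    """One shared multiply-add unit, fed one set per pass."""
    results = []
    for a, b, c in sets:  # each iteration models one pass through the shared unit
        results.append(a + b * c)
    return results, {"dsps": 1, "passes": len(sets)}

# Six hypothetical (a, b, c) sets of 16-bit values
sets = [(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, 15), (16, 17, 18)]
print(fully_parallel(sets))
print(reuse_logic(sets))
```

Both options give identical results; what changes is the resource count versus the number of clock ticks before the last result is available.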



### Pitfall. What did the compiler really build?

- We did not specify anything in the code to force a shift register...
	- We could have "helped" the compiler by mimicking the array manipulations in C
- In fact the design was implemented with muxes

- Does it matter?
	- Sometimes it does, because in large designs the routing congestion could cause the implementation to fail

### HLS algorithm + HDL data management+glue



- Optimal results obtained by both HLS cores and HDL
- Algorithms  $\rightarrow$  HLS
- Data management, SLR crossing etc $\rightarrow$ HDL

### The CMS Level-1 trigger upgrades for CMS Phase-2



#### Increased latency budget and rate:

- 3.8  $\mu$ s  $\rightarrow$  12.5  $\mu$ s
- 750 kHz of L1 output

### Advanced object reconstruction on FPGA:

- Global Calorimeter Trigger (GCT) and Global Muon Trigger (GMT) (higher granularity)
- Global Track Trigger (GTT) (tracker tracks, vertex finding)
- Correlator Trigger (CL2) (Particle Flow)
- Global Trigger (GT) (with more complex algos)
- Resolution similar to offline level

### Level-1 trigger Data Scouting (L1DS) at 40 MHz LHC bunch crossing rate

- Collect and store the reconstructed particle primitives of the L1 processing chain at the full bunch crossing rate
- Enable study of exotic signatures that cannot be fit into the trigger budget
- Global Trigger decisions $\Rightarrow$ sDS
- Correlator, Global Track, Global Calorimeter and Global Muon Trigger $\Rightarrow$ sGS
- Can later be extended to include other systems in later stages $\Rightarrow$ sLS

The physics potential of a Level-1 trigger Data Scouting system



### Phase-2 L1DS main physics plans

- Use when possible to study with L1 resolution
- High combinatorics: $W \rightarrow 3\pi$, $D_s \gamma$, $H \rightarrow \rho \gamma$, $\phi \gamma$, ...
- High rate: multiple soft (b-)jets, displaced (soft) leptons, ...
- Heavy Stable Charged Particles (HSCPs) over multiple BXs

#### Monitoring at the bunch crossing rate

- L1 trigger pre-/post-firing without special configurations
- Per-bunch luminosity measurements



**Rocco Ardino** 

### **Run-3 demonstrator of the L1DS**

### For LHC Run-3, L1DS demonstrator to readout multiple sources of the CMS L1 trigger:

- Very heterogeneous system
- 3 boards (KCU1500, SB-852, VCU128), different output technologies (DMA, TCP/IP)



### Xilinx VCU128 boards setup in Run-3 L1DS demonstrator



(a) Test setup in CMS DAQ laboratory



(b) Point 5 service cavern production system

#### Setup for VCU128 scouting boards:

- One Stop Systems PCIe bus to accommodate multiple PCIe boards
- 2  $\times$  5 PCIe (3.0  $\times$  16) Slot Expansion
- Control server connected to PCIe bridge for control and monitor of boards

#### Production system in P5 service cavern:

- 1st VCU128: connected to 12 $\times$ BMTF processors
- 2nd VCU128: connected to 6 $\times$ GT processors
- Both boards on same PCIe tree
- Output links from CMS service cavern $\rightarrow$ surface (~100 m)





GMT muon occupancy per bunch crossing for barrel muons, matching filling scheme.



Di-muon invariant mass distribution, after recalibration of L1T muon pT estimate
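
For reference, a di-muon invariant mass can be computed directly from the trigger-level $(p_T, \eta, \phi)$ of the two muons. This sketch uses the standard massless approximation, $m^2 = 2 p_{T1} p_{T2} (\cosh\Delta\eta - \cos\Delta\phi)$; it is not the demonstrator's code, and the example values are hypothetical.

```python
import math

def dimuon_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass of two muons, neglecting the muon mass (same units as pT)."""
    m2 = 2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    return math.sqrt(max(m2, 0.0))  # guard against tiny negative values from rounding

# Two back-to-back muons of 45.6 GeV at the same pseudorapidity
print(f"{dimuon_mass(45.6, 0.0, 0.0, 45.6, 0.0, math.pi):.1f}")  # 91.2
```

Since the mass scales with the product of the two $p_T$ values, any bias in the L1 $p_T$ estimate shifts the peak, which is why a recalibration is applied before plotting.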

# **Longer-term R&D**



# Electronics for Colliders: What's Next?




OALLE-2: "Electronics for a future particle collider"







### Themes in detector requirements

- Improved granularity
	- More channels, more data, more multiplexing
	- Reconstruction / data reduction requires distributed data
- Improved precision
	- More data (but improved data reduction possible?)
- 4D and 5D techniques (space + time + energy deposition)
	- More data, need for high-accuracy timing distribution
- Low power -> reduced cooling -> less material
	- Requirement for advanced technology nodes, better integration
- Advanced data handling
	- Sophisticated data reduction at front end to cope with backgrounds
- Design and simulation of full systems
	- Co-optimisation of sensors, electronics, and algorithms





*(Figure: chart of detector R&D needs by task force; legible labels include "TF3: Solid state", "TF6: Calo" and "R&D needs being met".)*

Dave.Newbold@stfc.ac.uk, TWEPP, 4th October 2023



### Verification - a bottleneck

- ASIC verification is more complex and time consuming than design
- Verification is resource intensive
	- New skill in HEP community







### Unleashing verification quality and efficiency



- Verification activities in HEP ASIC community
	- Several ASIC projects already have dedicated verification experts to handle verification
	- Future ASICs will require more verification resources
	- We need to lay a strong foundation to ensure that verification activities are efficient and effective
- This talk focusses on processes and practices which improve the quality of verification and make it efficient
	- Improves overall quality of ASIC design project

#### **Maturity model**

- A set of structured levels that describe how well the behaviors, practices and processes of an organization can reliably and sustainably produce required outcomes
- Can be used as
	- a framework to discuss quality and efficiency
	- benchmark for comparison
	- an aid for introspection
	- a tool for appraisal



### **HEP Verification Maturity Model**










# 28nm IP development programme

RAL TD contribution to CERN common IP library

### Amplifier IP

A range of amplifiers have been developed to validate the analogue technology limits of 28nm CMOS.

- Rail-to-Rail class AB Precision Amplifier
	- $\leq 800\,\mu$W power consumption
	- $70 \times 42\,\mu\text{m}^2$
	- Input range of 100-700 mV
	- Input offset <3 mV
	- High drive strength for high loads on-chip/off-chip
- PMOS input class AB High speed Amplifier
	- <700 µW power consumption
	- $63 \times 53\,\mu\text{m}^2$
	- Input range of 100-410 mV
	- Bandwidth 100 MHz at 1 pF load
	- Input offset <5 mV
	- High speed for driving fast signals or references
- NMOS input class AB High speed Amplifier
	- $\leq 700\,\mu$W power consumption
	- $60 \times 40\,\mu\text{m}^2$
	- Input range of 390-700 mV
	- Bandwidth 100 MHz at 1 pF load
	- Input offset <5 mV
	- High speed for driving fast signals or references



+ bandgaps, reference drivers, DACs, 1Gbps serializer, ...



Remote Direct Memory Access (RDMA), as the name suggests, allows read and write operations directly in the target machine(s). This implies no OS involvement allowing high-throughput and low-latency applications.

This requires RDMA enabled NICs on both ends (RNIC) that perform the DMA, reducing the CPU load.






Back-end boards required to get the data, and send it to the computing farms. This requires multiple custom cards and custom boards.

Front-end boards send data already packaged within an ethernet frame, allowing switching and routing.

Choosing the proper protocol allows the use of COTS switches

The ETH RDMA network stack library has been chosen for the first prototype. Some of its characteristics:

- Entirely written in HLS (Vivado 2019.1)
- It targets Xilinx FPGA with PCIe connection
- 10/100 Gb/s speeds
- It supports UDP, TCP and RDMA

October 3 2023



Systems @ **ETH** zürich



### **Real-time Firmware Simulation**

Start from the ETH network stack, entirely developed in HLS. Functionalities and features must be understood: real-time firmware simulation with real network traffic.

- Works on Linux machines: Tun/Tap devices
- It makes use of DPI-C interface of SystemVerilog: C code in our testbench!
- Tap device exchanges raw ethernet frames between simulation and Linux network stack
- We can capture such frames and study them



Soft-RoCE used to capture and store in memory the data sent. Enables fast verification of the stack without going through synthesis/implementation every time.

Once the stack has been verified, firmware can be eventually built (Resources? Performances? Is timing closure reached?)



#### Some results - Wireshark

Used Wireshark to capture Ethernet frames coming out of the simulation.

*(Wireshark screenshot. Legible content: an "RC RDMA Write Only" RoCE packet followed by an "RC Acknowledge"; UDP source port 18515, destination port 4791; UDP payload 416 bytes; InfiniBand Base Transport Header with opcode Reliable Connection (RC) - RDMA WRITE Only (10) and Acknowledge Request: True; RETH with remote key 0x00000228 and DMA length 384; data 384 bytes.)*

In this frame we can check:

- Queue Pair number
- IP addresses
- Memory addresses
- RDMA OP Code
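
The fields listed for this frame can also be pulled out of a captured UDP payload programmatically. The sketch below decodes the 12-byte InfiniBand Base Transport Header (BTH) and the 16-byte RDMA Extended Transport Header (RETH) of a RoCEv2 packet. The function names and the example bytes are made up for illustration (only the opcode, remote key and DMA length mirror values legible in the capture); this is not part of the ETH stack or of the speaker's testbench.

```python
import struct

ROCEV2_UDP_PORT = 4791      # UDP destination port used by RoCEv2
RC_RDMA_WRITE_ONLY = 0x0A   # opcode 10, as shown in the capture

def parse_bth(payload):
    """Decode the 12-byte Base Transport Header at the start of the UDP payload."""
    opcode, flags, pkey, word1, word2 = struct.unpack("!BBHII", payload[:12])
    return {
        "opcode": opcode,
        "solicited_event": bool(flags & 0x80),
        "mig_req": bool(flags & 0x40),
        "pad_count": (flags >> 4) & 0x3,
        "header_version": flags & 0xF,
        "partition_key": pkey,
        "dest_qp": word1 & 0xFFFFFF,          # top byte of this word is reserved
        "ack_request": bool(word2 & 0x80000000),
        "psn": word2 & 0xFFFFFF,
    }

def parse_reth(payload):
    """Decode the 16-byte RETH that follows the BTH in an RDMA WRITE Only packet."""
    virtual_address, remote_key, dma_length = struct.unpack("!QII", payload[12:28])
    return {"virtual_address": virtual_address, "remote_key": remote_key, "dma_length": dma_length}

# Hypothetical payload: queue pair 0x11, PSN 1, ack requested, 384-byte write
example = struct.pack("!BBHII", RC_RDMA_WRITE_ONLY, 0x00, 0xFFFF, 0x000011, 0x80000001)
example += struct.pack("!QII", 0x1000, 0x228, 384)

print(parse_bth(example))
print(parse_reth(example))
```

A 384-byte write then accounts for the 416-byte UDP payload seen in the capture: 12 (BTH) + 16 (RETH) + 384 (data) + 4 (invariant CRC).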

INFN

## **RISC-V: Fault tolerance via triplication**



### Fachhochschule Dortmund

University of Applied Sciences and Arts

- RV32-IMC Core
	- 3 stage pipeline
	- Multiplication extension
	- 50 MHz @ 1.2V
	- Fully triplicated core
- SRAM shared between instruction & data
	- Flexible memory layout
	- IMEM & DMEM data bus can access whole SRAM address range
	- RISC-V pipeline stalls during load & store instructions to SRAM
	- Load & store to peripherals simultaneously possible
- JTAG Interface
	- JTAG TAP & debug module
	- Non-volatile debug ROM with debug ISR


STRV-R1 (SEU-tolerant-RISC-V) - TWEPP 2023 | alexander.walsemann@fh-dortmund.de

#### STRV-R1 - Heavy-Ion Irradiation SEFI

- Despite the SEE mitigation techniques, SEFIs were observed during heavy-ion irradiation
	- Average improvement over SEU cross-section
		- At low LETs (<16 MeV·cm<sup>2</sup>/mg): 2800x
		- At high LETs (>32 MeV·cm<sup>2</sup>/mg): 7700x
- Estimated SEFI rate in HL-LHC environment
	- SEE particle flux $1 \times 10^9$ p/cm<sup>2</sup>/s
	- 2.2 chip-level SEFIs per hour

| Cross-section | $\sigma_{HI\infty}$ [cm$^2$] | $L_0$ [MeV cm$^2$/mg] | $L_{0.25}$ [MeV cm$^2$/mg] |
|---|---|---|---|
| SEU | $4.27 \times 10^{-2}$ | $< 1.0$ | 18.87 |
| SEFI | $2.95 \times 10^{-6}$ | $< 5.7$ | 10.32 |
| Timing | $2.86 \times 10^{-5}$ | $< 3.3$ | 19.95 |
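
The estimated rate follows the usual relation rate = flux x cross-section. As a sanity check, the sketch below inverts that relation to get the effective per-particle cross-section implied by the two quoted numbers. That derived value is not stated on the slide, and it is not the same quantity as the heavy-ion saturation cross-sections in the table; the function names are hypothetical.

```python
def events_per_hour(flux_per_cm2_s, cross_section_cm2):
    """Expected event rate per hour for a given particle flux and cross-section."""
    return flux_per_cm2_s * cross_section_cm2 * 3600.0

def implied_cross_section(rate_per_hour, flux_per_cm2_s):
    """Effective cross-section (cm^2) implied by a rate per hour at a given flux."""
    return rate_per_hour / 3600.0 / flux_per_cm2_s

sigma_eff = implied_cross_section(2.2, 1e9)
print(f"{sigma_eff:.2e} cm^2")                     # 6.11e-13 cm^2
print(f"{events_per_hour(1e9, sigma_eff):.1f}")    # 2.2
```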

## **RISC-V: SEU detection**

### **Motivation**

#### **Security Features**

- Ibex can implement a set of extra features to support security-critical applications
- Main strategy: Ibex core can detect external attacks due to corrupted states
- Alerts provided by dedicated signals

#### **Research Question:**

Can these built-in security features be used to detect SEUs within the Ibex core?



https://ibex-core.readthedocs.io/en/latest/03_reference/security.html

### **Research Methodology**

#### **Testbench architecture**

- CoCoTB testbench
	- Ibex RTL code
	- Python models for SoC
		- Data/Instruction memory
		- Stdio
	- Random SEU injection
	- (Pre-pass with Genus to extract flip-flop list)
- Application code compiled and loaded in I-memory
- Xcelium RTL simulator
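
The random SEU injection step can be pictured with a simulator-independent model: pick a flip-flop from the extracted list, pick a bit, and invert it. The register names and the dictionary representation below are hypothetical; in the real testbench the flip would be applied to a signal in the RTL simulation, which this sketch does not attempt to reproduce.

```python
import random

def inject_seu(flops, rng):
    """Flip one random bit in one randomly chosen register.

    `flops` maps a register name to (width, value); returns (name, bit) of the upset.
    """
    name = rng.choice(sorted(flops))
    width, value = flops[name]
    bit = rng.randrange(width)
    flops[name] = (width, value ^ (1 << bit))
    return name, bit

# Hypothetical flip-flop list, standing in for the one extracted in the pre-pass
flops = {"pc": (32, 0x80), "mstatus": (32, 0x8), "alu_result": (32, 0x0)}
rng = random.Random(2023)  # fixed seed so an injection campaign is reproducible
print(inject_seu(flops, rng))
```

Seeding the generator matters in practice: a run where the testbench sees an error but no alert is raised can then be replayed exactly.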



**KU LEUVEN** 

### **Fault Injection Simulation Results**

#### **Results by symptom**



TB found CRC error but alert was low

# **Any questions?**


# **CMS Level-1 Trigger: Anomaly detection**

### **Model Performance**

- AXOL1TL is trained with unbiased data collected by the CMS Experiment during 2023 at $\sqrt{s} = 13.6$ TeV
	- 10.5 million events used: 50% for training, 50% for setting thresholds
- Dotted lines represent the score thresholds implemented in the Global Trigger Test Crate
- Significant performance improvement on various SM and BSM signals by adding AXOL1TL to the 2023 trigger menu
	- Signal samples are Monte-Carlo generated
	- Table shows performance improvement for a Higgs decaying to 2 (pseudo-)scalars to bottom quarks
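
One plausible way to set a score threshold from the held-out half of the unbiased sample is to pick the cut that keeps the fraction of events corresponding to a target trigger rate. The sketch below is an assumption about the procedure, not the CMS implementation, and the rates and scores in the example are made up.

```python
def score_threshold(scores, input_rate_khz, target_rate_khz):
    """Score cut that keeps roughly target/input of the unbiased events.

    Events with score >= the returned value are accepted.
    """
    keep = int(len(scores) * target_rate_khz / input_rate_khz)
    if keep <= 0:
        raise ValueError("sample too small for the requested rate")
    ranked = sorted(scores, reverse=True)
    return ranked[keep - 1]

# Hypothetical example: 1000 unbiased events, keep 1% of them
scores = list(range(1000))
print(score_threshold(scores, input_rate_khz=1000.0, target_rate_khz=10.0))  # 990
```

The lower the target rate, the further into the tail the threshold sits, so the statistical precision of the cut is limited by the size of the threshold-setting sample.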








# **GNN in an FPGA @ sPHENIX**

### **The DAQ-AI Data Flow**

- Motivation to use FELIX board
	- To reuse the PCIe implementation (16-lane Gen-3) and software tools provided by the FELIX developers
	- On-board FPGA is a Kintex UltraScale XCKU115FLVF1924-2E
- The decision signal of a heavy flavor event from the AI-Engine will be sent out via the LEMO connectors to the sPHENIX GTM/GL1 system to initiate the TPC readout in triggered mode
- GPU-based feed-back system for the beamspot monitoring



**Streaming automated controls** 






# **ATLAS Global Common Module (GCM)**





Andrea Contu - INFN Cagliari


### Real time tracking with FPGAs

- Modern FPGAs can perform parallel data processing with high throughputs, low latencies and better energy efficiency than CPUs and GPUs (for certain tasks)
- This talk: demonstrator system for real-time tracking on FPGAs with the "artificial retina" architecture to reconstruct tracks in the Vertex Locator



PCIe 16x board, 1 Intel Stratix 10 FPGA, 16 optical links






#### The RETINA demonstrator at the testbed

- Simulated data used for high rate tests
- Now live data from the LHCb monitoring farm
- Demonstrates that RETINA-based tracking on FPGAs is possible in HEP experiments
- Current setup:
	- Reconstructs tracks of a VELO quarter
	- Spread over multiple PCIe-hosted FPGA cards. 8 cards are sufficient
	- Scalable to cover the whole detector with additional FPGA cards



#### Distribution network

- As the RETINA algorithm is spread among several boards, a distribution network is needed to exchange hits among boards:
	- 8 nodes full-mesh network
	- 28 full-duplex links at 25.8 Gbps
	- Total bandwidth 1.41 Tbps
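
The link count follows directly from the full-mesh topology: $n(n-1)/2$ links for $n$ nodes. The sketch below checks this and computes the raw aggregate line rate. The helper names are made up, and the reading that the quoted 1.41 Tbps is the raw figure minus encoding or protocol overhead is an assumption, not something stated on the slide.

```python
def full_mesh_links(nodes):
    """Number of point-to-point links needed to connect every pair of nodes."""
    return nodes * (nodes - 1) // 2

def aggregate_tbps(links, gbps_per_link, duplex=True):
    """Raw aggregate line rate in Tbps, counting both directions if full duplex."""
    return links * gbps_per_link * (2 if duplex else 1) / 1000.0

links = full_mesh_links(8)
print(links)                                  # 28
print(round(aggregate_tbps(links, 25.8), 2))  # 1.44
```

The raw figure of about 1.44 Tbps is slightly above the 1.41 Tbps quoted, consistent with a small overhead being subtracted.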






#### **ELECTRONIC SYSTEM DESIGN FLOW IN HEP**

Limitation of the currently used design flow:

- Based only on a low-abstraction level description of the system (hardware level)
- Architecture exploration is time- and resource-heavy
- In multi-chip modules/detectors, single chips are optimized separately



#### **ELECTRONIC SYSTEM LEVEL APPROACH**

Develop a high abstraction level description of the system, from front-end to back-end, for:

- architecture exploration
- new feature development
- reference model development

Requires a self-contained environment for virtual prototyping



#### PIXESL: AN ELECTRONIC SYSTEM LEVEL PROTOTYPING FRAMEWORK

#### **Open source:**

- The model is based on C++ and SystemC
- Performance analysis is based on Python

#### User-friendly:

- User and developer roles are separated
- The framework supports architectural and network configurability (structure, memory, arbitration, interconnections)

#### **Reusable:**

- Generalized layers and standardized packet transport (TLM)
- A library of layer types, functional components, and packet transport types
- Common integrated metrics analyzer





#### **LHCB VELO UPGRADE II ARCHITECTURE EXPLORATION**

The upgrade aims at a 4D pixel detector.

https://cds.cern.ch/record/2844669/

Main readout challenge:

extreme occupancy (x2 Velopix)

#### Flow:

1. Model Velopix (VELO upgrade I ROC)
2. Simulate higher occupancy events
3. Find bottlenecks
4. Optimize architecture
5. Repeat!



#### **ON-CHIP PACKET SORTING DESIGN SPACE EXPLORATION**



The number of bins depends on the latency of the readout and on the target grouping efficiency

