





# About me

**Kristian Harder**

PhD Hamburg University/DESY 1998–2002:

- QCD analysis (OPAL, TESLA)
- track reconstruction software (TESLA)

Fermilab 2002–2006:

- electroweak analysis (DØ)
- silicon detector back-end electronics (DØ)

RAL 2006–:

- silicon detector simulation (ILC)
- exotica analysis (CMS)
- readout+trigger electronics (CMS, DUNE)

Science and Technology Facilities Council

# **About this lecture**


Many of you will have to program FPGAs during your project or afterwards. Not many of you will have to design new FPGAs... I will focus on the practical aspects of working with FPGAs. **Targeting absolute beginners!** 

- 14:00 – 15:00 introductory lecture
- 15:00 – 15:20 coffee break
- 15:15 – 17:30 work in lab / discussion / Q&A

(might finish earlier)



#### Field Programmable Gate Array

an integrated circuit consisting of

- a large number of blocks with logic gates
- connected by a programmable interconnection fabric
- accessible through I/O blocks



Program your own electronic circuit onto this chip, anywhere, anytime! (within available resources)



# **FPGA:** trivial example










#### A look-up table is a small memory bank that encodes a general logic function:

| input A | input B | input C | output |
|---------|---------|---------|--------|
| 0       | 0       | 0       | 0      |
| 0       | 0       | 1       | 1      |
| 0       | 1       | 0       | 1      |
| 0       | 1       | 1       | 0      |
| 1       | 0       | 0       | 1      |
| 1       | 0       | 1       | 0      |
| 1       | 1       | 0       | 0      |
| 1       | 1       | 1       | 1      |

LUTs can be programmed to act as basic logic gates (AND, OR, etc), but also as complex combinations.
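The truth table above can be mimicked in software as a small memory that is addressed by the input bits. The following Python sketch is only an illustration of that idea, not how an FPGA is actually programmed; the names `LUT_PARITY`, `LUT_AND3` and `lut3` are made up for this example. It shows that the table above is in fact a three-input XOR, and that the same structure with different contents becomes an AND gate.

```python
# A 3-input LUT modelled as an 8-entry memory.
# The address is formed from the input bits (A is the most significant bit).

# Contents copied from the truth table above (rows 000 ... 111).
LUT_PARITY = [0, 1, 1, 0, 1, 0, 0, 1]

# Same structure, different contents: a 3-input AND gate.
LUT_AND3 = [0, 0, 0, 0, 0, 0, 0, 1]


def lut3(contents, a, b, c):
    """Look up the output for inputs a, b, c in an 8-entry table."""
    address = (a << 2) | (b << 1) | c
    return contents[address]


if __name__ == "__main__":
    # Print the full truth table of the first LUT.
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                print(a, b, c, lut3(LUT_PARITY, a, b, c))
```

Changing the logic function means changing only the list contents; `lut3` itself stays the same, which is the point of a look-up table.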



#### Configurable logic block by Xilinx (probably outdated):



- look-up tables to manipulate inputs
- multiplexers to route the signals
- flip-flops (clocked storage devices) to hold the outputs
- multiple blocks running on same clock for synchronous operation



#### Multiple logic blocks working together

What happens if block A output changes just as block B is reading it?
→ need synchronous operation: everything running on the same clock

Any individual task needs to

- either finish well within one clock cycle
- or be broken down into subtasks with registers in between

How much can be done within one clock cycle depends on the clock frequency. A faster clock means things get done more quickly, but you might have to break them down into smaller bits.
→ a faster clock is not always better
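To get a feeling for this trade-off, here is a rough Python sketch. It assumes an idealised task whose logic delay can be split evenly into subtasks and ignores register setup times and routing delays; the 12 ns delay and the function names are invented for illustration.

```python
import math


def clock_period_ns(clock_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / clock_mhz


def stages_needed(logic_delay_ns, clock_mhz):
    """Minimum number of register-separated subtasks so that each
    subtask fits within one clock cycle (idealised: the delay is
    assumed to split evenly, setup time and routing are ignored)."""
    return max(1, math.ceil(logic_delay_ns / clock_period_ns(clock_mhz)))


def latency_ns(logic_delay_ns, clock_mhz):
    """Time from input to registered output of the last subtask."""
    return stages_needed(logic_delay_ns, clock_mhz) * clock_period_ns(clock_mhz)


if __name__ == "__main__":
    # Hypothetical task with 12 ns of logic delay.
    for mhz in (50, 100, 400):
        print(mhz, "MHz:", stages_needed(12.0, mhz), "stage(s),",
              latency_ns(12.0, mhz), "ns latency")
```

In this toy model the 100 MHz version needs two stages and is no faster end-to-end than the 50 MHz version, while the 400 MHz version needs five stages and correspondingly more registers.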




# synchronous sequential logic

intermediate logic








#### other blocks on modern FPGAs:

high speed transceivers, PCIe interfaces, ethernet interfaces, memory banks, clock generators, DSPs, interfaces for external RAM, even entire CPUs... (but typically digital electronics only)




# modern FPGAs

example of resources available on current generation FPGAs: Xilinx Virtex UltraScale+ devices

| Device Name                                 | VU3P  | VU5P  | VU7P  | VU9P  | VU11P | VU13P  | VU19P |
|---------------------------------------------|-------|-------|-------|-------|-------|--------|-------|
| System Logic Cells (K)                      | 862   | 1,314 | 1,724 | 2,586 | 2,835 | 3,780  | 8,938 |
| CLB Flip-Flops (K)                          | 788   | 1,201 | 1,576 | 2,364 | 2,592 | 3,456  | 8,172 |
| CLB LUTs (K)                                | 394   | 601   | 788   | 1,182 | 1,296 | 1,728  | 4,086 |
| Max. Dist. RAM (Mb)                         | 12.0  | 18.3  | 24.1  | 36.1  | 36.2  | 48.3   | 58.4  |
| Total Block RAM (Mb)                        | 25.3  | 36.0  | 50.6  | 75.9  | 70.9  | 94.5   | 75.9  |
| UltraRAM (Mb)                               | 90.0  | 132.2 | 180.0 | 270.0 | 270.0 | 360.0  | 90.0  |
| HBM DRAM (GB)                               | -     | -     | -     | -     | -     | -      | -     |
| HBM AXI Interfaces                          | -     | -     | -     | -     | -     | -      | -     |
| Clock Mgmt Tiles (CMTs)                     | 10    | 20    | 20    | 30    | 12    | 16     | 40    |
| DSP Slices                                  | 2,280 | 3,474 | 4,560 | 6,840 | 9,216 | 12,288 | 3,840 |
| Peak INT8 DSP (TOP/s)                       | 7.1   | 10.8  | 14.2  | 21.3  | 28.7  | 38.3   | 10.4  |
| PCIe* Gen3 x16                              | 2     | 4     | 4     | 6     | 3     | 4      | 0     |
| PCIe Gen3 x16/Gen4 x8 / CCIX <sup>(1)</sup> | -     | -     | -     | -     | -     | -      | 8     |
| 150G Interlaken                             | 3     | 4     | 6     | 9     | 6     | 8      | 0     |
| 100G Ethernet w/ KR4 RS-FEC                 | 3     | 4     | 6     | 9     | 9     | 12     | 0     |
| Max. Single-Ended HP I/Os                   | 520   | 832   | 832   | 832   | 624   | 832    | 1,976 |
| Max. Single-Ended HD I/Os                   |       |       |       |       |       |        | 96    |
| GTY 32.75Gb/s Transceivers                  | 40    | 80    | 80    | 120   | 96    | 128    | 80    |

#### NB: very difficult to use anywhere near 100% of those resources due to limitations of the interconnection fabric



CPUs and FPGAs are capable of performing arbitrary tasks depending on programming.

But the approach is fundamentally different:

#### CPU

rigid silicon: the processing units remain fixed (except basics like enabling/disabling cores in multicore processors)

flexibility arises from stepping through instructions provided in memory

#### **FPGA**

flexibility arises from reconfiguring the fabric itself, producing a highly specialised processing unit

- very different type of device
- major differences in how they are being programmed
- suitable for different types of application



#### example: edge detection in histogram (e.g. line of video pixels)



CPU: scan a line like

```
for i in range(1,17):
    edge[i] = (abs(hist[i]-hist[i-1])>thres)
```



FPGA: instantiate a bunch of comparators, get result in O(1) clock cycles
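The two approaches can be put side by side in Python. This is a sketch only: Python cannot evaluate anything truly in parallel here, so `edges_fpga_model` merely mirrors the structure of the hardware (one independent comparator per pair of neighbouring bins). The function names and the sample data are invented for this example.

```python
def edges_cpu(hist, thres):
    """CPU style: step through the bins one after another."""
    edge = [False] * len(hist)
    for i in range(1, len(hist)):
        edge[i] = abs(hist[i] - hist[i - 1]) > thres
    return edge


def comparator(left, right, thres):
    """One comparator circuit: a fixed function of two neighbouring bins."""
    return abs(right - left) > thres


def edges_fpga_model(hist, thres):
    """FPGA style (model only): one comparator per pair of neighbours.
    In hardware all of them evaluate at the same time; Python can only
    mimic that by applying the same function to every pair."""
    if not hist:
        return []
    pairs = zip(hist[:-1], hist[1:])
    return [False] + [comparator(l, r, thres) for l, r in pairs]


if __name__ == "__main__":
    hist = [3, 3, 4, 3, 12, 13, 12, 12, 2, 3]  # made-up line of pixels
    print(edges_cpu(hist, 5))
    print(edges_fpga_model(hist, 5))
```

Both functions give the same answer. The difference is that no comparator depends on the result of another, so on an FPGA the number of clock cycles does not grow with the number of bins, only the amount of logic used does.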






FPGA benefits from parallel instantiation of a large number of specialised logic circuits

NB: GPUs are somewhat in the middle: massive parallelisation with simplified CPUs

Kristian Harder, RAL HEP graduate lectures 2024



**FPGAs** offer advantages over CPUs for specific applications:

- high degree of parallelisation, pipelining
- many high speed data links
- precise control over data path → fixed latency

**FPGAs** have weaknesses too:

- complex arithmetic (floating point numbers etc)
- cost (depending on parameters)



#### Use of FPGAs in particle physics

- L1 trigger
- DAQ
- clock distribution (incl fast commands, triggers)

#### **FPGA** use elsewhere

- high performance computing: FPGAs supporting CPUs
- aerospace, automotive, telecommunications, Bitcoin mining
- prototyping of new ASICs




Note on ASICs (Application Specific Integrated Circuits): actual custom chips

- can have higher speed, higher density, lower power, more resources,
- can be cheaper in large quantities,
- require no in situ programming.

BUT: much longer time to availability, mistakes are expensive.
→ we typically use them only in detector front-end



# **FPGA** vendors

AMD Xilinx: about 50% market share, full range incl. high end

Intel (formerly Altera): about 35% market share

low power, low cost devices

Microsemi: low power, radiation hard, non-volatile (flash based)

...and probably others!

This lecture focuses on AMD/Xilinx, which is used by CMS-UK and ATLAS-UK.



# commercial FPGA platforms

#### examples for commercially available FPGA boards:



FPGA accelerator card based on Altera device

FPGA RF/optical I/O card based on Xilinx device

#### **Imperial College's Serenity board**

- multi-purpose board
- mostly for CMS upgrade
- two FPGA sites
- FPGAs easily replaceable
- optical high speed links
- ATCA form factor
- comes with single board PC



FPGA manufacturers provide development platforms for their FPGAs: FPGA on circuit board with peripherals and infrastructure

#### benefits:

- often available very early on after release of new devices
- often relatively low cost because subsidised

In widespread use in our labs!

- tested algorithms for Serenity long before prototypes available
- experience can actually influence custom board design
- ideal for learning: availability, example designs



# Xilinx ZCU102

# Digilent Nexys A7



The set of configuration instructions for an FPGA

- is not actually modifying hardware,
- but it is not a software algorithm either.

It is somewhat in between, which is why it is called **firmware**.

Description either graphically as schematics, or in a hardware description language.

> A hardware description language looks similar to software, but is very different.



### schematics



We won't use schematics today, but we will implement <u>this</u> design. Note inputs, outputs, blocks, signals, busses, constants, and a loop! Relatively easy to understand, but not very practical for complex tasks.

#### Two main languages in use:

|                      | VHDL       | Verilog   |
|----------------------|------------|-----------|
| resemblance          | Pascal/Ada | C         |
| strong types         | yes        | no        |
| composite data types | yes        | no        |
| case sensitive       | no         | yes       |
| library management   | yes        | no        |
| who in CMS likes it  | Brits      | Americans |

VHDL:

```
-- sel is a std_logic_vector(1 downto 0) combining the select lines,
-- e.g. sel <= S0 & S1;
process (sel, A, B, C, D)
begin
  case sel is
    when "00" => Y <= A;
    when "01" => Y <= B;
    when "10" => Y <= C;
    when "11" => Y <= D;
    when others => Y <= A;
  end case;
end process;
```

Verilog:

```
always @({S0,S1}, A, B, C, D)
  case ({S0,S1})
    2'b00: Y = A;
    2'b01: Y = B;
    2'b10: Y = C;
    2'b11: Y = D;
  endcase
```

(adapted from blog.digilentinc.com)


We use VHDL in our projects because

- complex data types make interfaces easier to read (and write)
- strong typing reduces margin for error
- it does seem to be a bit easier to read

Firmware design is intrinsically modular. Can mix Verilog, VHDL and schematic design in one project. (Prefer not to.)



# example project

Let's make some LEDs blink on our Nexys A7 development board! We will need:



#### clock

- all development boards have oscillators
- some connected to FPGA directly
- some connected through programmable clock chips
- high speed clock signals are often differential

#### firmware

- VHDL design discussed on following pages



# top level block

```
-- simple "hello world" example

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

library UNISIM;
use UNISIM.vcomponents.all;

entity top is port(
    sysclk : in  STD_LOGIC;
    leds   : out STD_LOGIC_VECTOR(7 downto 0)
  );
end top;

architecture rtl of top is

  signal clk   : std_logic;
  signal count : unsigned(30 downto 0) := (others => '0');

begin

  ibuf: BUFG
    port map(
      i => sysclk,
      o => clk
    );

  process(clk)
  begin
    if rising_edge(clk) then
      count <= count + 1;
    end if;
  end process;

  leds <= std_logic_vector(count(30 downto 23));

end rtl;
```

#### This is all the VHDL we need today!

This block

- connects to a 100 MHz clock input,
- sends the clock signal through a buffer,
- runs a 31 bit counter on that clock (the highest bit then changes state about every 10.7 s, i.e. alternates at roughly 0.1 Hz),
- and connects the highest (i.e. slowest) bits to LEDs.
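The blink rates follow directly from the clock frequency and the bit positions. The short Python sketch below does the arithmetic; the constant and function names are invented for this example. It assumes the 100 MHz clock and the `count(30 downto 23)` connection of the design. Note the distinction between how often a bit changes state and the frequency of a full on/off cycle, which is half of that.

```python
CLOCK_HZ = 100e6    # 100 MHz system clock, as stated above
FIRST_LED_BIT = 23  # leds(0) is driven by count(23)


def toggle_interval_s(bit, clock_hz=CLOCK_HZ):
    """Time between two changes of state of counter bit `bit`."""
    return 2 ** bit / clock_hz


def blink_frequency_hz(bit, clock_hz=CLOCK_HZ):
    """Frequency of one full on/off cycle of counter bit `bit`."""
    return clock_hz / 2 ** (bit + 1)


if __name__ == "__main__":
    for led in range(8):
        bit = FIRST_LED_BIT + led
        print(f"LED {led} (count bit {bit}): "
              f"changes every {toggle_interval_s(bit):.3f} s, "
              f"blinks at {blink_frequency_hz(bit):.3f} Hz")
```

The fastest LED (bit 23) blinks at about 6 Hz, so it is still visible by eye, while the slowest (bit 30) changes state only about every 10.7 s.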

Let's look at the code in detail



#### This is a comment

```
-- simple "hello world" example
```

Everything after `--` on a line is a comment.



#### Load packages from libraries

```
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

library UNISIM;
use UNISIM.vcomponents.all;
```

- IEEE.std_logic_1164 has types for logic signals
- IEEE.numeric_std has numeric data types
- UNISIM.vcomponents has declarations and simulation data for device-specific primitives



#### Declare a VHDL block with ports

```
entity top is port(
    sysclk : in  STD_LOGIC;
    leds   : out STD_LOGIC_VECTOR(7 downto 0)
  );
end top;
```

- We give this block a name (top)
- and define connections (ports) to the outside
- top level ports correspond to actual FPGA I/O pins



#### Describe the VHDL block

```
architecture rtl of top is
  -- declarations (signals)
begin
  -- statements (instances, processes, signal assignments)
end rtl;
```



#### Declare internal signals we need

```
signal clk   : std_logic;
signal count : unsigned(30 downto 0) := (others => '0');
```

- consider signals more like wires, not as variables
- can assign an initial state, though



#### Instantiate a different block

```
ibuf: BUFG
  port map(
    i => sysclk,
    o => clk
  );
```

- this one is from a library
- it buffers an incoming clock for distribution
- we name this instance ibuf
- we connect it to our signals



#### A process

```
process(clk)
begin
  if rising_edge(clk) then
    count <= count + 1;
  end if;
end process;
```

- runs when specific events occur
- here: rising edge of clk
- assign incremented value to count

(almost like software, isn't it?)
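To make the "runs when specific events occur" part concrete, here is a small Python model of the process. It is an analogy only, not how a simulator or the FPGA works internally; the class `Counter` and its method `update` are invented for this sketch. The point is that the body is triggered by changes of `clk`, and the counter only advances on a 0 → 1 transition.

```python
class Counter:
    """Software model of the clocked process: `count` only changes
    when `clk` goes from 0 to 1."""

    WIDTH = 31  # matches unsigned(30 downto 0)

    def __init__(self):
        self.count = 0      # initial value, like := (others => '0')
        self._last_clk = 0

    def update(self, clk):
        """Call whenever clk may have changed (the sensitivity list)."""
        rising_edge = (self._last_clk == 0 and clk == 1)
        if rising_edge:
            # unsigned arithmetic wraps around at 2**WIDTH
            self.count = (self.count + 1) % (2 ** self.WIDTH)
        self._last_clk = clk


if __name__ == "__main__":
    counter = Counter()
    for clk in [0, 1, 1, 0, 1, 0, 0, 1]:
        counter.update(clk)
    print(counter.count)
```

Unlike this model, the hardware does not "call" anything: the flip-flops holding `count` are simply wired to the clock.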



#### Connecting counter bits with LEDs

```
leds <= std_logic_vector(count(30 downto 23));
```

- This is NOT a one time assignment
- it connects signals like wires
- every change in count will change the state of leds
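In software terms, `leds` behaves like a pure function of `count` rather than a stored copy. The following Python sketch shows which bits end up on which LED; the function names are invented for this example.

```python
def leds_from_count(count):
    """Model of leds <= count(30 downto 23): the top 8 bits of the
    31-bit counter, returned as an integer 0..255."""
    return (count >> 23) & 0xFF


def led_states(count):
    """Individual LED states, index 0 = leds(0) = count(23)."""
    value = leds_from_count(count)
    return [(value >> i) & 1 for i in range(8)]


if __name__ == "__main__":
    for count in (0, 1 << 23, 1 << 30, (1 << 31) - 1):
        print(count, led_states(count))
```

In Python the function has to be called again to see a new value. In the FPGA there is nothing to call: the LED pins are wired to those eight counter bits and follow them continuously.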




#### one missing ingredient:

Our design software needs to be told what pins clock and LEDs are connected to and what logic standard to use → define <u>constraints</u> (separate file)

```
# Nexys A7 constraints file
set_property CFGBVS VCCO [current_design]
set_property CONFIG_VOLTAGE 3.3 [current_design]

# System clock (100MHz)
set_property IOSTANDARD LVCMOS33 [get_ports {sysclk}]
set_property PACKAGE_PIN E3 [get_ports sysclk]
create_clock -period 10 -name sysclk [get_ports sysclk]

# LEDs
set_property IOSTANDARD LVCMOS33 [get_ports {leds[*]}]
set_property PACKAGE_PIN H17 [get_ports {leds[0]}]
set_property PACKAGE_PIN K15 [get_ports {leds[1]}]
set_property PACKAGE_PIN J13 [get_ports {leds[2]}]
set_property PACKAGE_PIN N14 [get_ports {leds[3]}]
set_property PACKAGE_PIN R18 [get_ports {leds[4]}]
set_property PACKAGE_PIN V17 [get_ports {leds[5]}]
set_property PACKAGE_PIN U17 [get_ports {leds[6]}]
set_property PACKAGE_PIN U16 [get_ports {leds[7]}]
```

(this information from Nexys A7 documentation)
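For wider busses, typing one line per pin quickly becomes tedious and error-prone. One possible approach, sketched below in Python, is to generate the repetitive lines from a pin table. This is just a convenience idea, not part of the Vivado workflow; the helper `led_constraints` is hypothetical, and the pin names are the ones from the constraints file above.

```python
# Pin assignments copied from the constraints file above (leds[0] first).
LED_PINS = ["H17", "K15", "J13", "N14", "R18", "V17", "U17", "U16"]


def led_constraints(port="leds", pins=LED_PINS, iostandard="LVCMOS33"):
    """Return XDC lines for a bus of single-ended outputs."""
    lines = [f"set_property IOSTANDARD {iostandard} [get_ports {{{port}[*]}}]"]
    for index, pin in enumerate(pins):
        lines.append(
            f"set_property PACKAGE_PIN {pin} [get_ports {{{port}[{index}]}}]"
        )
    return lines


if __name__ == "__main__":
    print("\n".join(led_constraints()))
```

The output can be pasted into the constraints file. Always check the pin names against the board documentation.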



# firmware workflow

There are a number of steps between VHDL and a blinking LED:

- **synthesis**: translate VHDL into netlist (optimised components with connections)
- **implementation**: map design onto actual FPGA resources, assign place to entities and route signals along the fabric
- **bitfile generation**: create actual bitstream that can be uploaded to device
- **JTAG configuration**: connect to device via serial JTAG interface and configure it!

Most high-end FPGAs use volatile RAM to store configuration → need to reconfigure with JTAG after each power-up, or store firmware in external flash ROM.



# JTAG

- or an FPGA and a flash ROM for non-volatile firmware storage

All devices identify themselves, so software can verify expected type



Some boards have built-in JTAG controllers, just need USB cable. Others need external USB programmers:



JTAG connection can be used for debugging! Logic analyser cores



# Xilinx Vivado

(screenshot: Vivado 2019.1 project window showing the Flow Navigator, the project sources top.vhd and constraints.xdc, and top.vhd open in the editor)
| ✓ RTL ANALYSIS                                                                                                                                         | Contraction of                                                                                                        | ∧ 36 count <= count + 1;                                                                        |   |  |  |  |  |
| > Open Elaborated Design                                                                                                                               | Enabled                                                                                                               | 37 (⊇) end if;<br>38 (⊇) end process;                                                           |   |  |  |  |  |
|                                                                                                                                                        | Location: /data/pff62257/fpga_lecture/exa                                                                             | pie/ex 39 40                                                                                    |   |  |  |  |  |
| V SYNTHESIS                                                                                                                                            | XDC         ···         41         leds(7 downto 0) <= std_logic_vector(count(27 downto 20));           42         42 |                                                                                                 |   |  |  |  |  |
| Run Synthesis                                                                                                                                          | Size: 0.7 KB                                                                                                          | 0.7 KB 43                                                                                       |   |  |  |  |  |
| > Open Synthesized Design                                                                                                                              | <                                                                                                                     | 44 ⊖ end rtl;<br>45 :                                                                           |   |  |  |  |  |
| ✓ IMPLEMENTATION                                                                                                                                       | General Properties                                                                                                    |                                                                                                 |   |  |  |  |  |
| Run Implementation                                                                                                                                     | Tcl Console Messages Log Reports Design Runs x                                                                        |                                                                                                 |   |  |  |  |  |
| > Open Implemented Design                                                                                                                              | Q 素 ≑ I4 ≪ ▶ ≫ + %                                                                                                    |                                                                                                 | 1 |  |  |  |  |
| 1999 - 1997 - 1997 - 1997 - 1997 - 1997 - 1997 - 1997 - 1997 - 1997 - 1997 - 1997 - 1997 - 1997 - 1997 - 1997 -                                        | Name Constraints Status                                                                                               | WNS TNS WHS THS TPWS Total Power Failed Routes LUT FF BRAMS URAM DSP Start Elapsed Run Strategy |   |  |  |  |  |
| <ul> <li>PROGRAM AND DEBUG</li> </ul>                                                                                                                  | ✓ ✓ synth_1 constrs_1 synth_design Comp                                                                               |                                                                                                 | 1 |  |  |  |  |
| 👫 Generate Bitstream                                                                                                                                   | ✓ impl_1 constrs_1 write_bitstream Con                                                                                | olete! 7.227 0.00 0.055 0.00 0.00 0.00 0.00 0.682 0 1 28 0.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  | а |  |  |  |  |
| > Open Hardware Manager                                                                                                                                | ۲                                                                                                                     | >                                                                                               | > |  |  |  |  |
| JavaEmbeddedFrame                                                                                                                                      | pff62257@hepInw062:~                                                                                                  | ample - [/data/pff62257/fpga_lec ] pff62257@hepInw062: /data/pff62                              | ] |  |  |  |  |

# Vivado report

Report after building our example firmware for the Nexys A7:

| Section | Item | Value |
|---|---|---|
| Settings | Project name | nexys_a7 |
| Settings | Project location | /net/home/ppd/pff62257/fpga_tutorial/nexys_a7 |
| Settings | Product family | Artix-7 |
| Settings | Project part | xc7a100tcsg324-1 |
| Settings | Top module name | top |
| Settings | Target language | VHDL |
| Settings | Simulator language | Mixed |
| Synthesis | Status | Complete |
| Synthesis | Messages | 1 warning |
| Synthesis | Part | xc7a100tcsg324-1 |
| Synthesis | Strategy | Vivado Synthesis Defaults |
| Synthesis | Report Strategy | Vivado Synthesis Default Reports |
| Synthesis | Incremental synthesis | Automatically selected checkpoint |
| Implementation | Status | Complete |
| Implementation | Messages | 2 warnings |
| Implementation | Part | xc7a100tcsg324-1 |
| Implementation | Strategy | Vivado Implementation Defaults |
| Implementation | Report Strategy | Vivado Implementation Default Reports |
| Implementation | Incremental implementation | None |
| DRC Violations | | No DRC violations |
| Timing | Worst Negative Slack (WNS) | 7.1 ns |
| Timing | Total Negative Slack (TNS) | 0 ns |
| Timing | Number of Failing Endpoints | 0 |
| Timing | Total Number of Endpoints | 31 |
| Utilization | LUT | 1% |
| Utilization | FF | 1% |
| Utilization | IO | 4% |
| Utilization | BUFG | 3% |
| Power | Total On-Chip Power | 0.104 W |
| Power | Junction Temperature | 25.5 °C |
| Power | Thermal Margin | 59.5 °C (12.9 W) |
| Power | Effective θJA | 4.6 °C/W |
| Power | Power supplied to off-chip devices | 0 W |
| Power | Confidence level | Medium |
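
A rough way to read the timing numbers in this report, assuming the 10 ns clock constraint (`create_clock -period 10`) from the example constraints file applies to this build and neglecting clock skew and uncertainty: the worst negative slack is the time left over in a clock period after the slowest path has settled. The symbols $T_{\text{clk}}$, $t_{\text{path}}$ and $f_{\text{max}}$ are introduced here for illustration only.

$$
\text{WNS} \approx T_{\text{clk}} - t_{\text{path}}
\quad\Rightarrow\quad
t_{\text{path}} \approx 10\,\text{ns} - 7.1\,\text{ns} = 2.9\,\text{ns},
\qquad
f_{\text{max}} \approx \frac{1}{t_{\text{path}}} \approx 345\,\text{MHz}
$$

A positive WNS therefore means the design meets timing with margin; a negative value would mean the slowest path does not fit into one clock period.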




# simulation

More complex firmware is best tested in simulation first. Simulation is a very powerful tool for verification and debugging!

- exactly reproducible inputs
- much faster turnaround time than tests on hardware
- but not always a fully accurate reflection of timing, especially when timing is marginal or outside specifications
  - comparison of firmware output in simulation and on actual hardware is often part of the verification procedure

Vivado has an integrated simulator

Third party software exists (e.g. Siemens/Mentor Graphics QuestaSim)
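
As a minimal sketch of what a simulation test can look like in VHDL: a testbench is an entity without ports that instantiates the design under test, generates the clock itself, and checks the outputs with `assert`. The entity names `counter` and `counter_tb`, the generic `WIDTH`, and the 4-bit width are assumptions chosen for illustration; this is not the lecture's example firmware.

```vhdl
-- Hypothetical small counter, similar in spirit to the example firmware
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

entity counter is
  generic (WIDTH : positive := 4);
  port (
    clk   : in  std_logic;
    count : out std_logic_vector(WIDTH-1 downto 0)
  );
end counter;

architecture rtl of counter is
  signal cnt : unsigned(WIDTH-1 downto 0) := (others => '0');
begin
  process(clk)
  begin
    if rising_edge(clk) then
      cnt <= cnt + 1;
    end if;
  end process;
  count <= std_logic_vector(cnt);
end rtl;

-- Testbench: no ports, generates its own stimulus
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

entity counter_tb is
end counter_tb;

architecture sim of counter_tb is
  constant CLK_PERIOD : time := 10 ns;  -- corresponds to 100 MHz
  signal clk   : std_logic := '0';
  signal count : std_logic_vector(3 downto 0);
  signal done  : boolean := false;
begin
  -- design under test
  dut: entity work.counter
    generic map (WIDTH => 4)
    port map (clk => clk, count => count);

  -- free-running clock, stopped once the test is done
  clk <= not clk after CLK_PERIOD/2 when not done else '0';

  stimulus: process
  begin
    -- wait for five rising clock edges
    for i in 1 to 5 loop
      wait until rising_edge(clk);
    end loop;
    wait for 1 ns;  -- give the counter output time to update
    assert unsigned(count) = 5
      report "unexpected counter value" severity error;
    done <= true;   -- stops the clock so the simulation can end
    wait;
  end process;
end sim;
```

The sketch uses a short counter on purpose: the 31-bit counter of the example firmware only changes its LED outputs after millions of clock cycles, which makes for a slow and uninformative simulation.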








# last topic: how to approach a BIG firmware project

software development: distributed, collaborative, version controlled, unit tests, release management

firmware development: lonesome engineer with hard disk full of zip files




Firmware projects like CMS L1 trigger are very demanding:

- big, very complex chips
- extremely high speed signals
- distributed development
- negotiated interfaces
- developers with varying skill levels
- need to be very reliable
- subject to international collaboration and peer review

Need to work much more like with large software projects:

- modularity (leave the hardcore stuff to top experts)
- version control and release management
- rigorous testing, project supervision

CMS L1 trigger firmware project:

- separate framework and algorithm firmware
- script-based firmware build system (also enforces module structure)
- git repository (with automatic nightly builds)
- formal developer and user support (ticket system)

Very successful model, proven in LHC Run 2 already.
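
To illustrate what a separation of framework and algorithm firmware can mean in VHDL, here is a minimal sketch. It is not taken from the CMS firmware: the entity names `algo` and `framework_top`, the 8-bit data width and all port names are hypothetical. The idea is only that the algorithm developer works against a fixed, agreed port list, while the framework owns clocks and I/O.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

-- Algorithm block (hypothetical): only sees the agreed interface
entity algo is
  port (
    clk       : in  std_logic;
    data_in   : in  std_logic_vector(7 downto 0);
    valid_in  : in  std_logic;
    data_out  : out std_logic_vector(7 downto 0);
    valid_out : out std_logic
  );
end algo;

architecture rtl of algo is
begin
  process(clk)
  begin
    if rising_edge(clk) then
      -- placeholder processing: add one to every input word
      data_out  <= std_logic_vector(unsigned(data_in) + 1);
      valid_out <= valid_in;
    end if;
  end process;
end rtl;

library IEEE;
use IEEE.std_logic_1164.all;

-- Framework (hypothetical): owns clock and I/O, instantiates the algorithm
entity framework_top is
  port (
    clk      : in  std_logic;
    rx_data  : in  std_logic_vector(7 downto 0);
    rx_valid : in  std_logic;
    tx_data  : out std_logic_vector(7 downto 0);
    tx_valid : out std_logic
  );
end framework_top;

architecture rtl of framework_top is
begin
  -- the algorithm is a drop-in module behind a fixed interface
  payload: entity work.algo
    port map (
      clk       => clk,
      data_in   => rx_data,
      valid_in  => rx_valid,
      data_out  => tx_data,
      valid_out => tx_valid
    );
end rtl;
```

With this kind of split, the architecture of `algo` can be replaced or developed by a different group without touching the framework, as long as the port list stays the same.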




# conclusion


## Key points:

**FPGAs** are a very powerful tool for low-latency, high-throughput applications.

FPGA programming by firmware has many similarities with software development, but also important differences.

We are moving more and more functionality into FPGAs, e.g. in the L1 trigger.

We have a lot more people who know how to write software than how to write firmware. This has to change.

I hope I demonstrated today that writing firmware is no voodoo.



We will have a coffee break now, then move to R1 PPD lab 6. We have five PCs with an FPGA board attached, so please work in groups of 2–3 people.

We will:

- go through the design and programming step by step
- test the example on an actual FPGA
- maybe try a few modifications
- discuss any questions you might have



```
--- simple "hello world" example

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

Library UNISIM;
use UNISIM.vcomponents.all;

entity top is port(
    sysclk : in  STD_LOGIC;
    leds   : out STD_LOGIC_VECTOR (7 downto 0)
  );
end top;

architecture rtl of top is

  signal clk   : std_logic;
  signal count : unsigned(30 downto 0) := (others => '0');

begin

  -- put the system clock onto a global clock buffer
  ibuf: BUFG
    port map(
      i => sysclk,
      o => clk
    );

  -- free-running counter, incremented on every rising clock edge
  process(clk)
  begin
    if rising_edge(clk) then
      count <= count + 1;
    end if;
  end process;

  -- show the eight most significant counter bits on the LEDs
  leds <= std_logic_vector(count(30 downto 23));

end rtl;
```

```
# Nexys A7 constraints file
set_property CFGBVS VCCO [current_design]
set_property CONFIG_VOLTAGE 3.3 [current_design]

# System clock (100MHz)
set_property IOSTANDARD LVCMOS33 [get_ports {sysclk}]
set_property PACKAGE_PIN E3 [get_ports sysclk]
create_clock -period 10 -name sysclk [get_ports sysclk]

# LEDs
set_property IOSTANDARD LVCMOS33 [get_ports {leds[*]}]
set_property PACKAGE_PIN H17 [get_ports {leds[0]}]
set_property PACKAGE_PIN K15 [get_ports {leds[1]}]
set_property PACKAGE_PIN J13 [get_ports {leds[2]}]
set_property PACKAGE_PIN N14 [get_ports {leds[3]}]
set_property PACKAGE_PIN R18 [get_ports {leds[4]}]
set_property PACKAGE_PIN V17 [get_ports {leds[5]}]
set_property PACKAGE_PIN U17 [get_ports {leds[6]}]
set_property PACKAGE_PIN U16 [get_ports {leds[7]}]
```
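
A quick estimate of what the LEDs should do in the lab, assuming the 100 MHz system clock from the constraints file and the 31-bit counter whose bits 30 to 23 drive the LEDs. Bit $n$ of a free-running counter changes state every $2^{n}$ clock cycles; the symbols $T_{\text{clk}}$ and $t_{\text{toggle}}$ are introduced here for illustration.

$$
T_{\text{clk}} = \frac{1}{100\,\text{MHz}} = 10\,\text{ns},
\qquad
t_{\text{toggle}}(n) = 2^{n}\, T_{\text{clk}}
$$

$$
t_{\text{toggle}}(23) = 2^{23} \times 10\,\text{ns} \approx 84\,\text{ms},
\qquad
t_{\text{toggle}}(30) = 2^{30} \times 10\,\text{ns} \approx 10.7\,\text{s}
$$

So `leds[0]` should blink at roughly 6 Hz (one on/off cycle takes about 168 ms), while `leds[7]` changes state only about every 10.7 s, and the whole pattern repeats after $2^{31} \times 10\,\text{ns} \approx 21.5\,\text{s}$.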

```
--- simple "hello world" example

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

Library UNISIM;
use UNISIM.vcomponents.all;

entity top is port(
    sysclk : in  STD_LOGIC;
    leds   : out STD_LOGIC_VECTOR (7 downto 0);
    button : in  STD_LOGIC
  );
end top;

architecture rtl of top is

  signal clk   : std_logic;
  signal count : unsigned(30 downto 0) := (others => '0');
  signal reset : std_logic;

begin

  ibuf: BUFG
    port map(
      i => sysclk,
      o => clk
    );

  -- counter with synchronous reset
  process(clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        count <= (others => '0');
      else
        count <= count + 1;
      end if;
    end if;
  end process;

  leds <= std_logic_vector(count(30 downto 23));

  -- register the button input to generate the reset signal
  process(clk)
  begin
    if rising_edge(clk) then
      reset <= button;
    end if;
  end process;

end rtl;
```

```
# Nexys A7 constraints file
set_property CFGBVS VCCO [current_design]
set_property CONFIG_VOLTAGE 3.3 [current_design]

# System clock (100MHz)
set_property IOSTANDARD LVCMOS33 [get_ports {sysclk}]
set_property PACKAGE_PIN E3 [get_ports sysclk]
create_clock -period 10 -name sysclk [get_ports sysclk]

# LEDs
set_property IOSTANDARD LVCMOS33 [get_ports {leds[*]}]
set_property PACKAGE_PIN H17 [get_ports {leds[0]}]
set_property PACKAGE_PIN K15 [get_ports {leds[1]}]
set_property PACKAGE_PIN J13 [get_ports {leds[2]}]
set_property PACKAGE_PIN N14 [get_ports {leds[3]}]
set_property PACKAGE_PIN R18 [get_ports {leds[4]}]
set_property PACKAGE_PIN V17 [get_ports {leds[5]}]
set_property PACKAGE_PIN U17 [get_ports {leds[6]}]
set_property PACKAGE_PIN U16 [get_ports {leds[7]}]

# button
set_property IOSTANDARD LVCMOS33 [get_ports {button}]
set_property PACKAGE_PIN N17 [get_ports {button}]
```