

# A data acquisition system for the Cherenkov Telescope Array

Julien HOULES, Dirk HOFFMANN
CPPM/IN2P3/CNRS
And the CPPM CTA group

Contact: houles@cppm.in2p3.fr



### Camera server

### **Global architecture**



### **Camera data flow**



Whole Camera ~ 2000 PM -> 300 front end boards



### Camera server

- Build event
- L2 trigger on camera server (L1 on front end) :
  - CPU (SSE, AVX...)
  - GPU
- Compress ?
- Send data to central server (array level)
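The "build event" step above can be illustrated with a minimal sketch. It assumes that each front end board sends one fragment per event, tagged with an event number and a board identifier. The class name `EventBuilder`, its method and the in-memory dictionary are hypothetical and are not taken from the prototype software.

```python
from collections import defaultdict

N_BOARDS = 300  # front end boards per camera (figure quoted on the slides)


class EventBuilder:
    """Hypothetical sketch: collect per-board fragments until an event is complete."""

    def __init__(self, n_boards=N_BOARDS):
        self.n_boards = n_boards
        # event number -> {board id: payload bytes}
        self.pending = defaultdict(dict)

    def add_fragment(self, event_number, board_id, payload):
        """Store one fragment; return the full event, or None if boards are missing."""
        fragments = self.pending[event_number]
        fragments[board_id] = payload
        if len(fragments) < self.n_boards:
            return None
        del self.pending[event_number]
        # Concatenate in board order so the event layout is deterministic.
        return b"".join(fragments[b] for b in sorted(fragments))
```

A real builder would also need a timeout for events whose fragments never all arrive; that is left out here.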



# **Data flow hypothesis**

- ~ 2000 pixels camera
- L1 trigger rate : 10 kHz
- Size of sampling data for 1 PM: 144 bytes (16 bit \* 72 samples)
- No data loss (all the L1 events are sent)
  - $\rightarrow$  Max theoretical bandwidth = 10000 \* 2000 \* 144 bytes = 2.88 GB/s = 23 Gb/s
- 7 detectors for each front end board: 300 boards/camera
  - Each board generates a flow of 2880 MB/s / 300 = 9.6 MB/s = 77 Mb/s
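The arithmetic behind these figures can be reproduced with a few lines. This sketch only restates the hypotheses listed above; it assumes decimal units (1 GB = 10^9 bytes) and ignores frame headers. The variable names are illustrative.

```python
N_PIXELS = 2000          # PMs per camera
L1_RATE_HZ = 10_000      # L1 trigger rate
SAMPLES_PER_PM = 72
BYTES_PER_SAMPLE = 2     # 16-bit samples
N_BOARDS = 300

bytes_per_pm = SAMPLES_PER_PM * BYTES_PER_SAMPLE      # 144 bytes
event_bytes = N_PIXELS * bytes_per_pm                 # 288 000 bytes per event
camera_bytes_per_s = event_bytes * L1_RATE_HZ         # 2.88 GB/s
camera_bits_per_s = camera_bytes_per_s * 8            # 23.04 Gb/s
board_bytes_per_s = camera_bytes_per_s // N_BOARDS    # 9.6 MB/s
board_bits_per_s = board_bytes_per_s * 8              # 76.8 Mb/s

print(f"camera: {camera_bytes_per_s / 1e9:.2f} GB/s = {camera_bits_per_s / 1e9:.2f} Gb/s")
print(f"board:  {board_bytes_per_s / 1e6:.1f} MB/s = {board_bits_per_s / 1e6:.1f} Mb/s")
```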

https://portal.cta-observatory.org/WG/ACTL/SitePages/Data%20Rates.aspx



### **Global architecture**



### Camera infrastructure



### **Dell Precision T7500**



- Two Intel Xeon X5650
   (2.66GHz,6.4GT/s,12MB,6Cores)
- Memory: 24GB (6x4GB) 1333MHz
- Intel X520 DA2 10GbE Dual Port SFP+ Server Adapter, PCIe x8

- Triple channel (maximum speed reached)
- QPI at 6.4 GT/s (maximum speed on the market)
- Memory DDR3-1333
- 2 full speed full duplex 10 Gb/s links (PCIe x8 Gen 2)
- 1 PCIe x16 slot free (-> GPU)
   and 1 PCIe x8 free (-> one more 10 Gbps adapter)
- SFP+ -> Copper or Optical link

~ 3500 euros



### **Dell Powerconnect 6248**



- 48 \* 1 Gb/s ports
- Backplane 184 Gb/s
- 2 \* 10 Gb/s SFP+ ports included
   2 more 10 Gb/s optional ports
- Up to 12 switches stackable
  -> 576 ports

~ 1500 euros (with 2 \* 10 Gb/s)





# **Event builder**

# Why a prototype?

#### We need a prototype:

- To evaluate the maximum speed reachable
- To test several technologies
- To validate different approaches of the data processing
- To adapt our needs to what we can do



# Our first approach

- High modularity to make adaptation to different front end electronics easier
- Multitask approach to divide the flow processing if needed
- Use of a standard Linux distribution, but taking control of scheduling and memory allocation
- Constrained electronics to reach the best performance (as a first step)

### **Event builder**



# Data format : regular frame

1 frame: 1024 bytes

#### Level 2 triggering on camera server



# Data format : jumbo frame

1 frame : 8192 bytes

#### Level 2 triggering on camera server



### **Software overview: 1st architecture**



# Software overview: 2<sup>nd</sup> architecture



### Stimulation configuration



### **Stimulation room**





### First results: event builder

- 150 nodes (15 per HP server) sending data to interface 1
- 150 nodes (15 per HP server) sending data to interface 2

#### Tests of the event building with varying packet size:

| Frame type                  | 1<sup>st</sup> architecture                                                       | 2<sup>nd</sup> architecture                                                         |
|-----------------------------|-----------------------------------------------------------------------------------|-------------------------------------------------------------------------------------|
| Jumbo frames (8192 bytes)   | 19.2 Gb/s (2.4 GB/s ~ 8000 evts/s) with no loss; CPU usage: 300 % (3 cores/12)     | 19.2 Gb/s (2.4 GB/s ~ 8000 evts/s) with no loss; CPU usage: 160 % (1.6 cores/12)     |
| Regular frames (1024 bytes) | 6.5 Gb/s (0.82 GB/s) with no loss; CPU usage: 300 % (3 cores/12)                   | 8 Gb/s (1 GB/s) with no loss; CPU usage: 170 % (1.7 cores/12)                        |
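To relate the measured throughputs in the table to event rates, the sketch below divides each throughput by the full-camera event size from the data flow hypothesis (2000 PMs \* 144 bytes). It assumes decimal units and neglects frame headers, so the results are approximate. The function name is illustrative.

```python
EVENT_BYTES = 2000 * 144  # full-camera event size from the data flow hypothesis


def events_per_second(throughput_gbps, event_bytes=EVENT_BYTES):
    """Convert a throughput in Gb/s into full-camera events per second."""
    bytes_per_s = throughput_gbps * 1e9 / 8
    return bytes_per_s / event_bytes


# Throughputs reported in the table above.
for label, gbps in [("jumbo frames", 19.2),
                    ("regular frames, 1st architecture", 6.5),
                    ("regular frames, 2nd architecture", 8.0)]:
    print(f"{label}: about {events_per_second(gbps):.0f} evts/s")
```

For jumbo frames this gives a value consistent with the ~ 8000 evts/s quoted in the table, which stays below the 10 kHz L1 rate assumed in the data flow hypothesis.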



Test of a direct I/O solution to improve small-frame reception: in progress

### **Integration in ACS**

The basic functions of the Event Builder are available from the ACS interface





# **Stimulator**

### **Need for a real stimulator**

#### A stimulator is needed for:

- timing measurements on software
- real-time validation
- algorithm validation
- trigger validation
- latency measurements on the network
- tests mixing front end boards and stimulator
- validation of the complete acquisition chain

# **Testing configuration**



Powerconnect 6248 stack





10 Gb/s links

To camera server

### **EVOC NET-1820**

#### Most promising candidate (50 € / port)



Intel Atom D525 dual core processor 1.8GHz

4.0 GB RAM

6 x Intel 82574L Giga LAN

(supports 9K frames and boot on LAN)

8-bit Digital I/O interface

1 x Parallel port, Serial port

~ 300 € each → ~ 15000 € for 300 ports without switches

Measured throughput @ CPPM: ~ 2.4 Gb/s (400 Mb/s per port) → can easily be improved.
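The sizing behind these numbers can be checked as follows. The sketch takes the unit price, port count and measured rate from this slide. The required rate per port (76.8 Mb/s) is derived from the data flow hypothesis and assumes that one stimulator port emulates one front end board. The variable names are illustrative.

```python
import math

PORTS_NEEDED = 300             # one port per emulated front end board (assumption)
PORTS_PER_UNIT = 6             # Gigabit ports per EVOC NET-1820
UNIT_PRICE_EUR = 300
MEASURED_MBPS_PER_PORT = 400   # measured at CPPM
REQUIRED_MBPS_PER_PORT = 76.8  # 9.6 MB/s per board, from the data flow hypothesis

units = math.ceil(PORTS_NEEDED / PORTS_PER_UNIT)
total_cost_eur = units * UNIT_PRICE_EUR
cost_per_port_eur = total_cost_eur / PORTS_NEEDED
headroom = MEASURED_MBPS_PER_PORT / REQUIRED_MBPS_PER_PORT

print(f"{units} units, {total_cost_eur} EUR in total, {cost_per_port_eur:.0f} EUR per port")
print(f"measured rate is {headroom:.1f} times the required rate per port")
```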



### **Future**

### **Future work**

- Test of a direct I/O solution to improve regular frames reception (in progress)
- Perform precise performance measurements
- Improve the event builder
- Build a full-size stimulator
- Design an L2 trigger (CPU? GPU?)
- Work with the slow control and array server communications teams
- Full ACS integration
- Make the software reliable enough for production stage



### **Interface definition**

### Data format : type 1.0

The front end electronics transmit all events (after L1 triggering)

Event stream: Event n, Event n+1, Event n+2, ..., Event n+7. Layout of one event:

| Part           | Content                          |
|----------------|----------------------------------|
| Level 1 header | Header flag, Empty, Event number |
| Level 1 header | Time 1, Time 2                   |
| Data           | PM 1 sampling data               |
| Data           | PM 2 sampling data               |
| Data           | PM 3 sampling data               |
| Data           | PM 4 sampling data               |
| Data           | PM 5 sampling data               |
| Data           | PM 6 sampling data               |
| Data           | PM 7 sampling data               |
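As an illustration of the type 1.0 layout, the sketch below packs and unpacks one board frame. The field widths and byte order are not given on the slides. The header sizes used here (2-byte flag, 2-byte empty field, 4-byte event number, two 4-byte time fields) and the little-endian order are assumptions, chosen only so that the header plus 7 \* 144 bytes of sampling data adds up to the 1024-byte regular frame size quoted earlier.

```python
import struct

N_PM = 7        # PMs per front end board
SAMPLES = 72    # 16-bit samples per PM

# Hypothetical header: flag (u16), empty (u16), event number (u32), time 1 (u32), time 2 (u32).
HEADER = struct.Struct("<HHIII")            # 16 bytes, little-endian (assumption)
PM_BLOCK = struct.Struct(f"<{SAMPLES}H")    # 144 bytes of sampling data


def pack_frame(flag, event_number, time1, time2, pm_samples):
    """Serialise one type 1.0 frame: Level 1 header followed by 7 PM blocks."""
    if len(pm_samples) != N_PM:
        raise ValueError(f"expected {N_PM} PM blocks, got {len(pm_samples)}")
    frame = HEADER.pack(flag, 0, event_number, time1, time2)
    for samples in pm_samples:
        frame += PM_BLOCK.pack(*samples)
    return frame


def unpack_frame(frame):
    """Inverse of pack_frame; the empty header field is discarded."""
    flag, _empty, event_number, time1, time2 = HEADER.unpack_from(frame, 0)
    pm_samples = [
        list(PM_BLOCK.unpack_from(frame, HEADER.size + i * PM_BLOCK.size))
        for i in range(N_PM)
    ]
    return flag, event_number, time1, time2, pm_samples
```

With these assumed widths, a packed frame is 16 + 7 \* 144 = 1024 bytes long.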

### Data format : type 1.1

The front end electronics transmit a single value for each PM for all events (after L1 triggering)

Event stream: Event n, Event n+1, Event n+2, ..., Event n+7. Layout of one event:

| Part           | Content                          |
|----------------|----------------------------------|
| Level 1 header | Header flag, Empty, Event number |
| Level 1 header | Time 1, Time 2                   |
| Data           | PM 1 max/tot                     |
| Data           | PM 2 max/tot                     |
| Data           | PM 3 max/tot                     |
| Data           | PM 4 max/tot                     |
| Data           | PM 5 max/tot                     |
| Data           | PM 6 max/tot                     |
| Data           | PM 7 max/tot                     |

# Data format: type 2.0

The front end electronics only transmit L2 triggered events

Event stream: Event n, Event n+6, Event n+24, Event n+31, Event n+34, Event n+52, Event n+67, Event n+72. Layout of one event:

| Part           | Content                          |
|----------------|----------------------------------|
| Level 1 header | Header flag, Empty, Event number |
| Level 1 header | Time 1, Time 2                   |
| Data           | PM 1 sampling data               |
| Data           | PM 2 sampling data               |
| Data           | PM 3 sampling data               |
| Data           | PM 4 sampling data               |
| Data           | PM 5 sampling data               |
| Data           | PM 6 sampling data               |
| Data           | PM 7 sampling data               |

► Be careful about latency!

# Data format : type 2.1

The front end electronics transmit a single value for each PM in L2 triggered events

Event stream: Event n, Event n+6, Event n+24, Event n+31, Event n+34, Event n+52, Event n+67, Event n+72. Layout of one event:

| Part           | Content                          |
|----------------|----------------------------------|
| Level 1 header | Header flag, Empty, Event number |
| Level 1 header | Time 1, Time 2                   |
| Data           | PM 1 max/tot                     |
| Data           | PM 2 max/tot                     |
| Data           | PM 3 max/tot                     |
| Data           | PM 4 max/tot                     |
| Data           | PM 5 max/tot                     |
| Data           | PM 6 max/tot                     |
| Data           | PM 7 max/tot                     |

► Be careful about latency!


### Data format: type 3.0

The front end electronics only transmit triggering PM in L2 triggered events



### Data format: type 3.1

The front end electronics transmit a single value for triggered PM in L2 triggered events



### Data format : type X.Y.1

The beginning of the sampling data is truncated. This variant can be applied to the types described above.



### **Data format: discussion**

The formats presented here are only a proposal and must be discussed with all the teams concerned





# **Backup**

### **Non Uniform Memory Access**



QPI @ 6.4 GT/s bandwidth < DDR3-1333 memory bandwidth
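The inequality can be checked with peak theoretical figures. This sketch assumes 2 bytes of payload per QPI transfer in each direction, 64-bit (8-byte) DDR3 channels, and three memory channels per socket as in the triple channel configuration described earlier. Sustained rates in practice are lower than these peaks.

```python
QPI_GT_PER_S = 6.4             # giga-transfers per second
QPI_BYTES_PER_TRANSFER = 2     # 16 data bits per transfer, per direction (assumption)

DDR3_MT_PER_S = 1333           # mega-transfers per second (DDR3-1333)
DDR3_BYTES_PER_TRANSFER = 8    # 64-bit channel
MEMORY_CHANNELS = 3            # triple channel

qpi_gb_per_s = QPI_GT_PER_S * QPI_BYTES_PER_TRANSFER
ddr3_gb_per_s = DDR3_MT_PER_S * DDR3_BYTES_PER_TRANSFER * MEMORY_CHANNELS / 1000

print(f"QPI, one direction:     {qpi_gb_per_s:.1f} GB/s")
print(f"DDR3-1333, 3 channels:  {ddr3_gb_per_s:.1f} GB/s")
```

Under these assumptions, data read from memory attached to the other socket is limited by the QPI link and not by the memory itself, which is why memory placement matters on this machine.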