# Development of Associative Memory ASIC in 28 nm technology

Francesco Crescioli



Journées VLSI - FPGA - PCB et Outils CAO de l'IN2P3 16-5-2018

## AMchip06

Produced 25 wafers (8000 chips), plus 25 more wafers in 2018

- Digital ASIC
  - 65 nm TSMC
  - 100 MHz
  - 168 mm<sup>2</sup>
  - 128k patterns of 8 × 16 bits
    - 2 ternary bits + 14 binary bits per 16-bit word
  - Flip-chip BGA
  - MGT I/O at 2/2.4 Gbps
- Full-custom CAM cell
  - XORAM technology
    - doi:10.1109/ICECS.2012.6463629
  - Optimized for low power
- Average power consumption during operation: 3 W
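To make the "2 ternary bits + 14 binary bits" word format concrete, the sketch below models the *logical* behaviour of one pattern comparison in software. It is a simplified, hypothetical model and not the XORAM circuit: the names `StoredWord`, `hitmap` and `fires` are illustrative, ternary "don't care" (X) states are represented with a per-word mask over the 2 low-order bits, and the majority `threshold` is an assumption about how a pattern is declared matched.

```python
from dataclasses import dataclass

# Simplified, hypothetical software model of an AM pattern comparison.
# It mimics the logical behaviour only; it is NOT the XORAM circuit.
N_LAYERS = 8          # 8 busses -> one 16-bit word per layer
WORD_MASK = 0xFFFF    # 16-bit words
TERNARY_MASK = 0b11   # the 2 low-order bits can be stored as 0, 1 or X

@dataclass
class StoredWord:
    value: int          # stored 16-bit value
    dont_care: int = 0  # bits of TERNARY_MASK set here are "X" (don't care)

    def matches(self, hit: int) -> bool:
        # The 14 high-order bits are always compared; ternary bits marked X are ignored.
        care = WORD_MASK & ~(self.dont_care & TERNARY_MASK)
        return ((hit ^ self.value) & care) == 0

def hitmap(pattern, hits_per_layer):
    """8-bit hitmap: bit i is set if any hit on layer i matches stored word i."""
    bits = 0
    for layer, (word, hits) in enumerate(zip(pattern, hits_per_layer)):
        if any(word.matches(h) for h in hits):
            bits |= 1 << layer
    return bits

def fires(pattern, hits_per_layer, threshold=N_LAYERS):
    """Assumed majority logic: the pattern fires if at least `threshold` layers matched."""
    return bin(hitmap(pattern, hits_per_layer)).count("1") >= threshold

# Usage: word 0 stores 0x1234 with both ternary bits set to X (matches 0x1234..0x1237).
pattern = [StoredWord(0x1234, dont_care=0b11)] + [StoredWord(v) for v in range(1, N_LAYERS)]
event = [[0x1238]] + [[v] for v in range(1, N_LAYERS)]
print(hex(hitmap(pattern, event)))                                # 0xfe: layer 0 missed
print(fires(pattern, event), fires(pattern, event, threshold=7))  # False True
```

The ternary bits let one stored pattern cover a small range of input values, which is why a few X bits per word can reduce the number of patterns needed for a given coverage.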

#### Power consumption figure of merit:

- 2.3 fJ/bit/comparison @ 1.15 V
- 1.8 fJ/bit/comparison @ 1.0 V
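The figure of merit can be turned into a rough core power estimate by multiplying it by the number of bit comparisons per second. The sketch below assumes that every bit of every stored pattern is compared once per clock cycle, that "128k" means 128 × 1024, and it ignores I/O and control logic, so it is an idealized estimate rather than a measurement.

```python
# Back-of-the-envelope core power from the figure of merit.
# Assumptions: every stored bit is compared once per clock cycle; I/O and logic ignored.
def cam_power_w(e_fj_per_bit_cmp, n_patterns, bits_per_pattern, f_hz):
    """P = E [J/bit/comparison] * patterns * bits per pattern * clock frequency."""
    return e_fj_per_bit_cmp * 1e-15 * n_patterns * bits_per_pattern * f_hz

N_PATTERNS = 128 * 1024   # "128k", assuming k = 1024
BITS = 8 * 16             # 8 words of 16 bits per pattern
F_HZ = 100e6              # AMchip06 clock

for e_fj, vdd in [(2.3, 1.15), (1.8, 1.0)]:
    print(f"{vdd} V: {cam_power_w(e_fj, N_PATTERNS, BITS, F_HZ):.2f} W")
# 1.15 V: 3.86 W
# 1.0 V: 3.02 W
```

Both values are of the same order as the 3 W average quoted for operation.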



# Goals for the ATLAS/CMS Phase-II chip

- 384k patterns in ≃ 150 mm<sup>2</sup>
  - 8 busses × 16 bits
  - 2 ternary bits + 14 binary bits
- 250 MHz comparison clock
- LVDS DDR I/Os at 500 MHz, to reduce the pinout while avoiding the complexity of MGTs
- At most 1 fJ/bit/comparison
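A quick sizing check shows why the energy per comparison has to drop: the goals combine three times the patterns of AMchip06 with a 2.5× faster clock. The sketch assumes k = 1024, that every stored bit is compared once per clock cycle, and uses the approximate 150 mm<sup>2</sup> area; the helper name `core_power_w` is illustrative.

```python
# Hypothetical sizing check for the Phase-II goals.
# Assumptions: k = 1024; every stored bit is compared once per clock cycle.
K = 1024

def core_power_w(e_fj, n_patterns, bits_per_pattern, f_hz):
    """Core comparison power in watts for an energy of e_fj fJ/bit/comparison."""
    return e_fj * 1e-15 * n_patterns * bits_per_pattern * f_hz

target_w = core_power_w(1.0, 384 * K, 8 * 16, 250e6)
density_goal = 384 * K / 150      # patterns per mm^2 (Phase-II goal, ~150 mm^2)
density_am06 = 128 * K / 168      # patterns per mm^2 (AMchip06)

print(f"core power at 1 fJ/bit/comparison: {target_w:.1f} W")          # 12.6 W
print(f"density gain vs AMchip06: {density_goal / density_am06:.2f}x")  # 3.36x
```

Even at 1 fJ/bit/comparison the idealized core power is above 12 W, so every fraction of a femtojoule saved in the CAM cell matters.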



#### **AM07**



17 × 17 BGA

- 10 mm<sup>2</sup> in 28 nm
- 4 × 4k patterns organized in independent cores
  - 2× DOXORAM, 2× KOXORAM
  - Both are evolutions of the XORAM of AM06. Patent pending in Italy.
- 1 bus using 9 LVDS pairs in DDR mode (18 bits)
- 7 busses using 18-bit LVCMOS
- 4 LVCMOS outputs (12-bit address + 8-bit hitmap)
- Designed to run internally at 200 MHz
- RX/TX LVDS test drivers
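The I/O numbers fit together as follows: in DDR mode each LVDS pair carries two bits per clock cycle, so 9 pairs carry one 18-bit word, and a 12-bit address is exactly enough to index the 4k (4096) patterns of one core. The sketch below illustrates both points; the bit mappings (low 9 bits on the rising edge, address above an 8-bit hitmap) and all function names are hypothetical, not the documented AM07 pinout.

```python
# Hypothetical illustration of the AM07 I/O word sizes (bit mappings are assumptions).
N_PAIRS = 9
WORD_BITS = 2 * N_PAIRS   # DDR: 2 bits per pair per clock cycle -> 18 bits

def ddr_split(word):
    """Split an 18-bit word into (rising, falling) 9-bit slices, one bit per LVDS pair.
    Assumed mapping: low 9 bits on the rising edge, high 9 bits on the falling edge."""
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("word must fit in 18 bits")
    mask = (1 << N_PAIRS) - 1
    return word & mask, (word >> N_PAIRS) & mask

def ddr_merge(rising, falling):
    """Inverse of ddr_split: rebuild the 18-bit word on the receiving side."""
    return (falling << N_PAIRS) | rising

ADDR_BITS, HITMAP_BITS = 12, 8   # 2**12 = 4096 patterns per core

def decode_output(word):
    """Assumed 20-bit output layout: [address (12 bits) | hitmap (8 bits)]."""
    hitmap = word & ((1 << HITMAP_BITS) - 1)
    address = (word >> HITMAP_BITS) & ((1 << ADDR_BITS) - 1)
    layers = [i for i in range(HITMAP_BITS) if (hitmap >> i) & 1]
    return address, hitmap, layers

print(decode_output((4095 << 8) | 0x81))   # (4095, 129, [0, 7])
```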

### AM07 Internals



- Full-custom blocks (CAM, LVDS) designed with Cadence Virtuoso
- Standard-cell logic synthesized from VHDL using Cadence Genus
- Top-level integration and P&R using Cadence Innovus
- Static timing analysis using Cadence Tempus
- IR drop analysis using Cadence Voltus
- Functional simulation in UVM using Cadence Incisive



#### AM07 Test Board



To test AM07, we need two FPGAs on the test board to drive all the LVCMOS and LVDS signals.



- 2× Xilinx Kintex 160T FPGAs
- Si5380 low-jitter clock manager (optimized for LTE frequencies, so not all frequencies are available)
- Firmware based on IPbus, controlled by Python scripts

#### **Functional Tests**

- AM07 has an internal feature that scans the pattern bank and produces a unique CRC value
  - The scan succeeds up to 245 MHz (the chip was designed for 200 MHz)
- LVDS RX/TX tested up to 1.1 Gbps
- In the FPGA we can tune all relative clock phases and the individual delay of each IO to deskew the busses
  - Zero errors up to 150 MHz using all busses, with the clock via LVCMOS
  - Zero errors up to 180 MHz using the LVDS bus, with the clock via LVDS
- Tests still ongoing (the FPGA firmware proved more difficult than expected)
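Two of the test steps above lend themselves to a short software sketch. First, the CRC scan: the test software can compute a reference CRC over the pattern bank it loaded and compare it with the value read back from the chip. The AM07 CRC polynomial and word ordering are not given here, so `zlib.crc32` over little-endian 16-bit words is only a stand-in. Second, deskewing: a common approach is to sweep the input delay of each IO, count errors at each tap, and pick the center of the widest error-free window. Both function names are hypothetical.

```python
import zlib

def bank_crc(patterns):
    """Reference CRC over a pattern bank (CRC-32 used as a stand-in for the chip's CRC).
    `patterns` is a list of patterns, each a list of 16-bit words."""
    crc = 0
    for pattern in patterns:
        for word in pattern:
            crc = zlib.crc32(word.to_bytes(2, "little"), crc)
    return crc

def best_delay_tap(errors_per_tap):
    """Return the tap at the center of the longest run of zero-error taps, or None."""
    best_start, best_len = None, 0
    start = None
    # A trailing sentinel "error" closes a zero-error run that reaches the last tap.
    for tap, errs in enumerate(list(errors_per_tap) + [1]):
        if errs == 0:
            if start is None:
                start = tap
        elif start is not None:
            if tap - start > best_len:
                best_start, best_len = start, tap - start
            start = None
    if best_start is None:
        return None
    return best_start + (best_len - 1) // 2

# Usage: error counts measured at each delay tap of one IO during a scan.
print(best_delay_tap([5, 3, 0, 0, 0, 0, 0, 2, 0, 0, 4]))   # 4
```

Centering the sampling point in the widest error-free window leaves the largest margin against jitter and temperature drift on that IO.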

#### **BIST & LVDS**



# Power consumption

Precise measurements are still ongoing: the power consumption depends on the input data and on the matching activity, so it must be measured during functional tests.

However, preliminary results show good agreement with the Calibre simulations:

| CAM cell | Measured (fJ/bit/comparison) | Simulated (fJ/bit/comparison) |
|---------|--------------------|-------------------|
| KOXORAM | 0.748              | 0.69              |
| DOXORAM | 0.851              | 0.91              |
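The agreement can be quantified as the relative deviation of the measurement from the simulation, computed here from the two rows of the table.

```python
# Relative deviation (measured - simulated) / simulated, values from the table above.
results = {
    "KOXORAM": (0.748, 0.69),   # (measured, simulated) in fJ/bit/comparison
    "DOXORAM": (0.851, 0.91),
}

def rel_dev(measured, simulated):
    return (measured - simulated) / simulated

for cell, (meas, sim) in results.items():
    print(f"{cell}: {rel_dev(meas, sim):+.1%}")
# KOXORAM: +8.4%
# DOXORAM: -6.5%
```

Both deviations are below 10%, and both measured values are below 1 fJ/bit/comparison.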

#### **AM08**

The purpose of this small-area prototype (10 mm<sup>2</sup>) is to finalize all the features and interfaces that will be used by the final chip (AM09).

- AM08 development is already at an advanced stage
  - Submission foreseen in fall 2018
- New features:
  - All LVDS DDR IOs at 500 MHz
    - Control words encoded in the data stream
  - Configuration via SPI
  - 250 MHz core frequency
  - Internal DPLL
    - Generates 8 × 250 MHz clocks, 45° apart, used by different cores to spread the power consumption over the clock period
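At 250 MHz the clock period is 4 ns, so eight phases 45° apart are spaced by 0.5 ns. The toy model below shows why staggering helps: it assumes, hypothetically, that each core draws one unit of current for 0.5 ns right after its clock edge, and compares the peak of the summed supply current with aligned and with staggered clocks. Real current profiles are not rectangular, so only the trend is meaningful.

```python
# Toy model of staggered core clocks (rectangular 1-unit current pulses are an assumption).
F_CLK_HZ = 250e6
N_PHASES = 8
PERIOD_NS = 1e9 / F_CLK_HZ            # 4.0 ns
STEP_NS = PERIOD_NS / N_PHASES        # 45 degrees -> 0.5 ns

def phase_offsets_ns():
    """Clock-edge offsets of the 8 phases within one period."""
    return [k * STEP_NS for k in range(N_PHASES)]

def summed_current(offsets_ns, pulse_ns=0.5, dt_ns=0.1):
    """Total current over one period, sampled every dt_ns, one pulse per core."""
    n = round(PERIOD_NS / dt_ns)
    width = round(pulse_ns / dt_ns)
    total = [0] * n
    for off in offsets_ns:
        start = round(off / dt_ns)
        for i in range(width):
            total[(start + i) % n] += 1   # wrap around the clock period
    return total

aligned = summed_current([0.0] * N_PHASES)
staggered = summed_current(phase_offsets_ns())
print(max(aligned), max(staggered))   # 8 1
print(sum(aligned) == sum(staggered)) # True: same charge, lower peak
```

The average power is unchanged; what drops is the peak current, which eases IR drop and supply noise.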



# **Five papers** accepted at IEEE's International Symposium on Circuits and Systems (ISCAS):

- Characterization of an Associative Memory chip in 28 nm CMOS technology
- Characterization of an LVDS link in 28 nm CMOS for multi-purpose pattern recognition chip
- Design and characterization of new Content Addressable Memory cells
- A fully-digital delay locked loop in 28 nm CMOS
- Temperature sensor with process and mismatch auto-compensation technique in 28 nm CMOS

#### Conclusions

- The Associative Memory is a device designed for real-time pattern recognition in high-performance computing applications
  - It has been used in hardware track processors at hadron collider experiments (CDF SVT, ATLAS FTK)
  - It will be used in future HL-LHC experiment upgrades (ATLAS HTT)
  - To meet the new requirements we pursued the development of the AM in 28 nm
- AM07, the first fully functional prototype in 28 nm, was produced in 2017 and is under test
  - Functionality has been verified (at least up to 150 MHz, internally up to 245 MHz)
  - Power consumption is compatible with the simulation expectations and on track with our roadmap to the final chip
- AM08 is under design and will be submitted in fall 2018
  - It implements the features and interfaces of the final chip