

#### **Picosecond TDC Design**

Ecole de Microélectronique IN2P3 2017

Moritz Horstmann CERN/EP-ESE-ME



# Outline

- Introduction
- TDC Design Overview
- Challenges in TDC Design
- picoTDC
- Delay Locked Loops



#### Introduction



### **TDC** applications in HEP

- Drift time in gas based tracking detectors
  - Low resolution: ~1ns
  - Examples: CMS and ATLAS muon detectors
- TOF, RICH
  - High resolution: 10 ps to 100 ps
  - Example: ALICE TOF
- Background reduction
- Signal amplitude measurement: TOT









# Other TDC applications

- Laser ranging
- 3D imaging
- Medical imaging: TOF PET
  - Improve signal/noise and have lower radiation dose.
- Fluorescence lifetime imaging
- General instrumentation.
- Differences to HEP systems
  - Smaller systems, fewer channels
  - Averaging can in some cases be used to get improved time resolution











#### "Snapshots" of Higgs Boson Events at the LHC

# $H \rightarrow \gamma\gamma \qquad \qquad H \rightarrow ZZ^* \rightarrow ee\mu\mu$




Moritz.Horstmann@cern.ch


#### High-Luminosity LHC





#### Signal Vertex Efficiency





#### **Time-Aware Vertexing**





#### 3D vs. 4D Vertex Reconstruction



4D reconstruction with track time information at ~25 ps



#### Pile-up = 200



**CMS Simulation** 



#### Particle-flow Event Reconstruction





#### **TDC Design Overview**




#### **Time Measurement Chain**



#### **TDC Architectures**





#### **Time Measurements**

- Start stop measurement
  - Measurement of time interval between two local events:
    - Start signal, stop signal
  - Used to measure relatively short time intervals with high precision
  - For small systems (1 channel)
  - Like a stop watch for a local event
- Time tagging
  - Measure time of occurrence of events in relation to a given time reference
    - Time reference (Clock)
    - Events to be measured (Hit)
  - Used to measure relative occurrence of many events on many channels on a defined time scale
  - Such a time scale will have limited range but can be circular (e.g. LHC machine orbit time)
  - For large scale HEP systems
  - Like a normal watch with a common 24h scale






# Interface to front-end and time walk compensation schemes

- Basic discriminator
  - Significant time walk (depending on signal slew rate)
- Double threshold
  - Interpolate to "0" volt amplitude
  - Needs two discriminators and two TDC channels; limited efficiency reported in practice.
- TDC plus pulse amplitude (peak or charge) measurement with ADC
  - ADC measurement expensive and slow (may be needed anyway)









- Constant Fraction Discriminator: CFD
  - Compensate directly in discriminator
  - Works very well for fixed pulse shape with varying amplitude.
    - Needs delay: Made as distributed RC within ASIC (but also works as filter)
  - If signal shape not constant, then?
- Leading edge + Time Over Threshold (poor man's ADC)
  - Minimal extra hardware (also measure falling edge time)
  - Has been seen to work quite well in several applications.
  - If signal shape not constant then?
  - TOT now very often seen in HEP for indirect amplitude measurement with moderate resolution
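The time walk of a basic discriminator and its compensation by a constant fraction threshold can be illustrated with a minimal sketch. It assumes an idealized pulse whose leading edge is a linear ramp with a fixed rise time; the pulse model, the function names and all numbers are illustrative assumptions, not values from the slides.

```python
def leading_edge_time(amplitude_mv, threshold_mv, rise_time_ps):
    """Threshold-crossing time of a linear ramp that reaches
    `amplitude_mv` after `rise_time_ps` (idealized pulse model)."""
    if amplitude_mv <= threshold_mv:
        raise ValueError("pulse never crosses the threshold")
    return rise_time_ps * threshold_mv / amplitude_mv


def constant_fraction_time(amplitude_mv, fraction, rise_time_ps):
    """Crossing time when the threshold tracks a fixed fraction of the amplitude."""
    return leading_edge_time(amplitude_mv, fraction * amplitude_mv, rise_time_ps)


RISE_TIME_PS = 1000.0   # assumed rise time
THRESHOLD_MV = 50.0     # assumed fixed threshold
FRACTION = 0.25         # assumed constant fraction

for amplitude in (100.0, 200.0, 400.0):
    le = leading_edge_time(amplitude, THRESHOLD_MV, RISE_TIME_PS)
    cfd = constant_fraction_time(amplitude, FRACTION, RISE_TIME_PS)
    print(f"A = {amplitude:5.0f} mV  leading edge: {le:6.1f} ps  constant fraction: {cfd:6.1f} ps")
```

In this model the fixed-threshold time moves with amplitude, while the constant fraction time does not. That only holds as long as the pulse shape is the same for all amplitudes.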







- Alternative: Very fast analog sampling
  - Pulse matching: highest possible flexibility and performance
  - High power, low channel density
  - 64GHz 8b ADC's now feasible, 2W
    - 100GbE optical
  - Large amount of data to read out and process (unless done on chip).
  - Multiple sampling capacitor array chips made in HEP community
    - Sampling rate: 1 to 5 GS/s
    - Analog bandwidth: few hundred MHz to GHz
    - Resolution: 8 to 12 bits
    - Memory size
    - Channel count
    - Triggering, buffering
    - ADC
    - Readout









#### Time measurement

- Coarse count: ~1ns
  - Multi GHz counters can be made in modern ASIC's.
  - Gray code
    - Only one bit changing
  - Dynamic range: Large
- 1st. Level fine interpolation:
  - Extract timing difference between signal and reference (clock)
    - Dynamic range: 1 (2) clock cycle
  - A: Use same interpolation reference as counter (Clock).
  - B: Use Different "reference"
- Alignment between coarse and fine needs special care.
  - Must be done with precision of full resolution
  - If badly done then large error (coarse count) in small time window around coarse time change.
  - Example: Use of two phase shifted binary counters and selecting one based on fine interpolation.
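The example with two phase-shifted counters can be sketched as a small behavioural model. This is a sketch under stated assumptions, not the circuit from the slides: the counter that changes near the hit is modelled as returning either the old or the new count, the fine interpolation is assumed exact, and all names and constants (`T_CLK`, `N_FINE`, `UNSAFE`) are hypothetical.

```python
import random

T_CLK = 6400            # clock period in arbitrary integer time units (assumed)
N_FINE = 64             # fine bins per clock period (assumed)
BIN = T_CLK // N_FINE   # width of one fine bin
UNSAFE = T_CLK // 10    # assumed window around an increment where a counter is unreliable


def sample_counter(t, phase_offset, rng):
    """Sample a counter that increments at t = n*T_CLK + phase_offset.
    Close to an increment the captured value may be the old or the new count."""
    n, since_update = divmod(t - phase_offset, T_CLK)
    if since_update < UNSAFE:
        return n - rng.randint(0, 1)     # may still show the previous count
    if since_update >= T_CLK - UNSAFE:
        return n + rng.randint(0, 1)     # may already show the next count
    return n


def combine(count_a, count_b, fine):
    """Select the counter that was stable at the hit, based on the fine phase."""
    quarter = N_FINE // 4
    if quarter <= fine < 3 * quarter:
        coarse = count_a                 # A increments at phase 0: stable mid-cycle
    elif fine >= 3 * quarter:
        coarse = count_b                 # B increments at half cycle: shows this cycle
    else:
        coarse = count_b + 1             # B still shows the previous cycle
    return coarse * N_FINE + fine


def measure(t, rng):
    """Return (code using counter selection, code using counter A only)."""
    fine = (t % T_CLK) // BIN
    count_a = sample_counter(t, 0, rng)
    count_b = sample_counter(t, T_CLK // 2, rng)
    return combine(count_a, count_b, fine), count_a * N_FINE + fine


rng = random.Random(1)
hits = [rng.randrange(T_CLK, 1000 * T_CLK) for _ in range(20000)]
results = [measure(t, rng) for t in hits]
errors_selected = sum(sel != t // BIN for (sel, _), t in zip(results, hits))
errors_naive = sum(naive != t // BIN for (_, naive), t in zip(results, hits))
print("errors with counter selection:", errors_selected)
print("errors with a single counter :", errors_naive)
```

With a single counter, hits close to the coarse time change can be off by a full coarse count. Selecting between the two counters by fine phase avoids this as long as the unsafe window is narrower than a quarter period.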









#### Time to amplitude

- Time to Amplitude Conversion: TAC
  - Classical type high resolution TDC implemented with discrete components
  - Delicate analog design
  - Requires ADC
  - Slow conversion time –> dead time
  - Not using same reference as coarse time
- Dual slope Wilkinson ADC/TDC
  - Time stretcher
  - Measure stretched time with counter
  - Slow: Analog de-randomizer
  - Example: NA62 GTK in-pixel design





#### **Delay line based**

- Basic principle
  - Use "gate" (inverter) delays
    - Normally two inverters
  - Gate delays have large process, voltage and temperature dependency
  - Using inverting cell
    - Rise and fall time (N and P transistors) does not match well over process, voltage and temperature.
    - Different tricks can be used to make inverting and non inverting buffer have "same" delay but remains problematic.
  - Fully "digital"
  - Capture:
    - Use hit as clock to capture state of delay chain
    - Use delay signals to capture state of hit signal (high speed sampler)
- Delay Locked Loop
  - Control delay chain to cover exactly one clock cycle.
    - Compensates for process, voltage and temperature effects (but not mismatch)
    - Uses same timing reference as coarse count and self-calibrates to this.
  - Begin-end effects, phase error, jitter, delay cell matching
  - Such a delay locked loop is a very quiet circuit, as all transitions are perfectly distributed over the clock period (not the case for the hit signal)
  - Half digital / half analog
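The first capture option (using the hit as a clock to capture the state of the delay chain) can be sketched as follows. This is a behavioural sketch assuming an ideal locked chain with identical cells and a 50 % duty-cycle clock; the number of taps, the cell delay and the function names are illustrative assumptions.

```python
N_TAPS = 64                   # delay cells in the locked chain (assumed)
CELL_DELAY = 1000             # delay per cell in arbitrary integer units (assumed)
T_CLK = N_TAPS * CELL_DELAY   # a locked DLL spans exactly one clock period


def tap_states(t_hit):
    """State of every tap at the hit time.
    Tap i carries the clock delayed by i*CELL_DELAY; the clock is
    modelled as high during the first half of each period."""
    return [((t_hit - i * CELL_DELAY) % T_CLK) < T_CLK // 2 for i in range(N_TAPS)]


def decode(states):
    """Find the last tap reached by the rising clock edge:
    a high tap followed (circularly) by a low tap."""
    for i in range(len(states)):
        if states[i] and not states[(i + 1) % len(states)]:
            return i
    raise ValueError("no edge found in captured state")


t_hit = 12_345
print("fine time bin:", decode(tap_states(t_hit)))
```

Real designs also have to handle bubbles in the captured pattern and cell mismatch, which this sketch ignores.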







### Delay elements

- Current starved inverters/buffers
  - N-side, P-side, Both
  - Only one of the two current starved
- Regulate delay chain power supply with local LDO
  - Careful interfacing to other circuits
- Differential delay cell
  - Consumes DC power -> More power
  - Only needs one cell per delay (better resolution)
  - (Less sensitive to power supply noise)
  - (Generates less noise)
  - Different types of loads can be used
    - Inductive peaking can gain ~20%
  - ~25ps possible in 130nm, worst case
- Pseudo differential and many more








#### Sub-gate delay. 2nd. interpolation

- Vernier principle
  - Difference in delays can be made much smaller than the delay of a cell: R = T2 - T1
  - Basic Vernier chain gets impractically long
  - Performance gets mismatch dominated
  - Delay difference can be implemented in many ways:
    - Capacitance loading
    - Transistor sizing
    - Different current starving
    - etc.
  - How to lock to reference?
    - DLLs locked to different references
    - DLLs with different number of delay cells locked to same reference.
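A minimal sketch of the Vernier principle, assuming ideal cells: the start edge travels through a chain of slow cells (delay T2) and the later stop edge through a chain of fast cells (delay T1), so the stop edge gains R = T2 - T1 per cell. The function name and the numbers are illustrative assumptions.

```python
def vernier_cell(dt_ps, t_slow_ps, t_fast_ps, n_cells):
    """Return the first cell at which the stop edge (fast chain) has caught
    up with the start edge (slow chain), or None if the chain is too short.

    dt_ps is the start-to-stop interval; the result k satisfies
    (k - 1) * R < dt_ps <= k * R with R = t_slow_ps - t_fast_ps.
    """
    for k in range(1, n_cells + 1):
        start_arrival = k * t_slow_ps          # start edge entered at t = 0
        stop_arrival = dt_ps + k * t_fast_ps   # stop edge entered at t = dt
        if stop_arrival <= start_arrival:
            return k
    return None


T_SLOW_PS, T_FAST_PS = 30, 25                  # assumed cell delays, R = 5 ps
for dt in (3, 12, 47, 1000):
    print(f"dt = {dt:4d} ps -> cell {vernier_cell(dt, T_SLOW_PS, T_FAST_PS, 64)}")
```

The last interval needs 200 cells at R = 5 ps and does not fit in the 64-cell chain, which is why a basic Vernier chain gets impractically long for larger ranges.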





#### **DLL** arrays

- An array of DLL's can use the Vernier principle
  - DLL's auto lock to common timing reference
- Example: Improve binning from 25ps to 6.25ps
  - 4 equal DLL's driven by fifth DLL with slightly larger delay
    - Potentially very mismatch sensitive
  - 1 DLL driving many small DLL's
    - Less mismatch sensitive (mismatch correction still advantageous)
    - Non trivial layout to assure matching routing capacitances and R-C delays





- Passive delays
  - In modern IC technologies wiring delays already the dominating source of delays.
  - No easy way to "lock" to global reference
    - Some kind of adjustment required
  - R-C delay
    - The adjustment of any tap affects all the other taps
      - Used in HPTDC. In practice a bit of a pain (but works)
  - Transmission line
    - Short delays can be made with on-chip transmission lines
    - Predefined and characterized transmission lines exist in many chip design kits.
    - Lossy, so signal shape changes down the line.
  - Can be used on hit signals instead of on DLL signals
    - Flexibility on channel count versus resolution (used in HPTDC)
    - This scheme can be used with many approaches




#### Looped Vernier (beating oscillators)

- Two delay chains/loops propagate timing signals with slightly different delays.
  - Start-stop type
- Start oscillators with start and stop signals
  - Latch loop1 count (start) when stop occurs
  - Latch loop2 count (stop) when edge in loop2 catches up with edge in loop1.
  - Store in which Vernier cell the two edges meet.
- Appears elegant but hard to implement:
  - Loop feedback time and re-coupling must be "zero" delay
    - Circular layouts tried (but not so good for matching)
  - All this per channel
  - No direct lock to a reference
  - Long conversion time -> Dead-time
  - Some errors accumulate during recirculation





#### Analog interpolation between delay cells

- Resistive voltage division across neighbor delay cells.
  - Rise times in delay chain longer than delay of cell.
  - Purely resistive division "autoscales" with delay of delay cell
  - Only carries current during transitions.
- Parasitic capacitance makes this resistive division a mixture of resistive division and R-C delays
  - Relatively low resistor values required to prevent being R-C dominated.
  - With equal resistances the bins are not evenly spaced -> re-optimize individual resistors
  - Does not any more fully "autoscale" to delay of delay cell.
- Can be done on single ended and differential delay cells
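A minimal numerical sketch of purely resistive interpolation, assuming ideal linear ramps on two neighbouring taps and no parasitic capacitance. The divider output is modelled as a weighted average of the two tap voltages; the ramp model, the function names and the numbers are illustrative assumptions.

```python
def ramp(t, t_start, rise_time):
    """Normalized linear transition from 0 to 1 starting at t_start."""
    return min(1.0, max(0.0, (t - t_start) / rise_time))


def interpolated_crossing(weight, cell_delay, rise_time, threshold=0.5):
    """Time at which the divider output crosses the threshold.
    weight = 0 selects the first tap, weight = 1 the next tap."""
    def v(t):
        return ((1.0 - weight) * ramp(t, 0.0, rise_time)
                + weight * ramp(t, cell_delay, rise_time))

    lo, hi = 0.0, cell_delay + rise_time   # v(lo) = 0 and v(hi) = 1
    for _ in range(60):                    # bisection on a monotonic function
        mid = 0.5 * (lo + hi)
        if v(mid) < threshold:
            lo = mid
        else:
            hi = mid
    return hi


CELL_DELAY_PS = 12.0
for rise_time_ps in (36.0, 6.0):           # slow edges, then edges faster than the cell delay
    times = [interpolated_crossing(w / 4, CELL_DELAY_PS, rise_time_ps) for w in range(5)]
    steps = [round(b - a, 3) for a, b in zip(times, times[1:])]
    print(f"rise time {rise_time_ps:4.1f} ps -> bin widths {steps}")
```

With rise times longer than the cell delay the bins come out even in this model. With faster edges the two ramps no longer overlap and the bins become very uneven.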







Time amplifier in "metastable window" of latch (with internal feedback).

- Any type of latch has a small time window where it enters a metastable region and it takes some time to resolve this
- A small change of timing on the input gives a "large" change of timing on the output: Time Amplifier
- For very high time resolution cases.
  - Only small window where time amplification occurs
  - Non linear, very sensitive to power supply, etc.
  - Hard to use in practice
  - For 3rd level interpolation

Plus other "exotic" schemes.

(implementation nightmare)







#### Wave Union TDC





# Central timing block

Reference (Clock)

- For multi channel TDC's it is attractive to have a central timing block used to drive array of individual channels
  - Minimal complexity per channel.
  - Only one block to calibrate.
  - Power consumed in timing block less critical (but timing distribution to channels gets significant)
- For very high resolution TDCs this gets increasingly difficult, as the required signal propagation delays are larger than the required resolution (mismatch!).
- Buffer delays larger than resolution: mismatch sensitive
- For highly distributed TDC functions on large chips (e.g. pixel chips) it gets routing and power prohibitive even for low time resolution.
  - Alternative: Centralized DLL locked to reference generates control voltage to distributed delay loops (mismatch!)

Centralized timing block locked to global reference (e.g. DLL array)





### Time capture registers

- The latches/registers used to capture the timing event get critical in the ps range
- Fast capture/regeneration registers required
  - Timing signals have large rise/fall times compared to required resolution.
  - Small and well defined metastability window with good resolving capability.
  - Single ended (e.g. classical master slave FF) or differential (sense amplifier for fast SRAM's)
- Mismatch between registers
  - Assuming multiple registers must latch at the same instant
- Routing of hit signal to registers must be done with care







#### **Capture Scheme**





#### Synchronous

- Sample state of hit signal
- Continuous data flow
- Potentially no dead time



#### Asynchronous

- Sample state of reference signal
- Sample only when actual hit occurs -> lower power



#### HPTDC

- History
  - Architecture initially developed at CERN for ATLAS MDT (design transferred to KEK)
  - CMS Muon and ALICE TOF needed similar TDC with additional features / increased resolution
- Features
  - 32 channels (100ps binning), 8 channels (25ps binning)
  - 40MHz time reference (LHC clock)
  - Leading, trailing edge and TOT
  - Triggered or non triggered
  - Highly flexible data driven architecture with extensive data buffering and different readout interfaces
- Used in large number of applications:
  - More than 20 HEP applications: ALICE TOF, CMS muon, STAR, BES, KABES, HADES, NICA, NA62, AMS, Belle, ...
    - We still supply chips from current stock, running out
  - Other research domains: Medical imaging,
  - Commercial modules from 3 companies: CAEN, Cronologic, Bluesky
  - ~50k chips produced
- 250nm technology (~10 years ago for LHC)
  - Development: ~5 man-years + 500kCHF.
  - Can not be produced any more
- http://tdc.web.cern.ch/TDC/hptdc/docs/hptdc_manual_ver2.2.pdf









### Challenges in TDC Design



## Difficulties in ps range resolution

#### LSB/sqrt(12) ≠ rms





#### System Level



**Complete Measurement Chain** 

- Detector Noise
- Analog Front End
- Time Walk Correction
- Time Reference Noise
- TDC Noise
- Inter-channel Crosstalk
- PVT variation ...



## **Delay Element**

Critical building block - often longest delay path / used in many architectures





#### **Time Capture Registers**

Critical building block - makes timing decision



#### For fine-time TDC designs:

Fine resolution = good matching / high power OR Fine resolution = FF calibration



## picoTDC




### **Potential Users**



- Calorimeter upgrades
  - Provide precision timing (~30 ps) on high energy photons in ECAL, and on photons and high energy hadrons in HGCal
  - Precision timing only for showers
- We propose additional (thin) timing layers
  - MIP timing with 30 ps precision and full efficiency
  - Acceptance: |η| < 3.0 and p<sub>T</sub> > 0.7 GeV in the barrel and outer endcap





#### **MPD** NICA




#### Requirements

- Achieve sub-10 ps LSB sizes with RMS better than bin size
- ~64 channels
- large dynamic range
  - allow to use one common reference
- robust against power supply noise
- flexible in terms of power consumption / time resolution



## **TDC Trends**



### picoTDC Architecture.



- Central interpolator with counter to extend dynamic range
- Measurements are referenced to common reference to allow to synchronize multiple TDCs
- DLL for PVT auto calibration and power consumption tradeoff
- Short propagation delays and fast signal slopes of timing critical signals to reduce jitter
- Calibration applied on two groups of channels to reduce circuit overhead and calibration time
- Relatively constant power consumption make it less sensitive to change in hit rate



## Low Jitter PLL

- Clock multiplication from 40MHz to 2.56GHz for fine time counter and time interpolator
  - Low jitter critical
  - Jitter filtering of 40MHz clock to the extent possible
    - 40MHz reference MUST be very clean
  - LC based oscillator
- Design: Jeffrey Prinzie, KU Leuven
- Detailed layout and optimization
- Prototyped May 2015
- Measurements very promising (350fs RMS jitter)





#### Phase Noise vs. Freq. Offset



#### **Fine-Time Interpolator**



DLL to control LSB size

- 64 fast delay elements in first stage: 12 ps each
- Total delay of DLL: 781 ps at 1.28 GHz
- Resistive interpolation to achieve sub-gate-delay resolution
  - LSB size of 2nd stage controlled by DLL
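The bin sizes quoted for the interpolator follow from the DLL reference frequency. A short arithmetic sketch, assuming four resistive interpolation steps per delay cell (an assumption based on the 2-bit resistive interpolation field in the data format); the variable names are hypothetical.

```python
DLL_CLOCK_HZ = 1.28e9    # DLL reference frequency quoted on the slide
DLL_CELLS = 64           # delay elements in the first stage
RES_INTERP_STEPS = 4     # assumed: 2 bits of resistive interpolation per cell

period_ps = 1e12 / DLL_CLOCK_HZ               # one reference period in ps
dll_bin_ps = period_ps / DLL_CELLS            # first-stage bin (one delay cell)
fine_bin_ps = dll_bin_ps / RES_INTERP_STEPS   # second-stage bin

print(f"DLL period : {period_ps:.2f} ps")
print(f"DLL bin    : {dll_bin_ps:.2f} ps")
print(f"fine bin   : {fine_bin_ps:.2f} ps")
```

Because the DLL locks the total chain delay to the reference period, both bin sizes scale with the clock rather than with process, voltage or temperature.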



### Voltage Controlled Delay Cell





### **Operation Region (Post Layout)**





- Running at all corners @12ps delay



## DLL

- 64 taps, 12.2ps delay
- Self-Calibrating
- Jitter not as critical, doesn't pile up









#### **Resistive Interpolation and Drivers**

- Interpolation can be disabled for low resolution mode
- Drivers: tapered buffers, each driving 32 capture FFs and 64 standard cell FFs
- Calibration separate for each half







#### **Resistive Interpolation**



- Use small resistances and small loads



## Calibration



- 5 bit for each channel
- Up to +24ps with 750fs steps



## **Driven Line Simulation**





#### **Device Mismatch**



- Calibration can correct for Fine-Time Interpolator and Distribution Buffer mismatch
- Don't want to calibrate each single register
  - -> time capture registers require good matching



#### **Time Capture Register**





#### **Time Capture Flip Flops**

- Revisited design, timing vs. power very critical, 16k capture Flip Flops running @1.28GHz
- Highly optimized M/S Flip Flop followed by standard cell Flip Flop for metastability resolution
- Monte Carlo simulations show a mismatch of 800fs RMS, noise influence of 240fs RMS





## Hit Decoding

- Decoding fully synchronous
- One pipeline register at each clock phase per channel
- 3 coarse or 4 fine pipeline stages
- In each stage signal phases move closer together
- Result in center of capture channel / phases





#### **Full Timing Macro**



- 64 channels, DLL and resistive interpolator in the center
- Hit signal input on the left, output on the right



#### Post Layout Power Consumption

- DLL + resistive interpolation: 40mW
- Time distribution + calibration: 260mW
- Capture registers: 250mW
- Decoding: 50mW
- Total @ 3ps bins: 600mW
- Total @ 12ps bins: 200mW



## **Hit Receivers**

- Differential receivers optimized for ultra-low jitter, low power
- Full Range (common mode 0V .. VDD=1.2V), somewhat LVDS-compatible
- Highest speed @~800mV common mode
- Optimized for 200mV peak-to-peak amplitude
- Design: Bram Faes, KU Leuven
- Prototyped & tested





#### **Sources of Measurement Deviation**

- Bin size 3ps -> 880fs RMS
- PLL: 350fs RMS phase Jitter
- DLL&Drivers : 400fs RMS phase Jitter, INL/DNL can be calibrated down to ~400fs
- Capture FFs: 800fs mismatch (DNL)
- Hit receivers: <1ps jitter
- ~1.75ps RMS total deviation
- External sources: input clock jitter, signal pre-processing
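The total can be cross-checked by adding the listed contributions in quadrature. This is a sketch that assumes the contributions are independent and takes the hit receiver jitter at its 1 ps upper bound; the slide does not state which terms enter its total, so the selection below is an assumption.

```python
import math

# RMS contributions in femtoseconds, taken from the list above
contributions_fs = {
    "quantization (3.05 ps bin / sqrt(12))": 3050 / math.sqrt(12),
    "PLL jitter": 350,
    "DLL and driver jitter": 400,
    "residual INL/DNL after calibration": 400,
    "capture FF mismatch": 800,
    "hit receiver jitter (upper bound)": 1000,
}

# Independent error sources add in quadrature
total_fs = math.sqrt(sum(value ** 2 for value in contributions_fs.values()))

for name, value in contributions_fs.items():
    print(f"{name:40s} {value:7.0f} fs")
print(f"{'quadrature sum':40s} {total_fs:7.0f} fs")
```

The quadrature sum comes out at roughly 1.7 ps, close to the ~1.75 ps quoted above. The largest single terms are the hit receivers, the bin size and the capture flip-flop mismatch.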





## Full picoTDC Architecture



64 channels, 3ps or 12ps time binning, 200us dynamic range



# **TDC** Logic

- Synthesized logic from SystemVerilog RTL
- Based on data driven architecture from HPTDC
  - Simplifications with individual buffers per channel
  - Clocking: 320 MHz
  - Trigger matching based on time measurements
- Extensive verification environment
- New interfaces defined and implemented
  - Control/monitoring, Trigger, Readout



## **Logic Features**

- Untriggered or triggered with configurable latency and length, overlap possible
- Naturally overflowing counter used for calculating trigger matches, TOT etc.
- Counter with arbitrary overflow and reset for machine cycle, can be inserted in event header when triggered or in measurements when untriggered
- Combination of TOT+untriggered+arbitrary counter overflow should preferably be removed, adds a lot of logic overhead
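A naturally overflowing counter keeps differences correct across the wrap-around when they are computed modulo the counter range. A minimal sketch, assuming a 25-bit time word and a trigger window that starts one latency before the trigger; the width, the window convention and the function names are assumptions for illustration.

```python
COUNTER_BITS = 25                # assumed width of the time word
MODULUS = 1 << COUNTER_BITS


def wrapped_diff(later, earlier):
    """Difference of two time tags on a circular scale."""
    return (later - earlier) % MODULUS


def in_trigger_window(hit, trigger, latency, length):
    """True if the hit lies in [trigger - latency, trigger - latency + length),
    evaluated modulo the counter range."""
    window_start = (trigger - latency) % MODULUS
    return wrapped_diff(hit, window_start) < length


# TOT across an overflow: leading edge just before the wrap, trailing edge just after
leading, trailing = MODULUS - 10, 5
print("TOT in bins:", wrapped_diff(trailing, leading))
print("hit matched:", in_trigger_window(hit=MODULUS - 3, trigger=20, latency=30, length=16))
```

This only works when the counter range is a power of two and the measured intervals are shorter than the range. A counter with arbitrary overflow (for example a machine cycle) needs an explicit modulus instead, which is part of the logic overhead mentioned above.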



## Interfaces

- Power: 1.2V, ~1.0W (64ch, 3ps), ~0.5W (64ch, 12ps), ~0.3W (32ch, 12ps)
- Hits: Differential (LVDS "compatible")
- Time reference: 40MHz differential
  - Low jitter reference critical for high time resolution
- Trigger/Event-Rst/BX-Rst/reset: Sync Yes/No
  - Option: encoded protocol?
- Control/monitoring: I<sup>2</sup>C at CMOS 1.2V-levels
  - Option: GBT E-link?
- Readout: 4 readout ports of 1-8 differential signals
  - Common mode 0.6V, programmable current 1-4mA
  - Compatible with LpGBT and FPGAs
- Packaging: ~340 FPBGA





## **Constraints on Input Signals**

- Max. one edge per 1.28GHz-Cycle (~0.8ns)
- Internal glitch filter
  - Filter time can be programmed to enforce the 0.8ns or more for filtering e.g. oscillations
- Small derandomizer (4 hits) for each channel running @1.28GHz
- Sustainable rate to channel buffer 320MHz, trigger matching running @320MHz for each channel separate
- No bottlenecks until readout buffers
- Trigger in each 40MHz-Cycle possible



## Readout

- 1 or 4 readout ports
  - 4 ports: High rate applications (e.g. non triggered)
    - 16 TDC channels per port
  - 1 port: Low-medium rate
    - 64 channels (or 32 channels in 32 channel mode)
    - Round robin with channel group separators; max. consecutive hits per group can be configured
- Readout data: 32bit words
  - Headers, trailers, TDC data, status, etc.
- Readout ports interface
  - Byte wise:
    - 40, 80, 160, 320 MHz
    - Option: Sync signal to mark first byte of word
  - Serial:
    - 8B/10B encoding
    - Low speed: 40, 80, 160, 320 Mbits/s
    - High speed: 1.28 Gbits/s
- TDC readout bandwidth:
  - Max:
    - Byte wise: 320 MHz x 8 x 4 = ~10 Gbits/s (~4 Mhits/s per channel without triggering)
    - Serial: 1.28 Gbits/s x 4 = ~5 Gbits/s
  - Min: 1 x 40 Mbits/s = 40 Mbits/s
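The bandwidth figures can be reproduced with a short calculation. This sketch gives raw upper bounds only: it ignores headers, trailers and separators, and it assumes that 1.28 Gbits/s is the line rate before 8B/10B decoding. The function names are hypothetical.

```python
WORD_BITS = 32     # readout word size
CHANNELS = 64


def parallel_rate_bps(clock_hz, bits_per_port=8, ports=4):
    """Raw bit rate of byte-wise readout: one byte per port per clock."""
    return clock_hz * bits_per_port * ports


def serial_rate_bps(line_rate_bps, ports=4, encoded=True):
    """Bit rate of serial readout; 8B/10B carries 8 payload bits per 10 line bits."""
    raw = line_rate_bps * ports
    return raw * 8 // 10 if encoded else raw


max_parallel = parallel_rate_bps(320_000_000)
hits_per_channel = max_parallel // WORD_BITS // CHANNELS

print("byte-wise maximum :", max_parallel / 1e9, "Gbit/s")
print("hits per channel  :", hits_per_channel / 1e6, "Mhits/s (raw upper bound)")
print("serial line rate  :", serial_rate_bps(1_280_000_000, encoded=False) / 1e9, "Gbit/s")
print("serial payload    :", serial_rate_bps(1_280_000_000) / 1e9, "Gbit/s (assuming 8B/10B)")
```

The raw bound per channel is somewhat higher than the ~4 Mhits/s quoted above, which is consistent with some of the bandwidth going to non-hit words.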



### 32 Bit Frames

#### TDC measurement

| Type (1)=0 | TDC data (31) |
|------------|---------------|

#### Event headers (up to two)

| Type (4)=100? | Field A (12) | Field B (12) | Div (4) |
|---------------|--------------|--------------|---------|

Possible fields: event ID, Bx ID, natural ID, status & monitoring

#### Event trailers (up to two)

| Type (4)=101? | Field A (12) | Field B (12) | Div (4) |
|---------------|--------------|--------------|---------|

Possible fields: event ID, Bx ID, natural ID, #hits, status & monitoring

In untriggered mode, trigger input can be used to generate headers with selectable data (e.g. internal counters)

#### Errors/status

| Type (4)=1100 | Error/status flags (28) |
|---------------|-------------------------|

#### Channel group separator (for single readout port)

| Type (4)=1111 | Chan-Grp-ID (2) | Div (26) |
|---------------|-----------------|----------|



## Absolute TDC data

Full TDC data, DEFAULT FORMAT

Type (1) Channel (4) Edge (1) Coarse cnt (12) Fine cnt (5) DLL int (6) Res int (2) 0
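A minimal sketch of packing and unpacking this 32-bit word. It assumes the fields are laid out in the listed order starting from the most significant bit, and that coarse count, fine count, DLL and resistive interpolation concatenate into one 25-bit binary time code; the bit order and the function names are assumptions, not taken from the slides.

```python
FIELDS = (  # (name, width in bits), most significant field first (assumed bit order)
    ("type", 1), ("channel", 4), ("edge", 1), ("coarse", 12),
    ("fine", 5), ("dll", 6), ("res", 2), ("pad", 1),
)
assert sum(width for _, width in FIELDS) == 32


def pack_word(**values):
    """Build a 32-bit word from named field values (missing fields are 0)."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_word(word):
    """Split a 32-bit word into its named fields."""
    fields = {}
    shift = 32
    for name, width in FIELDS:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields


def time_code(fields):
    """Concatenate the four time fields into one 25-bit code (assumed weighting)."""
    return (fields["coarse"] << 13) | (fields["fine"] << 8) | (fields["dll"] << 2) | fields["res"]


word = pack_word(channel=5, edge=1, coarse=1234, fine=17, dll=42, res=3)
decoded = unpack_word(word)
print(f"word = 0x{word:08X}")
print("fields:", decoded)
print("time code:", time_code(decoded))
```

Keeping the layout in one table makes it easy to adapt the sketch if the actual bit order differs from the assumed one.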

## **Relative to Trigger**

A: Triggered with relative time: Same as absolute

Type (1) Channel (4) Edge (1) Coarse cnt (12) Fine cnt (5) DLL int (6) Res int (2) 0

B: Triggered with relative leading and TOT: Same as absolute Lead. + TOT

| Type (1) | Channel (4) | Leading (16) | TOT(11) |
|----------|-------------|--------------|---------|
| Type (1) | Channel (4) | Leading (19) | TOT(8)  |



## Leading + TOT

- Packet type: 1 bit
- Channel ID: 4 bits, for single port readout +2 bit group separator
- Leading: 16/19 bits
  - Large dynamic range
    - 16 bit at 3ps resolution: 200ns
    - 19 bit at 3ps resolution: 1600ns
  - Programmable part of full 25 bit leading TDC
  - (Relative to trigger to be usable)
- TOT (relative to leading): 11/8 bits
  - Short dynamic range:
    - 8 bit at 3ps resolution: 780ps
    - 11 bit at 3ps resolution: 6.1ns
  - Programmable part of full 25 bit TOT difference
    - TOT assumed to be used for offline time-walk correction of leading.
- Alternative: Readout of individual leading and trailing edges with full range/resolution
  - 2x readout bandwidth

## Full ASIC Floorplan




# **Verification Environment**

Figure: simulation waveform view from the verification environment.
|              |                  |                                               |              |            |             |             |             |         |                         | 5R: 1641E680       |                                                |             |             |             |          |                               |          |       |     |    |           |
|              |                  |                                               |              |            |             |             |             |         |                         | 4F: 1041E180       |                                                |             |             |             |          |                               |          |       |     |    |           |
|              |                  |                                               |              |            |             |             |             |         | 3F:                     | 0c4                | 1 <b>F</b> 98                                  | 0           |             |             |          |                               |          |       |     |    |           |
|              |                  |                                               |              |            |             |             |             |         |                         |                    |                                                |             |             |             |          |                               |          |       |     |    |           |



- Verification in SystemVerilog
- Use cases can be defined and automatically tested, visualization of buffer occupancy, lost hits etc.




23591087ps: Missing Falling hit at channel

# **Verification Features**

- Environment supports and verifies all TDC features
  - Triggered / untriggered
  - Rising / rising&falling / TOT
  - Different counter and reset settings
- Extensive test cases
  - High / low / burst hit rate
  - High / low trigger rate, overlapping triggers
- Specific use cases can be defined and verified





# Backup



- DLL overview
- Building blocks:
  - VCDL
  - PD
  - LF
- DLL analysis:
  - Linear
  - Nonlinear
- Lock acquisition
- Charge sharing



Paulo.Moreira@cern.ch

**Delay-Locked Loops** 

## **DLL Block Diagram**



#### **Delay-Locked Loop functional blocks**

- Voltage Controlled Delay Line (VCDL):
  - Takes the reference clock as an input and delays it by some amount **D**.
  - The delay D is a function of a control voltage: D = D(V<sub>control</sub>).
  - Sometimes the control quantity can be a current. In this case we have a Current Controlled Delay Line (CCDL)
  - We will assume that the higher the voltage (or the current) the shorter will be the propagation delay through the delay line.

- Phase Detector (**PD**):
  - Compares the phase of the signal at the input and output of the VCDL.
  - Depending on the type, produces an error signal that either:
    - is proportional to the phase difference between the input and output phases; or
    - only indicates the sign of the phase error (bang-bang detector).
- Loop filter (**LF**):
  - Eliminates the high frequency components of the error signal.
  - It can be implemented as:
    - An RC low-pass filter
    - An active low pass filter
    - A charge-pump and a capacitor



### Intrinsic Delay in CMOS Circuits



## **CMOS** Inverter

- Common-source configuration:
  - NMOS can only discharge (pull-down);
  - PMOS can only charge (pull-up);
  - Both P and N transistors are thus needed.
- CMOS inverter:
  - No static power consumption.
- Electron mobility > hole mobility:
  - PMOS transistors are weaker than NMOS.
  - To compensate:

 $W_p/W_n = \mu_n/\mu_p \approx 3/1$  (for L<sub>n</sub> = L<sub>p</sub>, typically minimum length in digital circuits).

- What is the best way to control the inverter delay?
  - V<sub>dd</sub>?
  - C<sub>L</sub>?
  - Neither of the two!



#### The Starved Inverter





- Can we run the starved inverter infinitely slowly?
- No, must have:





 $t_d = f(V_{control}) = K_{vcdl} \times V_{control}$ (linear approximation valid around the working point)

## **Differential Delay Cell**

Advantages:

- 'Insensitive' to common-mode;
- Both the signal and its inverse are available;
- Constant power consumption: low switching noise.

Disadvantages:

- Consumes static power;
- Half of the tail current used to charge/discharge the load;
- Differential to single ended converter required to interface with CMOS logic





### The DFF Phase Detector



#### Output leads the input



- Sign information only:
  - No phase error magnitude information;
  - It distinguishes early or late only;
  - It is called a bang-bang phase detector.
- Loop operation:
  - When in lock, the PD decision changes virtually every clock cycle and the average phase error becomes zero.
- Its advantages are:
  - Simplicity of operation;
  - Operation possible at the maximum FF operating frequency;
  - Minimum pulse width 1/f;
  - The phase range spans from $-\pi$ to $+\pi$;
  - Insensitive to duty-cycle distortion in the CK input (<u>however:</u> duty-cycle distortion on the D input creates asymmetry in the transfer function).




## **DFF PD Implementation**



- Design it carefully.
- To avoid phase errors and metastability:
  - Internal nodes  $\rightarrow$  same fanout;
  - Gates  $\rightarrow$  the same driving capability;
  - Every two gates in the same latch  $\rightarrow$  same fan-in;
  - The latch SR1 is critical → it should reach its final state as fast as possible;
  - The decision must be reached within a fraction of the reference clock period → otherwise the jitter increases.
- Layout is critical for operation:
  - Device matching;
  - Large area devices;
  - Layout as symmetrical as possible;
  - Keeping the wire loading identical on corresponding nodes.



- Consider what happens when a current is fed to a capacitor:
- The voltage across the capacitor (V) is simply the time integral of the current (I) being fed to the capacitor:

$$V(t) = \frac{1}{C} \int_{0}^{t} I(t) dt + V_{0}$$

 We can thus easily integrate the phase error if we feed to a capacitor a current that is proportional to the phase error 'measured' by the phase detector:



#### Active Loop-filter: Charge-Pump + Capacitor



$$V_{control}(t) = V_{cap}(t) = \frac{I_{cp}}{C} \int_{0}^{t} sign(\Phi_{err}(t)) dt + V_{0}$$

- M1: current sink, M2: current source;
- M3 and M4: switches:
  - Alternatively closed and opened:
  - Current always flows into or out of the filter capacitor (never directly between V<sub>dd</sub> and ground);
- Reference leads:
  - M4 closed, M3 open;
  - Control voltage increases.
- VCDL output leads:
  - M3 closed, M4 open;
  - Control voltage decreases.
- Keep sink and source currents well matched:
  - Minimizes the static (average) phase error.
- Charge sharing effects need to be controlled (discussed later).
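
The switching behaviour listed above can be sketched numerically. This is a minimal Python sketch under stated assumptions, not a circuit model: the function name `charge_pump_step` and its arguments are hypothetical, the default values (1 µA, 100 pF, 10 ns) are borrowed from the design example given later in these slides, the bang-bang decision is assumed to hold for one full reference period, and charge sharing is ignored.

```python
def charge_pump_step(v_control, reference_leads,
                     i_cp=1e-6, c=100e-12, t_ref=10e-9):
    """Return the control voltage after one reference period.

    reference_leads=True  -> M4 closed, M3 open: the capacitor is charged.
    reference_leads=False -> M3 closed, M4 open: the capacitor is discharged.
    """
    dv = i_cp * t_ref / c  # voltage step per period: I_cp * T / C
    return v_control + dv if reference_leads else v_control - dv


v = 1.0
v = charge_pump_step(v, reference_leads=True)   # control voltage steps up
v = charge_pump_step(v, reference_leads=False)  # and back down again
print(f"v_control after one up and one down step: {v:.6f} V")
```

With the assumed values, each decision moves the control voltage by I<sub>cp</sub>·T/C, which is the quantity the loop integrates cycle by cycle.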





#### Bang-Bang Operation Overview




#### **Tracking jitter:**

- The loop tracking behavior introduces jitter:
  - In lock, the output phase constantly oscillates back and forth around the phase of the reference signal;
  - This is a result of the missing phase error magnitude information.
- Possible to reduce the loop tracking jitter to insignificant levels;
- Other jitter sources:
  - Thermal and shot noise;
  - Substrate noise;
  - Power supply noise.

#### **Tradeoffs:**

- Optimization for low-jitter:
  - Increase the loop-capacitor C;
  - Decrease:  $I_{cp}$  and  $K_{vcdl}$ .
- Optimization for fast-lock:
  - Decrease the loop-capacitor C;
  - Increase:  $I_{cp}$  and  $K_{vcdl}$ .
- Optimization for low-jitter and fast-lock:
  - It is possible to optimize for both:
  - Use a large I<sub>cp</sub> during lock-acquisition;
  - Use a small I<sub>cp</sub> after locking.
- Optimization against substrate and power supply noise:
  - Same as for fast-lock;

## DLL: linear analysis

- Loop filter:
  - Charge-pump + capacitor.
- Phase detector:
  - Considered Linear  $\rightarrow$  signal proportional to the phase error.
- Phase detector output:
  - Pulse of duration proportional to the phase error (e.g. △T(high)-△T(low) in an XOR phase detector).

- Continuous time approximation:
  - Valid for bandwidths a decade or more below the operating frequency. (Keep in mind that DLLs are in fact nonlinear devices.)
- A single pole is present in the loop filter:
  - The DLL is a 1<sup>st</sup> order network.
- Combination of charge-pump and loop-capacitor:
  - Acts as a perfect integrator;
  - Modeled as an integrator.



## **DLL Modeling**

#### **Choice of variables:**

- DLL response formulated in terms of:
  - Input delay;
  - Output delay.
- Output delay:
  - The VCDL delay:  $D_O(t)$  or  $D_O(s)$
- Input delay:
  - The delay to which the phase detector compares the output delay:  $D_I(t)$  or  $D_I(s)$
- Note that D<sub>I</sub>(t):
  - Is phase detector dependent;
  - Is frequency dependent.





### **DLL Transfer Function**




### The DLL is a 1<sup>st</sup> Order System



- Choose I<sub>cp</sub> and C.
- K<sub>vcdl</sub> 'fixed' by the VCDL design and technology parameters (some degree of control but not much).
- T is fixed by the operation frequency/frequencies.
- Since the system is 1<sup>st</sup> order it is inherently stable:
  - Make sure the higher order, unwanted but unavoidable, poles are at least 10 times higher than  $\omega_{\rm n}.$

- The closed-loop behavior is similar to that of a 1<sup>st</sup> order low-pass RC filter:
  - Settling to 2%  $\rightarrow$  t  $\approx$  4 $\tau$
  - Settling to  $0.1\% \rightarrow t \approx 7\tau$
- Fast settling requires large ω<sub>n</sub>:
  - Trades off against low tracking jitter.
  - $\omega_n$  might start approaching the higher order poles.
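
The quoted settling times follow from the exponential step response of a first-order system, where the residual error after a time t is exp(-t/τ). A minimal Python check of that arithmetic is sketched here; the function name `settling_time_in_tau` is hypothetical.

```python
import math


def settling_time_in_tau(tolerance):
    """Time, in units of tau, for a first-order step response
    to settle to within `tolerance` of its final value.

    Solves exp(-t / tau) = tolerance for t / tau.
    """
    return -math.log(tolerance)


for tol in (0.02, 0.001):
    print(f"settling to {tol:.1%}: t = {settling_time_in_tau(tol):.2f} tau")
```

The results (about 3.9τ and 6.9τ) round to the 4τ and 7τ rules of thumb on the slide.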

## **DLL Design**

$$\omega_n = \frac{I_{cp} \cdot K_{vcdl}}{T \cdot C}$$

- The parameters:
  - I<sub>cp</sub>
  - C
  - K<sub>vcdl</sub>

are technology, temperature and supply voltage dependent

- ω<sub>n</sub> would track the operation frequency (i.e. proportional to 1/T) if the other parameters were 'absolutely' constant:
  - Self-biasing techniques can make ω<sub>n</sub> track the operation frequency over several decades: see Maneatis 1996

- Example:
  - F = 100 MHz
    - T = 10 ns
  - $I_{cp} = 1 \ \mu A$
  - C = 100 pF
  - $K_{vcdl} = 2 \text{ ns/V}$

This leads to:

- $\omega_n = 2 \text{ krad/s}$
- τ = 0.5 ms
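
The example numbers can be reproduced directly from the expression for ω<sub>n</sub>. The following Python sketch does only that arithmetic; the function and variable names are hypothetical, and τ is taken as 1/ω<sub>n</sub>, as for a first-order system.

```python
def dll_natural_frequency(i_cp, k_vcdl, t_ref, c):
    """omega_n = I_cp * K_vcdl / (T * C), in rad/s.

    i_cp   -- charge-pump current [A]
    k_vcdl -- VCDL gain [s/V]
    t_ref  -- reference clock period [s]
    c      -- loop-filter capacitor [F]
    """
    return i_cp * k_vcdl / (t_ref * c)


omega_n = dll_natural_frequency(i_cp=1e-6, k_vcdl=2e-9, t_ref=10e-9, c=100e-12)
tau = 1.0 / omega_n  # first-order time constant [s]
print(f"omega_n = {omega_n / 1e3:.1f} krad/s, tau = {tau * 1e3:.2f} ms")
```

Changing I<sub>cp</sub> or C in the call shows how directly the two free design parameters scale the loop bandwidth.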

#### Notice that:

- The DLL bandwidth is many orders of magnitude lower than the operation frequency.
- When locked to a low jitter clock signal this DLL will display low tracking jitter.
- A VCDL, when subjected to substrate or power supply noise, will generate jitter. Under such circumstances, a DLL with such a low bandwidth will be ineffective at tracking the input phase and thus at suppressing its own jitter.

## Bang-Bang DLL Nonlinear Analysis

- When a DLL uses a <u>DFF as the phase detector</u>, the continuous time approximation cannot be used.
- Simple expressions can be found for:
  - The response to a period step;
  - The tracking jitter.

#### **Phase step:**

- If the new period satisfies  $2/3 \times T_i < T_f < 2 \times T_i$:
  - The DLL will regain lock to the new phase;
  - The VCDL delay will ramp to the new value.
- If the new period is outside the above bounds:
  - The phase detector will give the wrong phase information and the DLL will lose phase lock.

The DLL will try to catch the new period at a rate given by:

$$\left|\frac{dD(t)}{dt}\right| = K_{vcdl} \left|\frac{dV_{control}}{dt}\right| = K_{vcdl} \frac{I_{cp}}{C}$$

Units: [rad/s] or [s/s]

#### Example:

Using the previous example the tracking slope is: 20 ns/ms
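
The tracking slope can be checked with a few lines of Python. This is a sketch of the arithmetic only: the names `tracking_slope` and `time_to_slew` are hypothetical, and the 1 ns delay change in the usage line is an arbitrary illustration, not a value from the slides.

```python
def tracking_slope(k_vcdl, i_cp, c):
    """Delay slew rate K_vcdl * I_cp / C, in seconds of delay per second."""
    return k_vcdl * i_cp / c


def time_to_slew(delta_delay, slope):
    """Time needed to ramp the VCDL delay by `delta_delay` seconds."""
    return delta_delay / slope


slope = tracking_slope(k_vcdl=2e-9, i_cp=1e-6, c=100e-12)
slope_ns_per_ms = slope * 1e9 / 1e3  # convert s/s to ns/ms
print(f"tracking slope = {slope_ns_per_ms:.1f} ns/ms")
print(f"time to ramp the delay by 1 ns = {time_to_slew(1e-9, slope) * 1e6:.1f} us")
```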


## Frequency Step $f_2 > f_1$



## Frequency Step $f_1 > f_2$



### Frequency Step: Limit Values



If  $T_2 < 2/3 T_1$  the phase detector will activate the early output instead of the late. The delay will increase instead of decreasing.



If  $T_2 > 2 T_1$  the phase detector will activate the late output instead of the early. The delay will decrease instead of increasing.
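
The two limit cases can be summarized in a small classifier. This is a hedged Python sketch: the function name and the returned labels are hypothetical, and treating a period exactly on a bound as a failure is an assumption, since the slides only state strict inequalities.

```python
def classify_period_step(t_initial, t_final):
    """Classify a reference period step for a bang-bang DLL with a DFF PD.

    Returns "tracks" when 2/3 * T1 < T2 < 2 * T1, otherwise the failure mode.
    """
    if t_final <= (2.0 / 3.0) * t_initial:
        # PD activates early instead of late: delay increases instead of decreasing.
        return "too_short"
    if t_final >= 2.0 * t_initial:
        # PD activates late instead of early: delay decreases instead of increasing.
        return "too_long"
    return "tracks"


for t2_ns in (6.0, 12.0, 25.0):
    print(f"T1 = 10 ns, T2 = {t2_ns} ns: {classify_period_step(10e-9, t2_ns * 1e-9)}")
```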

## Bang-Bang Tracking Jitter



#### Jitter:

- Uncertainty on the position of the falling and rising edges.
- Seen in a scope as 'thick' traces on the rising and falling positions.



- Ideally every clock cycle the phase detector should alternate between an early and a late decision.
- In practice, due to charge-pump unbalance or jitter, it is very likely that the PD decision will be frequently maintained during two consecutive clock cycles to either side.
- The minimum P-P tracking jitter is thus given by:

$$4 \cdot \left| \frac{dD(t)}{dt} \right| \cdot T = 4 \cdot K_{vcdl} \frac{I_{cp}}{C} \cdot T$$



#### Example:

Using the tracking slope from the previous example:

$$J_{pp} = 4 \times (20 \text{ ns/ms}) \times (10 \text{ ns}) = 0.8 \text{ ps}$$

The tracking jitter can thus be made very small. <u>The jitter is likely to be dominated by thermal, supply and substrate noise.</u>
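
The minimum peak-to-peak tracking jitter expression can be evaluated as follows. This Python sketch only evaluates the formula given above with the example parameters; the function name is hypothetical, and no noise source other than the bang-bang tracking itself is modeled.

```python
def min_pp_tracking_jitter(k_vcdl, i_cp, c, t_ref):
    """Minimum peak-to-peak tracking jitter: 4 * K_vcdl * (I_cp / C) * T, in seconds."""
    return 4.0 * k_vcdl * (i_cp / c) * t_ref


j_pp = min_pp_tracking_jitter(k_vcdl=2e-9, i_cp=1e-6, c=100e-12, t_ref=10e-9)
print(f"J_pp = {j_pp * 1e12:.2f} ps")

# Reducing I_cp by 10x (the "small I_cp after locking" strategy) scales J_pp by 10x.
j_pp_low = min_pp_tracking_jitter(k_vcdl=2e-9, i_cp=0.1e-6, c=100e-12, t_ref=10e-9)
print(f"J_pp with I_cp = 0.1 uA: {j_pp_low * 1e12:.2f} ps")
```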

## **DLL Lock Acquisition**

Typical Bang-Bang DLL startup procedure:

- Set the VCDL to its minimum delay (maximum control voltage)
- Force the VCDL delay to increase until the phase detector gives a consistent early indication (e.g. 32 consecutive early detections)
- Once the PD consistently indicates early, pass the control of the loop to the phase detector which will finally take the DLL to lock.
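
The hand-over condition in this startup procedure can be sketched as a simple counter. The Python below is an illustration under assumptions, not the actual control logic: the function name `find_handover_cycle` is hypothetical, the phase-detector decisions are supplied as a list of booleans (True meaning "early"), and the threshold of 32 is the example value quoted above.

```python
def find_handover_cycle(pd_early_decisions, required_consecutive=32):
    """Return the cycle index at which loop control passes to the phase detector.

    While this function is "searching", the startup logic is assumed to be
    forcing the VCDL delay to increase. Control is handed over once the PD has
    reported `required_consecutive` early decisions in a row.
    Returns None if that never happens within the given sequence.
    """
    consecutive = 0
    for cycle, early in enumerate(pd_early_decisions):
        consecutive = consecutive + 1 if early else 0  # any "late" resets the count
        if consecutive >= required_consecutive:
            return cycle
    return None


# Illustrative decision sequence: noise interrupts a first short run of "early".
decisions = [False] * 10 + [True] * 5 + [False] + [True] * 40
print(find_handover_cycle(decisions))
```

Requiring a run of identical decisions rather than a single one keeps an isolated wrong decision from handing over control before the delay is in the range where the phase detector gives valid information.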


- Charge-pumps perform almost like ideal integrators; however, charge sharing might degrade their performance.

The node between the current source M2 and the switch M4 (parasitic capacitance $C_{d2}$) charges to  $V_{dd}$  when M4 is open. When M4 closes,  $V_{control}$  jumps by:

$$\Delta V_{cont} = \frac{C_{d2}}{C + C_{d2}} \cdot (V_{dd} - V_{cont})$$

The node between the current sink M1 and the switch M3 (parasitic capacitance $C_{d1}$) discharges to gnd when M3 is open. When M3 closes,  $V_{control}$  jumps by:

$$\Delta V_{cont} = -\frac{C_{d1}}{C + C_{d1}} \cdot V_{cont}$$

Notice that the voltage jump is:

- Proportional to the control voltage itself (M3 closing) or to $V_{dd} - V_{cont}$ (M4 closing);
- $\approx$  proportional to C<sub>d1</sub> and C<sub>d2</sub>;
- $\approx$  inversely proportional to C (usually C >> C<sub>d1</sub> or C<sub>d2</sub>).

Example:

- If C = 100 pF,  $C_{d1}$  = 10 fF and  $V_{control}$  = 1 V:  $\Delta V_{control}$  = -100  $\mu V$
- Compare with:  $I_{cp} \times T/C$  = 100  $\mu V$
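
The charge-sharing example can be reproduced numerically. This Python sketch evaluates the two expressions for ΔV<sub>cont</sub> given above; the function names are hypothetical, and the comparison step I<sub>cp</sub>·T/C uses the 1 µA and 10 ns values from the earlier design example.

```python
def jump_when_m4_closes(v_cont, v_dd, c, c_d2):
    """Control-voltage jump when M4 closes: C_d2 / (C + C_d2) * (V_dd - V_cont)."""
    return c_d2 / (c + c_d2) * (v_dd - v_cont)


def jump_when_m3_closes(v_cont, c, c_d1):
    """Control-voltage jump when M3 closes: -C_d1 / (C + C_d1) * V_cont."""
    return -c_d1 / (c + c_d1) * v_cont


dv_sharing = jump_when_m3_closes(v_cont=1.0, c=100e-12, c_d1=10e-15)
dv_pump = 1e-6 * 10e-9 / 100e-12  # regular charge-pump step I_cp * T / C
print(f"charge-sharing jump = {dv_sharing * 1e6:.1f} uV")
print(f"charge-pump step    = {dv_pump * 1e6:.1f} uV")
```

In this example the unwanted jump is as large as the intended charge-pump step, which is why the effect has to be controlled.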

### Charge Sharing Control



### Delay chain feed through



## **130nm Demonstrator Results**












# **Code Density Test**

- Events uniformly distributed across the clock cycle
  (generated in an asynchronous clock domain)
- Number of hits collected in a bin => bin size
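
A code density analysis along these lines can be sketched in Python. The names below are hypothetical and the histogram in the usage line is made up for illustration, not measured data. The sketch assumes perfectly uniform hit arrival times, so that each bin's share of the hits equals its share of the measured time range, and it uses the running sum of the DNL as the INL, which is one common convention.

```python
def code_density_analysis(hit_counts, full_range_ps):
    """Estimate bin sizes, DNL and INL from a code density histogram.

    hit_counts    -- number of hits collected in each TDC bin
    full_range_ps -- time span covered by all bins together (e.g. one clock cycle)
    """
    total_hits = sum(hit_counts)
    ideal_count = total_hits / len(hit_counts)  # hits per bin if all bins were equal

    # Uniform hits: bin size is proportional to the number of hits collected.
    bin_sizes_ps = [full_range_ps * n / total_hits for n in hit_counts]

    # DNL in LSB: deviation of each bin from the average bin size.
    dnl_lsb = [n / ideal_count - 1.0 for n in hit_counts]

    # INL in LSB: running sum of the DNL (assumed convention).
    inl_lsb = []
    running = 0.0
    for d in dnl_lsb:
        running += d
        inl_lsb.append(running)
    return bin_sizes_ps, dnl_lsb, inl_lsb


# Illustrative (made-up) histogram: 4 bins spanning 20 ps.
sizes, dnl, inl = code_density_analysis([90, 110, 100, 100], full_range_ps=20.0)
print("bin sizes [ps]:", sizes)
print("DNL [LSB]:", dnl)
print("INL [LSB]:", inl)
```

The statistical error on each bin size shrinks with the number of collected hits, so large hit counts per bin are needed for picosecond-level bins.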




### Before Global Calibration






 $\sigma_{\rm LSB}$  = 2.1 ps




## **Interpolator Linearity**

#### After Global Calibration









## DNL after global calibration

- $DNL = \pm 0.9$ LSB
- **RMS < 0.28 LSB (1.4 ps-rms)**
- No missing codes

## INL after global calibration

- $INL = \pm 1.3$ LSB
- RMS < 0.43 LSB (2.2 ps-rms)
- (could correct for INL offline)

Expected rms resolution w/ custom FF, including quantization noise, INL & DNL:

- 2.3 ps-rms <  $\sigma_{qDNL/wINL}$  < 2.9 ps-rms
- Ideal 5 ps LSB TDC: 1.44 ps-rms
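
The figure for the ideal TDC is the quantization noise of a uniform bin, LSB/√12. The Python sketch below checks that number and the LSB-to-picosecond conversions quoted above; the function names are hypothetical, and the 5 ps bin size is taken from the "ideal 5 ps LSB TDC" line. How the slides combine quantization noise, DNL and INL into the 2.3 to 2.9 ps-rms range is not stated, so that step is not reproduced here.

```python
import math

LSB_PS = 5.0  # bin size assumed from the "ideal 5 ps LSB TDC" line


def ideal_quantization_rms(lsb_ps=LSB_PS):
    """RMS quantization error of an ideal TDC with uniform bins: LSB / sqrt(12)."""
    return lsb_ps / math.sqrt(12.0)


def lsb_to_ps(value_lsb, lsb_ps=LSB_PS):
    """Convert a quantity expressed in LSB to picoseconds."""
    return value_lsb * lsb_ps


print(f"ideal quantization noise: {ideal_quantization_rms():.2f} ps-rms")
print(f"DNL rms: {lsb_to_ps(0.28):.2f} ps, INL rms: {lsb_to_ps(0.43):.2f} ps")
```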



## **Reconstructed Transfer Function**





TWEPP 2013

# **Standard Cell FF - Weak Matching**



 $INL = \pm 2.5 LSB$ 

RMS < 0.69 LSB (3.45 ps-rms)

RMS < 0.87 LSB (4.35 ps-rms)

expected time resolution: < 5.9 ps-rms (w/ standard cell FF)








# **Measured Single Shot Precision**

- Three measurement series
  - both hits arriving within one reference clock cycle
  - second hit arrives one clock cycle later
  - second hit arrives multiple clock cycles later (~5ns)





## Inter Channel Crosstalk





## **PVT variations**



