



**Ph.D. Dissertation** 

# Design of High-Speed Receiver with Stochastic CTLE Adaptation and Dead-Zone Free PAM-4 PD

Design of High-Speed Receiver Using Stochastic CTLE Adaptation and Dead-Zone Free PAM-4 PD

by

**Minkyo Shim** 

August 2023

Department of Electrical and Computer Engineering, College of Engineering, Seoul National University

# Design of High-Speed Receiver with Stochastic CTLE Adaptation and Dead-Zone Free PAM-4 PD

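Advisor: Professor Deog-Kyoon Jeong

This dissertation is submitted for the degree of Doctor of Philosophy in Engineering. August 2023

> Minkyo Shim, Department of Electrical and Computer Engineering, Graduate School of Seoul National University

The Ph.D. dissertation of Minkyo Shim is approved. August 2023

| Chair      | (Seal) |
|------------|--------|
| Vice-Chair | (Seal) |
| Member     | (Seal) |
| Member     | (Seal) |
| Member     | (Seal) |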

# Design of High-Speed Receiver with Stochastic CTLE Adaptation and Dead-Zone Free PAM-4 PD

by

Minkyo Shim

A Dissertation Submitted to the Department of Electrical and Computer Engineering in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

at

SEOUL NATIONAL UNIVERSITY

August 2023

Committee in Charge:

Professor Suhwan Kim, Chairman

Professor Deog-Kyoon Jeong, Vice-Chairman

Professor Dongsuk Jeon

Professor Yongsam Moon

Professor Young-Ha Hwang

## Abstract

This thesis proposes CTLE adaptation and a PAM-4 baud-rate phase detector (PD) to improve the robustness of high-speed receivers. Chapter 2 analyzes and discusses the basic concepts of equalization and clock and data recovery (CDR). In Chapter 3, this thesis proposes an 8-to-16 Gb/s referenceless receiver that uses stochastic CTLE adaptation to maximize the eye width. The CTLE adaptation technique computes a weighted summation of 32 symbol histograms of sequential samples, where the golden weight is obtained with a weight searching algorithm based on epsilon-constraint optimization. Sharing the edge and data samples enables both the referenceless CDR and the CTLE adaptation without additional analog hardware. The prototype chip is fabricated in 28-nm CMOS technology and occupies an active area of 0.029 mm<sup>2</sup>. The measurement results indicate that the proposed adaptation technique achieves the optimal CTLE coefficient and exhibits a power efficiency of 1.11 pJ/b.

In Chapter 4, a PAM-4 receiver design for a CMOS image sensor (CIS) link is presented. The proposed solution addresses the dead-zone issue encountered in conventional sign-sign Mueller-Muller (SS-MM) PDs when combined with adaptive decision feedback equalizers (DFEs). The proposed approach uses a sign-sign minimum mean squared error (SS-MMSE) PD with a biased state to achieve an optimal and unique locking point, avoiding dead zones and preventing wandering. Moreover, the proposed SS-MMSE PD detects 1.5 times more phase-detecting transitions than the conventional SS-MM PD. The proposed solution is verified with a prototype PAM-4 RX chip fabricated in 40-nm CMOS technology, which demonstrates a bit error rate (BER) of less than 10<sup>-9</sup> over a channel with 15.8 dB loss. The prototype consumes 46.6 mW at 12 Gb/s, achieving a figure of merit (energy efficiency per channel loss at the Nyquist frequency) of 0.24 pJ/b/dB.

**Keywords** : adaptive equalization, baud-rate phase detector (PD), continuous-time linear equalizer (CTLE), clock and data recovery (CDR), CMOS image sensor (CIS) link, dead zone, decision feedback equalizer (DFE), epsilon-constraint optimization, 4-level pulse amplitude modulation (PAM-4), programmable gain amplifier (PGA), receiver (RX), referenceless, sign-sign minimum mean squared error (SS-MMSE) PD, sign-sign Mueller-Muller (SS-MM) PD.

Student Number: 2018-24147

## Contents

- Abstract
- Contents
- List of Figures
- List of Tables
- Chapter 1 Introduction
  - 1.1 Motivation
  - 1.2 Thesis Organization
- Chapter 2 Background on High-Speed Receiver
  - 2.1 Basic Concepts in Serial Interface
    - 2.1.1 Serial Links
    - 2.1.2 Multi-Level Pulse-Amplitude Modulation
  - 2.2 Equalization
    - 2.2.1 Overview
    - 2.2.2 Continuous Time Linear Equalizer
    - 2.2.3 Programmable Gain Amplifier
    - 2.2.4 Decision Feedback Equalizer
    - 2.2.5 Equalizer Adaptation
  - 2.3 Clock and Data Recovery
    - 2.3.1 Overview
    - 2.3.2 PI-Based CDR
    - 2.3.3 Types of PAM-4 Phase Detectors
- Chapter 3 PAM-2 Receiver with Stochastic CTLE Adaptation
  - 3.1 Overview
  - 3.2 Proposed CTLE Adaptation
    - 3.2.1 Concept
    - 3.2.2 Number of Samples
    - 3.2.3 Weight Searching Algorithm
  - 3.3 Circuit Implementation
  - 3.4 Measurement Results
- Chapter 4 PAM-4 Receiver with Dead-Zone Free SS-MMSE PD for CIS Link
  - 4.1 Overview
  - 4.2 Analysis of Conventional Baud-Rate PDs and Proposed Dead-Zone Free PAM-4 PD
    - 4.2.1 Comparison between MM PD and MMSE PD
    - 4.2.2 Dead-Zone Effect of Conventional Baud-Rate PD with Adaptive DFE
    - 4.2.3 Proposed Dead-Zone Free PAM-4 Baud-Rate PD
  - 4.3 Circuit Implementation
    - 4.3.1 Architecture of the Proposed PAM-4 RX
    - 4.3.2 Fixed DTLEV Calibration
    - 4.3.3 Multiple Loop Stability
  - 4.4 Measurement Results
- Chapter 5 Conclusion
- Bibliography
- Abstract (in Korean)

## **List of Figures**

- Fig. 2.1 Trends of data rate of industrial I/O standards in [9]
- Fig. 2.2 Typical architecture of high-speed transceiver in [10]
- Fig. 2.3 Eye diagram comparison between PAM-2 and PAM-4 signaling
- Fig. 2.4 Conceptual diagram of single bit response
- Fig. 2.5 Conceptual diagram of sampled single bit response
- Fig. 2.6 Conceptual diagram of equalizers at the receiver side
- Fig. 2.7 Schematic diagram of the conventional active CTLE
- Fig. 2.8 Gain controllability of CTLE: (a) controlling $R_S$ and (b) controlling $C_S$
- Fig. 2.9 Gain curves of channel, CTLE, and channel•CTLE
- Fig. 2.10 Gain curves and eye diagrams of various boosting conditions: under-boosted, optimum-boosted, and over-boosted
- Fig. 2.11 PAM-4 eye diagrams with various PGA gain conditions: optimum, weak, and strong
- Fig. 2.12 Schematic diagram of the conventional active PGA
- Fig. 2.13 Gain curves of the PGA with adjustable degeneration resistance
- Fig. 2.14 Schematic diagram of the PAM-4 1-tap DFE
- Fig. 2.15 Concept of DFE with single-bit response
- Fig. 2.16 Concept of CTLE adaptation with spectrum balancing method in [7]
- Fig. 2.17 Concept of CTLE adaptation with eye-opening monitor in [3]
- Fig. 2.18 (a) Mesochronous clocking architecture and (b) plesiochronous clocking architecture
- Fig. 2.19 Conceptual diagram of the PI-based CDR
- Fig. 2.20 Z-domain system model of (a) 1<sup>st</sup>-order and (b) 2<sup>nd</sup>-order PI-based CDR
- Fig. 2.21 Comparison between two types of PDs in terms of required samplers: (a) 2X oversampling PD and (b) baud-rate PD
- Fig. 2.22 Edge distributions for (a) PAM-2 and (b) PAM-4 signaling
- Fig. 2.23 Concept of the proposed PAM-4 PD in [5]
- Fig. 2.24 Concept of the baud-rate PD in [15]
- Fig. 2.25 Single-bit responses with corresponding PD gain curves for (a) symmetric impulse response without adaptive DFE and (b) asymmetric impulse response with adaptive DFE
- Fig. 2.26 Unequalized MM CDR with pulse response in [16]
- Fig. 2.27 Stochastic PAM-4 PD proposed in [17]
- Fig. 3.1 Proposed receiver with stochastic control engine
- Fig. 3.2 Concept of the conventional edge-based CTLE adaptation and proposed CTLE adaptation by detecting sequential edge and data patterns
- Fig. 3.3 ISI information obtained through correlation according to different sequential patterns
- Fig. 3.4 (a) Selected channel models and CTLE model, (b) collected eye diagrams and histograms for various CTLE codes with 15 dB loss channel model
- Fig. 3.5 Weight conditions that SCGS<sub>OUT</sub> should satisfy for one channel, where $M$ is the optimum CTLE code (top), and several examples of the SCGS gain curves that apply weights satisfying the above conditions with a -15 dB loss channel (bottom)
- Fig. 3.6 (a) How the state count num is performed and (b) weight searching algorithm that finds the golden weight set based on epsilon-constraint optimization
- Fig. 3.7 Final stochastic CTLE adaptation gain curve with the golden weight set for the selected channel models
- Fig. 3.8 Circuit implementation of the proposed receiver with referenceless CDR and stochastic CTLE adaptation
- Fig. 3.9 Microphotograph of the prototype and power breakdown
- Fig. 3.10 Measured channel insertion loss
- Fig. 3.11 Measured BER with sinusoidal jitter of 0.2 UI<sub>pp</sub> at 100 MHz by manually controlled CTLE code
- Fig. 3.12 Measured JTOL of channel 3 with both referenceless CDR and CTLE adaptation enabled
- Fig. 4.1 Conceptual diagram of the proposed PAM-4 adaptive receiver with dead-zone free baud-rate CDR and adaptive control engine
- Fig. 4.2 Detected transitions of (a) MM PD and (b) MMSE PD, assuming 2 error samplers per sampling phase
- Fig. 4.3 Single bit response and corresponding timing function of (a) conventional baud-rate PD without DFE and (b) conventional baud-rate PD with DFE
- Fig. 4.4 Working principle of the proposed SS-MMSE PD and the biased state
- Fig. 4.5 Proposed DF SS-MMSE PD: summation of SS-MMSE PD and weighted biased state
- Fig. 4.6 Conceptual diagram of the maximum VEO achievement
- Fig. 4.7 Simulation results of optimum beta for different insertion losses
- Fig. 4.8 Overall block diagram of the proposed PAM-4 adaptive RX
- Fig. 4.9 (a) Schematic diagram of the ATT and (b) AC gain curves of ATT
- Fig. 4.10 (a) Schematic diagram of the PGA and (b) AC gain curves of PGA
- Fig. 4.11 (a) Schematic diagram of the CTLE and (b) AC gain curves of CTLE
- Fig. 4.12 Circuit implementation of DF SS-MMSE CDR
- Fig. 4.13 Fixed DTLEV calibration for non-linearity and offset of samplers
- Fig. 4.14 Schematic diagram of skew calibration
- Fig. 4.15 Simulation result of the simultaneous adaptation at data rate of 12 Gb/s
- Fig. 4.16 Block diagram of the measurement setup
- Fig. 4.17 (a) Chip photomicrograph, (b) measured power breakdown, and (c) differential insertion loss
- Fig. 4.18 Measured recovered eye diagram with on-chip eye monitor
- Fig. 4.19 Measured (a) bathtub curve and (b) jitter tolerance curve

## **List of Tables**

- Table 2.1 Characteristics of PAM-M signaling
- Table 2.2 Comparison between 1<sup>st</sup>- and 2<sup>nd</sup>-order PI-based CDR
- Table 3.1 Golden weight table
- Table 3.2 Performance comparison
- Table 4.1 Comparison between MM PD and MMSE PD assuming 2 error samplers per UI
- Table 4.2 Performance summary and comparison

## Chapter 1

## Introduction

#### **1.1 Motivation**

In wireline communication systems, required data rates continue to increase toward the limit of the channel bandwidth. The channel bandwidth is mainly limited by frequency-dependent losses, such as skin-effect and dielectric losses, and by impedance discontinuities. In automotive CMOS image sensor (CIS) link interfaces, the demand for higher bandwidth has also been increasing because advanced driver assistance systems increasingly rely on deep learning models such as convolutional neural networks. However, the interface between a high-resolution CIS and an electronic control unit or an image signal processor (ISP) in automotive applications [1] uses a maximum cable length of 12 m including up to six connectors, which limits the channel bandwidth to only about 3 GHz. Furthermore, because cable lengths vary from 3 m to 15 m, the corresponding channel losses also vary drastically. Therefore, the insertion loss should be compensated adaptively to attain the lowest bit error rate (BER), which requires an adaptive equalizer. Because the equalizer coefficients at the transmitter side are preset in most cases, adaptation of the equalizer at the receiver side is indispensable.

The most representative receiver-side equalizers are the continuous-time linear equalizer (CTLE) and the decision feedback equalizer (DFE). While the DFE cancels the post-cursor inter-symbol interference (ISI), the CTLE shapes the overall single-bit response (SBR). Accordingly, various CTLE adaptation methods have been reported in previous works [2]-[7]. However, these methods demand extra hardware and a long adaptation time. To resolve these issues, this thesis proposes a referenceless receiver with stochastic CTLE adaptation. Because it shares the data and edge samples with the clock and data recovery (CDR) circuit, the proposed CTLE adaptation requires no additional analog hardware. In addition to this hardware efficiency, the proposed weight searching algorithm achieves the maximum horizontal eye opening.

In addition to equalization, multi-level signaling such as 4-level pulse amplitude modulation (PAM-4) is adopted to raise the link data rate beyond 10 Gb/s. However, PAM-4 signaling degrades the signal-to-noise ratio (SNR), making it hard to meet the stringent reliability requirements of automotive applications. Additionally, since the number of samplers grows with the number of signal levels in multi-level signaling, baud-rate sampling is widely adopted. A baud-rate CDR combined with an adaptive DFE, however, suffers from a dead-zone problem [8]. This thesis proposes a dead-zone free (DF) sign-sign minimum mean squared error (SS-MMSE) phase detector (PD) that removes the dead zone and achieves the maximum vertical eye opening. Measurement results verify that the stochastic CTLE adaptation and the DF SS-MMSE CDR improve the BER and the eye opening.

#### **1.2 Thesis Organization**

This thesis is organized as follows. Chapter 2 provides background on equalization and CDR. For equalization, CTLE adaptation is mainly introduced; for CDR, the types of phase detectors (PDs) are introduced and compared.

In Chapter 3, an 8-to-16 Gb/s referenceless receiver with stochastic CTLE adaptation is presented. The proposed stochastic CTLE gain selector (SCGS) achieves the maximum horizontal eye margin and avoids sub-optimal settling by utilizing sequential edge and data samples. The proposed SCGS detects the optimum CTLE coefficient from the weighted summation of the histograms obtained under various data patterns and channel conditions. The stochastic CTLE adaptation shares the deserialized edge and data samples used for the CDR; therefore, it does not require additional hardware in the analog domain and achieves low power consumption. The golden weight of the SCGS is obtained through an epsilon-constraint weight searching algorithm. A prototype chip fabricated in 28-nm CMOS technology occupies an active area of 0.029 mm<sup>2</sup>. The measured CTLE adaptation behavior shows that the maximum eye width is achieved with the proposed SCGS. The prototype chip achieves BER over a channel with 14-dB loss at 8 GHz and consumes 17.7 mW at 16 Gb/s, exhibiting a power efficiency of 1.11 pJ/b.

In Chapter 4, a PAM-4 receiver (RX) for a CMOS image sensor (CIS) link is presented. A robust baud-rate PAM-4 phase detector (PD) is proposed, which alleviates the dead-zone issue encountered in the conventional sign-sign Mueller-Muller (SS-MM) PD when combined with an adaptive decision feedback equalizer (DFE). An optimum and unique locking point is reached by using a sign-sign minimum mean squared error (SS-MMSE) PD with a biased state, which exhibits a constant probability over the entire phase error range. The biased state allows the proposed PD to evade the dead zone and prevents wandering. Furthermore, the proposed SS-MMSE PD detects 1.5 times more phase-detecting transitions than the conventional SS-MM PD. The proposed solution is verified with a prototype PAM-4 RX chip fabricated in 40-nm CMOS technology, which demonstrates a bit error rate (BER) of less than 10<sup>-9</sup> over a channel with 15.8 dB loss. The total power consumption is 46.6 mW at 12 Gb/s, achieving a figure of merit (energy efficiency per channel loss at the Nyquist frequency) of 0.24 pJ/b/dB.

Chapter 5 summarizes the proposed works and concludes this thesis.

## Chapter 2

## Background on High-Speed Receiver

#### 2.1 Basic Concepts in Serial Interface

#### 2.1.1 Serial Links

As artificial intelligence and big data spread across various fields, the bandwidth demand on wireline interfaces has grown exponentially. To meet this demand, the specifications of many wireline interfaces are upgraded with each new generation. The demand for high per-lane data rates appears in various standards such as Optical Internetworking Forum Common Electrical I/O (OIF CEI), Peripheral Component Interconnect Express (PCIe), Universal Serial Bus (USB), and Ethernet [9], as shown in Fig. 2.1. Accordingly, transceiver design is becoming increasingly challenging.



Fig. 2.1 Trends of data rate of industrial I/O standards in [9].

Fig. 2.2 shows a simplified block diagram of a high-speed transceiver [10]. On the transmitter side, the parallel digital data is serialized and equalized, and the clocked driver transmits the serialized data through the channel. Both the driver and the receiver front end are terminated with 50 Ω for impedance matching. On the receiver side, the equalized input signal is sampled, clock and data recovery (CDR) is performed, and the sampled data is deserialized and fed to the processors. As data rates increase, the circuit bandwidth becomes a limiting factor and the design complexity grows.



Fig. 2.2 Typical architecture of high-speed transceiver in [10].
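To make the data path concrete, the following is a minimal behavioral sketch of the serialize, transmit, sample, and deserialize chain, not a model of the circuits in [10]. It assumes MSB-first serialization, NRZ signaling, a hypothetical two-tap channel, ideal clock recovery (one sample per UI at the correct phase), and no equalization; all function names are illustrative.

```python
from typing import List


def serialize(words: List[int], width: int) -> List[int]:
    """Parallel-to-serial conversion, MSB first (the bit order is an assumption)."""
    return [(w >> i) & 1 for w in words for i in range(width - 1, -1, -1)]


def channel(bits: List[int], taps: List[float]) -> List[float]:
    """Map bits to NRZ symbols (0 -> -1, 1 -> +1) and apply a causal FIR channel."""
    symbols = [2 * b - 1 for b in bits]
    return [
        sum(h * symbols[n - k] for k, h in enumerate(taps) if n - k >= 0)
        for n in range(len(symbols))
    ]


def sample(analog: List[float]) -> List[int]:
    """Ideal data sampler: one sample per UI with a zero threshold."""
    return [1 if v > 0 else 0 for v in analog]


def deserialize(bits: List[int], width: int) -> List[int]:
    """Serial-to-parallel conversion back into width-bit words."""
    words = []
    for start in range(0, len(bits) - width + 1, width):
        word = 0
        for b in bits[start:start + width]:
            word = (word << 1) | b
        words.append(word)
    return words


tx_words = [0xA5, 0x3C, 0xFF, 0x00]
rx_bits = sample(channel(serialize(tx_words, 8), taps=[1.0, 0.3]))
rx_words = deserialize(rx_bits, 8)
print(rx_words == tx_words)  # True: h[0] - |h[1]| = 0.7 > 0, so the ISI never flips a decision
```

If the post-cursor tap is raised above the main cursor, the slicer starts making errors, which is the situation the equalizers in Section 2.2 are meant to prevent.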

Additionally, as the unit interval (UI) becomes narrower, the signal becomes more vulnerable to bounded and unbounded noise. The representative deterministic noise caused by the limited channel bandwidth is inter-symbol interference (ISI), so equalization at both the transmitter and the receiver is now essential. Because the TX equalizer needs back-channel information to fully adapt, accurate equalization at the receiver side is increasingly important. In addition to equalization, a synchronous system requires accurate recovery of the sampling clock at the receiver side.

#### 2.1.2 Multi-Level Pulse-Amplitude Modulation

As the required data rate increases, the bandwidth of transmission lines and CMOS circuits is approaching its physical limit. Many standards, such as IEEE 802.3, OIF, and MIPI A-PHY, are selecting PAM-4 signaling for their next generations [9], [10], [11]. In the Ethernet and OIF specifications, PAM-2 is adopted for 25G, and PAM-4 is adopted for 50G/100G/200G. In the case of MIPI A-PHY, the introduction of PAM-8 and PAM-16 is being reviewed along with PAM-4. However, PAM-M signaling requires careful design consideration of the signal-to-noise ratio (SNR).

From a large-signal perspective, the voltage range over which the linearity of the circuit is guaranteed is fixed. Therefore, the maximum vertical eye margin of PAM-M signaling is limited by the circuit design. Fig. 2.3 shows the eye diagram comparison between PAM-2 and PAM-4 signaling.



Fig. 2.3 Eye diagram comparison between PAM-2 and PAM-4 signaling.

Assuming that the maximum voltage levels of PAM-2 and PAM-4 signaling are the same, the vertical eye margin of PAM-4 signaling is 1/3 that of PAM-2 signaling. In general, the vertical eye margin scales by 1/(M-1) as the number of levels M increases. Further assuming that the noise power is the same for all PAM-M signaling, employing PAM-4 signaling incurs an SNR loss of $20 \cdot \log_{10}(1/3) \approx -9.5$ dB. The SNR also bounds the achievable data rate through Shannon's channel capacity theorem, expressed as

$$C = B \cdot \log_2(1 + SNR) \tag{2.1}$$

where C is the channel capacity and B is the channel bandwidth. Because the bit-error rate also depends strongly on the SNR, this SNR degradation is critical. Therefore, adaptive on-chip control of deterministic noise is essential in multi-level signaling. Table 2.1 summarizes the characteristics of PAM-M signaling.

| Characteristic        | PAM-M                               |
|-----------------------|-------------------------------------|
| Bits per symbol       | log<sub>2</sub> M                   |
| Bandwidth extension   | log<sub>2</sub> M                   |
| Electrical levels     | M                                   |
| Number of eyes per UI | M - 1                               |
| SNR degradation (dB)  | $20 \cdot \log_{10} \frac{1}{M-1}$ |

Table 2.1 Characteristics of PAM-M signaling.
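The entries of Table 2.1 follow directly from M, so they can be reproduced with a few lines of Python. This is a minimal sketch under the same assumptions as the text (equal peak swing and equal noise power for every PAM-M format); the function name is illustrative. For PAM-4 it gives about -9.5 dB, matching the value derived above, and for PAM-8 and PAM-16 it gives about -16.9 dB and -23.5 dB.

```python
import math


def pam_characteristics(m: int) -> dict:
    """Table 2.1 entries for PAM-M, assuming equal peak swing and equal noise power."""
    return {
        "bits_per_symbol": math.log2(m),
        "levels": m,
        "eyes_per_ui": m - 1,
        # Each eye is 1/(M-1) of the PAM-2 eye for the same peak swing.
        "snr_degradation_db": 20 * math.log10(1 / (m - 1)),
    }


for m in (2, 4, 8, 16):
    c = pam_characteristics(m)
    print(f"PAM-{m:<2}: {c['bits_per_symbol']:.0f} b/symbol, "
          f"{c['eyes_per_ui']:>2} eye(s) per UI, "
          f"SNR degradation {c['snr_degradation_db']:6.1f} dB")
```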

#### 2.2 Equalization

#### 2.2.1 Overview

In serial links, signals are transmitted from the source device to the sink device via channels such as cables or traces. These channels, including the microstrip line, twisted pair, or coaxial cable, have low-pass filter characteristics due to the skin effect and dielectric losses. At higher data rates, the channel loss becomes more pronounced, and bandwidth-limited signals experience inter-symbol interference, leading to a significant reduction in SNR. To address this issue, equalizers with high-pass characteristics are employed to compensate for the bandwidth-limiting channel response, thus flattening the overall frequency response.

The channel response can be characterized by observing a single-bit pulse after it is transmitted through the channel; the result is called the single-bit response (SBR). The SBR of the channel, sbr(t), is written as



Fig. 2.4 Conceptual diagram of single bit response.

$$sbr(t) = \phi(t) * h(t) \tag{2.2}$$

where $\phi(t)$ is the transmitted single-bit pulse and h(t) is the channel impulse response. Because of the insertion loss of the channel, inter-symbol interference occurs, as shown in Fig. 2.4. In a synchronous system, the received continuous-time signal is sampled with clocks at the receiver side. Fig. 2.5 shows the conceptual diagram of the sampled single-bit response. The received discrete signal, y[n], can be expressed as

$$y[n] = x[n] * h[n] \tag{2.3}$$



Fig. 2.5 Conceptual diagram of sampled single bit response.



Fig. 2.6 Conceptual diagram of equalizers at the receiver side.

where x[n] is the transmitted discrete signal and h[n] is the sampled impulse response. For PAM-2 signaling, the vertical eye margin (VEM) of the received signal in the absence of unbounded noise is expressed as

$$VEM_{PAM-2} = h[0] - \sum_{k=1}^{\infty} |h[k]| - \sum_{k=-\infty}^{-1} |h[k]| \tag{2.3}$$

The VEM of PAM-2 signaling is reduced by the total magnitude of the ISI. Because the ISI is a deterministic and bounded voltage noise, equalizers such as the feed-forward equalizer (FFE) and the decision feedback equalizer (DFE) can cancel it at the sampling point.
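The convolution y[n] = x[n] * h[n] and the PAM-2 VEM expression can be checked against each other numerically. The sketch below is illustrative only: the sampled pulse response `H` (one pre-cursor, the main cursor, and two post-cursors) is hypothetical, and the helper names are not from the thesis. It evaluates the closed-form VEM, an exhaustive worst case over every ±1 pattern, and the margins of one arbitrary NRZ sequence passed through the convolution; all three come out at approximately 0.4 for these taps.

```python
import math
from itertools import product


def convolve(x, h):
    """Discrete convolution y[n] = sum_k x[k] h[n - k]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


# Hypothetical sampled pulse response: [h[-1], h[0], h[1], h[2]].
H = [0.1, 1.0, 0.35, 0.15]
MAIN = 1  # index of the main cursor h[0] inside H


def vem_pam2_formula(h, main):
    """h[0] minus the sum of |pre-cursor| and |post-cursor| ISI."""
    return h[main] - sum(abs(v) for i, v in enumerate(h) if i != main)


def vem_pam2_bruteforce(h, main):
    """Worst-case signed distance to the zero threshold over all +/-1 patterns."""
    worst = math.inf
    for pattern in product((-1, 1), repeat=len(h)):
        sample = sum(s * v for s, v in zip(pattern, h))
        worst = min(worst, pattern[main] * sample)
    return worst


x = [1, -1, 1, 1, -1, -1, 1, -1]  # arbitrary NRZ symbols
y = convolve(x, H)
margins = [x[n] * y[n + MAIN] for n in range(len(x))]  # y[n + MAIN] carries x[n]
print(vem_pam2_formula(H, MAIN), vem_pam2_bruteforce(H, MAIN), min(margins))
```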

However, the VEM of PAM-4 signaling is further degraded because the main cursor level available to each eye is reduced to 1/3. The VEM<sub>PAM-4</sub> can be written as

$$VEM_{PAM-4} = \frac{h[0]}{3} - \sum_{k=1}^{\infty} |h[k]| - \sum_{k=-\infty}^{-1} |h[k]| \tag{2.4}$$

In PAM-4 signaling, the ISI therefore has three times the impact on the eye margin, relative to the main cursor, compared with PAM-2 signaling. Hence, precise equalization of the ISI is essential for multi-level signaling. Since the FFE at the transmitter side needs an additional back channel to adapt its tap coefficients, equalization at the receiver side becomes more important.
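The three-fold sensitivity of PAM-4 to ISI can be illustrated with the same kind of exhaustive check. In this sketch the pulse response is hypothetical, the PAM-4 levels are normalized to ±1 and ±1/3, and the slicer thresholds are assumed ideal at 0 and ±2/3 of h[0]; the names are illustrative. With a total ISI of 0.25, the PAM-2 VEM is 0.75, while the PAM-4 VEM from Eq. (2.4) shrinks to about 0.083, and the exhaustive search over all 256 symbol patterns agrees with the formula.

```python
import math
from itertools import product

LEVELS = (-1.0, -1 / 3, 1 / 3, 1.0)    # normalized PAM-4 symbols
REL_THRESHOLDS = (-2 / 3, 0.0, 2 / 3)  # ideal slicer thresholds, relative to h[0]

# Hypothetical sampled pulse response: [h[-1], h[0], h[1], h[2]].
H = [0.05, 1.0, 0.15, 0.05]
MAIN = 1


def total_isi(h, main):
    return sum(abs(v) for i, v in enumerate(h) if i != main)


def vem_pam4_formula(h, main):
    """Eq. (2.4): h[0]/3 minus the total ISI magnitude."""
    return h[main] / 3 - total_isi(h, main)


def vem_pam4_bruteforce(h, main):
    """Worst-case signed distance from each sample to its own decision-region boundary."""
    th = [t * h[main] for t in REL_THRESHOLDS]
    bounds = [(-math.inf, th[0]), (th[0], th[1]), (th[1], th[2]), (th[2], math.inf)]
    worst = math.inf
    for pattern in product(range(4), repeat=len(h)):
        sample = sum(LEVELS[k] * v for k, v in zip(pattern, h))
        lo, hi = bounds[pattern[main]]
        worst = min(worst, sample - lo, hi - sample)
    return worst


print(f"PAM-2 VEM: {H[MAIN] - total_isi(H, MAIN):.4f}")
print(f"PAM-4 VEM: {vem_pam4_formula(H, MAIN):.4f} (formula), "
      f"{vem_pam4_bruteforce(H, MAIN):.4f} (exhaustive)")
```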

The representative equalizers at the receiver side are the continuous-time linear equalizer (CTLE), the programmable gain amplifier (PGA), and the decision feedback equalizer (DFE). Fig. 2.6 shows a representative configuration of the equalizers at the receiver side. The following subsections describe each of them in detail.

#### 2.2.2 Continuous Time Linear Equalizer

The continuous-time linear equalizer (CTLE) is a linear equalizer located at the analog front end of the receiver. The CTLE acts as a high-pass filter that compensates for the insertion loss of the low-pass channel response.

The conventional active CTLE is based on a current-mode logic driver, and a controllable RC degeneration is implemented for tunability. Fig. 2.7 shows the schematic diagram of the conventional active CTLE.



Fig. 2.7 Schematic diagram of the conventional active CTLE.

Using half-circuit analysis, the transfer function of the conventional active CTLE is expressed as

$$H(s) = A_{DC} \cdot \frac{1 + \frac{s}{\omega_z}}{\left(1 + \frac{s}{\omega_{p1}}\right)\left(1 + \frac{s}{\omega_{p2}}\right)} \tag{2.5}$$


The zero, the two poles, and the DC gain are given by

$$\begin{aligned}
\omega_{z} &= \frac{1}{R_{S}C_{S}}, \\
\omega_{p1} &= \frac{1 + \frac{g_{m}R_{S}}{2}}{R_{S}C_{S}}, \\
\omega_{p2} &= \frac{1}{R_{L}C_{L}}, \\
A_{DC} &= \frac{g_{m}R_{L}}{1 + \frac{g_{m}R_{S}}{2}},
\end{aligned} \tag{2.6}$$

where $g_m$ is the transconductance of the input NMOS, $R_L$ is the load resistance, $C_L$ is the load capacitance, $R_S$ is the degeneration resistance, and $C_S$ is the degeneration capacitance. The design considerations of the CTLE are as follows. When the peaking frequency of the CTLE is placed at the Nyquist frequency, the main cursor is amplified and the SNR is enhanced. In addition, because the DC gain increases with $R_L$ while the output pole $\omega_{p2} = 1/(R_L C_L)$ decreases with both $R_L$ and $C_L$, the conventional active CTLE has a trade-off between gain and bandwidth.
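Eqs. (2.5) and (2.6) can be evaluated directly to see where the zero and poles land and how much boost reaches the Nyquist frequency. The component values below are hypothetical (they are not taken from the prototype chips), and the 8 GHz Nyquist frequency corresponds to 16 Gb/s NRZ. With these values the zero sits near 2 GHz, the first pole near 10 GHz, and the boost at 8 GHz is roughly 10 dB; the boost can never exceed $\omega_{p1}/\omega_z = 1 + g_m R_S/2$.

```python
import math


def ctle_params(gm, rl, cl, rs, cs):
    """Zero, poles (rad/s), and DC gain (V/V) of the RC-degenerated CTLE, Eq. (2.6)."""
    wz = 1 / (rs * cs)
    wp1 = (1 + gm * rs / 2) / (rs * cs)
    wp2 = 1 / (rl * cl)
    adc = gm * rl / (1 + gm * rs / 2)
    return wz, wp1, wp2, adc


def ctle_gain(f, gm, rl, cl, rs, cs):
    """Magnitude of H(j*2*pi*f) from Eq. (2.5)."""
    wz, wp1, wp2, adc = ctle_params(gm, rl, cl, rs, cs)
    s = 2j * math.pi * f
    return abs(adc * (1 + s / wz) / ((1 + s / wp1) * (1 + s / wp2)))


# Hypothetical component values (not taken from the prototype chips).
PARAMS = dict(gm=20e-3, rl=300.0, cl=20e-15, rs=400.0, cs=200e-15)
F_NYQ = 8e9  # Nyquist frequency of a 16 Gb/s NRZ stream

wz, wp1, wp2, adc = ctle_params(**PARAMS)
for name, w in (("zero", wz), ("pole 1", wp1), ("pole 2", wp2)):
    print(f"{name:>6}: {w / (2 * math.pi) / 1e9:6.2f} GHz")
peaking_db = 20 * math.log10(ctle_gain(F_NYQ, **PARAMS) / ctle_gain(0.0, **PARAMS))
print(f"DC gain {20 * math.log10(adc):.2f} dB, boost at Nyquist {peaking_db:.2f} dB")
# Upper bound on the boost: wp1 / wz = 1 + gm * R_S / 2 (5x, about 14 dB, here).
```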

To compensate for various channel insertion losses, the gain of the conventional active CTLE is controlled by adjusting the degeneration resistance and capacitance. Fig. 2.8 shows the gain adjustment of the CTLE with respect to variations in $R_S$ and $C_S$. Controlling $R_S$ adjusts both the DC gain and the zero frequency, whereas controlling $C_S$ adjusts the Nyquist gain while leaving the DC gain unchanged, since $A_{DC}$ in Eq. (2.6) does not depend on $C_S$. The circuit designer should therefore choose the circuit parameters so that the CTLE can compensate for the expected range of channel losses.



Fig. 2.8 Gain controllability of CTLE: (a) controlling $R_{S}$ and (b) controlling $C_{S}$.



Fig. 2.9 Gain curves of channel, CTLE, and channel•CTLE.
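The two tuning knobs can be compared by sweeping them in the CTLE model of Eqs. (2.5) and (2.6). This sketch uses the same hypothetical device values as a plain illustration, not the thesis design: sweeping $R_S$ moves the DC gain (and the zero), while sweeping $C_S$ leaves the DC gain untouched and raises the Nyquist gain as $C_S$ grows.

```python
import math

# Hypothetical device and load values, shared by both sweeps below.
GM, RL, CL = 20e-3, 300.0, 20e-15
F_NYQ = 8e9  # Nyquist frequency of a 16 Gb/s NRZ stream


def ctle_gain(f, rs, cs):
    """|H(j*2*pi*f)| of Eq. (2.5) with the zero, poles, and DC gain of Eq. (2.6)."""
    wz = 1 / (rs * cs)
    wp1 = (1 + GM * rs / 2) / (rs * cs)
    wp2 = 1 / (RL * CL)
    adc = GM * RL / (1 + GM * rs / 2)
    s = 2j * math.pi * f
    return abs(adc * (1 + s / wz) / ((1 + s / wp1) * (1 + s / wp2)))


def db(x):
    return 20 * math.log10(x)


print("Sweep R_S (C_S = 200 fF): DC gain and zero frequency both move")
rs_sweep = [(rs, db(ctle_gain(0.0, rs, 200e-15)), db(ctle_gain(F_NYQ, rs, 200e-15)))
            for rs in (200.0, 400.0, 800.0)]
for rs, dc, nyq in rs_sweep:
    print(f"  R_S = {rs:5.0f} ohm: DC {dc:6.2f} dB, Nyquist {nyq:6.2f} dB")

print("Sweep C_S (R_S = 400 ohm): DC gain fixed, Nyquist gain moves")
cs_sweep = [(cs, db(ctle_gain(0.0, 400.0, cs)), db(ctle_gain(F_NYQ, 400.0, cs)))
            for cs in (100e-15, 200e-15, 400e-15)]
for cs, dc, nyq in cs_sweep:
    print(f"  C_S = {cs * 1e15:5.0f} fF: DC {dc:6.2f} dB, Nyquist {nyq:6.2f} dB")
```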

After passing through the channel, the RX input signal is amplified by the CTLE. Fig. 2.9 compares the gain curves in the frequency domain. Since the channel loss low-pass filters the power spectral density of the signal, the CTLE boosts the high-frequency gain, with its peaking frequency placed near the Nyquist frequency. Examples of under-boosted and over-boosted conditions are extracted and compared in Fig. 2.10.

When the overall gain at the Nyquist frequency is smaller than the DC gain, the response is under-boosted; when it is larger, the response is over-boosted. Therefore, for optimum boosting, the overall gain should be flat across the frequency range, and the CTLE gain must be well controlled to satisfy this condition. The CTLE adaptation proposed in this thesis is presented in Chapter 3.



Fig. 2.10 Gain curves and eye diagrams of various boosting conditions: under-boosted, optimum-boosted, and over-boosted.
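The flatness criterion above can be expressed as a simple check on the overall Nyquist-to-DC tilt of channel•CTLE. In this sketch the CTLE follows Eqs. (2.5) and (2.6) with hypothetical values, the channel is summarized by a hypothetical 10 dB loss at Nyquist (0 dB at DC), and the ±1 dB "flat" band is an arbitrary choice; it is a frequency-domain illustration only, not the adaptation method of Chapter 3. With these numbers the three $C_S$ settings land at roughly -4.0 dB, -0.2 dB, and +2.2 dB of tilt.

```python
import math

GM, RL, CL, RS = 20e-3, 300.0, 20e-15, 400.0  # hypothetical CTLE values
F_NYQ = 8e9                                   # 16 Gb/s NRZ
CHANNEL_LOSS_DB = 10.0  # hypothetical channel loss at Nyquist (0 dB at DC assumed)
TOLERANCE_DB = 1.0      # hypothetical band for calling the response "flat"


def ctle_gain_db(f, cs):
    """CTLE gain in dB from Eqs. (2.5) and (2.6)."""
    wz = 1 / (RS * cs)
    wp1 = (1 + GM * RS / 2) / (RS * cs)
    wp2 = 1 / (RL * CL)
    adc = GM * RL / (1 + GM * RS / 2)
    s = 2j * math.pi * f
    return 20 * math.log10(abs(adc * (1 + s / wz) / ((1 + s / wp1) * (1 + s / wp2))))


def classify(cs):
    """Compare the Nyquist-to-DC tilt of channel*CTLE against a flat response."""
    ctle_boost = ctle_gain_db(F_NYQ, cs) - ctle_gain_db(0.0, cs)
    tilt = ctle_boost - CHANNEL_LOSS_DB  # overall Nyquist gain relative to DC
    if tilt < -TOLERANCE_DB:
        return tilt, "under-boosted"
    if tilt > TOLERANCE_DB:
        return tilt, "over-boosted"
    return tilt, "optimum"


results = {cs: classify(cs) for cs in (100e-15, 200e-15, 400e-15)}
for cs, (tilt, label) in results.items():
    print(f"C_S = {cs * 1e15:3.0f} fF: tilt {tilt:+5.2f} dB -> {label}")
```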

#### 2.2.3 Programmable Gain Amplifier

A programmable gain amplifier (PGA) controls the input swing level. Instead of boosting a particular frequency range, the PGA adjusts the overall gain. As the required data rate increases, the PGA becomes essential for multi-level signaling. Fig. 2.11 shows conceptual PAM-4 eye diagrams under various PGA gain conditions. A mismatch between the input swing level and the threshold levels of the samplers critically degrades the BER of the LSB data. Therefore, controlling the input swing level with the PGA is very important for multi-level signaling.



Fig. 2.11 PAM-4 eye diagrams with various PGA gain conditions: optimum, weak, and strong.

Fig. 2.12 shows the schematic diagram of the conventional active PGA. Since a flat overall gain is required, the active PGA is configured as a current-mode-logic (CML) driver without a degeneration capacitance.

Fig. 2.12 Schematic diagram of the conventional active PGA.

The transfer function of the conventional active PGA is expressed as

$$H(s) = A_{DC} \cdot \frac{1}{\left(1 + \frac{s}{\omega_{p1}}\right)}.$$
(2.7)

The pole and the DC gain are given by

$$\omega_{p1} = \frac{1}{R_L C_L}$$

$$A_{DC} = \frac{R_L}{\frac{1}{g_m} + \frac{R_S}{2}}$$
(2.8)

where $g_m$ is the transconductance of the input NMOS, $R_L$ is the load resistance, $C_L$ is the load capacitance, and $R_S$ is the degeneration resistance. The key design consideration for the PGA is that the dominant pole frequency should be higher than the Nyquist frequency, so that the gain stays flat over the signal band.

The PGA gain is controlled by adjusting the degeneration resistance; for fine tuning, a transistor operating in the linear region with gate-voltage control is adopted. The gain curves of the PGA are shown in Fig. 2.13. Adjusting the DC gain fits the overall swing level to the threshold levels of the samplers.
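Eqs. (2.7) and (2.8) can be evaluated directly to check the two design points above: that the pole stays above the Nyquist frequency and that sweeping $R_S$ scales the flat gain. The sketch below is a minimal illustration; the device values and the 8-GHz Nyquist frequency are hypothetical.

```python
import math

def pga_dc_gain(gm, r_load, r_s):
    """A_DC = R_L / (1/g_m + R_S/2), Eq. (2.8)."""
    return r_load / (1.0 / gm + r_s / 2.0)

def pga_pole_hz(r_load, c_load):
    """f_p1 = omega_p1 / (2*pi), with omega_p1 = 1/(R_L*C_L), Eq. (2.8)."""
    return 1.0 / (2.0 * math.pi * r_load * c_load)

def pga_gain_at(f_hz, gm, r_load, c_load, r_s):
    """|H(j*2*pi*f)| of the single-pole model in Eq. (2.7)."""
    fp = pga_pole_hz(r_load, c_load)
    return pga_dc_gain(gm, r_load, r_s) / math.sqrt(1.0 + (f_hz / fp) ** 2)

if __name__ == "__main__":
    GM, R_L, C_L = 10e-3, 500.0, 20e-15   # hypothetical device values
    F_NYQ = 8e9                           # hypothetical Nyquist frequency
    fp = pga_pole_hz(R_L, C_L)
    print(f"pole = {fp / 1e9:.1f} GHz, above Nyquist: {fp > F_NYQ}")
    for r_s in (0.0, 100.0, 200.0, 400.0):
        a_dc = pga_dc_gain(GM, R_L, r_s)
        a_nyq = pga_gain_at(F_NYQ, GM, R_L, C_L, r_s)
        print(f"R_S = {r_s:5.0f} ohm: A_DC = {20 * math.log10(a_dc):5.2f} dB, "
              f"|H(f_nyq)| = {20 * math.log10(a_nyq):5.2f} dB")
```

Because $R_S$ appears only in $A_{DC}$ and not in $\omega_{p1}$, the sweep shifts the whole gain curve up or down without changing its shape, which is the behavior sketched in Fig. 2.13.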



Fig. 2.13 Gain curves of the PGA with adjustable degeneration resistance

### 2.2.4 Decision Feedback Equalizer

The decision feedback equalizer (DFE) is a widely adopted equalizer on the receiver side. The DFE is composed of a summer, a sampler, and a feedback filter with weight-coefficient multipliers. Fig. 2.14 shows the schematic diagram of the PAM-4 1-tap DFE. Since the feedback loop contains a non-linear detection block (the sampler), the DFE cannot be analyzed with a transfer function. Instead, the tap coefficients can be analyzed with the single-bit response.

Fig. 2.14 Schematic diagram of the PAM-4 1-tap DFE.

As shown in Fig. 2.15, the DFE removes the post-cursor inter-symbol interference (ISI). The vertical eye margin (VEM) for PAM-2 signaling after the DFE cancels the post-cursor ISI can be expressed as

$$VEM_{PAM-2} = h[0] - \sum_{k=-\infty}^{-1} \left| h[k] \right|$$
(2.9)

where the VEM is affected only by the pre-cursor ISI. In a receiver with a DFE, controlling the pre-cursor ISI is therefore critical to the deterministic eye margin. The VEM for PAM-4 signaling is given by

$$VEM_{PAM-4} = \frac{h[0]}{3} - \sum_{k=-\infty}^{-1} \left| h[k] \right|$$
(2.10)

where the effect of the ISI on the eye margin is three times larger than in PAM-2 signaling, because each PAM-4 eye spans only one third of the main-cursor swing. Therefore, controlling the pre-cursor ISI is important for achieving a low bit-error rate. This issue is addressed in Chapter 4.
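Eqs. (2.9) and (2.10) can be cross-checked by brute force: enumerate every pre-cursor symbol pattern and find the worst-case distance of the received sample from its nearest slicer threshold. The sketch below assumes symbols normalized to $[-1, 1]$ (PAM-4 levels $\pm 1, \pm 1/3$), slicer thresholds halfway between adjacent received levels, and post-cursor ISI fully cancelled by the DFE; the function names and example cursors are hypothetical.

```python
from itertools import product

def vem_closed_form(h0, pre_cursors, levels):
    """Eqs. (2.9)/(2.10): h[0]/(M-1) - sum |h[k]| for M = 2 or 4 levels."""
    return h0 / (levels - 1) - sum(abs(h) for h in pre_cursors)

def pam_symbols(levels):
    """Normalized PAM symbols in [-1, 1], e.g. PAM-4 -> -1, -1/3, 1/3, 1."""
    return [-1 + 2 * i / (levels - 1) for i in range(levels)]

def vem_brute_force(h0, pre_cursors, levels):
    """Worst-case distance to the nearest slicer threshold over all patterns.

    Post-cursor ISI is assumed to be fully cancelled by the DFE.
    """
    syms = pam_symbols(levels)
    thresholds = [h0 * (a + b) / 2 for a, b in zip(syms, syms[1:])]
    worst = float("inf")
    for i, a0 in enumerate(syms):
        # Decision region of symbol a0 is (lower, upper).
        lower = thresholds[i - 1] if i > 0 else float("-inf")
        upper = thresholds[i] if i < len(thresholds) else float("inf")
        for pattern in product(syms, repeat=len(pre_cursors)):
            y = h0 * a0 + sum(h * a for h, a in zip(pre_cursors, pattern))
            worst = min(worst, y - lower, upper - y)
    return worst

if __name__ == "__main__":
    h0, pre = 1.0, [0.1, 0.05]   # hypothetical main cursor and pre-cursors
    for m in (2, 4):
        print(m, vem_closed_form(h0, pre, m), vem_brute_force(h0, pre, m))
```

With the same pre-cursors, the PAM-4 margin shrinks far faster than the PAM-2 margin, which is why pre-cursor control matters most for multi-level signaling.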



Fig. 2.15 Concept of DFE with single-bit response.

### 2.2.5 Equalizer Adaptation

#### 2.2.5.1 Least-mean-square adaptation algorithm

Equalizers are used to compensate for the inter-symbol interference (ISI) caused by frequency-dependent channel loss. However, if the equalizer coefficients are not set to the correct values, residual ISI remains and degrades the signal-to-noise ratio (SNR). Because the ISI varies with the sampling timing and its exact amount is difficult to predict, the equalizer must be accompanied by an adaptation method to achieve a high SNR.

The least-mean-square (LMS) algorithm is the conventional adaptation algorithm. To minimize the squared error $e^2[n]$, the equalizer coefficient $w[k]$ is updated as

$$w[k]_{i+1} = w[k]_i - \frac{\mu}{2} \frac{\partial e^2[n]}{\partial w[k]}$$

$$= w[k]_i - \mu e[n] \cdot y[n-k]$$
(2.11)


where $i$ is the update index, $n$ is the sampling time index, $k$ is the tap location, $\mu$ is the update coefficient (step size), and $y[n]$ is the equalizer input. Since evaluating the error and the equalizer input requires a multi-bit analog-to-digital converter, the LMS algorithm is difficult to implement directly. Therefore, the sign-sign least-mean-square (SSLMS) algorithm is employed for simplification. The update equation of the SSLMS algorithm can be rewritten as

$$w[k]_{i+1} = w[k]_i - \mu \cdot sgn(e[n]) \cdot D[n-k]$$
(2.12)

where $D[n]$ is the sampled (sliced) equalizer output. The SSLMS algorithm is widely adopted to update the coefficients of the PGA and the DFE because of its simple implementation. However, it assumes that the error and data samples are detected correctly; under the initial unequalized conditions this assumption may not hold, and the SSLMS can lock falsely. Especially for multi-level signaling, where the initial eye diagram is mostly closed because of the reduced SNR, a more robust adaptation algorithm is needed.

The DFE removes post-cursors, and the PGA adjusts the signal swing to the target level. SSLMS adaptation of these equalizers requires additional error samplers, but since the error samplers share the sampling clocks with the data samplers, they are not a large burden on the receiver implementation. In contrast, the CTLE shapes the overall impulse response by simultaneously changing the pre- and post-cursor ISI and the main-cursor level. Therefore, SSLMS adaptation is not widely adopted for the CTLE, and other approaches have been studied.
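The SSLMS update of Eq. (2.12) can be sketched for a 1-tap PAM-2 DFE. In this minimal model the summer output is $z[n] = y[n] + w \cdot D[n-1]$, so the tap should converge to $-h_1$; the single-post-cursor channel, the step size, and the assumption that the error-sampler reference equals the known main cursor $h_0$ are all hypothetical simplifications (in practice the data level would be adapted by a separate loop).

```python
import random

def sgn(x):
    return (x > 0) - (x < 0)

def adapt_dfe_tap_sslms(h0=1.0, h1=0.4, mu=1 / 256, n_symbols=4000, seed=1):
    """1-tap PAM-2 DFE adapted with the SSLMS rule of Eq. (2.12), k = 1.

    The error sampler compares z[n] with the data level h0 (assumed known).
    Returns the final tap weight w, which should dither around -h1.
    """
    rng = random.Random(seed)
    w = 0.0
    d_prev = 1          # transmitted symbol d[n-1]
    dec_prev = 1        # sliced decision D[n-1]
    for _ in range(n_symbols):
        d = rng.choice((-1, 1))
        y = h0 * d + h1 * d_prev           # channel with one post-cursor
        z = y + w * dec_prev               # DFE summer
        dec = 1 if z >= 0 else -1          # data sampler D[n]
        err = z - h0 * dec                 # error sampler input e[n]
        w -= mu * sgn(err) * dec_prev      # w[1] <- w[1] - mu*sgn(e[n])*D[n-1]
        d_prev, dec_prev = d, dec
    return w

if __name__ == "__main__":
    print(adapt_dfe_tap_sslms())
```

Because only signs are used, the tap moves in fixed steps of $\mu$ and settles into a dither of about one step around $-h_1$. The sketch also relies on the initial eye being open ($h_0 > |h_1|$); with a closed initial eye the decisions would be wrong and the loop could lock falsely, as noted above.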

#### 2.2.5.2 Previous Works on CTLE Adaptation



Fig. 2.16 Concept of CTLE adaptation with spectrum balancing method in [7].

A commonly used technique for adapting continuous-time linear equalizers (CTLEs) is the spectrum balancing method, illustrated in Fig. 2.16 [7]. The primary objective of this approach is to align the output spectrum of the equalizer with that of ideal random binary data. To achieve this goal, the equalizer output is separated into low- and high-frequency components at a critical frequency, chosen so that ideal random data has equal power in the two frequency regions. The power in each band is detected through the respective low-pass and high-pass filters, and the two powers are compared to generate a control voltage for the equalizer. This method has the added advantage of eliminating the need for a power-hungry comparator circuit. However, this technique is effective only for input data streams that are random or pseudo-random.
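The decision rule of spectrum balancing can be sketched digitally. In the code below, the critical frequency is the DFT bin that splits the power of an ideal random NRZ waveform in half, and an equalizer output is judged by comparing its power below and above that bin. This is only a behavioral illustration: the actual method uses analog filters and power detectors, and the 64-bit, 8x-oversampled waveform, the first-order low-pass standing in for channel loss, the first-difference filter standing in for an over-set CTLE, and all function names are hypothetical.

```python
import cmath
import random

def nrz_waveform(n_bits, osr, seed=3):
    """Random +/-1 NRZ data, each bit held for `osr` samples."""
    rng = random.Random(seed)
    bits = [rng.choice((-1.0, 1.0)) for _ in range(n_bits)]
    return [b for b in bits for _ in range(osr)]

def power_spectrum(x):
    """Periodogram |X[k]|^2 for k = 0 .. N/2 - 1 via a direct DFT."""
    n = len(x)
    twiddle = [cmath.exp(-2j * cmath.pi * m / n) for m in range(n)]
    return [abs(sum(v * twiddle[(k * i) % n] for i, v in enumerate(x))) ** 2
            for k in range(n // 2)]

def critical_bin(ideal_spectrum):
    """Smallest kc such that bins [0, kc) hold half of the ideal power."""
    half = sum(ideal_spectrum) / 2.0
    acc = 0.0
    for k, p in enumerate(ideal_spectrum):
        acc += p
        if acc >= half:
            return k + 1
    return len(ideal_spectrum)

def balance_decision(x, kc):
    """+1 -> boost more (high band weak), -1 -> boost less (high band strong)."""
    spec = power_spectrum(x)
    low, high = sum(spec[:kc]), sum(spec[kc:])
    return 1 if high < low else -1

def lowpass(x, a=0.8):
    """First-order IIR low-pass standing in for channel loss (hypothetical)."""
    y, prev = [], 0.0
    for v in x:
        prev = a * prev + (1.0 - a) * v
        y.append(prev)
    return y

def overboost(x, k=1.5):
    """First-difference boost standing in for an over-set CTLE (hypothetical)."""
    return [v + k * (v - p) for v, p in zip(x, [x[0]] + x[:-1])]

if __name__ == "__main__":
    ideal = nrz_waveform(64, 8)
    kc = critical_bin(power_spectrum(ideal))
    print("critical bin:", kc)
    print("under-equalized:", balance_decision(lowpass(ideal), kc))
    print("over-boosted:", balance_decision(overboost(ideal), kc))
```

The sketch also shows why the method needs random data: the critical frequency is derived from the spectrum of random NRZ, so a data pattern with a different spectrum would bias the comparison.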

Another method is eye-opening monitor (EOM)-based adaptation, which is an accurate way to set the CTLE coefficients for an optimal BER, as shown in Fig. 2.17. Since it is a counter-based methodology, many registers are used, occupying a large on-chip area and consuming considerable power.

Fig. 2.17 Concept of CTLE adaptation with eye-opening monitor in [3].

## 2.3 Clock and Data Recovery

### 2.3.1 Overview



Fig. 2.18 (a) Mesochronous clocking architecture and (b) plesiochronous clocking architecture.

Clock and data recovery (CDR) circuits play a critical role in various high-speed communication systems, including optical communications, chip-to-chip interconnects, and backplane routing. These circuits enable synchronous operations, such as demultiplexing and retiming, on random data. A CDR circuit must track both the frequency and the phase errors between the transmitter and receiver clocks. Fig. 2.18 shows two clocking architectures of a serial link in which the transmitter and receiver clocks are not fully synchronized.

In the mesochronous clocking architecture, the clock generated by the clock multiplier of the transmitter is forwarded to the receiver, so the transmitter and receiver clocks have the same frequency. However, the phase error caused by the delay mismatch must still be aligned, so a delay line or phase interpolator circuit is used for delay matching.

In many serial link applications, separate channels for data and clock transmission are not available. Therefore, the plesiochronous clocking architecture is attractive for serial links. Instead of a forwarded clock, separate reference clocks are used for the transmitter and the receiver. The frequency offset between the reference clocks increases the design complexity of the receiver CDR circuit, since both frequency and phase error tracking are required in the plesiochronous clocking architecture.

As the data rate increases, the unit interval becomes narrower, and the jitter of the clock generator becomes more critical in achieving a low bit error rate. Since flicker noise becomes larger as technology scales down, an on-chip clock generator such as a ring oscillator is not an affordable design choice for ultra-high-speed receivers. Therefore, phase interpolator (PI)-based CDRs have become more attractive for high-speed interfaces.

### 2.3.2 PI-based CDR



Fig. 2.19 Conceptual diagram of the PI-based CDR.

Fig. 2.19 shows the conceptual block diagram of the PI-based CDR. Due to the frequency offset between the reference clock and the input data in the plesiochronous clocking architecture, the design of the digital loop filter (DLF) is important. This section analyzes the static phase and frequency errors with respect to the order of the DLF.



Fig. 2.20 z-domain system model of (a) 1<sup>st</sup> order and (b) 2<sup>nd</sup> order PI-based CDR.

Static error is analyzed with the final value theorem and can be expressed as,

$$e(\infty) = \lim_{z \to 1} (z - 1)E(z) = \lim_{z \to 1} (z - 1)I(z)\{1 - H(z)\}$$
(2.13)

where $E(z)$ is the z-transform of the phase error, $H(z)$ is the closed-loop jitter transfer function of the CDR, $1-H(z)$ is the error transfer function, and $I(z)$ is the input. The steady-state phase error is found with a step input, and the steady-state frequency error is found with a ramp input. The step and ramp inputs can be expressed as

$$I_{step}(z) = \frac{pz}{z-1}$$

$$I_{ramp}(z) = \frac{wTz}{(z-1)^2}$$
(2.14)

where $p$ is the initial phase error, $w$ is the initial frequency error, and $T$ is the sampling period.

Fig. 2.20 compares the z-domain system models of the PI-based CDR with respect to the order of the DLF. Applying the final value theorem shows that both the 1<sup>st</sup> and 2<sup>nd</sup> order PI-based CDRs drive the steady-state phase error to zero, but only the 2<sup>nd</sup> order PI-based CDR drives the steady-state frequency error (the residual error for a ramp input) to zero. Therefore, with the frequency offset of the plesiochronous clocking architecture, a 2<sup>nd</sup> order PI-based CDR should be adopted. However, the 2<sup>nd</sup> order loop requires stability analysis and inevitably exhibits jitter peaking. Table 2.2 summarizes the steady-state error analysis.

Table 2.2 Comparison between 1<sup>st</sup> and 2<sup>nd</sup> order PI-based CDR.

|                                 | 1<sup>st</sup> order | 2<sup>nd</sup> order |
|---------------------------------|----------------------|----------------------|
| Steady-state<br>phase error     | = 0                  | = 0                  |
| Steady-state<br>frequency error | <b>≠ 0</b>           | = 0                  |
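The entries of Table 2.2 can be reproduced with a time-domain simulation of a linearized PI-based CDR. The sketch below assumes a linear PD (output equal to the phase error) and a DLF made of a proportional path with gain `kp`, plus, for the 2<sup>nd</sup> order loop, an integral path with gain `ki`; the gains and input values are hypothetical, chosen so that both loops are stable. For a ramp input of slope $wT$ per update, the 1<sup>st</sup> order loop settles to a constant residual error of $wT/k_p$, while the integral path of the 2<sup>nd</sup> order loop absorbs the frequency offset.

```python
def simulate_cdr(order, kp=0.1, ki=0.005, phase_step=0.0, freq_step=0.0, n=2000):
    """Linearized PI-based CDR; returns the phase error at the last update.

    order=1: proportional path only; order=2: adds an integral path that
    accumulates the frequency offset. Input phase (in UI) at update i is
    phase_step + freq_step * i, i.e. a step plus a ramp.
    """
    phi_out = 0.0   # PI phase code (recovered clock phase)
    acc = 0.0       # integral-path accumulator (frequency estimate)
    err = 0.0
    for i in range(n):
        phi_in = phase_step + freq_step * i
        err = phi_in - phi_out          # linear PD output
        if order == 2:
            acc += ki * err             # integral path tracks the frequency offset
        phi_out += kp * err + acc       # proportional + integral update
    return err

if __name__ == "__main__":
    for order in (1, 2):
        print(f"order {order}: step -> {simulate_cdr(order, phase_step=0.5):.2e}, "
              f"ramp -> {simulate_cdr(order, freq_step=0.01):.2e}")
```

With `kp = 0.1` and `ki = 0.005`, the 2<sup>nd</sup> order loop has closed-loop poles of magnitude $\sqrt{1-k_p} \approx 0.95$, so the transient dies out well within the simulated 2000 updates; larger `ki` values would reduce the damping and increase jitter peaking.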



### 2.3.3 Types of PAM-4 Phase Detectors

Fig. 2.21 Comparison between two types of PDs in terms of required samplers: (a) 2x oversampling PD and (b) baud-rate PD.

The phase detector (PD) is an essential circuit that detects the phase error between the input data and the recovered clock. For a PI-based CDR, frequency error detection is not mandatory because the frequency offset is small. Therefore, various PDs have been presented to improve robustness and clocking power efficiency. PDs can be classified into two types according to the number of sampling phases per unit interval (UI). The 2x oversampling PD utilizes edge samples to detect the phase error, which requires five edge samplers and three data samplers per UI. In contrast, the baud-rate PD requires only one sampling clock phase per UI, with four error samplers and three data samplers. Fig. 2.21 compares the 2x oversampling and baud-rate PDs in terms of the required samplers. In this section, the two types of PAM-4 PD are studied.

#### 2.3.3.1 2x Oversampling PAM-4 PD

The 2x oversampling PD utilizes two sampling clock phases per UI and detects data transitions with edge samplers. For multi-level signaling, multiple zero crossings cause data-dependent jitter. Fig. 2.22 shows the edge distributions for PAM-2 and PAM-4 signaling. For random data input, PAM-2 signaling has a single crossing point, with a transition probability of 0.5. For PAM-4 signaling, however, the edge distribution is split into three crossing points with probabilities of 0.125, 0.25, and 0.125. These multiple edge distributions of multi-level signaling degrade the jitter of the recovered clock.

Fig. 2.22 Edge distributions for (a) PAM-2 and (b) PAM-4 signaling.
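The probabilities quoted above follow from enumerating all symbol transitions. The sketch below assumes equiprobable random symbols, linear (straight-line) transitions between adjacent symbols, and an edge sampler at the center threshold of zero; it returns, for each crossing time within the UI, the probability that a transition crosses the threshold at that time.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def crossing_distribution(levels, threshold=0):
    """Probability of a threshold crossing at each fractional time in the UI.

    Assumes equiprobable symbols and a straight-line transition from symbol
    a to symbol b, which crosses `threshold` at t = (threshold - a) / (b - a).
    """
    counts = Counter()
    pairs = list(product(levels, repeat=2))
    for a, b in pairs:
        if (a - threshold) * (b - threshold) < 0:     # transition crosses threshold
            counts[Fraction(threshold - a, b - a)] += 1
    return {t: Fraction(c, len(pairs)) for t, c in sorted(counts.items())}

if __name__ == "__main__":
    print("PAM-2:", crossing_distribution((-1, 1)))
    print("PAM-4:", crossing_distribution((-3, -1, 1, 3)))
```

PAM-2 yields a single crossing at the center of the transition with probability 1/2, whereas PAM-4 yields crossings at 1/4, 1/2, and 3/4 of the transition with probabilities 1/8, 1/4, and 1/8. The off-center crossings come from the asymmetric transitions, such as -3 to +1, that cause the data-dependent jitter.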

This multiple zero-crossing issue is serious for PI-based CDRs. It has been addressed by removing the improper transitions [26]. However, removing transitions reduces the transition density of the PAM-4 PD, which degrades the bandwidth of the jitter tolerance curve. To overcome this issue, the very early/late concept was proposed [5]. Instead of eliminating the bad transitions, a weighted summation of good and bad transitions recovers the transition density. However, this scheme has a trade-off between the bang-bang jitter and the lock time.

Fig. 2.23 Concept of the proposed PAM-4 PD in [5].

For multi-level signaling, the 2x oversampling PD suffers from the multiple zero-crossing issue. In addition to the jitter problem, the clocking power consumption increases significantly, since the number of data samplers is three times that of PAM-2 signaling. Due to these issues, baud-rate PDs are widely adopted.



#### 2.3.3.2 Baud-rate PAM-4 PD

Fig. 2.24 Concept of the baud-rate PD in [15].

The baud-rate PD utilizes only one sampling clock phase per symbol, which is beneficial compared with the oversampling PD in terms of area and power. In ultra-high-speed wireline transceivers, half-rate and quarter-rate clocking architectures are inevitably adopted. These clocking architectures require multi-phase clocks, so reducing the number of sampling clock phases needed for timing recovery is highly desirable.

The baud-rate timing recovery proposed by Mueller and Muller requires a multi-bit analog-to-digital converter (ADC) [14]. For simplicity, a sign-sign Mueller-Muller (SS-MM) PD has been presented for PAM-2 signaling, as shown in Fig. 2.24 [15].



Fig. 2.25 Single-bit responses with corresponding PD gain curves for (a) symmetric impulse response without adaptive DFE and (b) asymmetric impulse response with adaptive DFE.

The SS-MM PD utilizes error and data samples from two consecutive symbols. It locks at the point where $h_{-1}=h_{+1}$, where $h_{-1}$ is the 1<sup>st</sup> pre-cursor ISI and $h_{+1}$ is the 1<sup>st</sup> post-cursor ISI.
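The locking condition $h_{-1}=h_{+1}$ can be checked by averaging the SS-MM output over all data patterns. The sketch below uses the common SS-MM form $\text{sgn}(e[n])\,D[n-1] - \text{sgn}(e[n-1])\,D[n]$ and assumes PAM-2 data, correct decisions, and an error-sampler threshold exactly at the main-cursor level, so that each error sample is the sign of the residual ISI. The function names are hypothetical.

```python
from itertools import product

def sgn(x):
    return (x > 0) - (x < 0)

def ssmm_average(h_pre, h_post):
    """Average SS-MM PD output over all PAM-2 patterns d[n-2], d[n-1], d[n], d[n+1].

    Assumes correct decisions and an error-sampler threshold equal to the
    main-cursor level, so each error sample is the sign of the residual ISI.
    """
    total = 0
    patterns = list(product((-1, 1), repeat=4))
    for d_m2, d_m1, d_0, d_p1 in patterns:
        e_n = sgn(h_pre * d_p1 + h_post * d_m1)    # residual ISI at sample n
        e_nm1 = sgn(h_pre * d_0 + h_post * d_m2)   # residual ISI at sample n-1
        total += e_n * d_m1 - e_nm1 * d_0          # SS-MM phase-error estimate
    return total / len(patterns)

if __name__ == "__main__":
    for h_pre, h_post in ((0.2, 0.2), (0.1, 0.3), (0.3, 0.1)):
        print(h_pre, h_post, ssmm_average(h_pre, h_post))
```

The average is zero only when the two cursors are equal and takes the sign of $h_{+1}-h_{-1}$ otherwise. For a typical single-peaked pulse, $h_{+1}$ grows as the sampling phase moves earlier, so with this sign convention a positive output indicates early sampling. If an adaptive DFE forces $h_{+1}$ to zero at every phase, the output stays near zero over a range of phases, which is the dead-zone problem discussed next.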

Fig. 2.25 shows the conceptual single-bit responses and the corresponding PD gain curves. For an equalized, symmetric single-bit response, the SS-MM PD achieves the maximum vertical eye opening at the locking point. However, since residual ISI remains, the overall SNR is degraded, which is a critical disadvantage for PAM-4 signaling. To overcome this issue, an adaptive DFE is implemented together with the baud-rate PAM-4 PD. However, because the adaptive DFE cancels the post-cursor ISI at every sampling clock phase, the locking condition becomes $h_{-1}=h_{+1}=0$, which causes a dead zone in the recovered clock. The dead zone lets the recovered clock wander, which leads to a high BER.

Fig. 2.26 Unequalized MM CDR with pulse response in [16].

Many baud-rate PDs have been studied and published to overcome the dead-zone problem. The unequalized MM CDR [16] adds a digital offset to the phase-error accumulator to shift the locking point. However, this manually added digital offset must be changed to meet different channel conditions. A stochastic PAM-4 baud-rate PD is presented in [17]. By utilizing Bayes' theorem, its locking point can be pre-determined for known ISI conditions. However, this scheme requires digital weighted-summation coefficients that are controlled externally.

Fig. 2.27 Stochastic PAM-4 PD proposed in [17].

# Chapter 3 PAM-2 Receiver with Stochastic CTLE Adaptation

## 3.1 Overview

As the data rates required by various wireline standards continue to increase, the frequency-dependent loss due to the limited channel bandwidth increases. Thus, an equalizer that compensates for the resulting signal-to-noise ratio (SNR) degradation becomes essential in receiver (RX) design. The popular combination of equalizers on the receiver side is a continuous-time linear equalizer (CTLE) and a decision feedback equalizer (DFE). In addition, the equalization coefficients must be adapted to the channel conditions to minimize the bit error rate (BER). For the adaptation of the DFE, the sign-sign least-mean-square (SSLMS) algorithm is commonly used to cancel the post-cursor inter-symbol interference (ISI). However, since the CTLE changes the overall shape of the ISI, including the main cursor and the pre- and post-cursors, CTLE adaptation is not straightforward. Therefore, various adaptation techniques for the CTLE have been reported [2], [3], [4], [5], [6], [7]. A spectrum balancing technique balances the low- and high-frequency components of the input data, which is an analog-intensive method [7]. An adaptation technique using an asynchronous undersampling monitor builds histograms of the stored sampler outputs by sweeping the sampler threshold levels [2]. Although it can operate without a synchronous sampling clock, it requires a sizeable digital hardware overhead. Eye-opening monitor (EOM)-based adaptation is an accurate way to set the equalizer coefficients for an optimal BER [3]. Since it is a counter-based methodology, many registers are used, occupying a large chip area. Sequential search and genetic algorithms are recently published adaptation methods [4], [5]. Both methods require a long adaptation time and additional DACs to sweep the sampler threshold levels. The SSLMS can also be applied to CTLE adaptation, but an erratic selection of the sampling location can cause the RX equalizer to fall into local minima [6].

To solve these issues, we propose a referenceless receiver with a stochastic CTLE adaptation methodology. The proposed stochastic CTLE gain selector (SCGS) operates on the sequential edge and data samples already obtained for the CDR. Thereby, the proposed adaptation technique requires no additional analog hardware, and the SCGS is fully synthesizable in the digital domain.

The remainder of this chapter is organized as follows. First, the CTLE adaptation methodology is presented. Next, the circuit implementation of the RX is described. Finally, the measurement results of the prototype chip are presented and compared with other receivers with CTLE adaptation.

## **3.2 Proposed CTLE Adaptation**

### 3.2.1 Concept



Fig. 3.1 Proposed receiver with stochastic control engine.

One of the main objectives of this work is to develop a robust referenceless and adaptive receiver while keeping the conventional 2x oversampling architecture. As shown in Fig. 3.1, the proposed receiver achieves the referenceless operation and the CTLE adaptation by adding a stochastic control engine. The overall receiver consists of a CTLE, samplers, demultiplexers (DEMUXs), a digitally controlled oscillator (DCO), a voltage digital-to-analog converter (VDAC), and the stochastic control engine. A stochastic phase-frequency detector (SPFD) controls the DCO for the referenceless CDR operation, and the SCGS controls the CTLE gain through the VDAC. The SPFD utilizes three sequential data and edge samples to detect the frequency error as well as the phase error [18].

Fig. 3.2 Concept of the conventional edge-based CTLE adaptation and proposed CTLE adaptation by detecting sequential edge and data patterns.

In addition to the referenceless CDR, equalizer adaptation is obtained by utilizing the same data and edge samples. The conventional edge-based adaptation drives the correlation between a data sample and the edge sample 1.5 UI away to zero [19]. As shown in Fig. 3.2(a), this adaptation method drives $h_{1.5}$ to zero, removing the correlation between $D_{-2}$ and $E_{-1}$. However, depending on the shape of the single-bit response (SBR) of the channel, the conventional edge-based adaptation may settle at a suboptimum point. To overcome this shortcoming, especially in the presence of a significant pre-cursor ISI, this work proposes a stochastic CTLE adaptation algorithm that examines five consecutive edge and data samples. As shown in Fig. 3.2(b), the output of the SCGS is produced by a lookup table of the weights assigned to each of the 5-bit sequential patterns. The SCGS<sub>OUT</sub> is positive (negative) when the pattern indicates under-boosting (over-boosting). Its validity is verified by calculating the weighted sum of the occurrence probabilities of all symbols in both the under- and over-boosted cases and comparing it with the desired output.

Before explaining the detailed operation of the proposed SCGS, we investigate the optimal boost condition of the CTLE. In a recently published RX design [20], three equalizers are used: a CTLE, a DFE, and a VGA. While the DFE removes post-cursors, the VGA adjusts the signal swing to the target level while maintaining the overall SNR. As a result, the combination of the DFE and the VGA can maximize the vertical eye margin. However, the horizontal eye margin cannot be enlarged by the DFE and the VGA. On the other hand, the CTLE can increase the horizontal eye margin by simultaneously controlling the main-cursor, pre-cursor, and post-cursor ISI. Thus, we define optimal boosting of the CTLE as the point where the horizontal margin of the equalized eye diagram is maximized.

### 3.2.2 Number of Samples



Fig. 3.3 ISI information obtained through correlation according to different sequential patterns.

Our goal is to extract ISI information by examining as few sequential edge and data samples as possible for channels with high insertion loss. Fig. 3.3 shows how the ISI information can be extracted through the SBR according to the number and combination of samples. If {D, E, D} is detected for three samples, information on $h_{-0.5}$ and $h_{0.5}$ is obtained through the correlation between the data and edge samples. However, the ISI information obtained from three samples is not sufficient to implement CTLE adaptation for high-loss channels, where the ISI spans many symbols. For five samples of {D, E, D, E, D}, information from $h_{-1.5}$ to $h_{1.5}$ can be obtained by analyzing the SBR in the same manner.

In this work, {E, D, E, D, E} is selected to implement the CTLE adaptation, which contains information from $h_{-2.5}$ to $h_{2.5}$. The edge samples are placed at both ends instead of data samples because $E_{-3}$ and $E_{-1}$ include ISI information for the patterns of $D_{-3}$ and $D_0$ with a probability of 1/4 for random data input, which yields ample information for the target channel. If the channel had a longer ISI, the number of monitored samples would have to be expanded. The design trade-off is therefore between the target loss range and the number of monitored patterns.

If the receiver is implemented with combined CTLE and DFE, the DFE eliminates post-cursor ISIs, and the CTLE controls the residual long-tail and pre-cursor ISIs. When the long-tail ISIs are not cancelled in the DFE, the number of monitored sequential patterns should be enlarged to detect the residual long-tail ISIs.



### 3.2.3 Weight Searching Algorithm





Fig. 3.4 (a) Selected channel models and CTLE model, (b) collected eye diagrams and histograms for various CTLE codes with 15-dB loss channel model.



Fig. 3.5 Weight conditions that SCGS<sub>OUT</sub> should satisfy for one channel, where M is the optimum CTLE code (top), and several examples of SCGS gain curves obtained with weights that satisfy these conditions for the 15-dB loss channel (bottom).

The SCGS looks up the weights to determine whether the CTLE should boost more or less. The weight corresponding to each symbol is obtained by a weight searching algorithm. Before the weight searching sequence starts, the target channel models and the implemented CTLE models must be prepared. In this work, the selected channel models cover losses from -10 dB to -20 dB at the Nyquist frequency of 10 GHz, as shown in Fig. 3.4(a). The CTLE has a 7-dB gain at the Nyquist frequency, while the DC gain and the zero frequency are controlled with 4-bit resolution by varying the source degeneration resistance.

Fig. 3.6 (a) Operation of the Count-Num state and (b) weight searching algorithm that finds the golden weight set based on epsilon-constraint optimization.

With the channel and CTLE models ready, eye diagrams and histograms of the 5-bit symbols are extracted with pseudo-random data for all channel and CTLE models. Fig. 3.4(b) shows an example of the collected histograms and eye diagrams for each CTLE code with the 15-dB loss channel. The CTLE code with the maximum eye width is identified among the collected eye diagrams; this is the optimum CTLE code that the SCGS should produce. For CTLE codes smaller or larger than the optimum code, the SCGS should recognize whether the CTLE is under-boosting or over-boosting. This collection procedure is executed for all pre-determined channel and CTLE models. Based on the collected data, the SCGS implements a weighted summation that settles to the optimum code for all pre-determined channel losses.

We present a straightforward method to determine the set of weights in the form of a lookup table indexed by each detected symbol. The weight of each symbol is selected from the set {-8, -4, -2, -1, 0, 1, 2, 4, 8}, which can be trivially implemented as a shift in the digital domain. The output of the SCGS is then determined by

$$SCGS_k = \sum_{N=0}^{31} P_k(S_N) \cdot W_N$$
 (3.1)

where $k$ is the CTLE code, $P_k(S_N)$ is the occurrence probability of symbol $S_N$, and $W_N$ is the weight of the corresponding symbol. Fig. 3.5 shows the selection criteria that the weight set must meet. First, the absolute value of SCGS<sub>OUT</sub> must be smaller than $\varepsilon$ at the optimum code M. By adopting epsilon-constraint optimization [10], a suitable selection of $\varepsilon$ ensures a unique weight set. Second, for all CTLE codes other than the optimum code, the slope of SCGS<sub>OUT</sub> with respect to the CTLE code must be negative. This monotonicity ensures that the SCGS correctly detects the direction of the CTLE condition. Under these two criteria, we find the weight groups that satisfy the target channel condition. As shown in Fig. 3.5, multiple weight sets can satisfy the two criteria for one channel if $\varepsilon$ is large.

| Symbol                          | Weight | Symbol                          | Weight | Symbol                           | Weight | Symbol                           | Weight |
|---------------------------------|--------|---------------------------------|--------|----------------------------------|--------|----------------------------------|--------|
| S<sub>0</sub>, S<sub>31</sub>   | 1      | S<sub>4</sub>, S<sub>27</sub>   | 0      | S<sub>8</sub>, S<sub>23</sub>    | 1      | S<sub>12</sub>, S<sub>19</sub>   | 4      |
| S<sub>1</sub>, S<sub>30</sub>   | -1     | S<sub>5</sub>, S<sub>26</sub>   | 0      | S<sub>9</sub>, S<sub>22</sub>    | -8     | S<sub>13</sub>, S<sub>18</sub>   | -2     |
| S<sub>2</sub>, S<sub>29</sub>   | 1      | S<sub>6</sub>, S<sub>25</sub>   | 2      | S<sub>10</sub>, S<sub>21</sub>   | 0      | S<sub>14</sub>, S<sub>17</sub>   | -1     |
| S<sub>3</sub>, S<sub>28</sub>   | -1     | S<sub>7</sub>, S<sub>24</sub>   | -4     | S<sub>11</sub>, S<sub>20</sub>   | 0      | S<sub>15</sub>, S<sub>16</sub>   | 2      |

Table 3.1 Golden weight table
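Eq. (3.1) with the golden weights of Table 3.1 reduces to a table lookup and a sum. The sketch below pairs symbol $S_N$ with $S_{31-N}$ (its bitwise complement), which matches the symbol pairs listed in the table; the estimation of $P_k(S_N)$ from raw counts and the function names are hypothetical, and the mapping from an E-D-E-D-E window to the index $N$ is not specified here.

```python
# Golden weights of Table 3.1, one per symbol group g = min(N, 31 - N).
GOLDEN_WEIGHTS = [1, -1, 1, -1, 0, 0, 2, -4, 1, -8, 0, 0, 4, -2, -1, 2]

def symbol_weight(n):
    """Weight W_N of 5-bit symbol S_N; S_N and S_(31-N) share one weight."""
    return GOLDEN_WEIGHTS[min(n, 31 - n)]

def scgs_output(histogram):
    """Eq. (3.1): sum_N P(S_N) * W_N, with P(S_N) estimated from counts."""
    total = sum(histogram)
    return sum(count * symbol_weight(n) for n, count in enumerate(histogram)) / total

def boost_direction(histogram):
    """+1: under-boosted (boost more), -1: over-boosted (boost less), 0: settled."""
    out = scgs_output(histogram)
    return (out > 0) - (out < 0)

if __name__ == "__main__":
    uniform = [10] * 32
    print(scgs_output(uniform), boost_direction(uniform))
```

Every weight is zero or a signed power of two, so in hardware the multiplication in Eq. (3.1) becomes a shift, as noted above.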

By utilizing epsilon-constraint optimization, which solves multi-objective optimization problems [21], a golden weight set that makes the SCGS satisfy the criteria for all channel models is obtained. Fig. 3.6(a) describes the Count-Num state, which counts the number of weight sets that satisfy the weight selection criteria. The total number of possible weight sets is initially $9^{16}$; it is recalculated and reduced rapidly as each additional channel loss is considered during the weight searching sequence, as illustrated in Fig. 3.6(b). For each channel loss, the weight sets that satisfy the criteria are selected from the weight sets that survived the previous channels, so that the weight sets satisfying the criteria for all channel losses are obtained. If no weight set satisfies the criteria during the process, $\varepsilon$ is increased to loosen the criterion. If more than one weight set survives up to the last channel, $\varepsilon$ is decreased to tighten the criterion. The final weight set obtained from the searching algorithm is the golden weight set, which is applied to the SCGS. In this work, the number of weight sets that satisfy the criteria for the first channel is 291,514 ($\approx 9^{5.7}$), which is drastically smaller than the initial number. Therefore, the computation time is greatly reduced after the first channel is processed.

Fig. 3.7 Final stochastic CTLE adaptation gain curve with the golden weight set for the selected channel models.
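The weight searching sequence can be sketched at toy scale. The code below applies the two criteria (|SCGS<sub>OUT</sub>| < ε at the optimum code and a strictly decreasing SCGS curve), filters the candidate sets channel by channel, and adjusts ε as described above. It is a hypothetical illustration, not the search tool used in this work: it uses 3 symbol groups instead of 16 so that all $9^3$ candidates can be enumerated, the histograms are made up, and the ε loop is simply bounded because a unique set is not guaranteed to exist.

```python
from itertools import product

WEIGHT_SET = (-8, -4, -2, -1, 0, 1, 2, 4, 8)

def scgs_curve(hist_by_code, weights):
    """SCGS_k = sum_N P_k(S_N) * W_N for every CTLE code k, as in Eq. (3.1)."""
    return [sum(p * w for p, w in zip(probs, weights)) for probs in hist_by_code]

def meets_criteria(hist_by_code, m_opt, weights, eps):
    """|SCGS at the optimum code| < eps and SCGS strictly decreasing in k."""
    curve = scgs_curve(hist_by_code, weights)
    if abs(curve[m_opt]) >= eps:
        return False
    return all(nxt < cur for cur, nxt in zip(curve, curve[1:]))

def search(channels, eps):
    """Filter every candidate weight set channel by channel (Count-Num shrinks)."""
    n_groups = len(channels[0][0][0])
    survivors = list(product(WEIGHT_SET, repeat=n_groups))
    for hist_by_code, m_opt in channels:
        survivors = [w for w in survivors
                     if meets_criteria(hist_by_code, m_opt, w, eps)]
        if not survivors:
            break
    return survivors

def tune_epsilon(channels, eps=1.0, max_iter=20):
    """Loosen eps when nothing survives, tighten it when several sets survive."""
    best = []
    for _ in range(max_iter):
        survivors = search(channels, eps)
        if len(survivors) == 1:
            return survivors, eps
        if survivors:
            best = survivors
            eps /= 2.0
        else:
            eps *= 2.0
    return best, eps   # no unique set found within max_iter

if __name__ == "__main__":
    # Hypothetical toy channel: rows are CTLE codes 0..3, columns are 3 groups.
    toy = [[0.5, 0.3, 0.2], [0.4, 0.3, 0.3], [0.3, 0.3, 0.4], [0.2, 0.3, 0.5]]
    print(search([(toy, 2)], eps=0.05))
    print(tune_epsilon([(toy, 2)]))
```

In this toy case four weight sets place the zero crossing exactly at the optimum code, so tightening ε alone cannot make the set unique. This is why the real search evaluates many channel models: each added channel removes candidates, just as the Count-Num drops from $9^{16}$ to 291,514 after the first channel.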

Table 3.1 lists the golden weights obtained from the weight searching sequence. Because of the histogram symmetry, the 32 symbols are grouped into 16 groups, and the weight of each group is shown. When CTLE adaptation is implemented with the proposed methodology, the maximum eye width is achieved for all targeted channels. Fig. 3.7 shows the overall stochastic CTLE adaptation gain curves for all pre-determined channel models when the golden weight set is applied. Each SCGS gain curve crosses zero at the optimum CTLE code for its channel loss, and monotonicity is assured. As long as the channel loss varies within the target training range, the SCGS is expected to provide optimized performance not only for the target channels but also for channels with intermediate losses.

## 3.3 Circuit Implementation



Fig. 3.8 Circuit implementation of the proposed receiver with referenceless CDR and stochastic CTLE adaptation.

Fig. 3.8 shows the circuit implementation of the proposed receiver with the stochastic control engine. Half-rate clocking is employed to relax the timing constraints. The receiver consists of a CTLE, samplers, DEMUXs, a DCO, a VDAC, and a synthesized digital loop filter with the stochastic control engine. The CTLE utilizes resistive and capacitive source degeneration. By controlling the gate voltage $V_{GAIN}$ of the degeneration NMOS through the VDAC, the CTLE provides up to 8 dB of boost at 10 GHz and a DC gain range of -5 to 5 dB. The CTLE is followed by four samplers for half-rate 2x oversampling. Strong-arm latch samplers are used, and the offset of each sampler is compensated externally. The DEMUXs deserialize the data and edge samples for use in both the synthesized stochastic CTLE adaptation and the referenceless CDR loop. The 32 patterns of the E-D-E-D-E sequence are filtered through a symbol filter, and the number of occurrences of each symbol is counted. Although there are 32 patterns, they can be classified into 16 groups for a simple implementation because of the symmetry of the histograms. The weights obtained through the weight search are applied as binary shifts, and the weighted summation of the occurrences generates the gain error. The gain error is accumulated into a 4-bit binary gain control word (GCW), which controls the VDAC to generate $V_{GAIN}$. The SPFD is also implemented for the CDR loop: the three sequential data and edge samples are filtered and counted to produce the phase-frequency error (PFerr). The PFerr is accumulated and passed through a first-order delta-sigma modulator (DSM) to generate a 10-bit frequency control word (FCW). While the FCW controls the digitally controlled resistor of the DCO, a direct proportional path with a conventional BBPD is implemented to reduce the CDR loop latency [18], [22].
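The digital SCGS path (symbol filter, occurrence counting, shift-based weighting, and GCW accumulation) can be modeled behaviorally. The sketch below makes several assumptions that the text does not specify: the deserialized samples arrive as an interleaved [D, E, D, E, ...] bit stream, each window starts on an edge sample with the earliest sample as the MSB, and the GCW moves by one LSB in the direction of the gain-error sign (with a positive error assumed to mean that more boost is needed) and saturates at 0 and 15. All names are hypothetical.

```python
# (sign, shift) per symbol group g = min(N, 31 - N); weight = sign * 2**shift.
# These encode the golden weights of Table 3.1.
SHIFT_TABLE = [
    (1, 0), (-1, 0), (1, 0), (-1, 0), (0, 0), (0, 0), (1, 1), (-1, 2),
    (1, 0), (-1, 3), (0, 0), (0, 0), (1, 2), (-1, 1), (-1, 0), (1, 1),
]

def symbol_histogram(samples):
    """Count 5-bit E-D-E-D-E symbols in an interleaved [D, E, D, E, ...] stream."""
    hist = [0] * 32
    for i in range(1, len(samples) - 4, 2):   # windows start on edge samples
        n = 0
        for bit in samples[i:i + 5]:          # earliest sample is the MSB (assumed)
            n = (n << 1) | bit
        hist[n] += 1
    return hist

def gain_error(hist):
    """Weighted sum of occurrences using shifts instead of multipliers."""
    err = 0
    for n, count in enumerate(hist):
        sign, shift = SHIFT_TABLE[min(n, 31 - n)]
        if sign > 0:
            err += count << shift
        elif sign < 0:
            err -= count << shift
    return err

def update_gcw(gcw, err):
    """Bang-bang accumulation into a 4-bit gain control word (0..15, assumed)."""
    step = (err > 0) - (err < 0)
    return max(0, min(15, gcw + step))

if __name__ == "__main__":
    stream = [0, 0, 1, 0, 0, 1]               # one E-D-E-D-E window: 0b01001 = S9
    hist = symbol_histogram(stream)
    err = gain_error(hist)
    print(err, update_gcw(7, err))
```

Because the shifts take the place of multipliers, only adders and counters are needed, which is consistent with the claim that the SCGS is fully synthesizable.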

## **3.4 Measurement Results**

Fig. 3.9 Microphotograph of the prototype and power breakdown.

Fig. 3.10 Measured channel insertion loss.

The prototype chip is fabricated in 28-nm CMOS technology. The chip photomicrograph and the measured power breakdown are shown in Fig. 3.9. The proposed adaptive RX occupies a chip area of 0.029 mm<sup>2</sup>, and all blocks operate from a 1.0-V supply. The stochastic engine consumes 5.34 mW of the total 17.76 mW. The RX is tested from 8 Gb/s to 16 Gb/s with a PRBS7 pattern and a differential swing of 1 Vpp. The BER is measured with a BERT (Anritsu MP1800A). Fig. 3.10 illustrates the insertion loss of the three channels, which are FR-4 traces of different lengths. The measured insertion losses are:
- 10 dB at 8 GHz for Channel 1;
- 14 dB at 8 GHz for Channel 2;
- 16 dB at 7 GHz for Channel 3.

To demonstrate the performance of the SCGS, the BER is measured while the CTLE coefficient is changed manually, as shown in Fig. 3.11. To make the result easier to observe, sinusoidal jitter of 0.2 $UI_{pp}$ at 100 MHz is injected into the NRZ input data. The SCGS selects the optimum CTLE coefficient for each of the three measured channels. For Channel 3, the swing is optimally boosted and the lowest BER is obtained. The CTLE with a fixed degeneration capacitance provides 7 dB of gain at the Nyquist frequency, so the equalized swing is over-boosted for the low-loss channel.

Fig. 3.12 shows the jitter tolerance (JTOL) curve at a BER of $10^{-12}$ for Channel 3, which has the highest insertion loss. The input data is a 14-Gb/s signal with a 16-dB loss at 7 GHz. The JTOL is measured with both the SCGS and the SPFD enabled.

Fig. 3.11 Measured BER with sinusoidal jitter of 0.2 $UI_{pp}$ at 100 MHz versus the manually controlled CTLE code.

Fig. 3.12 Measured JTOL of Channel 3 with both the referenceless CDR and the CTLE adaptation enabled.

Table 3.2 compares the proposed RX with state-of-the-art RXs with CTLE adaptation. The proposed RX achieves the lowest power consumption without additional analog hardware and offers an energy efficiency of 1.11 pJ/b.

|                     | TCAS-II 2012<br>[2]                         | TVLSI 2016<br>[7]     | TCAS-I 2017<br>[3]     | ISSCC 2019<br>[4]    | ISSCC 2019<br>[5]     | APCCAS 2019<br>[23]                      | JSSC 2020<br>[6]      | This Work                                       |
|---------------------|---------------------------------------------|-----------------------|------------------------|----------------------|-----------------------|------------------------------------------|-----------------------|-------------------------------------------------|
| CMOS Process        | 0.13µm                                      | 65nm                  | 40nm                   | 28nm                 | 7nm FinFET            | 65nm                                     | 65nm                  | 28nm                                            |
| Data rate           | 5.4 Gb/s                                    | 21 Gb/s               | 28 Gb/s                | 36 Gb/s              | 56 Gb/s               | 32 Gb/s                                  | 10.8 Gb/s             | 16 Gb/s                                         |
| Equalizer           | CTLE                                        | CTLE<br>1-tap DFE     | CTLE<br>1-tap DFE      | CTLE<br>1-tap DFE    | TX FIR<br>CTLE        | CTLE<br>LFEQ<br>2-tap DFE<br>3-tap e-DFE | CTLE<br>2-tap DFE     | CTLE                                            |
| CTLE<br>Adaptation  | Asynchronous<br>undersampling<br>histograms | Spectrum<br>balancing | Eye-opening<br>monitor | Sequential<br>search | Genetic<br>algorithm  | CTLE+LFEQ<br>adaptation                  | SSLMS                 | Edge-Data<br>sequential<br>pattern<br>detection |
| Channel loss        | -16 dB                                      | -14.9 dB              | -25 dB                 | -18.25 dB            | -22.3 dB              | -21 dB                                   | -34 dB                | -14 dB                                          |
| Power               | 35 mW                                       | 34.2 mW               | 43.9 mW                | 106.3 mW             | 104 mW                | 113 mW                                   | 37.2 mW               | 17.7 mW                                         |
| Area                | 0.18 mm<sup>2</sup>                         | 0.027 mm<sup>2</sup>  | 0.259 mm<sup>2</sup>   | 1.232 mm<sup>2</sup> | 0.128 mm<sup>2</sup>  | 0.300 mm<sup>2</sup>                     | 0.174 mm<sup>2</sup>  | 0.029 mm<sup>2</sup>                            |
| Energy Efficiency   | 6.48 pJ/b                                   | 1.63 pJ/b             | 1.57 pJ/b              | 3.04 pJ/b            | 1.87 pJ/b             | 3.53 pJ/b                                | 3.4 pJ/b              | 1.11 pJ/b                                       |

Table 3.2 Performance comparison

## Chapter 4

# PAM-4 Receiver with Dead-zone Free SS-MMSE PD for CIS Link

### 4.1 Overview

As the demand for autonomous driving increases, the serial links of automotive CMOS image sensors (CIS) require high data rates and robust operation [1], [20], [24]. The channel bandwidth of the CIS link is limited to only 3 GHz [20], [24]. Multi-level signaling is therefore attractive for achieving data rates beyond 10 Gb/s. However, multi-level signaling degrades the signal-to-noise ratio (SNR), which makes it hard to meet the stringent automotive reliability requirements. Circuit techniques for robust operation are therefore needed in the CIS link.

A clock and data recovery (CDR) circuit is a critical block in a reliable RX design. A 2x oversampling CDR with a bang-bang phase detector (PD) is widely used because of its robustness [25], [26], [27]. However, the additional samplers and clock phases for edge sampling result in excessive clock power consumption. In addition, the multiple transition levels in PAM-4 signaling inevitably cause data-dependent jitter, so some of these transitions must be excluded from phase detection [25], [26], [27]. The resulting loss of usable transitions could limit the CDR bandwidth and degrade the link performance.

A baud-rate PAM-4 CDR is therefore attractive for a power-efficient RX design [16], [17], [28], [29], [30], [31], [32]. The sign-sign Mueller-Muller (SS-MM) PD is widely used for baud-rate CDRs [16], [28], [29]. However, when the SS-MM PD operates with an adaptive DFE, its locking point depends only on the pre-cursor ISI, which results in a dead zone [16]. The dead zone causes phase wandering, which increases the recovered clock jitter and degrades CDR performance.

Several PAM-4 baud-rate PDs have been proposed to resolve the dead-zone problem, but they depend on the data level (dLev), which is the threshold level of the error sampler [17], [29], [30]. Since the detection levels of the data samplers depend on the dLevs, these schemes require additional background calibration for offset and non-linearity.

Another conventional baud-rate PD is the sign-sign minimum mean squared error (SS-MMSE) PD [32]. However, the SS-MMSE PD with an adaptive DFE also suffers from the dead-zone problem unless an additional slope detector is implemented. Given these limitations, we propose a dead-zone free (DF) SS-MMSE PD. It uses a reduced number of error samplers and introduces a biased state whose output is independent of the phase error. By using a weighted summation of the SS-MMSE PD output and the biased state, the proposed PD eliminates the dead zone without additional hardware. Furthermore, the proposed PD achieves the maximum vertical eye opening (VEO) by providing a criterion for selecting the weight of the biased state.






Fig. 4.1 illustrates the conceptual diagram of the proposed PAM-4 receiver (RX). To fulfill the stringent automotive reliability requirements, a dead-zone free baud-rate CDR and an adaptive control engine are presented. The DF SS-MMSE PD gives the phase interpolator (PI) a unique locking point. Furthermore, the adaptive control engine determines the equalizer coefficients and calibrates the non-linearity and offset of the circuits.

In this chapter, we analyze two types of conventional baud-rate PDs and present the proposed technique for removing the dead zone in the SS-MMSE PAM-4 PD. Then, the circuit implementation of the prototype RX with non-linearity calibration and multiple-loop stability control is presented. Finally, the measurement results are presented and summarized.

## 4.2 Analysis of Conventional Baud-rate PDs and Proposed Dead-zone Free PAM-4 PD

#### 4.2.1 Comparison between MM PD and MMSE PD

Considering a channel with a symmetric impulse response h(t), the timing function for an MM PD [14] can be written as

$$f_{MM}(\tau) = \frac{1}{2}(h_{-1} - h_{+1}) = \frac{1}{2}[h(\tau - T) - h(\tau + T)]$$
(4.1)

where $T$ and $\tau$ are the sampling period and the sampling phase, respectively. The timing function in (4.1) produces a locking point where the first pre-cursor and the first post-cursor inter-symbol interference (ISI) are equal. The discrete signal at the RX, $x[n]$, can be expressed in terms of the transmitted symbols and the channel response as

$$x[n] = \sum_{k} d_{tx}[n-k]h[k]$$
(4.2)

where $d_{tx}[n]$ is the transmitted symbol, assumed to be equiprobable and independent. If the RX recovers the transmitted symbols with a sufficiently low BER, the timing function of the MM PD can be extracted by correlating the sliced symbols with the received signal as follows:

$$f_{MM}(\tau) = \frac{1}{2} E\{x[n-1]d[n] - x[n]d[n-1]\}$$
(4.3)

where $E\{X\}$ is the expectation of $X$ and $d[n]$ is the received data symbol. Defining the error signal as $e[n] = x[n] - d[n]$, the $d[n]d[n-1]$ terms cancel in (4.3), so $x[n]$ can be replaced with $e[n]$. The final timing function of the MM PD is then

$$f_{MM}(\tau) = \frac{1}{2} E\{e[n-1]d[n] - e[n]d[n-1]\}$$
(4.4)


The timing function shows that the phase error is derived from two consecutive data and error samples. With a symmetric impulse response, this decision-aided timing function provides a locking point where the main-cursor level is maximized.

The MMSE PD, in contrast, seeks the locking point that minimizes the mean squared error. Here the error is the difference between the received discrete sample and the ideal transmitted symbol:

$$e[n,\tau] = x[n,\tau] - d_{tx}[n].$$
 (4.5)

Then, the timing function of the MMSE PD is written as [33]

$$f_{MMSE}(\tau) = E\left\{\frac{de[n,\tau]^2}{d\tau}\right\}$$

$$= 2E\left\{e[n]\frac{dx[n,\tau]}{d\tau}\right\}$$
(4.6)

In (4.6), the signal slope $dx[n,\tau]/d\tau$ is approximated, up to a constant factor, by the difference between the two adjacent received samples [34], [35]. Thus, the timing function at $\tau=0$ is simplified to

$$f_{MMSE}(\tau) \approx 2E\{e[n] \cdot (x[n+1] - x[n-1])\}$$
(4.7)

The timing function of the MMSE PD in (4.7) shows that the MMSE PD needs three consecutive samples to generate a phase error, whereas the MM PD uses two. To determine the locking point of the MMSE PD, we again assume a sufficiently low BER, as for the MM PD, and substitute (4.5) into (4.7):

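$$\begin{aligned}
f_{MMSE}(\tau) &= 2E\{(x[n] - d[n])(x[n+1] - x[n-1])\} \\
&= 2E\{d[n](x[n-1] - x[n+1])\} \\
&= 2(h_{-1} - h_{+1})
\end{aligned}$$
(4.8)

Here, the $x[n]x[n\pm1]$ terms cancel because the autocorrelation of $x[n]$ is symmetric, and unit symbol power is assumed in the last step. Hence, the MMSE PD locks at the same point as the MM PD, $h_{-1}=h_{+1}$, as summarized in Table 4.1.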
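The correlation forms (4.4) and (4.7) can be checked numerically. The Python sketch below makes the following assumptions:
- equiprobable PAM-4 symbols drawn from {-3, -1, +1, +3};
- $x[n]$ built with (4.2) from a short, hypothetical set of cursors;
- ideal decisions ($d[n] = d_{tx}[n]$), matching the low-BER assumption.

These symbol levels are not normalized, so $E\{d^2\} = 5$. The MM estimate therefore converges to $5 \cdot (h_{-1} - h_{+1})/2$ and the MMSE estimate to $5 \cdot 2(h_{-1} - h_{+1})$. Both vanish when $h_{-1} = h_{+1}$.

```python
import numpy as np

PAM4 = np.array([-3.0, -1.0, 1.0, 3.0])


def received_samples(h, n_sym=200_000, seed=0):
    """Return (d, x) with x[n] = sum_k d[n-k] h[k], as in Eq. (4.2).

    h maps cursor index k to h[k] (k < 0: pre-cursor, k > 0: post-cursor).
    np.roll makes the convolution circular, which is harmless for random data.
    """
    rng = np.random.default_rng(seed)
    d = rng.choice(PAM4, size=n_sym)
    x = sum(hk * np.roll(d, k) for k, hk in h.items())
    return d, x


def mm_timing(d, x):
    """Eq. (4.4): 0.5 * E{e[n-1]d[n] - e[n]d[n-1]} with e = x - d."""
    e = x - d
    return 0.5 * np.mean(np.roll(e, 1) * d - e * np.roll(d, 1))


def mmse_timing(d, x):
    """Eq. (4.7): 2 * E{e[n] (x[n+1] - x[n-1])} with e = x - d."""
    e = x - d
    return 2.0 * np.mean(e * (np.roll(x, -1) - np.roll(x, 1)))


# Hypothetical cursors with h_{-1} < h_{+1}; E{d^2} = 5 for levels {+-1, +-3}
d, x = received_samples({-1: 0.1, 0: 1.0, 1: 0.3})
print(mm_timing(d, x), 0.5 * 5 * (0.1 - 0.3))    # both approximately -0.5
print(mmse_timing(d, x), 2.0 * 5 * (0.1 - 0.3))  # both approximately -2.0
```

Both estimators change sign at the same point, $h_{-1} = h_{+1}$. The MMSE form differs only in scale and in needing the sample after $e[n]$ as well as the sample before it.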



Fig. 4.2 Detected transitions of (a) MM PD and (b) MMSE PD, assuming two error samplers per sampling phase.

|                                          | MM PD                                              | MMSE PD                                             |
|------------------------------------------|----------------------------------------------------|-----------------------------------------------------|
| Timing<br>function                       | PD<sub>OUT,MM</sub><br>= e[n-1]d[n] - e[n]d[n-1]   | PD<sub>OUT,MMSE</sub><br>= e[n]·{d[n-1] - d[n+1]}   |
| Phase-detecting<br>transition<br>density | 0.25                                               | 0.375                                               |
| Locking<br>point                         | h<sub>-1</sub> = h<sub>+1</sub>                    | h<sub>-1</sub> = h<sub>+1</sub>                     |

Table 4.1 Comparison between MM PD and MMSE PD assuming 2 error samplers per UI.

Figs. 4.2(a) and (b) show the data transitions that the MM PD and the MMSE PD detect, respectively. To reduce the loading capacitance of the DFE summer, the circuit implementation uses two error samplers per UI. Based on the phase error functions of the baud-rate PDs:
- the MM PD detects 4 of the 16 possible two-symbol patterns, giving a phase-detecting transition density of 0.25;
- the MMSE PD uses 24 of the 64 possible three-symbol patterns, giving a phase-detecting transition density of 0.375.

Table 4.1 compares the phase error function, transition density, and locking point of the MM PD and the MMSE PD. The MM PD detects two sequential data and error samples, whereas the MMSE PD considers three sequential samples. Despite this additional overhead, the transition density of the MMSE PD is 1.5 times that of the MM PD (0.375 versus 0.25). To obtain a higher transition density and reduced summer loading, this work adopts the SS-MMSE PD with reduced error samplers instead of the SS-MM PD for the baud-rate PAM-4 CDR.

#### 4.2.2 Dead-zone Effect of Conventional Baud-rate PD with Adaptive DFE



Fig. 4.3 Single bit response and corresponding timing function of (a) conventional baud-rate PD without DFE, (b) conventional baud-rate PD with DFE.

PAM-4 signaling has an SNR penalty of 9.5 dB compared to NRZ signaling, so zero-forcing of ISI is essential to fully recover the received symbols. The typical RX equalizer is a DFE, which removes post-cursor ISI without degrading the main-cursor level. Baud-rate clocking is also widely adopted to reduce the clocking power of multi-level signaling RXs. As a result, the combination of a baud-rate PD and DFE adaptation is the most widely adopted PAM-4 RX architecture for both CDR and equalization.
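The 9.5-dB figure follows from the eye height. With the same peak-to-peak swing $V_{pp}$, PAM-4 stacks three eyes where NRZ has one, so each PAM-4 eye is one third as tall (noise assumed unchanged):

$$\text{SNR penalty} = 20\log_{10}\frac{V_{pp}}{V_{pp}/3} = 20\log_{10}3 \approx 9.54\ \text{dB}$$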

However, this combination causes a dead zone that increases the recovered clock jitter. Figs. 4.3(a) and (b) illustrate the single-bit responses and the corresponding timing functions of the conventional baud-rate PD without and with an adaptive DFE, respectively. Without an adaptive DFE, the residual post-cursor ISI guarantees a unique locking point for conventional baud-rate PDs. With a symmetric impulse response, this point is the sampling phase where the main-cursor level is maximum. In contrast, an adaptive DFE cancels the post-cursor ISI at every sampling clock phase. The baud-rate PD then locks where $h_{-1}=h_{+1}=0$, so every clock phase at which the pre-cursor ISI is zero becomes a valid locking point. This multiple-locking problem causes the recovered clock to wander.

#### 4.2.3 Proposed Dead-zone Free PAM-4 Baud-rate PD



Fig. 4.4 Working principle of the proposed SS-MMSE PD and the biased state.

Fig. 4.4 depicts how the proposed DF SS-MMSE PD generates the phase error with a reduced number of error samplers in PAM-4 signaling. Based on the phase error equation of the SS-MMSE PD in Fig. 4.4 (top right), the proposed SS-MMSE PD detects three sequential samples: $D_{k-2}$, $E_{k-1}$, and $D_k$. The middle error sample $E_{k-1}$ falls into one of three regions: ERR<sub>0</sub>, ERR<sub>1</sub>, or ERR<sub>2</sub>. When there is a transition from $D_{k-2}$ to $D_k$, the error region to which $E_{k-1}$ belongs determines the PD output. For simplicity, the truth table shows the early and late states of the timing function for rising transitions only. The implemented PD also considers falling transitions.

However, the four rising transitions from $-3$ ($D_{k-2}$) to $+3$ ($D_k$) contain the same number of early and late states, because they share the same ERR<sub>1</sub> region, as shown in Fig. 4.4 (bottom left). Since the early and late states cancel, the conventional SS-MMSE PD treats these transitions as a hold state. We instead define them as a biased state so that these transitions can be used. The biased state occurs with a uniform probability over the entire phase error range. As with the early and late states, the implemented PD also includes falling transitions in the biased state.








Fig. 4.6 Conceptual diagram of the maximum VEO achievement.

The dead-zone problem is solved by adding a weighted biased state, $\beta\cdot$biased, to the SS-MMSE PD output, as shown in Fig. 4.5. This weighted summation shifts the overall PD<sub>OUT</sub> downward, which yields a unique locking point. Fig. 4.5(b) depicts the timing function of the DF SS-MMSE PD, $f_{DF}(t)$, derived from the timing function of the SS-MMSE PD, $f_{MMSE}(t)$. Since the adaptive DFE eliminates all post-cursor ISI ($h_{+1}=0$), (4.8) reduces to $f_{MMSE}(t)=2h(t-T)$. The relationship between $f_{DF}(t)$ and $f_{MMSE}(t)$ is then expressed as

$$\begin{aligned}
f_{DF}(t) &= f_{MMSE}(t) - \beta \cdot \text{biased} \\
&= 2h(t - T) - \beta \cdot \text{biased}
\end{aligned}$$
(4.9)



Fig. 4.7 Simulation results of the optimum $\beta$ for different insertion losses.

where $\beta\cdot$biased is the weighted output of the biased state, as illustrated in Fig. 4.6. This downshift moves the locking point of the DF SS-MMSE PD rightward, to where the pre-cursor ISI $h_{-1,DF}$ is nonzero. Therefore, this work examines the value of $\beta$ that maximizes the VEO.

First, the VEO is estimated under two assumptions: the sampling clock phase determines the cursor levels, and the adaptive DFE eliminates all post-cursor ISI. We further assume that all remaining pre-cursor ISI is zero except the first pre-cursor, whose worst-case contribution is $3h_{-1}$ for a $\pm3$ symbol. The VEO for PAM-4 signaling is then expressed as

$$VEO(t) = h(t) - 3 \cdot h(t - T).$$
 (4.10)

To maximize the VEO, the sampling phase $\tau$ should be shifted rightward to the point where the gradient of the VEO is zero:

$$\frac{dVEO}{dt} = \frac{dh(t)}{dt} \bigg|_{t=\tau} - 3 \cdot \frac{dh(t-T)}{dt} \bigg|_{t=\tau} = 0$$
(4.11)

where $h(\tau)=h_{0,DF}$ and $h(\tau-T)=h_{-1,DF}$. The maximum VEO is obtained at the sampling phase $t=\tau$ that satisfies (4.11). The DF SS-MMSE PD locks where $f_{DF}=0$, so (4.9) shows that locking at this optimal phase requires $\beta\cdot\text{biased}=2h_{-1,DF}$.

Fig. 4.7 shows numerical simulation results of the locking point for various insertion losses. Here, the phase error is defined as the offset of the sampling clock phase from the phase at which the main-cursor level is maximum. When $\beta=0$, the PD locks where $h_{-1}=h_{+1}=0$. As $\beta$ increases, the locking point moves rightward. With an optimum $\beta$, the DF SS-MMSE PD locks at the phase of maximum VEO. By selecting the optimum $\beta$ through serial link modeling, the DF SS-MMSE PD achieves a unique locking point as well as the maximum VEO.
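The $\beta$ selection rule can be illustrated with a short numerical sketch. It rests on these assumptions:
- a hypothetical causal single-bit response $h(t)$ that is zero for $t \le 0$, so the pre-cursor $h(t-T)$ is exactly zero for every phase $t \le T$;
- all post-cursors removed by the DFE, as in (4.9) and (4.10);
- a negative-to-positive crossing of $f_{DF}(t)$ treated as the stable lock (an assumed sign convention).

The sketch locates the VEO maximum on a phase grid, sets $\beta\cdot\text{biased} = 2h_{-1,DF}$ at that phase, and then finds the zero crossings of $f_{DF}(t)$. With $\beta = 0$, the PD output is exactly zero over the whole region $t \le T$ (the dead zone). With the selected $\beta$, a single locking point appears at the VEO-optimal phase.

```python
import numpy as np

T = 1.0  # one UI (normalized)
A = 2.0  # hypothetical pulse parameter: h(t) peaks at h(A) = 1


def h(t):
    """Hypothetical causal single-bit response: 0 for t <= 0, peak of 1 at t = A."""
    u = np.clip(np.asarray(t, dtype=float) / A, 0.0, None)
    return u**2 * np.exp(2.0 * (1.0 - u))


def veo(t):
    """Eq. (4.10): PAM-4 VEO with post-cursors removed and only h_{-1} remaining."""
    return h(t) - 3.0 * h(t - T)


def f_df(t, beta_biased):
    """Eq. (4.9): f_DF(t) = 2 h(t - T) - beta*biased."""
    return 2.0 * h(t - T) - beta_biased


def lock_points(t, f):
    """Phases where f goes from negative to non-negative (assumed stable lock)."""
    idx = np.flatnonzero((f[:-1] < 0) & (f[1:] >= 0)) + 1
    return t[idx]


t = np.linspace(0.5, 2.0, 150_001)       # sampling-phase sweep in UI
i_opt = int(np.argmax(veo(t)))            # phase of maximum VEO, cf. Eq. (4.11)
tau_opt = t[i_opt]
beta_biased_opt = 2.0 * h(t - T)[i_opt]   # beta*biased = 2 h_{-1,DF}

# beta = 0: the PD output is exactly zero wherever h(t - T) = 0 (dead zone)
dead_zone_ui = np.count_nonzero(f_df(t, 0.0) == 0.0) * (t[1] - t[0])
# optimum beta: a single crossing at the VEO-optimal phase
locks = lock_points(t, f_df(t, beta_biased_opt))
print(f"dead zone >= {dead_zone_ui:.2f} UI, tau_opt = {tau_opt:.3f} UI, locks = {locks}")
```

The pulse shape here is only a stand-in. In practice, as noted above, the optimum $\beta$ would come from serial link modeling of the actual channel.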

## **4.3 Circuit Implementation**

#### 4.3.1 Architecture of the Proposed PAM-4 RX



Fig. 4.8 Overall block diagram of the proposed PAM-4 adaptive RX.

To ensure safety-critical functionality in the CIS link, the RX must be equipped with on-chip calibration. Fig. 4.8 shows the overall architecture of the proposed adaptive PAM-4 RX and the AC gain curves of the equalizers. The analog front end consists of an attenuator (ATT), a programmable gain amplifier (PGA), a continuous-time linear equalizer (CTLE), and a 3-tap half-rate DFE. An adaptive gain controller (AGC) controls the ATT and PGA to maintain a constant data swing at the sampler input regardless of the channel conditions.

The sampler reference levels are fixed data and threshold levels (dtLevs) that are compensated for offset and non-linearity. The fixed dtLev calibration adjusts the 8-bit DACs so that the error and data samplers have optimum dtLevs [20]. The constant data swing and fixed dtLevs make the DFE robust. The DFE also includes a common-mode compensation tap (Hcm) whose coefficient is half the sum of the three DFE tap coefficients. Three data samplers (DH, DM, DL) with 2:12 deserializers (DESs) sample the received data symbols.

Along with the analog front end and the adaptation logic, the proposed CDR is implemented using the error samplers (EH, EL) and the PI. The AGC, CTLE, and DFE adaptation logic uses the sign-sign least mean square (SS-LMS) algorithm [6]. No additional analog hardware is required to eliminate the dead zone. The DF SS-MMSE PD with a second-order digital loop filter tracks the phase and frequency errors between the input data and the recovered clock. One monitor sampler (MON) and an additional PI (PI<sub>MON</sub>) are added to monitor the two-dimensional eye opening while the CDR loop is enabled [26].
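For reference, sign-sign LMS DFE adaptation updates each tap by a fixed step in the direction $\text{sgn}(e[n])\cdot\text{sgn}(d[n-k])$. The Python sketch below is a generic full-rate behavioral model, not the half-rate DFE of Fig. 4.8. It assumes:
- a main cursor normalized to the fixed data level;
- ideal decisions;
- no noise and no pre-cursor ISI;
- hypothetical post-cursor values.

The common-mode tap (Hcm) is omitted. Under these assumptions, the taps settle close to the post-cursor ISI values.

```python
import random

PAM4_LEVELS = (-3, -1, 1, 3)


def sgn(v: float) -> int:
    """Sign function for sign-sign LMS (zero mapped to +1 by assumption)."""
    return 1 if v >= 0 else -1


def ss_lms_dfe(post_cursors, n_iter=40_000, mu=0.002, seed=1):
    """Adapt DFE taps with sign-sign LMS: c_k += mu * sgn(e[n]) * sgn(d[n-k]).

    Assumptions: main cursor equals the fixed data level (normalized to 1),
    decisions equal the transmitted PAM-4 symbols, no noise or pre-cursor ISI.
    """
    rng = random.Random(seed)
    taps = [0.0] * len(post_cursors)
    past = [rng.choice(PAM4_LEVELS) for _ in post_cursors]  # d[n-1], d[n-2], ...
    for _ in range(n_iter):
        d = rng.choice(PAM4_LEVELS)
        x = d + sum(h * dk for h, dk in zip(post_cursors, past))  # channel output
        y = x - sum(c * dk for c, dk in zip(taps, past))           # DFE summer output
        e = y - d                                                  # error vs. ideal level
        s = sgn(e)
        taps = [c + mu * s * sgn(dk) for c, dk in zip(taps, past)]
        past = [d] + past[:-1]
    return taps


print(ss_lms_dfe([0.4, 0.2, 0.1]))  # taps close to the hypothetical post-cursors
```

Only the signs of the error and data are needed, so the hardware update reduces to an up/down counter per tap. This is why SS-LMS suits the synthesized adaptation logic.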
















#### 4.3.2 Fixed dtLev Calibration



Fig. 4.13 Fixed dtLev calibration for non-linearity and offset of samplers.

Figs. 4.13 and 4.14 illustrate the calibration techniques for the fixed dtLev and for the skew between multiphase clocks, respectively, which ensure reliable RX operation. The fixed dtLev calibration compensates for the non-linearity of the equalizers and the offset of the samplers [20]. First, the sampler thresholds are set to dtLev<sub>1</sub>, which is obtained through an initial automatic offset calibration (AOC) [36]. The AOC detects the offset by sweeping the input DC level while the reference level is fixed at the ideal level. Since dtLev<sub>1</sub> is obtained assuming a linear PAM-4 input, it is not optimal for the upper and lower eyes, which are affected by non-linearity.

Therefore, an additional adjustment is performed after the AGC and the CDR have been operated with the fixed dtLev<sub>1</sub>. With the monitor sampler threshold fixed and the data sampler threshold swept, the vertical eye opening is detected by comparing the two sampler outputs. The center level of the vertical eye opening is selected as the final dtLev<sub>2</sub>. In addition to the non-linearity calibration, the skew between the half-rate sampling clocks is compensated by a duty-cycle corrector and a single-to-differential circuit, as shown in Fig. 4.14. With these calibration techniques, the PAM-4 RX achieves the robustness required by harsh automotive reliability requirements.

Fig. 4.14 Schematic diagram of skew calibration.
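Selecting dtLev<sub>2</sub> at the center of the detected vertical opening can be sketched as follows. The function name, the mismatch-count input, and the pass criterion (`max_mismatch`) are hypothetical. The sketch takes the widest contiguous run of threshold codes for which the monitor and data samplers agree and returns its center; for an even-length run, it returns the lower-middle code.

```python
def select_dtlev2(codes, mismatches, max_mismatch=0):
    """Return the center code of the widest run of 'open-eye' threshold codes.

    codes:        swept data-sampler threshold codes, in sweep order
    mismatches:   monitor/data sampler disagreements counted at each code
    max_mismatch: codes with at most this many disagreements count as open
                  (hypothetical pass criterion)
    """
    best_start, best_len = None, 0
    run_start = None
    for i, m in enumerate(mismatches):
        if m <= max_mismatch:
            if run_start is None:
                run_start = i
            run_len = i - run_start + 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_start = None
    if best_start is None:
        raise ValueError("no open eye found in the threshold sweep")
    return codes[best_start + (best_len - 1) // 2]
```

Choosing the widest run, rather than the first one, keeps a single spurious agreement outside the eye from being taken as the opening.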



#### 4.3.3 Multiple Loop Stability

Fig. 4.15 Simulation result of the simultaneous adaptation at data rate of 12 Gb/s.

The proposed RX has four loops: PGA, CDR, DFE, and CTLE. These loops can interact and diverge if two or more of them have similar bandwidths. In addition to loop stability, there is a second concern: the PAM-4 baud-rate PD is sensitive to the input swing level and ISI. Therefore, the loop bandwidths must be chosen carefully [6].

First, the PGA adaptation should settle fastest, because both the CDR and the equalizer adaptation work properly only after the input swing has settled to the fixed dtLev. After the PGA adaptation, the CTLE adaptation should be faster than the DFE adaptation. The CTLE adaptation shapes the pulse and reduces long-tail ISI, and the DFE adaptation then removes the residual short-range post-cursor ISI. Finally, the baud-rate CDR finds the proper clock phase based on the DF SS-MMSE PD. After the AGC and CDR have settled, the non-linearity calibration is performed to lower the BER.

Because the baud-rate PD is sensitive to the swing level and ISI, the bandwidths of the PGA and equalizer adaptation loops are set larger than that of the CDR. The CDR therefore forms the dominant pole, which stabilizes the simultaneously adapting system. With simultaneous adaptation, a total adaptation time of 12 $\mu$s is achieved at 12 Gb/s. This corresponds to 72,000 PAM-4 UI, or 144,000 bit periods. As shown in Fig. 4.15, the control words for the PGA, PI, CTLE, and DFE are adapted simultaneously with different loop bandwidths.

## **4.4 Measurement Results**



Fig. 4.16 Block diagram of the measurement setup.

The measurement setup is depicted in Fig. 4.16. A signal quality analyzer (Anritsu MP1800A) is used to verify the proposed RX; it includes a pulse pattern generator (PPG), an error detector, and a jitter modulation source. The chip is controlled through I2C using Python scripts. For the jitter tolerance measurement, the jittered clock is forwarded to the PPG while the reference clock is forwarded to the test chip.



Fig. 4.17 (a) Chip photomicrograph, (b) measured power breakdown, and (c) measured differential insertion loss.

The prototype PAM-4 RX with the DF SS-MMSE PD is fabricated in a 40-nm CMOS process and tested with a 1.1-V supply. The die photomicrograph, measured power breakdown, and insertion loss are shown in Fig. 4.17. The RX occupies an active area of 0.274 mm<sup>2</sup> and consumes 46.6 mW at 12 Gb/s; the detailed power breakdown is shown in Fig. 4.17(b). As shown in Fig. 4.17(c), the measured differential insertion loss of the 5-m shielded twisted quad cable is 15.8 dB at 3 GHz.

To verify the effectiveness of the proposed DF SS-MMSE PD, we compare the eye diagrams, bathtub curves, and jitter tolerance curves with the DF mode enabled and disabled. Fig. 4.18 shows the eye diagrams measured with the on-chip eye monitor. The on-chip eye monitor detects the eye opening at the DFE summer by comparing the sampled data, so the comparison is not affected by the analog front-end settings, which are identical in both modes. This measurement is therefore suitable for verifying the improvement in recovered clock jitter due to the DF SS-MMSE PD. $\beta$ is externally set to zero for the DF mode off and to a nonzero value for the DF mode on.

The PAM-4 eye margins at a BER of 10<sup>-6</sup> with the DF mode enabled and disabled are 0.234 UI × 80 mV and 0.156 UI × 40 mV, respectively. In addition, as shown in Fig. 4.19(a), the timing bathtub curves show that the horizontal opening at a BER below 10<sup>-9</sup> is more than 0.06 UI wider with the DF mode enabled than without it.

The measured jitter tolerance of the proposed RX is shown in Fig. 4.19(b). With the proposed DF SS-MMSE PD, the jitter tolerance at a BER of $10^{-6}$ improves by 0.1 UI<sub>pp</sub> over the entire measured frequency range.

Table 4.2 summarizes the performance of the proposed PAM-4 RX and compares it with previous works. This work demonstrates an adaptive PAM-4 RX with the DF SS-MMSE PD and various on-chip calibration schemes. It achieves a figure of merit (FoM) of 0.24 pJ/b/dB, the best among the compared RXs.



|                                | [16]<br>2015<br>ISSCC               | [25]<br>2017<br>ISSCC             | [28]<br>2017<br>ISSCC | [32]<br>2021<br>ISSCC                       | [17]<br>2022<br>JSSC      | [31]<br>2022<br>TCAS-II                       | This work                                   |
|--------------------------------|-------------------------------------|-----------------------------------|-----------------------|---------------------------------------------|---------------------------|-----------------------------------------------|---------------------------------------------|
| Technology<br>[nm]             | 14                                  | 40                                | 16                    | 7                                           | 40                        | 40                                            | 40                                          |
| Data-rate<br>[Gb/s]            | 10                                  | 56                                | 56                    | 112                                         | 48                        | 64                                            | 12                                          |
| Modulation                     | NRZ                                 | PAM-4                             | PAM-4                 | PAM-4                                       | PAM-4                     | PAM-4                                         | PAM-4                                       |
| Equalization                   | CTLE<br>4-tap DFE                   | CTLE<br>3-tap DFE                 | CTLE<br>10-tap DFE    | CTLE<br>1-tap DFE<br>Multi-tap FFE          | CTLE<br>1-tap DFE         | CTLE<br>2-tap DFE                             | CTLE<br>3-tap DFE                           |
| PD                             | Baud-rate<br>(Unequalized<br>SS-MM) | 2x<br>Oversampling<br>(Bang-Bang) | Baud-rate<br>(SS-MM)  | Baud-rate<br>(Decision<br>directed<br>MMSE) | Baud-rate<br>(Stochastic) | Baud-rate<br>(Transition<br>Weighted<br>Gain) | Baud-rate<br>(Dead-zone<br>Free<br>SS-MMSE) |
| On chip<br>calibration         | AGC<br>DFE<br>AOC<br>EOM            | dLev<br>DFE<br>EOM                | -                     | dLev<br>AGC<br>Skew<br>AOC                  | dLev<br>DFE               | dLev<br>DFE                                   | AGC<br>CTLE<br>DFE<br>Fixed dtLev<br>EOM    |
| Channel<br>loss<br>[dB]        | 24                                  | 24                                | 10                    | 26                                          | 4                         | 6                                             | 16                                          |
| Power<br>[mW]                  | 59.0*                               | 382.0                             | 230.0                 | 921.0                                       | 116.3                     | 152.0                                         | 46.6                                        |
| Energy<br>efficiency<br>[pJ/b] | 5.90*                               | 6.82                              | 4.11                  | 8.22*                                       | 2.42                      | 2.37                                          | 3.88                                        |
| FoM**<br>[pJ/b/dB]             | 0.25*                               | 0.28                              | 0.41                  | 0.31*                                       | 0.61                      | 0.395                                         | 0.24                                        |

#### Table 4.2 Performance summary and comparison

- \*: Transceiver measurement
- \*\*: FoM = (energy efficiency) / (channel loss at the Nyquist frequency)
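The energy-efficiency and FoM entries follow directly from the power, data rate, and channel loss. Since 1 mW / (1 Gb/s) = 1 pJ/b, a minimal Python helper (function names are illustrative) is:

```python
def energy_efficiency_pj_per_bit(power_mw: float, data_rate_gbps: float) -> float:
    """Energy per bit: mW divided by Gb/s equals pJ/b."""
    return power_mw / data_rate_gbps


def fom_pj_per_bit_per_db(power_mw: float, data_rate_gbps: float, loss_db: float) -> float:
    """FoM as defined under Table 4.2: energy efficiency / channel loss at Nyquist."""
    return energy_efficiency_pj_per_bit(power_mw, data_rate_gbps) / loss_db
```

Two examples:
- 17.76 mW at 16 Gb/s gives 1.11 pJ/b.
- 46.6 mW at 12 Gb/s over a 16-dB channel gives a FoM of about 0.24 pJ/b/dB.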

### Chapter 5

## Conclusion

In this thesis, a CTLE adaptation technique and a PAM-4 baud-rate phase detector (PD) are proposed for robust receiver design. Chapter 2 studies and analyzes the basic concepts of equalization and CDR. Chapter 3 proposes an 8-to-16 Gb/s referenceless receiver with stochastic CTLE adaptation. The proposed CTLE adaptation maximizes the eye width by using a weighted summation of 32 symbol histograms of sequential samples. The golden weights are obtained with the epsilon-constraint optimization-based weight searching algorithm. By sharing edge and data samples, both the referenceless CDR and the CTLE adaptation are achieved without additional analog hardware. The prototype chip is fabricated in 28-nm CMOS technology and occupies an active area of 0.029 mm<sup>2</sup>. The measurement results show that the proposed adaptation converges to the optimum CTLE coefficient, and the prototype exhibits a superior power efficiency of 1.11 pJ/b.

Chapter 4 presents a PAM-4 receiver for CMOS image sensor links. A robust PAM-4 PD is proposed. It alleviates the dead-zone issue that arises when the conventional sign-sign Mueller-Muller (SS-MM) PD is combined with an adaptive decision feedback equalizer (DFE). An optimum and unique locking point is reached by using a sign-sign minimum mean squared error (SS-MMSE) PD with a biased state, which has a constant probability over the entire phase error range. The proposed PD with the biased state avoids the dead zone and prevents phase wandering. Furthermore, the phase-detecting transition density of the proposed SS-MMSE PD is 1.5 times that of the conventional SS-MM PD. The proposed solution is verified with a prototype PAM-4 RX chip in 40-nm CMOS technology, which demonstrates a bit error rate (BER) below 10<sup>-9</sup> over a channel with 15.8-dB loss. The total power consumption is 46.6 mW at 12 Gb/s, achieving a figure of merit (energy efficiency per channel loss at the Nyquist frequency) of 0.24 pJ/b/dB.

#### **Bibliography**

- [1] G. W. d. Besten, "30.1 Single-Pair Automotive PHY Solutions from 10Mb/s to 10Gb/s and Beyond," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2019, pp. 474-476.
- [2] W.-S. Kim, C. Seong, and W.-Y. Choi, "A 5.4-Gbit/s adaptive continuous-time linear equalizer using asynchronous undersampling histograms," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 59, no. 9, pp. 553-557, Sep. 2012.
- [3] H. Won et al., "A 28-Gb/s receiver with self-contained adaptive equalization and sampling point control using stochastic sigma-tracking eye opening monitor," *IEEE Trans. Circuits Syst. I Reg. Papers*, vol. 64, no. 3, pp. 664-674, Mar. 2017.
- [4] D. Yoo, M. Bagherbeik, W. Rahman, A. Sheikholeslami, H. Tamura, and T. Shibasaki, "A 36Gb/s adaptive baud-rate CDR with CTLE and 1-tap DFE in 28nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2019, pp. 126-128.
- [5] S. Shahramian et al., "A 1.41pJ/b 56Gb/s PAM-4 wireline receiver employing enhanced pattern utilization CDR and genetic adaptation algorithms in 7nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2019, pp. 482-484.
- [6] J. Lee, K. Lee, H. Kim, B. Kim, K. Park, and D.-K. Jeong, "A 0.1-pJ/b/dB 1.62-to-10.8-Gb/s video interface receiver with jointly adaptive CTLE and DFE using biased data-level reference," *IEEE J. Solid-State Circuits*, vol. 55, no. 8, pp. 2186-2195, Aug. 2020.
- [7] Y.-H. Kim, Y. J. Kim, T. Lee, and L. S. Kim, "A 21-Gbit/s 1.63-pJ/bit adaptive CTLE and one-tap DFE with single loop spectrum balancing method," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 24, no. 2, pp. 789-793, Feb. 2015.
- [8] R. Dokania et al., "10.5 A 5.9pJ/b 10Gb/s serial link with unequalized MM-CDR in 14nm tri-gate CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2015, pp. 1-3.
- [9] Specifications. [Online]. Available: https://ece.osu.edu/mixed-signal-integrated-circuits-and-systems-lab#publications
- [10] "T11: Basics of Equalization Techniques: Channels, Equalization, and Circuits," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2022, pp. 523-526.
- [11] IEEE P802.3bs 400 Gb/s Ethernet Task Force. Accessed: Nov. 10, 2018. [Online]. Available: http://www.ieee802.org/3/bs/.
- [12] OIF. [Online]. Available: http://www.oiforum.com
- [13] MIPI A-PHY. [Online]. Available: https://mipi.org/specifications/a-phy
- [14] K. Mueller and M. Muller, "Timing Recovery in Digital Synchronous Data Receivers," *IEEE Trans. Commun.*, vol. 24, no. 5, pp. 516-531, May 1976.
- [15] F. Spagna et al., "A 78mW 11.8Gb/s serial link transceiver with adaptive RX equalization and baud-rate CDR in 32nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2010, pp. 366-367.
- [16] R. Dokania et al., "10.5 A 5.9pJ/b 10Gb/s serial link with unequalized MM-CDR in 14nm tri-gate CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2015, pp. 1-3.
- [17] H. Ju, K. Lee, K. Park, W. Jung, and D.-K. Jeong, "Design Techniques for 48-Gb/s 2.4-pJ/b PAM-4 Baud-Rate CDR with Stochastic Phase Detector," *IEEE J. Solid-State Circuits*, vol. 57, no. 10, pp. 3014-3024, Oct. 2022.
- [18] K. Park et al., "A 6.4-to-32Gb/s 0.96 pJ/b referenceless CDR employing ML-inspired stochastic phase-frequency detection technique in 40nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2020, pp. 124-126.
- [19] S. Shahramian, B. Dehlaghi, and A. Chan Carusone, "Edge-Based Adaptation for a 1 IIR + 1 Discrete-Time Tap DFE Converging in 5 μs," *IEEE J. Solid-State Circuits*, vol. 51, no. 12, pp. 3192-3203, Dec. 2016.
- [20] W. Lee et al., "0.37-pJ/b/dB PAM-4 Transmitter and Adaptive Receiver with Fixed Data and Threshold Levels for 12-m Automotive Camera Link," in *Proc. IEEE Eur. Solid-State Circuits Conf. (ESSCIRC)*, Sep. 2021, pp. 475-478.
- [21] Y. Y. Haimes and D. A. Wismer, "Integrated system modeling and optimization via quasilinearization," *Journal of Optimization Theory and Applications*, vol. 8, pp. 100-109, Aug. 1971.
- [22] H. Song, D.-S. Kim, D.-H. Oh, S. Kim, and D.-K. Jeong, "A 1.0-4.0-Gb/s all-digital CDR with 1.0-ps period resolution DCO and adaptive proportional gain control," *IEEE J. Solid-State Circuits*, vol. 46, no. 2, pp. 424-434, Feb. 2011.
- [23] A. Balachandran, Y. Chen, and C. C. Boon, "A 32-Gb/s 3.53-mW/Gb/s Adaptive Receiver AFE Employing a Hybrid CTLE, Edge-DFE and Merged Data-DFE/CDR in 65-nm CMOS," in *Proc. IEEE Asia Pacific Conf. Circuits Syst. (APCCAS)*, 2019, pp. 221-224.
- [24] Y. Lee, W. Lee, M. Shim, S. Shin, W.-S. Choi, and D.-K. Jeong, "0.41pJ/b/dB Asymmetric Simultaneous Bidirectional Transceivers with PAM-4 Forward and PAM-2 Back Channels for 5-m Automotive Camera Link," in *Proc. IEEE Symp. VLSI Circuits*, Jun. 2022, pp. 30-31.
- [25] A. Roshan-Zamir et al., "A 56 Gb/s PAM4 receiver with low-overhead threshold and edge-based DFE FIR and IIR-tap adaptation in 65nm CMOS," in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, Apr. 2018, pp. 1-4.
- [26] P.-J. Peng, J.-F. Li, L.-Y. Chen, and J. Lee, "6.1 A 56Gb/s PAM-4/NRZ transceiver in 40nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2017, pp. 110-111.
- [27] M. Verbeke, G. Torfs, and P. Rombouts, "The Truth About 2-Level Transition Elimination in Bang-Bang PAM-4 CDRs," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 68, no. 1, pp. 469-482, Jan. 2021.
- [28] J. Im et al., "A 40-to-56 Gb/s PAM-4 Receiver with Ten-Tap Direct Decision-Feedback Equalization in 16-nm FinFET," *IEEE J. Solid-State Circuits*, vol. 52, no. 12, pp. 3486-3502, Dec. 2017.
- [29] M.-C. Choi, H.-G. Ko, J. Oh, H.-Y. Joo, K. Lee, and D.-K. Jeong, "A 0.1pJ/b/dB 28-Gb/s maximum-eye tracking, weight-adjusting MM CDR and adaptive DFE with single shared error sampler," in *Proc. IEEE Symp. VLSI Circuits*, Jun. 2020, pp. 1-2.
- [30] W. Jung, K. Lee, K. Park, H. Ju, J. Lee, and D.-K. Jeong, "A 48 Gb/s PAM-4 Receiver with Pre-Cursor Adjustable Baud-Rate Phase Detector in 40 nm CMOS," *IEEE J. Solid-State Circuits*, 2022.
- [31] S. Roh, K. Lee, M. Shim, M.-C. Choi, and D.-K. Jeong, "A 64-Gb/s PAM-4 Receiver with Transition-Weighted Phase Detector," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 69, no. 9, pp. 3704-3708, Sep. 2022.
- [32] D. Xu et al., "8.5 A Scalable Adaptive ADC/DSP-Based 1.25-to-56Gbps/112Gbps High-Speed Transceiver Architecture Using Decision-Directed MMSE CDR in 16nm and 7nm," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2021, pp. 134-136.
- [33] S. U. H. Qureshi, "Timing recovery for equalized partial-response systems," *IEEE Trans. Commun.*, vol. COM-24, pp. 1326-1331, Dec. 1976.
- [34] P. Roo, R. R. Spencer, and P. J. Hurst, "A CMOS analog timing recovery circuit for PRML detectors," *IEEE J. Solid-State Circuits*, vol. 35, no. 1, pp. 56-65, Jan. 2000.
- [35] T. Musah and A. Namachivayam, "Robust Timing Error Detection for Multilevel Baud-Rate CDR," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 69, no. 10, pp. 3927-3939, Oct. 2022.
- [36] Y. Lee, W. Lee, M. Shim, and D.-K. Jeong, "A Sequential Two-step Algorithm for DC Offset Cancellation of PAM-4 Receiver," in *Proc. Int. SoC Design Conf. (ISOCC)*, Oct. 2021, pp. 379-380.

# Abstract (Korean)

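This thesis proposes the use of CTLE adaptation and a PAM-4 baud-rate phase detector to improve receiver design. Section II analyzes and discusses the basic concepts of equalization and CDR. Section III proposes an 8-to-16 Gb/s referenceless receiver that maximizes the eye width by using stochastic CTLE adaptation. The CTLE adaptation technique obtains the golden weight with an epsilon-constraint optimization-based weight-searching algorithm applied to the weighted summation of 32 symbol histograms of sequential samples. Sharing edge and data samples enables both the referenceless CDR and the CTLE adaptation without additional analog hardware. The prototype chip occupies an active area of 0.029 mm² and is fabricated in 28-nm CMOS technology. Measurement results show that the proposed adaptation technique reaches the optimal CTLE coefficient and exhibits a superior power efficiency of 1.11 pJ/b.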

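Section IV of this thesis presents a PAM-4 receiver design for a CMOS image sensor link. The proposed solution resolves the dead-zone problem encountered in the conventional sign-sign Mueller-Muller (SS-MM) PD when it is combined with an adaptive decision feedback equalizer (DFE). The proposed approach uses a sign-sign minimum mean squared error (SS-MMSE) PD with a biased state to achieve an optimal and unique locking point, thereby avoiding the dead zone and preventing instability. In addition, the proposed SS-MMSE PD exhibits 1.5 times as many phase-detecting transitions as the conventional SS-MM PD. The proposed solution is tested with a prototype PAM-4 RX chip fabricated in 40-nm CMOS technology and shows a bit error rate (BER) below 10<sup>-9</sup> over a 15.8-dB loss channel. The total power consumption of the prototype is 46.6 mW at 12 Gb/s, achieving a figure of merit (energy efficiency per channel loss at the Nyquist frequency) of 0.24 pJ/b/dB.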

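Keywords: adaptive equalization, baud-rate phase detector (PD), continuous-time linear equalizer (CTLE), clock and data recovery (CDR), CMOS image sensor (CIS) link, dead zone, decision feedback equalizer (DFE), epsilon-constraint optimization, 4-level pulse amplitude modulation (PAM-4), programmable gain amplifier (PGA), receiver (RX), referenceless, sign-sign minimum mean squared error (SS-MMSE) PD, sign-sign Mueller-Muller (SS-MM) PD.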

Student Number: 2018-24147