



Ph.D. DISSERTATION

# A Study on High-Speed Wireline Receivers with Adaptive Equalization and Clock-and-Data Recovery


BY

Lee Sanghee

### AUGUST 2023

DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING COLLEGE OF ENGINEERING SEOUL NATIONAL UNIVERSITY

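## A Study on High-Speed Wireline Receivers with Adaptive Equalization and Clock-and-Data Recovery

Advisor: Professor Deog-Kyoon Jeong

Submitted as a dissertation for the degree of Doctor of Philosophy in Engineering

### August 2023

Graduate School of Seoul National University

Department of Electrical and Computer Engineering

## Lee Sanghee

The Ph.D. dissertation of Lee Sanghee is hereby approved.

### August 2023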



## A Study on High-Speed Wireline Receivers with Adaptive Equalization and Clock-and-Data Recovery

BY

Lee Sanghee

A Dissertation Submitted to the Department of Electrical and Computer Engineering in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy at

SEOUL NATIONAL UNIVERSITY

### AUGUST 2023

Committee in Charge:

Professor Suhwan Kim, Chairman

Professor Deog-Kyoon Jeong, Vice-Chairman

Professor Dongsuk Jeon

Professor Yongsam Moon

Professor Young-Ha Hwang

## Abstract

This dissertation presents a study on high-speed wireline receivers with adaptive equalization and clock-and-data recovery (CDR). It first introduces the structural characteristics specific to high-speed wireline receivers and analyzes prior research to provide circuit-level insights. It then proposes the gradient maximum-eye-tracking (GMET) algorithm as a means to optimize the performance of the receiver. By optimizing the widely used gradient ascent method for robust operation, GMET achieves low computational power and high stability. In addition, simultaneous adaptation using the unified algorithm enables low power consumption and low design complexity. The prototype receiver is fabricated in 28-nm CMOS and occupies an active area of 0.106 mm<sup>2</sup>. The receiver includes a baud-rate CDR and a 2-tap adaptive decision feedback equalizer (DFE). Compared with previous baud-rate sampling CDRs, superior performance is demonstrated under high-loss conditions. At a data rate of 28 Gb/s, the receiver consumes 47 mW, corresponding to an energy efficiency of 1.68 pJ/b. Furthermore, with a channel loss of 27.7 dB at the Nyquist frequency, joint operation of the CDR and the DFE adaptation yields a measured margin of 0.17 UI at a BER lower than $10^{-12}$.

**Keywords**: clock-and-data recovery (CDR), equalization, high-speed links, wireline communication, maximum-eye-tracking (MET), adaptive equalization, receiver (RX)

**Student Number**: 2018-29272

# Contents

- Abstract
- Contents
- List of Tables
- List of Figures
- 1 Introduction
  - 1.1 Motivation
  - 1.2 Thesis organization
- 2 Background of High-Speed Wireline Receivers
  - 2.1 Overview
  - 2.2 Clock-and-Data Recovery
    - 2.2.1 Signal-Clock Synchronization
    - 2.2.2 Clocking Architecture
    - 2.2.3 Clock-and-Data Recovery
    - 2.2.4 Jitter Tolerance of the CDRs
  - 2.3 Equalization
    - 2.3.1 CTLE
    - 2.3.2 FFE
    - 2.3.3 DFE
- 3 Prior Works on CDR and EQ Adaptation
  - 3.1 Overview
  - 3.2 Clock-and-Data Recovery
    - 3.2.1 Bang-Bang CDR
    - 3.2.2 Blind Oversampling CDR
    - 3.2.3 Mueller-Müller CDR
    - 3.2.4 Minimum Mean Squared Error CDR
    - 3.2.5 Sub-rate Sampling CDR
  - 3.3 EQ adaptation
    - 3.3.1 Least Mean Square
    - 3.3.2 BER-based Adaptation
    - 3.3.3 EOM-based Adaptation
- 4 CDR and DFE Adaptation with Gradient Maximum-Eye-Tracking
  - 4.1 Overview
  - 4.2 Vertical Eye Height
  - 4.3 Biased Data Level
  - 4.4 Gradient Maximum Eye Tracking
    - 4.4.1 Sampling Phase Adaptation with GMET
    - 4.4.2 DFE Adaptation with GMET
    - 4.4.3 Simultaneous Adaptation with GMET
  - 4.5 Circuit Implementation
  - 4.6 Measurement Results
- 5 Conclusion
- Bibliography
- Abstract in Korean

# **List of Tables**

- 2.1 Classification of signal-clock synchronization.
- 2.2 Classification of CDRs.
- 2.3 Values of $\alpha$ according to the target bit-error-rate (BER).
- 4.1 Performance Summary and Comparison.

# **List of Figures**

- 1.1 Annual size of the Global Datasphere from [1].
- 1.2 (a) Doubled data rate of PCIe [2] and (b) data rate increase of various applications.
- 1.3 (a) Various frequency-dependent losses of video cables [3] and (b) power efficiency versus channel loss at Nyquist frequency [4].
- 2.1 Classification of signal-clock synchronization: (a) synchronous, (b) mesochronous, (c) plesiochronous, (d) periodic, and (e) asynchronous.
- 2.2 Cases of embedded clocking: (a) shared reference clock (mesochronous), and (b) distinct reference clocks (plesiochronous).
- 2.3 Cases of forwarded clocking: (a) source-synchronous, and (b) mesochronous.
- 2.4 Reconstruction of the analog signal with various sampling rates, $f_{\text{Sample}}$.
- 2.5 Block diagram of the CDR circuit.
- 2.6 Data eye diagram with substantial random jitter: (a) Ideal, (b) 100 cycles, (c) 1,000 cycles, and (d) 10,000 cycles.
- 2.7 Gaussian distribution of jitter.
- 2.8 Effect of (a) slowly varying jitter, and (b) rapidly varying jitter.
- 2.9 Block diagram of the conventional PLL-based CDR.
- 2.10 (a) Jitter transfer functions of the embedded clocking CDR with various corner frequencies ($\omega_{-3dB}$), and (b) the corresponding jitter tolerance curves.
- 2.11 The time domain jitter profile of the embedded clocking CDR.
- 2.12 $JTOL$ curves of forwarded clocking architecture with different $T_{skew}$.
- 2.13 Block diagram of the DLL-based CDR.
- 2.14 Input and output clocks of DLL.
- 2.15 (a) RC passive linear EQ, and (b) RC-degenerated active linear EQ.
- 2.16 SBR: (a) without EQ, (c) with the passive linear EQ, (e) with the RC-degenerated active linear EQ and data eye diagram: (b) w/o EQ, (d) with the passive linear EQ, (f) with the RC-degenerated active linear EQ.
- 2.17 (a) Conventional 2-stage CTLE, (b) conventional RC-degenerated CTLE with inductive peaking, (c) feed-forward CTLE, and (d) CTLE with Cherry-Hooper topology.
- 2.18 Block diagram of the conventional FIR filter.
- 2.19 (a) SBR w/o EQ, (b) data eye diagram w/o EQ, (c) SBR with the 3-tap FFE, and (d) data eye diagram with 3-tap FFE.
- 2.20 Block diagram of the conventional DFE.
- 2.21 (a) SBR of the 15-dB loss channel, (b) corresponding data eye diagram, (c) SBR of the 15-dB channel with 3-tap DFE, and (d) data eye diagram with 3-tap DFE.
- 2.22 Block diagram of the loop-unrolled DFE.
- 3.1 (a) BBPD update table and (b) lock point of BBPD.
- 3.2 (a) Bit boundary detection and data sample selection of 5x blind oversampling CDR and (b) probability distributions of sampled data.
- 3.3 The timing recovery principle of MM CDR.
- 3.4 (a) SS-MMPD update table and (b) lock point of SS-MMPD.
- 3.5 (a) Data recovery and (b) clock recovery of the sub-rate sampling CDR from [32].
- 3.6 (a) BER-based adaptation algorithm from [37] and (b) from [38].
- 3.7 (a) EOM to detect the effective eye-opening area and (b) example of the EOM operation: EOM output versus sampling timing with various threshold voltages.
- 4.1 Simulated (a) BER and (b) voltage margin contour and the convergence points of SS-LMS, maximum voltage margin, and minimum BER.
- 4.2 Model of transmission line: (a) lumped RLGC and (b) frequency-dependent lossy model.
- 4.3 Example of SBR and discrete cursor value.
- 4.4 Eye level dispersion according to the cursors.
- 4.5 Divided eye levels by $h_{-1}$ and $h_{resi,max}$.
- 4.6 Bdlev is lowered from $h_0 - h_{-1}$ by $\Delta d$ when there exists AWGN.
- 4.7 Normalized $h_{-1}$ versus normalized $\Delta d$.
- 4.8 Flow chart of the control code for coefficient, $C$.
- 4.9 Illustration of sampling phase adaptation with GMET.
- 4.10 Measured VEH according to sampling phase through behavior simulation.
- 4.11 Simulated convergence point of MM CDR and GMET CDR on (a) SBR and (b) eye diagram.
- 4.12 Behavior simulation results over (a) various update gain $\alpha$ and (b) various patterns with different run lengths.
- 4.13 Illustration of DFE adaptation with GMET.
- 4.14 Measured VEH according to $w_1$ through behavior simulation.
- 4.15 Lock point comparison of CDR with and without simultaneous 2-tap DFE adaptation using GMET on (a) SBR and (b) eye diagram.
- 4.16 Block diagram of the prototype receiver.
- 4.17 (a) Die photograph and (b) power consumption.
- 4.18 (a) Schematic of CTLE with Cherry-Hooper topology and AC simulation corresponding to (b) $R_{CTRL}$ and (c) $V_{CTRL}$.
- 4.19 Schematic of (a) the comparator and simulation results: (b) Delay for $1^{st}$ and $2^{nd}$ tap feedback data and (c) Offset Monte Carlo simulation.
- 4.20 Schematic of (a) CML summer with 2-tap feedback and (b) StrongARM comparator.
- 4.21 Schematic of (a) PI and simulation results: (b) delay and (c) linearity.
- 4.22 Measurement setup.
- 4.23 Measured insertion channel losses: SMA cable, channel emulation board and FR4 trace.
- 4.24 Measured eye diagrams (a) from PPG, (b) after the channel.
- 4.25 Measured jitter tolerance curve for BER $< 10^{-12}$.
- 4.26 Measured 7-GHz recovered clock and its jitter histogram.
- 4.27 Measured bathtub curves without DFE and with DFE.
- 4.28 Measured Bdlev codes over sampling phases without DFE and with DFE.

## **CHAPTER 1**

### **INTRODUCTION**

#### 1.1 Motivation

In today's economy, data has become critically important. Companies collect vast amounts of consumer data to enhance customer experiences and introduce new business models, while individuals seamlessly use numerous social media platforms, entertainment options, and real-time personalized services in their daily lives [1]. The importance of data is expected to keep increasing. Figure 1.1 illustrates the growth and projected trend of IDC's Global Datasphere.



Figure 1.1: Annual size of the Global Datasphere from [1]



Figure 1.2: (a) Doubled data rate of PCIe [2] and (b) data rate increase of various applications.

The demand for higher bandwidth in data centers and telecommunication infrastructure drives advances in high-speed interfaces that can handle the ever-growing volume of data. Accordingly, next-generation interface applications and research also target higher bandwidth, as shown in Figure 1.2. Despite process integration and various techniques for high-speed operation, wireline communication faces bandwidth limitations imposed by the physical transmission medium, such as cables and interposers. Figure 1.3(a) illustrates the frequency-dependent loss of channels with various lengths. Non-idealities arising from the channel, including frequency-dependent loss caused by the skin effect and dielectric loss, reflections, and crosstalk, degrade signal integrity and make the signal more susceptible to various types of noise. Moreover, the channel loss necessitates additional circuits for compensation and recovery. Therefore, as the data rate increases and the frequency-dependent loss rises, the energy consumption and the occupied area also increase, as shown in Figure 1.3(b).

Figure 1.3: (a) Various frequency-dependent losses of video cables [3] and (b) power efficiency versus channel loss at Nyquist frequency [4].

Clock-and-data recovery (CDR) and equalization are the most crucial and challenging aspects of high-speed receiver design. The CDR adjusts the clock timing so that data can be robustly sampled and recovered in the presence of noise and intersymbol interference. Equalization compensates for channel non-idealities and restores high-frequency components. Emphasizing the importance of one circuit over the other is meaningless because both are essential for successful communication. While CDR and equalization have each been studied extensively on their own, research on optimizing both circuits simultaneously to achieve the receiver's maximum performance has been relatively scarce. Furthermore, previous studies often require excessive time for simultaneous optimization or demand additional circuits for algorithm implementation and computation, which degrades energy efficiency and increases the occupied area.

In this thesis, a gradient maximum-eye-tracking algorithm that adjusts the sampling phase and optimizes the equalization coefficients is proposed. This approach aims to resolve the aforementioned problems and achieve optimal performance in terms of energy efficiency and area utilization. The thesis begins with an analysis of previous studies, their limitations, and the existing challenges. The proposed algorithm is then analyzed, and its circuit-level implementation is discussed.

#### 1.2 Thesis organization

This dissertation is organized as follows. Chapter 2 provides the background of high-speed wireline receivers, with an emphasis on CDRs and equalizers. High-speed wireline receivers are classified based on their clocking architectures, and the CDR methods appropriate for each architecture and their jitter characteristics are analyzed. Furthermore, the equalizers (EQs) widely used in receivers, such as continuous-time linear equalizers (CTLEs), feed-forward equalizers (FFEs), and decision feedback equalizers (DFEs), are introduced in terms of their structures, principles, and pros and cons.

In Chapter 3, prior works on CDR and EQ adaptation are presented. The CDR methods are classified based on the sampling rate of the data, and representative structures of oversampling, baud-rate sampling, and sub-rate sampling CDRs are selected and introduced. EQ adaptation algorithms, ranging from the least mean square algorithm, the most conventional and widely used adaptation method, to the eye-opening monitor method, are also introduced and evaluated in terms of their effectiveness, practicality, and limitations.

The proposed gradient maximum-eye-tracking (GMET) algorithm and its in-depth analysis are provided in Chapter 4. Specifically, the chapter discusses the stability of the algorithm and strategies for its effective hardware implementation, as well as the circuit-level implementation details of a prototype receiver. Behavior simulations and measurement results demonstrate the performance of the algorithm.

Chapter 5 summarizes the proposed works and concludes this dissertation.

## **CHAPTER 2**

### **BACKGROUND OF HIGH-SPEED WIRELINE RECEIVERS**

#### 2.1 Overview

In this chapter, an overview of the background knowledge and key circuits employed in high-speed receivers is provided. First, the differences and characteristics of receiver structures based on the correlation between clock generation and the received data are introduced. Next, clock-and-data recovery (CDR) is discussed, which extracts the optimal sampling timing information from the received signal and recovers the data using the optimized clock. The differences in CDR methods, based on the receiver structure, are analyzed from the perspective of their jitter characteristics. Finally, equalization, which compensates for input loss and restores the frequency response, is discussed. Various widely used equalization methods in high-speed receivers are thoroughly analyzed, along with their advantages and limitations.

#### 2.2 Clock-and-Data Recovery

#### 2.2.1 Signal-Clock Synchronization

For two distinct digital systems to communicate successfully, their clocks must be precisely synchronized. When there is a difference between the clocks, this difference should be calibrated continuously or periodically. If small offsets accumulate over time, the receiving clock may drift away from the optimal sampling timing of the transmitted data, potentially resulting in catastrophic errors. Table 2.1 shows the classification of signal-clock synchronization [5]. When two systems communicate, there are five possible relationships between the transmitted data and the receiving clock: synchronous, mesochronous, plesiochronous, periodic, and asynchronous, as shown in Figure 2.1. Typically, synchronization of the clock and data can be achieved in three cases: synchronous, mesochronous, and plesiochronous.

| Classification | Synchronous | Mesochronous | Plesiochronous | Periodic     | Asynchronous |
|----------------|-------------|--------------|----------------|--------------|--------------|
| Periodicity    | Yes         | Yes          | Yes            | Yes          | No           |
| $\Delta \phi$  | 0           | $\phi_c$     | Varies         | -            | -            |
| $\Delta f$     | 0           | 0            | $< \epsilon$   | $> \epsilon$ | -            |

Table 2.1: Classification of signal-clock synchronization.

Figure 2.1: Classification of signal-clock synchronization: (a) synchronous, (b) mesochronous, (c) plesiochronous, (d) periodic, and (e) asynchronous.

Synchronous clocking refers to the case where the frequency and phase of the clock at the receiving end match exactly those of the transmitted data. This means that no additional manipulation of the clock frequency or phase is required for data sampling. Synchronous systems are the simplest and most efficient method for wireline communication when the data bandwidth is low or there is minimal channel loss, but they may not be suitable for serial communication with high data rates. If there is a mismatch between the data path and the clock path, or inter-symbol interference (ISI), the sampling timing margin may be reduced, resulting in significant performance degradation. Additionally, synchronous clocking architectures are vulnerable to local variations in the system, making them difficult to apply in off-chip communication. For these reasons, synchronous clocking architectures are primarily used in relatively low-speed, low-loss applications such as on-chip parallel communication.

Mesochronous clocking refers to the case where the frequencies of the transmitted data and the receiver clock are the same but there is a phase difference between them. Various phase shift schemes such as phase-locked loop (PLL), injection-locked oscillator (ILO), delay-locked loop (DLL), voltage controlled delay line (VCDL), and phase interpolator (PI) are used to align the phase of the receiver's clock with the optimal sampling phase of the input data, and delay adjustment is performed manually or periodically/continuously depending on the application. If there is no variation such as voltage and temperature drift or transistor aging, the transmitter and receiver phase can be aligned through a single training, but in reality, non-ideal variations over time exist, so background calibration is required.

Plesiochronous clocking refers to the case where there is a slight frequency difference ($\Delta f < \epsilon$) between the transmitted data and the receiver clock. To synchronize the two systems, both the phase information and the frequency information of the transmitter clock must be extracted from the data. Unlike the mesochronous clocking architecture, this requires additional hardware for frequency detection and tracking, which significantly increases the design complexity of the receiver.
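The five relationships in Table 2.1 can be read as a small decision procedure. The following Python sketch is only illustrative: the function name `classify_sync`, its arguments, and the threshold `eps` (standing in for $\epsilon$) are assumptions introduced here, and no numerical value of $\epsilon$ is implied by this dissertation.

```python
def classify_sync(periodic: bool, delta_f: float, delta_phi: float, eps: float) -> str:
    """Classify the data/clock relationship along the lines of Table 2.1.

    periodic  -- True if the data is timed by a periodic clock at all
    delta_f   -- frequency offset between data and receiver clock
    delta_phi -- static phase offset (only meaningful when delta_f == 0)
    eps       -- hypothetical bound separating a "slight" frequency offset
    """
    if not periodic:
        return "asynchronous"
    if delta_f == 0:
        # Same frequency: phase is either aligned or offset by a constant.
        return "synchronous" if delta_phi == 0 else "mesochronous"
    # Non-zero frequency offset: the phase difference varies over time.
    return "plesiochronous" if abs(delta_f) < eps else "periodic"
```

For example, a link whose two ends share a reference clock but see different path delays would map to `"mesochronous"`, while two ends with independent crystals a few ppm apart would map to `"plesiochronous"`.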

#### 2.2.2 Clocking Architecture

When a transmitter and a receiver are connected, there are two possible structures that can be classified depending on whether the clock is transmitted together with the data: embedded clocking architecture and forwarded clocking architecture.

In embedded clocking (Figure 2.2), there is no extra clock channel, so it is primarily used in narrow interfaces. Since the transmitter and receiver each need to generate their own clock, clock generation circuits are required, and it is crucial to synchronize the transmitter and receiver clocks. Especially for the receiver, a CDR circuit that can track not only phase but also frequency is essential to overcome mesochronous (shared reference clock) or plesiochronous (separate reference clock or referenceless) clocking states. Embedded clocking circuits are susceptible to data noise and ISI and have poor jitter tolerance due to the lack of correlated jitter between data and clock.

In forwarded clocking (Figure 2.3), unlike embedded clocking, extra channels are required to transmit clocks simultaneously with data. However, it eliminates the need for frequency tracking and allows for a simpler clocking circuit in the receiver, resulting in reduced power and area. If the transmitter and receiver are in a synchronous state, no clocking circuit may be needed at all, and in a mesochronous state, only a clocking circuit capable of compensating for phase mismatches between data and clock is required. Forwarded clocking exhibits better jitter tolerance than embedded clocking due to the presence of correlated jitter between data and clock.
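The benefit of correlated jitter can be illustrated with a toy Monte Carlo model. The sketch below is not the analysis used in this dissertation. It assumes Gaussian jitter, zero skew between the data and clock paths, and no CDR tracking loop, and the function names and jitter magnitudes are hypothetical. The sampling error is taken as the difference between the data jitter and the clock jitter, so any jitter common to both cancels.

```python
import math
import random


def rms(values):
    """Root-mean-square of a list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def timing_error_rms(n, sigma_common, sigma_indep, forwarded, seed=0):
    """RMS sampling-time error (in UI) for a toy jitter model.

    sigma_common -- jitter of the transmitter clock source (appears on the data)
    sigma_indep  -- independent jitter added separately to data and clock
    forwarded    -- True: the clock carries the same common jitter as the data
                    False: the receiver clock is independent of the data jitter
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(n):
        common = rng.gauss(0.0, sigma_common)
        data_jitter = common + rng.gauss(0.0, sigma_indep)
        if forwarded:
            clock_jitter = common + rng.gauss(0.0, sigma_indep)
        else:
            clock_jitter = rng.gauss(0.0, sigma_indep)
        errors.append(data_jitter - clock_jitter)
    return rms(errors)


if __name__ == "__main__":
    for mode in (True, False):
        print("forwarded" if mode else "embedded",
              timing_error_rms(20000, 0.05, 0.01, forwarded=mode))
```

In this simplified model the forwarded-clock error depends only on the independent jitter terms, whereas the embedded-clock error also contains the full common term. A real embedded-clock CDR tracks part of the low-frequency jitter, which this sketch deliberately ignores.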



Figure 2.2: Cases of embedded clocking: (a) shared reference clock (mesochronous), and (b) distinct reference clocks (plesiochronous).

Figure 2.3: Cases of forwarded clocking: (a) source-synchronous, and (b) mesochronous.

#### 2.2.3 Clock-and-Data Recovery

In wireline communication, as mentioned earlier, two different systems must synchronize their clocks to communicate without problems. To achieve this, the clock-and-data recovery process extracts clock information from the received data to allow synchronous operation and retimes the data to remove accumulated jitter and noise [6]. According to the Nyquist-Shannon sampling theorem [7], a band-limited input signal that contains no frequency components higher than $f_{Nyquist}/2$ Hz can be completely reconstructed if it is sampled at a rate of at least $f_{Nyquist}$ Hz.
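The sampling theorem stated above can be illustrated numerically with Whittaker-Shannon (sinc) interpolation. The following Python sketch is a minimal example under stated assumptions: the function names, the 3-Hz test tone, the two sampling rates, and the truncation of the ideally infinite sum to a finite window (`n_half`) are all choices made here for illustration and do not come from this dissertation.

```python
import math


def sinc(u):
    """Normalized sinc, sin(pi*u)/(pi*u), with sinc(0) = 1."""
    return 1.0 if u == 0.0 else math.sin(math.pi * u) / (math.pi * u)


def reconstruct(signal, f_sample, t, n_half=400):
    """Estimate signal(t) from uniform samples taken at rate f_sample.

    Uses x(t) = sum_n x(n*T) * sinc(t/T - n), truncated to |n| <= n_half.
    """
    period = 1.0 / f_sample
    return sum(signal(n * period) * sinc(t / period - n)
               for n in range(-n_half, n_half + 1))


def tone(freq):
    """Return a unit-amplitude sine wave of the given frequency (Hz)."""
    return lambda t: math.sin(2.0 * math.pi * freq * t)


if __name__ == "__main__":
    x = tone(3.0)
    t = 0.1
    # 8 Hz > 2 * 3 Hz: the reconstruction follows the original tone.
    print("original:", x(t), "fs = 8 Hz:", reconstruct(x, 8.0, t))
    # 4 Hz < 2 * 3 Hz: the samples alias and the reconstruction is wrong.
    print("original:", x(t), "fs = 4 Hz:", reconstruct(x, 4.0, t))
```

With the sampling rate above twice the tone frequency, the reconstructed value stays close to the original up to the truncation error. With the lower rate, the samples are indistinguishable from those of a lower-frequency tone, so the reconstruction departs from the original.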

Figure 2.4: Reconstruction of the analog signal with various sampling rates, $f_{\text{Sample}}$; panel (a) shows the original signal.

However, implementing such full signal reconstruction in actual circuitry requires sophisticated analog-to-digital converters (ADCs) that operate at high speeds and large digital hardware to perform the computations, which leads to increased latency, power consumption, and area. Fortunately, in wireline communication, the goal of the receiver is not to perfectly reconstruct the original analog signal, but to extract the discrete logical data contained in the input signal. This can be achieved without large-scale circuits such as ADCs as long as an appropriate sampling rate and phase are set. Therefore, what is important in the receiver is the methodology to determine the optimal sampling phase and extract the data without error from the input signal with minimal hardware and power consumption. The conventional structure of a CDR circuit is depicted in Figure 2.5. It utilizes timing information and sampled data information from the input signal in a complementary manner to obtain optimal timing and error-free data.



Figure 2.5: Block diagram of the CDR circuit.

CDRs can be classified into oversampling, baud-rate sampling, and sub-rate sampling types depending on the relationship between the data rate and the sampling rate, as summarized in Table 2.2. In general, the robustness of a CDR against non-idealities that accompany the signal, such as noise and jitter, is a key factor in determining the performance of CDR methodologies.

Table 2.2: Classification of CDRs.

| Classification                             | Oversampling                            | Baud-rate sampling                      | Sub-rate sampling                       |
|--------------------------------------------|-----------------------------------------|-----------------------------------------|-----------------------------------------|
| $f_{\text{Data}}$ vs $f_{\text{Sampling}}$ | $f_{\text{Data}} < f_{\text{Sampling}}$ | $f_{\text{Data}} = f_{\text{Sampling}}$ | $f_{\text{Data}} > f_{\text{Sampling}}$ |
| Data recovery*                             | Yes                                     | Yes                                     | No                                      |
| Clock recovery*                            | Yes                                     | Only phase recovery                     | No                                      |

\* Without an integrator, only with samplers.

#### 2.2.4 Jitter Tolerance of the CDRs

Jitter is a critical factor to consider in communication systems as it can significantly affect the performance of the system, Figure 2.6. When the input data contains substantial jitter, it can lead to a loss of synchronization between the clock and data, causing errors in communication. Moreover, jitter can also arise from various sources such as power supply noise, device noise, and other environmental factors, making it challenging to predict the exact amount of jitter present in a system. To mitigate the impact of jitter on the system, it is crucial to understand its characteristics and quantify it with appropriate statistical approaches.

Figure 2.6: Data eye diagram with substantial random jitter: (a) Ideal, (b) 100 cycles, (c) 1,000 cycles, and (d) 10,000 cycles.

The RMS jitter ($J_{RMS}$) is a commonly used measure of the size of the jitter in a system, calculated from the mean and standard deviation of the jitter distribution. If the noise sources are truly random and uncorrelated, the jitter follows a Gaussian distribution and is unbounded, as shown in Figure 2.7.

Because Gaussian jitter is unbounded, it is almost impossible to create a completely error-free system. Instead, quantifying the maximum jitter expected to occur probabilistically within a predetermined error rate is a practical and reasonable approach. Peak-to-peak jitter ($J_{PP}$) is another metric for quantifying jitter; it refers to the jitter range that is exceeded only with a probability corresponding to the specified error rate. The relationship between RMS jitter and peak-to-peak jitter is given by Equation (2.1), where the constant $\alpha$ is determined by the error rate, as listed in Table 2.3.

$$J_{PP} = \alpha J_{RMS} \tag{2.1}$$



Figure 2.7: Gaussian distribution of jitter.

Table 2.3: Values of  $\alpha$  according to the target bit-error-rate (BER).

| BER | $10^{-3}$ | $10^{-4}$ | $10^{-5}$ | $10^{-6}$ | $10^{-7}$ | $10^{-8}$ | $10^{-9}$ | $10^{-10}$ | $10^{-11}$ | $10^{-12}$ |
|-----|-----------|-----------|-----------|-----------|-----------|-----------|-----------|------------|------------|------------|
| α   | 6.180     | 7.438     | 8.530     | 9.507     | 10.399    | 11.224    | 11.996    | 12.723     | 13.412     | 14.069     |
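
The constant $\alpha$ can also be computed numerically rather than looked up. The following is a minimal sketch, assuming that the jitter is purely Gaussian and that the target BER equals the probability of one tail beyond $\alpha/2$ standard deviations, i.e., $BER = Q(\alpha/2)$. The function name `alpha_from_ber` is hypothetical and introduced only for illustration.

```python
from statistics import NormalDist


def alpha_from_ber(ber: float) -> float:
    """Ratio J_PP / J_RMS for Gaussian jitter at a target BER.

    Assumption: BER = Q(alpha / 2), so alpha = 2 * Q^{-1}(BER).
    """
    # Q^{-1}(p) = -Phi^{-1}(p) by symmetry of the standard normal distribution.
    return -2.0 * NormalDist().inv_cdf(ber)


for exponent in (3, 6, 9, 12):
    ber = 10.0 ** -exponent
    print(f"BER = 1e-{exponent}: alpha = {alpha_from_ber(ber):.3f}")
```

Under this assumption, a measured RMS jitter can be converted to an expected peak-to-peak jitter at any target BER by multiplying it by the returned value, as in Equation (2.1).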

#### **Embedded Clocking**

Embedded clocking CDRs are generally constructed as PLL-based CDRs because of the phase tracking and noise rejection characteristics of PLLs, and their jitter tolerance characteristics are not significantly different from those of PLLs. As the jitter transfer function of a PLL exhibits a low-pass characteristic, the jitter transfer function of a CDR also exhibits a low-pass characteristic. If the jitter frequency is lower than the CDR loop bandwidth, i.e., for slowly varying jitter, the recovered clock tracks the changes in the phase of the data. Therefore, sampling always occurs at the center of the data eye, resulting in a low BER, as shown in Figure 2.8 (a). Conversely, if the jitter frequency is higher than the CDR loop bandwidth, i.e., for rapidly varying jitter, the recovered clock cannot track the changes in the phase of the data fast enough to guarantee optimal data sampling, resulting in a high BER, as shown in Figure 2.8 (b). Assuming an infinite phase detector (PD) gain and no voltage noise on the input data, the BER increases when the phase difference, or phase error, between the input data ( $\phi_{in}$ ) and the clock ( $\phi_{out}$ ) exceeds 0.5 UI, so that the data is sampled after the zero-crossing point. In other words, the approximate condition for avoiding bit errors is as follows,

$$-\frac{1}{2} \text{ UI} < \phi_{in} - \phi_{out} < \frac{1}{2} \text{ UI}. \tag{2.2}$$

Figure 2.8: Effect of (a) slowly varying jitter, and (b) rapidly varying jitter.

Equivalently,

$$-\frac{1}{2} \text{ UI} < \phi_{in}(1 - H(s)) < \frac{1}{2} \text{ UI}, \tag{2.3}$$

where $H(s) = \phi_{out}/\phi_{in}$ is the jitter transfer function, and hence,

$$-\frac{0.5 \text{ UI}}{1-H(s)} < \phi_{in} < \frac{0.5 \text{ UI}}{1-H(s)}. \tag{2.4}$$

Therefore, jitter tolerance, JTOL, can be expressed as,

$$JTOL(s) = \frac{1}{1 - H(s)}. \tag{2.5}$$

The jitter transfer function of the embedded clocking CDR can be modeled similarly to the jitter transfer function of the PLL. Let us assume the PLL-based CDR structure illustrated in Figure 2.9.



Figure 2.9: Block diagram of the conventional PLL-based CDR.

The jitter transfer function of the CDR is expressed as,

$$\begin{gathered} H(s) = \frac{2\zeta\omega_n s + \omega_n^2}{s^2 + 2\zeta\omega_n s + \omega_n^2}, \\ \omega_n = \sqrt{\frac{I_p K_{VCO}}{2\pi C_{LF}}}, \quad \zeta = \frac{\sqrt{I_p K_{VCO} C_{LF}} R_{LF}}{2\sqrt{2\pi}}, \end{gathered} \tag{2.6}$$

where $I_p$, $K_{VCO}$, $R_{LF}$, and $C_{LF}$ are the charge pump gain, the voltage-controlled oscillator (VCO) gain, the resistance of the loop filter, and the capacitance of the loop filter, respectively. Therefore, $JTOL_{emb}$ is expressed by the following equation, and simulations with various bandwidths are illustrated in Figure 2.10,

$$JTOL_{emb}(s) = \frac{s^2 + 2\zeta\omega_n s + \omega_n^2}{s^2}. \tag{2.7}$$

Then, the jitter tolerance bandwidth  $\omega_{3dB}$  where the peak-to-peak jitter becomes 1-UI is determined as,

$$\omega_{3dB} \simeq \frac{I_p K_{VCO} R_{LF}}{2\pi} = 2\zeta \omega_n. \tag{2.8}$$

Note that the jitter tolerance bandwidth determines the tradeoff between the jitter tracking ability and the jitter transfer. In other words, if a CDR exhibits high jitter tracking characteristics, the recovered clock will reflect a similar level of jitter, resulting in a decrease in the quality of the overall system clock. On the other hand, when the recovered clock contains minimal jitter, it exhibits a stubborn nature against input jitter.
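
The relation between the jitter transfer function and the jitter tolerance can be checked numerically. The sketch below evaluates the second-order $H(s)$ of Equation (2.6) on the $j\omega$ axis and forms $|1/(1 - H(s))|$ as in Equation (2.5). The natural frequency and damping factor are hypothetical values chosen only for illustration, and the result is the normalized tolerance without the 0.5-UI factor.

```python
import math


def jitter_transfer(f_hz, f_n_hz, zeta):
    """Second-order jitter transfer H(s) of Equation (2.6) at s = j*2*pi*f."""
    s = 2j * math.pi * f_hz
    w_n = 2.0 * math.pi * f_n_hz
    return (2.0 * zeta * w_n * s + w_n**2) / (s**2 + 2.0 * zeta * w_n * s + w_n**2)


def jtol_embedded(f_hz, f_n_hz, zeta):
    """Magnitude of 1 / (1 - H(s)), Equation (2.5); requires f_hz > 0."""
    return abs(1.0 / (1.0 - jitter_transfer(f_hz, f_n_hz, zeta)))


F_N_HZ, ZETA = 5e6, 1.0  # hypothetical loop parameters, not from the prototype

for f in (1e5, 1e6, 1e7, 1e8, 1e9):
    print(f"f_j = {f:8.0e} Hz -> |JTOL| = {jtol_embedded(f, F_N_HZ, ZETA):10.2f}")
```

The printed values fall with increasing jitter frequency and flatten toward unity well above the loop bandwidth, which is the shape expected from Equation (2.7).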

In practical systems, jitter tolerance often has a value of less than 1 UI. This is due to the spread of data caused by ISI from input loss and random timing noise, leading to a reduction in effective eye width. Conversely, jitter tolerance enables the estimation of the magnitude of ISI and the noise level in the input data.



Figure 2.10: (a) Jitter transfer functions of the embedded clocking CDR with various corner frequencies ( $\omega_{-3dB}$ ), and (b) the corresponding jitter tolerance curves.

#### **Forwarded Clocking**

The forwarded clocking architecture differs from the embedded clocking architecture in that it does not require internal clock generation and the clock is transmitted together with the data from the source [8]. If there is a skew  $(T_{skew})$  between the data and clock paths, and assuming that only fully correlated sinusoidal jitter is contained, the jitter profile in the time domain is depicted in Figure 2.11.



Figure 2.11: The time-domain jitter profile of the forwarded clocking CDR.

The timing error that can be caused by jitter is as follows,

$$\begin{aligned} |J_{data}(t) - J_{clk}(t)| &= |A_j \cos(2\pi f_j t) - A_j \cos\{2\pi f_j (t - T_{skew})\}| \\ &= |2A_j \sin\{2\pi f_j (t - \frac{T_{skew}}{2})\}\sin(\pi f_j T_{skew})|. \end{aligned} \tag{2.9}$$

Then  $t_{max}$ , when the timing error is maximized, is expressed as,

$$t_{max} = \frac{1}{2}(T_{skew} + \frac{T_j}{2}), \qquad (2.10)$$

when,

$$\sin\{2\pi f_j(t - \frac{T_{skew}}{2})\} = 1. \tag{2.11}$$

As with the embedded clocking CDR, the maximum timing error that the forwarded clocking CDR can tolerate without bit errors is $\pm$ 0.5 UI. From Equation (2.9), the peak-to-peak jitter boundary can be expressed as follows,

$$J_{pp} = 2A_j < \frac{0.5 \, UI}{\sin(\pi f_j T_{skew})}. \tag{2.12}$$

Therefore, the jitter tolerance for the forwarded clocking CDRs, $JTOL_{fwd}$, can be expressed by the following equation, and simulated results are shown in Figure 2.12,

$$JTOL_{fwd} = \frac{0.5 \, UI}{\sin(\pi f_j T_{skew})}. \tag{2.13}$$

The corner frequency where  $JTOL_{fwd}$  becomes 1 UI can be obtained as follows,



$$f_{j,corner} = \frac{1}{6T_{skew}}. \tag{2.14}$$

Figure 2.12: JTOL curves of forwarded clocking architecture with different  $T_{skew}$ .
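
Equations (2.13) and (2.14) can be evaluated directly. The following minimal sketch assumes fully correlated sinusoidal jitter, as in the derivation above, and takes the magnitude of the sine term so that the result stays positive beyond the first null (an assumption not stated in the equation). The skew value is hypothetical.

```python
import math


def jtol_forwarded(f_j_hz, t_skew_s):
    """JTOL_fwd in UI per Equation (2.13); unbounded where the sine term is zero."""
    denom = abs(math.sin(math.pi * f_j_hz * t_skew_s))
    return math.inf if denom == 0.0 else 0.5 / denom


def corner_frequency(t_skew_s):
    """Jitter frequency at which JTOL_fwd first drops to 1 UI, Equation (2.14)."""
    return 1.0 / (6.0 * t_skew_s)


T_SKEW_S = 200e-12  # hypothetical data-to-clock skew of 200 ps

f_corner = corner_frequency(T_SKEW_S)
print(f"corner frequency = {f_corner / 1e6:.1f} MHz")
for f in (f_corner / 100, f_corner / 10, f_corner):
    print(f"f_j = {f / 1e6:8.2f} MHz -> JTOL = {jtol_forwarded(f, T_SKEW_S):8.2f} UI")
```

A smaller skew moves the corner frequency higher, which is why matching the data and clock path delays improves high-frequency jitter tolerance in this architecture.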

As with the embedded clocking CDR, the jitter tolerance of forwarded clocking CDRs can also be analyzed using the jitter transfer function. In the forwarded clocking architecture, a DLL is typically implemented to adjust the received clock to the optimum phase. Figure 2.13 shows a block diagram of a conventional DLL-based CDR.


Figure 2.13: Block diagram of the DLL-based CDR.

As shown in Figure 2.14, the output clock of the DLL is a delayed version of the input clock. The transfer function of the DLL, $G(s)$, is expressed as,

$$\begin{gathered} G(s) = \frac{D_{out}(s)}{D_{in}(s)} = \frac{1}{1 + s/\omega_{DLL}}, \\ \omega_{DLL} = \frac{I_p}{2\pi} \frac{K_{VCDL}\omega_{In}}{R_{LF}C_{LF}}. \end{gathered} \tag{2.15}$$



Figure 2.14: Input and output clocks of DLL.

It is noteworthy that the DLL is a $1^{st}$-order system because there is no pole in the VCDL, which relaxes the stability issue, so an integrator or a $1^{st}$-order low-pass filter is sufficient as the loop filter. Then, using the following relations,

$$\begin{gathered} D_{in}(s) = \omega_{In}[\phi_{in}(s)e^{sT} - \phi_{in}(s)], \\ \phi_{out}(s) = \phi_{in}(s)e^{sT} - \omega_{In}D_{out}(s), \end{gathered} \tag{2.16}$$

the jitter transfer function of the DLL-based CDR can be expressed as,

$$H(s) = \frac{\phi_{out}(s)}{\phi_{in}(s)} = \frac{1 + (s/\omega_{DLL})e^{sT}}{1 + s/\omega_{DLL}}. \tag{2.17}$$

As is well known, this jitter transfer function is a weak high-pass filter, which is close to an all-pass filter [9]. The jitter tolerance criterion is the same as in the previous analysis, Equation (2.5); therefore, it can be expressed as,

$$JTOL(s) = \frac{1 + s/\omega_{DLL}}{(1 - e^{sT})s/\omega_{DLL}}. \tag{2.18}$$

## 2.3 Equalization

Various equalization methods are utilized in high-speed wireline communication to address the non-idealities of the channel, such as frequency-dependent loss, crosstalk, and reflection. Equalizers (EQs) applied in receivers specifically target the compensation of channel attenuation in the received signal to restore data. The continuous-time linear EQ (CTLE), feed-forward EQ (FFE), and decision feedback EQ (DFE) are commonly implemented for this purpose. This section introduces each of these three EQs and analyzes their individual characteristics.

#### 2.3.1 CTLE

The CTLE is a type of linear amplifier characterized by high-pass filter properties. By its very nature, the CTLE operates asynchronously; however, recent research has explored power-efficient applications that operate synchronously with the clock, such as the discrete-time linear EQ (DTLE) [10].



Figure 2.15: (a) RC passive linear EQ, and (b) RC-degenerated active linear EQ.

The CTLE is utilized to counteract attenuation in low-pass filter channels, with the aim of appropriately positioning the pole and zero to achieve a flat transfer function over a wide band when combined with the channel. The CTLE can be categorized into passive and active EQs, depending on the specific implementation method employed, Figure 2.15. Figure 2.15 (a) is a conventional passive EQ consisting of R and C, and its transfer function can be expressed as follows,

$$H(s) = A_{DC} \frac{\left(1 + \frac{s}{\omega_z}\right)}{\left(1 + \frac{s}{\omega_{p1}}\right)\left(1 + \frac{s}{\omega_{p2}}\right)}, \tag{2.19}$$

where $A_{DC} = \frac{R_2}{R_1 + R_2}$, $\omega_z = \frac{1}{R_1 C}$, $\omega_{p1} = \frac{R_1 + R_2}{R_1 R_2 C}$, and $\omega_{p2} = \frac{1}{R_2 C_L}$.

A zero ( $\omega_z$ ), generated by $R_1$ and $C$, boosts the high-frequency gain relative to the DC gain. Two poles ( $\omega_{p1}, \omega_{p2}$ ), generated by $R_1$, $R_2$, and $C$, and by $R_2$ and $C_L$, respectively, attenuate the high-frequency gain by -40 dB/dec, suppressing high-frequency noise. Passive EQs have the advantage of consuming no power since they are made up of only passive components. However, a disadvantage is that they reduce the overall signal amplitude by a factor of $\frac{R_2}{R_1+R_2}$, leading to a degradation in signal-to-noise ratio (SNR). Figure 2.15 (b) is a conventional active linear EQ consisting of an RC-degenerated differential common-source amplifier. The transfer function is as follows,

$$H(s) = A_{DC} \frac{\left(1 + \frac{s}{\omega_z}\right)}{\left(1 + \frac{s}{\omega_{p1}}\right)\left(1 + \frac{s}{\omega_{p2}}\right)}, \tag{2.20}$$

where $A_{DC} = \frac{g_m R_L}{1 + \frac{g_m R_s}{2}}$, $\omega_z = \frac{1}{R_s C_s}$, $\omega_{p1} = \frac{1 + \frac{g_m R_s}{2}}{R_s C_s}$, $\omega_{p2} = \frac{1}{R_L C_L}$, and $g_m$ is the transconductance of the input transistors. The DC gain $(A_{DC})$ and the zero $(\omega_z)$ are controlled by $R_s$ and $C_s$. Compared to passive EQs, active EQs can amplify the signal amplitude, rather than attenuating it, near the Nyquist frequency. However, this comes at the cost of static power consumption and noise amplification at the corresponding frequency. Figure 2.16 depicts a behavioral simulation that compares the passive linear EQ and the active linear EQ with a channel that has 15-dB loss at the Nyquist frequency. Figure 2.16 (a), (c), and (e) show single-bit responses (SBRs) of the channel and the CTLEs, and Figure 2.16 (b), (d), and (f) show the corresponding data eye diagrams. Properly positioned poles and zeros compensate for high-frequency attenuation and ensure a flat frequency response across the entire system, allowing signal components from DC to the Nyquist frequency to be transmitted without loss. However, excessive compensation or excessively wide bandwidth can increase power dissipation and boost noise, thereby degrading the circuit's performance.
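
The peaking behavior of the RC-degenerated active EQ follows directly from Equation (2.20). The sketch below evaluates the gain magnitude over frequency. All component values are hypothetical and chosen only so that the zero and the poles land at plausible frequencies; they are not taken from the prototype.

```python
import math


def active_ctle_gain(f_hz, gm, r_s, c_s, r_l, c_l):
    """Magnitude of H(s) in Equation (2.20) at s = j*2*pi*f."""
    s = 2j * math.pi * f_hz
    degeneration = 1.0 + gm * r_s / 2.0
    a_dc = gm * r_l / degeneration
    w_z = 1.0 / (r_s * c_s)
    w_p1 = degeneration / (r_s * c_s)
    w_p2 = 1.0 / (r_l * c_l)
    return abs(a_dc * (1.0 + s / w_z) / ((1.0 + s / w_p1) * (1.0 + s / w_p2)))


# Hypothetical values: gm = 10 mS, Rs = 400 ohm, Cs = 100 fF, RL = 200 ohm, CL = 20 fF.
PARAMS = dict(gm=10e-3, r_s=400.0, c_s=100e-15, r_l=200.0, c_l=20e-15)

for f in (0.0, 1e9, 4e9, 14e9, 40e9):
    gain_db = 20.0 * math.log10(active_ctle_gain(f, **PARAMS))
    print(f"f = {f / 1e9:5.1f} GHz -> gain = {gain_db:6.2f} dB")
```

With these values the ideal peak gain approaches $g_m R_L$ between the first and second poles, so increasing $R_s$ lowers the DC gain and thereby raises the boost, at the cost of low-frequency amplitude.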

Figure 2.16: SBR: (a) without EQ, (c) with the passive linear EQ, (e) with the RC-degenerated active linear EQ and data eye diagram: (b) w/o EQ, (d) with the passive linear EQ, (f) with the RC-degenerated active linear EQ.

Various methods, such as amplifier cascading, inductive peaking, feed-forward, and feedback, are employed either individually or collectively to implement an EQ that can effectively compensate for the increasing Nyquist frequency [11–17], Figure 2.17. The multi-stage CTLE, Figure 2.17 (a), achieves greater gain and wider bandwidth through large gain boosting in the first stage, and DC gain boosting and load handling in the second stage. However, this approach has some limitations, including an increase in power consumption proportional to the number of stages and bandwidth limitations due to various parasitic components. Inductive peaking [11, 14], Figure 2.17 (b), can increase high-frequency gain without additional power consumption, but the bulky inductor occupies a significant area on the integrated chip, which can be a critical disadvantage. Furthermore, if a large inductance is used to increase bandwidth, phase delay variation increases, which can exacerbate ISI [18]. The feed-forward CTLE from [16], Figure 2.17 (c), adds a feed-forward topology to the inductive peaking CTLE, which speeds up the response of the CTLE without disrupting the DC response. An additional feed-forward path increases the number of zeros, thereby boosting the high-frequency response. Feedback CTLEs add a Cherry-Hooper topology to a conventional amplifier [13] or a multi-stage CTLE [15] (Figure 2.17 (d)) to widen the overall bandwidth and boost high-frequency components. A desired frequency component can be reinforced by connecting the outputs of the CTLE to the middle nodes through negative feedback.



Figure 2.17: (a) Conventional 2-stage CTLE, (b) conventional RC-degenerated CTLE with inductive peaking, (c) feed-forward CTLE, and (d) CTLE with Cherry-Hooper topology.

#### 2.3.2 FFE

The FFE is a linear EQ that utilizes a finite-impulse response (FIR) filter to pre-emphasize or de-emphasize the signal and flatten the frequency response. The block diagram of the conventional FIR filter is illustrated in Figure 2.18.



Figure 2.18: Block diagram of the conventional FIR filter.

For an (N+1+M)-tap FFE, with N taps for removing pre-cursors, M taps for removing post-cursors, and 1 tap for the main cursor, the transfer function is as follows,

$$H(s) = \sum_{k=-N}^{M} a_k e^{-ksT}, \quad \text{where} \quad \sum_{k=-N}^{M} |a_k| = 1. \tag{2.21}$$

If the channel characteristic is known in advance, the optimal FFE coefficients can be calculated through zero forcing as follows,

$$\begin{bmatrix} h_{0} & h_{-1} & \dots & h_{-N-M+1} & h_{-N-M} \\ h_{1} & h_{0} & \dots & h_{-N-M+2} & h_{-N-M+1} \\ \vdots & & \ddots & & \vdots \\ h_{N+M-1} & h_{N+M-2} & \dots & h_{0} & h_{-1} \\ h_{N+M} & h_{N+M-1} & \dots & h_{1} & h_{0} \end{bmatrix} \times \begin{bmatrix} a_{-N} \\ \vdots \\ a_{0} \\ \vdots \\ a_{M} \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{bmatrix} \tag{2.22}$$

where $h_k$ are the channel coefficients. The behavioral simulation result of a 3-tap FFE with a channel of 15-dB loss at the Nyquist frequency is depicted in Figure 2.19. The FFE can be applied at both the transmitter and receiver front ends, and it has the advantage of effectively removing not only post-cursor ISI but also pre-cursor ISI. In practice, however, since the maximum swing is limited, equalization is realized by de-emphasizing low-frequency components rather than pre-emphasizing high-frequency components. This decreases the signal amplitude and, because noise is boosted along with the signal, degrades the SNR. In addition, it is difficult to know the characteristics of the channel in advance, and the optimum changes constantly with time and temperature, making it challenging to apply the optimal coefficients at each moment. Moreover, unlike in the transmitter, where digital information corresponding to "1" or "0" can be delayed by 1 UI using a shift register, the receiver must delay accurate analog values, which requires sophisticated circuits such as an ADC or a sample-and-hold (S/H), resulting in significant power consumption. These limitations make the FFE in the receiver less commonly used than other EQs.
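
The zero-forcing calculation of Equation (2.22) can be sketched for a 3-tap FFE (N = 1, M = 1). The example below is illustrative only: the channel coefficients are hypothetical, the target response is assumed to be a unit main cursor with zero neighboring cursors, and a small Gaussian-elimination helper is used instead of a numerical library so that the block stays self-contained.

```python
def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in reversed(range(n)):
        acc = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - acc) / aug[row][row]
    return x


def zero_forcing_taps(h, n_pre, n_post):
    """Return taps a_{-N}..a_{M}; h maps cursor index k to h_k (missing -> 0)."""
    size = n_pre + n_post + 1
    # Row i, column j holds h_{i-j}, matching the Toeplitz matrix of Equation (2.22).
    matrix = [[h.get(i - j, 0.0) for j in range(size)] for i in range(size)]
    target = [1.0 if i == n_pre else 0.0 for i in range(size)]
    taps = solve_linear(matrix, target)
    norm = sum(abs(a) for a in taps)  # enforce sum(|a_k|) = 1, Equation (2.21)
    return [a / norm for a in taps]


def equalized_response(h, taps, n_pre, n_post):
    """Cursors of the channel convolved with the FFE, for n = -N..M."""
    indices = range(-n_pre, n_post + 1)
    return [sum(a * h.get(n - k, 0.0) for k, a in zip(indices, taps)) for n in indices]


channel = {-1: 0.1, 0: 0.6, 1: 0.3, 2: 0.1}  # hypothetical pulse response samples
taps = zero_forcing_taps(channel, n_pre=1, n_post=1)
print("taps:", [round(a, 4) for a in taps])
print("equalized cursors:", [round(v, 4) for v in equalized_response(channel, taps, 1, 1)])
```

Only the cursors covered by the taps are forced to zero; residual ISI outside that window (here the contribution of $h_2$) remains, and the normalization step shows how the main cursor shrinks when the swing is limited.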



Figure 2.19: (a) SBR w/o EQ, (b) data eye diagram w/o EQ, (c) SBR with the 3-tap FFE, and (d) data eye diagram with 3-tap FFE.

#### 2.3.3 DFE



Figure 2.20: Block diagram of the conventional DFE.

The DFE is a non-linear EQ and is generally composed of a summer, sampler, weight multiplier, and shift register, Figure 2.20. The DFE is characterized by a nonlinear nature due to the sampler's bang-bang feature that produces a logical "1" or "0" for input data. This characteristic endows the DFE with unique properties when compared to other linear EQs. The DFE has a strictly causal property as it operates by providing feedback based on the sampled data. Therefore, pre-cursor equalization is not feasible. Nonetheless, the hard decision ensures that there is no noise boosting, unlike other linear EQs. Consequently, the DFE exhibits superior post-cursor equalization performance. Figure 2.21 shows a behavior simulation with the channel. Without a DFE, Figure 2.21 (a) and (b), there are post-cursors that cause severe ISIs which fully close the data eye. With a 3-tap well-optimized DFE, on the other hand, post-cursors,  $h_1$ ,  $h_2$ , and  $h_3$ , are eliminated, Figure 2.21 (c). It is confirmed that the data eye, which was completely closed, is opened, Figure 2.21 (d).



Figure 2.21: (a) SBR of the 15-dB loss channel, (b) corresponding data eye diagram, (c) SBR of the 15-dB channel with 3-tap DFE, and (d) data eye diagram with 3-tap DFE.

The operation of the DFE can be divided into the following steps. Input data is sampled to make a hard decision of whether it is a logical "1" or "0". The result is then multiplied by an analog value of the weight and fed back through a summer to remove ISI components caused by post-cursors in the input data. The transfer function of the 1-tap DFE is as follows,

$$H(z) = \frac{\frac{1}{V_m h_0}}{1 + H_1 \frac{1}{V_m h_0} z^{-1}} = \frac{1}{V_m} \frac{1}{h_0 + h_1 z^{-1}}, \tag{2.23}$$

where $V_m$ is the swing of the input signal, $h_0$ and $h_1$ are the main cursor and the first post-cursor, respectively, and $H_1$ is the multiplicative coefficient for the 1<sup>st</sup>-tap feedback, which is ideal when it equals $V_m h_1$. The DFE is a highly effective EQ for removing post-cursor ISI, making it a popular choice in receivers. However, the presence of a feedback loop imposes a stringent timing constraint, which constitutes a significant challenge for high-speed receivers. Since the feedback value must settle prior to the next sample, the timing constraint can be expressed as follows,

$$1 \text{ UI} > T_{clk-to-q} + T_{setup} + T_{feedback}, \tag{2.24}$$

where,  $T_{clk-to-q}$ ,  $T_{setup}$ , and  $T_{feedback}$  are the clock-to-Q delay of the sampler, the setup time of the sampler, and the settling time for the signal at the summing node, respectively.
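
The decision-and-feedback steps described above can be sketched behaviorally. The example below is a minimal illustration, assuming a noiseless channel with only a main cursor and one post-cursor, data symbols of ±1 (so $V_m$ is normalized to 1), and an ideal tap weight equal to $h_1$. The cursor values are hypothetical and deliberately chosen so that the eye is closed without equalization.

```python
import random


def dfe_1tap(samples, tap):
    """Direct-feedback 1-tap DFE: subtract tap * previous decision, then slice."""
    decisions = []
    prev = 0  # no feedback is applied before the first decision exists
    for y in samples:
        corrected = y - tap * prev
        prev = 1 if corrected >= 0 else -1
        decisions.append(prev)
    return decisions


random.seed(0)
bits = [random.choice((-1, 1)) for _ in range(1000)]
H0, H1 = 0.5, 0.6  # hypothetical cursors: post-cursor exceeds the main cursor

# Received samples: main cursor of the current bit plus post-cursor of the previous bit.
rx = [H0 * b + H1 * (bits[k - 1] if k > 0 else 0) for k, b in enumerate(bits)]

without_dfe = [1 if y >= 0 else -1 for y in rx]
with_dfe = dfe_1tap(rx, tap=H1)

errors_without = sum(d != b for d, b in zip(without_dfe, bits))
errors_with = sum(d != b for d, b in zip(with_dfe, bits))
print(f"bit errors without DFE: {errors_without}, with DFE: {errors_with}")
```

Because the feedback uses hard decisions, no noise is amplified, but a wrong decision would feed an incorrect correction into the following bit; this error propagation is not exercised in the noiseless sketch.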



Figure 2.22: Block diagram of the loop-unrolled DFE.

In order to alleviate the timing constraint, loop-unrolled, or speculative, DFE architectures have been proposed, Figure 2.22. The conventional DFE is referred to as a "direct feedback DFE" because it removes the corresponding ISI by directly feeding back the previously sampled values. In contrast, the loop-unrolled DFE eliminates all possible combinations of ISI in advance from the input signal, and all sampled values are created beforehand. As a result, the direct feedback loop is removed, so the design is called "loop-unrolled". It is also referred to as "speculative" because the values are prepared before the previous data are determined. The loop-unrolled DFE is constructed by separating the existing single data path. As described earlier, each path removes possible ISI in advance, and one of them is selected by the multiplexer based on the previously sampled bits and output as the final value. When implemented in this way, the timing constraint can be expressed as follows,

$$1 \text{ UI} > T_{MUX} + T_{clk-to-q,FF} + T_{setup,FF}, \tag{2.25}$$

where $T_{MUX}$, $T_{clk-to-q,FF}$, and $T_{setup,FF}$ are the delay of the multiplexer, the clock-to-Q delay of the shift register, and the setup time of the shift register, respectively. The clock-to-Q delay of the sampler ( $T_{clk-to-q}$ ) and the settling time of the feedback data ( $T_{feedback}$ ) account for the largest portion of the timing constraint of the direct feedback DFE, Equation (2.24). However, both are the most challenging parameters to reduce. Due to the operational characteristics of the sampler, which must produce a rail-to-rail output swing while discerning small input voltage differences, there is a limit to reducing the clock-to-Q delay. On the other hand, the settling time of the feedback data is dominated by the RC delay of the summing node, and this value is also limited by the summer gain and the nodal parasitics. The loop-unrolled DFE replaces the clock-to-Q delay of the sampler and the settling time of the feedback data with the clock-to-Q delay of the shift register and the delay of the multiplexer, which can significantly relax the timing constraint. To implement a multi-tap DFE, however, the number of data paths grows exponentially with the number of taps, resulting in power and area overhead.
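
The equivalence between direct feedback and loop unrolling can be illustrated with a 1-tap behavioral model. This is a sketch under simplifying assumptions: ideal slicers, ±1 decisions, an assumed initial history of +1, and hypothetical function names; it models only the logic, not the timing.

```python
import random


def slicer(v):
    """Ideal sampler: hard decision to +1 or -1."""
    return 1 if v >= 0 else -1


def dfe_direct(samples, tap, initial=1):
    """Direct feedback: the correction waits for the previous decision."""
    prev, out = initial, []
    for y in samples:
        prev = slicer(y - tap * prev)
        out.append(prev)
    return out


def dfe_unrolled(samples, tap, initial=1):
    """Loop-unrolled: both candidates are sliced first, then a mux selects one."""
    prev, out = initial, []
    for y in samples:
        if_prev_high = slicer(y - tap)  # speculates that the previous bit was +1
        if_prev_low = slicer(y + tap)   # speculates that the previous bit was -1
        prev = if_prev_high if prev == 1 else if_prev_low
        out.append(prev)
    return out


rng = random.Random(0)
samples = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
print(dfe_direct(samples, 0.3) == dfe_unrolled(samples, 0.3))
```

For N unrolled taps, the number of speculative slicers in this model would grow to $2^N$, which reflects the power and area overhead noted above.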

# **CHAPTER 3**

## PRIOR WORKS ON CDR AND EQ ADAPTATION

## 3.1 Overview

In this chapter, prior works on CDR and EQ adaptation are presented. Among the various CDR methodologies widely used in high-speed receivers, representative structures from oversampling CDRs, baud-rate sampling CDRs, and sub-rate sampling CDRs are selected and analyzed from the perspective of their operating principles and characteristics. From the most traditional Bang-Bang CDR to recently introduced sub-rate sampling CDRs, various CDR methodologies are discussed to gain insights into CDR design. In the context of EQ adaptation, comprehensive adaptation methodologies, which are not limited to specific EQs, are analyzed. Starting from the well-known LMS algorithm, BER-based and eye-opening-monitor-based adaptation algorithms are presented.

#### 3.2 Clock-and-Data Recovery

#### 3.2.1 Bang-Bang CDR

The Bang-Bang CDR (BB CDR), or 2x-oversampling CDR, is one of the most widely used CDRs in wireline communication due to its simple hardware and robust operation [19–22]. The CDR samples the input data at twice the data rate to determine the phase relationship between the data and the clock, and adjusts the clock to the optimal timing. The name "Bang-Bang" comes from the operating principle of the PD, the so-called Bang-Bang PD (BBPD). A pair of samplers, typically referred to as a data sampler and an edge sampler, are used to detect transitions in the input data, and the edge sampler's output is used to detect the sign of the phase error between the data and the clock, as shown in Figure 3.1. When the clock is early, the UP signal is generated by XORing the current data sample and the following edge sample. When the clock is late, the DN signal is generated by XORing the current data sample and the preceding edge sample. The UP and DN signals are sent to a phase control block, such as a VCO or PI. After convergence, the edge sampler's sampling timing aligns with the data transition moments, where $h_{-0.5} = h_{+0.5}$.



Figure 3.1: (a) BBPD update table and (b) lock point of BBPD.
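
The XOR logic of the BBPD can be written out as a short behavioral sketch. The sample ordering is an assumption: `edge[n]` is taken to be the edge sample between `data[n]` and `data[n + 1]`, and the UP/DN polarity follows the description in the text rather than a specific circuit. The function name is hypothetical.

```python
def bbpd_votes(data, edge):
    """Return one vote per edge sample: +1 for UP, -1 for DN, 0 for no transition.

    data[n] and edge[n] are 0/1 samples; edge[n] is assumed to lie between
    data[n] and data[n + 1].
    """
    votes = []
    for n in range(len(data) - 1):
        up = data[n] ^ edge[n]      # current data sample vs. following edge sample
        dn = data[n + 1] ^ edge[n]  # data sample vs. its preceding edge sample
        votes.append(up - dn)
    return votes


data = [0, 1, 1, 0, 1]

# Edge samples that already equal the next data bit produce UP votes.
print(bbpd_votes(data, edge=[1, 1, 0, 1]))

# Edge samples that still equal the previous data bit produce DN votes.
print(bbpd_votes(data, edge=[0, 1, 1, 0]))
```

On every data transition exactly one of UP and DN is asserted, and without a transition neither is, which is why the loop gain of a BBPD depends on the transition density of the data.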

#### 3.2.2 Blind Oversampling CDR

The blind oversampling CDR differs from most other CDRs, which continuously adjust the sampling clock to the center of the data eye [23, 24]. Instead, it samples the input data at a multiple of the data rate and recovers the most accurate data among the multiple samples. The term "blind oversampling" is used because the CDR samples without taking into account the phase relationship between the data and the clock. The operating principle of the CDR is as follows: The input data stream is oversampled using a clock frequency that is N times the data rate. Bit boundary information is detected through transitions in the data. The points at which transitions occur are used to identify the start and end of each bit and to determine the timing information of the clock and data. Additionally, the most reliable sample, which is taken at the center between adjacent bit boundaries, is utilized for data recovery. Figure 3.2 shows the operation of the 5x blind oversampling CDR. An odd oversampling ratio is favored because it makes the selection of the sample at the center phase between the bit boundaries less ambiguous.



Figure 3.2: (a) Bit boundary detection and data sample selection of 5x blind oversampling CDR and (b) probability distributions of sampled data.

#### 3.2.3 Mueller-Müller CDR

The oversampling CDRs discussed earlier operate at a sampling speed faster than the data rate, which presents challenges in clock generation and distribution as the operating speed increases. On the other hand, baud-rate sampling CDRs operate with a clock speed that equals the data rate, offering the advantage of reducing power consumption. Among the various baud-rate sampling CDR methods, the Mueller-Müller CDR (MM CDR) is widely employed because it can be implemented with minimal hardware [25–27]. The MM CDR locks at the point where $h_{-1}$ equals $h_1$, under the assumption that the input data is independent and equiprobable. The detailed principle is as follows. Assume the received signal, x(t), is expressed as

$$x(t) = \sum_{m} A_m h(t - mT_b), \qquad (3.1)$$

where  $A_m$  are the data symbols and  $T_b$  is the bit interval. For the  $k^{th}$  sample taken at the time  $t = kT_b + \tau_k$ ,

$$x_k = x(kT_b + \tau_k) = \sum_m A_m h[(k - m)T_b + \tau_k] = \sum_i A_{k-i}h(iT_b + \tau_k), \tag{3.2}$$

where $\tau_k$ is the sampling phase, as illustrated in Figure 3.3.

Figure 3.3: The timing recovery principle of MM CDR.

With simple additional calculations,

$$E[x_k A_{k-1}] = \sum_i E[A_{k-1} A_{k-i} h(iT_b + \tau_k)] \simeq A^2 h(T_b + \tau_k). \tag{3.3}$$

Therefore,

$$E[x_k A_{k-1} - x_{k-1} A_k] \simeq A^2 [h(\tau_k + T_b) - h(\tau_k - T_b)], \qquad (3.4)$$

from which the lock condition of the MM CDR, $h(\tau_k + T_b) = h(\tau_k - T_b)$, follows.

However, the MM CDR requires an ADC to compute Equation (3.4). Instead of using an ADC, the MM CDR can be simplified to the sign-sign MM CDR (SS-MM CDR) [28]. The SS-MM CDR uses the data sampler and two error samplers, which compare the data with the reference voltage. The timing recovery function of the SS-MMPD, $y$, can be expressed as,

$$\begin{aligned} y &= x_k A_{k-1} - x_{k-1} A_k \\ &= (x_k - A_k) A_{k-1} - (x_{k-1} - A_{k-1}) A_k \\ &\simeq sign(x_k - A_k) A_{k-1} - sign(x_{k-1} - A_{k-1}) A_k. \end{aligned} \tag{3.5}$$

It is noteworthy that MMPD is valid regardless of the transition. On the other hand, SS-MMPD is only valid when the transition is present. The operation of the conventional SS-MM CDR is shown in Figure 3.4. For every sample, there are three states: UP when the clock lags data, DN when the clock leads data, and Hold when the phase difference cannot be determined, Figure 3.4 (a). After convergence, SS-MM CDR locks at the point where  $h_{-1}$  equals  $h_1$ , which coincides with the lock point of MM CDR.



Figure 3.4: (a) SS-MMPD update table and (b) lock point of SS-MMPD.
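
The behavior of Equation (3.5) can be explored with a small behavioral model. The sketch below makes several assumptions that are not taken from the source: symbols are ±1, the error samplers compare against ±`v_ref` with `v_ref` set to the main cursor, the detector holds when no transition is present, and the channel has only one pre-cursor and one post-cursor. The mapping of the output sign to UP or DN is left open because it depends on the update table in Figure 3.4.

```python
import random


def sign(v):
    return (v > 0) - (v < 0)


def ss_mmpd(x_k, x_prev, a_k, a_prev, v_ref):
    """Sign-sign MM phase detector output per Equation (3.5); 0 means hold."""
    if a_k == a_prev:
        return 0  # assumption: no update without a data transition
    err_k = sign(x_k - a_k * v_ref)
    err_prev = sign(x_prev - a_prev * v_ref)
    return err_k * a_prev - err_prev * a_k


def average_pd_output(h_pre, h_main, h_post, n_bits=2000, seed=1):
    """Average detector output for random data through a 3-cursor channel."""
    rng = random.Random(seed)
    a = [rng.choice((-1, 1)) for _ in range(n_bits)]
    # x[k] = a[k+1]*h_pre + a[k]*h_main + a[k-1]*h_post for 1 <= k <= n_bits - 2
    x = {k: a[k + 1] * h_pre + a[k] * h_main + a[k - 1] * h_post
         for k in range(1, n_bits - 1)}
    outputs = [ss_mmpd(x[k], x[k - 1], a[k], a[k - 1], v_ref=h_main)
               for k in range(2, n_bits - 1)]
    return sum(outputs) / len(outputs)


# Hypothetical cursor values on either side of the lock point h_pre = h_post.
print(average_pd_output(h_pre=0.2, h_main=1.0, h_post=0.4))
print(average_pd_output(h_pre=0.4, h_main=1.0, h_post=0.2))
```

The average output changes sign when the larger of the two neighboring cursors switches sides, which is consistent with a lock point at $h_{-1} = h_1$.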

#### 3.2.4 Minimum Mean Squared Error CDR

The minimum mean squared error (MMSE) algorithm adjusts the sampling clock phase in a way to minimize the expected value of the squared error  $e_k^2$  [29–31].

$$E_k = E[e_k^2] = E[(R_k - y(kT + \tau_k))^2]. \tag{3.6}$$

Here, $R_k$ is the k-th bit, $y(t)$ is the received signal, $T$ is the bit period, and $\tau_k$ is the sampling phase for the k-th received bit. As the MMSE algorithm adjusts the clock phase in the direction opposite to the gradient of $E_k$, the update equation is as follows,

$$\tau_{k+1} = \tau_k - \mu(\frac{\delta E_k}{\delta \tau_k}). \tag{3.7}$$

Here, $\mu$ determines the tradeoff between the convergence time and jitter. The stochastic update equation is derived by substituting Equation 3.6 into Equation 3.7 and replacing the expectation with its instantaneous value:

$$\tau_{k+1} = \tau_k + 2\mu e_k \left(\frac{\delta y(kT + \tau_k)}{\delta \tau_k}\right). \tag{3.8}$$

However, the calculation of analog value can lead to additional circuitry and bandwidth degradation, as seen in the case of MM CDR. The bang-bang representation of error and slope is employed, which is the so-called sign-sign MMSE (SS-MMSE).

$$\tau_{k+1} = \tau_k + 2\mu \cdot sgn(e_k) \cdot sgn(\frac{\delta y(kT + \tau_k)}{\delta \tau_k}). \tag{3.9}$$

Further, when $R_k = +1$, the sign of the error is positive because the received signal $y(kT + \tau_k)$ is less than +1 due to the attenuation. Likewise, when $R_k = -1$, the sign of the error is negative. Therefore, Equation 3.9 can be modified as,

$$\tau_{k+1} = \begin{cases} \tau_k + 2\mu \cdot sgn(\frac{\delta y(kT + \tau_k)}{\delta \tau_k}), & y(kT + \tau_k) > 0, \\ \tau_k - 2\mu \cdot sgn(\frac{\delta y(kT + \tau_k)}{\delta \tau_k}), & y(kT + \tau_k) < 0. \end{cases} \tag{3.10}$$

Note that the sign of the error is substituted with the sign of the received signal. Therefore, Equation 3.10 is simplified as,

$$\tau_{k+1} = \tau_k + 2\mu \cdot sgn(y(kT + \tau_k)) \cdot sgn(\frac{\delta y(kT + \tau_k)}{\delta \tau_k}), \tag{3.11}$$

or,

$$\tau_{k+1} = \tau_k + 2\mu \cdot sgn(\frac{1}{2} \frac{\delta [y(kT + \tau_k)]^2}{\delta \tau_k}). \tag{3.12}$$

#### 3.2.5 Sub-rate Sampling CDR

Sub-rate sampling CDRs [32, 33], designed to further reduce power consumption and hardware requirements compared to baud-rate sampling CDRs, employ a sampling rate slower than the data rate for clock and data recovery. They operate quite differently from other CDRs, which take at least one sample per bit, and the operating principle is illustrated in Figure 3.5. The received data is divided into two paths: odd data and even data. Odd data is sampled at the center of the data eye for data recovery, while even data is skipped at clock edges and recovered by integrating the received signal values between the clock edges using an integrator (Figure 3.5 (a)). Clock recovery also utilizes an integrator to compare the integrated values before and after transitions, enabling the determination of the phase between the clock and data (Figure 3.5 (b)).



Figure 3.5: (a) Data recovery and (b) clock recovery of the sub-rate sampling CDR from [32].

# 3.3 EQ adaptation

#### 3.3.1 Least Mean Square

The least mean squares (LMS) algorithm is based on the steepest gradient descent algorithm and finds the coefficients that minimize a cost function [34, 35]. The update equation is

$$w_{k+1} = w_k - \mu \nabla \epsilon_k, \tag{3.13}$$

Here, $\epsilon$ is the cost function (the mean-square error) and $w_k$ is the filter coefficient. The mean-square error and its gradient can be expressed as,

$$\epsilon_k = E(e_k^2),\tag{3.14}$$

$$\nabla \epsilon_k = \nabla E(e_k^2) = 2E\{\nabla(e_k)e_k\}, \tag{3.15}$$

where $e_k$ is the error for the k-th sample, which is expressed as,

$$e_k = R_k - w_k x_k, \tag{3.16}$$

$$\nabla e_k = \nabla (R_k - w_k x_k) = -x_k, \tag{3.17}$$

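where $R_k$ is the k-th received bit and $x_k$ is the k-th received signal. Therefore, replacing the expectation with its instantaneous value and absorbing the constant factor into the step size ($\mu^* = 2\mu$), Equation 3.13 is rewritten as,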

$$w_{k+1} = w_k + \mu^* \cdot e_k \cdot x_k.$$
(3.18)

For high-speed implementation, Equation 3.18 is replaced with a 1-bit operation, the so-called sign-sign LMS (SS-LMS) algorithm, which is one of the most widely used algorithms due to its simplicity in computation and implementation [15, 28, 36].

$$w[n+1] = w[n] + \mu \cdot sign(x[n]) \cdot sign(e[n]), \qquad (3.19)$$

where, w is a coefficient, x is the received signal and e is the error of the received signal with respect to the reference voltage. Since there is no need for analog value calculations, the hardware implementation becomes highly simplified. However, a critical issue arises in selecting an appropriate reference voltage for accurate error detection. The most practical and effective approach is to adaptively adjust the reference voltage to maintain an optimal value [36]. Therefore, an additional adaptation loop is formed for the reference voltage, and it operates as follows:

$$V_{REF}[n+1] = V_{REF}[n] + \mu \cdot sign(e[n]).$$
(3.20)
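As a minimal sketch of how the two sign-sign loops of Equations 3.19 and 3.20 interact, the following Python model adapts one DFE tap and the reference voltage together. Several choices are assumptions made only for illustration: the two-cursor toy channel, the use of the previous decision as the regressor $x$, the restriction of the error sampler to "1" bits, and all names and numerical values.

```python
import random


def sign(v: float) -> int:
    """Return -1, 0, or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def ss_lms_one_tap(h0: float, h1: float, mu: float, steps: int, seed: int = 1):
    """Adapt one DFE tap w and the reference V_REF with sign-sign updates."""
    rng = random.Random(seed)
    w, v_ref = 0.0, 0.3           # arbitrary starting points
    prev_bit = 1
    for _ in range(steps):
        bit = rng.choice((-1, 1))
        # Toy channel: main cursor plus one post-cursor, then DFE subtraction.
        z = h0 * bit + h1 * prev_bit - w * prev_bit
        if bit == 1:              # assumption: error sampler used on "1" bits only
            e = z - v_ref
            w += mu * sign(prev_bit) * sign(e)   # Equation 3.19
            v_ref += mu * sign(e)                # Equation 3.20
        prev_bit = bit
    return w, v_ref


w_final, v_ref_final = ss_lms_one_tap(h0=0.6, h1=0.2, mu=0.001, steps=20000)
print(w_final, v_ref_final)  # w dithers near h1, V_REF dithers near h0
```

In this toy model the tap weight approaches the post-cursor $h_1$ while the reference voltage approaches the main cursor $h_0$, each dithering by a few step sizes.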

#### 3.3.2 BER-based Adaptation

BER-based adaptation is a method that aims to optimize the coefficients by measuring the BER and finding the point with the least amount of error [37, 38]. With a predefined target BER, the coefficients are adjusted iteratively while checking whether they yield a bit error rate lower than the target. Because the main objective of a receiver is to minimize bit errors, this approach is the most direct and accurate. However, since the bit error rate serves as the cost function, a large number of data samples are required at each iteration, and the adaptation time increases exponentially as the number of coefficients increases. Despite attempts to reduce adaptation time using techniques like the stochastic hill climbing algorithm [38], convergence still takes a significant amount of time.



Figure 3.6: (a) BER-based adaptation algorithm from [37] and (b) from [38].

#### 3.3.3 EOM-based Adaptation

The EOM-based adaptation algorithm drives the coefficients toward their optimal values by detecting the effective eye-opening area [39–43]. Sweeping the sampling phase and the sampler reference voltage yields a 2-D scan of the data eye. By changing the values of the target coefficients and comparing the resulting eye areas, the setting that maximizes the data eye area can be selected. The EOM method is a brute-force technique that requires a long adaptation time, which restricts its use in some applications. It also needs complex hardware because of its computational load and the number of iterations.



Figure 3.7: (a) EOM to detect the effective eye-opening area and (b) example of the EOM operation: EOM output versus sampling timing with various threshold voltages.

# **CHAPTER 4**

# CDR AND DFE ADAPTATION WITH GRADIENT MAXIMUM-EYE-TRACKING

## 4.1 Overview

Recently, the maximum-eye-tracking (MET) algorithms [44, 45] were proposed, which utilize the slope of the data eye for simultaneous adaptation. MET has the potential to achieve the maximum performance of the receiver while mitigating the complex and time-consuming drawbacks of existing algorithms. However, the MET algorithms in [44, 45] still require either extra hardware or high design complexity due to separate algorithms for CDR and EQ adaptation. Chapter 4 provides an analysis of the gradient MET (GMET) algorithm [46], which overcomes the drawbacks of the existing MET while maximizing its advantages. Behavior simulations and measurement results demonstrate the effectiveness of the algorithm.

## 4.2 Vertical Eye Height

Due to the presence of various channel characteristics such as timing jitter and voltage noise, coefficients that satisfy the maximum vertical eye height (VEH), maximum horizontal eye width, or minimum BER do not perfectly match [44]. Nevertheless, it is widely known that the optimum point that satisfies the minimum BER is similar to the point that maximizes the VEH, as illustrated in Figure 4.1 [37].



Figure 4.1: Simulated (a) BER and (b) voltage margin contour and the convergence points of SS-LMS, maximum voltage margin, and minimum BER.

Before looking into VEH, let us briefly discuss the characteristics of the signals that receivers deal with. The transmitter and receiver are connected via a channel or transmission line. An ideal transmission line would be equipotential regardless of its length. In reality, however, the signal experiences frequency-dependent losses such as the skin effect and dielectric loss as it propagates. The transmission line can be modeled using distributed resistance, inductance, conductance, and capacitance (RLGC), as depicted in Figure 4.2. Signal attenuation is determined by the propagation constant A as follows [5],



Figure 4.2: Model of transmission line: (a) lumped RLGC and (b) frequency-dependent lossy model.

$$\frac{V_i(x)}{V_i(0)} = exp(-Ax)$$

$$A = [(G + j\omega C)(R + j\omega L)]^{1/2}$$
(4.1)
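Equation 4.1 can be evaluated numerically with complex arithmetic. The sketch below is only illustrative: the per-metre R, L, G, and C values are hypothetical, they are held constant over frequency, and the frequency dependence of the skin effect and dielectric loss is therefore not modeled.

```python
import cmath
import math


def attenuation_db(r, l, g, c, freq_hz, length_m):
    """Magnitude of V(x)/V(0) in dB for per-unit-length R, L, G, C."""
    omega = 2.0 * math.pi * freq_hz
    a = cmath.sqrt((g + 1j * omega * c) * (r + 1j * omega * l))  # Equation 4.1
    ratio = cmath.exp(-a * length_m)
    return 20.0 * math.log10(abs(ratio))


# Hypothetical per-metre values, chosen only for illustration (50-ohm line).
lossless = attenuation_db(r=0.0, l=350e-9, g=0.0, c=140e-12,
                          freq_hz=14e9, length_m=0.3)
lossy = attenuation_db(r=50.0, l=350e-9, g=0.0, c=140e-12,
                       freq_hz=14e9, length_m=0.3)
print(lossless, lossy)  # the lossless line shows no attenuation, the lossy one does
```

With R = G = 0 the propagation constant is purely imaginary, so the signal is only delayed; a non-zero series resistance adds a real part and hence attenuation.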

When the data stream from the transmitter traverses the channel, the high-frequency components are attenuated, causing each transmitted pulse to spread beyond 1 UI. Consequently, from the receiver's viewpoint, the input signal can be understood as the superposition of the spread SBRs.

$$V_{in}(t_s) = \sum_{k=-\infty}^{\infty} SBR(t_s - kT)$$
(4.2)

Figure 4.3: Example of SBR and discrete cursor value (real channel, -7 dB at the Nyquist frequency).

Although the signal input to the receiver is continuous, the sampler determines its value based on the signal at a specific moment (sampling timing; $t_s$), so it can be represented as a function of the sum of discrete cursors with the bit time, T, depending on the sampling timing as follows,

$$V_{in}(t_s) = \sum_{k=-\infty}^{\infty} h_k(t_s) D[t_s - kT]$$
(4.3)

Figure 4.4 illustrates the contribution of the cursors to the eye-level dispersion. VEH is determined by the worst-case data sequence, or the data sequence with the longest run length. Without loss of generality, let us consider only the case where $h_0 > 0$ and $D[t_s] = 1$. In this case, the VEH of the upper eye, VEH<sub>upper</sub>, can be expressed as follows,

$$\text{VEH}_{upper}(t_s) = h_0 - \sum_{k=-1}^{-\infty} |h_k(t_s)| - \sum_{l=1}^{\infty} |h_l(t_s)|, \qquad (4.4)$$

considering only the upper eye. If an N-tap DFE is applied, its weights can be incorporated into Equation (4.4) as follows,

$$\operatorname{VEH}_{DFE}(t_s, w_1, \cdots, w_N) = h_0 - \sum_{k=-1}^{-\infty} |h_k(t_s)| - \sum_{l=1}^{N} |h_l(t_s) - w_l| - \sum_{m=N+1}^{\infty} |h_m(t_s)|.$$
(4.5)

The second, third, and last terms refer to the sum of the pre-cursors, the DFE residual cursors, and the post-cursors not covered by the DFE, respectively. If the DFE fully eliminates the post-cursors it covers, then

$$VEH_{DFE,ideal}(t_s) = h_0 - \sum_{k=-1}^{-\infty} |h_k(t_s)| - \sum_{m=N+1}^{\infty} |h_m(t_s)|,$$

$$\sum_{l=1}^{N} |h_l(t_s) - w_l| = 0.$$
(4.6)
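Equations 4.4 to 4.6 can be checked with a small numerical sketch. The cursor values and function names below are hypothetical and serve only as an illustration. The brute-force routine enumerates every neighbouring bit pattern and confirms that the lowest sampled "1" level equals the closed-form VEH.

```python
from itertools import product


def veh_dfe(h0, pre, post, taps):
    """Equation 4.5: pre/post are cursor lists, taps are the DFE weights w_1..w_N."""
    n = len(taps)
    residual = sum(abs(h - w) for h, w in zip(post[:n], taps))
    uncovered = sum(abs(h) for h in post[n:])
    return h0 - sum(abs(h) for h in pre) - residual - uncovered


def worst_case_one_level(h0, pre, post, taps):
    """Brute-force minimum of a sampled "1" over all neighbouring bit patterns."""
    n = len(taps)
    isi = list(pre) + [h - w for h, w in zip(post[:n], taps)] + list(post[n:])
    return min(h0 + sum(c * d for c, d in zip(isi, bits))
               for bits in product((-1, 1), repeat=len(isi)))


# Hypothetical cursor values, for illustration only.
h0, pre, post = 1.0, [0.1], [0.3, 0.1, 0.05]
veh_no_dfe = veh_dfe(h0, pre, post, taps=[])           # Equation 4.4
veh_ideal = veh_dfe(h0, pre, post, taps=[0.3, 0.1])    # Equation 4.6, N = 2
print(veh_no_dfe, veh_ideal)
```

With these example cursors, an ideal 2-tap DFE removes $h_1$ and $h_2$, so only the pre-cursor and the uncovered post-cursor $h_3$ reduce the VEH.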



Figure 4.4: Eye level dispersion according to the cursors.

#### 4.3 Biased Data Level

In practical environments, determining the exact value of VEH is extremely difficult due to various non-ideal factors such as unknown channel characteristics, various ISI, random noise, and process-voltage-temperature (PVT) variations. Instead of the actual VEH, a biased data level (Bdlev) can be considered as an alternative parameter that contains information about the residual cursors under consideration. Bdlev is obtained by setting the ratio of "1"s and "0"s in SS-LMS to a proper value, as in the following equation [15, 44].

$$Bdlev[i+1] = \begin{cases} Bdlev[i] + \alpha \cdot \mu \cdot D[i] \cdot E[i], \quad D[i] \cdot E[i] > 0, \\ Bdlev[i] + \beta \cdot \mu \cdot D[i] \cdot E[i], \quad D[i] \cdot E[i] < 0. \end{cases}$$
(4.7)

Data level (Dlev), which generally refers to the magnitude of the main cursor, $h_0$, can be obtained by averaging the number of sampled "1"s and "0"s with equal weights ($\alpha = \beta$). Bdlev, in contrast, is obtained by setting the weights unequal. For example, if the post-cursors are well removed through the CTLE and N-tap DFE, and only the first pre-cursor, $h_{-1}$, remains significant, the eye distribution can be divided into four sections as depicted in Figure 4.5 and as follows:

$$[h_0 - |h_{-1}| - h_{resi,max},\ h_0 - |h_{-1}|],\quad [h_0 - |h_{-1}|,\ h_0 - |h_{-1}| + h_{resi,max}],$$

$$[h_0 + |h_{-1}| - h_{resi,max},\ h_0 + |h_{-1}|],\ \text{and}\ [h_0 + |h_{-1}|,\ h_0 + |h_{-1}| + h_{resi,max}],$$

where $h_{resi,max}$ represents the maximum value of the sum of the residual cursors,

$$h_{resi,max}(t_s) = \sum_{k=-2}^{-\infty} |h_k(t_s)| + \sum_{l=N+1}^{\infty} |h_l(t_s)|.$$
(4.8)



Figure 4.5: Divided eye levels by  $h_{-1}$  and  $h_{resi,max}$ .

Each section is the set of all possible combinations of the sum of the residual cursors, and the sections have the same probability if the input data are fully randomized. Three of the four sections lie above $h_0 - |h_{-1}|$ and one lies below it, so in the convention of Equation 4.7 the $\alpha$ : $\beta$ ratio can be set to 1:3 to make Bdlev equal to $h_0 - |h_{-1}|$, as shown in Figure 4.5. If there are two significant residual cursors, such as $h_{-1}$ and $h_{-2}$, the ratio can be set to 1:7 to make Bdlev equal to $h_0 - |h_{-1}| - |h_{-2}|$.
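The weighted update of Equation 4.7 can be illustrated with a short behavioral sketch. It assumes a toy signal model (one pre-cursor plus uniformly distributed residual ISI, evaluated on "1" bits only) and uses an up-step weight $\alpha$ that is one third of the down-step weight $\beta$, in the sign convention of Equation 4.7. All names and numerical values are hypothetical.

```python
import random


def adapt_bdlev(h0, h_pre1, h_resi_max, alpha, beta, mu, steps, seed=7):
    """Run the weighted update of Equation 4.7 on sampled "1" bits only."""
    rng = random.Random(seed)
    bdlev = 0.5 * h0                      # arbitrary starting level
    tail = []                             # Bdlev history over the last quarter
    for i in range(steps):
        # Toy "1" sample: main cursor, one pre-cursor, uniform residual ISI.
        sample = (h0 + h_pre1 * rng.choice((-1, 1))
                  + rng.uniform(-h_resi_max, h_resi_max))
        if sample > bdlev:                # D*E > 0: step up with weight alpha
            bdlev += alpha * mu
        else:                             # D*E < 0: step down with weight beta
            bdlev -= beta * mu
        if i >= 3 * steps // 4:
            tail.append(bdlev)
    return sum(tail) / len(tail)          # average level after settling


level = adapt_bdlev(h0=1.0, h_pre1=0.2, h_resi_max=0.05,
                    alpha=1.0, beta=3.0, mu=1e-4, steps=60000)
print(level)  # settles near h0 - |h_pre1| = 0.8 in this toy model
```

The loop balances when up-steps are three times as frequent as down-steps, which places the level at the boundary between the lowest section and the three sections above it.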

The analysis thus far has been premised on ideal conditions. For real-world application, however, it is imperative to validate the robustness in a noisy environment. Consider a scenario where additive white Gaussian noise (AWGN) is added to the input data and the Bdlev is adapted with an $\alpha$ : $\beta$ ratio of 1:3. Under ideal conditions, the Bdlev is expected to converge to $h_0 - |h_{-1}|$ if $h_{-1}$ is the second biggest cursor. When AWGN is present, however, the sample distributions that produce UP and DN updates overlap, causing the Bdlev to converge to a value below $h_0 - |h_{-1}|$. If the amount by which the Bdlev falls below $h_0 - |h_{-1}|$ is defined as $\Delta d$, the convergence level of the Bdlev, $h_0 - |h_{-1}| - \Delta d$, satisfies the following probability relation,

$$\begin{aligned} &\int_{h_0-|h_{-1}|-\Delta d}^{\infty} \left(\frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}(\frac{x-(h_0+|h_{-1}|)}{\sigma})^2} + \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}(\frac{x-(h_0-|h_{-1}|)}{\sigma})^2}\right) dx \\ &: \int_{-\infty}^{h_0-|h_{-1}|-\Delta d} \left(\frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}(\frac{x-(h_0+|h_{-1}|)}{\sigma})^2} + \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}(\frac{x-(h_0-|h_{-1}|)}{\sigma})^2}\right) dx = 3 : 1. \end{aligned} \tag{4.9}$$

Upon expanding Equation 4.9, the following expression can be derived.

$$\begin{aligned} &\int_{h_0-|h_{-1}|-\Delta d}^{\infty} \left(\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{x-(h_0+|h_{-1}|)}{\sigma})^2} + \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{x-(h_0-|h_{-1}|)}{\sigma})^2}\right)dx \\ &= 3\int_{-\infty}^{h_0-|h_{-1}|-\Delta d} \left(\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{x-(h_0+|h_{-1}|)}{\sigma})^2} + \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{x-(h_0-|h_{-1}|)}{\sigma})^2}\right)dx. \end{aligned} \tag{4.10}$$

Simplifying the above equation yields the final relation in Equation 4.11, which is illustrated in Figure 4.6.

$$(A=)\int_{-\Delta d}^{0} \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{x^2}{2\sigma^2}} dx = \int_{-\infty}^{-2|h_{-1}|-\Delta d} \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{x^2}{2\sigma^2}} dx \ (=B).$$
(4.11)



Figure 4.6: Bdlev is lowered from  $h_0 - |h_{-1}|$  by  $\Delta d$  when there exists AWGN.

The magnitude of $\Delta d$ is determined by the size of $h_{-1}$ and the standard deviation of the input noise ($\sigma$). Figure 4.7 plots how the magnitude of $\Delta d$ changes with the size of $h_{-1}$, both normalized to $\sigma$. As the size of the cursor increases, the magnitude of $\Delta d$ decreases rapidly.
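Equation 4.11 has no closed-form solution for $\Delta d$, but it can be solved numerically. The sketch below uses bisection on the difference between the two areas A and B, with the Gaussian cumulative distribution expressed through the error function. The function names and the search interval are assumptions made for illustration.

```python
import math


def phi(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def delta_d(h_pre1: float, sigma: float) -> float:
    """Solve Equation 4.11 for delta_d by bisection (A - B = 0)."""
    def a_minus_b(d: float) -> float:
        area_a = phi(0.0) - phi(-d / sigma)               # left-hand side A
        area_b = phi((-2.0 * abs(h_pre1) - d) / sigma)    # right-hand side B
        return area_a - area_b

    lo, hi = 0.0, 10.0 * sigma     # A - B <= 0 at lo and > 0 at hi
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if a_minus_b(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


for ratio in (0.0, 0.5, 1.0, 2.0):  # |h_-1| normalized to sigma
    print(ratio, delta_d(ratio, 1.0))
```

Since A grows and B shrinks as $\Delta d$ increases, the difference is monotonic and bisection converges to the unique root. The printed values fall as the normalized cursor grows, in line with the trend described for Figure 4.7.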



Figure 4.7: Normalized  $h_{-1}$  versus normalized  $\Delta d$ .

# 4.4 Gradient Maximum Eye Tracking

For the optimization of non-linear systems with numerous variables, the gradient ascent method, which requires little computation and converges reliably, is widely used [47]. Recalling Equation (4.5), the update equation that drives each coefficient $(t_s, w_1, \dots, w_N)$ toward the maximum value of VEH is expressed as,

$$C_{new} = C_{old} + \zeta_C \frac{\partial \text{VEH}_{DFE}}{\partial C}, \qquad (4.12)$$

where C is the coefficient under adjustment and $\zeta_C$ is the corresponding step size. Since the update amount is proportional to the gradient, the convergence speed is fast when far from the optimal point, and the stability is high near the optimal point. If the VEH is replaced with the Bdlev from Equation (4.7), Equation (4.12) can be modified as follows,

$$C_{new} = C_{old} + \zeta_C \frac{\partial B dlev}{\partial C}, \qquad (4.13)$$

In order to implement the gradient ascent method in hardware, it is necessary to simplify it further. Therefore, instead of using coefficients with analog values, digital control codes are used. The digital control codes are updated by adding or subtracting only one LSB at a time according to the product of the sign of the previous update and the sign of the gradient of the Bdlev. In this form, however, convergence speed and stability suffer because the magnitude of the gradient is not considered. Therefore, instead of scaling the update amount by the magnitude of the gradient, the delay between updates is adjusted. If the gradient after the update is large, the delay before the next update is shortened to reduce the convergence time to the optimum. Conversely, the code is updated slowly near the optimum, where the gradient is small, so that it is robust against instantaneous noise and unintended perturbations. The optimized equation for the control code and the update delay is written as,

$$\begin{cases} D_C[i+1] = D_C[i] + sign(\Delta Bdlev \cdot \Delta C), \\ T_{C,i+1} = \alpha_C / |\Delta Bdlev|, \end{cases} \tag{4.14}$$

where D is a control code for the coefficient C,  $T_C$  is an update delay, and  $\alpha_C$  is an update gain for the corresponding control code. The flow chart is shown in Figure 4.8.
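The update rule of Equation 4.14 can be sketched as a short behavioral model. This is only an illustration under stated assumptions: the single-peaked `toy_bdlev` curve, the clamp `t_max` on the update delay, the handling of a zero Bdlev change, and all names and numbers are hypothetical and are not taken from the fabricated logic.

```python
def toy_bdlev(code: int, optimum: int = 40) -> float:
    # Hypothetical single-peaked Bdlev versus control code (arbitrary units).
    return 1.0 - ((code - optimum) / 32.0) ** 2


def gmet_adapt(start_code: int, alpha_c: float, t_max: float, updates: int):
    """Apply Equation 4.14: 1-LSB code steps with a gradient-dependent delay."""
    code, direction = start_code, +1   # direction plays the role of delta C
    level = toy_bdlev(code)
    delays = []
    for _ in range(updates):
        code += direction
        new_level = toy_bdlev(code)
        delta = new_level - level      # delta Bdlev caused by the last step
        # sign(delta Bdlev * delta C): keep going if Bdlev rose, else reverse.
        # Treating delta == 0 as "reverse" is an assumption of this sketch.
        if delta <= 0.0:
            direction = -direction
        # Update delay, clamped so a vanishing gradient cannot stall the loop.
        delays.append(t_max if delta == 0.0 else min(t_max, alpha_c / abs(delta)))
        level = new_level
    return code, delays


final_code, delays = gmet_adapt(start_code=10, alpha_c=0.01, t_max=100.0,
                                updates=200)
print(final_code, delays[0], delays[-1])
```

In this toy run the code walks quickly toward the peak while the Bdlev change per step is large, then dithers by one LSB around the optimum with a much longer delay between updates.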



Figure 4.8: Flow chart of the control code for coefficient, C.

#### 4.4.1 Sampling Phase Adaptation with GMET



Figure 4.9: Illustration of sampling phase adaptation with GMET.

Figure 4.9 shows sampling phase adaptation with GMET. As the Bdlev changes, the code update delay is adjusted and the sampling phase converges to the point where the Bdlev is maximized. After convergence, the control code dithers around the optimum. Figure 4.10 shows the VEH according to the sampling phase obtained through behavior simulation. As the VEH is monotonic on either side of the point where it is maximized, it can be expected to converge to the maximum point without getting stuck in a local optimum. Furthermore, when the sampling phase is far from the optimum phase, the magnitude of the gradient of VEH is larger. Therefore, the update gain increases, resulting in a higher update speed and a shorter convergence time. As the phase gradually approaches the optimum, the gain decreases, which reduces the overall loop bandwidth and enables robust operation against Bdlev perturbations.

Figure 4.10: Measured VEH according to sampling phase through behavior simulation.

Figure 4.11 presents a comparison between GMET and MM CDR, the most widely used method in baud-rate sampling CDRs. The behavior simulation was conducted on the same 9-dB loss channel, and the convergence points of both GMET and MM CDR were compared. The simulation results indicate that if the post-cursors are significantly larger than the pre-cursors, the MM CDR settles at a sub-optimal point, resulting in a smaller VEH. In contrast, the GMET CDR settles at the point where the VEH is maximal, which almost coincides with the point where the main cursor is maximal. Moreover, the VEH of the GMET CDR lock point is twice as large as that of the MM CDR lock point. Therefore, the simulation results suggest that GMET CDR is more effective in achieving the maximal VEH than MM CDR in baud-rate sampling CDRs.

The following figures show the results of the behavior simulation used to validate the stability and performance of sampling phase adaptation. Figure 4.12 (a) compares the behavior for different update gains ($\alpha$). When the update gain is fixed at the maximum value ($\alpha_{MAX}$), the convergence time is the shortest, but the dithering range after convergence is wider due to input noise. On the other hand, when the gain is fixed at the minimum value ($\alpha_{MIN}$), the convergence takes more than 50 times longer than in the maximum gain case, and the dithering after convergence is the smallest. When the gain is dynamically adjusted, convergence is achieved within 3 times the time of the maximum gain case and 23 times faster than in the minimum gain case, and the reduction in gain after convergence leads to a decrease in dithering.

Figure 4.12 (b) validates the stability over various patterns with different run lengths. Although the PRBS7 pattern achieves the fastest convergence due to its higher number of transitions and the PRBS31 pattern achieves the slowest convergence time, there is not a significant difference in the overall convergence time. After convergence, the PRBS31 pattern, with its longest run length, exhibits less dithering but does not show any remarkable differences.



Figure 4.11: Simulated convergence point of MM CDR and GMET CDR on (a) SBR and (b) eye diagram.




Figure 4.12: Behavior simulation results over (a) various update gain  $\alpha$  and (b) various patterns with different run lengths.

#### 4.4.2 DFE Adaptation with GMET



Figure 4.13: Illustration of DFE adaptation with GMET.

Figure 4.13 shows the 1-tap DFE coefficient adaptation using GMET. The Bdlev adaptation ratio is set to 1:7 ($\alpha$ : $\beta$ = 1 : 7 in Equation 4.7). As the Bdlev changes, $w_1$ is updated in the direction in which the Bdlev increases. When $w_1$ reaches the optimal value ($\simeq h_1$), the convergence level of the Bdlev is close to $h_0 - h_{-1}$ if the magnitudes of the residual cursors are small enough, and the Bdlev dithers around this level.

Figure 4.14 shows the measured VEH according to $w_1$ through behavior simulation. As in CDR adaptation, the VEH is monotonic on either side of the point where it reaches its maximum value. When $w_1$ is smaller than $h_1$, increasing $w_1$ increases the VEH because $|h_1 - w_1|$ decreases. On the other hand, once $w_1$ becomes larger than $h_1$, the VEH decreases. There is another characteristic that distinguishes DFE adaptation from both GMET-based CDR adaptation and the conventional SS-LMS algorithm. The VEH has a unity slope when $w_1$ is far from the optimum. Near the optimum, however, the change in Bdlev becomes smaller than that of $w_1$ (Equation 4.15). Unlike CDR adaptation, the update time does not change until $w_1$ approaches the optimum, so the loop operates similarly to the conventional SS-LMS algorithm in that region. Near the optimum, the gradient of the VEH is smaller than unity, resulting in a slow update rate that increases the robustness against perturbations such as voltage noise. Further, the scheme can be extended to an N-tap DFE, which is discussed in detail in Section 4.4.3.

$$\left|\frac{\delta V E H}{\delta w_{1}}\right| \begin{cases} = 1, & |w_{1}| \ll |h_{1}| \\ < 1, & |w_{1}| \simeq |h_{1}| \end{cases}$$
(4.15)



Figure 4.14: Measured VEH according to  $w_1$  through behavior simulation.

#### 4.4.3 Simultaneous Adaptation with GMET

To achieve optimal performance in actual operating environments, it is necessary to continuously calibrate and optimize the EQ and CDR parameters. One-time optimization processes such as pre-determined values or training patterns cannot cope with VT variations or transistor aging that may occur during operation, ultimately leading to performance degradation over time. In addition, since CDR and equalization do not operate independently, the optimal points when each circuit is applied separately may differ from the optimal point when both circuits are used simultaneously. Therefore, simultaneous optimization of both circuits is essential to ensure the maximum performance of the receiver.

There are multiple adaptation loops: Bdlev, CDR, and N-tap DFE. If the interference among loops is severe, it may adversely affect the joint operation, causing instability. The interference is reduced by separating the bandwidths of the loops widely. Since Bdlev determines the operating speed of the entire loop, its loop bandwidth should be set the highest. The bandwidths of the other loops should then be determined based on two criteria. First, the more a loop interferes with the optimization of the other loops, the lower its bandwidth should be. From Equation (4.5), it can be seen that the sampling timing affects the values of all cursors and hence the optimal DFE coefficients. Therefore, based on the first criterion, the CDR loop bandwidth should be set the lowest. Second, the larger the cursor magnitude, the faster the loop should be. Therefore, the loop bandwidths of the DFE taps should be ordered accordingly, with the tap that cancels the largest cursor being the fastest.

Figure 4.15 presents the results of a behavior simulation conducted to verify the joint operation of sampling phase optimization and DFE coefficient optimization. A channel with 13-dB loss at the Nyquist frequency was utilized, assuming a 2-tap DFE. The GMET adaptation algorithm was used exclusively, and a comparison was made between cases with and without a DFE to observe differences in the presence of post-cursors. With the DFE applied, simultaneous adaptation of multiple taps was performed successfully, effectively eliminating the post-cursors $(h_1, h_2)$ covered by the 2-tap DFE, as shown in Figure 4.15 (a). Notably, when the DFE was applied, the sampling phase settled before the maximum of $h_0$, at the point that maximizes $|h_0 - h_{-1}|$, because the Bdlev tracked by the loop approximates this quantity once the post-cursors are removed. Figure 4.15 (b) confirms that this settling point is close to the actual maximum VEH. Furthermore, the DFE effectively eliminated the post-cursors, resulting in an approximately 7 times increase in VEH compared to the case without the DFE.



Figure 4.15: Lock point comparison of CDR with and without simultaneous 2-tap DFE adaptation using GMET on (a) SBR and (b) eye diagram.

# 4.5 Circuit Implementation



Figure 4.16: Block diagram of the prototype receiver.

Figure 4.16 shows the entire block diagram of the prototype receiver. The chip consists of three parts: data path, clock path and synthesized digital logic.

Figure 4.17 shows the fabricated die photograph and the total power consumption of the prototype receiver. The chip is fabricated in 28-nm CMOS technology and the active area is 0.106 mm<sup>2</sup> (Figure 4.17 (a)). The chip consumes 47 mW: 31 mW for the data path, 10 mW for the clock path, and 6 mW for the synthesized digital logic. The detailed power consumption of the sub-blocks is described in Figure 4.17 (b).




Figure 4.17: (a) Die photograph and (b) power consumption.

#### **Data Path**

The data path includes a termination circuit, CTLE, DFE, a deserializer (DES), and digital-to-analog converters (DACs).

The CTLE is designed with the Cherry-Hooper topology to cancel early and long-tail post-cursor ISI with a high-frequency gain boost [15]. The CTLE has adjustable degeneration resistance ( $R_{CTLE}$ ) and feedback resistance ( $V_{CTRL}$ ), as shown in Figure 4.18 (a). Figure 4.18 (b) and (c) present the simulated AC response for each resistance value. Figure 4.18 (b) shows the results when the degeneration resistance is adjusted. As the resistance value increases, the DC gain decreases, resulting in increased gain boosting at the Nyquist frequency relative to DC. Figure 4.18 (c) shows the results when the feedback resistance is adjusted. The feedback delay is controlled by the feedback resistance, enabling gain boosting at the corresponding frequency ( $\frac{1}{2\pi RC}$ ).

Figure 4.19 shows a schematic of the comparator and simulation results. The reference voltage of each error comparator is adapted individually, not only to attain the desired Bdlev but also to calibrate the mismatch and offset of each comparator. Quarter-rate clocking is employed for the robust operation of the comparators, providing sufficient setup and reset time.

The DFE block is composed of 4 sets of a CML summer, with 2-tap feedback, as shown in Figure 4.20 (a), and two StrongARM comparators, one for data and the other for error, as shown in Figure 4.20 (b). In order to guarantee a sufficient feedback timing margin, the  $1^{st}$  tap data is fed back as return-to-zero data directly from the comparator, while the  $2^{nd}$  tap data is fed back as non-return-to-zero data after passing through SR latches.

The DACs are of three types: 8-bit thermometer-coded differential current DACs for stable adaptation, 5-bit binary-coded voltage DACs for manual offset calibration of the data samplers, and a 64-bit one-hot coded R-ladder for the CTLE.



Figure 4.18: (a) Schematic of CTLE with Cherry-Hooper topology and AC simulation corresponding to (b)  $R_{CTRL}$  and (c)  $V_{CTRL}$ .





Figure 4.19: Schematic of (a) the comparator and simulation results: (b) Delay for  $1^{st}$  and  $2^{nd}$  tap feedback data and (c) Offset Monte Carlo simulation.



Figure 4.20: Schematic of (a) CML summer with 2-tap feedback and (b) StrongARM comparator.

#### **Clocking Path**

The clocking path is designed as a forwarded clocking architecture to improve jitter tolerance [8]. A 14-GHz external clock is divided into a 7-GHz 4-phase clock at the IQ divider, and the phase is finely controlled by a phase interpolator [48] with 2-bit Gray-coded MSBs and 32-bit thermometer-coded LSBs, as shown in Figure 4.21.

### Synthesized Digital Logic

The synthesized digital logic consists of a digital loop filter (DLF) for Bdlev adaptation, GMET logic, and I<sup>2</sup>C logic for communication with a PC.



Figure 4.21: Schematic of (a) PI and simulation results: (b) delay and (c) linearity.

## 4.6 Measurement Results



Figure 4.22: Measurement setup.

The measurement setup for the receiver is depicted in Figure 4.22. Digital codes, such as Bdlev codes, DFE weight codes, and PI codes, are extracted via Aardvark I<sup>2</sup>C. The 28-Gb/s differential input data and 14-GHz differential clocks are generated by the pattern generator (MU183020A). The input channel consists of SMA cables with the channel emulation board and an FR4 trace. Figure 4.23 illustrates the measured losses of 23.8 dB at 14 GHz for the SMA cable with the channel emulation board and 3.9 dB at 14 GHz for the FR4 trace. The total insertion loss is 27.7 dB at 14 GHz.

Recovered data and clocks are de-multiplexed inside the chip and sent to the error detector (MU183040B) and the oscilloscope (MSO73304DX), respectively. The jitter tolerance at a BER of $10^{-12}$, measured at the error detector within the equipment limit, is depicted in Figure 4.25. The recovered 7-GHz clock and its jitter histogram are shown in Figure 4.26. The RMS jitter and peak-to-peak jitter of the clock are measured as 1.69 ps and 16.2 ps, respectively.



Figure 4.23: Measured insertion channel losses: SMA cable, channel emulation board and FR4 trace.



Figure 4.24: Measured eye diagrams (a) from PPG, (b) after the channel.



Figure 4.25: Measured jitter tolerance curve for  $BER < 10^{-12}$ .



Figure 4.26: Measured 7-GHz recovered clock and its jitter histogram.

Figure 4.27 plots the measured bathtub curves over the sampling phase and the CDR lock points. Due to the significant insertion loss of 27.7 dB, no clock phase yields a BER below $10^{-12}$ without the DFE, and the CDR locks at an incorrect sampling phase. With the simultaneous adaptation using the proposed algorithm, a BER lower than $10^{-12}$ is measured over 0.17 UI, and the CDR locks at the correct sampling phase.



Figure 4.27: Measured bathtub curves without DFE and with DFE.

Figure 4.28 plots Bdlev codes over the sampling phase, both with and without the DFE. It is confirmed that the CDR locks at the point where the Bdlev is maximal. The joint operation of the CDR and DFE adaptation makes the CDR lock to a point where the BER is lower than $10^{-12}$. Table 4.1 summarizes the measured performance of the proposed receiver in comparison with other recent adaptive receivers. The comparison shows that the proposed receiver achieves a superior figure-of-merit (FoM), which is defined as energy efficiency divided by channel loss at the Nyquist frequency. The receiver achieves an FoM of 0.061 pJ/b/dB.
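The reported energy efficiency and FoM follow directly from the measured power, data rate, and channel loss. The short calculation below uses only the figures stated in this chapter; the variable names are illustrative.

```python
power_mw = 47.0             # total measured power
data_rate_gbps = 28.0       # data rate
loss_cable_board_db = 23.8  # SMA cable and channel emulation board at 14 GHz
loss_fr4_db = 3.9           # FR4 trace at 14 GHz

energy_pj_per_bit = power_mw / data_rate_gbps    # mW / (Gb/s) = pJ/b
total_loss_db = loss_cable_board_db + loss_fr4_db
fom = energy_pj_per_bit / total_loss_db          # pJ/b/dB

print(round(energy_pj_per_bit, 2), round(total_loss_db, 1), round(fom, 3))
# prints: 1.68 27.7 0.061
```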



Figure 4.28: Measured Bdlev codes over sampling phases without DFE and with DFE.

|                          | This Work        | JSSC 13 [38]            | TCAS-I 17 [42]      | TVLSI 20 [44]    | SOVC 20 [45]     | JSSC 20 [15]      |
|--------------------------|------------------|-------------------------|---------------------|------------------|------------------|-------------------|
| CMOS Process [nm]        | 28               | 65 LP                   | 40                  | 28               | 28               | 59                |
| Data Rate [Gb/s]         | 28               | 5                       | 28                  | 26               | 28               | 10.8              |
| Channel Loss [dB]        | 27.7             | 15                      | 25***               | 23.5             | 20               | 34                |
| CDR Method               | GMET             | BER count               | Pattern-filtered MM | MET              | MM w/ MET        | aa                |
| Equalization             | CTLE, 2-tap DFE* | 1-tap DFE*, IIR filter* | CTLE*, 1-tap DFE*   | CTLE, 4-tap DFE* | CTLE, 2-tap DFE* | CTLE*, 2-tap DFE* |
| EQ Adaptation Method     | GMET             | BER                     | SSEOM               | SSLMS            | SSLMS            | SSLMS             |
| Area [mm<sup>2</sup>]    | 0.106            | 0.0036                  | 0.26                | 0.089            | 0.108            | 0.174             |
| Energy Efficiency [pJ/b] | 1.68             | 1.82                    | 1.57                | 3.35             | 2.02             | 3.4               |
| FoM** [pJ/b/dB]          | 0.061            | 0.121                   | 0.063               | 0.142            | 0.101            | 0.091             |

Table 4.1: Performance Summary and Comparison.

\* Adaptation

\*\* Energy efficiency / channel loss at Nyquist frequency

\*\*\* One-tap TX pre-emphasis included

# **CHAPTER 5**

# CONCLUSION

In this dissertation, a gradient maximum-eye-tracking algorithm is proposed for high-speed receivers. The proposed algorithm has demonstrated that the simultaneous adaptation of CDR and equalization is essential to maximize the performance of the receiver and reduce power dissipation and area utilization. In response to the increasing demand for bandwidth extension and the corresponding challenges of frequency-dependent loss and energy dissipation, the proposed algorithm presents a method for improving the performance of high-speed receivers.

CDR and equalization are essential for wireline receivers to recover data from the received signal without errors. Their importance is even greater in high-speed operation, where slight mismatches, attenuation, and noise can degrade performance significantly. In Chapter 2, the structures and characteristics of high-speed wireline receivers are explained. In addition, the roles that CDR and equalization must fulfill and the considerations for implementing each circuit are discussed.

CDR and equalization circuits have been studied extensively, not only for wireline links but also in many other communication domains, and this research remains active. Chapter 3 presents prior work on CDR and EQ adaptation. CDR methods are classified into oversampling CDR, baud-rate sampling CDR, and sub-rate sampling CDR. The operating principles and advantages of these methodologies are reviewed, covering the most traditional and widely used CDRs, such as the bang-bang CDR and the Mueller-Müller CDR, as well as more recent works such as blind oversampling CDR and sub-rate CDRs. For EQ adaptation, algorithms that adapt multiple coefficients simultaneously are discussed in terms of their operating principles and their pros and cons. These include the prominent LMS algorithm, BER-based adaptation, which finds the optimal value of each receiver coefficient by measuring bit errors, and EOM-based adaptation, which optimizes the coefficients by searching for the widest eye margin.
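To make the LMS family mentioned above concrete, the following is a minimal behavioral sketch of the sign-sign variant (SS-LMS) adapting DFE taps on a toy NRZ link. It is not taken from the dissertation: the pulse response `[1.0, 0.4, 0.2]`, the step size `mu`, the assumption that the data level equals the main cursor, and all function names are hypothetical choices made for illustration.

```python
import random


def sign(x):
    """Return +1, -1, or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


def ss_lms_dfe(channel, num_symbols=20000, mu=1e-3, seed=0):
    """Adapt DFE taps with sign-sign LMS on a toy NRZ link.

    channel: hypothetical pulse response [h0, h1, ..., hK], main cursor first.
    Returns the adapted taps [w1, ..., wK].
    """
    rng = random.Random(seed)
    num_taps = len(channel) - 1
    taps = [0.0] * num_taps
    sent = [1] * num_taps      # previously transmitted symbols, newest first
    decided = [1] * num_taps   # previous decisions fed back, newest first
    d_lev = channel[0]         # assumed-known data level (main cursor)
    for _ in range(num_symbols):
        d = rng.choice((-1, 1))
        # Received sample: main cursor plus post-cursor ISI.
        y = channel[0] * d + sum(h * s for h, s in zip(channel[1:], sent))
        # The DFE subtracts the ISI estimated from past decisions.
        z = y - sum(w * q for w, q in zip(taps, decided))
        d_hat = 1 if z >= 0 else -1
        err = z - d_lev * d_hat
        # Sign-sign update: only the signs of the error and the data are used.
        taps = [w + mu * sign(err) * q for w, q in zip(taps, decided)]
        sent = [d] + sent[:-1]
        decided = [d_hat] + decided[:-1]
    return taps


taps = ss_lms_dfe([1.0, 0.4, 0.2])
print([round(w, 2) for w in taps])
```

In this toy setting the taps settle close to the post-cursor values of the assumed pulse response, dithering by a few multiples of `mu` around them, which is the usual behavior of a sign-sign loop.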

While research on CDR and EQ adaptation [15, 19–43] has provided valuable insights, several limitations remain. Some works optimize only the CDR or only the EQ circuit, while others that attempt simultaneous adaptation suffer from excessive adaptation time or require additional hardware. Recently, maximum-eye-tracking (MET) algorithms [44, 45] were proposed, which use the slope of the data eye for simultaneous adaptation. However, they either require separate hardware for eye slope measurement [44] or increase the complexity of the digital blocks by applying separate algorithms for CDR and EQ adaptation [45]. Chapter 4 provides an analysis of the gradient maximum-eye-tracking algorithm, which overcomes the drawbacks of the existing MET while retaining its advantages. By applying the well-known gradient ascent algorithm to MET, the bandwidth of the loop can be adjusted dynamically, improving the performance of the receiver while also enhancing the stability of its operation. Furthermore, by employing a shared circuit and algorithm for simultaneous adaptation, design complexity and power dissipation are significantly reduced. The algorithm's performance is validated through behavioral simulations under various scenarios, and the measurement results confirm its effectiveness. The prototype receiver is fabricated in 28-nm CMOS and occupies an active area of 0.106 mm<sup>2</sup>. At the data rate of 28 Gb/s, the receiver consumes 47 mW, corresponding to an energy efficiency of 1.68 pJ/b. Furthermore, with a channel loss of 27.7 dB at the Nyquist frequency, joint operation of the CDR and the DFE adaptation offers measured results with a margin of 0.17 UI at a BER lower than  $10^{-12}$ .
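The idea of driving both the sampling phase and an equalizer coefficient with one gradient ascent loop can be illustrated with a small behavioral sketch. It is only a toy under stated assumptions, not the implementation described in Chapter 4: the concave `eye_height` model, its peak location, the finite-difference gradient estimate, and the step size `mu` are all hypothetical.

```python
def eye_height(phase_ui, tap):
    """Hypothetical concave eye-height model peaking at 0.10 UI and tap 0.30."""
    return 1.0 - 4.0 * (phase_ui - 0.10) ** 2 - 2.0 * (tap - 0.30) ** 2


def gradient(f, params, delta=1e-3):
    """Central finite-difference estimate of the gradient of f at params."""
    grads = []
    for i in range(len(params)):
        up = list(params)
        dn = list(params)
        up[i] += delta
        dn[i] -= delta
        grads.append((f(*up) - f(*dn)) / (2 * delta))
    return grads


def gradient_ascent(f, params, mu=0.05, steps=200):
    """Update all parameters together with one shared gradient ascent rule."""
    params = list(params)
    step_sizes = []
    for _ in range(steps):
        g = gradient(f, params)
        params = [p + mu * gi for p, gi in zip(params, g)]
        # Record the largest update this iteration to show how steps shrink.
        step_sizes.append(max(abs(mu * gi) for gi in g))
    return params, step_sizes


final, step_sizes = gradient_ascent(eye_height, [-0.30, 0.0])
print("phase = %.3f UI, tap = %.3f" % (final[0], final[1]))
print("first step = %.4f, last step = %.2e" % (step_sizes[0], step_sizes[-1]))
```

Because each update is proportional to the local slope, the steps are large far from the peak and shrink near it. This is one way to read the dynamically adjusted loop bandwidth mentioned above, although the actual hardware loop may differ considerably from this sketch.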

# **Bibliography**

- [1] D. Reinsel, J. Gantz, and J. Rydning, "Data Age 2025: The Digitization of the World from Edge to Core," *Seagate*, 2018.
- [2] PCI SIG, "PCI Express 7.0 Specification," 2022. https://pcisig.com/specifications/pci-express-70-specification.
- [3] J. Lee, "Design of High-Speed Receiver for Video Interface with Adaptive Equalization," *Thesis*, 2019.
- [4] S. Mirabbasi, L. C. Fujino, and K. C. Smith, "Through the Looking Glass—The 2023 Edition: Trends in solid-state circuits from ISSCC," *IEEE Solid-State Circuits Magazine*, vol. 15, no. 1, pp. 45–62, 2023.
- [5] W. J. Dally and J. W. Poulton, *Digital Systems Engineering*. Cambridge University Press, 1998.
- [6] B. Razavi, "Challenges in the Design of High-Speed Clock and Data Recovery Circuits," *IEEE Communications Magazine*, vol. 40, no. 8, pp. 94–101, 2002.
- [7] C. E. Shannon, "Communication in the presence of noise," *Proceedings of the IEEE*, vol. 72, no. 9, pp. 1192–1201, 1984.
- [8] W. Bae, G.-S. Jeong, K. Park, S.-Y. Cho, Y. Kim, and D.-K. Jeong, "A 0.36 pJ/bit, 0.025 mm<sup>2</sup>, 12.5 Gb/s Forwarded-Clock Receiver With a Stuck-Free Delay-Locked Loop and a Half-Bit Delay Line in 65-nm CMOS Technology," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 63, no. 9, pp. 1393–1403, 2016.

- [9] M. Hossain, F. Aquil, P. S. Chau, B. Tsang, P. Le, J. Wei, T. Stone, B. Daly, C. Tran, J. C. Eble, *et al.*, "A Fast-Lock, Jitter Filtering All-Digital DLL based Burst-Mode Memory Interface," *IEEE Journal of Solid-State Circuits*, vol. 49, no. 4, pp. 1048–1062, 2014.
- [10] A. Manian and B. Razavi, "A 40-Gb/s 9.2-mW CMOS Equalizer," in 2015 Symposium on VLSI Circuits (VLSI Circuits), pp. C226–C227, IEEE, 2015.
- [11] S. Shekhar, J. S. Walling, and D. J. Allstot, "Bandwidth Extension Techniques for CMOS Amplifiers," *IEEE Journal of Solid-State Circuits*, vol. 41, no. 11, pp. 2424–2439, 2006.
- [12] S. Galal and B. Razavi, "10-Gb/s Limiting Amplifier and Laser/Modulator Driver in 0.18-μm CMOS Technology," *IEEE Journal of Solid-State Circuits*, vol. 38, no. 12, pp. 2138–2146, 2003.
- [13] C. D. Holdenried, J. W. Haslett, and M. W. Lynch, "Analysis and Design of HBT Cherry-Hooper Amplifiers with Emitter-Follower Feedback for Optical Communications," *IEEE Journal of Solid-State Circuits*, vol. 39, no. 11, pp. 1959–1967, 2004.
- [14] J. Kim, J.-K. Kim, B.-J. Lee, and D.-K. Jeong, "Design Optimization of On-Chip Inductive Peaking Structures for 0.13-μm CMOS 40-Gb/s Transmitter Circuits," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 56, no. 12, pp. 2544–2555, 2009.
- [15] J. Lee, K. Lee, H. Kim, B. Kim, K. Park, and D.-K. Jeong, "A 0.1-pJ/b/dB 1.62-to-10.8-Gb/s Video Interface Receiver With Jointly Adaptive CTLE and DFE Using Biased Data-Level Reference," *IEEE Journal of Solid-State Circuits*, vol. 55, no. 8, pp. 2186–2195, 2020.

- [16] A. Atharav and B. Razavi, "A 56-Gb/s 50-mW NRZ Receiver in 28-nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 57, no. 1, pp. 54–67, 2021.
- [17] S. Lee, J. Kim, and D.-K. Jeong, "Feedforward Cherry-Hooper Continuous Time Linear Equalizer in 28-nm CMOS," in 2022 37th International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC), pp. 507–510, IEEE, 2022.
- [18] W. Bae, B. Nikolić, and D.-K. Jeong, "Use of Phase Delay Analysis for Evaluating Wideband Circuits: An Alternative to Group Delay Analysis," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 25, no. 12, pp. 3543–3547, 2017.
- [19] C. Kromer, G. Sialm, C. Menolfi, M. Schmatz, F. Ellinger, and H. Jackel, "A 25-Gb/s CDR in 90-nm CMOS for High-Density Interconnects," *IEEE Journal of Solid-State Circuits*, vol. 41, no. 12, pp. 2921–2929, 2006.
- [20] F. A. Musa and A. C. Carusone, "Modeling and Design of Multilevel Bang-Bang CDRs in the Presence of ISI and Noise," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 54, no. 10, pp. 2137–2147, 2007.
- [21] H.-J. Jeon, R. Kulkarni, Y.-C. Lo, J. Kim, and J. Silva-Martinez, "A Bang-Bang Clock and Data Recovery Using Mixed Mode Adaptive Loop Gain Strategy," *IEEE Journal of Solid-State Circuits*, vol. 48, no. 6, pp. 1398–1415, 2013.
- [22] P. K. Hanumolu, M. G. Kim, G.-Y. Wei, and U.-k. Moon, "A 1.6 Gbps Digital Clock and Data Recovery Circuit," in *IEEE Custom Integrated Circuits Conference 2006*, pp. 603–606, IEEE, 2006.
- [23] C.-K. K. Yang, R. Farjad-Rad, and M. A. Horowitz, "A 0.5-μm CMOS 4.0-Gbit/s Serial Link Transceiver with Data Recovery Using Oversampling," *IEEE Journal of Solid-State Circuits*, vol. 33, no. 5, pp. 713–722, 1998.

- [24] J. Kim and D.-K. Jeong, "Multi-Gigabit-Rate Clock and Data Recovery based on Blind Oversampling," *IEEE Communications Magazine*, vol. 41, no. 12, pp. 68–74, 2003.
- [25] K. Mueller and M. Müller, "Timing Recovery in Digital Synchronous Data Receivers," *IEEE Transactions on Communications*, vol. 24, no. 5, pp. 516–531, 1976.
- [26] T. Liu, T. Li, F. Lv, B. Liang, X. Zheng, H. Wang, M. Wu, D. Lu, and F. Zhao, "Analysis and Modeling of Mueller-Müller Clock and Data Recovery Circuits," *Electronics*, vol. 10, no. 16, p. 1888, 2021.
- [27] F. A. Musa, *High-Speed Baud-Rate Clock Recovery*. University of Toronto, 2008.
- [28] M. Q. Le, P. J. Hurst, and J. P. Keane, "An Adaptive Analog Noise-Predictive Decision-Feedback Equalizer," *IEEE Journal of Solid-State Circuits*, vol. 37, no. 2, pp. 105–113, 2002.
- [29] P. M. Aziz and S. Surendran, "Symbol Rate Timing Recovery for Higher Order Partial Response Channels," *IEEE Journal on Selected Areas in Communications*, vol. 19, no. 4, pp. 635–648, 2001.
- [30] F. A. Musa and A. C. Carusone, "A Baud-Rate Timing Recovery Scheme with a Dual-Function Analog Filter," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 53, no. 12, pp. 1393–1397, 2006.
- [31] P. Liu, J. Guo, and Y. Jiang, "Half Baud-Rate, Low BER PAM-4 CDR Based on SS-MMSE Algorithm," *Electronics Letters*, vol. 52, no. 25, pp. 2036–2038, 2016.
- [32] D. Kim, W.-S. Choi, A. Elkholy, J. Kenney, and P. K. Hanumolu, "A 15-Gb/s Sub-Baud-Rate Digital CDR," *IEEE Journal of Solid-State Circuits*, vol. 54, no. 3, pp. 685–695, 2019.

- [33] Y.-H. Yang, M. Tzou, and T.-C. Lee, "A 6.0-11.0-Gb/s Reference-Less Sub-Baud-Rate Linear CDR with Wide-Range Frequency Acquisition Technique," *IEEE Transactions on Circuits and Systems II: Express Briefs*, 2022.
- [34] S. S. Haykin, *Adaptive Filter Theory*. Pearson Education India, 2002.
- [35] S. Haykin and B. Widrow, *Least-Mean-Square Adaptive Filters*. Wiley Online Library, 2003.
- [36] V. Stojanovic, A. Ho, B. W. Garlepp, F. Chen, J. Wei, G. Tsang, E. Alon, R. T. Kollipara, C. W. Werner, J. L. Zerbe, *et al.*, "Autonomous Dual-Mode (PAM2/4) Serial Link Transceiver with Adaptive Equalization and Data Recovery," *IEEE Journal of Solid-State Circuits*, vol. 40, no. 4, pp. 1012–1026, 2005.
- [37] E.-H. Chen, J. Ren, B. Leibowitz, H.-C. Lee, Q. Lin, K. S. Oh, F. Lambrecht, V. Stojanovic, J. Zerbe, and C.-K. K. Yang, "Near-Optimal Equalizer and Timing Adaptation for I/O Links Using a BER-Based Metric," *IEEE Journal of Solid-State Circuits*, vol. 43, no. 9, pp. 2144–2156, 2008.
- [38] S. Son, H.-S. Kim, M.-J. Park, K. Kim, E.-H. Chen, B. Leibowitz, and J. Kim, "A 2.3-mW, 5-Gb/s Low-Power Decision-Feedback Equalizer Receiver Front-End and its Two-Step, Minimum Bit-Error-Rate Adaptation Algorithm," *IEEE Journal of Solid-State Circuits*, vol. 48, no. 11, pp. 2693–2704, 2013.
- [39] B. Analui, A. Rylyakov, S. Rylov, M. Meghelli, and A. Hajimiri, "A 10Gb/s Two-Dimensional Eye-Opening Monitor in 0.13 μm Standard CMOS," *IEEE Journal of Solid-State Circuits*, vol. 40, no. 12, 2005.
- [40] T. Suttorp and U. Langmann, "A 10-Gb/s CMOS Serial-Link Receiver Using Eye-Opening Monitoring for Adaptive Equalization and for Clock and Data Recovery," in 2007 IEEE Custom Integrated Circuits Conference, pp. 277–280, IEEE, 2007.

- [41] H. Noguchi, N. Yoshida, H. Uchida, M. Ozaki, S. Kanemitsu, and S. Wada, "A 40-Gb/s CDR Circuit with Adaptive Decision-Point Control Based on Eye-Opening Monitor Feedback," *IEEE Journal of Solid-State Circuits*, vol. 43, no. 12, pp. 2929–2938, 2008.
- [42] H. Won, J.-Y. Lee, T. Yoon, K. Han, S. Lee, J. Park, and H.-M. Bae, "A 28-Gb/s Receiver With Self-contained Adaptive Equalization and Sampling Point Control Using Stochastic Sigma-Tracking Eye-Opening Monitor," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 64, no. 3, pp. 664–674, 2017.
- [43] D. Yun, E. Lee, W. Jung, K. Kim, K.-M. Beak, J. Kim, H.-B. Lee, B. Ko, W.-S. Choi, and D.-K. Jeong, "A 32-Gb/s PAM4-Binary Bridge With Sampler Offset Cancellation for Memory Testing," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 69, no. 9, pp. 3749–3753, 2022.
- [44] H.-Y. Joo, J. Lee, H. Ju, H.-G. Ko, J. M. Yoon, B. Kang, and D.-K. Jeong, "A Maximum-Eye-Tracking CDR With Biased Data-Level and Eye Slope Detector for Near-Optimal Timing Adaptation," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 28, no. 12, pp. 2708–2720, 2020.
- [45] M.-C. Choi, H.-G. Ko, J. Oh, H.-Y. Joo, K. Lee, and D.-K. Jeong, "A 0.1-pJ/b/dB 28-Gb/s Maximum-Eye Tracking, Weight-Adjusting MM CDR and Adaptive DFE with Single Shared Error Sampler," in 2020 IEEE Symposium on VLSI Circuits, pp. 1–2, 2020.
- [46] S. Lee, B. Kang, W. Rhee, and D.-K. Jeong, "A 0.061-pJ/b/dB 28-Gb/s Gradient-Based Maximum Eye Tracking CDR with 2-Tap DFE Adaptation in 28-nm CMOS," *IEEE Transactions on Circuits and Systems II: Express Briefs*, 2023.
- [47] H. B. Curry, "The Method of Steepest Descent for Non-linear Minimization Problems," *Quarterly of Applied Mathematics*, vol. 2, no. 3, pp. 258–261, 1944.

- [48] J. F. Bulzacchelli, M. Meghelli, S. V. Rylov, W. Rhee, A. V. Rylyakov, H. A. Ainspan, B. D. Parker, M. P. Beakes, A. Chung, T. J. Beukema, *et al.*, "A 10-Gb/s 5-tap DFE/4-tap FFE Transceiver in 90-nm CMOS Technology," *IEEE Journal of Solid-State Circuits*, vol. 41, no. 12, pp. 2885–2900, 2006.
- [49] R. Dokania, A. Kern, M. He, A. Faust, R. Tseng, S. Weaver, K. Yu, C. Bil, T. Liang, and F. O'Mahony, "10.5 A 5.9pJ/b 10Gb/s serial link with unequalized MM-CDR in 14nm tri-gate CMOS," in 2015 IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, pp. 1–3, 2015.
- [50] Cisco, "Cisco annual internet report (2018–2023) white paper," *Cisco: San Jose, CA, USA*, vol. 10, no. 1, pp. 1–35, 2020.
- [51] K. Kundert, "Modeling Jitter in PLL-based Frequency Synthesizers," 2006. https://www.designers-guide.org.
- [52] Telcordia, "Synchronous Optical Network (SONET) Transport Systems: Common Generic Criteria."
- [53] OIF, "OIF-FD-CEI-224G-01.0 Next Generation CEI-224G Framework," 2022. https://www.oiforum.com/wp-content/uploads/OIF-FD-CEI-224G-01.0.pdf.
- [54] T. J. Ham, S. J. Jung, S. Kim, Y. H. Oh, Y. Park, Y. Song, J.-H. Park, S. Lee, K. Park, J. W. Lee, and D.-K. Jeong, "A<sup>3</sup>: Accelerating Attention Mechanisms in Neural Networks with Approximation," in 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 328–341, 2020.
- [55] S. Lee and D.-K. Jeong, "Design of StrongARM Latch based High Speed Sampler for 32Gb/s Decision Feedback Equalizer," Conference of the Institute of Electronics and Information Engineers (대한전자공학회 학술대회), pp. 233–234, 2020.

- [56] S. Lee and D.-K. Jeong, "Design of StrongARM Latch based High Speed Sampler with Adjustable Reference Voltage," Conference of the Institute of Electronics and Information Engineers (대한전자공학회 학술대회), pp. 399–400, 2021.
- [57] H.-S. Choi, S. Roh, S. Lee, J.-H. Park, K. Lee, Y. Hwang, and D.-K. Jeong, "A 6b 48-GS/s Asynchronous 2b/cycle Time-Interleaved ADC in 28-nm CMOS," in 2021 18th International SoC Design Conference (ISOCC), pp. 127–128, 2021.
- [58] M.-C. Choi, S. Lee, S. Roh, K. Lee, J. Oh, S. Kim, K. Kim, W.-S. Choi, J. Kim, and D.-K. Jeong, "A 2.5–32 Gb/s Gen 5-PCIe Receiver With Multi-Rate CDR Engine and Hybrid DFE," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 69, no. 6, pp. 2677–2681, 2022.
- [59] J.-H. Park, K.-H. Lee, Y. Lee, J.-W. Sull, Y. Song, S. Lee, H. Lee, H. Cho, J. Oh, H.-G. Ko, and D.-K. Jeong, "A 68.7-fJ/b/mm 375-GB/s/mm Single-Ended PAM-4 Interface with Per-Pin Training Sequence for the Next-Generation HBM Controller," in 2022 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), pp. 150–151, 2022.
- [60] B. Kang, W. Jung, H. Kim, S. Lee, and D.-K. Jeong, "A 42Gb/s PAM-8 Transmitter with Feed-Forward Tomlinson-Harashima Precoding in 28nm CMOS," in 2022 IEEE Asian Solid-State Circuits Conference (A-SSCC), pp. 1–3, 2022.
- [61] J.-H. Park, H. Lee, H. Cho, S. Lee, K.-H. Lee, H.-G. Ko, and D.-K. Jeong, "A 32Gb/s/pin 0.51 pJ/b Single-Ended Resistor-less Impedance-Matched Transmitter with a T-Coil-Based Edge-Boosting Equalizer in 40nm CMOS," in 2023 IEEE International Solid-State Circuits Conference (ISSCC), pp. 410–412, 2023.
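
## Korean Abstract (국문초록)

This dissertation presents a study on high-speed wireline receivers that employ adaptive equalization and clock-and-data recovery. An introduction to the structural characteristics of high-speed wireline receivers and an analysis of related prior work provide circuit-level insights. In addition, a gradient maximum-eye-tracking algorithm is proposed that allows the performance of a high-speed receiver to be maximized. By combining the widely used gradient ascent method with the maximum-eye-tracking algorithm, the performance of the receiver can be optimized. Moreover, simultaneous optimization with a common algorithm lowers both power consumption and design complexity. The prototype receiver is fabricated in a 28-nm CMOS process and occupies an area of 0.106 mm<sup>2</sup>. It includes a baud-rate clock-and-data recovery circuit and a 2-tap DFE and, compared with previous receivers, shows superior performance even under high-loss conditions. At a data rate of 28 Gb/s, it consumes 47 mW, corresponding to an energy efficiency of 1.68 pJ/b. Furthermore, when measured with a channel having 27.7 dB of loss at the Nyquist frequency, simultaneous optimization of the clock-and-data recovery and the DFE through the proposed gradient maximum-eye-tracking algorithm yields a measured bit error rate below 10<sup>-12</sup> over a region of 0.17 UI of the eye.

**Keywords**: clock-and-data recovery, equalization, high-speed communication, wireline communication, maximum eye tracking, adaptive equalization, receiver

**Student Number**: 2018-29272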