



**Ph.D. Dissertation**

# Design of High-Speed Power-Efficient Transmitter with Time-Based Equalization

Design of a High-Speed, Low-Power Transmitter Including a Time-Based Equalizer

by

**Chan-Ho Kye** 

August, 2021

Department of Electrical and Computer Engineering, College of Engineering, Seoul National University

# Design of High-Speed Power-Efficient Transmitter with Time-Based Equalization

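Advisor: Professor Deog-Kyoon Jeong

Submitted as a dissertation for the degree of Doctor of Philosophy in Engineering, August 2021

Graduate School of Seoul National University, Department of Electrical and Computer Engineering

Chan-Ho Kye

The doctoral dissertation of Chan-Ho Kye is hereby approved, August 2021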



# Design of High-Speed Power-Efficient Transmitter with Time-Based Equalization

by

Chan-Ho Kye

A Dissertation Submitted to the Department of Electrical and Computer Engineering in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

at

#### SEOUL NATIONAL UNIVERSITY

August, 2021

Committee in Charge:

Professor Dongsuk Jeon, Chairman

Professor Deog-Kyoon Jeong, Vice-Chairman

Professor Woo-Seok Choi

Professor Yongsam Moon

Professor Jaeduk Han

### Abstract

In this thesis, the design of a high-speed, power-efficient wireline transmitter is reported. An energy-efficient voltage-mode transmitter with an unsegmented output driver equalizes channel loss in the time domain based on phase delay analysis. By modulating the phase of the transmitting clock rather than the serialized data stream, the proposed transmitter significantly reduces the data-dependent jitter. The horizontal eye-opening is improved by compensating for the zero-crossing time variation that depends on the run-length of the transmitted data. The proposed scheme significantly reduces the driver complexity by eliminating many driver slices that consume significant signaling and switching power. The prototype chip has been fabricated in a 28-nm CMOS process and occupies an active area of $0.045 \text{ mm}^2$. The measured results show that the proposed transmitter achieves an energy efficiency of 0.95 pJ/b at 22 Gb/s with an output swing of 440 mV<sub>ppd</sub> at a 1.0 V supply. In addition, the peak-to-peak jitter is reduced from 34 ps to 20 ps at 22 Gb/s with the proposed phase delay compensation over a channel with 15.0 dB loss.

**Keywords** : voltage-mode transmitter, time-based feed-forward equalizer (TB-FFE), phase delay, zero-crossing time, data-dependent jitter (DDJ), quarter-rate clocking, forwarded-clocking, NRZ.

Student Number : 2016-20861

# Contents

| Section |
|---|
| Abstract |
| Contents |
| List of Figures |
| List of Tables |
| Chapter 1 Introduction |
| 1.1 Motivation |
| 1.2 Thesis Organization |
| Chapter 2 Backgrounds |
| 2.1 Overview |
| 2.2 Feed-Forward Equalization |
| 2.2.1 Amplitude-Domain Equalization |
| 2.2.2 Phase-Domain Equalization |
| 2.2.3 Pulse-Width Modulation |
| 2.3 Adaptive Feed-Forward Equalization |
| 2.3.1 Amplitude-Domain Equalization |
| 2.3.2 Pulse-Width Modulation |
| Chapter 3 Design of the Time-Based Feed-Forward Equalization of the Transmitter |
| 3.1 Overview |
| 3.2 Basic Concept of Time-Based FFE |
| 3.2.1 Zero-Crossing Time |
| 3.2.2 Phase Delay |
| 3.2.3 Finding the Optimum Coefficient |
| 3.2.4 Comparison with Conventional FFE |
| 3.3 Adaptive Time-Based FFE |
| 3.3.1 Overview |
| 3.3.2 Behavioral Modeling |
| 3.3.3 Simulation Results |
| 3.4 Transmitter Implementation |
| 3.4.1 Overview |
| 3.4.2 Phase Modulation |
| 3.4.3 Serializer and Clock Path |
| Chapter 4 Measurement |
| 4.1 Overview |
| 4.2 Eye Diagram |
| 4.3 Power Consumption |
| Chapter 5 Conclusion |
| Bibliography |
| Abstract (in Korean) |

# **List of Figures**

| Figure |
|---|
| Fig. 1.1 Energy efficiency of recently published wireline links |
| Fig. 2.1 Block diagram of transmitter with amplitude-domain equalization |
| Fig. 2.2 Amplitude modulation of pre-emphasis and de-emphasis |
| Fig. 2.3 Transmission of a non-transition bit by the voltage-mode driver for amplitude equalization with impedance matching and with relaxed impedance matching |
| Fig. 2.4 Eye-diagrams and jitter histograms of the receiving end without phase-domain equalization and with phase-domain equalization |
| Fig. 2.5 Changes in two single-bit responses with different widths produced by phase-domain equalization |
| Fig. 2.6 Changes in clock pattern response and single-bit response produced by phase-domain equalization |
| Fig. 2.7 Representation of bits "1" and "0" using PWM pulses with duty cycle D |
| Fig. 2.8 Normalized response of a channel to PWM pulses |
| Fig. 2.9 Power spectral density of PWM pulses with different duty cycles |
| Fig. 2.10 Adaptive equalization of transmitter |
| Fig. 2.11 Propagation-time detection timing diagram |
| Fig. 2.12 Block diagram of the transmitter with adaptive tap coefficients |
| Fig. 2.13 Adaptive phase-domain equalization of transmitter |
| Fig. 2.14 The transceiver operating in the calibration mode |
| Fig. 3.1 Zero-crossing time variation due to transmitted data |
| Fig. 3.2 RC low-pass filter channel under analysis |
| Fig. 3.3 Linear phase shifter |
| Fig. 3.4 Calculated waveforms of input and output of the linear phase shifter when C is non-zero |
| Fig. 3.5 Calculated waveforms of input and output of the linear phase shifter when C is zero |
| Fig. 3.6 Calculated waveforms of input and output of the $3^{\text{rd}}$-order polynomial phase shifter with same phase delay |
| Fig. 3.7 Calculated waveforms of input and output of the $3^{\text{rd}}$-order polynomial phase shifter with same group delay |
| Fig. 3.8 Phase response of RC channel |
| Fig. 3.9 Phase delay of RC channel |
| Fig. 3.10 Comparison results of average zero-crossing time and phase delay with two previous bits |
| Fig. 3.11 Comparison results of average zero-crossing time and phase delay with three previous bits |
| Fig. 3.12 Conceptual diagram of FFE |
| Fig. 3.13 Conceptual diagram of TB-FFE |
| Fig. 3.14 Block diagram of behavioral modeling for TB-FFE |
| Fig. 3.15 Simulated eye-diagrams of receiver side on RC channel with FFE |
| Fig. 3.16 Simulated eye-diagrams of receiver side on RC channel with TB-FFE |
| Fig. 3.17 Real channel characteristics under simulation |
| Fig. 3.18 Real channel phase delay under simulation |
| Fig. 3.19 Simulated eye-diagrams of receiver side on real channel with FFE |
| Fig. 3.20 Simulated eye-diagrams of receiver side on real channel with TB-FFE |
| Fig. 3.21 Block diagram of behavioral modeling for adaptive TB-FFE |
| Fig. 3.22 Look-up table for the pattern of $\{D_{-4}, D_{-3}, D_{-2}, D_{-1}\}$ |
| Fig. 3.23 Simulated eye-diagram without equalization |
| Fig. 3.24 Simulated eye-diagram with equalization after 50 ns |
| Fig. 3.25 Simulated eye-diagram with equalization after 100 ns |
| Fig. 3.26 Simulated eye-diagram with equalization after 200 ns |
| Fig. 3.27 Simulated eye-diagram with equalization after 400 ns |
| Fig. 3.28 Simulated eye-diagram with equalization after 700 ns |
| Fig. 3.29 Simulated eye-diagram with equalization after 1.2 µs |
| Fig. 3.30 Simulated eye-diagram with equalization after 2 µs |
| Fig. 3.31 Comparison result of the eye-diagrams |
| Fig. 3.32 Settling behavior of the tap coefficients with 7-taps of TB-FFE |
| Fig. 3.33 Settling behavior of the tap coefficients with 3-taps of TB-FFE |
| Fig. 3.34 Overall architecture of proposed transmitter |
| Fig. 3.35 Output impedance of the active-feedback driver |
| Fig. 3.36 Phase control signal encoder |
| Fig. 3.37 Timing diagram of phase modulation |
| Fig. 3.38 Phase modulator and its timing diagram |
| Fig. 3.39 Delay coefficient of phase modulator across process variation |
| Fig. 3.40 Difference from the prior work |
| Fig. 3.41 8-to-4 clocked MUX |
| Fig. 3.42 4-to-1 clocked MUX and its timing diagram |
| Fig. 3.43 Post-layout simulated eye diagrams of S2D output and driver output at 20 Gb/s |
| Fig. 3.44 Circuit implementation of clock path |
| Fig. 4.1 Chip photomicrograph of the implemented transmitter |
| Fig. 4.2 Measurement setup |
| Fig. 4.3 Measured loss profile of FR4 trace, SMA connector and SMA cable |
| Fig. 4.4 Measured phase delay profile of channel |
| Fig. 4.5 Measured eye-diagram of transmitter output at 20 Gb/s without modulation |
| Fig. 4.6 Measured eye-diagram of transmitter output at 20 Gb/s with modulation |
| Fig. 4.7 Measured eye-diagram of receiver side on inserted channel at 20 Gb/s without TB-FFE |
| Fig. 4.8 Measured eye-diagram of receiver side on inserted channel at 20 Gb/s with 2-tap TB-FFE |
| Fig. 4.9 Measured eye-diagram of receiver side on inserted channel at 22 Gb/s without TB-FFE |
| Fig. 4.10 Measured eye-diagram of receiver side on inserted channel at 22 Gb/s with 2-tap TB-FFE |
| Fig. 4.11 Measured power breakdown of entire transmitter at 22 Gb/s |
| Fig. 4.12 Comparison of energy efficiency with recently reported voltage-mode transmitters across the data rate |

# **List of Tables**

| Table |
|---|
| Table 3.1 Table of zero-crossing time |
| Table 3.2 Summary of eye-opening on RC channel |
| Table 3.3 Summary of eye-opening on real channel |
| Table 4.1 Summary of phase delay at 20 Gb/s and 22 Gb/s |
| Table 4.2 Summary of peak-to-peak jitter improvement |
| Table 4.3 Performance summary and comparison with other voltage-mode transmitters |

## Chapter 1

### Introduction

### **1.1 Motivation**

The growth of machine learning, big data, artificial intelligence (AI), and the Internet of Things (IoT) is expected to increase global internet data traffic to approximately 400 EB/month in 2022 [1]. This has resulted in an exponential increase in the chip-to-chip and chip-to-module aggregate data rates. While data rates continue to increase exponentially, the energy-efficiency improvement of wireline links has slowed down in the last five years, as shown in Fig. 1.1 [2]. If this trend continues, a significant portion of the system-on-chip (SoC) power will be consumed by the wireline communication system, and very little power will be available for computing. Therefore, the energy efficiency of wireline links must be improved by reducing their power consumption. Power consumption in a typical high-speed wireline link consists of three components: 1) driving power (consumed in the output driver to drive a transmission line); 2) analog power (consumed in the analog blocks); and 3) clocking power (consumed in the clock distribution network). Power consumption in a wireline link can also be classified by block into the transmitter, the receiver, and the clocking. The transmitter power is consumed by the serializer, the equalizer, and the output driver that drives a 50-$\Omega$ transmission line. The receiver power is consumed by the analog front-end blocks, such as the amplifier, deserializer, and equalizer. The clocking power is consumed by the clock distribution network, clock buffers, duty-cycle correction, and phase correction.





Fig. 1.1 Energy efficiency of recently published wireline links.

the driving power for transmitting the data. Driving power consumption is a function of the signal swing, the output driver architecture (source-series-terminated (SST) or current-mode logic (CML)), and the equalization architecture. Using an SST output driver can reduce the output driver's power by a factor of four compared with a CML-based output driver [3]. Low-voltage differential signaling (LVDS) output drivers offer better energy efficiency than SST drivers when a small output signal swing is required [4]. However, when operated at the maximum swing that an SST driver can achieve, LVDS drivers consume twice the power of an SST driver.

Another reason the energy-efficiency improvement has slowed is that the power spent on equalizing the channel loss becomes significant as the data rate increases. Equalizing the channel loss at the transmitter requires significant pre-driver and driver power to tune the equalizer tap coefficients and to set the termination resistance for impedance matching.

In this thesis, we propose an energy-efficient and low-complexity equalization method for the transmitter. We analyze the phase delay to optimize the coefficient values and the number of taps for equalization. To find the tap coefficients adaptively, a behavioral model of the adaptive equalization is proposed and simulated.

### **1.2 Thesis Organization**

This thesis is organized as follows. In Chapter 2, the background of feed-forward equalization (FFE) is explained. The basic operations and critical blocks of the general FFE are defined. In particular, equalization in the amplitude domain, in the phase domain, and by pulse-width modulation is introduced. Adaptive equalization for finding the optimum FFE coefficients is also described.

In Chapter 3, a transmitter with time-based feed-forward equalization (TB-FFE) and its quarter-rate implementation are proposed. First, the basic concepts of the zero-crossing time and the phase delay are described and analyzed. The optimum coefficient and the optimum number of taps for the phase delay compensation are obtained by calculating the phase difference between the different frequency components. Second, the behavioral model of the adaptive TB-FFE is presented and simulated. Finally, the circuit implementation of the proposed transmitter is described.

In Chapter 4, the measurement results of the implemented transmitter are described. The eye diagrams of the transmitter output are measured. Over a channel with significant loss, the improvement of the receiver-side eye diagram obtained by modulating the phase of the transmitter output is measured. At the end of Chapter 4, the proposed TB-FFE scheme is compared with previously published transmitters.

Finally, Chapter 5 summarizes the proposed work and concludes this thesis.

## Chapter 2

### Backgrounds

### **2.1 Overview**

A bandwidth-limited channel attenuates the high-frequency components of the transmitted data because of the skin effect and dielectric loss, which induces inter-symbol interference (ISI). ISI degrades the timing jitter and eye-opening of the received data and worsens the bit error rate (BER). An FFE is usually implemented in the transmitter to compensate for the channel loss by effectively boosting the high-frequency gain. An FFE in the transmitter avoids a feedback path and results in efficient equalization. Various equalization methods for the transmitter have been proposed; the three representative ones are amplitude-domain equalization, phase-domain equalization, and pulse-width modulation. Amplitude-domain equalization has the advantage of accurate FFE tap-coefficient setting for channel loss compensation. However, as the number of segmented driver cells increases, impedance matching becomes difficult, and the bandwidth is limited by the output capacitance of the driver. Phase-domain equalization and pulse-width modulation have the advantage of decoupling the trade-off among impedance matching, output swing, and gain resolution of the FFE.

In this chapter, before the proposed TB-FFE scheme is explained in the next chapter, the conventional FFE schemes and their basic theory are described. The traditional adaptive FFE schemes in the amplitude domain and the phase domain are also described.

### 2.2 Feed-Forward Equalization

#### 2.2.1 Amplitude-Domain Equalization

Amplitude-domain equalization is the most widely used method of equalization in transmitter design. It provides high resolution for the equalization coefficients, and it is simple to implement and measure [5], [6]. However, amplitude-domain equalization requires significant driving and switching power at the driver node of the transmitter because of the high simultaneous switching noise [7]. Also, equalizing the data at the driver node requires many driver slices, as shown in Fig. 2.1.

Amplitude-domain equalization techniques, such as FFE, involve increasing the amplitude of transition bits or decreasing the amplitude of non-transition bits, as shown in Fig. 2.2. Amplitude-domain equalization can be incorporated in a voltage-mode driver without sacrificing channel impedance matching by means of a high-pass filter. The driver output $Y_{FFE}[n]$ of a 2-tap FFE can be expressed as follows:

$$Y_{FFE}[n] = X[n] - \alpha \cdot X[n-1] \tag{2.1}$$

where $\alpha$ is the tap coefficient, and $X[n]$ and $X[n-1]$ are the current input bit and the input bit delayed by 1 UI, respectively. Fig. 2.3 shows how consecutive bits of pull-up data undergo amplitude-domain equalization in an impedance-matched voltage-mode driver with a $V_{SS}$ termination. When data is transmitted, the voltage level of a transition bit is $V_{OH,MAIN}$.
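As an illustration of (2.1), the following minimal sketch applies the 2-tap FFE to a short bit sequence. The function name `ffe_2tap`, the mapping of bits to ±1 symbols, the example tap value of 0.25, and the assumption that the bit preceding the sequence equals the first bit are all choices made here for illustration; they are not taken from the thesis.

```python
def ffe_2tap(bits, alpha):
    """Apply Y[n] = X[n] - alpha * X[n-1] to a bit sequence.

    Bits are mapped to +/-1 symbols (an illustrative assumption).
    """
    symbols = [1.0 if b else -1.0 for b in bits]
    out = []
    prev = symbols[0]  # assumption: the bit before the sequence equals the first bit
    for x in symbols:
        out.append(x - alpha * prev)
        prev = x
    return out


bits = [0, 1, 1, 1, 0, 0, 1]
levels = ffe_2tap(bits, alpha=0.25)  # hypothetical tap coefficient

for b, y in zip(bits, levels):
    print(b, y)
```

With this symbol mapping, a transition bit is driven at magnitude 1 + α and a non-transition bit at magnitude 1 − α, which is the emphasis and de-emphasis behavior described in the text.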



Fig. 2.1 Block diagram of transmitter with amplitude-domain equalization.

During transmission of a non-transition bit, the voltage level is reduced by the tap coefficient  $\alpha$ , producing the voltage level of  $V_{OH,POST}$ . When this operation happens, some pull-up and pull-down drivers turn on or off, which creates a short current path from the supply to the ground [8], increasing the power consumption.



Fig. 2.2 Amplitude modulation of pre-emphasis and de-emphasis.

To overcome this drawback, only some of the pull-up drivers can be turned on or off when transferring consecutive identical data '1's, so that the output voltage $V_{OH,POST}$ of non-transition bits is adjusted by the tap coefficient $\alpha$ to equalize the amplitude of the pull-up data, as shown in Fig. 2.3. This can have a harmful effect on channel impedance matching, but signal reflections are somewhat attenuated as the channel loss increases [9]. In addition, the short current path is removed. Overall, relaxing impedance matching [10] in pull-up data equalization can balance simultaneous switching noise and signal reflections while equalizing the output amplitude and reducing power consumption.



Fig. 2.3 Transmission of a non-transition bit by the voltage-mode driver for amplitude equalization with impedance matching and with relaxed impedance matching.

#### 2.2.2 Phase-Domain Equalization

Phase-domain equalization conventionally leads or lags the rising or falling edges of data, depending on the previous data pattern.

Generally, the eye-opening of the received signal is narrowed by jitter, as shown in Fig. 2.4. The total jitter probability distribution function (PDF) is separated into random jitter (RJ) and deterministic jitter (DJ) [11]. RJ can be modeled as a Gaussian distribution, and the distribution of DJ can be modeled as two impulse functions [12] and then characterized by the distance between those impulses. DJ can be classified into various types, of which data-dependent jitter (DDJ) is a significant component. DDJ is the deviation of each data zero-crossing time from a reference period due to the residual effect of previous data [13]. Due to the limited bandwidth of the transmission channel and the front end of a receiver, the transmission of a single bit causes a response with a long tail, which modifies the channel's response to the next bit. This ISI affects both the amplitude and phase of the received signal. The phase-domain equalization compensates for this effect by making the time at which transmission begins depend on the previous data.

Fig. 2.4 Eye-diagrams and jitter histograms of the receiving end without phase-domain equalization and with phase-domain equalization.
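The jitter decomposition described above, with Gaussian RJ and DJ modeled as two impulses, can be sketched numerically. In the sketch below, the total jitter PDF is the convolution of the two impulses with the Gaussian, which is simply a sum of two shifted Gaussians. The equal weighting of the two impulses, the function names, and the numeric values of `dj_pp` and `rj_sigma` are assumptions made for illustration only.

```python
import math


def gaussian(t, mean, sigma):
    """Gaussian probability density evaluated at t."""
    z = (t - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def total_jitter_pdf(t, dj_pp, rj_sigma):
    """Two impulses separated by dj_pp, convolved with Gaussian RJ.

    Assumption: both impulses carry equal weight (0.5 each).
    """
    left = gaussian(t, -dj_pp / 2.0, rj_sigma)
    right = gaussian(t, +dj_pp / 2.0, rj_sigma)
    return 0.5 * left + 0.5 * right


dj_pp = 10.0     # hypothetical distance between the two DJ impulses, in ps
rj_sigma = 1.0   # hypothetical RMS random jitter, in ps

# Riemann-sum check that the PDF integrates to one over a wide window.
step = 0.01
grid = [-20.0 + step * k for k in range(4001)]
area = sum(total_jitter_pdf(t, dj_pp, rj_sigma) for t in grid) * step
print(area)
```

The resulting histogram has two peaks separated by `dj_pp`, each broadened by `rj_sigma`.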

The system response determines the behavior of DDJ. Therefore, an analysis of phase-domain equalization naturally starts with a first-order system, modeled by a low-pass filter (LPF) with a time constant  $\tau$  (=RC). In the absence of noise, the received data r(t) can be expressed as follows.

$$r(t) = \sum_{n = -\infty}^{0} a_n \cdot g(t - nT) \tag{2.2}$$

$$g(t) = 1 - e^{-\frac{t}{\tau}}, \quad 0 < t < T \tag{2.3}$$

where  $a_n$  is the data symbol and g(t) is the single-bit response determined by the channel. We have used the normalized signal amplitude in the equation because the amplitude is an irrelevant parameter that is canceled during the calculation. We define the phase shift coefficient  $\alpha_n$  to be the factor required to compensate for DDJ caused by the previous bit  $a_{-(n+1)}$  when the current bit  $a_0$  passes through the decision threshold voltage of the receiver. In a first-order system,  $\alpha_1$  can be derived from two single-bit responses,  $r_1(t)$  and  $r_2(t)$ , with different widths, as shown in Fig. 2.5. These two responses, including the effect of phase-domain equalization, can be written as follows:

$$r_1(t) = g(t + \alpha_1) + g(t + 3T) + g(t + 4T) + \cdots \tag{2.4}$$

$$r_2(t) = g(t) + g(t + 2T) + g(t + 3T) + \cdots \tag{2.5}$$





Since optimum phase-domain equalization will cause the signals described by (2.4) and (2.5) to cross the decision threshold voltage at the same time, $\alpha_1$ can be obtained as follows:

$$\alpha_{1} = -\tau \ln\left(1 - e^{-\frac{T}{\tau}} + e^{-\frac{2T}{\tau}}\right). \tag{2.6}$$
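A numerical sanity check of the first-order phase shift coefficient may help here. The sketch below uses the form $\alpha_1 = -\tau \ln(1 - e^{-T/\tau} + e^{-2T/\tau})$ and the long-run limit $-\tau \ln(1 - e^{-T/\tau})$, and checks that they align the threshold crossings in a simple RC model. The interpretation is an assumption made here, not a statement of the thesis's derivation: the line idles at 0 for a long time, a pulse of 1 is sent, the threshold is 0.5, and compensation advances the trailing edge of the longer pulse by the phase shift. The values of `T` and `tau` and the helper name `crossing_delay` are hypothetical.

```python
import math


def crossing_delay(width, tau):
    """Delay from a pulse's falling edge until the RC output decays to 0.5.

    Assumes the output was settled at 0 before a pulse of the given width.
    """
    peak = 1.0 - math.exp(-width / tau)
    return tau * math.log(2.0 * peak)


T = 1.0     # hypothetical bit period (normalized)
tau = 0.8   # hypothetical RC time constant
x = math.exp(-T / tau)

alpha_1 = -tau * math.log(1.0 - x + x ** 2)   # phase shift for a 2T-wide pulse
alpha_max = -tau * math.log(1.0 - x)          # limit for a very long pulse

# Crossing time of an uncompensated 1T pulse, measured from its nominal edge.
reference = crossing_delay(T, tau)

# 2T pulse whose trailing edge is advanced by alpha_1.
two_bit = crossing_delay(2.0 * T - alpha_1, tau) - alpha_1

# Very long pulse whose trailing edge is advanced by alpha_max.
long_run = crossing_delay(50.0 * T - alpha_max, tau) - alpha_max

print(alpha_1, alpha_max)
print(reference, two_bit, long_run)
```

Under these assumptions all three crossing times coincide, so the zero-crossing no longer depends on the run length of the preceding data.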

Similarly, we can find $S_n$ and $\alpha_n$ by using single-bit responses with widths of nT and (n+1)T:

$$S_n = \sum_{k=1}^n \alpha_k \tag{2.7}$$

$$\alpha_n = S_n - S_{n-1} \tag{2.8}$$

The maximum eye height $H_{Eye,ph}$ after DDJ compensation can also be found numerically. The ideal maximum eye height is determined by a clock pattern. This produces the same eye-opening for any phase shift coefficients since the previous data pattern is always identical for any bit in a clock pattern. Fig. 2.6 shows the response to a clock pattern and a single bit with and without DDJ compensation. The response to a clock pattern can be represented as single-bit responses superimposed on each other at 2T intervals. The value of $\alpha_{even}$ is the sum of the phase shift coefficients required to compensate for the effect of the previous even bits in the clock pattern. The single-bit response requires the largest phase shift, $\alpha_{max}$, which is equal to the sum of the phase shift coefficients, $S_\infty$, and has the same meaning as the maximum DDJ. When phase-domain equalization is applied, the single-bit response is equal to the response to a clock pattern at the decision level of the receiver side, and $\alpha_{max}$ can be derived from (2.7), as follows:





$$\lim_{n \to \infty} S_n = \alpha_{\max} = -\tau \ln\left(1 - e^{-\frac{T}{\tau}}\right). \tag{2.9}$$

If the signal responses overlap correctly, and the previous data cannot affect the current data in the response to the clock pattern, the eye height is given by the voltage variation of the response to the clock pattern between $t = T - \alpha_{even}$ and $t = -\alpha_{even}$:

$$H_{Eye,ph} = \frac{1 - e^{-\frac{T}{\tau}}}{1 + e^{-\frac{T}{\tau}}} > 0. \tag{2.10}$$
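The clock-pattern eye height in (2.10) can be checked with a short simulation. The sketch below drives a first-order RC low-pass filter with an alternating 1, 0, 1, 0 pattern normalized to unit swing and measures the steady-state swing at the bit boundaries. The function name, the values of `T` and `tau`, and the number of simulated bits are illustrative assumptions.

```python
import math


def clock_pattern_swing(T, tau, n_bits=200):
    """Steady-state swing of an RC low-pass output driven by 1,0,1,0,...

    The output is sampled at the end of each bit; n_bits must be even so
    that the last two samples are a high level followed by a low level.
    """
    x = math.exp(-T / tau)
    v = 0.0
    samples = []
    for n in range(n_bits):
        target = 1.0 if n % 2 == 0 else 0.0
        # Exact first-order settling toward the driven level over one bit.
        v = target + (v - target) * x
        samples.append(v)
    return samples[-2] - samples[-1]


T = 1.0     # hypothetical bit period (normalized)
tau = 0.8   # hypothetical RC time constant

x = math.exp(-T / tau)
closed_form = (1.0 - x) / (1.0 + x)
simulated = clock_pattern_swing(T, tau)
print(closed_form, simulated)
```

The simulated swing converges to the closed-form value because, in steady state, the high and low levels settle to $1/(1 + e^{-T/\tau})$ and $e^{-T/\tau}/(1 + e^{-T/\tau})$, respectively.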

Unlike the amplitude-domain equalization, the phase-domain equalization does not require segmentation of the output driver or additional output driver slices for equalization. This reduces the input and output capacitance ( $C_{IO}$ ), which improves the signal integrity of both the transmitter and the receiver. Because the phase-domain equalization does not produce the short current path at the output driver, it uses less power than amplitude-domain equalization and reduces simultaneous switching noise.

#### 2.2.3 Pulse-Width Modulation

Pulse-width modulation (PWM) performs equalization by adjusting the duty cycle of the transmit pulse of period T, as shown in Fig. 2.7. It equalizes the channel response in a manner similar to an FIR filter, but PWM uses time as the equalization variable, whereas the FIR filter uses voltage. In contrast to FIR filter-based de-emphasis, PWM provides selective frequency amplification beyond the Nyquist frequency (1/2T). A high-pass frequency response of the equalization filter over a larger frequency range helps compensate for channel loss beyond the Nyquist frequency, thereby reducing ISI.

A PWM pulse with duty cycle D can be expressed as

$$p_{pwm}(t) = u(t) - 2u(t - DT) + u(t - T) \tag{2.11}$$

where T is the bit period, $0.5 \le D \le 1$ is the duty cycle of the pulse waveform, and u(t) is a step function. ISI caused by channel loss can be compensated by selecting D between 0.5 and 1.

Fig. 2.7 Representation of bits "1" and "0" using PWM pulses with duty cycle D.

The impact of the duty cycle of the PWM pulse on ISI can be visualized in both the time and frequency domains. In the time domain, consider the normalized channel response to a PWM pulse, shown in Fig. 2.8, for three different duty cycle conditions. For a channel with 28 dB loss at the Nyquist frequency, D = 0.5 shows significantly less ISI than unequalized NRZ, which illustrates that ISI can be reduced by selecting D appropriately. In the frequency domain, shown in Fig. 2.9, the power spectral density of the PWM pulse can be calculated to be

$$\left|S_{PWM}(f)\right| = \frac{1}{\pi f} \sqrt{1 + \cos(\pi fT) \left[\cos(\pi fT) - 2\cos((2D - 1)\pi fT)\right]} \tag{2.12}$$
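The closed form in (2.12) can be cross-checked against the Fourier transform of the pulse in (2.11). The sketch below evaluates the transform of $u(t) - 2u(t - DT) + u(t - T)$ directly from its three step edges and compares its magnitude with (2.12) at a few frequencies. The function names and the test values of `T`, `D`, and `f` are chosen here for illustration.

```python
import cmath
import math


def pwm_spectrum_closed_form(f, T, D):
    """Magnitude spectrum of the PWM pulse according to (2.12)."""
    c = math.cos(math.pi * f * T)
    inner = 1.0 + c * (c - 2.0 * math.cos((2.0 * D - 1.0) * math.pi * f * T))
    return math.sqrt(max(0.0, inner)) / (math.pi * f)  # clamp guards rounding


def pwm_spectrum_direct(f, T, D):
    """|Fourier transform| of the pulse that is +1 on [0, DT) and -1 on [DT, T)."""
    w = 2.0 * math.pi * f
    edges = 1.0 - 2.0 * cmath.exp(-1j * w * D * T) + cmath.exp(-1j * w * T)
    return abs(edges / (1j * w))


T = 1.0  # normalized bit period
worst = 0.0
for D in (0.5, 0.7, 1.0):
    for f in (0.1, 0.5, 0.9, 1.3):
        diff = abs(pwm_spectrum_closed_form(f, T, D) - pwm_spectrum_direct(f, T, D))
        worst = max(worst, diff)
print(worst)
```

For D = 1 the pulse reduces to a plain NRZ pulse and the expression collapses to $|\sin(\pi fT)|/(\pi f)$, which gives a convenient special case for checking an implementation.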



Fig. 2.8 Normalized response of a channel to PWM pulses.

The spectrum of the PWM pulse extends beyond the Nyquist frequency, in contrast to amplitude-domain equalization, which de-emphasizes frequencies only below the Nyquist frequency. Consequently, PWM-based de-emphasis enables wide-range ISI suppression.

PWM-based equalization transmits only two voltage levels. This makes it ideally suited for implementing de-emphasis in a voltage-mode driver. It eliminates the nonlinear dependence of driver output impedance on varying output swing present in a voltage-mode de-emphasis. Also, it decouples termination impedance from both the amount of de-emphasis and output swing.



Fig. 2.9 Power spectral density of PWM pulses with different duty cycles.

### **2.3 Adaptive Feed-Forward Equalization**

#### 2.3.1 Amplitude-Domain Equalization

The adaptation logic for the transmitter equalization can be implemented by using information from the receiver side, as shown in Fig. 2.10. Adapting the tap coefficients of the FFE normally requires a back-channel. In [14], however, a step signal is transmitted through the channel, as shown in Fig. 2.11, and the propagation time is estimated to update the digitally controlled tap coefficients. This structure does not need an additional back-channel or a collaborating receiver.

Initially, the transmitter is operated in propagation-time detection mode, and the terminations of the receiver are isolated. When the step signal is transmitted from one side of the transmission line, the signal Ctrl connected to a counter goes high. The counter starts to count up to update the coefficients of the FFE.

Fig. 2.10 Adaptive equalization of transmitter.

Fig. 2.11 Propagation-time detection timing diagram.

The fixed tap coefficient of the main driver and the variable tap coefficient of the FFE tap driver are combined at the driver node, as shown in Fig. 2.12.

This scheme has the advantage of eliminating the back-channel by detecting only the propagation time. However, it may be limited by the bandwidth of the counter and the comparator used to detect the propagation time.





#### 2.3.2 Pulse-Width Modulation

In [7], the comparison result is transmitted back to the transmitter, and the duty cycle of the PWM is updated to compensate for the channel loss, as shown in Fig. 2.13. The advantage of this adaptation is its tolerance to PVT variations, and it can adaptively compensate for the loss of channels with different lengths. However, it needs a calibration circuit that returns the result to the transmitter controller, which increases the hardware. Also, this approach is limited to differential communication through two single-ended channels, and it is susceptible to mismatches between the two cables, channel imperfections, and the offset of the transceiver.



Fig. 2.13 Adaptive phase-domain equalization of transmitter.



Fig. 2.14 The transceiver operating in the calibration mode.

# Chapter 3

# Design of the Time-Based Feed-Forward Equalization of the Transmitter

### **3.1 Overview**

As the number of data buffers increases dramatically in high-speed I/O interfaces, minimizing power consumption while keeping DJ low becomes a critical issue. As the data rate increases, the channel loss at the Nyquist frequency increases, and thus the channel loss must be equalized with good energy efficiency. DDJ accounts for the major portion of DJ and appears as zero-crossing times that vary with the transmitted data. To address this issue, equalization at the transmit side is conventionally provided by the FFE [15]-[17]. Reducing the amplitude at the non-transition bits effectively boosts the high-frequency content up to the Nyquist frequency. However, it increases signaling and switching power because segmented driver slices must be implemented for equalization. Although the PWM-based transmitter [7] does not require segmented driver slices, a narrow pulse must be generated precisely at a high data rate. Phase-domain equalization [18], which modulates the phase of the transmitting data to compensate for the post-cursor ISI, has also been presented. However, the modulation is performed at the output driver, which requires high bandwidth.

This thesis proposes a voltage-mode transmitter operating up to 22 Gb/s with the TB-FFE that is suitable for low-power operation. By modulating the phase of the transmitting clock at the low-bandwidth path, DDJ is significantly reduced. It achieves good energy efficiency with low driver complexity. The phase delay analysis for calculating the zero-crossing times of the transmitting data is presented to obtain the optimum number of the previous bits to examine and the optimum coefficients to compensate for DDJ. The overall hardware complexity is more relaxed than the FFE, thereby allowing enhanced bandwidth of the equalization. The active-feedback output driver is implemented for driving the data in the voltage mode.

## **3.2 Basic Concept of Time-Based FFE**

#### **3.2.1 Zero-Crossing Time**

The response of the LTI system with a finite bandwidth to a data bit is influenced by the transmitted data [19] since the frequency-dependent loss shifts the zero-crossing time of the transmitting data, as shown in Fig. 3.1.

Assuming a first-order system with the RC LPF channel shown in Fig. 3.2, the zero-crossing times spread according to the transmitted data. If the three previous bits ( $D_{-4}$ ,  $D_{-3}$ ,  $D_{-2}$ ) are considered at the current low-to-high transition, the sequences can be grouped into eight cases, each with a lower and an upper delay boundary, as shown in Table 3.1. The zero-crossing times can be obtained analytically using two parameters: $\tau$ (= RC) and T (= 1 UI).

In this case, three previous bits ( $D_{-4}$ ,  $D_{-3}$ ,  $D_{-2}$ ) are examined, with all other earlier bits accounted as all 1s and -1s for the lower and upper delay boundaries. For example, the sequence of ( $D_{-2}$ ,  $D_{-1}$ ,  $D_0$ ) = (-1, -1, 1) has a more lagged zero-crossing time with respect to the sequence of ( $D_{-2}$ ,  $D_{-1}$ ,  $D_0$ ) = (1, -1, 1) since the narrow bandwidth prevents the settlement to a complete binary level in the latter case. By properly adjusting the timing of the transmitting data with respect to the leading one, the zero-crossing times are equalized, thereby improving the horizontal eye-opening of the receiver side. Ideally, DDJ can be fully compensated by adjusting the zero-crossing time with respect to the most leading one by examining all the previous bits. However, the complexity of the encoder increases to examine more previous bits. Thus, for the reduced encoder complexity, the previous bits to examine have to be limited.



Fig. 3.1 Zero-crossing time variation due to transmitted data.



Fig. 3.2 RC low-pass filter channel under analysis.

| Earlier bits (up to $D_{-5}$) | $D_{-4}$ | $D_{-3}$ | $D_{-2}$ | $D_{-1}$ | $D_0$ | Zero-crossing time | Transition |
|---|---|---|---|---|---|---|---|
| Lower (all 1) | 1 | 1 | 1 | -1 | 1 | $\tau \ln 2 + \tau \ln(1 - \gamma)$ | faster ↑ |
| Upper (all -1) | 1 | 1 | 1 | -1 | 1 | $\tau \ln 2 + \tau \ln(1 - \gamma + \gamma^4)$ | |
| Lower (all 1) | -1 | 1 | 1 | -1 | 1 | $\tau \ln 2 + \tau \ln(1 - \gamma + \gamma^3 - \gamma^4)$ | |
| Upper (all -1) | -1 | 1 | 1 | -1 | 1 | $\tau \ln 2 + \tau \ln(1 - \gamma + \gamma^3)$ | |
| Lower (all 1) | 1 | -1 | 1 | -1 | 1 | $\tau \ln 2 + \tau \ln(1 - \gamma + \gamma^2 - \gamma^3)$ | |
| Upper (all -1) | 1 | -1 | 1 | -1 | 1 | $\tau \ln 2 + \tau \ln(1 - \gamma + \gamma^2 - \gamma^3 + \gamma^4)$ | |
| Lower (all 1) | -1 | -1 | 1 | -1 | 1 | $\tau \ln 2 + \tau \ln(1 - \gamma + \gamma^2 - \gamma^4)$ | |
| Upper (all -1) | -1 | -1 | 1 | -1 | 1 | $\tau \ln 2 + \tau \ln(1 - \gamma + \gamma^2)$ | |
| Lower (all 1) | 1 | 1 | -1 | -1 | 1 | $\tau \ln 2 + \tau \ln(1 - \gamma^2)$ | |
| Upper (all -1) | 1 | 1 | -1 | -1 | 1 | $\tau \ln 2 + \tau \ln(1 - \gamma^2 + \gamma^4)$ | |
| Lower (all 1) | -1 | 1 | -1 | -1 | 1 | $\tau \ln 2 + \tau \ln(1 - \gamma^2 + \gamma^3 - \gamma^4)$ | |
| Upper (all -1) | -1 | 1 | -1 | -1 | 1 | $\tau \ln 2 + \tau \ln(1 - \gamma^2 + \gamma^3)$ | |
| Lower (all 1) | 1 | -1 | -1 | -1 | 1 | $\tau \ln 2 + \tau \ln(1 - \gamma^3)$ | |
| Upper (all -1) | 1 | -1 | -1 | -1 | 1 | $\tau \ln 2 + \tau \ln(1 - \gamma^3 + \gamma^4)$ | |
| Lower (all 1) | -1 | -1 | -1 | -1 | 1 | $\tau \ln 2 + \tau \ln(1 - \gamma^4)$ | |
| Upper (all -1) | -1 | -1 | -1 | -1 | 1 | $\tau \ln 2$ | slower ↓ |

Table 3.1 Table of zero-crossing time.

\* 
$$\gamma = e^{-\frac{T}{\tau}}, \quad \tau = RC$$
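The closed-form entries of Table 3.1 can be cross-checked numerically. The following Python sketch is illustrative only: the function names and the example value $\tau$ = 0.6 UI are assumptions and are not taken from the thesis. It evaluates the zero-crossing time of the low-to-high transition from the weighted sum of past bits, with $\gamma = e^{-T/\tau}$, and compares it with a bit-by-bit settling of the first-order RC output.

```python
import math

def zero_crossing_closed_form(d4, d3, d2, earlier, tau, T=1.0):
    """Zero-crossing time of a -1 -> +1 transition (D-1 = -1, D0 = +1).

    d4, d3, d2 : values (+1 or -1) of D-4, D-3, D-2
    earlier    : value of every bit up to D-5
                 (+1 gives the lower bound, -1 gives the upper bound)
    """
    g = math.exp(-T / tau)  # gamma: what remains after one UI of RC settling
    # Channel output at the start of D0, written as a weighted sum of past bits.
    y0 = (1 - g) * (-1 + d2 * g + d3 * g**2 + d4 * g**3) + earlier * g**4
    # During D0 the output follows 1 + (y0 - 1) * exp(-t / tau); solve for zero.
    return tau * math.log(1 - y0)

def zero_crossing_stepwise(d4, d3, d2, earlier, tau, T=1.0):
    """Same quantity, obtained by settling the RC output one bit at a time."""
    g = math.exp(-T / tau)
    y = float(earlier)            # fully settled after a long run of identical bits
    for bit in (d4, d3, d2, -1):  # D-4, D-3, D-2, D-1
        y = bit + (y - bit) * g
    return tau * math.log(1 - y)

if __name__ == "__main__":
    tau = 0.6  # assumed RC time constant in UI (illustrative only)
    # Same row order as Table 3.1: fastest transition first.
    for d2 in (1, -1):
        for d3 in (1, -1):
            for d4 in (1, -1):
                lo = zero_crossing_closed_form(d4, d3, d2, +1, tau)
                hi = zero_crossing_closed_form(d4, d3, d2, -1, tau)
                print(f"D-4={d4:+d} D-3={d3:+d} D-2={d2:+d}  "
                      f"lower={lo:.4f} UI  upper={hi:.4f} UI")
```

Under these assumptions, the all -1 history gives exactly $\tau \ln 2$, and every other history gives an earlier zero crossing, which is the spread that the timing adjustment is meant to remove.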

#### **3.2.2 Phase Delay**

Recently, the required bandwidth of wireline communications has been increasing [20], and therefore many on-chip bandwidth extension techniques have been proposed [21]. Evaluating the timing distortion of a wideband circuit has accordingly become more important for assessing such techniques, because timing accuracy is as important as the signal-to-noise ratio (SNR) [22], which is related to the magnitude response of the circuit. Group delay has been a widely used performance metric for evaluating wideband amplifiers and buffers. It is believed to provide information on the timing distortion caused by a wideband circuit, which is hard to infer intuitively from the magnitude response. It has also been believed that a flat group delay response across the frequency range of interest assures the quality of the wideband circuit [23].

However, inherently, a phase delay analysis corresponds much more with the classic theory on distortionless transmission [24], than the group delay analysis.

By definition, the group delay and the phase delay are given as

$$\tau_{g}(\omega) = -\frac{d\phi(\omega)}{d\omega}, \tag{3.1}$$

$$\tau_{p}(\omega) = -\frac{\phi(\omega)}{\omega}, \tag{3.2}$$

where $\omega$ is the angular frequency and $\phi(\omega)$ is the phase response. Notably, the group delay is obtained by a differentiation. If there is a non-zero constant term in $\phi(\omega)$, this term is lost after the differentiation. Before we decide to discard some information, we must ensure that the information is truly negligible. Let us assume an imaginary transfer function whose magnitude response is unity across all frequencies and whose phase response is linear in frequency. It may be referred to as a linear phase shifter. The phase response of this phase shifter can be expressed as

$$\phi(\omega) = -k\omega + C \tag{3.3}$$

where k and C are non-zero, arbitrary constants. Note that the group delay of the linear phase shifter is k regardless of the frequency being investigated. This transfer function is perfect from the conventional standpoint because it has a flat magnitude and a flat group delay across the entire frequency range. When the sum of two sinusoidal signals is applied to the input of the linear phase shifter, as shown in Fig. 3.3, the two sinusoidal components experience different phase shifts. It is noteworthy that there is no signal distortion when the two signals experience the same delay in time. The output of the phase shifter can then be rewritten as

$$Out(t) = \sin\left\{\omega_1\left(t + \frac{\phi(\omega_1)}{\omega_1}\right)\right\} + \sin\left\{\omega_2\left(t + \frac{\phi(\omega_2)}{\omega_2}\right)\right\}. \tag{3.4}$$

That is, the first and the second sinusoidal signals experience time delays of $-\phi(\omega_1)/\omega_1$ and $-\phi(\omega_2)/\omega_2$, respectively, which are the phase delays. In the case of the linear phase shifter, (3.4) becomes

$$Out(t) = \sin\left\{\omega_1\left(t-k+\frac{C}{\omega_1}\right)\right\} + \sin\left\{\omega_2\left(t-k+\frac{C}{\omega_2}\right)\right\}. \tag{3.5}$$





Because the time delays experienced by the sinusoid at  $\omega_1$  and  $\omega_2$  are always different from each other, except when C=0, the waveform at the output deviates from that at the input, as shown in Fig. 3.4. On the other hand, when C=0, the result shows the same waveform as that of the input and is just delayed by *k*, as shown in Fig. 3.5. From this observation, it is concluded that neglecting the constant term, which usually happens during the calculation of a group delay, results in a loss of necessary information.
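The effect of the constant term C can be reproduced with a few lines of Python. This is a minimal sketch under assumed values: the tone frequencies (100 MHz and 200 MHz), k = 5 ns, and C = π/4 are chosen for illustration and are not the settings used for the figures. The code compares the output of a unity-gain shifter with $\phi(\omega) = -k\omega + C$ against the input delayed by k.

```python
import math

def shifter_output(t, freqs_hz, k, C):
    """Sum of unit sinusoids after a unity-gain shifter with phi(w) = -k*w + C."""
    return sum(math.sin(2 * math.pi * f * t - k * 2 * math.pi * f + C)
               for f in freqs_hz)

def delayed_input(t, freqs_hz, k):
    """The same sum of sinusoids, simply delayed by k seconds."""
    return sum(math.sin(2 * math.pi * f * (t - k)) for f in freqs_hz)

def max_deviation(freqs_hz, k, C, n=2000, span=20e-9):
    """Largest difference between shifter output and delayed input over 'span'."""
    return max(
        abs(shifter_output(i * span / n, freqs_hz, k, C)
            - delayed_input(i * span / n, freqs_hz, k))
        for i in range(n)
    )

if __name__ == "__main__":
    tones = [100e6, 200e6]   # assumed test tones
    k = 5e-9                 # assumed slope of the phase response (5 ns)
    print("C = 0    :", max_deviation(tones, k, 0.0))
    print("C = pi/4 :", max_deviation(tones, k, math.pi / 4))
```

With C = 0 the deviation is at the level of floating-point rounding, whereas a non-zero C changes the waveform shape even though the group delay is still k at every frequency.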

Even when C=0, the flat group delay may fail to reflect a waveform distortion in some cases. Let us assume a phase shifter, whose phase response is given as a third-order polynomial as follows,

$$\phi(\omega) = -k_3\omega^3 - k_2\omega^2 - k_1\omega, \qquad (3.6)$$

where  $k_3$ ,  $k_2$ , and  $k_1$  are arbitrary constants. Then the phase delay  $\tau_p$  becomes

$$\tau_{p}(\omega) = k_{3}\omega^{2} + k_{2}\omega + k_{1}, \qquad (3.7)$$

while the group delay is

$$\tau_{g}(\omega) = 3k_{3}\omega^{2} + 2k_{2}\omega + k_{1}, \qquad (3.8)$$

With coefficients of  $k_3=-5\times10^{-25}/(12\pi^2)$ ,  $k_2=5\times10^{-17}/(2\pi)$ , and  $k_1=5\times10^{-9}$ , the phase delays at 100 MHz and 200 MHz are made the same, whereas the same group delays are obtained with a slightly different set of coefficients of  $k_3=-5\times10^{-25}/(36\pi^2)$ ,  $k_2=5\times10^{-17}/(4\pi)$ , and  $k_1=5\times10^{-9}$ . The waveforms at the output of this polynomial phase shifter are compared to those at the input in Fig. 3.6 and Fig. 3.7. For the case of the same phase delay, the output waveform exactly matches the input waveform even though the group delays differ by 5 ns, as shown in Fig. 3.6. On the other hand, with the same group delay, the output waveform deviates from the input waveform although the difference between the phase delays is less than 1 ns, as shown in Fig. 3.7. In other words, the coefficient multiplication due to the differentiation in the group delay calculation causes not only the loss of the constant term but also a deformation of the original phase information. Considering Taylor's theorem, by which an n-th order polynomial can approximate any n-times differentiable function, it can be inferred that the failure of the group delay analysis in this polynomial-based example may extend to very general cases.



Fig. 3.4 Calculated waveforms of input and output of the linear phase shifter when C is non-zero.



Fig. 3.5 Calculated waveforms of input and output of the linear phase shifter when C is zero.



Fig. 3.6 Calculated waveforms of input and output of the 3<sup>rd</sup>-order polynomial phase shifter with same phase delay.



Fig. 3.7 Calculated waveforms of input and output of the 3<sup>rd</sup>-order polynomial phase shifter with same group delay.
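The two coefficient sets quoted for the third-order polynomial phase shifter can be checked directly against (3.7) and (3.8). The short Python sketch below is only a numerical check; the variable and function names are chosen here for illustration.

```python
import math

def phase_delay(w, k3, k2, k1):
    """Phase delay of the polynomial phase shifter, Eq. (3.7)."""
    return k3 * w**2 + k2 * w + k1

def group_delay(w, k3, k2, k1):
    """Group delay of the polynomial phase shifter, Eq. (3.8)."""
    return 3 * k3 * w**2 + 2 * k2 * w + k1

w1 = 2 * math.pi * 100e6   # 100 MHz
w2 = 2 * math.pi * 200e6   # 200 MHz

# Coefficient set that equalizes the phase delays at w1 and w2.
same_tp = (-5e-25 / (12 * math.pi**2), 5e-17 / (2 * math.pi), 5e-9)
# Coefficient set that equalizes the group delays at w1 and w2.
same_tg = (-5e-25 / (36 * math.pi**2), 5e-17 / (4 * math.pi), 5e-9)

if __name__ == "__main__":
    for name, ks in (("same phase delay", same_tp), ("same group delay", same_tg)):
        d_tp = phase_delay(w2, *ks) - phase_delay(w1, *ks)
        d_tg = group_delay(w2, *ks) - group_delay(w1, *ks)
        print(f"{name}: delta tau_p = {d_tp * 1e9:+.3f} ns, "
              f"delta tau_g = {d_tg * 1e9:+.3f} ns")
```

For the first set the phase delays coincide while the group delays differ by 5 ns; for the second set the group delays coincide while the phase delays differ by about 0.83 ns, consistent with the "less than 1 ns" statement in the text.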

#### 3.2.3 Finding the Optimum Coefficient

To determine the optimum number of previous bits to examine and the optimum coefficients, the phase delay analysis is employed, since it provides more precise timing information about the LTI system. The phase delay is the time delay experienced by each frequency component of the signal. If the phase delays at the different frequencies are made equal for the particular channel environment, the zero-crossing times at the receiver side will match perfectly for any random data sequence.

The phase response and the phase delay of the LPF channel in Fig. 3.2 can be expressed as

$$\phi(\omega) = -\arctan(\omega RC) \tag{3.9}$$

and

$$\tau_{p}(\omega) = \frac{\arctan(\omega RC)}{\omega} \tag{3.10}$$

where  $\omega$  is an angular frequency,  $\phi(\omega)$  is the phase response, and  $\tau_p(\omega)$  is the phase delay of the LPF channel.

Fig. 3.8 and Fig. 3.9 show the phase response and the phase delay of the LPF channel, respectively. The phase delay can be numerically calculated and plotted using $\tau$ and T. The phase delay difference between the sub-harmonic frequencies can be approximated by the difference of the average zero-crossing time.
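As a rough illustration of how (3.10) yields timing offsets, the following Python sketch evaluates the phase delay of the RC channel at the Nyquist frequency and at its sub-harmonics, and reports each difference relative to the Nyquist value. The bit rate of 20 Gb/s and $\tau$ = 30 ps are assumed example values, and the function names are hypothetical.

```python
import math

def rc_phase_delay(f_hz, tau):
    """Phase delay of a first-order RC low-pass filter, Eq. (3.10)."""
    w = 2 * math.pi * f_hz
    return math.atan(w * tau) / w

def delay_offsets(bit_rate, tau, max_run=3):
    """Phase delay at f_nyq / n (n = 1..max_run) minus the value at f_nyq.

    A run length of n identical bits is associated here with the
    sub-harmonic f_nyq / n, following the analysis in the text.
    """
    f_nyq = bit_rate / 2
    ref = rc_phase_delay(f_nyq, tau)
    return [rc_phase_delay(f_nyq / n, tau) - ref for n in range(1, max_run + 1)]

if __name__ == "__main__":
    bit_rate = 20e9   # assumed data rate
    tau = 30e-12      # assumed RC time constant
    for n, off in enumerate(delay_offsets(bit_rate, tau), start=1):
        print(f"run length {n}: offset = {off * 1e12:.2f} ps")
```

Because $\arctan(x)/x$ decreases with x, lower sub-harmonics see a longer phase delay, so longer runs need a larger timing advance.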



Fig. 3.8 Phase response of RC channel.



Fig. 3.9 Phase delay of RC channel.



Fig. 3.10 Comparison results of average zero-crossing time and phase delay with two previous bits.



Fig. 3.11 Comparison results of average zero-crossing time and phase delay with three previous bits.

By compensating for the phase delay difference from the phase delay at the Nyquist frequency as the reference, the difference of the average zero-crossing time due to the various run-lengths (consecutive identical bits) can be significantly reduced. The comparison results between the phase delays at the sub-harmonic frequencies and the average zero-crossing times of the corresponding sequence are summarized. Fig. 3.10 shows the comparison results with two previous bits ( $D_{-3}$ ,  $D_{-2}$ ) case. Fig. 3.11 shows the comparison results with three previous bits ( $D_{-4}$ ,  $D_{-3}$ ,  $D_{-2}$ ) case.

#### **3.2.4 Comparison with Conventional FFE**

The calculated average zero-crossing time over the various run lengths and the phase delay of the corresponding sequence are quite similar. The results show that the time difference due to the run length becomes much smaller beyond a run length of four. Although examining more previous bits can reduce DDJ further, the encoder complexity increases significantly. Thus, we limit the number of previous bits to examine to two ( $D_{-3}$ ,  $D_{-2}$ ) or three ( $D_{-4}$ ,  $D_{-3}$ ,  $D_{-2}$ ).

As the analysis presented so far indicates, equalization at the transmit side can be realized simply by adjusting the timing of the transmitting data. To compare the performance of the TB-FFE with the conventional FFE, we conceptually model the transmitter architectures as shown in Fig. 3.12 and Fig. 3.13. The simulated eye diagrams of the receiver side are presented for the RC channel of Fig. 3.2. In this simulation, the number of FFE taps is limited to two or three. Fig. 3.14 illustrates the block diagram of the behavioral model for the TB-FFE. Fig. 3.15 shows the eye diagrams of the receiver side on the RC channel with the FFE. Without equalization, the eye diagram is entirely closed. With the FFE, the output voltage is attenuated at the non-transition bits to boost the high-frequency components of the signal. Fig. 3.16 shows the eye diagrams of the receiver side on the RC channel with the TB-FFE. With the TB-FFE, the delay elements adjust the timing of the transmitting data according to the two control signals. Using the results of Fig. 3.11, the phase delay difference is added to compensate for the zero-crossing time. By compensating the transmit timing for the corresponding sequence, the zero-crossing times converge to the reference timing, spread only between the lower and upper boundaries that result from the two worst-case sequences (all 1s and all -1s) preceding the examined bits. The horizontal eye-opening is similar in both cases. However, the vertical eye-opening of the TB-FFE is better than that of the FFE because the TB-FFE does not attenuate the signal amplitude.



Fig. 3.12 Conceptual diagram of FFE.



Fig. 3.13 Conceptual diagram of TB-FFE.



Fig. 3.14 Block diagram of behavioral modeling for TB-FFE.

The phase delay analysis can be employed in the real channel environment. In this work, based on the Interference Tolerance Test Channel model provided by the IEEE Ethernet task force [25], the performance of the equalizer is analyzed.



Fig. 3.15 Simulated eye-diagrams of receiver side on RC channel with FFE.



Fig. 3.16 Simulated eye-diagrams of receiver side on RC channel with TB-FFE.

Table 3.2 Summary of eye-opening on RC channel.

| TX EQ. | # of FFE Taps | Horizontal Eye [UI] | Vertical Eye [mV] |
|--------|---------------|---------------------|-------------------|
| FFE    | 2 (α1, α2)     | 0.82                | 235               |
| FFE    | 3 (α1, α2, α3) | 0.84                | 272               |
| TB-FFE | 2 (β1, β2)     | 0.83                | 286               |
| TB-FFE | 3 (β1, β2, β3) | 0.85                | 301               |



Fig. 3.17 Real channel characteristics under simulation.



Fig. 3.18 Real channel phase delay under simulation.

The channel characteristics and phase delay are shown in Fig. 3.17 and Fig. 3.18. The eye diagrams of the receiver side on the real channel with the FFE and with the TB-FFE are presented in Fig. 3.19 and Fig. 3.20, respectively. Without equalization, the eye diagram is completely closed. The FFE has a slightly better horizontal eye-opening than the TB-FFE in this case. However, the TB-FFE has a better vertical eye-opening than the FFE. For the same number of taps, the difference in the eye-opening between the FFE and the TB-FFE is not significant. In addition, the eye-opening does not change substantially with the number of taps. We focus on low-power operation with good jitter performance. As mentioned earlier, the encoder complexity increases as more previous bits are examined. While keeping the encoder complexity low, we can compensate for DDJ sufficiently with the TB-FFE using only two previous bits ( $D_{-3}$ ,  $D_{-2}$ ).



Fig. 3.19 Simulated eye-diagrams of receiver side on real channel with FFE.



Fig. 3.20 Simulated eye-diagrams of receiver side on real channel with TB-FFE.

Table 3.3 Summary of eye-opening on real channel.

| TX EQ. | # of FFE Taps | Horizontal Eye [UI] | Vertical Eye [mV] |
|--------|---------------|---------------------|-------------------|
| FFE    | 2 (α1, α2)     | 0.75                | 131               |
| FFE    | 3 (α1, α2, α3) | 0.78                | 144               |
| TB-FFE | 2 (β1, β2)     | 0.70                | 141               |
| TB-FFE | 3 (β1, β2, β3) | 0.74                | 158               |

# **3.3 Adaptive Time-Based FFE**

#### 3.3.1 Overview

Since the channel characteristics vary with length and material, an adaptive FFE is needed. As with the amplitude-based and PWM-based adaptive equalization, the channel distortion can be compensated with the optimum coefficient setting. In this section, we model the adaptive TB-FFE in SystemVerilog. First, the concept of the adaptive TB-FFE is explained. Second, the simulation results are shown as eye diagrams of the receiving end that change with the adaptation time. The first-order system with the RC LPF is used as the channel environment. Finally, the optimum number of TB-FFE taps is determined.

#### 3.3.2 Behavioral Modeling

We can model the transceiver with the transmitter which transmits the phase-modulated data by the previous data pattern and the receiver which receives the data and measures zero-crossing time. Fig. 3.21 shows the block diagram of the adaptive TB-FFE. In this scheme, the back channel is needed for delivering the information of the zero-crossing time to the transmitter.

Firstly, the data is transmitted to the receiver with the RC LPF channel. Secondly, the receiver detects the rising time with the zero-crossing time. The detection result is sent back to the transmitter. Finally, the delay of the transmitting data is updated to compensate for the RC LPF channel loss. The delay for compensation is based on the pattern of the { $D_{-7}$ ,  $D_{-6}$ ,  $D_{-5}$ ,  $D_{-4}$ ,  $D_{-3}$ ,  $D_{-2}$ ,  $D_{-1}$ }. The delay for compensation is as follows.

$$Delay \mathrel{+}= 0.01 \times (ideal\_time - real\_time) \tag{3.6}$$

The delay line can be implemented by the digitally controlled delay line (DCDL) and the look-up table (LUT) for real circuit design. The delay coefficients are updated with the information of the zero-crossing time. The delay code of LUT modulates the transmitting data for zero-crossing time compensation as shown in Fig. 3.22.
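The update rule above behaves like a first-order tracking loop per data pattern. The Python sketch below is not the SystemVerilog model used in this work; it is a simplified, noise-free illustration in which each look-up-table entry is nudged by 1 % of the measured timing error. The pattern names, the uncompensated zero-crossing times, and the assumption that the measured time equals the uncompensated time plus the applied delay are all hypothetical.

```python
def adapt_delays(base_zero_crossing, ideal_time, step=0.01, iterations=2000):
    """Iterate Delay += step * (ideal_time - real_time) for every pattern.

    base_zero_crossing : dict mapping a pattern label to its uncompensated
                         zero-crossing time (same unit as ideal_time)
    Returns a dict of settled delay entries, i.e. the LUT contents.
    """
    delays = {pattern: 0.0 for pattern in base_zero_crossing}
    for _ in range(iterations):
        for pattern, base in base_zero_crossing.items():
            # Idealised receiver measurement: channel timing plus applied delay.
            real_time = base + delays[pattern]
            delays[pattern] += step * (ideal_time - real_time)
    return delays

if __name__ == "__main__":
    # Hypothetical zero-crossing times in ps for run lengths of 1, 2, and 3.
    base = {"run1": 10.0, "run2": 14.0, "run3": 16.0}
    lut = adapt_delays(base, ideal_time=10.0)
    for pattern, delay in lut.items():
        print(f"{pattern}: delay = {delay:+.3f} ps")
```

In this simplified loop the residual error shrinks by a factor of 0.99 per update, so each entry settles to the difference between the ideal and the uncompensated zero-crossing time; a negative entry means the edge is sent earlier.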



Fig. 3.21 Block diagram of behavioral modeling for adaptive TB-FFE.



Fig. 3.22 Look-up table for the pattern of {D-4, D-3, D-2, D-1}.

#### **3.3.3 Simulation Results**

Using SystemVerilog, we simulate the behavior of the adaptive TB-FFE. By measuring the eye diagram of the channel output, we can compare the eye-opening over the adaptation time. As the adaptation progresses, the eye diagram of the channel output improves, as shown in Fig. 3.23 through Fig. 3.30. As the adaptation proceeds, the rising edge of the data converges to the rising edge of the Nyquist frequency. The adaptation settles after 2 µs. After the coefficients settle, the horizontal eye-opening is improved by 0.39 UI and the vertical eye-opening by 178 mV, as shown in Fig. 3.31. Fig. 3.32 shows the settling behavior of the tap coefficients, which converge to similar values under 2 ps. Thus, using a 3-tap equalizer is efficient for reducing hardware.



Fig. 3.23 Simulated eye-diagram without equalization.



Fig. 3.24 Simulated eye-diagram with equalization after 50 ns.



Fig. 3.25 Simulated eye-diagram with equalization after 100 ns.



Fig. 3.26 Simulated eye-diagram with equalization after 200 ns.



Fig. 3.27 Simulated eye-diagram with equalization after 400 ns.



Fig. 3.28 Simulated eye-diagram with equalization after 700 ns.



Fig. 3.29 Simulated eye-diagram with equalization after 1.2 µs.



Fig. 3.30 Simulated eye-diagram with equalization after 2 µs.



Fig. 3.31 Comparison result of the eye-diagrams.



Fig. 3.32 Settling behavior of the tap coefficients with 7-taps of TB-FFE.



Fig. 3.33 Settling behavior of the tap coefficients with 3-taps of TB-FFE.

# **3.4 Transmitter Implementation**

#### 3.4.1 Overview

Fig. 3.34 shows the overall architecture of the proposed transmitter, which includes a built-in parallel 8-bit PRBS-7 generator, a phase control signal encoder, a phase modulator, a serializer, a single-to-differential converter, an active-feedback output driver, and a clock path. The 8-bit parallel data generated by the internal PRBS-7 generator are 8-to-4 serialized and then 4-to-1 serialized using the quarter-rate phase-modulated clock to produce phase-modulated data. To overcome the limited bandwidth of the equalization, the control signal generator is implemented in the low-bandwidth path. A single-to-differential converter is implemented at the pre-driver node, not only to reduce the common-mode noise induced by simultaneous switching noise (SSN), but also to improve the signal swing. The output driver uses the active-feedback architecture [26], which dramatically enhances the output swing without distortion of the output impedance due to the varying feedback resistance. It does not need segmentation to keep a constant output impedance of 50 $\Omega$, as shown in Fig. 3.35, and no switching loss is incurred for FFE computation.



Fig. 3.34 Overall architecture of proposed transmitter.



Fig. 3.35 Output impedance of the active-feedback driver.

### **3.4.2 Phase Modulation**

The details of the implemented phase control signal encoder are shown in Fig. 3.36. Using the parallel data from the PRBS-7 generator, the two control signals are generated. The 8-bit input parallel data are retimed to examine the two previous bits.



Fig. 3.36 Phase control signal encoder.

For example, if D0[n-1] and D1[n-1] are equal, Ctrl1[2] signal goes high, which indicates the request for the earlier-phase clock modulation.

In the same way, the Ctrl2 signal goes high if the current bit and the two previous bits are equal. The two control signals are serialized at the 8-to-4 clocked MUX and then go to the phase modulator.



Fig. 3.37 Timing diagram of phase modulation.

Fig. 3.37 shows the timing diagram of the phase modulation. If the current and the two previous bits are equal, the current transition timing is pulled forward by C1+C2. In other words, the current transition timing is pulled earlier if the 3-consecutive identical bits are detected.

The optimum delays for the coefficients of C1 and C2 are determined by the phase delay analysis of the channel. The values of C1 and C2 are provided manually in this implementation with the 3-bit binary code.
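The encoding rule can be summarized in a few lines of Python. This is a behavioral sketch of one possible reading of the description, not the implemented logic: it assumes that the control bits at index n refer to the edge at the start of bit n, that Ctrl1 is set when the two bits preceding that edge are equal, and that Ctrl2 is set when the three preceding bits are equal. The function names and the example values of C1 and C2 are hypothetical.

```python
def phase_controls(bits):
    """Return (ctrl1, ctrl2) lists for a serial bit sequence.

    ctrl1[n] = 1 if the two bits before the edge at bit n are equal.
    ctrl2[n] = 1 if the three bits before the edge at bit n are equal.
    """
    ctrl1, ctrl2 = [], []
    for n in range(len(bits)):
        run_of_two = n >= 2 and bits[n - 1] == bits[n - 2]
        run_of_three = run_of_two and n >= 3 and bits[n - 2] == bits[n - 3]
        ctrl1.append(int(run_of_two))
        ctrl2.append(int(run_of_three))
    return ctrl1, ctrl2

def edge_advance(bits, c1, c2):
    """Timing advance requested for the edge at the start of each bit."""
    ctrl1, ctrl2 = phase_controls(bits)
    return [c1 * a + c2 * b for a, b in zip(ctrl1, ctrl2)]

if __name__ == "__main__":
    data = [1, 0, 0, 0, 1, 1, 0]
    # Hypothetical coefficients in ps; the real values come from the
    # phase delay analysis of the channel.
    print(edge_advance(data, c1=3, c2=2))
```

In this sketch a run of two identical bits requests an advance of C1 and a run of three requests C1+C2. An advance requested where no transition follows has no visible effect on the zero-crossing time.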



Fig. 3.38 Phase modulator and its timing diagram.



Fig. 3.39 Delay coefficient of phase modulator across process variation.

As shown in Fig. 3.38, the phase modulator is implemented as a current-starved inverter in which only NMOS transistors are used for the adjustment, so that the transition timing of the rising edge is pulled earlier.

As the phase modulation is performed before the 4-to-1 clocked MUX, the equalization overhead is significantly reduced at the output driver node.

Fig. 3.40 Difference from the prior work.



To assess the sensitivity of the phase modulator to process variation, the delay coefficient of the phase modulator is illustrated in Fig. 3.39. The range of the delay coefficient is 10.6 ps at the TT corner, 7.6 ps at the FF corner, and 13.9 ps at the SS corner. As the target data rate is about 20 Gb/s, 1 UI is about 50 ps. It is appropriate to set the delay coefficient at about 0.2 UI, as presented earlier.
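The corner values quoted above can be expressed in unit intervals with a one-line conversion. The helper name below is hypothetical; the numbers are those given in the text for a 20 Gb/s target.

```python
def delay_range_in_ui(range_ps, data_rate_gbps):
    """Convert a delay range in ps to unit intervals at the given data rate."""
    ui_ps = 1e3 / data_rate_gbps   # 1 UI in ps (50 ps at 20 Gb/s)
    return range_ps / ui_ps

# Delay-coefficient range per process corner, in ps, as quoted in the text.
corners = {"TT": 10.6, "FF": 7.6, "SS": 13.9}

if __name__ == "__main__":
    for corner, range_ps in corners.items():
        print(f"{corner}: {delay_range_in_ui(range_ps, 20):.3f} UI")
```

At 20 Gb/s the TT range corresponds to roughly 0.21 UI, with about 0.15 UI at FF and 0.28 UI at SS.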

Fig. 3.40 shows the difference from the prior work. In the prior work, there is the bandwidth limitation of the encoder because encoding is performed at the driver node. However, in this work, the bandwidth limitation of the encoder is significantly reduced because the encoding is performed at the low-bandwidth clock path.

### 3.4.3 Serializer and Clock Path

The serializer consists of two stages. The 8-to-4 clocked MUX is a flip-flop-based structure using the quarter-rate clock, as shown in Fig. 3.41. The 4-to-1 clocked MUX and its timing diagram are shown in Fig. 3.42. A NAND-based quarter-rate pulse generator (PG) proposed in [27] is employed to combine the data before the final stage.

This structure makes the driver simple, and the phase modulation is performed at the quarter rate. In addition, the timing constraint is significantly relaxed due to the operation at the low frequency.

In this structure, a 1-UI pulse of the phase-modulated data is generated using the two phase-modulated quarter-rate clocks. As the data from the 8-to-4 clocked MUX are retimed by the quarter-rate clock before the PG, the sampling margin is significantly enhanced.



Fig. 3.41 8-to-4 clocked MUX.



Fig. 3.42 4-to-1 clocked MUX and its timing diagram.

To reduce glitches at PG, the delay of the inverter and the transmission gate are matched. DJ is minimized by trimming the delay of the data path. Fig. 3.43 shows the post-layout simulated eye diagrams of S2D output and driver output at 20 Gb/s. The output bandwidth is enhanced by sufficient buffering with minimized capacitor load.



Fig. 3.43 Post-layout simulated eye diagrams of S2D output and driver output at 20 Gb/s.



Fig. 3.44 Circuit implementation of clock path.

The circuit implementation of the clock path using the differential clock is shown in Fig. 3.44. The differential clock passes through the duty-cycle corrector (DCC), which adjusts the duty cycle. It consists of ac-coupling capacitors and resistive-feedback inverters to enhance the bandwidth of the clock path. An I/Q generator generates the quadrature clocks for clocking. The phase corrector corrects the errors of the quadrature clocks. The delays of the phase corrector are adjusted by a DCDL, which is implemented with inverters and 3-bit MOS capacitors. The DCDL code is selected manually through an I2C interface.

# Chapter 4

# Measurement

### **4.1 Overview**

The prototype chip has been fabricated in a 28-nm CMOS process and operates from a 1.0 V supply. The total active area of the proposed transmitter is 0.045 mm<sup>2</sup>, as shown in Fig. 4.1 with the chip photomicrograph. Fig. 4.2 shows the measurement setup. The differential input reference clock is generated from Agilent N4903A J-BERT, and the clock is forwarded to DUT. An MSO71604C oscilloscope is used for measurement. The chip is directly attached to a 50- $\Omega$  impedance-matched PCB with wire-bonding to pads.

To verify the performance of the TB-FFE in a real channel environment, an FR4 trace channel with a significant loss is inserted. The measured loss profile and the measured phase delay profile are plotted in Fig. 4.3 and Fig. 4.4, respectively. The measured loss with the FR4 trace, an SMA connector, and an SMA cable at the Nyquist frequencies of 10 GHz and 11 GHz is 13.1 dB and 15.0 dB, respectively.



Fig. 4.1 Chip photomicrograph of the implemented transmitter.



Fig. 4.2 Measurement setup.

The phase delay can be obtained by measuring the channel's phase and then dividing by the angular frequency. The phase delay difference with respect to the longest delay at the Nyquist frequency of 10 GHz and 11 GHz is used to compensate for DDJ. Table 4.1 summarizes the phase delay at 20 Gb/s and 22 Gb/s.



Fig. 4.3 Measured loss profile of FR4 trace, SMA connector and SMA cable.



Fig. 4.4 Measured phase delay profile of channel.

| Data Rate    | 20 Gb/s | 22 Gb/s |
|--------------|---------|---------|
| Nyq. freq.   | 726 ps  | 726 ps  |
| Nyq. freq./2 | 732 ps  | 731 ps  |
| Nyq. freq./3 | 737 ps  | 735 ps  |

Table 4.1 Summary of phase delay at 20 Gb/s and 22 Gb/s.
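The procedure described above, dividing the measured phase by the angular frequency and then taking differences relative to the Nyquist-frequency value, can be sketched as follows. This is an illustration only: the function names are hypothetical, the phase is assumed to be already unwrapped and given in radians, and only the 20 Gb/s column of Table 4.1 is used.

```python
import math

def phase_delay(freq_hz, phase_rad_unwrapped):
    """Phase delay tau_p = -phi / omega from an unwrapped phase measurement."""
    return -phase_rad_unwrapped / (2 * math.pi * freq_hz)

def compensation_offsets(delays_ps):
    """Differences of the sub-harmonic phase delays from the Nyquist value.

    delays_ps[0] is the phase delay at the Nyquist frequency; the remaining
    entries are the values at Nyquist/2, Nyquist/3, and so on.
    """
    ref = delays_ps[0]
    return [d - ref for d in delays_ps[1:]]

# 20 Gb/s column of Table 4.1, in ps: Nyquist, Nyquist/2, Nyquist/3.
table_4_1_20g = [726, 732, 737]

if __name__ == "__main__":
    print(compensation_offsets(table_4_1_20g))   # differences in ps
    # Synthetic sanity check: a pure 726 ps delay measured at 10 GHz.
    phi = -2 * math.pi * 10e9 * 726e-12
    print(phase_delay(10e9, phi) * 1e12, "ps")
```

For the 20 Gb/s column this gives 6 ps and 11 ps relative to the Nyquist-frequency delay, that is, increments of 6 ps and 5 ps, matching the 20 Gb/s entries and the 11 ps total in Table 4.2.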

### 4.2 Eye Diagram

Fig. 4.5 shows the measured eye diagram of the transmitter output at 20 Gb/s without the FR4 trace channel. As shown in Fig. 4.6, the effect of the TB-FFE is clearly visible in the eye diagram. By modulating the phase of the transmitting clock, the transition timing of the transmitter output splits into three patterns depending on the run-length of the transmitted data. Fig. 4.7 and Fig. 4.8 show the measured eye diagrams of the receiver side on the inserted channel at 20 Gb/s without and with the TB-FFE, respectively. By compensating for the measured phase delay difference in each case with the calculated optimum coefficients, DDJ is significantly improved. Fig. 4.9 and Fig. 4.10 show the measured eye diagrams of the receiver side on the inserted channel at 22 Gb/s without and with the TB-FFE, respectively. Consequently, the horizontal eye-opening is improved to 0.66 UI and 0.53 UI at 20 Gb/s and 22 Gb/s, respectively. Unlike with the FFE, the signal amplitude is not attenuated by the equalization.

Table 4.2 summarizes the improvement of the peak-to-peak jitter with the TB-FFE. The peak-to-peak jitter is reduced by approximately the phase delay difference with respect to the longest delay at the Nyquist frequency. The zero-crossing time variation at the Nyquist frequency remains as residual jitter because of the phase difference between the lower delay of $(D_{-3}, D_{-2}, D_{-1}) = (-1, 1, -1)$ and the upper delay of $(D_{-3}, D_{-2}, D_{-1}) = (1, 1, -1)$. Random jitter arising from noise on the power and ground is another source of the residual jitter.



Fig. 4.5 Measured eye-diagram of transmitter output at 20 Gb/s without modulation.



Fig. 4.6 Measured eye-diagram of transmitter output at 20 Gb/s with modulation.



Fig. 4.7 Measured eye-diagram of receiver side on inserted channel at 20 Gb/s without TB-FFE.



Fig. 4.8 Measured eye-diagram of receiver side on inserted channel at 20 Gb/s with 2-tap TB-FFE.



Fig. 4.9 Measured eye-diagram of receiver side on inserted channel at 22 Gb/s without TB-FFE.



Fig. 4.10 Measured eye-diagram of receiver side on inserted channel at 22 Gb/s with 2-tap TB-FFE.

Table 4.2 Summary of peak-to-peak jitter improvement.

| Pattern $(D_{-3}, D_{-2}, D_{-1})$ | 20 Gb/s | 22 Gb/s |
|------------------------------------|---------|---------|
| (1,1,-1), (-1,1,-1)                | -       | -       |
| (1,-1,-1)                          | 6 ps    | 7 ps    |
| (-1,-1,-1)                         | 5 ps    | 5 ps    |
| Total difference                   | 11 ps   | 12 ps   |
| Compensation                       | 11 ps   | 14 ps   |
| **Reduction ratio**                | 42.3 %  | 41.1 %  |
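As a quick cross-check of the reduction ratio, the short sketch below assumes that the ratio is defined as the removed jitter divided by the uncompensated peak-to-peak jitter. The function name is hypothetical, and the 34 ps and 20 ps figures are the 22 Gb/s values quoted in the abstract.

```python
def reduction_ratio(jitter_before_ps: float, jitter_after_ps: float) -> float:
    """Fraction of the peak-to-peak jitter removed by the compensation.

    Assumed definition: (before - after) / before.
    """
    if jitter_before_ps <= 0:
        raise ValueError("jitter_before_ps must be positive")
    return (jitter_before_ps - jitter_after_ps) / jitter_before_ps


# 22 Gb/s: peak-to-peak jitter of 34 ps without and 20 ps with the TB-FFE.
removed_ps = 34.0 - 20.0          # 14 ps, the compensation entry in Table 4.2
ratio_22g = reduction_ratio(34.0, 20.0)
print(f"removed {removed_ps:.0f} ps, ratio {ratio_22g:.3f}")
```

Under this assumed definition the ratio evaluates to about 0.412, in line with the 41.1 % entry for 22 Gb/s in Table 4.2.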

### 4.3 Power Consumption

Fig. 4.11 shows the power breakdown of the entire transmitter at 22 Gb/s for the optimum setting in Fig. 4.10. For an output swing of 440 mV<sub>ppd</sub>, the transmitter consumes a total power of 20.9 mW at 22 Gb/s while the corresponding energy efficiency is 0.95 pJ/b. Power consumption of only the driver and the pre-driver is 8.5 mW, and the energy efficiency is 0.39 pJ/b.

Table 4.3 compares this work with other recently published equalizing transmitters. Among the reported voltage-mode transmitters that equalize more than 10 dB of channel loss, the proposed transmitter achieves the best energy efficiency of 0.95 pJ/b.

With the figure of merit FoM2 [28], which normalizes the energy efficiency to the amount of equalization and is given as

$$\mathrm{FoM2} = \frac{\text{energy efficiency [pJ/b]}}{\text{channel loss [dB]}} \tag{4.1}$$

the proposed transmitter shows the best number as well. Fig. 4.12 compares the energy efficiency with that of recently reported voltage-mode transmitters across data rates.
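The energy-efficiency and FoM2 figures quoted in this section follow directly from the measured power, data rate, and channel loss. The sketch below reproduces the arithmetic; the function names are hypothetical, and only the numeric inputs come from the measurements reported here.

```python
def energy_per_bit_pj(power_mw: float, data_rate_gbps: float) -> float:
    """Energy efficiency (FoM1) in pJ/b: mW divided by Gb/s gives pJ/b."""
    return power_mw / data_rate_gbps


def fom2_pj_per_bit_per_db(energy_pj_per_bit: float, channel_loss_db: float) -> float:
    """FoM2 as in (4.1): energy efficiency normalized to the equalized loss."""
    return energy_pj_per_bit / channel_loss_db


total_pj = energy_per_bit_pj(20.9, 22.0)    # entire transmitter at 22 Gb/s
driver_pj = energy_per_bit_pj(8.5, 22.0)    # driver and pre-driver only
fom2 = fom2_pj_per_bit_per_db(total_pj, 15.0)  # 15.0 dB loss at Nyquist

print(f"total:  {total_pj:.2f} pJ/b")       # 0.95
print(f"driver: {driver_pj:.2f} pJ/b")      # 0.39
print(f"FoM2:   {fom2:.3f} pJ/b/dB")        # 0.063
```

The same two divisions applied to the other rows of Table 4.3 reproduce their FoM1 and FoM2 entries, provided the data rate and loss indicated by the footnotes are used.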



Fig. 4.11 Measured power breakdown of entire transmitter at 22 Gb/s.



Fig. 4.12 Comparison of energy efficiency with recently reported voltage-mode transmitters across the data rate.

|                        | [18]                  | [27]                  | [29]                  | [30]                     | [31]                  | [32]                  | [33]                  | This Work             |
|------------------------|-----------------------|-----------------------|-----------------------|--------------------------|-----------------------|-----------------------|-----------------------|-----------------------|
| Process [nm]           | 65                    | 14                    | 65                    | 55                       | 65                    | 22                    | 90                    | 28                    |
| Data Rate [Gb/s]       | 16                    | 16~40                 | 20                    | 0.2~12.8                 | 20                    | 4~32                  | 5                     | 22                    |
| Driver Topology        | SST                   | SST                   | SST                   | OVM                      | RFB                   | SST                   | VM                    | AFB                   |
| Output Swing           | 250 mV<sub>ppd</sub>  | 1.12 V<sub>ppd</sub>  | 400 mV<sub>ppd</sub>  | 400~600 mV<sub>PP</sub>  | 250 mV<sub>ppd</sub>  | 515 mV<sub>ppd</sub>  | 660 mV<sub>ppd</sub>  | 440 mV<sub>ppd</sub>  |
| Equalization Domain    | Time                  | Amplitude             | Amplitude             | Amplitude                | Amplitude             | Amplitude             | Time                  | Time                  |
| Channel Loss [dB]      | 20                    | 13                    | 16.8                  | 20.4                     | 12                    | 24                    | 16\*\*\*\*            | 15                    |
| Total Power [mW]       | 57.3\*                | 195\*\*               | 37.7                  | 23.04                    | 30.5\*                | 73.6\*\*\*            | 15.1\*\*\*\*          | 20.9                  |
| FoM1 [pJ/b]            | 3.58\*                | 6.95\*\*              | 1.88                  | 1.8                      | 1.53\*                | 4.6\*\*\*             | 3.0\*\*\*\*           | 0.95                  |
| FoM2 [pJ/b/dB]         | 0.179\*               | 0.535\*\*             | 0.112                 | 0.088                    | 0.128\*               | 0.192\*\*\*           | 0.188\*\*\*\*         | 0.063                 |
| Area [mm<sup>2</sup>]  | 0.210                 | 0.0279                | 0.06                  | 0.014                    | -                     | 0.078                 | 0.13                  | 0.045                 |

\* Total transmitter measurement. \*\* Measured at 28 Gb/s. \*\*\* Measured at 16 Gb/s. \*\*\*\* Measured at 16 dB loss.

Table 4.3 Performance summary and comparison with other voltage-mode transmitters.

# Chapter 5

## Conclusion

In this thesis, the design of a high-speed, power-efficient transmitter is described. The proposed transmitter achieves good energy efficiency under significant channel loss with a simple, low-complexity architecture.

The phase delay analysis is used to determine the optimum number of previous bits to examine and the optimum coefficients. The phase modulation is performed in the low-bandwidth path, which significantly relaxes the bandwidth limitation of the equalization while reducing the power consumption and the driver complexity. The adaptive TB-FFE is modeled and behaviorally simulated with an RC low-pass filter (LPF) channel.

The transmitter is implemented with quarter-rate clocking to alleviate clock timing issues and to operate at high speed. Because the phase modulation is applied in the clock path by a control signal, without modulating the voltage level, it is highly reconfigurable.

A prototype chip fabricated in 28-nm CMOS technology occupies 0.045 mm<sup>2</sup> and consumes 20.9 mW at 22 Gb/s.

The proposed transmitter achieves an energy efficiency of 0.95 pJ/b over a channel with 15.0 dB loss at the Nyquist frequency. With the 2-tap TB-FFE, DDJ is reduced by 41% at 22 Gb/s. The applied compensation closely matches the phase delay difference obtained from the measured phase delay profile.

### Bibliography

- [1] Cisco. Cisco Visual Networking Index: Forecast and Methodology, 2017–2022. Accessed: May. 30, 2021. [Online]. Available: https://www.cisco.com/c/en/us/solutions/collateral/service-provider/visualnetworking-in-dex-vni/white-paper-c11-741490.html#\_Toc532256803 "Global Cloud Index 2016–2021", Available: https://www.cisco.com/c/en/us/solutions/service-provider/global-cloud-index-gci
- [2] T. Anand. Wireline Link Performance Survey. Accessed: May. 30, 2021. [Online]. Available: https://web.engr.oregonstate.edu/~anandt/linksurvey
- [3] H. Hatamkhani, et al., "A 10-mW 3.6-Gbps I/O transmitter," *IEEE Symposium on VLSI Circuits*, Mar. 2004, pp. 97–98.
- [4] M. Chen, et al., "Low-voltage low-power LVDS drivers," *IEEE Journal of Solid-State Circuits*, vol. 40, no. 2, pp. 472–479, Feb. 2005.
- [5] Y.-H. Song and S. Palermo, "A 6-Gbit/s hybrid voltage-mode transmitter with current-mode equalization in 90-nm CMOS," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 59, no. 8, pp. 491–495, Aug. 2012.
- [6] S. Kim, et al., "A 5.2-Gb/s low-swing voltage-mode transmitter with an AC-/DC-Coupled equalizer and a voltage offset generator," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 61, no. 1, pp. 213–225, Jan. 2014.

- [7] W.-J. Su, et al., "A 5 Gb/s Voltage-Mode Transmitter Using Adaptive Time-Based De-Emphasis," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 64, no. 4, pp. 959–968, Apr. 2017.
- [8] K.-L.-J. Wong, et al., "A 27-mW 3.6-Gb/s I/O transceiver," *IEEE Journal of Solid-State Circuits*, vol. 39, no. 4, pp. 602–612, Apr. 2004.
- [9] B. Kim et al., "A 10-Gb/s compact low-power serial I/O with DFE-IIR equalization in 65-nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 44, no. 12, pp. 3526-3538, Dec. 2009.
- [10] M. Choi et al., "An FFE transmitter which automatically and adaptively relaxes impedance matching," *IEEE Journal of Solid-State Circuits*, vol. 53, no. 6, pp. 1780-1792, Jun. 2018.
- [11] B. Analui et al., "Data-dependent jitter in serial communications," *IEEE Transactions on Microwave Theory and Techniques*, vol. 53, no. 11, pp. 3388–3397, Nov. 2005.
- [12] J. Buckwalter and A. Hajimiri, "A 10 Gb/s data-dependent jitter equalizer," in *Proc. IEEE Custom Integrated Circuits Conference*, Oct. 2004, pp. 39–42.
- [13] J. Buckwalter, B. Analui and A. Hajimiri, "Predicting data-dependent jitter," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 51, no. 9, pp. 453–457, Sep. 2004.

- [14] S.-Y. Kao and S.-I. Liu, "A 1.62/2.7-Gb/s Adaptive Transmitter With Two-Tap Preemphasis Using a Propagation-Time Detector," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 57, no. 3, pp. 178–182, Mar. 2010.
- [15] C. Menolfi et al., "A 28Gb/s source-series terminated TX in 32nm CMOS SOI," in *ISSCC Digest of Technical Papers*, pp. 334-335, 2012.
- [16] K. L. Chan et al., "A 32.75-Gb/s voltage-mode transmitter with three-tap FFE in 16-nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 52, no. 10, pp. 2663–2678, Oct. 2017.
- [17] P.-J. Peng et al., "A 50-Gb/s quarter-rate voltage-mode transmitter with three-tap FFE in 40-nm CMOS," in *Proc. IEEE European Solid-State Circuits Conference*, Sep. 2018, pp. 174–177.
- [18] A. Ramachandran et al., "A 16Gb/s 3.6pJ/bit wireline transceiver with phase domain equalization scheme: Integrated Pulse Width Modulation (iPWM) in 65nm CMOS," in *ISSCC Digest of Technical Papers*, pp. 488-489, 2017.
- [19] J. Buckwalter and A. Hajimiri, "Analysis and equalization of data-dependent jitter," *IEEE Journal of Solid-State Circuits*, vol. 41, no. 3, pp. 607–620, Mar. 2006.
- [20] W. Bae et al., "A 7.6 mW, 414 fs RMS-jitter 10 GHz phase-locked loop for a 40 Gb/s serial link transmitter based on a two-stage ring oscillator in 65 nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 51, no. 10, pp. 2357–2367, Oct. 2016.

- [21] S. Galal and B. Razavi, "Broadband ESD protection circuits in CMOS technology," *IEEE Journal of Solid-State Circuits*, vol. 38, no. 12, pp. 2334–2340, Dec. 2003.
- [22] W. Bae et al., "A 0.36 pJ/bit, 0.025 mm<sup>2</sup>, 12.5 Gb/s forwarded-clock receiver with a stuck-free delay-locked loop and a half-bit delay line in 65-nm CMOS technology," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 63, no. 9, pp. 1393–1403, Sep. 2016.
- [23] J. Kim, J.-K. Kim, B.-J. Lee and D.-K. Jeong, "Design optimization of on-chip inductive peaking structures for 0.13-um CMOS 40-Gb/s transmitter circuits," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 56, no. 12, pp. 2544–2555, Dec. 2009.
- [24] A. B. Carlson, P. B. Crilly, and J. C. Rutledge, *Communication systems*. New York, NY, USA: McGraw-Hill, 2002.
- [25] IEEE P802.3an (10GBASE-T) Task Force. [Online]. Available: http://www.ieee802.org/3/an (accessed on Apr. 14, 2019).
- [26] H. Ju et al., "A 64Gb/s 1.5pJ/bit PAM-4 transmitter with 3-tap FFE and Gm-regulated active-feedback driver in 28nm CMOS," *IEEE Symposium on VLSI Circuits*, Jun. 2018, pp. 51–52.

- [27] J. Kim et al., "A 16-to-40Gb/s quarter-rate NRZ/PAM4 dual-mode transmitter in 14nm CMOS," in *ISSCC Digest of Technical Papers*, pp. 60-61, 2015.
- [28] J. Lee et al., "A 0.1pJ/b/dB 1.62-to-10.8Gb/s video interface receiver with fully adaptive equalization using un-even data level," *IEEE Symposium on VLSI Circuits*, Jun. 2019, pp. 198–199.
- [29] H. Yang et al., "A low-power dual-mode 20-Gb/s NRZ and 28-Gb/s PAM-4 voltage-mode transmitter," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 66, no. 3, pp. 372–376, Mar. 2019.
- [30] J.-H. Chae et al., "A 12.8-Gb/s quarter-rate transmitter using a 4:1 overlapped multiplexing driver combined with an adaptive clock phase aligner," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 66, no. 3, pp. 372–376, Mar. 2019.
- [31] G.-S. Jeong et al., "A 20 Gb/s 0.4 pJ/b energy-efficient transmitter driver architecture utilizing constant Gm," *IEEE Journal of Solid-State Circuits*, vol. 51, no. 10, pp. 2312–2327, Oct. 2016.
- [32] T. Musah et al., "A 4-32 Gb/s bidirectional link with 3-tap FFE/6-tap DFE and collaborative CDR in 22nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 49, no. 12, pp. 3079–3090, Dec. 2014.
- [33] S. Saxena et al., "A 5 Gb/s energy-efficient voltage-mode transmitter using time-based de-emphasis," *IEEE Journal of Solid-State Circuits*, vol. 49, no. 8, pp. 1827–1836, Aug. 2014.

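# Abstract

This thesis describes the design of a wireline transmitter that operates at high speed with low power. An energy-efficient voltage-mode transmitter with an un-segmented output driver compensates for channel loss in the time domain based on phase delay analysis. By modulating the phase of the transmitting clock rather than the serialized data stream, the proposed transmitter significantly reduces data-dependent jitter. The horizontal eye-opening is improved by compensating for the zero-crossing time variation that depends on the run-length of the transmitted data. The proposed scheme significantly reduces driver complexity by eliminating many driver slices that consume large signaling and switching power.

The prototype chip is fabricated in a 28-nm CMOS process and occupies an active area of 0.045 mm<sup>2</sup>. The measured results show that the proposed transmitter achieves an energy efficiency of 0.95 pJ/b at 22 Gb/s with an output swing of 440 mV<sub>ppd</sub> at a 1.0 V supply. In addition, the peak-to-peak jitter is reduced from 34 ps to 20 ps at 22 Gb/s by the proposed phase delay compensation over a channel with 15.0 dB loss.

Keywords: voltage-mode transmitter, time-based feed-forward equalizer, phase delay, zero-crossing time, data-dependent jitter, quarter-rate clocking, forwarded clocking, NRZ

Student Number: 2016-20861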