#### Attribution-NonCommercial-NoDerivs 2.0 Korea

Provided that the conditions below are followed, users are free to:

- copy, distribute, transmit, display, perform, and broadcast this work.

#### Under the following conditions:

- **Attribution.** You must attribute the work to the original author.
- **Noncommercial.** You may not use this work for commercial purposes.
- **No Derivative Works.** You may not alter, transform, or build upon this work.
- For any reuse or distribution, you must make clear the license terms applied to this work.
- These conditions do not apply if you obtain separate permission from the copyright holder.

Users' rights under copyright law are not affected by the above. This is an easy-to-understand summary of the Legal Code.

#### Ph.D. Dissertation

# Design of High-Speed Multi-Level Transmitter with Tomlinson-Harashima Precoding

by

**Byungjun Kang**

February, 2023

Department of Electrical and Computer Engineering
College of Engineering
Seoul National University

Advisor: Professor Deog-Kyoon Jeong

Submitted as a dissertation for the degree of Doctor of Philosophy in Engineering, February 2023, by Byungjun Kang, Department of Electrical and Computer Engineering, Graduate School, Seoul National University.

The doctoral dissertation of Byungjun Kang is approved, February 2023.

| Role | Name | Seal |
|---|---|---|
| Chairman | Jaeha Kim | (seal) |
| Vice-Chairman | Deog-Kyoon Jeong | (seal) |
| Member | Woo-Seok Choi | (seal) |
| Member | Woogeun Rhee | (seal) |
| Member | Yongsam Moon | (seal) |

A Dissertation Submitted to the Department of Electrical and Computer Engineering in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy at

#### SEOUL NATIONAL UNIVERSITY

February, 2023

#### Committee in Charge:

- Professor Jaeha Kim, Chairman
- Professor Deog-Kyoon Jeong, Vice-Chairman
- Professor Woo-Seok Choi
- Professor Woogeun Rhee
- Professor Yongsam Moon

### **Abstract**

The growth of hyperscale data centers and data traffic inevitably requires an increase in transmission speed and bandwidth. Accordingly, the data rate per lane of various I/O standards has increased rapidly over time. Also, multi-level signaling, such as pulse amplitude modulation (PAM), especially PAM-4, is widely adopted in many standards.
In the case of multi-level signaling, a degradation in signal-to-noise ratio (SNR) is inevitable compared to NRZ signaling. In line with these trends, channel loss has also increased over the years. In addition, the pre-cursor grows as the rise/fall time takes up a larger portion of the unit interval, and it must be removed. In this regard, Tomlinson-Harashima precoding (THP), which can improve the SNR, is introduced, and several variations that use it to remove a pre-cursor are presented.

High-speed multi-level transmitters (TXs) employing feed-forward Tomlinson-Harashima precoding (FF-THP) are presented. The proposed FF-THP combines the advantages of modulo-based equalization with controllability over a pre-cursor. Moreover, a quantitative z-domain analysis of the channel response and of the equalization performed by the THP, the FFE, and the FF-THP is conducted. A simple 1-pole channel with a pre-cursor is employed to demonstrate the repercussions of a pre-cursor and the effectiveness of the FF-THP. According to the analysis, the FF-THP offers the largest vertical eye margin (VEM) among the TX equalization methods when the channel has a pre-cursor or large ISI.

Two high-speed multi-level TXs adopting FF-THP were fabricated in 28 nm CMOS technology. The first chip is a 10 Gb/s PAM-4 TX with FF-THP. A modulo prediction engine (MPE) and an FFE are designed in a 4-parallel structure, which matches the 4-phase clock generated by a PLL. The FFE tap coefficients are optimized to compensate for the 21-dB-loss channel. The proposed FF-THP presents a wider horizontal eye margin and a larger VEM than the FFE. The TX achieves 10 Gb/s PAM-4 with a power efficiency of 6.0 pJ/b and 4.05 pJ/b/ISI while compensating for 21-dB loss and occupying an active area of 0.0746 mm<sup>2</sup>. The second chip is a 42 Gb/s PAM-8 FF-THP TX.
The MPE and FFE in the synthesized digital block are designed and optimized to achieve a 16-parallel structure and high-speed operation while compensating for 7.7-dB channel loss. A 16-phase clock is generated by an RDAC-based digitally controlled delay line, and 16-to-1 serializers based on 1-UI pulse generators are used to deliver 14 Gbaud data. A source-series-terminated 6-bit DAC driver offers 50 $\Omega$ matching with reasonable DNL and INL. These efforts have advanced the highest 3-bit/Baud TX data rate to 42 Gb/s and achieved a power efficiency of 1.58 pJ/b, which is comparable to state-of-the-art TXs, with an active area of 0.0703 mm<sup>2</sup>.

The effectiveness of FF-THP is verified through mathematical analysis, simulation, and measurement results. Moreover, the digital-based equalization technique can take full advantage of process scaling.

Keywords: multi-level transmitter, feed-forward equalizer (FFE), Tomlinson-Harashima precoding (THP), feed-forward Tomlinson-Harashima precoding (FF-THP), DAC driver

**Student Number**: 2017-27459

## **Contents**

- Abstract
- Contents
- List of Figures
- List of Tables
- Chapter 1 Introduction
  - 1.1 Motivation
  - 1.2 Thesis Organization
- Chapter 2 Background of Channel Model and FFE Tap Coefficient Optimization for High-Speed Interface
  - 2.1 Overview
  - 2.2 Modeling 1-Pole Channel having a Pre-cursor and Single-Bit Response and Importance of a Pre-cursor Controllability
    - 2.2.1 1-Pole Channel and Single-Bit Response
    - 2.2.2 Step Signal based 1-Pole Channel having a Pre-cursor and Single-Bit Response
    - 2.2.3 Ramp Signal based 1-Pole Channel having a Pre-cursor and Single-Bit Response
    - 2.2.4 Importance of a Pre-cursor Controllability
  - 2.3 FFE Tap Coefficient Optimization for 1-Pole Channel having a Pre-cursor
    - 2.3.1 1-Tap FFE Coefficient Optimization for 1-Pole Channel
    - 2.3.2 FFE Tap Coefficient Optimization for 1-Pole Channel having a Pre-cursor
- Chapter 3 Tomlinson-Harashima Precoding and Variations
  - 3.1 Tomlinson-Harashima Precoding
  - 3.2 Pre-cursor Control using THP
    - 3.2.1 Pre-cursor THP
    - 3.2.2 THP-FFE
    - 3.2.3 FFE-THP
  - 3.3 Simulation Results of Conventional THP, Pre-cursor THP, THP-FFE, and FFE-THP
    - 3.3.1 Conventional and Pre-cursor THP
    - 3.3.2 THP-FFE and FFE-THP
- Chapter 4 Feed-Forward Tomlinson-Harashima Precoding
  - 4.1 Design Process of FF-THP
  - 4.2 Effectiveness of FF-THP
    - 4.2.1 Mathematics in Z-Domain Response
    - 4.2.2 SystemVerilog Simulation
- Chapter 5 10 Gb/s PAM-4 Transmitter with FF-THP in 28 nm CMOS
  - 5.1 Transmitter Implementation
    - 5.1.1 Overall Architecture
    - 5.1.2 Modulo Prediction Engine
    - 5.1.3 Feed-Forward Equalizer
    - 5.1.4 Other Blocks
  - 5.2 Measurement Results
    - 5.2.1 Measurement Setup and Transmitter Output
    - 5.2.2 Channel Response and Equalization Results
    - 5.2.3 Chip Photograph and Performance Summary
- Chapter 6 42 Gb/s PAM-8 Transmitter with FF-THP in 28 nm CMOS
  - 6.1 Transmitter Implementation
    - 6.1.1 Overall Architecture
    - 6.1.2 Modulo Prediction Engine
    - 6.1.3 Other Blocks
  - 6.2 Measurement Results
    - 6.2.1 Measurement Setup and Transmitter Output
    - 6.2.2 Channel Response and Equalization Results
    - 6.2.3 Chip Photograph and Performance Summary
- Chapter 7 Conclusions
- Bibliography
- Abstract (in Korean)

## **List of Figures**

- Fig. 1.1 Growth of global hyperscale data center [4]
- Fig. 1.2 Growth of cloud data center traffic [4]
- Fig. 1.3 Per-lane data-rate vs. year for a variety of common I/O standards [5]
- Fig. 1.4 Trends of channel loss vs. year (top) and vs. data rate (bottom) [6] – [31]
- Fig. 2.1 Model of skin effect in RL ladder (top) and RC ladder (bottom) [47]
- Fig. 2.2 1-pole channel and normalized single-bit response
- Fig. 2.3 Step signal based 1-pole channel having a pre-cursor and single-bit response
- Fig. 2.4 Ramp signal based 1-pole channel having a pre-cursor and single-bit response
- Fig. 2.5 Structure of feed-forward equalizer
- Fig. 3.1 Structure of Tomlinson [42]
- Fig. 3.2 Structure of Harashima [43]
- Fig. 3.3 Block diagram of Tomlinson-Harashima precoding and signaling example [35]
- Fig. 3.4 Structure of Tomlinson-Harashima precoding transmitter
- Fig. 3.5 Probability density function of TX output of FFE and THP
- Fig. 3.6 Structures of conventional THP and pre-cursor THP with tap coefficients
- Fig. 3.7 Equivalent representations of THP-FFE
- Fig. 3.8 Equivalent representations of FFE-THP
- Fig. 3.9 Characteristics of simulated channel single bit response and insertion loss of channel
- Fig. 3.10 Eye diagrams at channel output (a) conventional THP and (b) pre-cursor THP
- Fig. 3.11 Eye diagrams at RX-MOD output (a) conventional THP and (b) pre-cursor THP
- Fig. 3.12 Eye diagrams of THP-FFE (a) at channel output and (b) at RX-MOD output
- Fig. 3.13 Eye diagrams of FFE-THP (a) at channel output and (b) at RX-MOD output
- Fig. 4.1 Conversion steps from THP to the proposed FF-THP. (1) Conventional THP (2) Interpretation of modulo operation (3) Modulo prediction (4) Proposed FF-THP
- Fig. 4.2 Normalized single-bit response of a 1-pole channel having a pre-cursor
- Fig. 4.3 3-D graphs of VEMs of THP, FFE, FF-THP in PAM-4 signaling
- Fig. 4.4 3-D graphs of VEMs of THP, FFE, FF-THP in PAM-8 signaling
- Fig. 4.5 Cross-sectional diagram of VEMs of THP, FFE, FF-THP in PAM-4 signaling
- Fig. 4.6 Cross-sectional diagram of VEMs of THP, FFE, FF-THP in PAM-8 signaling
- Fig. 4.7 Ratio of $VEM_{FF-THP}/VEM_{THP}$ and $VEM_{FF-THP}/VEM_{FFE}$ in PAM-4 signaling
- Fig. 4.8 Ratio of $VEM_{FF-THP}/VEM_{THP}$ and $VEM_{FF-THP}/VEM_{FFE}$ in PAM-8 signaling
- Fig. 4.9 Single-bit response of the 1-pole channel having a pre-cursor ($H_{-1} = 0.2$ and $H_1 = 0.5$)
- Fig. 4.10 Eye diagrams of THP, FFE, FF-THP in PAM-4 signaling compensating for the channel ($H_{-1} = 0.2$ and $H_1 = 0.5$)
- Fig. 4.11 Eye diagrams of THP, FFE, FF-THP in PAM-4 signaling compensating for the channel ($H_{-1} = 0.2$ and $H_1 = 0.5$) with Gaussian noise
- Fig. 4.12 Eye diagrams of FFE and FF-THP in PAM-8 signaling
- Fig. 4.13 Eye diagrams of FFE and FF-THP in PAM-8 signaling compensating for the channel ($H_{-1} = 0.125$ and $H_1 = 0.25$) with Gaussian noise
- Fig. 5.1 Overall block diagram of 10 Gb/s PAM-4 FF-THP transmitter
- Fig. 5.2 Structure of the modulo prediction engine (MPE)
- Fig. 5.3 Structure of one phase of 4-parallel FFE without pipelining
- Fig. 5.4 Structures of components of data path and serializing timing diagram
- Fig. 5.5 DNL and INL of 8-bit differential DAC
- Fig. 5.6 Measurement setup for 10 Gb/s PAM-4 transmitter
- Fig. 5.7 Eye diagram and histogram of 10 Gb/s PAM-4 transmitter
- Fig. 5.8 Measured 10 Gb/s PAM-4 eye diagram and histogram of TX output (bottom)
- Fig. 5.9 Measured insertion loss and normalized single bit response of the channel
- Fig. 5.10 Measured 10 Gb/s eye diagram of FFE (top left) and FF-THP (bottom left),
- Fig. 5.11 Calculated decision threshold voltage of FFE and FF-THP
- Fig. 5.12 Chip photomicrograph of 10 Gb/s PAM-4 transmitter
- Fig. 5.13 Area and power breakdown at 10 Gb/s PAM-4 with FF-THP
- Fig. 6.1 Overall block diagram of 42 Gb/s PAM-8 FF-THP transmitter
- Fig. 6.2 Structure of 16-parallel modulo prediction engine
- Fig. 6.3 Modulo table cell for PAM-8
- Fig. 6.4 Various 16-parallel MPE structures and their critical path delay
- Fig. 6.5 Structures of components of data path and serializing timing diagram
- Fig. 6.6 Characteristics of DAC DNL & INL (top) and output resistance (bottom)
- Fig. 6.7 Measurement setup for 42 Gb/s PAM-8 FF-THP transmitter
- Fig. 6.8 Measured eye diagram of 42 Gb/s PAM-8 transmitter
- Fig. 6.9 Measured eye diagram of 32 Gb/s PAM-4 transmitter
- Fig. 6.10 Measured insertion loss and single bit response of the channel
- Fig. 6.11 Measured 42 Gb/s eye diagrams of channel outputs FFE (left) and FF-THP (right)
- Fig. 6.12 Measured insertion loss and normalized single bit response of the channel
- Fig. 6.13 Measured 32 Gb/s eye diagrams of channel outputs FFE (left) and FF-THP (right)
- Fig. 6.14 Chip photomicrograph of 42 Gb/s PAM-8 transmitter
- Fig. 6.15 Area and power breakdown at 42 Gb/s PAM-8 with FF-THP

## **List of Tables**

- Table 2.1 Examples of step signal based $a$ derived by $h_1$ and $h_{-1}$
- Table 2.2 Examples of ramp signal based $a$ derived by $h_1$ and $h_{-1}$
- Table 2.3 Examples of ramp signal based $h_{-1}$ derived by $h_1$ and $a$
- Table 5.1 Performance summary and comparison for 10 Gb/s PAM-4 transmitter
- Table 6.1 Performance summary and comparison for 42 Gb/s PAM-8 transmitter

## Chapter 1

## Introduction

### 1.1 Motivation

A data center, by definition a physical facility that houses computer systems and associated components, is the crux that provides storage, communications, and networking and delivers IT services and business processes in general. Data centers have been growing rapidly because they offer low cost and high efficiency [1] – [3]. In particular, the growth of hyperscale data centers, which have more than 5,000 servers, is shown in Fig. 1.1. The number of hyperscale data centers will increase from 338 to 628 in 6 years with a compound annual growth rate (CAGR) of 13%. Also, their percentage share of data center servers almost doubles from 27% to 53% in the same period [4].

Fig. 1.1 Growth of global hyperscale data center [4]

The annual global data center traffic is estimated to be 6.8 ZB in 2016 and will triple to 20.6 ZB in 2021 with a 25% CAGR. In particular, the traffic of cloud data centers in 2021, which accounts for 95% of total data center traffic, is 19.5 ZB, triple the 2016 level with a 27% CAGR, as shown in Fig. 1.2 [4].

Fig. 1.2 Growth of cloud data center traffic [4]

Fig. 1.3 Per-lane data-rate vs. year for a variety of common I/O standards [5]

The growth of hyperscale data centers and data traffic inevitably requires an increase in transmission speed and bandwidth.
Accordingly, the data rate per lane of various I/O standards has increased rapidly over time, as shown in Fig. 1.3. The per-lane data rates of various standards are growing exponentially over the years. For example, the data rate of peripheral component interconnect express (PCIe) has doubled every 3 to 4 years, and the pace is even faster nowadays [5]. In line with these trends, channel loss has also increased over the years. Fig. 1.4 shows the channel loss vs. year and vs. data rate reported in various papers from 2017 to 2022 [6] – [31].

Fig. 1.4 Trends of channel loss vs. year (top) and vs. data rate (bottom) [6] – [31]

Due to the rapid growth of demand for data throughput in wireline interfaces, multi-level signaling such as pulse amplitude modulation (PAM), especially PAM-4, is widely adopted in many standards such as Fibre Channel, InfiniBand, and Ethernet [32] – [34]. Although multi-level PAM signaling can significantly increase the data throughput, the inter-symbol interference (ISI) of a channel and the reduced signal-to-noise ratio (SNR) substantially degrade signal integrity and bit error rate (BER) performance, making it a great challenge to employ PAM-4 signaling on a high-loss channel.

While channel equalization can be done at both the transmitter (TX) and the receiver (RX), there are a few advantages to equalizing the channel loss on the TX side. Equalization is more straightforward to implement at the TX than at the RX because the TX has exact information about the input data, whereas the RX may have a sampling error. Therefore, asymmetric links, such as DRAM interfaces, may use TX equalization thanks to its simplicity [35]. A feed-forward equalizer (FFE) in the form of a finite impulse response (FIR) filter is widely employed; however, because of the scaling factor imposed by the maximum drivable voltage or current, the eye opening on a high-loss channel can be significantly reduced [36].
While the nonlinear decision feedback equalizer (DFE) is widely used because it is immune to noise boosting, its errors tend to occur in bursts, which degrades the forward error correction (FEC) performance. Thus, combining the DFE and the FEC in PAM-4 signaling can cause significant performance degradation [36] – [39].

As an alternative, Tomlinson-Harashima precoding (THP) is a viable candidate for TX equalization of a high-loss channel since, being nonlinear, it offers an SNR gain and an evenly distributed transmitted signal [40], [41]. The THP can theoretically equalize a wide range of channels, regardless of the channel loss [42], [43]. Attractive as it is, the use of THP in physical implementations is limited by the feedback timing constraint and the lack of pre-cursor-handling capability, the same problems that affect the DFE. Various techniques, such as pipelining and mapping, have been reported to relieve the timing constraint in THP implementations [44] – [46]. However, even with the timing constraint alleviated, the pre-cursor ISI remains a problem in equalizing a high-loss channel. Therefore, another reported approach is model predictive control (MPC), which offers limited controllability of the pre-cursor ISI [47], [48].

In this thesis, the channel model is derived, the importance of removing a pre-cursor is shown, and the optimized FFE tap coefficients are derived. Also, THP and its variations are introduced for multi-level signaling, and the effects of a pre-cursor are verified. In addition, the feed-forward Tomlinson-Harashima precoding (FF-THP) architecture, which incorporates pre-cursor compensation into modulo-based equalization, is proposed. Also, the multi-level transmitters adopting FF-THP achieve large channel loss compensation and high-data-rate multi-level transmission with power efficiency comparable to state-of-the-art circuits.

### 1.2 Thesis Organization

This thesis is organized as follows.
In Chapter 2, the background on the channel model and FFE tap coefficient optimization for a high-speed interface is presented. Methods for modeling a 1-pole channel having a pre-cursor with unity DC gain, and the FFE tap coefficient optimization for that channel, are discussed.

In Chapter 3, Tomlinson-Harashima precoding and its variations are presented. The basic concept and operation of THP are described, and the pros and cons of THP are discussed. To compensate for a pre-cursor, the pre-cursor THP and two combinations of THP and FFE are presented. Their analysis and simulation results are also shown.

In Chapter 4, feed-forward Tomlinson-Harashima precoding is presented. To achieve both high-speed operation and multi-level signaling, FF-THP is proposed, and its effectiveness compared to FFE and THP is discussed through mathematical equations and behavioral simulations.

In Chapter 5, a 10 Gb/s PAM-4 TX with FF-THP is presented. The proposed modulo prediction engine and circuit implementations are described, and the effectiveness of FF-THP is demonstrated by measurement results. In addition, the estimated bit-error rate is calculated based on the histogram of the TX output.

In Chapter 6, a 42 Gb/s PAM-8 TX with FF-THP is presented. The optimization of the 16-parallel MPE and circuit components is explained. Also, measurement results supporting the effectiveness of FF-THP are shown.

Chapter 7 summarizes the proposed works and concludes this thesis.

## Chapter 2

## Backgrounds of Channel Model and FFE Tap Coefficient Optimization for High-Speed Interface

### 2.1 Overview

There are various channel modeling methods, such as RC, LC, and RLGC models. The approximation formula varies depending on the frequency of the transmitted data and the values of the channel elements (R, L, G, and C).
In general, a channel used in a high-speed interface can be regarded as a transmission line, whose main loss mechanisms are resistive loss and dielectric loss, caused by the conductor skin effect and dielectric absorption, respectively. Both types of loss increase with frequency: the skin-effect loss increases in proportion to the square root of the frequency, while the dielectric loss increases linearly with frequency [50] – [53]. Therefore, accurate channel modeling must include both types of loss, but the dielectric loss is complicated to model. However, since both losses increase with frequency, the wireline channel can be modeled based on the skin effect alone to examine the tendency of the channel response and the effect of equalization.

Fig. 2.1 Model of skin effect in RL ladder (top) and RC ladder (bottom) [51]

Therefore, this chapter deals with a channel model of the resistive loss due to the skin effect, which is based on the RC channel (1-pole channel). Also, to represent the response of an actual channel, a pre-cursor is introduced to approximate the response. In addition, for each channel model, the tap coefficient optimization of the FFE, which is widely used for TX equalization, is derived.

### 2.2 Modeling 1-Pole Channel having a Pre-cursor and Single-Bit Response and Importance of a Pre-cursor Controllability

#### 2.2.1 1-Pole Channel and Single-Bit Response

Fig. 2.2 1-pole channel and normalized single-bit response

The 1-pole channel can be modeled as an RC channel, whose time-domain input-output relationship is expressed as (2.1).

$$RC\frac{dv_{out}(t)}{dt} + v_{out}(t) = v_{in}(t) \qquad (2.1)$$

Using the Laplace transform, (2.1) can be expressed in the frequency domain as (2.2).
$$\begin{aligned} (1 + RCs)V_{out}(s) &= V_{in}(s) \\ V_{out}(s) &= \frac{1}{1 + RCs}V_{in}(s) \end{aligned} \qquad (2.2)$$

The time-domain and frequency-domain expressions of the single-bit input voltage, a unit pulse of width $T$ (one unit interval), are given in (2.3).

$$\begin{aligned} v_{in}(t) &= u(t) - u(t - T) \\ V_{in}(s) &= \frac{1}{s} - \frac{e^{-Ts}}{s} \end{aligned} \qquad (2.3)$$

Applying (2.3) to (2.2), the frequency-domain single-bit response (SBR) of the 1-pole channel is derived as (2.4). Applying the inverse Laplace transform, the time-domain SBR is derived as (2.5).

$$V_{out}(s) = \left(\frac{1}{s} - \frac{1}{s + 1/RC}\right)(1 - e^{-Ts}) \qquad (2.4)$$

$$v_{out}(t) = (1 - e^{-\frac{t}{RC}})u(t) - (1 - e^{-\frac{t-T}{RC}})u(t-T) \qquad (2.5)$$

The main cursor, $H_0$, the post-cursors, $H_i$, and the normalized post-cursors, $h_i$, are expressed as below.

$$v_{out}(T) = H_0 = 1 - e^{-\frac{T}{RC}} \qquad (2.6)$$

$$v_{out}((i+1)T) = H_i = (1 - e^{-\frac{(i+1)T}{RC}}) - (1 - e^{-\frac{iT}{RC}}) = e^{-\frac{iT}{RC}}(1 - e^{-\frac{T}{RC}}) \qquad (2.7)$$

$$h_{i} = \frac{H_{i}}{H_{0}} = e^{-\frac{iT}{RC}} = h_{1}^{i} \qquad (2.8)$$

Therefore, the RC value that generates a 1-pole channel corresponding to the desired data rate and post-cursor values is derived as (2.9).

$$RC = -\frac{T}{\ln h_1} \qquad (2.9)$$

#### 2.2.2 Step Signal based 1-Pole Channel having a Pre-cursor and Single-Bit Response

Fig. 2.3 Step signal based 1-pole channel having a pre-cursor and single-bit response

The time-domain relationship between the input and output of a step signal based 1-pole channel having a pre-cursor, and hence its SBR, can be expressed as (2.10). Here the single-bit input is a staircase that steps up to half amplitude at $t = 0$ and to full amplitude at $t = aT$, then steps down at $t = (a+b)T$ and $t = (2a+b)T$.

$$\begin{aligned} v_{in}(t) &= RC \frac{dv_{out}(t)}{dt} + v_{out}(t) \\ &= \frac{1}{2} \{ u(t) + u(t - aT) - u(t - (a + b)T) - u(t - (2a + b)T) \} \end{aligned} \qquad (2.10)$$

Using the Laplace transform, (2.10) can be expressed in the frequency domain as (2.11).
$$\begin{aligned} (1+RCs)V_{out}(s) &= \frac{1}{2} \left( \frac{1}{s} + \frac{e^{-aTs}}{s} - \frac{e^{-(a+b)Ts}}{s} - \frac{e^{-(2a+b)Ts}}{s} \right) \\ V_{out}(s) &= \frac{1}{2} \left( \frac{1}{s} - \frac{1}{s+1/RC} \right) (1 + e^{-aTs} - e^{-(a+b)Ts} - e^{-(2a+b)Ts}) \end{aligned} \qquad (2.11)$$

Applying the inverse Laplace transform to (2.11), $v_{out}(t)$ can be represented as below.

$$\begin{aligned} v_{out}(t) = \frac{1}{2} \Big\{ &(1 - e^{-\frac{t}{RC}})u(t) + (1 - e^{-\frac{t-aT}{RC}})u(t - aT) \\ &- (1 - e^{-\frac{t-(a+b)T}{RC}})u(t - (a+b)T) - (1 - e^{-\frac{t-(2a+b)T}{RC}})u(t - (2a+b)T) \Big\} \end{aligned} \qquad (2.12)$$

The main cursor, $H_0$, the pre-cursor, $H_{-1}$, the post-cursors, $H_i$, the normalized pre-cursor, $h_{-1}$, and the normalized post-cursors, $h_i$, are expressed as below.

$$H_0 = v_{out}((2a+b)T) = \frac{1}{2}(1 + e^{-\frac{aT}{RC}} - e^{-\frac{(a+b)T}{RC}} - e^{-\frac{(2a+b)T}{RC}}) \qquad (2.13)$$

$$H_{-1} = v_{out}(aT) = \frac{1}{2}(1 - e^{-\frac{aT}{RC}}) \qquad (2.14)$$

$$\begin{aligned} H_{i} &= v_{out}((2a+b+i)T) = \frac{1}{2}(e^{-\frac{iT}{RC}} + e^{-\frac{(a+i)T}{RC}} - e^{-\frac{(a+b+i)T}{RC}} - e^{-\frac{(2a+b+i)T}{RC}}) \\ &= \frac{1}{2}e^{-\frac{iT}{RC}}(1 + e^{-\frac{aT}{RC}} - e^{-\frac{(a+b)T}{RC}} - e^{-\frac{(2a+b)T}{RC}}) = H_{0}e^{-\frac{iT}{RC}} \end{aligned} \qquad (2.15)$$

$$h_{-1} = \frac{v_{out}(aT)}{v_{out}((2a+b)T)} = \frac{1 - e^{-\frac{aT}{RC}}}{(1 + e^{-\frac{aT}{RC}})(1 - e^{-\frac{(a+b)T}{RC}})} \qquad (2.16)$$

$$h_{i} = \frac{H_{i}}{H_{0}} = e^{-\frac{iT}{RC}} = h_{1}^{i} \qquad (2.17)$$

Therefore, considering $a + b = 1$, $a$ can be represented by $h_{-1}$ and $h_1$ as (2.18).
$$\begin{aligned} h_{-1} &= \frac{1 - e^{-\frac{aT}{RC}}}{(1 + e^{-\frac{aT}{RC}})(1 - e^{-\frac{(a+b)T}{RC}})} = \frac{1 - e^{-\frac{aT}{RC}}}{(1 + e^{-\frac{aT}{RC}})(1 - h_1)} \\ &\Rightarrow \frac{1 - h_{-1}(1 - h_1)}{1 + h_{-1}(1 - h_1)} = e^{-\frac{aT}{RC}} = h_1^a \\ &\Rightarrow a = \frac{\ln\left(\frac{1 - h_{-1}(1 - h_1)}{1 + h_{-1}(1 - h_1)}\right)}{\ln h_1} \end{aligned} \qquad (2.18)$$

As a result, a 1-pole channel having a pre-cursor corresponding to the desired $h_1$ and $h_{-1}$ can be generated by changing the single-bit input to a step-shaped single-bit input with this $a$. Examples of the step signal based $a$ derived from $h_1$ and $h_{-1}$ are shown in Table 2.1.

| $h_1$ (rows) / $h_{-1}$ (columns) | 0 | 0.1 | 0.2 | 0.3 |
|----------------|---|--------|--------|--------|
| 0.1 | 0 | 0.0784 | 0.1581 | 0.2405 |
| 0.2 | 0 | 0.0996 | 0.2006 | 0.3042 |
| 0.3 | 0 | 0.1165 | 0.2341 | 0.3541 |
| 0.4 | 0 | 0.1311 | 0.2632 | 0.3972 |
| 0.5 | 0 | 0.1444 | 0.2895 | 0.4361 |
| 0.6 | 0 | 0.1567 | 0.3139 | 0.4721 |

Table 2.1 Examples of step signal based $a$ derived by $h_1$ and $h_{-1}$

#### 2.2.3 Ramp Signal based 1-Pole Channel having a Pre-cursor and Single-Bit Response

Fig. 2.4 Ramp signal based 1-pole channel having a pre-cursor and single-bit response

Similar to the step signal case, the time-domain relationship between the input and output of a ramp signal based 1-pole channel having a pre-cursor, and hence its SBR, can be expressed as (2.19).

$$\begin{aligned} v_{in}(t) &= RC \frac{dv_{out}(t)}{dt} + v_{out}(t) \\ &= \frac{1}{aT} \int \{u(t) - u(t - aT) - u(t - (a + b)T) + u(t - (2a + b)T)\}dt \end{aligned} \qquad (2.19)$$

Using the Laplace transform, the frequency-domain $V_{out}(s)$ is expressed as (2.20).

$$\begin{aligned} V_{out}(s) &= \frac{1}{aT} \frac{1}{RC} \frac{1}{s+1/RC} \frac{1 - e^{-aTs} - e^{-(a+b)Ts} + e^{-(2a+b)Ts}}{s^2} \\ &= \frac{1}{aT} \frac{1}{s} \left( \frac{1}{s} - \frac{1}{s+1/RC} \right) (1 - e^{-aTs} - e^{-(a+b)Ts} + e^{-(2a+b)Ts}) \end{aligned} \qquad (2.20)$$

Also, using the inverse Laplace transform, the time-domain $v_{out}(t)$ is represented as (2.21).
$$\begin{aligned} v_{out}(t) = \frac{1}{aT} \int \Big\{ &(1 - e^{-\frac{t}{RC}})u(t) - (1 - e^{-\frac{t-aT}{RC}})u(t - aT) \\ &- (1 - e^{-\frac{t-(a+b)T}{RC}})u(t - (a+b)T) + (1 - e^{-\frac{t-(2a+b)T}{RC}})u(t - (2a+b)T) \Big\} dt \end{aligned} \qquad (2.21)$$

Considering $a + b = 1$, the main cursor, $H_0$, and the post-cursors, $H_i$, are represented as (2.22) and (2.23), respectively.

$$\begin{aligned} H_0 &= v_{out}((1+a)T) \\ &= \frac{1}{aT} \Big[ \int_0^{(1+a)T} (1 - e^{-\frac{t}{RC}}) dt - \int_{aT}^{(1+a)T} (1 - e^{-\frac{t-aT}{RC}}) dt - \int_T^{(1+a)T} (1 - e^{-\frac{t-T}{RC}}) dt \Big] \\ &= \frac{1}{aT} \Big[ \{ (1+a)T - RC(1 - e^{-\frac{(1+a)T}{RC}}) \} - \{ T - RC(1 - e^{-\frac{T}{RC}}) \} - \{ aT - RC(1 - e^{-\frac{aT}{RC}}) \} \Big] \\ &= \frac{RC}{aT} ( 1 - e^{-\frac{aT}{RC}} - e^{-\frac{T}{RC}} + e^{-\frac{(1+a)T}{RC}} ) \end{aligned} \qquad (2.22)$$

$$\begin{aligned} H_{i} &= v_{out}((1+a+i)T) \\ &= \frac{1}{aT} \Big[ \int_{0}^{(1+a+i)T} (1-e^{-\frac{t}{RC}}) dt - \int_{aT}^{(1+a+i)T} (1-e^{-\frac{t-aT}{RC}}) dt \\ &\qquad - \int_{T}^{(1+a+i)T} (1-e^{-\frac{t-T}{RC}}) dt + \int_{(1+a)T}^{(1+a+i)T} (1-e^{-\frac{t-(1+a)T}{RC}}) dt \Big] \\ &= \frac{1}{aT} \Big[ \{ (1+a+i)T - RC(1-e^{-\frac{(1+a+i)T}{RC}}) \} - \{ (1+i)T - RC(1-e^{-\frac{(1+i)T}{RC}}) \} \\ &\qquad - \{ (a+i)T - RC(1-e^{-\frac{(a+i)T}{RC}}) \} + \{ iT - RC(1-e^{-\frac{iT}{RC}}) \} \Big] \\ &= \frac{RC}{aT} ( e^{-\frac{iT}{RC}} - e^{-\frac{(a+i)T}{RC}} - e^{-\frac{(1+i)T}{RC}} + e^{-\frac{(1+a+i)T}{RC}} ) \\ &= \frac{RC}{aT} e^{-\frac{iT}{RC}} (1-e^{-\frac{aT}{RC}} - e^{-\frac{T}{RC}} + e^{-\frac{(1+a)T}{RC}} ) \\ &= e^{-\frac{iT}{RC}} H_{0} \end{aligned} \qquad (2.23)$$

Therefore, the normalized post-cursors, $h_i$, are represented as (2.24), and applying (2.24), $H_0$ can be re-expressed as (2.25).
$$h_i = \frac{H_i}{H_0} = e^{-\frac{iT}{RC}} = h_1^i \qquad (2.24)$$

$$H_0 = \frac{RC}{aT} (1 - e^{-\frac{aT}{RC}} - e^{-\frac{T}{RC}} + e^{-\frac{(1+a)T}{RC}}) = \frac{RC}{aT} (1 - h_1^a - h_1 + h_1^{1+a}) \qquad (2.25)$$

Then, the pre-cursor, $H_{-1}$, and the normalized pre-cursor, $h_{-1} = H_{-1}/H_0$, are expressed as below.

$$\begin{aligned} H_{-1} &= v_{out}(aT) = \frac{1}{aT} \int_{0}^{aT} (1 - e^{-\frac{t}{RC}}) dt \\ &= \frac{1}{aT} [aT - RC(1 - e^{-\frac{aT}{RC}})] = 1 - \frac{RC}{aT} (1 - h_{1}^{a}) \end{aligned} \qquad (2.26)$$

$$\begin{aligned} h_{-1} &= \frac{v_{out}(aT)}{v_{out}((1+a)T)} = \frac{\frac{aT}{RC} - 1 + h_1^a}{1 + h_1^{1+a} - h_1 - h_1^a} = \frac{-a \ln h_1 - 1 + h_1^a}{1 + h_1^{1+a} - h_1 - h_1^a} \\ &= \frac{-\ln h_1^a - 1 + h_1^a}{(1 - h_1)(1 - h_1^a)} = -\frac{1}{1 - h_1} \left(1 + \frac{\ln h_1^a}{1 - h_1^a}\right) \end{aligned} \qquad (2.27)$$

Using a Taylor series approximation, $h_{-1}$ can be approximated as follows.

$$\begin{aligned} h_{-1} &= -\frac{1}{1 - h_1} \left(1 + \frac{\ln h_1^a}{1 - h_1^a}\right) = -\frac{1}{1 - h_1} \left(1 + \frac{x}{1 - e^x}\right) \quad (x = \ln h_1^a) \\ &\approx -\frac{1}{1 - h_1} \left(1 + \frac{x}{-x - x^2/2}\right) = -\frac{1}{1 - h_1} \frac{-x - x^2/2 + x}{-x - x^2/2} = -\frac{1}{1 - h_1} \frac{x}{2 + x} \\ &\approx -\frac{1}{1 - h_1} \frac{x}{2} = -\frac{\ln h_1^a}{2(1 - h_1)} = -\frac{a \ln h_1}{2(1 - h_1)} \end{aligned} \qquad (2.28)$$

Then the rise/fall time of the ramp signal, $a$, is represented by $h_1$ and $h_{-1}$ as (2.29).

$$a \simeq -\frac{2h_{-1}(1 - h_1)}{\ln h_1} \qquad (2.29)$$

As a result, a 1-pole channel having a pre-cursor corresponding to the desired $h_1$ and $h_{-1}$ can be generated by changing the single-bit input to a ramp-shaped single-bit input with this $a$. Examples of the ramp signal based $a$ derived from $h_1$ and $h_{-1}$ are shown in Table 2.2, which is very similar to Table 2.1.
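The closed forms (2.9), (2.18), (2.27), and (2.29) are easy to evaluate numerically. The following is a minimal sketch, not code from the thesis: the function names are illustrative assumptions, and $T$ is normalized to 1 UI. It computes the step-based and ramp-based $a$ for a target pair ($h_1$, $h_{-1}$) and substitutes the ramp value back into the exact expression (2.27) to gauge the first-order approximation.

```python
import math


def rc_from_h1(h1, T=1.0):
    """RC time constant from (2.9): RC = -T / ln(h1), with T in unit intervals."""
    return -T / math.log(h1)


def a_step(h1, h_m1):
    """Exact a of the step-shaped input from (2.18), assuming a + b = 1."""
    k = h_m1 * (1.0 - h1)
    return math.log((1.0 - k) / (1.0 + k)) / math.log(h1)


def a_ramp(h1, h_m1):
    """Approximate rise/fall time a of the ramp-shaped input from (2.29)."""
    return -2.0 * h_m1 * (1.0 - h1) / math.log(h1)


def h_m1_ramp_exact(h1, a):
    """Exact normalized pre-cursor of the ramp model from (2.27); requires a > 0."""
    x = a * math.log(h1)  # x = ln(h1^a), negative for 0 < h1 < 1
    return -(1.0 + x / (1.0 - math.exp(x))) / (1.0 - h1)


if __name__ == "__main__":
    # Hypothetical sweep over a few target cursor pairs (values chosen for illustration).
    for h1 in (0.1, 0.3, 0.5):
        for h_m1 in (0.1, 0.2, 0.3):
            a_r = a_ramp(h1, h_m1)
            print(
                f"h1={h1:.1f} h-1={h_m1:.1f} "
                f"a_step={a_step(h1, h_m1):.4f} a_ramp={a_r:.4f} "
                f"h-1 from (2.27)={h_m1_ramp_exact(h1, a_r):.4f}"
            )
```

For $h_1 = 0.5$ and $h_{-1} = 0.1$, the two expressions give $a \approx 0.1444$ (step) and $a \approx 0.1443$ (ramp). Substituting the ramp value back into (2.27) returns $h_{-1} \approx 0.102$, so the approximation error is roughly 2% at this operating point.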
| $h_1$ \ $h_{-1}$ | 0 | 0.1 | 0.2 | 0.3 |
|------------------|---|--------|--------|--------|
| 0.1 | 0 | 0.0782 | 0.1563 | 0.2345 |
| 0.2 | 0 | 0.0994 | 0.1988 | 0.2982 |
| 0.3 | 0 | 0.1163 | 0.2326 | 0.3488 |
| 0.4 | 0 | 0.1310 | 0.2619 | 0.3929 |
| 0.5 | 0 | 0.1443 | 0.2885 | 0.4328 |
| 0.6 | 0 | 0.1566 | 0.3132 | 0.4698 |

Table 2.2 Examples of the ramp rise/fall time $a$ derived from $h_1$ (rows) and $h_{-1}$ (columns)

# 2.2.4 Importance of a Pre-cursor Controllability

Also, using (2.28), Table 2.3 shows examples of $h_{-1}$ derived from $h_1$ and $a$.

| $h_1$ \ $a$ | 0 | 0.1 | 0.2 | 0.3 |
|-------------|---|--------|--------|--------|
| 0.1 | 0 | 0.1279 | 0.2558 | 0.3838 |
| 0.2 | 0 | 0.1006 | 0.2012 | 0.3018 |
| 0.3 | 0 | 0.0860 | 0.1720 | 0.2580 |
| 0.4 | 0 | 0.0764 | 0.1527 | 0.2291 |
| 0.5 | 0 | 0.0693 | 0.1386 | 0.2079 |
| 0.6 | 0 | 0.0639 | 0.1277 | 0.1916 |

Table 2.3 Examples of the ramp-induced $h_{-1}$ derived from $h_1$ (rows) and $a$ (columns)

The above table shows that even though a 1-pole channel by itself generates only post-cursors, the rise/fall time of the data signal creates pre-cursor ISI. As the data rate increases, 1 UI becomes shorter, and the portion of the rise/fall time in 1 UI increases. Therefore, pre-cursor controllability becomes much more crucial for high-speed serial link systems.

#### 2.3 FFE Tap Coefficient Optimization for 1-Pole Channel having a Pre-cursor

## 2.3.1 1-Tap FFE Coefficient Optimization for 1-Pole Channel

Fig. 2.5 Structure of feed-forward equalizer

The FFE is a widely used equalization method in the TX. The post-cursors of a 1-pole channel can be easily equalized by an FFE. The z-domain transfer function of the 1-pole channel having unity DC gain and the main cursor, $H_0$, are expressed as follows.
$$\begin{split} H_{ch}(z) &= H_0 + H_1 z^{-1} + \dots + H_N z^{-N} \\ &= \sum_{i=0}^{N} H_i z^{-i} = H_0 \sum_{i=0}^{N} h_i z^{-i} = H_0 \sum_{i=0}^{N} h_1^{i} z^{-i} \end{split} \tag{2.30}$$

$$H_0 = \frac{1}{1 + \sum_{i=1}^{\infty} h_i} = \frac{1}{1 + \sum_{i=1}^{\infty} h_1^i} = \frac{1}{1 + \frac{h_1}{1 - h_1}} = 1 - h_1 \tag{2.31}$$

The z-domain transfer function of the 1-tap FFE is (2.32).

$$H_{FFE}(z) = A_0(1 + a_1 z^{-1}), \quad A_0 = \frac{1}{1 + |a_1|} \tag{2.32}$$

Then, the total z-domain response of the 1-pole channel with the 1-tap FFE is (2.33).

$$\begin{split} H_{ch}(z)H_{FFE}(z) &= H_0A_0(1+a_1z^{-1})\Big(1+\sum_{i=1}^{\infty}h_iz^{-i}\Big) = H_0A_0\Big(1+\sum_{i>0}(h_i+a_1h_{i-1})z^{-i}\Big) \\ &= H_0A_0\Big(1+\sum_{i>0}(h_1^i+a_1h_1^{i-1})z^{-i}\Big) \end{split} \tag{2.33}$$

With $a_1 = -h_1$, the ISI terms are zero for all $i > 0$ in the z-domain response. Then, the remaining term, $H_0A_0$, is expressed as below.

$$H_0 A_0 = (1 - h_1) \frac{1}{1 + |-h_1|} = \frac{1 - h_1}{1 + h_1} \tag{2.34}$$

Therefore, the optimized 1-tap FFE coefficient $a_1$ is equal to $-h_1$, and with this coefficient, the FFE can perfectly equalize the post-cursors of the 1-pole channel.

#### 2.3.2 FFE Tap Coefficient Optimization for 1-Pole Channel having a Pre-cursor

Similar to (2.30) and (2.31), the z-domain response of the 1-pole channel having a pre-cursor and the main cursor of the channel, $H_0$, are represented as (2.35) and (2.36), respectively.

$$H_{ch}(z) = H_0\Big(h_{-1}z^1 + 1 + \sum_{i=1}^{\infty} h_i z^{-i}\Big), \quad h_i = h_1^i \tag{2.35}$$

$$H_0 = \frac{1}{h_{-1} + 1 + \sum_{i=1}^{\infty} h_i} = \frac{1}{h_{-1} + 1 + \sum_{i=1}^{\infty} h_1^{i}} = \frac{1}{h_{-1} + 1 + \frac{h_1}{1 - h_1}} = \frac{1 - h_1}{1 + h_{-1}(1 - h_1)} \tag{2.36}$$

The z-domain response of an FFE with 2 pre-taps and 1 post-tap can be represented as (2.37). Also, the magnitude normalizing coefficient, $A_0$, is derived from the normalized tap coefficients as (2.38).
$$H_{FFE}(z) = A_0 (a_{-2} z^2 + a_{-1} z^1 + 1 + a_1 z^{-1}) \tag{2.37}$$

$$A_0 = \frac{1}{|a_{-2}| + |a_{-1}| + 1 + |a_1|} \tag{2.38}$$

The product of $H_{ch}(z)$ and $H_{FFE}(z)$ is expressed as below.

$$\begin{split} H_{ch}(z)H_{FFE}(z) &= H_0A_0(a_{-2}z^2 + a_{-1}z^1 + 1 + a_1z^{-1})\Big(h_{-1}z^1 + 1 + \sum_{i=1}^{\infty}h_iz^{-i}\Big) \\ &= H_0A_0\Big(h_{-1}a_{-2}z^3 + (a_{-2} + h_{-1}a_{-1})z^2 + (a_{-2}h_1 + a_{-1} + h_{-1})z^1 \\ &\quad + (a_{-2}h_2 + a_{-1}h_1 + 1 + a_1h_{-1}) \\ &\quad + \sum_{i>0}(a_{-2}h_{2+i} + a_{-1}h_{1+i} + h_i + a_1h_{i-1})z^{-i}\Big) \end{split} \tag{2.39}$$

Letting $a_{-2} = 0$, which means that the FFE has 1 pre-tap and 1 post-tap, the normalized magnitude of the pre-cursor ISI is represented as (2.40).

$$ISI_{pre}(a_{-1}) = |h_{-1}a_{-1}| + |a_{-1} + h_{-1}| \tag{2.40}$$

Generally, $h_{-1}$ is a positive value, and we can assume that $a_{-1} < 0$. Then, the sign of the first term of $ISI_{pre}(a_{-1})$ is determined. However, the sign of the second term depends on the magnitude of $a_{-1}$.

$$ISI_{pre}(a_{-1}) = -h_{-1}a_{-1} + |a_{-1} + h_{-1}| \tag{2.41}$$

First case: $|a_{-1}| < |h_{-1}|$

$$ISI_{pre}(a_{-1}) = -h_{-1}a_{-1} + a_{-1} + h_{-1} \Rightarrow \frac{\partial ISI_{pre}(a_{-1})}{\partial a_{-1}} = -h_{-1} + 1 > 0 \tag{2.42}$$

For this case, $ISI_{pre}(a_{-1})$ decreases as $a_{-1}$ becomes more negative, so it is minimized when $|a_{-1}|$ is maximized.

Second case: $|a_{-1}| > |h_{-1}|$

$$ISI_{pre}(a_{-1}) = -h_{-1}a_{-1} - a_{-1} - h_{-1} \Rightarrow \frac{\partial ISI_{pre}(a_{-1})}{\partial a_{-1}} = -h_{-1} - 1 < 0 \tag{2.43}$$

For this case, $ISI_{pre}(a_{-1})$ decreases as $a_{-1}$ becomes less negative, so it is minimized when $|a_{-1}|$ is minimized.

As a result, $ISI_{pre}$ takes its minimum value, $h_{-1}^2$, when $a_{-1} = -h_{-1}$. Then, considering the $z^{-i}$ terms, the normalized magnitude of the post-cursor ISI can be represented as (2.44).

$$ISI_{post}(a_1) = \sum_{i>0} \Big|a_{-1}h_1 + 1 + \frac{a_1}{h_1}\Big|h_i = \sum_{i>0} \Big|-h_{-1}h_1 + 1 + \frac{a_1}{h_1}\Big|h_i \tag{2.44}$$

$ISI_{post}(a_1)$ is equal to 0 with $a_1 = -h_1(1-h_{-1}h_1)$.
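The 1-pre-tap and 1-post-tap result can be checked numerically. The sketch below is only an illustration under assumed values ($h_{-1} = 0.2$, $h_1 = 0.5$, and a channel truncated to 30 post-cursors); it multiplies the normalized channel by the FFE taps, as in (2.39) with $a_{-2} = 0$, and scans $a_{-1}$ on a grid to confirm where (2.40) is smallest.

```python
def convolve(x, y):
    """Polynomial product of two tap lists."""
    out = [0.0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] += xi * yj
    return out

# Assumed example values, not taken from a measured channel
h_pre, h1, n_post = 0.2, 0.5, 30

# Normalized channel taps ordered as z^1, z^0, z^-1, ..., z^-n_post
channel = [h_pre] + [h1 ** i for i in range(n_post + 1)]

# FFE taps ordered as z^1, z^0, z^-1 with the optimized coefficients
ffe = [-h_pre, 1.0, -h1 * (1.0 - h_pre * h1)]

# Index 0 is the z^2 term, index 2 is the main cursor
response = convolve(ffe, channel)
pre_isi = sum(abs(c) for c in response[:2])
post_isi = sum(abs(c) for c in response[3:])  # includes tiny truncation residue

def pre_isi_of(a_pre):
    """Normalized pre-cursor ISI of (2.40) for a given pre-tap."""
    return abs(h_pre * a_pre) + abs(a_pre + h_pre)

# Grid search over a_-1 in [-0.5, 0] with a step of 0.001
best_a_pre = min((k / 1000.0 for k in range(-500, 1)), key=pre_isi_of)

print(pre_isi, post_isi, best_a_pre)
```

The residual pre-cursor ISI equals $h_{-1}^2$, the post-cursor ISI vanishes up to the truncation of the channel tail, and the grid minimum falls at $a_{-1} = -h_{-1}$.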
Therefore, to equalize a 1-pole channel having a pre-cursor with an FFE incorporating 1 pre-tap and 1 post-tap, the tap coefficients minimizing the ISI are $a_{-1} = -h_{-1}$ and $a_1 = -h_1(1-h_{-1}h_1)$, with $ISI_{total} = h_{-1}^2$.

Extending this tap coefficient optimization method, the second pre-tap of the FFE can also be optimized. Letting $a_{-2} \neq 0$, the normalized magnitude of the pre-cursor ISI can be represented as (2.45).

$$ISI_{pre}(a_{-2}, a_{-1}) = |h_{-1}a_{-2}| + |a_{-2} + a_{-1}h_{-1}| + |a_{-2}h_{1} + a_{-1} + h_{-1}| \tag{2.45}$$

Consider that the last term of $ISI_{pre}(a_{-2}, a_{-1})$ becomes 0 with $a_{-1} = -(h_{-1} + a_{-2}h_1)$. Then, the normalized magnitude of the pre-cursor ISI depends solely on $a_{-2}$ as (2.46).

$$\begin{split} ISI_{pre}(a_{-2}) &= |h_{-1}a_{-2}| + |a_{-2} - (h_{-1} + a_{-2}h_{1})h_{-1}| \\ &= |h_{-1}a_{-2}| + |a_{-2}(1 - h_{1}h_{-1}) - h_{-1}^{2}| \end{split} \tag{2.46}$$

Similar to the previous cases, the signs of the pre-cursor ISI terms are determined by the sign and magnitude of $a_{-2}$.

First case: $a_{-2} < 0$

$$\begin{split} ISI_{pre}(a_{-2}) &= -h_{-1}a_{-2} - (a_{-2}(1 - h_{1}h_{-1}) - h_{-1}^{2}) \\ \Rightarrow \frac{\partial ISI_{pre}(a_{-2})}{\partial a_{-2}} &= -h_{-1} - (1 - h_{1}h_{-1}) < 0 \end{split} \tag{2.47}$$

When $a_{-2} < 0$, as $a_{-2}$ increases, $ISI_{pre}(a_{-2})$ decreases.

Second case: $a_{-2} > 0$

$$\begin{split} ISI_{pre}(a_{-2}) &= h_{-1}a_{-2} \pm (a_{-2}(1 - h_1h_{-1}) - h_{-1}^{2}) \\ \Rightarrow \frac{\partial ISI_{pre}(a_{-2})}{\partial a_{-2}} &= h_{-1} \pm (1 - h_1h_{-1}) \end{split} \tag{2.48}$$

In this case, the sign of the second term of $ISI_{pre}(a_{-2})$ depends on the magnitude of $a_{-2}$.

Second-first case: $a_{-2} > h_{-1}^2/(1-h_1h_{-1})$

$$\frac{\partial ISI_{pre}(a_{-2})}{\partial a_{-2}} = h_{-1} + (1 - h_1 h_{-1}) = 1 + h_{-1}(1 - h_1) > 0 \tag{2.49}$$

For this case, as $a_{-2}$ decreases, $ISI_{pre}(a_{-2})$ also decreases.
Second-second case: $a_{-2} < h_{-1}^2/(1-h_1h_{-1})$

$$\frac{\partial ISI_{pre}(a_{-2})}{\partial a_{-2}} = h_{-1} - (1 - h_1 h_{-1}) = -1 + h_{-1}(1 + h_1) < 0 \tag{2.50}$$

Generally, because $h_{-1}$ is smaller than 0.5 and $h_1$ is smaller than 1, the partial derivative is smaller than 0. Therefore, in this case, as $a_{-2}$ increases, $ISI_{pre}(a_{-2})$ decreases.

To sum up, in the first case ($a_{-2} < 0$), $ISI_{pre}(a_{-2})$ decreases as $a_{-2}$ increases, and in the second case ($a_{-2} > 0$), $ISI_{pre}(a_{-2})$ is minimized when $a_{-2} = h_{-1}^2/(1-h_1h_{-1})$. As a result, the minimized value and the corresponding $a_{-2}$ are shown below.

$$ISI_{pre} = \frac{h_{-1}^{3}}{1 - h_{1}h_{-1}} \tag{2.51}$$

$$a_{-2} = \frac{h_{-1}^{2}}{1 - h_{1}h_{-1}} \tag{2.52}$$

Then the resulting $a_{-1}$ is expressed as (2.53).

$$a_{-1} = -(h_{-1} + a_{-2}h_1) = -h_{-1} - \frac{h_{-1}^2 h_1}{1 - h_1 h_{-1}} = -\frac{h_{-1}}{1 - h_1 h_{-1}} \tag{2.53}$$

Also, as with the previously optimized tap coefficients, the optimized post-tap $a_1$ that perfectly cancels the post-cursor ISI is derived as (2.54).

$$\begin{split} ISI_{post}(a_{1}) &= \sum_{i>0} \Big|a_{-2}h_{1}^{2} + a_{-1}h_{1} + 1 + \frac{a_{1}}{h_{1}}\Big|h_1^{i} \\ &= \sum_{i>0} \Big|\frac{h_{-1}^{2}}{1 - h_{1}h_{-1}}h_{1}^{2} - \frac{h_{-1}}{1 - h_{1}h_{-1}}h_{1} + 1 + \frac{a_{1}}{h_{1}}\Big|h_1^{i} \\ &= \sum_{i>0} \Big|-h_{-1}h_{1} + 1 + \frac{a_{1}}{h_{1}}\Big|h_1^{i} = \sum_{i>0} |a_{1} + h_{1}(1 - h_{-1}h_{1})|\frac{h_1^{i}}{h_{1}} \\ &\Rightarrow ISI_{post}(a_{1} = -h_{1}(1 - h_{-1}h_{1})) = 0 \end{split} \tag{2.54}$$

Therefore, to equalize a 1-pole channel having a pre-cursor with a 2-pre-tap and 1-post-tap FFE, the optimized tap coefficients that minimize the normalized ISI are derived as (2.52), (2.53), and (2.54). As a result, the minimized ISI, $ISI_{total}$, and the magnitude normalizing coefficient, $A_0$, are derived as (2.55) and (2.56), respectively.
$$ISI_{total} = ISI_{pre} = \frac{h_{-1}^{3}}{1 - h_{1}h_{-1}} \tag{2.55}$$

$$\begin{split} A_{0} &= \frac{1}{|a_{-2}| + |a_{-1}| + 1 + |a_{1}|} = \frac{1}{a_{-2} - a_{-1} + 1 - a_{1}} \\ &= \frac{1 - h_{1}h_{-1}}{1 + h_{1} + h_{-1} - h_{1}h_{-1} - 2h_{1}^{2}h_{-1} + h_{-1}^{2} + h_{1}^{3}h_{-1}^{2}} \end{split} \tag{2.56}$$

Note that the optimized $a_1$ is equal to that of the previous case (the 1-pre-tap and 1-post-tap FFE), which means that the post-cursor ISI of a 1-pole channel can be perfectly canceled by an appropriate single FFE post-tap regardless of the magnitudes of $h_{-1}$ and $h_1$. In addition, as the extension from the 1-pre-tap and 1-post-tap FFE to the 2-pre-tap and 1-post-tap FFE shows, when the number of pre-taps is increased, each tap can be optimized and the total ISI can be minimized in a similar way.

#### Chapter 3

# Tomlinson-Harashima Precoding and Variations

#### 3.1 Tomlinson-Harashima Precoding

Fig. 3.1 Structure of Tomlinson [42]

Fig. 3.2 Structure of Harashima [43]

Fig. 3.3 Block diagram of Tomlinson-Harashima precoding and signaling example [35]

Tomlinson-Harashima precoding (THP) was developed independently by Tomlinson and Harashima [42], [43]. THP is a matched-transmission-based pre-equalization technique that introduces a modulo operation. The equalization part of THP is a feedback structure equivalent to the inverse of the transfer function of the targeted channel, as shown in Fig. 3.1 and Fig. 3.2. The signaling example and the block diagram of the TX and RX are shown in Fig. 3.3. The post-cursor ISI induced by the TX signal a[k] is subtracted from the input I[k], which introduces the range violation and, in turn, the modulo operation. On the RX side, the RX input signal y[k] shows two additional levels. The highest level is the same as b0 with the modulo operation, and the lowest level is the same as b1 with the modulo operation.
Therefore, after the modulo operation of the RX, the demodulated RX signal r[k] is the same as the input bit stream I[k].

To introduce the modulo operation, the input of the THP should be shrunk as shown in input stream 1) of Fig. 3.3. The amplitude adjusting coefficient in PAM-L signaling is (3.1).

$$A_{\text{THP}} = \frac{L - 1}{L} \tag{3.1}$$

$A_{\rm THP}$ can severely degrade the signal amplitude for NRZ signaling. However, the drawback becomes smaller for multi-level signaling. The equalization part of the THP in Fig. 3.3 is expressed as $1/H_{norm}(z)$, which is the feedback structure. Therefore, the structure of the THP can be re-drawn as Fig. 3.4.

Fig. 3.4 Structure of Tomlinson-Harashima precoding transmitter

The equalization part of the THP is an infinite-impulse response (IIR) filter, whose transfer function can be expressed as (3.2).

$$H_{\text{IIR}}(z) = \frac{1}{1 + \sum_{i=1}^{\infty} b_i z^{-i}} \tag{3.2}$$

The post-cursor ISI of a channel can be equalized perfectly with $H_{IIR}(z)$ when the $b_i$s are equal to the $h_i$s. Therefore, the THP can theoretically equalize the post-cursors of any channel. Also, with the modulo operation, the THP has an SNR gain compared to the FFE.

Considering the structures of the FFE and the THP in Fig. 2.5 and Fig. 3.4, the probability density functions (PDFs) of the TX outputs of the two equalization methods are shown in Fig. 3.5. Denoting the $a_i$s as the tap coefficients of an FFE, the probability mass function (PMF) induced by $a_i$, $\{P(-a_i), P(+a_i)\}$, is equal to $\{0.5, 0.5\}$ for NRZ signaling. As the number of taps increases, the PMF becomes widely distributed. As a result, the PDF of the FFE shows a centralized distribution. Moreover, because of the output swing limitation, the final PDF of the FFE is further shrunk horizontally and stretched vertically. On the other hand, for the THP, denoting the $b_i$s as the tap coefficients of the THP, the feedback equalization system with the modulo operator makes the final PDF of the THP uniformly distributed and offers a higher SNR.
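The TX modulo precoding and the RX modulo recovery described above can be illustrated with a short behavioral sketch. It is not the simulation setup of this work: the PAM-4 levels $\pm1, \pm3$, the modulus $M = 8$ (so that the level spacing is $M/L$), the 1-pole post-cursors $h_i = 0.5^i$ truncated to 20 taps, and all function names are assumptions chosen for illustration.

```python
import random

def mod_centered(v, m):
    """Fold v into the interval [-m/2, m/2)."""
    return (v + m / 2.0) % m - m / 2.0

def thp_precode(data, post_taps, m):
    """Feedback precoder: subtract post-cursor ISI, then apply the modulo."""
    tx = []
    for n, d in enumerate(data):
        isi = sum(b * tx[n - i]
                  for i, b in enumerate(post_taps, start=1) if n - i >= 0)
        tx.append(mod_centered(d - isi, m))
    return tx

def channel(tx, post_taps):
    """Normalized channel with a unit main cursor and post-cursors only."""
    return [x + sum(h * tx[n - i]
                    for i, h in enumerate(post_taps, start=1) if n - i >= 0)
            for n, x in enumerate(tx)]

random.seed(0)
levels = [-3.0, -1.0, 1.0, 3.0]          # PAM-4 levels with spacing M / L
M = 8.0                                   # modulus, TX range is [-M/2, M/2)
h = [0.5 ** i for i in range(1, 21)]      # assumed post-cursors h_i = 0.5^i

data = [random.choice(levels) for _ in range(2000)]
tx = thp_precode(data, h, M)
rx = channel(tx, h)                       # equals data plus integer multiples of M
recovered = [mod_centered(y, M) for y in rx]

print(max(abs(x) for x in tx))            # TX output stays within the modulo range
print(max(abs(r - d) for r, d in zip(recovered, data)))
```

The channel output sits on the data levels shifted by integer multiples of $M$, which is why the additional levels appear at the RX input and why the RX modulo operation returns the original symbols.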
Furthermore, this tendency becomes more significant as the target channel loss, and thus the tap coefficients, increase. As the tap coefficients increase, the centralization of the PDF of the FFE becomes substantial, while the PDF of the THP remains uniform. Therefore, the THP is a viable candidate to equalize a high-loss channel on the TX. For the FFE, the red-lined PDF becomes more centralized as the number and magnitude of taps increase. However, the THP offers the uniformly distributed blue-lined PDF even when the number and magnitude of taps increase. Therefore, the THP offers a better SNR than the FFE.

However, when it comes to the physical implementation, considering the transfer function, the THP inherently cannot remove a pre-cursor of the channel. Also, the feedback structure of the THP is unsuitable for high-speed operation. To equalize the pre-cursor ISI, the FFE is one of the most straightforward options to adopt. To deal with the lack of pre-cursor controllability, the THP can treat a pre-cursor as the main cursor or be combined with a one-tap FFE [49]. These topics are discussed in the next section.

Fig. 3.5 Probability density function of TX output of FFE and THP

#### 3.2 Pre-cursor Control Using THP

#### 3.2.1 Pre-cursor THP

Fig. 3.6 Structures of conventional THP and pre-cursor THP with tap coefficients

As mentioned before, the tap coefficients of the conventional THP are equal to the channel post-cursors, $h_i$. However, when the channel has a pre-cursor, the product of the transfer function of the equalization part and that of the channel is expressed as (3.3).

$$H_{\text{conv,IIR}}(z)H_{\text{ch}}(z) = \frac{H_0(h_{-1}z^1 + 1 + \sum_{i=1}^{\infty} h_i z^{-i})}{1 + \sum_{i=1}^{\infty} h_i z^{-i}} = H_0 + \frac{H_0 h_{-1} z^1}{1 + \sum_{i=1}^{\infty} h_i z^{-i}} \tag{3.3}$$

Because of the pre-cursor, a residual term remains, and it has a very complicated form.
However, using the pre-cursor THP, which treats the pre-cursor as the main cursor, the transfer function of the equalization part is represented as (3.4).

$$H_{\text{pre,IIR}}(z) = \frac{1}{1 + \sum_{i=1}^{\infty} \frac{h_{i-1}}{h_{-1}} z^{-i}} \tag{3.4}$$

Then the product of $H_{\text{pre,IIR}}(z)$ and the channel response is shown as (3.5).

$$H_{\text{pre,IIR}}(z)H_{\text{ch}}(z) = \frac{H_0(h_{-1}z^1 + 1 + \sum_{i=1}^{\infty} h_i z^{-i})}{1 + \sum_{i=1}^{\infty} \frac{h_{i-1}}{h_{-1}} z^{-i}} = H_0 h_{-1} z^1 \tag{3.5}$$

The result shows that the pre-cursor THP can perfectly equalize all channel ISI. Incorporating (3.1) into (3.3) and (3.5) in PAM-L signaling, the vertical eye margins (VEMs) of the conventional THP and the pre-cursor THP are derived as (3.6) and (3.7), respectively.

$$VEM_{\text{convTHP}} \simeq \frac{H_0}{L} - \frac{(L-1)H_{-1}}{L} \tag{3.6}$$

$$VEM_{\text{preTHP}} = \frac{H_{-1}}{L} \tag{3.7}$$

Generally, because $H_{-1}$ is much smaller than $H_0$, $VEM_{convTHP}$ is much larger than $VEM_{preTHP}$. However, when either $h_{-1}$ or $L$ is sufficiently large, the magnitude relationship between the VEMs can be reversed.

Even though the pre-cursor THP can remove the pre-cursor ISI, using the THP alone is not suitable for compensating a channel having a pre-cursor. The FFE is the most widely used equalization technique, and it can easily remove a pre-cursor by adopting a pre-tap. Therefore, combining the THP and the FFE is one of the options to remove pre-cursor ISI. There are two ways to combine the THP and the FFE, which are the THP-FFE and the FFE-THP.

Also, there is another consideration, which is the modulo value of the RX ($M_{RX}$). As shown in Fig. 3.3, the RX modulo operation is needed to recover a proper data stream. The multiplying ratio of the conventional THP for $M_{RX}$ is equal to $H_0$, which is the coefficient of the zero-order term of $z$ in (3.3). In the same way, the multiplying ratio of the pre-cursor THP for $M_{RX}$ is equal to $H_{-1}$.
$$M_{\rm RX, \, convTHP} = M_{\rm TX} H_0 \tag{3.8}$$

$$M_{\rm RX, preTHP} = M_{\rm TX} H_{-1} \tag{3.9}$$

These results are very straightforward. However, when the THP is combined with an FFE, the RX modulo value and the tap coefficients of the THP depend on the FFE.

#### **3.2.2 THP-FFE**

As opposed to a conventional THP, the tap coefficients of a THP-FFE, which is a series of a THP and an FFE, should be determined depending on the tap coefficients of the FFE. Assume that a conventional THP is followed by a one-tap FFE, letting $a_{-1}$ be the pre-tap of the FFE and $b_i$ the modified tap coefficients of the THP. Then, the transfer functions of the THP, without the MOD, and the one-tap FFE are written as (3.2) and (3.10), respectively.

$$H_{\text{FFE}}(z) = A_{-1}z^1 + A_0 = A_0(a_{-1}z^1 + 1) \tag{3.10}$$

where $A_0$ is equal to $1/(1+|a_{-1}|)$. Then, the overall response at the channel output is given as (3.11).

$$\begin{split} H_{\mathrm{ch}}(z)&H_{\mathrm{IIR}}(z)H_{\mathrm{FFE}}(z) \\ &= H_0 \sum_{i=-1}^{\infty} h_i z^{-i} \times \frac{1}{1 + \sum_{i=1}^{\infty} b_i z^{-i}} \times A_0(a_{-1} z^{1} + 1) \\ &= H_0 A_0 \times \frac{h_{-1} a_{-1} z^2 + (h_{-1} + a_{-1}) z^1 + (1 + h_1 a_{-1}) + \sum_{i=1}^{\infty} (h_i + h_{i+1} a_{-1}) z^{-i}}{1 + \sum_{i=1}^{\infty} b_i z^{-i}} \\ &= H_0 A_0 (1 + h_1 a_{-1}) \times \frac{\frac{h_{-1} a_{-1}}{1 + h_1 a_{-1}} z^2 + \frac{h_{-1} + a_{-1}}{1 + h_1 a_{-1}} z^1 + 1 + \sum_{i=1}^{\infty} \frac{h_i + h_{i+1} a_{-1}}{1 + h_1 a_{-1}} z^{-i}}{1 + \sum_{i=1}^{\infty} b_i z^{-i}} \end{split} \tag{3.11}$$

To eliminate the resulting post-cursors, the $z^{-i}$ ($i = 1, 2, \dots$) terms in the numerator and the denominator should be the same. Furthermore, to minimize the pre-cursor ISI, $a_{-1}$ should be equal to $-h_{-1}$. Overall, $b_i$ is represented as (3.12).

$$b_i = \frac{h_i - h_{i+1} h_{-1}}{1 - h_1 h_{-1}} \tag{3.12}$$

Note that the VEM of a THP-FFE in PAM-L signaling when $a_{-1} = -h_{-1}$ is approximately given by (3.13).
$$VEM_{\text{THP-FFE}} \approx \frac{H_0(1 - h_1 h_{-1})}{(1 + h_{-1})} \left(\frac{1}{L} - \frac{(L - 1)h_{-1}^2}{L(1 - h_1 h_{-1})}\right) \tag{3.13}$$

Although IIR filters and FFEs are linear systems, so that their ordering does not affect the $b_i$s, $M_{TX}$ and the modulus of the RX, $M_{RX}$, may be altered. From Fig. 3.7, it can be seen that a THP-FFE is equivalent to a cascade of (3.2) and (3.10) with its input equal to $D_{in}\pm M_{TX}$. Therefore, with a THP-FFE, $M_{RX}$ is reduced to (3.14).

$$\begin{split} M_{\text{RX,THP-FFE}} &= M_{\text{TX}} H_0 A_0 (1 + h_1 a_{-1}) \\ &= \frac{M_{\text{TX}} H_0 (1 + h_1 a_{-1})}{1 + |a_{-1}|} \\ &= M_{\text{RX,convTHP}} \frac{1 + h_1 a_{-1}}{1 + |a_{-1}|} \end{split} \tag{3.14}$$

Fig. 3.7 Equivalent representations of THP-FFE

#### **3.2.3 FFE-THP**

Now a series of an FFE and a THP (FFE-THP), whose equivalent representations are illustrated in Fig. 3.8, is presented. While the tap coefficients and the VEM of an FFE-THP are the same as those of a THP-FFE, $M_{\rm TX}$ of an FFE-THP is larger than that of a conventional THP by a factor of $1/(A_{-1}+A_0)$. Furthermore, applying (3.11) to the input of its equivalent representation, $D_{\rm in}\pm M_{\rm TX}/(A_{-1}+A_0)$, $M_{\rm RX}$ is enlarged to (3.15).

$$\begin{split} M_{\text{RX,FFE-THP}} &= \frac{M_{\text{TX}}}{A_0 + A_{-1}} H_0 A_0 (1 + h_1 a_{-1}) \\ &= M_{\text{TX}} \frac{1 + |a_{-1}|}{1 - |a_{-1}|} H_0 \frac{1}{1 + |a_{-1}|} (1 + h_1 a_{-1}) \\ &= M_{\text{TX}} H_0 \frac{1 + h_1 a_{-1}}{1 - |a_{-1}|} \\ &= M_{\text{RX,convTHP}} \frac{1 + h_1 a_{-1}}{1 - |a_{-1}|} \end{split} \tag{3.15}$$

Fig. 3.8 Equivalent representations of FFE-THP

# 3.3 Simulation Results of Conventional THP, Pre-cursor THP, THP-FFE, and FFE-THP

The PAM-4 signaling simulations are conducted using SystemVerilog, with the Nyquist frequency set to 4 GHz, the TX output swing to 1 V, and consequently, $M_{\text{TX}}$ of a conventional THP to unity. Also, the number of taps is set large enough to remove all post-cursors. Fig.
3.9 shows the SBR and the loss of the simulated channel. The insertion loss is 16 dB at the Nyquist frequency, giving $h_{-1}$ as 0.1485, which is large enough to severely degrade the VEM, especially for multi-level signaling.

Fig. 3.9 Characteristics of simulated channel: single-bit response and insertion loss of channel

#### 3.3.1 Conventional and Pre-cursor THP

The eye diagram at the channel output with a conventional THP is shown in Fig. 3.10(a), featuring additional levels above/below the PAM-4 signal levels. Here, a thick signal line owing to the remaining pre-cursor can be noticed. On the other hand, the eye diagram with a pre-cursor THP in Fig. 3.10(b) shows that its signal lines are much thinner, as the pre-cursor THP compensates for all ISI, including the pre-cursor. However, a significantly large number of additional levels appear since the $T_i^{\text{pre}}$s are much larger than unity. Moreover, even though the signal lines are thinner, $VEM_{\text{preTHP}}$ is about 14 mV, which is much smaller than $VEM_{\text{THP}}$, which is about 52.8 mV.

The eye diagrams at the RX-MOD output are shown in Fig. 3.11. $M_{\text{RX}}$ with the conventional THP, which is equal to $M_{\text{TX}}H_0$, is 0.3812, making the signal bounded from -190.6 mV to 190.6 mV, as shown in Fig. 3.11(a). On the other hand, $M_{\text{RX}}$ with the pre-cursor THP, which is equal to $M_{\text{TX}}H_{-1}$, is 0.0566, making the signal bounded from -28.3 mV to 28.3 mV, as shown in Fig. 3.11(b).

Fig. 3.10 Eye diagrams at channel output (a) conventional THP and (b) pre-cursor THP

Fig. 3.11 Eye diagrams at RX-MOD output (a) conventional THP and (b) pre-cursor THP

#### 3.3.2 THP-FFE and FFE-THP

The eye diagrams at the channel output and the RX-MOD output for a THP-FFE are shown in Fig. 3.12.
At the channel output, the eye diagram shows a much thinner signal line than the conventional THP since the 1-tap FFE compensates for the pre-cursor ISI. In addition, as shown in Fig. 3.12(a), applying $a_{-1} = -h_{-1}$, the VEM is about 72.1 mV, which is much larger than $VEM_{THP}$. Applying $a_{-1} = -h_{-1}$ to (3.14), $M_{RX,THP-FFE}$ is 0.3072, making the signal bounded from -153.6 mV to 153.6 mV, as shown in Fig. 3.12(b).

Similarly, the eye diagrams at the channel output and the RX-MOD output for an FFE-THP are shown in Fig. 3.13. The eye diagram at the channel output shows a much thinner signal line than the conventional THP. In addition, as shown in Fig. 3.13(a), signal branches above/below the PAM-4 signal are introduced because the modulus of the THP is dictated by the FFE. Applying $a_{-1} = -h_{-1}$, the VEM is the same as that of the THP-FFE. Moreover, applying $a_{-1} = -h_{-1}$ to (3.15), $M_{\rm RX,FFE-THP}$ is 0.4144, making the signal bounded from -207.2 mV to 207.2 mV, as shown in Fig. 3.13(b).

Fig. 3.12 Eye diagrams of THP-FFE (a) at channel output and (b) at RX-MOD output

Fig. 3.13 Eye diagrams of FFE-THP (a) at channel output and (b) at RX-MOD output

So far, cascades of a THP and a 1-tap FFE have been proposed for a channel having a pre-cursor. The tap coefficients, the VEMs, and the moduli of the TX and RX for a conventional THP, a pre-cursor THP, and the two cascade schemes of THP and FFE are derived. The derived tap coefficients and the RX modulus are applied to the simulation, which verifies that the VEMs of the proposed topologies exceed those of the conventional THPs. As a result, the THP-FFE shows the clearest eye diagram with the largest VEM. However, even though the pre-cursor can be handled by cascading an FFE, the feedback structure of the equalization part of the THP remains, which is not a suitable structure for high-speed operation.
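The reported RX modulus values can be cross-checked against (3.9), (3.14), and (3.15) with a few lines of arithmetic. The sketch below uses only the numbers quoted in this section; the first post-cursor $h_1$ of the simulated channel is not stated in the text, so the value computed here is merely the one implied by (3.14), not a reported figure.

```python
# Values quoted in Section 3.3 (M_TX = 1)
h_pre = 0.1485          # normalized pre-cursor of the simulated channel
m_rx_conv = 0.3812      # M_RX of the conventional THP, equal to M_TX * H_0
m_rx_thp_ffe = 0.3072   # M_RX reported for the THP-FFE, (3.14)
m_rx_ffe_thp = 0.4144   # M_RX reported for the FFE-THP, (3.15)

a_pre = -h_pre          # FFE pre-tap, a_-1 = -h_-1

# (3.9): M_RX of the pre-cursor THP is M_TX * H_-1 = M_TX * H_0 * h_-1
m_rx_pre = m_rx_conv * h_pre

# Dividing (3.15) by (3.14) leaves (1 + |a_-1|) / (1 - |a_-1|)
ratio_reported = m_rx_ffe_thp / m_rx_thp_ffe
ratio_expected = (1 + abs(a_pre)) / (1 - abs(a_pre))

# First post-cursor implied by (3.14); an inferred value, not a reported one
h1_implied = (m_rx_thp_ffe * (1 + abs(a_pre)) / m_rx_conv - 1) / a_pre

print(round(m_rx_pre, 4))
print(round(ratio_reported, 4), round(ratio_expected, 4))
print(round(h1_implied, 3))
```

The pre-cursor THP modulus and the ratio between the two cascade schemes agree with the quoted values to within rounding.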
Therefore, the feed-forward THP, which can remove pre-cursor ISI with a pre-tap and can be adopted for high-speed operation, will be presented in the next chapter.

#### **Chapter 4**

### Feed-Forward Tomlinson-Harashima Precoding

#### 4.1 Design Process of FF-THP

The design process of the proposed FF-THP is illustrated in Fig. 4.1. $\{D\}$ and $\{k\}$ denote the sequences of the input data and of the quotient resulting from the modulo operation for the present data, respectively. $M$ represents the modulus of the modulo operation and corresponds to the maximum amplitude of the signal range. The FF-THP inherits the traditional THP operation, which has two main functions: a modulo operation to stabilize the output and a feedback equalization to compensate for the channel loss. These two key features are modified to build the FF-THP.

Fig. 4.1 Conversion steps from THP to the proposed FF-THP. (1) conventional THP (2) interpretation of modulo operation (3) modulo prediction (4) proposed FF-THP

Firstly, the modulo operation is replaced by the addition of a predicted modulo value, $\{kM\}$, to the input as shown in Fig. 4.1(2) and Fig. 4.1(3), which is essential for the next step of the modification. Secondly, the feedback equalizer is reconstructed as the equivalent FFE with pre-taps to remove pre-cursor ISI, as shown in Fig. 4.1(4). Thus, the proposed FF-THP acquires the ability to remove the pre-cursors of a channel while keeping the modulo operation. The tap coefficients of the FFE are determined to maximize the VEM at the channel output. Because of the increased number of signal levels, the FF-THP has some drawbacks: similar to the THP, it requires a larger input range and more samplers at the receiver than the conventional FFE. However, by using the structure of the FF-THP instead of the feedback equalizer, the feedback timing constraint is completely removed from the equalization, which enables high-speed operation.
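The interpretation step of Fig. 4.1(2), in which the modulo operation is viewed as adding $kM$ to the input, can be illustrated with a small behavioral sketch. It does not implement the modulo prediction itself: the quotient sequence $k$ is simply taken from a reference feedback THP. The channel is assumed to be a 1-pole channel with $h_1 = 0.5$ and no pre-cursor, for which the inverse filter reduces to the FIR form $1 - h_1 z^{-1}$; the PAM-4 levels, $M = 8$, and all names are illustrative assumptions.

```python
import random

def mod_centered(v, m):
    """Fold v into the interval [-m/2, m/2)."""
    return (v + m / 2.0) % m - m / 2.0

random.seed(1)
h1, M = 0.5, 8.0
data = [random.choice([-3.0, -1.0, 1.0, 3.0]) for _ in range(500)]

# Reference feedback THP with taps h1^i over the full history
tx_fb, k_seq = [], []
for n, d in enumerate(data):
    isi = sum(h1 ** i * tx_fb[n - i] for i in range(1, n + 1))
    x = mod_centered(d - isi, M)
    k_seq.append(round((x - (d - isi)) / M))  # quotient of the modulo operation
    tx_fb.append(x)

# Feed-forward view: add k*M to the data, then apply the FIR 1 - h1 z^-1
v = [d + k * M for d, k in zip(data, k_seq)]
tx_ff = [v[n] - (h1 * v[n - 1] if n > 0 else 0.0) for n in range(len(v))]

max_err = max(abs(p - q) for p, q in zip(tx_fb, tx_ff))
print(max_err)
```

Once the sequence $\{kM\}$ is known, the TX output no longer depends on a feedback loop, which is the property the FF-THP exploits; how $k$ is predicted ahead of time is outside the scope of this sketch.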
Moreover, a larger eye opening and a larger SNR suitable for multi-level signaling are obtained by the predictive modulo operation.

#### 4.2 Effectiveness of FF-THP

#### 4.2.1 Mathematics in z-domain Response

The primary function of an equalizer is to provide a response that removes the channel ISI. Assuming that a channel has one pre-cursor and N post-cursors, the z-domain responses (ZDRs) of the channel and the normalized channel ($H_{ch}(z)$ and $h_{ch}(z)$) can be represented as (4.1) and (4.2), respectively.

$$H_{ch}(z) = H_{-1}z^{1} + H_{0} + H_{1}z^{-1} + \dots + H_{N}z^{-N} = \sum_{i=-1}^{N} H_{i}z^{-i} \tag{4.1}$$

$$h_{\rm ch}(z) = h_{-1}z^{1} + h_{0} + h_{1}z^{-1} + \dots + h_{N}z^{-N} = \sum_{i=-1}^{N} \frac{H_{i}}{H_{0}} z^{-i} \tag{4.2}$$

where $H_i$ and $h_i$ denote the magnitude of the $i^{th}$ tap of a single-bit response (SBR) and of a normalized SBR, respectively. Since $h_{ch}(z)$ is normalized by the main cursor $H_0$, $h_i$ is equal to $H_i/H_0$ with $h_0=1$.

As shown in Fig. 4.1(1), the feedback filter of the THP is comprised of post-taps concerning only the post-cursors of the normalized SBR and lacks the ability to remove the pre-cursor. Thus, the ZDR of the equalizer having tap coefficients of $h_1$, $h_2$, ..., $h_N$ is as follows, and the ZDR of its equivalent FIR implementation becomes (4.3), assuming the convergence of the THP.

$$\begin{split} H_{\text{THP}}(z) &= \frac{1}{1 + h_1 z^{-1} + \dots + h_N z^{-N}} \\ &= \sum_{i=0}^{N} a_i z^{-i} \left( a_0 = 1, \ a_n = \sum_{i=1}^{n} -h_i a_{n-i}, n = 1, 2, \dots \right) \end{split} \tag{4.3}$$

On the other hand, both the FFE and the FF-THP have the ability to compensate for pre-cursors by using pre-taps. Since the output range of the TX is limited between $-M/2$ and $M/2$, the amplitude adjusting coefficient is necessary for the FFE [5]. Including the amplitude adjustment, the ZDRs of the FFE and the FF-THP using tap coefficients ($w_{-2}$, $w_{-1}$, $w_0(=1)$, $w_1$, ..., and $w_N$) equalizing the channel ISI including the pre-cursor are given below.
$$H_{\text{FFE}}(z) = \frac{1}{\sum_{i=-2}^{N'} |w_i|} \Big(\sum_{i=-2}^{N'} w_i z^{-i}\Big) \tag{4.4}$$

$$H_{\text{FF-THP}}(z) = \sum_{i=-2}^{N'} w_i z^{-i} \tag{4.5}$$

An expression for the VEM can be derived by multiplying the ZDR of the channel and the ZDR of each equalizer. With the combined ZDR, $R(z)$, representing the received signal, the VEM in PAM-L signaling is described below.

$$R(z) = \sum_{i} R_i z^{-i} \tag{4.6}$$

$$VEM_{R} = \frac{R_{0}}{L-1} - \sum_{i \neq 0} |R_{i}| \tag{4.7}$$

where $R_i$ denotes the $i^{th}$ coefficient of $R(z)$. When a modulo operation is introduced, the amplitude of the data signal becomes $M/L$, reduced from $M/(L-1)$ in PAM-L signaling. Therefore, for calculating the VEMs of the THP and the FF-THP, (4.7) must be multiplied by the amplitude ratio of $(L-1)/L$ in (3.1). Calculating $R(z)$ for the three equalizers and using (4.7), the VEMs are represented by (4.8), (4.9), and (4.10) as follows, assuming that $N$ and $N'$ go to infinity.

$$VEM_{THP} = \frac{H_0 (1 - h_1 h_{-1})}{L} - \frac{(L - 1)H_{-1}}{L} \left( 1 + \sum_{n=1}^{\infty} \Big|\sum_{i=1}^{n} h_i a_{n-i}\Big| \right) \tag{4.8}$$

$$VEM_{FFE} = \frac{H_0}{(L - 1) \sum_{i=-2}^{1} |w_i|} \Big(\sum_{i=-2}^{1} w_i h_{-i}\Big) - \frac{H_0}{\sum_{i=-2}^{1} |w_i|} \Big(\sum_{j \neq 0} \Big|\sum_{i=-2}^{1} w_i h_{-i+j}\Big|\Big) \tag{4.9}$$

$$VEM_{\text{FF-THP}} = \frac{H_0}{L} \Big( \sum_{i=-2}^{1} w_i h_{-i} \Big) - \frac{(L-1)H_0}{L} \Big( \sum_{j \neq 0} \Big|\sum_{i=-2}^{1} w_i h_{-i+j}\Big| \Big) \tag{4.10}$$

According to the above equations, as the channel has a larger pre-cursor, $H_{-1}$, the VEM of the THP becomes smaller. Also, as the tap coefficients needed to compensate for the channel ISI become larger, the VEM of the FFE becomes smaller than the VEM of the FF-THP. To demonstrate the effect of the pre-cursor and the channel ISI, a hypothetical wireline channel with exponentially decaying post-cursors and one pre-cursor is taken as an example. In this case, the channel response in (2.35) and (2.36) can be simplified to (4.11).
$$H_{\rm ch}(z) = \frac{1 - h_1}{1 + h_{-1}(1 - h_1)} \Big(h_{-1}z^1 + 1 + \sum_{i=1}^{\infty} h_1^i z^{-i}\Big) \tag{4.11}$$

Also, the ZDR of the THP, (4.3), is recalculated as (4.12).

$$H_{\text{THP}}(z) = \frac{1}{1 + h_1 z^{-1} + h_1^2 z^{-2} + \dots} = 1 - h_1 z^{-1} \tag{4.12}$$

The two pre-tap coefficients and the one post-tap coefficient of the FFE and the FF-THP can be optimized for the channel response (4.11). The tap coefficients are derived based on the partial differentiation of the ISI with respect to each of $w_{-2}$, $w_{-1}$, and $w_1$. The optimized tap coefficients, which follow from (2.52), (2.53), and (2.54), are shown below.

$$w_{-2} = \frac{h_{-1}^{2}}{1 - h_{-1}h_{1}} \tag{4.13}$$

$$w_{-1} = -\frac{h_{-1}}{1 - h_{-1}h_{1}} \tag{4.14}$$

$$w_1 = -h_1(1 - h_{-1}h_1) \tag{4.15}$$

Applying (4.12) ~ (4.15) to (4.8) ~ (4.10), the optimized VEMs of the THP, the FFE, and the FF-THP are expressed below in terms of $h_1$ and $h_{-1}$.

$$VEM_{\text{THP}} = \frac{(1 - h_1)(1 - h_{-1}(L - 1 + h_1))}{(1 + (1 - h_1)h_{-1})L} \tag{4.16}$$

$$VEM_{FFE} = \frac{(1 - h_1)}{(1 + h_1)(1 + (1 - h_1)h_{-1})(L - 1)} \times \frac{(1 - 3h_1h_{-1}(1 - h_1h_{-1}) - ((L - 1) + h_1^3)h_{-1}^3)}{(1 + (1 - 2h_1)h_{-1} + (1 - h_1 + h_1^2)h_{-1}^2)} \tag{4.17}$$

$$VEM_{\text{FF-THP}} = \frac{(1 - h_1)}{(1 - h_1 h_{-1})} \times \frac{(1 - 3h_1 h_{-1} (1 - h_1 h_{-1}) - ((L - 1) + h_1^3) h_{-1}^3)}{(1 + (1 - h_1) h_{-1}) L} \tag{4.18}$$

The SBR of the 1-pole channel having a pre-cursor of $h_{-1}$ and the first post-cursor $h_1$, which is the same as in (4.11), is shown in Fig. 4.2.

Fig. 4.2 Normalized single-bit response of a 1-pole channel having a pre-cursor

From (4.16) ~ (4.18), the 3-D graphs of the calculated VEMs of the THP, the FFE, and the FF-THP with respect to $h_{-1}$ and $h_1$ in PAM-4 and PAM-8 signaling are illustrated in Fig. 4.3 and Fig. 4.4, respectively. The 3-D graphs show that the FF-THP offers the largest VEM among them when the channel loss is not small, in both PAM-4 and PAM-8 signaling.
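The VEM definition (4.7) can also be evaluated directly from the tap values instead of through the closed forms. The following sketch is a minimal illustration under assumed values ($h_{-1} = 0.2$, $h_1 = 0.5$, PAM-4, unity signal range, channel truncated to 60 post-cursors); the helper names are hypothetical.

```python
def convolve(x, y):
    """Polynomial product of two tap lists."""
    out = [0.0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] += xi * yj
    return out

def vem(eq_taps, n_pre, h_pre, h1, levels, modulo, n_post=60):
    """VEM from (4.7) for an equalizer given as FIR taps.

    eq_taps is ordered from the earliest pre-tap to the last post-tap and
    n_pre is the number of pre-taps. With modulo=True the result is scaled
    by (L-1)/L as in (3.1); otherwise by the FFE normalization 1/sum|w_i|.
    """
    h0 = (1.0 - h1) / (1.0 + h_pre * (1.0 - h1))            # (2.36)
    channel = [h_pre] + [h1 ** i for i in range(n_post + 1)]
    resp = convolve(eq_taps, channel)
    main_idx = n_pre + 1                                      # position of z^0
    isi = sum(abs(c) for i, c in enumerate(resp) if i != main_idx)
    if modulo:
        scale = (levels - 1) / levels
    else:
        scale = 1.0 / sum(abs(t) for t in eq_taps)
    return scale * h0 * (resp[main_idx] / (levels - 1) - isi)

h_pre, h1, L = 0.2, 0.5, 4
w = [h_pre ** 2 / (1 - h_pre * h1),      # w_-2, (4.13)
     -h_pre / (1 - h_pre * h1),          # w_-1, (4.14)
     1.0,                                # w_0
     -h1 * (1 - h_pre * h1)]             # w_1, (4.15)

vem_thp = vem([1.0, -h1], 0, h_pre, h1, L, modulo=True)     # THP, (4.12)
vem_ffe = vem(w, 2, h_pre, h1, L, modulo=False)             # FFE
vem_ffthp = vem(w, 2, h_pre, h1, L, modulo=True)            # FF-THP

print(round(vem_thp, 4), round(vem_ffe, 4), round(vem_ffthp, 4))
```

For these values the sketch gives roughly 0.034, 0.069, and 0.089 of the signal range for the THP, the FFE, and the FF-THP, respectively, which reflects the ordering discussed above for a channel with a sizable pre-cursor.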
It can be noticed that when the channel loss is small, the FFE offers the largest VEM because of the amplitude adjustment coefficient of THP and FF-THP in (3.1). Also, THP and FF-THP offer the same VEM when the channel has no pre-cursor ISI. However, because the THP lacks pre-cursor controllability, the VEM of THP decreases sharply in PAM-8 signaling compared to PAM-4 signaling.

Fig. 4.3 3-D graphs of VEMs of THP, FFE, FF-THP in PAM-4 signaling

Fig. 4.4 3-D graphs of VEMs of THP, FFE, FF-THP in PAM-8 signaling

The cross-sectional diagrams of the 3-D graphs are featured in Fig. 4.5 and Fig. 4.6. For PAM-4 signaling, the cross-sectional cases are $h_1 = 0.5$ and $h_{-1} = 0.2$, which correspond to ~20-dB channel loss. For PAM-8 signaling, the cross-sectional cases are $h_1 = 0.25$ and $h_{-1} = 0.125$, which correspond to ~10-dB channel loss.

Fig. 4.5 Cross-sectional diagram of VEMs of THP, FFE, FF-THP in PAM-4 signaling

Fig. 4.6 Cross-sectional diagram of VEMs of THP, FFE, FF-THP in PAM-8 signaling

Fig. 4.7 Ratio of VEM<sub>FF-THP</sub>/VEM<sub>THP</sub> and VEM<sub>FF-THP</sub>/VEM<sub>FFE</sub> in PAM-4 signaling

Fig. 4.8 Ratio of VEM<sub>FF-THP</sub>/VEM<sub>THP</sub> and VEM<sub>FF-THP</sub>/VEM<sub>FFE</sub> in PAM-8 signaling

As shown in the plots, the VEM of the FF-THP is the largest among the three. Also, as $h_{-1}$ increases, $VEM_{THP}$ decreases sharply, whereas $VEM_{FFE}$ and $VEM_{FF-THP}$ are far less sensitive to it. The ratios of $VEM_{FF-THP}$ to $VEM_{THP}$ and to $VEM_{FFE}$ are shown in Fig. 4.7 and Fig. 4.8. Both ratios increase as $h_{-1}$ or $h_{1}$ increases in both PAM-4 and PAM-8 signaling. This means that as the pre-cursor and post-cursors of a channel increase, the advantage of FF-THP over THP and FFE becomes more significant.

So far, we have verified mathematically that FF-THP has an advantage in VEM over THP and FFE. In the next section, the effectiveness of FF-THP is verified in behavioral simulation.

### 4.2.2 SystemVerilog Simulation

The SystemVerilog simulation is conducted on the THP, FFE, and FF-THP to verify the effectiveness of FF-THP. To simplify the channel, a 1-pole channel with a pre-cursor is used. For PAM-4 signaling, as mentioned before, a channel with $h_{-1} = 0.2$ and $h_1 = 0.5$, corresponding to ~20-dB loss, is used. The SBR of the channel modeled by the step function in Fig. 2.3 is shown in Fig. 4.9.

Fig. 4.9 Single-bit response of the 1-pole channel having a pre-cursor ( $h_{-1} = 0.2$ and $h_1 = 0.5$ )

As seen in Fig. 4.9, $h_{-1} = H_{-1}/H_0 = 0.0909/0.4546 = 0.2$, $h_1 = H_1/H_0 = 0.2273/0.4546 = 0.5$, and $h_2 = H_2/H_0 = 0.1137/0.4546 = 0.25 = h_1^2$. Also, $H_{-1} + H_0 + H_1 + H_2 + ...$ is equal to 1, which means that the channel offers unity gain.

Fig. 4.10 Eye diagrams of THP, FFE, FF-THP in PAM-4 signaling compensating for the channel ( $h_{-1} = 0.2$ and $h_1 = 0.5$ )

The eye diagrams of THP, FFE, and FF-THP are shown in Fig. 4.10. The tap coefficients of THP are the same as the normalized post-cursors of the channel, $h_i = 0.5^i$, and the tap coefficients of FFE and FF-THP are determined by (4.13) ~ (4.15), which give $w_{-2} = 0.0444$, $w_{-1} = -0.2222$, and $w_1 = -0.45$. The resulting VEMs of THP, FFE, and FF-THP at the center are 50 mV, 69 mV, and 87 mV, respectively.
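
The cursor values and tap coefficients quoted above follow directly from the two channel parameters. A minimal Python sketch, assuming the unity-gain 1-pole model with one pre-cursor (the variable names are illustrative):

```python
h_pre, h1 = 0.2, 0.5                       # normalized pre-cursor and first post-cursor

# Unity gain: H_-1 + H_0 + H_1 + ... = H_0 * (h_pre + 1 / (1 - h1)) = 1
H0 = 1.0 / (h_pre + 1.0 / (1.0 - h1))
cursors = {"H_-1": H0 * h_pre, "H_0": H0, "H_1": H0 * h1, "H_2": H0 * h1 ** 2}

# Tap coefficients from (4.13)-(4.15)
d = 1.0 - h_pre * h1
taps = {"w_-2": h_pre ** 2 / d, "w_-1": -h_pre / d, "w_1": -h1 * d}

for name, value in {**cursors, **taps}.items():
    print(f"{name:>5} = {value:+.4f}")
```

The printed cursors and taps agree with the values quoted in the text to within rounding of the last digit.
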
In line with the mathematical evaluation, because the THP lacks pre-cursor controllability, the thickness of its signal levels, which corresponds to R(z) in (4.6), is much larger than that of the others, and $VEM_{THP}$ is the smallest among them. In contrast to THP, FFE and FF-THP have pre-cursor controllability through their pre-taps. Therefore, their signal levels are much thinner, and by virtue of the modulo value, FF-THP offers the largest VEM among the three.

Furthermore, the eye diagrams of THP, FFE, and FF-THP with Gaussian noise are shown in Fig. 4.11. The standard deviation of the Gaussian noise is 10 mV. The resulting VEMs of THP, FFE, and FF-THP at the center are 30 mV, 39 mV, and 60 mV, respectively. The ratio of $VEM_{FF-THP}$ to $VEM_{THP}$ increases from 1.74 to 2.00, and the ratio of $VEM_{FF-THP}$ to $VEM_{FFE}$ increases from 1.26 to 1.54 with the Gaussian noise. Therefore, FF-THP offers the largest VEM, with or without Gaussian noise, when compensating for a ~20-dB loss channel in PAM-4 signaling. Also, the advantage of FF-THP increases even further with the Gaussian noise.

Fig. 4.11 Eye diagrams of THP, FFE, FF-THP in PAM-4 signaling compensating for the channel ( $h_{-1} = 0.2$ and $h_1 = 0.5$ ) with Gaussian noise

For PAM-8 signaling, the 1-pole channel with $h_{-1} = 0.125$ and $h_1 = 0.25$ is employed to verify the effectiveness of FF-THP. The SBR of the step-function-based channel has a shape similar to Fig. 4.9, with unity gain. Because the channel has $h_{-1} = 0.125$, which cannot be equalized by THP and makes $VEM_{THP}$ zero, as shown in Fig. 4.4, the SystemVerilog simulation is conducted only on FFE and FF-THP. The resulting eye diagrams of FFE and FF-THP without and with Gaussian noise are shown in Fig. 4.12 and Fig. 4.13, respectively. Without Gaussian noise, the VEM of FFE is 46.9 mV, and the VEM of FF-THP is 69.3 mV. With Gaussian noise, the VEM of FFE decreases to 28.6 mV, and the VEM of FF-THP decreases to 51.7 mV.
With the noise, the ratio $VEM_{FF-THP}/VEM_{FFE}$ increases from 1.48 to 1.81, which is a 22% increase. As mentioned before, because the FF-THP has an SNR gain over the FFE by virtue of the modulo value, the advantage of FF-THP grows with Gaussian noise, which means that the FF-THP is strong in VEM, especially when multi-level signaling is adopted.

We have confirmed through mathematical analysis and behavioral simulation that FF-THP with PAM-4 and PAM-8 signaling offers a larger VEM than THP and FFE while compensating for significant channel loss. Also, the advantage of FF-THP grows further as the number of signal levels and the channel loss increase. The implementation of multi-level TXs with FF-THP is presented in the following chapters.

Fig. 4.12 Eye diagrams of FFE and FF-THP in PAM-8 signaling compensating for the channel ( $h_{-1} = 0.125$ and $h_1 = 0.25$ )

Fig. 4.13 Eye diagrams of FFE and FF-THP in PAM-8 signaling compensating for the channel ( $h_{-1} = 0.125$ and $h_1 = 0.25$ ) with Gaussian noise

## Chapter 5

# 10 Gb/s PAM-4 Transmitter with FF-THP in 28 nm CMOS

## **5.1 Transmitter Implementation**

#### **5.1.1 Overall Architecture**

The overall block diagram of the proposed TX with the FF-THP is illustrated in Fig. 5.1 [54]. The digital block of the TX consists of an 8-bit parallel PRBS generator, a modulo prediction engine (MPE), and FFE cells. The analog block includes 4:1 serializers with 1-UI pulse generators, single-to-differential converters (S2Ds), an 8-bit differential digital-to-analog converter (DAC), and a phase-locked loop (PLL) based on a ring oscillator for 1.25-GHz quadrature clocks. The quadrature clocks from the PLL generate the 1-UI pulses, and four pass-gates serialize the data with the four phases of 1-UI pulses. Also, in the DAC, 50 $\Omega$ matching resistors are implemented to remove the reflection from the channel.
The externally controlled 10-bit coefficients for the two pre-taps, the main tap, and the ten post-taps in the 4-phase FFE cells accurately compensate for the channel ISI and maximize the VEM. The operation of the TX can be switched between the FFE mode and the FF-THP mode to compare the performance of the two equalization methods.

Fig. 5.1 Overall block diagram of 10 Gb/s PAM-4 FF-THP Transmitter

#### **5.1.2 Modulo Prediction Engine**

The structure of the MPE is presented in Fig. 5.2. The inputs of the modulo table cell (MTC) are the last two PAM-4 data ( $D_0$ and $D_1$ ), the modulo values for both data ( $M_0$ and $M_1$ ), and the current PAM-4 data ( $D_2$ ). From these, it generates the modulo value for the current data ( $M_2$ ). It is worth noting that since the MTC depends only on the last two data and their modulo values, it is possible to apply the MTC to another channel if the first and the second post-taps ( $w_1$ and $w_2$ ) are similar to those of the target channel. However, since the MTC considers only $w_1$ and $w_2$, the residual ISI that is not removed by $w_1$ and $w_2$ may cause a modulo prediction error and induce additional ISI. Because a wireline channel shows a response similar to that of a one-pole channel, $w_1$ and $w_2$ can sufficiently compensate for the channel response. Therefore, the residual ISI is negligible, and the other tap coefficients are much smaller than $w_1$ and $w_2$. Also, even if a modulo prediction error occurs, when $D_1$ is -0.375, which corresponds to PAM-4 data 00, $M_2$ is determined by $D_2$ regardless of whether $M_1$ is 0 or 1, as shown in the simplified table. Consequently, the modulo prediction error can be self-healed, and a burst error can be prevented.

A modulo operation in THP is calculated based on a direct summation of the products of data and the taps of the feedback equalizer. In the MTC, however, the modulo value is predetermined by the channel. Therefore, the burden of digital computation is much reduced.
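
For contrast with the table-based approach, the conventional THP feedback loop described above can be sketched as follows. This is a generic textbook-style illustration in Python, not the circuit of this work. The modulo range M = 1, the PAM-4 levels of ±0.125 and ±0.375, the post-cursors 0.5 and 0.25, the data pattern, and all function names are assumptions chosen for the example.

```python
import math


def thp_precode(data, post_cursors, modulo=1.0):
    """Subtract the ISI of past outputs, then fold the result into [-M/2, M/2)."""
    outputs, modulo_values = [], []
    for d in data:
        isi = sum(h * outputs[-1 - k]
                  for k, h in enumerate(post_cursors) if k < len(outputs))
        value = d - isi                                  # multiply-accumulate inside the feedback loop
        m = -math.floor(value / modulo + 0.5) * modulo   # modulo value added to the data
        outputs.append(value + m)
        modulo_values.append(m)
    return outputs, modulo_values


def receive(outputs, post_cursors, modulo=1.0):
    """Pass the precoded symbols through the post-cursor channel and fold at the receiver."""
    received = []
    for n, x in enumerate(outputs):
        r = x + sum(h * outputs[n - 1 - k]
                    for k, h in enumerate(post_cursors) if n - 1 - k >= 0)
        received.append(r - math.floor(r / modulo + 0.5) * modulo)
    return received


levels = [-0.375, -0.125, 0.125, 0.375]
data = [levels[i] for i in (3, 3, 0, 2, 3, 1, 0, 0, 3, 2, 1, 3)]
tx, mods = thp_precode(data, post_cursors=[0.5, 0.25])
rx = receive(tx, post_cursors=[0.5, 0.25])
print(mods)
```

In this sketch both the multiply-accumulate and the fold sit inside the symbol-rate feedback loop, which is the timing burden that a precomputed table avoids.
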
In addition, a modulo look-ahead (MLA) technique is used through 9 modulo prediction units (MPUs), each of which comprises two MTCs. They take every combination of predetermined modulo values from $\{-1, 0, 1\} \times \{-1, 0, 1\}$ as the previous modulo values ( $\{M_0, M_1\}$ ) and generate candidates for $M_2$ and $M_3$ ( $M_2\{-1, -1\}$ to $M_2\{1, 1\}$ and $M_3\{-1, -1\}$ to $M_3\{1, 1\}$ ). The candidates are selected by the last modulo values, $M_0$ and $M_1$. As a result, assisted by the MTC and the MLA technique, the digital computation operates at a clock frequency of up to 1.25 GHz.

To further enhance the data rate, there are two options: increasing the clock frequency and expanding the parallelism. The MTC is designed considering the first and the second post-taps. Still, since the modulo prediction error can be self-healed, the MTC can be simplified so that it only considers $w_1$ at the expense of a slight degradation of BER. The simplified version of the MTC can enhance the clock frequency. Moreover, expanding the 4-parallel structure to a $2^N$-parallel structure can nominally increase the data rate by a factor of $2^{N-2}$. Thus, with the simplified MTC and the expansion of parallelism, the data rate can be increased significantly. Also, because the MPE is a purely digital structure, immediate improvements in efficiency and data rate are expected in newer technologies.

#### 5.1.3 Feed-Forward Equalizer

Fig. 5.3 Structure of one phase of 4-parallel FFE without pipelining

The structure of one phase of the 4-parallel FFE is described in Fig. 5.3. The 5-bit sums of data and modulo value are multiplied by the 10-bit tap coefficients. The other phases of the output ( $D_0$, $D_{90}$, and $D_{270}$ ) are generated by the same structure but with time-shifted input data. To generate $D_0$ and $D_{90}$, $D_{-2}+M_{-2}$ and $D_{-1}+M_{-1}$ are required, and they are derived from a one-clock delayed version of $D_2+M_2$ and $D_3+M_3$.
Thanks to the feed-forward structure, the multiplications and summations of the FFE are straightforward to pipeline. For clarity, the pipelining is omitted in the figure but is implemented in the fabricated chip. As a result, contrary to THP, the digital computation of the FFE does not suffer from the timing issue and operates at a high digital clock frequency.

The tap coefficients, $w_i$, corresponding to a specific channel, are determined to maximize the VEM by using the ArgMax function in Mathematica, which finds the global maximum under given constraints. Optimized for the same SBR, the ratios between the main tap and the other 12 tap coefficients ( $w_i/w_0$ ) remain the same for the FFE and the FF-THP. However, the magnitude of the tap coefficients can be greater for the FF-THP because adding the modulo value guarantees that the output remains within the acceptable input range of the DAC driver.

#### **5.1.4 Other Blocks**

Fig. 5.4 Structures of components of data path and serializing timing diagram for 10 Gb/s PAM-4 transmitter

The structures of the components of the data path and the serializing timing diagram are shown in Fig. 5.4. The input digital data are retimed into a 4-phase structure by the phase aligner and serialized by the 1-UI pulse generators and pass-gate MUXs. The timing diagram of the serialization is shown on the right side.

Fig. 5.5 DNL and INL of 8-bit differential DAC

The source-series termination (SST) based differential digital-to-analog converter (DAC) offers differential non-linearity (DNL) and integral non-linearity (INL) lower than 0.2, as shown in Fig. 5.5.

#### **5.2 Measurement Results**

#### 5.2.1 Measurement Setup and Transmitter Output

Fig. 5.6 Measurement setup for 10 Gb/s PAM-4 transmitter

The measurement setup for the 10 Gb/s PAM-4 TX is presented in Fig. 5.6.
The vector signal generator provides a 78.125 MHz reference clock for the PLL, which generates a 1.25-GHz clock with a 1/16 divider. To measure the performance of the FFE and the FF-THP, a DisplayPort cable and SMA cables are used as the channel. On the other hand, to measure the transmitter output, the output of the test chip is directly connected to the oscilloscope.

Fig. 5.7 Eye diagram and histogram of 10 Gb/s PAM-4 transmitter

The TX eye diagram shows a 700 mV swing with a level mismatch ratio (R<sub>LM</sub>) of 99.8%. Because the TX is designed to compensate for 20-dB channel loss, which requires large tap coefficients, the TX offers its maximum output swing when it operates in equalization mode. Therefore, the swing magnitude in Fig. 5.7, without equalization, is smaller than the actual DAC output range.

Fig. 5.8 Measured 10 Gb/s PAM-4 eye diagram and histogram of TX output of FFE and FF-THP (top) and distribution of TX output of FFE and FF-THP (bottom)

Fig. 5.8 exhibits the measured 10 Gb/s PAM-4 eye diagram and the histogram of the eye diagram. The eye diagram of the TX output features an output range of 800 mV<sub>PP</sub>. For this measurement, a lossy channel is not added. The distribution at the bottom of Fig. 5.8 shows that the signal is concentrated around the center when the TX operates in FFE mode. On the other hand, when the TX operates in FF-THP mode, the signal is evenly distributed. Because of the widespread distribution, the FF-THP features better SNR than the FFE.

#### **5.2.2** Channel Response and Equalization Results

Fig. 5.9 Measured insertion loss and normalized single bit response of the channel for 10 Gb/s PAM-4 transmitter

The insertion loss and the normalized SBR of the measured channel are presented in Fig. 5.9. The channel loss is 21 dB at the Nyquist frequency of 2.5 GHz, with the first post-cursor of the channel around 0.5, which is the natural response of a ~20-dB channel, as mentioned before.
Also, the sum of the ISI of the SBR is 1.48 times the main cursor.

Before presenting the measurements of the channel output of the proposed TX, it is necessary to mention a method that indirectly evaluates the BER performance of a TX [55]. Assuming that Gaussian noise is added to the output data, the BER for the PAM-*L* signal and the decision threshold of the data *X* are represented by (5.1) and (5.2).

$$BER_{L} = \frac{L-1}{L} \, \mathrm{erfc}\left(\frac{d}{2\sqrt{2}\sigma}\right) \times \log_{2} L \tag{5.1}$$

$$\text{Decision Threshold}(X) = \text{Mean}(\text{histo.}(X)) \pm Q^{-1}(BER) \, \text{Std.}(\text{histo.}(X)) \tag{5.2}$$

where d and $\sigma$ denote the magnitude of the data and the standard deviation of the Gaussian noise, respectively. d can be substituted by the difference between the mean of X and that of the data level adjacent to X, and $\sigma$ can be substituted by the standard deviation of X. The means and standard deviations of each PAM-4 data level can be obtained from the histogram of the received signal.

Fig. 5.10 Measured 10 Gb/s eye diagram of FFE (top left) and FF-THP (bottom left), histogram of FFE (top right), and FF-THP (bottom right)

Fig. 5.10 exhibits the measured 10 Gb/s PAM-4 eye diagrams of the fabricated chip compensating for the channel. When the TX operates in the FF-THP mode, two additional levels appear along with the conventional PAM-4 levels, as expected. The proposed FF-THP achieves an $R_{LM}$ of 99.1%, and the VEM is improved by 38.9% compared with the FFE. From the histograms, the means and standard deviations of the data signal are obtained.

Fig. 5.11 Calculated decision threshold voltage of FFE and FF-THP and estimated bathtub curve of FFE and FF-THP

Estimated based on the Gaussian distribution, the decision thresholds and the bathtub curves of the FFE and the FF-THP are presented in Fig. 5.11. The proposed FF-THP achieves a BER lower than 10<sup>-8</sup> at the center of the eye and an 87.5% larger horizontal eye margin (HEM) than the FFE at a BER of 10<sup>-5</sup>.
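
As an illustration of how (5.2) is applied, the following Python sketch computes the thresholds of two adjacent levels and the eye opening left between them at a target BER. The means and standard deviations are made-up placeholder numbers, not measured data, and the helper names are hypothetical. $Q^{-1}$ is evaluated with the normal distribution from the Python standard library.

```python
from statistics import NormalDist


def q_inverse(ber):
    """Inverse Gaussian tail function: Q(q_inverse(ber)) == ber."""
    return -NormalDist().inv_cdf(ber)


def eye_opening(lower, upper, ber):
    """Gap between the thresholds of two adjacent levels, each given as (mean, std) in volts."""
    q = q_inverse(ber)
    top_of_lower = lower[0] + q * lower[1]       # Mean + Q^-1(BER) * Std
    bottom_of_upper = upper[0] - q * upper[1]    # Mean - Q^-1(BER) * Std
    return bottom_of_upper - top_of_lower


# Hypothetical histogram statistics of two adjacent levels (volts)
level_a = (-0.050, 0.004)
level_b = (0.050, 0.004)

for ber in (1e-5, 1e-8):
    print(f"BER {ber:.0e}: Q^-1 = {q_inverse(ber):.3f}, "
          f"opening = {eye_opening(level_a, level_b, ber) * 1e3:.1f} mV")
```

Repeating this for every pair of adjacent levels and every sampling phase gives the kind of bathtub curve estimate described above; a lower target BER always narrows the opening.
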
### **5.2.3** Chip Photograph and Performance Summary

Fig. 5.12 Chip photomicrograph of 10 Gb/s PAM-4 transmitter

| | Blocks | Area | Power |
|---|---|---|---|
| 1 | PRBS Gen. + FF-THP | 0.0322 mm<sup>2</sup> | 32 mW |
| 2 | Pulse Gen. + Serializer | 0.0026 mm<sup>2</sup> | 4.1 mW |
| 3 | DAC + S2D | 0.0094 mm<sup>2</sup> | 22.2 mW |
| 4 | PLL | 0.0304 mm<sup>2</sup> | 1.7 mW |
| | Total | 0.0746 mm<sup>2</sup> | 60 mW |

Fig. 5.13 Area and power breakdown at 10 Gb/s PAM-4 with FF-THP

Fig. 5.12 features the chip photomicrograph. The proposed TX occupies an active area of 0.075 mm<sup>2</sup>. The power and area breakdown of the fabricated chip is presented in Fig. 5.13. The digital block occupies 0.0322 mm<sup>2</sup> and consumes 53.3% of the total power. Without the PRBS generator, the FF-THP alone occupies 0.022 mm<sup>2</sup>. With a 1-V supply, the total power consumption of the digital and analog blocks is 32 mW and 28 mW, respectively.

Table 5.1 compares the performance of the proposed FF-THP based TX with other PAM-4 TXs that compensate for a high channel loss or large ISI. The sum of the channel ISI is an important parameter because the VEMs of TX equalizers depend on it. Also, an asymmetric link such as a memory interface has multiple drops, whose effect is reflected not in the channel loss at the Nyquist frequency but in the sum of the channel ISI. From the point of view of channel ISI, the proposed design, assisted by the pre-taps and the modulo-based signaling, can compensate for a sum of normalized ISI of 1.48, which is the largest. As a result, the FF-THP achieves the best FoM<sub>2</sub> of 4.05 pJ/b/ISI with a BER lower than 10<sup>-8</sup>.
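
The two figures of merit used in the comparison are simple ratios: (Power)/(Data rate), and (Power)/(Data rate)/(Sum of normalized ISI). As a worked example, the short Python sketch below evaluates both for the numbers reported for this chip; the function names are illustrative.

```python
def fom1(power_mw, rate_gbps):
    """Energy per bit in pJ/b (mW divided by Gb/s equals pJ/b)."""
    return power_mw / rate_gbps


def fom2(power_mw, rate_gbps, isi_sum):
    """Energy per bit per unit of normalized ISI, in pJ/b/ISI."""
    return fom1(power_mw, rate_gbps) / isi_sum


print(f"FoM1 = {fom1(60, 10):.2f} pJ/b")              # 60 mW at 10 Gb/s
print(f"FoM2 = {fom2(60, 10, 1.48):.2f} pJ/b/ISI")    # sum of normalized ISI = 1.48
```
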
| | JSSC 2013 [35] | SOVC 2018 [46] | ASSCC 2016 [47] | ISSCC 2021 [56] | JSSC 2017 [20] | This work [54] |
|---|---|---|---|---|---|---|
| Technology | 22 nm SOI | 14 nm | 28 nm FDSOI | 28 nm | 65 nm | 28 nm |
| TX equalization | THP | Table-based FFE | MPC | FFE | FFE | FF-THP |
| Number of taps | 8 (post only) | 8 (post only) | 20 (post only) | 5 | 1 (pre) + 1 (post) | 2 (pre) + 10 (post) |
| Number of DAC bits | 6 | 6 - 8 | 4.0\* | 3.2\* | 4 | 8 |
| Signal levels on eye diagram | 6 | 5 | 4 | 4 | 4 | 6 |
| Data rate [Gb/s] | 10 | 112 | 6 | 200 | 32 | 10 |
| Power [mW]† | 17.1 | 268 | 34.8 | 926 | 158.6 | 60 |
| Area of equalizer [mm<sup>2</sup>]†† | 0.017 | 0.023 | 0.053 | - | 0.0154 | 0.022 |
| Active area [mm<sup>2</sup>] | 0.019 | - | - | 0.4323 | 0.06 | 0.0746 |
| Channel loss [dB] | 12 | | - | 13 | 13.5 | 21 |
| Sum of normalized ISI††† | 0.33\*\* | 0.5 | 0.7 | 0.83\*\* | 0.788\*\* | 1.48 |
| Estimated BER | - | - | - | | <10<sup>-12</sup>\*\*\* | <10<sup>-8</sup> |
| FoM<sub>1</sub> (pJ/b) | 1.71 | 2.39 | 5.8 | 4.63 | 4.96 | 6.0 |
| FoM<sub>2</sub> (pJ/b/ISI)†††† | 5.18 | 4.79 | 8.29 | 5.56 | 6.29 | 4.05 |

† Equalization on

†† Area of precoder and equalizer

††† Sum of ISI in SBR normalized by main cursor

†††† (Power)/(Data rate)/(Sum of normalized ISI)

\* Calculated based on the number of TX slicer

\*\* Estimated by single bit response

\*\*\* Measured by RX chip

Table 5.1 Performance summary and comparison for 10 Gb/s PAM-4 transmitter

## Chapter 6

# 42 Gb/s PAM-8 Transmitter with FF-THP in 28 nm CMOS

## **6.1 Transmitter Implementation**

#### **6.1.1 Overall Architecture**

The overall block diagram of the proposed 42 Gb/s PAM-8 FF-THP TX is shown in Fig. 6.1 [57]. The 16-bit parallel PRBS generators, the 16-parallel MPE, and the 3-tap FFEs, each comprising a pre-tap and two post-taps, are included in the synthesized digital block operating at 875 MHz. Although not shown in Fig. 6.1 for clarity, pipelining of the FFE cell is implemented to achieve the digital clock frequency. Then, the 16-parallel 6-bit data are serialized with the help of the phase aligner, the pass-gate MUXs, the 1-UI pulse generators, and the 16:1 serializers. The serialized data pass through the single-to-differential (S2D) circuit driving the source-series termination-based differential 6-bit DAC. Also, in the DAC, 50 $\Omega$ matching resistors are implemented to remove reflection from the channel. For 16-phase clock generation, digitally-controlled delay cells (DCDCs), each composed of an inverter-based voltage-controlled delay cell (VCDC) and a 6-bit resistive DAC (RDAC), are implemented.

### **6.1.2 Modulo Prediction Engine**

Fig. 6.2 Structure of 16-parallel modulo prediction engine

The 16-parallel MPE is illustrated in Fig. 6.2. The MTC generates the modulo value ( $M_1$ ) from the previous data ( $D_0$ ), its modulo value ( $M_0$ ), and the present data ( $D_1$ ). The first post-tap is assumed to be 0.25, and the table is shown in Fig. 6.3.
| | $M_0 = -1$ | $M_0 = 0$ | $M_0 = 0$ | $M_0 = 0$ | $M_0 = 0$ | $M_0 = 0$ | $M_0 = 0$ | $M_0 = 0$ | $M_0 = 0$ | $M_0 = 1$ |
|---|---|---|---|---|---|---|---|---|---|---|
| $D_0$ | -0.5625 (111) | -0.4375 (000) | -0.3125 (001) | -0.1875 (010) | -0.0625 (011) | 0.0625 (100) | 0.1875 (101) | 0.3125 (110) | 0.4375 (111) | 0.5625 (000) |
| -0.4375 (000) | -0.2969 | -0.3281 | -0.3594 | -0.3906 | -0.4219 | -0.4531 | -0.4844 | -0.5156 | -0.5469 | -0.5781 |
| -0.3125 (001) | -0.1719 | -0.2031 | -0.2344 | -0.2656 | -0.2969 | -0.3281 | -0.3594 | -0.3906 | -0.4219 | -0.4531 |
| -0.1875 (010) | -0.0469 | -0.0781 | -0.1094 | -0.1406 | -0.1719 | -0.2031 | -0.2344 | -0.2656 | -0.2969 | -0.3281 |
| -0.0625 (011) | 0.0781 | 0.0469 | 0.0156 | -0.0156 | -0.0469 | -0.0781 | -0.1094 | -0.1406 | -0.1719 | -0.2031 |
| 0.0625 (100) | 0.2031 | 0.1719 | 0.1406 | 0.1094 | 0.0781 | 0.0469 | 0.0156 | -0.0156 | -0.0469 | -0.0781 |
| 0.1875 (101) | 0.3281 | 0.2969 | 0.2656 | 0.2344 | 0.2031 | 0.1719 | 0.1406 | 0.1094 | 0.0781 | 0.0469 |
| 0.3125 (110) | 0.4531 | 0.4219 | 0.3906 | 0.3594 | 0.3281 | 0.2969 | 0.2656 | 0.2344 | 0.2031 | 0.1719 |
| 0.4375 (111) | 0.5781 | 0.5469 | 0.5156 | 0.4844 | 0.4531 | 0.4219 | 0.3906 | 0.3594 | 0.3281 | 0.2969 |

Fig. 6.3 Modulo table cell for PAM-8

In a similar way, the table cell can be generated for other target channels. The corresponding modulo-generating logic is given below.

$$M_1 = \begin{cases} -1, & (D_1 = 0.4375) \cap ((M_0 = -1) \cup ((D_0 < -0.25) \cap (M_0 = 0))) \\ 1, & (D_1 = -0.4375) \cap ((M_0 = 1) \cup ((D_0 > 0.25) \cap (M_0 = 0))) \\ 0, & \text{otherwise} \end{cases} \tag{6.1}$$

Because of the simplicity of the MTC, the area and the operating time are significantly reduced compared to conventional THP. In addition, to reduce the feedback time, an MLA technique is introduced.
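
The entries of Fig. 6.3 can be regenerated with a few lines of Python. The sketch below assumes that each entry equals $D_1 - 0.25\,(D_0 + M_0)$, which reproduces the tabulated values, and that the modulo value is chosen to fold any entry outside ±0.5 back into range. The names are illustrative, and this is a behavioral illustration rather than the synthesized logic.

```python
LEVELS = [-0.4375 + 0.125 * k for k in range(8)]         # PAM-8 data levels, codes 000..111
# Previous data and modulo value (D_0, M_0) for the ten columns of the table
PREVIOUS = [(0.4375, -1)] + [(d0, 0) for d0 in LEVELS] + [(-0.4375, 1)]
POST_TAP = 0.25                                          # assumed first post-cursor


def entry(d1, d0, m0):
    """Table entry before the modulo operation."""
    return d1 - POST_TAP * (d0 + m0)


def modulo_value(d1, d0, m0):
    """Modulo value M_1 that keeps the entry within [-0.5, 0.5]."""
    e = entry(d1, d0, m0)
    if e > 0.5:
        return -1
    if e < -0.5:
        return 1
    return 0


for d1 in LEVELS:
    row = [f"{entry(d1, d0, m0):+.4f}" for d0, m0 in PREVIOUS]
    mods = [modulo_value(d1, d0, m0) for d0, m0 in PREVIOUS]
    print(f"{d1:+.4f}", row, mods)
```

Under these assumptions only the outermost rows, $D_1 = \pm 0.4375$, ever produce a nonzero modulo value, which is why the logic reduces to a handful of comparisons.
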
To construct the 16-parallel structure, the MTCs are grouped into 2/2/2/3/3/4, and the groups operate simultaneously. The green-colored 2-MTC groups, the yellow-colored 3-MTC groups, and the blue-colored 4-MTC group receive a predetermined modulo value, which is one of $\{-1, 0, 1\}$. The modulo values generated by the 2/3/4-MTC groups, which receive the predetermined modulo value, are selected by the previously determined modulo value. For example, $M_2$ and $M_3$ are selected by $M_1$ as one of $\{M_2^1, M_2^0, M_2^{-1}\}$ and $\{M_3^1, M_3^0, M_3^{-1}\}$, respectively. Similarly, $M_3$, $M_5$, $M_8$, and $M_{11}$ determine $M_{4-5}$, $M_{6-8}$, $M_{9-11}$, and $M_{12-15}$. Also, the delays of the MTC and the MUX follow the relationship below.

$$4T_{\text{MUX}} > 2T_{\text{MTC}} > 3T_{\text{MUX}} \tag{6.2}$$

where $T_{\rm MTC}$ and $T_{\rm MUX}$ denote the delay of the MTC and the MUX, respectively. Considering (6.2), the delay of the blue-colored critical path of the MPE in Fig. 6.2 is given by $2T_{\rm MTC} + 5T_{\rm MUX}$, which is the smallest among the 16-parallel MPE structures. The various 16-parallel MPE structures are shown in Fig. 6.4. Although the critical path delay of the MPE comprised of 1/1/2/2/3/3/4 MTC groups is also equal to $2T_{\rm MTC} + 5T_{\rm MUX}$, the proposed MPE structure is more than 4% smaller in area considering the parallelism and the MLA technique.

Fig. 6.4 Various 16-parallel MPE structures and their critical path delay

| Stage | 1 | 2 | 3 | 4 | 5 | 6 | 7 | Critical path delay | # of MTC and MUX (considering MLA) |
|---|---|---|---|---|---|---|---|---|---|
| # of MTC at each stage | 1 | 1 | 1 | 2 | 3 | 4 | 4 | | MTC: 1 + 3x15 = 46, MUX: 16 - 1 = 15 |
| Max. delay at the stage | $T_{MTC}$ | $T_{MTC} + T_{MUX}$ | $T_{MTC} + 2T_{MUX}$ | $T_{MTC} + 3T_{MUX}$ | $T_{MTC} + 4T_{MUX}$ | $4T_{MTC} + T_{MUX}$ | $4T_{MTC} + 2T_{MUX}$ | $4T_{MTC} + 2T_{MUX}$ | |
| # of MTC at each stage | 1 | 1 | 2 | 2 | 3 | 3 | 4 | | MTC: 1 + 3x15 = 46, MUX: 16 - 1 = 15 |
| Max. delay at the stage | $T_{MTC}$ | $T_{MTC} + T_{MUX}$ | $2T_{MTC} + T_{MUX}$ | $2T_{MTC} + 2T_{MUX}$ | $2T_{MTC} + 3T_{MUX}$ | $2T_{MTC} + 4T_{MUX}$ | $2T_{MTC} + 5T_{MUX}$ | $2T_{MTC} + 5T_{MUX}$ | |
| # of MTC at each stage | 1 | 2 | 2 | 3 | 4 | 4 | - | | MTC: 1 + 3x15 = 46, MUX: 16 - 1 = 15 |
| Max. delay at the stage | $T_{MTC}$ | $2T_{MTC} + T_{MUX}$ | $2T_{MTC} + 2T_{MUX}$ | $2T_{MTC} + 3T_{MUX}$ | $4T_{MTC} + T_{MUX}$ | $4T_{MTC} + 2T_{MUX}$ | - | $4T_{MTC} + 2T_{MUX}$ | |
| # of MTC at each stage | 2 | 2 | 2 | 3 | 3 | 4 | - | | MTC: 2 + 3x14 = 44, MUX: 16 - 2 = 14 |
| Max. delay at the stage | $2T_{MTC}$ | $2T_{MTC} + T_{MUX}$ | $2T_{MTC} + 2T_{MUX}$ | $2T_{MTC} + 3T_{MUX}$ | $2T_{MTC} + 4T_{MUX}$ | $2T_{MTC} + 5T_{MUX}$ | - | $2T_{MTC} + 5T_{MUX}$ | |
| # of MTC at each stage | 2 | 3 | 3 | 4 | 4 | - | - | | MTC: 2 + 3x14 = 44, MUX: 16 - 2 = 14 |
| Max. delay at the stage | $2T_{MTC}$ | $3T_{MTC} + T_{MUX}$ | $3T_{MTC} + 2T_{MUX}$ | $4T_{MTC} + T_{MUX}$ | $4T_{MTC} + 2T_{MUX}$ | - | - | $4T_{MTC} + 2T_{MUX}$ | |
| # of MTC at each stage | 2 | 2 | 3 | 4 | 5 | - | - | | MTC: 2 + 3x14 = 44, MUX: 16 - 2 = 14 |
| Max. delay at the stage | $2T_{MTC}$ | $2T_{MTC} + T_{MUX}$ | $3T_{MTC} + T_{MUX}$ | $4T_{MTC} + T_{MUX}$ | $5T_{MTC} + T_{MUX}$ | - | - | $5T_{MTC} + T_{MUX}$ | |
| # of MTC at each stage | 3 | 4 | 4 | 5 | - | - | - | | MTC: 3 + 3x13 = 42, MUX: 16 - 3 = 13 |
| Max. delay at the stage | $3T_{MTC}$ | $4T_{MTC} + T_{MUX}$ | $4T_{MTC} + 2T_{MUX}$ | $4T_{MTC} + 3T_{MUX}$ | - | - | - | $4T_{MTC} + 3T_{MUX}$ | |

#### 6.1.3 Other Blocks

Fig. 6.5 Structures of components of data path and serializing timing diagram for 42 Gb/s PAM-8 transmitter

Similar to the previous PAM-4 TX, the 42 Gb/s PAM-8 TX adopts serialization based on the 1-UI pulse generator and the pass-gate MUX. However, a DCDC-based digitally-controlled delay line (DCDL), whose delay range covers 71.4 ps, corresponding to the period of a 14-GHz clock, generates the 16-phase clock. The timing diagram is shown on the right side of Fig. 6.5.

Fig. 6.6 Characteristics of DAC DNL & INL (top) and output resistance (bottom)

The characteristics of the SST-based differential 6-bit DAC are shown in Fig. 6.6. The DAC offers reasonable DNL and INL. Also, the output resistance is designed to be around 50 $\Omega$ at the Nyquist frequency, matched to the channel impedance to remove the reflection.

#### **6.2 Measurement Results**

### **6.2.1** Measurement Setup and Transmitter Output

Fig. 6.7 Measurement setup for 42 Gb/s PAM-8 FF-THP transmitter

Similar to the previous 10 Gb/s PAM-4 TX, the vector signal generator provides the reference clock, and the data pattern is examined by an oscilloscope.
For the 16-phase clock for 42 Gb/s PAM-8 data, an 875 MHz reference clock is used, and a 7.7-dB loss channel is used to examine the effectiveness of the equalization techniques.

Fig. 6.8 Measured eye diagram of 42 Gb/s PAM-8 transmitter

Fig. 6.9 Measured eye diagram of 32 Gb/s PAM-4 transmitter

Fig. 6.8 shows the measured 42 Gb/s PAM-8 eye diagram at the TX output, giving 1 $V_{ppd}$ with an average VEM of 51.9 mV and an HEM of 20 ps. Also, the proposed TX can operate in PAM-4 signaling, similar to the previous TX. Fig. 6.9 shows the 32 Gb/s PAM-4 eye diagram at the TX output. It offers 1 $V_{ppd}$ with a VEM of 240 mV and good linearity.

### **6.2.2** Channel Response and Equalization Results

Fig. 6.10 Measured insertion loss and single bit response of the channel for 42 Gb/s PAM-8 transmitter

The measured channel characteristics are shown in Fig. 6.10. The SBR, with a 7.7-dB channel loss at the Nyquist frequency, exhibits a pre-cursor and significant post-cursor ISI, with $h_{-1}$ around 0.125 and $h_1$ around 0.25.

Fig. 6.11 Measured 42 Gb/s eye diagrams of channel outputs FFE (left) and FF-THP (right)

Fig. 6.11 shows the eye diagrams at the channel output with the FFE and the FF-THP. The latter exhibits additional signal levels above and below the PAM-8 levels, with a 55% increase in average VEM compared with the former. In addition, the FF-THP provides a larger HEM than the FFE because its distributed output gives greater SNR by virtue of the modulo value.

Fig. 6.12 Measured insertion loss and normalized single bit response of the channel for 32 Gb/s PAM-4 transmitter

Fig. 6.13 Measured 32 Gb/s eye diagrams of channel outputs FFE (left) and FF-THP (right)

The measured channel characteristics for the 32 Gb/s PAM-4 TX are shown in Fig. 6.12. The SBR, with a 12.1-dB channel loss at the Nyquist frequency of 8 GHz, exhibits a pre-cursor and significant post-cursor ISI, with $h_{-1}$ around 0.2 and $h_{1}$ around 0.3.
The 32 Gb/s eye diagrams at the channel output with the FFE and the FF-THP are shown in Fig. 6.13, the latter exhibiting additional signal levels above and below the PAM-4 levels. The FF-THP offers more balanced eye heights because of the SNR gain.

### **6.2.3** Chip Photograph and Performance Summary

Fig. 6.14 Chip photomicrograph of 42 Gb/s PAM-8 transmitter

| | Blocks | Area (mm²) | Power (mW) |
|---|---------------------|------------|------------|
| 1 | DAC + S2D | 0.0025 | 20.22 |
| 2 | Serializer | 0.0056 | 9.52 |
| 3 | VCDL+RDAC | 0.0334 | 9.25 |
| 4 | Synthesized Digital | 0.0288 | 27.37 |
| | Total | 0.0703 | 66.36 |

Fig. 6.15 Area and power breakdown at 42 Gb/s PAM-8 with FF-THP

The chip photo and the area/power breakdown are shown in Fig. 6.14 and Fig. 6.15. The proposed FF-THP TX prototype, fabricated in 28 nm CMOS technology, occupies an active area of 0.0703 mm<sup>2</sup> and consumes 66.36 mW. The prototype chip operates at 42 Gb/s in PAM-8, achieving an energy efficiency of 1.58 pJ/b. The synthesized digital block, occupying 0.0288 mm<sup>2</sup>, accounts for 41% of the total power consumption.

The performance summary and a comparison with high-speed multi-level signaling TXs are shown in Table 6.1. The proposed FF-THP TX offers the lowest FoM of 1.58 pJ/b while compensating for the 7.7-dB channel loss. It also achieves the highest data rate among the TXs that adopt 3-bit/baud modulation.

| | SOVC 2018 [46] | JSSC 2022 [58] | JSSC 2022 [59] | ESSCIRC 2019 [60] | SOVC 2021 [61] | This work [57] |
|---|---|---|---|---|---|---|
| Technology [nm] | 14 | 28 | 10 | 65 | 65 | 28 |
| Data rate [Gb/s] | 112 | 200 | 224 | 39.6 | 27 | 42 |
| Modulation | PAM-4 | PAM-4 | PAM-4 | PAM-8 | 8-ary (pulse width and amplitude modulation) | PAM-8 |
| Bit/symbol | 2 | 2 | 2 | 3 | 3 | 3 |
| Number of data levels on eye diagram | | 4 | 4 | 8 | 8 | |
| TX equalization method | Table-based | FFE | FFE | | FFE | FF-THP |
| Number of taps | | 5 | 8 | | 3 | |
| Channel loss [dB] | 21\* | | 4.3 | 14.0\* | | 7.7 |
| Power [mW] | 268 | 926 | 504 | 342.9 | 89.77 | 66.36 |
| FoM (pJ/b) | 2.39 | 4.63 | 2.25 | 8.66\*\* | 3.32 | 1.58 |
| Active area [mm<sup>2</sup>] | | 0.4323 | 0.088 | 0.39 | 0.7\*\* | 0.0703 |

\* Receiver Eq. \*\* Transceiver

Table 6.1 Performance summary and comparison for 42 Gb/s PAM-8 transmitter

# Chapter 7

## **Conclusions**

As the demand for higher data rates grows, an increase in bandwidth is inevitable, and so is the need for channel-loss compensation and multi-level signaling. In the case of multi-level signaling, a degradation in SNR is inevitable compared to NRZ signaling. In addition, the pre-cursor can increase as the portion of the rise/fall time increases, which is a consequence of increased data bandwidth, and it is necessary to remove it. In this regard, THP, which can achieve SNR improvement, is introduced, and several variations that use it to remove a pre-cursor are presented.

In this dissertation, high-speed multi-level TXs adopting the FF-THP are presented. The proposed FF-THP combines the advantages of modulo-based equalization with controllability over a pre-cursor. Moreover, a quantitative z-domain analysis of the channel response and the equalization parts of the THP, the FFE, and the FF-THP is conducted. A simple one-pole channel with one pre-cursor is employed to demonstrate the repercussions of a pre-cursor and the effectiveness of the FF-THP. From the analysis, the FF-THP shows the largest VEM among the TX equalization methods when the channel has a pre-cursor or large ISI.

Two high-speed multi-level TXs adopting the FF-THP were fabricated in 28 nm CMOS technology. The first chip is a 10 Gb/s PAM-4 TX with FF-THP. An MPE and a 12-tap FFE, including two pre-taps, are designed in a 4-parallel structure, which is matched to a 1.25 GHz 4-phase clock generated by a PLL. The FFE tap coefficients are optimized to compensate for the 21-dB loss channel appropriately.
The proposed FF-THP presents an 87.5% wider HEM at the estimated BER of $10^{-5}$ and a 38.9% larger VEM compared with the FFE. An SST-based 8-bit DAC driver is designed to offer reasonable DNL and INL with 50 $\Omega$ matching. The TX achieves a data rate of 10 Gb/s in PAM-4 with a power efficiency of 6.0 pJ/b while compensating for 21-dB loss and occupying an active area of 0.0746 mm<sup>2</sup>.

The second chip is a 42 Gb/s PAM-8 FF-THP TX. The MPE and FFE in the synthesized digital block are designed and optimized to achieve a 16-parallel structure and high-speed operation while compensating for 7.7-dB channel loss. The 16-phase clock is generated by an RDAC-based DCDL, and 16-to-1 serializers based on 1-UI pulse generators are used to offer 14 Gbaud data. An SST-based 6-bit DAC driver, which is adopted to enhance the power efficiency, shows 50 $\Omega$ matching with reasonable DNL and INL. These efforts advance the state of the art for 3-bit/baud TXs, achieving a data rate of 42 Gb/s and a power efficiency of 1.58 pJ/b with an active area of 0.0703 mm<sup>2</sup>.

The effectiveness of the FF-THP is verified through mathematical analysis, simulation, and measurement results. Moreover, the digital-based equalization technique can take full advantage of process scaling.

# **Bibliography**

- [1] R. Balodis and I. Opmane, "History of Data Centre Development," in A. Tatnall (ed.), *Reflections on the History of Computing*, *IFIP Advances in Information and Communication Technology*, vol. 387, Springer, Berlin, Heidelberg, https://doi.org/10.1007/978-3-642-33899-1\_13.
- [2] A. Rashid, "Data Center Architecture Overview," *National Academy for Planning and Development (NAPD)*, 2018-2019.
- [3] A. D. Papaioannou, R. Nejabati, and D. Simeonidou, "The Benefits of a Disaggregated Data Centre: A Resource Allocation Approach," *IEEE Global Communications Conference (GLOBECOM)*, Dec. 2016.
- [4] Cisco, "Cisco Global Cloud Index: Forecast and Methodology, 2016–2021 white paper," *Online* (accessed: Dec.
05, 2022) https://virtualization.network/Resources/Whitepapers/0b75cf2e-0c53-4891-918e-b542a5d364c5\_white-paper-c11-738085.pdf (2018).
- [5] ISSCC, "2022 Press Kit," *Online* (accessed: Dec. 05, 2022), https://www.isscc.org/past-conferences (2022).
- [6] A. Roshan-Zamir *et al.*, "A 56-Gb/s PAM4 Receiver With Low-Overhead Techniques for Threshold and Edge-Based DFE FIR- and IIR-Tap Adaptation in 65-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 54, no. 3, Mar. 2019.
- [7] K.-C. Chen, W. W.-T. Kuo, and A. Emami, "A 60-Gb/s PAM4 Wireline Receiver With 2-Tap Direct Decision Feedback Equalization Employing Track-and-Regenerate Slicers in 28-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 56, no. 3, Mar. 2021.
- [8] E. Depaoli *et al.*, "A 64 Gb/s Low-Power Transceiver for Short-Reach PAM-4 Electrical Links in 28-nm FDSOI CMOS," *IEEE J. Solid-State Circuits*, vol. 54, no. 1, Jan. 2019.
- [9] M. Pisati *et al.*, "A 243-mW 1.25–56-Gb/s Continuous Range PAM-4 42.5-dB IL ADC/DAC-Based Transceiver in 7-nm FinFET," *IEEE J. Solid-State Circuits*, vol. 55, no. 1, Jan. 2020.
- [10] P. Upadhyaya *et al.*, "A fully adaptive 19-to-56Gb/s PAM-4 wireline transceiver with a configurable ADC in 16nm FinFET," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2018.
- [11] T. Ali *et al.*, "6.4 A 180 mW 56Gb/s DSP-based transceiver for high density IOs in data center switches in 7nm FinFET technology," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2019.
- [12] J. Im *et al.*, "A 112Gb/s PAM-4 Long-Reach Wireline Transceiver Using a 36-Way Time-Interleaved SAR-ADC and Inverter-Based RX Analog Front-End in 7nm FinFET," *IEEE J. Solid-State Circuits*, vol. 56, no. 1, Jan. 2021.
- [13] A. Atharav and B. Razavi, "A 56-Gb/s 50-mW NRZ Receiver in 28nm CMOS," *IEEE J. Solid-State Circuits*, vol. 57, no. 1, Jan. 2022.
- [14] T. Ali *et al.*, "6.2 A 460 mW 112Gb/s DSP-based transceiver with 38dB loss compensation for next-generation data centers in 7nm FinFET technology," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2020.
- [15] E. Depaoli *et al.*, "6.6 A 4.9pJ/b 16-to-64Gb/s PAM-4 VSR transceiver in 28nm FDSOI CMOS," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2018.
- [16] A. Cevrero *et al.*, "6.1 A 100Gb/s 1.1pJ/b PAM-4 RX with dual-mode 1-tap PAM-4 / 3-tap NRZ speculative DFE in 14nm CMOS FinFET," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2019.
- [17] J. Im *et al.*, "A 40-to-56 Gb/s PAM-4 receiver with ten-tap direct decision-feedback equalization in 16-nm FinFET," *IEEE J. Solid-State Circuits*, vol. 52, no. 12, Dec. 2017.
- [18] P.-J. Peng, J.-F. Li, L.-Y. Chen, and J. Lee, "A 56Gb/s PAM-4/NRZ transceiver in 40nm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2017.
- [19] J. Han, N. Sutardja, Y. Lu, and E. Alon, "Design Techniques for a 60-Gb/s 288-mW NRZ Transceiver With Adaptive Equalization and Baud-Rate Clock and Data Recovery in 65-nm CMOS Technology," *IEEE J. Solid-State Circuits*, vol. 52, no. 12, Dec. 2017.
- [20] A. Roshan-Zamir, O. Elhadidy, H.-W. Yang, and S. Palermo, "A Reconfigurable 16/32 Gb/s Dual-Mode NRZ/PAM4 SerDes in 65-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 52, no. 9, Sept. 2017.
- [21] J. Bailey *et al.*, "A 112-Gb/s PAM-4 Low-Power Nine-Tap Sliding-Block DFE in a 7-nm FinFET Wireline Receiver," *IEEE J. Solid-State Circuits*, vol. 57, no. 1, Jan. 2022.
- [22] H. Lin *et al.*, "ADC-DSP-Based 10-to-112-Gb/s Multi-Standard Receiver in 7-nm FinFET," *IEEE J. Solid-State Circuits*, vol. 56, no. 4, Apr. 2021.
- [23] S. Shahramian *et al.*, "1.41pJ/b 56Gb/s PAM-4 Wireline Receiver Employing Enhanced Pattern Utilization CDR and Genetic Adaptation Algorithms in 7nm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2019.
- [24] S. Kiran, S. Cai, Y. Luo, S. Hoyos, and S. Palermo, "A 52-Gb/s ADC-Based PAM-4 Receiver With Comparator-Assisted 2-bit/Stage SAR ADC and Partially Unrolled DFE in 65-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 54, no. 3, Mar. 2019.
- [25] L. Wang, Y. Fu, M.-A. LaCroix, E. Chong, and A. C.
Carusone, "A 64-Gb/s 4-PAM Transceiver Utilizing an Adaptive Threshold ADC in 16-nm FinFET," *IEEE J. Solid-State Circuits*, vol. 54, no. 2, Feb. 2019.
- [26] B. Ye *et al.*, "6.3 A 2.29pJ/b 112Gb/s Wireline Transceiver with RX 4-Tap FFE for Medium-Reach Applications in 28nm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2022.
- [27] B. Zand *et al.*, "6.6 A 1-58.125Gb/s, 5-33dB IL Multi-Protocol Ethernet-Compliant Analog PAM-4 Receiver with 16 DFE Taps in 10nm," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2022.
- [28] Y. Segal *et al.*, "6.1 A 1.41pJ/b 224Gb/s PAM-4 SerDes Receiver with 31dB Loss Compensation," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2022.
- [29] G. Gangasani *et al.*, "6.5 A 1.6Tb/s Chiplet over XSR-MCM Channels using 113Gb/s PAM-4 Transceiver with Dynamic Receiver-Driven Adaptation of TX-FFE and Programmable Roaming Taps in 5nm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2022.
- [30] N. Kocaman *et al.*, "6.4 An 182mW 1-60Gb/s Configurable PAM-4/NRZ Transceiver for Large Scale ASIC Integration in 7nm FinFET Technology," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2022.
- [31] Z. Guo *et al.*, "6.2 A 112.5Gb/s ADC-DSP-Based PAM-4 Long-Reach Transceiver with >50dB Channel Loss in 5nm FinFET," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2022.
- [32] IEEE 802.3 Ethernet Working Group, *Online* (accessed: Dec. 05, 2022) http://www.ieee802.org/3
- [33] InfiniBand Trade Association, *Online* (accessed: Dec. 05, 2022) http://www.infinibandta.org
- [34] INCITS Fibre Channel Technical Committee (T11), *Online* (accessed: Dec. 05, 2022) http://www.incits.org/committees/t11
- [35] M. Kossel *et al.*, "A 10 Gb/s 8-Tap 6b 2-PAM/4-PAM Tomlinson-Harashima Precoding Transmitter for Future Memory-Link Applications in 22-nm SOI CMOS," *IEEE J. Solid-State Circuits*, vol. 48, no. 12, pp. 3268-3284, Dec. 2013.
- [36] G.-S. Jeong, B. Kang, H. Ju, K. Park, and D.-K. Jeong, "A Modulo-FIR Equalizer for Wireline Communications," *IEEE Trans. on Circuits and Systems I*, vol. 66, no.
11, pp. 4278-4286, Nov. 2019.
- [37] S. Ibrahim and B. Razavi, "Low-Power CMOS Equalizer Design for 20-Gb/s Systems," *IEEE J. Solid-State Circuits*, vol. 46, no. 6, pp. 1321-1336, June 2011.
- [38] Y. Lu and E. Alon, "Design techniques for a 66 Gb/s 46 mW 3-tap decision feedback equalizer in 65 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 48, no. 12, pp. 3243-3257, Dec. 2013.
- [39] J. Han, Y. Lu, N. Sutardja, K. Jung, and E. Alon, "A 60Gb/s 173mW receiver frontend in 65nm CMOS technology," *IEEE Symp. VLSI Circuits, Dig. Tech. Pap.*, pp. C230-C231, Aug. 2015.
- [40] R. Rath, D. Clausen, S. Ohlendorf, S. Pachnicke, and W. Rosenkranz, "Tomlinson-Harashima Precoding for Dispersion Uncompensated PAM-4 Transmission with Direct-Detection," *IEEE Journal of Lightwave Technology*, vol. 35, no. 18, pp. 3909-3917, Sept. 2017.
- [41] R. Rath, C. Schmidt, and W. Rosenkranz, "Is Tomlinson-Harashima Precoding Suitable for Fiber-Optic Communication Systems?," *ITG Symposium Proceedings - Photonic Networks*, May 2013.
- [42] M. Tomlinson, "New automatic equaliser employing modulo arithmetic," *Electronics Lett.*, vol. 7, nos. 5-6, pp. 138-139, Mar. 1971.
- [43] H. Harashima and H. Miyakawa, "Matched-Transmission Technique for Channels With Intersymbol Interference," *IEEE Trans. Commun.*, vol. 20, no. 4, pp. 774-780, Aug. 1972.
- [44] Y. Gu and K. Parhi, "High-Speed Architecture Design of Tomlinson-Harashima Precoders," *IEEE Trans. on Circuits and Systems I*, vol. 54, no. 9, pp. 1929-1937, Sept. 2007.
- [45] M. Kossel *et al.*, "Feedback delay reduction of Tomlinson-Harashima precoder in 14 nm CMOS via pipelined MAC units operated entirely with CSA arithmetic," *Electronics Lett.*, vol. 52, no. 23, pp. 1906-1908, Nov. 2016.
- [46] T. Toifl *et al.*, "A 0.3pJ/bit 112Gb/s PAM4 1+0.5D TX-DFE Precoder and 8-tap FFE in 14-nm CMOS," *Symposium on VLSI Circuits Dig. of Tech. Papers*, pp. 53-54, June 2018.
- [47] T. Kim, P. Bhargava, and V.
Stojanovic, "A Model Predictive Control Equalization Transmitter for Asymmetric Interfaces in 28nm FDSOI," *IEEE ASSCC Dig. Tech. Papers*, pp. 237-240, Nov. 2016.
- [48] A. Suleiman *et al.*, "Model Predictive Control Equalization for High-Speed I/O Links," *IEEE Trans. on Circuits and Systems I*, vol. 61, no. 2, pp. 371-381, Feb. 2014.
- [49] B. Kang and D.-K. Jeong, "Analysis of Cascaded Tomlinson-Harashima Precoding with Feedforward Equalizer for Pre-cursor Removal," *IEEE 2022 37th International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC)*, July 2022.
- [50] A. Deutsch *et al.*, "High-speed signal propagation on lossy transmission lines," *IBM J. Res. and Dev.*, vol. 34, no. 4, July 1990.
- [51] Q. Yu and O. Wing, "Computational Models of Transmission Lines with Skin Effects and Dielectric Loss," *IEEE Trans. on Circuits and Systems I*, vol. 41, no. 2, Feb. 1994.
- [52] W. Dally and J. Poulton, "Digital Systems Engineering," *Cambridge University Press*, 1998.
- [53] B. Razavi, "Design of Integrated Circuits for Optical Communications," *McGraw-Hill*, 2003.
- [54] B. Kang *et al.*, "A 10Gb/s PAM-4 Transmitter With Feed-Forward Implementation of Tomlinson-Harashima Precoding in 28 nm CMOS," *IEEE Access*, Nov. 2021.
- [55] T.-C. Huang, Q.-T. Chen, and T.-C. Lee, "A 4-PAM Adaptive Analog Equalizer for Backplane Interconnections," *IEEE Symposium on VLSI Design, Automation and Test (VLSI-DAT)*, pp. 200-203, Apr. 2008.
- [56] M. Choi *et al.*, "An output-bandwidth-optimized 200 Gb/s PAM-4 100 Gb/s NRZ transmitter with 5-tap FFE in 28 nm CMOS," *IEEE ISSCC Dig. Tech. Papers*, pp. 128-129, Feb. 2021.
- [57] B. Kang, W. Jung, H. Kim, S. Lee, and D.-K. Jeong, "A 42Gb/s PAM-8 Transmitter with Feed-Forward Tomlinson-Harashima Precoding in 28nm CMOS," *IEEE ASSCC Dig. Tech. Papers*, Nov. 2022.
- [58] Z. Wang et al., "An Output Bandwidth Optimized 200-Gb/s PAM-4 100-Gb/s NRZ Transmitter With 5-Tap FFE in 28-nm CMOS," IEEE J.
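Solid-State Circuits, vol. 57, no. 1, Jan. 2022.
- [59] J. Kim *et al.*, "A 224-Gb/s DAC-Based PAM-4 Quarter-Rate Transmitter With 8-Tap FFE in 10-nm FinFET," *IEEE J. Solid-State Circuits*, vol. 57, no. 1, Jan. 2022.
- [60] Y. Chun *et al.*, "A PAM-8 Wireline Transceiver with Receiver Side PWM (Time-Domain) Feed Forward Equalization Operating from 12-to-39.6Gb/s in 65nm CMOS," *IEEE ESSCIRC Dig. Tech. Papers*, Sept. 2019.
- [61] M. Megahed *et al.*, "A 27 Gb/s 5.39 pJ/bit 8-ary Modulated Wireline Transceiver Using Pulse Width and Amplitude Modulation Achieving 9.5 dB SNR Improvement over PAM-8," *Symposium on VLSI Circuits Dig. of Tech. Papers*, Sept. 2021.

# Abstract

The growth of hyperscale data centers and data traffic inevitably requires an increase in transmission speed and bandwidth. Accordingly, the data rate per lane of various I/O standards has increased rapidly over time, and multi-level signaling such as pulse amplitude modulation (PAM), especially PAM-4, has been widely adopted in many standards. In the case of multi-level signaling, a degradation in signal-to-noise ratio (SNR) is inevitable compared with non-return-to-zero (NRZ) signaling. In line with these trends, the channel loss has also increased year by year. In addition, the pre-cursor can increase as the portion of the rise/fall time increases, so it needs to be removed. In this regard, Tomlinson-Harashima precoding (THP), which can achieve SNR improvement, is introduced, and several variations that use it to remove a pre-cursor are presented.

High-speed multi-level transmitters (TXs) adopting the feed-forward THP (FF-THP) are implemented. The proposed FF-THP offers both the advantages of modulo-based equalization and controllability over a pre-cursor. Moreover, a quantitative z-domain analysis of the channel response and the equalization parts of the THP, the FFE, and the FF-THP is conducted. A simple one-pole channel with a pre-cursor is used to demonstrate the impact of a pre-cursor and the effectiveness of the FF-THP. According to the analysis, the FF-THP shows the largest vertical eye margin (VEM) among the TX equalization methods when the channel has a pre-cursor or large inter-symbol interference.

Two high-speed multi-level TXs adopting the FF-THP were fabricated in 28 nm CMOS technology. The first chip is a 10 Gb/s PAM-4 TX with FF-THP. The modulo prediction engine (MPE) and the FFE are designed in a 4-parallel structure, which is matched to the 4-phase clock generated by a PLL. The FFE tap coefficients are optimized to compensate for the 21-dB loss channel appropriately. The proposed FF-THP shows a wider horizontal eye margin and a larger VEM than the FFE. The TX achieves 10 Gb/s PAM-4 with a power efficiency of 6.0 pJ/b and 4.05 pJ/b/ISI while compensating for 21-dB loss and occupying an active area of 0.0746 mm².

The second chip is a 42 Gb/s PAM-8 FF-THP TX. The MPE and FFE in the synthesized digital block are designed and optimized to achieve a 16-parallel structure and high-speed operation while compensating for 7.7-dB channel loss. The 16-phase clock is generated by an RDAC-based digitally-controlled delay line, and 16-to-1 serializers based on 1-UI pulse generators are used to provide 14 Gbaud data. The source-series-terminated (SST) 6-bit DAC driver offers 50 $\Omega$ matching with reasonable DNL and INL. These efforts yield the highest 3-bit/baud TX data rate of 42 Gb/s and a power efficiency of 1.58 pJ/b, comparable to state-of-the-art TXs, with an active area of 0.0703 mm².

The effectiveness of the FF-THP is verified through mathematical analysis, simulation, and measurement results. In addition, the digital-based equalization technique has the advantage that it can take full advantage of process scaling.

Keywords: multi-level transmitter, feed-forward equalizer (FFE), Tomlinson-Harashima precoding (THP), feed-forward Tomlinson-Harashima precoding (FF-THP), DAC driver

Student Number: 2017-27459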