



# Design of Low-Power Transmitter with 1T1C Bootstrap technique for Memory Interfaces

# Design of a Low-Power Transmitter for Memory Interfaces Using 1T1C Bootstrap

by

**Hyeonseok Lee** 

August, 2023

Department of Electrical and Computer Engineering College of Engineering Seoul National University


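Advisor: Deog-Kyoon Jeong

Submitted as a thesis for the degree of Master of Science in Engineering, August 2023

Department of Electrical and Computer Engineering, Graduate School of Seoul National University, Hyeonseok Lee

The Master's thesis of Hyeonseok Lee is approved, August 2023

Chair \_\_\_\_\_ (seal)

Vice-Chair \_\_\_\_\_ (seal)

Member \_\_\_\_\_ (seal)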

# Design of Low Power Transmitter with 1T1C Bootstrap technique for Memory Interfaces

by

Hyeonseok Lee

A Thesis Submitted to the Department of Electrical and Computer Engineering in Partial Fulfillment of the Requirements for the Degree of Master of Science

at SEOUL NATIONAL UNIVERSITY

February, 2023

Committee in Charge:

Professor Suhwan Kim, Chairman

Professor Deog-Kyoon Jeong, Vice-Chairman

Professor Woo-Seok Choi

## Abstract

This thesis presents a 40.8-fJ/b/mm, 12.8-Gb/s single-ended transmitter with a 1T1C bootstrap technique for low-power memory interfaces. The proposed bootstrap serializer and pre-driver give the N-over-N driver a sufficient eye margin to achieve a BER below 10<sup>-12</sup> at a high data rate with low VDD and VDDQ. The bootstrap transmitter reduces power consumption by 13% compared with a conventional transmitter. The transmitter for the HBM interface, driving a 6.6-mm on-chip channel, achieves an energy efficiency of 40.8 fJ/b/mm at 0.75-V VDD and 0.25-V VDDQ. The transmitter for the LPDDR interface, driving a 3-mm FR-4 trace, achieves an energy efficiency of 0.47 pJ/b at 0.9-V VDD and 0.5-V VDDQ. Both achieve the highest energy efficiency among state-of-the-art transmitters for memory interfaces.

**Keywords** : HBM, LPDDR, memory interface, single-ended transmitter, bootstrap serializer, and pre-driver

Student Number : 2021-28627

# Contents

| ABSTRACT |
|---|
| CONTENTS |
| LIST OF FIGURES |
| LIST OF TABLES |
| CHAPTER 1 INTRODUCTION |
| 1.1 MOTIVATION |
| 1.2 THESIS ORGANIZATION |
| CHAPTER 2 BACKGROUNDS |
| 2.1 SERDES LINK |
| 2.2 LOW POWER MEMORY INTERFACE |
| 2.2.1 BASIS OF MEMORY INTERFACE |
| 2.2.2 HIGH BANDWIDTH MEMORY (HBM) INTERFACE |
| 2.2.3 LOW-POWER DOUBLE DATA RATE (LPDDR) INTERFACE |
| 2.3 BOOTSTRAP TECHNIQUE |
| 2.4 SERIALIZER |
| 2.5 DRIVER |
| CHAPTER 3 DESIGN OF TRANSMITTER FOR LOW POWER MEMORY INTERFACE |
| 3.1 OVERALL ARCHITECTURE |
| 3.2 SERIALIZER |
| 3.3 1T1C BOOTSTRAP |
| 3.4 DRIVER WITH 2-TAP FFE |
| 3.5 SAMPLER |
| CHAPTER 4 MEASUREMENT RESULTS |
| 4.4 PERFORMANCE SUMMARY |
| CHAPTER 5 CONCLUSION |
| BIBLIOGRAPHY |

# **List of Figures**

| FIG. 2.1 (A) PARALLEL COMMUNICATION AND (B) SERIAL COMMUNICATION |
|---|
| FIG. 2.2 BASIC ARCHITECTURE OF SERDES LINK AND BACKPLANE ENVIRONMENT [8] |
| FIG. 2.3 SYSTEM-LEVEL SYNCHRONIZATION WITH DISTRIBUTED CLOCK SIGNALS [9] |
| FIG. 2.4 ARCHITECTURE OF HBM INTERFACE [10] |
| FIG. 2.5 COMPARISON OF CHIP PLACEMENT STRUCTURE OF GDDR5 AND HBM [10] |
| FIG. 2.6 (A) PSEUDO-OPEN-DRAIN INTERFACE ARCHITECTURE AND (B) LOW-VOLTAGE SWING-TERMINATED LOGIC ARCHITECTURE [12] |
| FIG. 2.7 CONVENTIONAL BOOTSTRAP STRUCTURE |
| FIG. 2.8 OPERATION OF BOOTSTRAP STRUCTURE |
| FIG. 2.9 EQUIVALENT CIRCUIT OF BOOTSTRAP STRUCTURE |
| FIG. 2.10 16:1 SERIALIZER ARCHITECTURE |
| FIG. 2.11 CONVENTIONAL 2:1 SERIALIZER STRUCTURE |
| FIG. 2.12 TIMING DIAGRAM OF 2:1 SERIALIZER |
| FIG. 2.13 FLIP-FLOP-BASED DIVIDER STRUCTURE |
| FIG. 2.14 (A) CURRENT-MODE DRIVER AND (B) VOLTAGE-MODE DRIVER |
| FIG. 2.15 CURRENT-MODE LOGIC (CML) DRIVER |
| FIG. 2.16 SOURCE-SERIES TERMINATION (SST) DRIVER |
| FIG. 3.1 BLOCK DIAGRAM OF THE PROPOSED LOW-POWER TRANSMITTER |
| FIG. 3.2 PROPOSED 16:1 SERIALIZER |
| FIG. 3.3 (A) 3-LATCH 2:1 SERIALIZER AND (B) 1-LATCH 2:1 SERIALIZER |
| FIG. 3.4 CMOS-BASED FLIP-FLOP DIVIDER |
| FIG. 3.5 (A) 16:1 SERIALIZER TRANSIENT SIMULATION AND (B) EYE DIAGRAM (PRBS-7 PATTERN) |
| FIG. 3.6 PRIOR WORK OF BOOTSTRAPPED PRE-DRIVER |
| FIG. 3.7 PROPOSED 1T1C BOOTSTRAP (A) SERIALIZER AND (B) PRE-DRIVER |
| FIG. 3.8 EQUIVALENT CIRCUIT FOR EVALUATING 1T1C BOOTSTRAP PRE-DRIVER |
| FIG. 3.9 POST-LAYOUT SIMULATION RESULTS: TRANSIENT RESPONSE |
| FIG. 3.10 POST-LAYOUT SIMULATION RESULTS: POWER COMPARISON |
| FIG. 3.11 SOURCE-SERIES TERMINATION (SST) DRIVER |
| FIG. 3.12 IMPEDANCE TUNING: (A) SERIES-STACKED FET BANKS METHOD AND (B) SLICE-DIVIDING METHOD |
| FIG. 3.13 N-OVER-N DRIVER |
| FIG. 3.14 PROPOSED PRE-DRIVER AND N-OVER-N DRIVER WITH 2-TAP FFE |
| FIG. 3.15 POST-LAYOUT SIMULATION OF N-OVER-N DRIVER WITH 2-TAP FFE |
| FIG. 3.16 CONVENTIONAL PMOS STRONG-ARM LATCH |
| FIG. 3.17 TRANSIENT RESPONSE OF DESIGNED STRONG-ARM LATCH |
| FIG. 4.1 CHIP PHOTOMICROGRAPH |
| FIG. 4.2 CROSS-SECTION VIEW AND INSERTION LOSS OF THE ON-CHIP METAL CHANNEL |
| FIG. 4.3 MEASUREMENT SETUP |
| FIG. 4.4 MEASURED EYE DIAGRAMS OF THE TX<sub>1</sub> |
| FIG. 4.5 BER MEASUREMENT RESULTS OF THE TRX<sub>2</sub>: EYE DIAGRAM (TOP) AND BER BATHTUB CURVE (BOTTOM) |
| FIG. 4.6 SHMOO PLOT OF THE TRX<sub>2</sub>: DATA RATE VS. VDD (TOP LEFT), DATA RATE VS. VDDQ (BOTTOM LEFT), AND VDDQ VS. VDD (RIGHT) |
| FIG. 4.7 POWER BREAKDOWN OF THE PROPOSED TRANSMITTERS |

# **List of Tables**

| TABLE 4.1 COMPARISON WITH MEMORY INTERFACES |
|---|
| TABLE 4.2 COMPARISON WITH ON-CHIP SERIAL LINKS |

## Chapter 1

# Introduction

### **1.1 Motivation**

As the data rates of DRAM continue to increase, the power consumption of the DRAM interface is also increasing significantly. To reduce it, recent designs supply the driver with a voltage lower than the transmitter's supply voltage [1,2]. In particular, HBM [3-5] and the low-frequency mode of LPDDR [6] use an extremely low VDDQ to improve the power efficiency of the driver without matching the output impedance. However, the pre-driver and serializer still consume considerable power because VDD must remain much higher than VDDQ to provide sufficient gate overdrive for the driver. Bootstrapping is a technique for generating a sufficient output swing at low VDD [5,7]. However, prior-art bootstrap structures require complicated hardware whose tight timing margin and increased parasitic capacitance make high-speed operation difficult.

In this thesis, a 1T1C bootstrap technique implemented with only a single transistor and a single capacitor is proposed, enabling not only the driver but also the serializer and pre-driver to operate at a low supply voltage. This simple structure increases the timing margin and improves power efficiency, allowing bootstrapping at a data rate as high as 12.8 Gb/s with supply voltages as low as 0.75-V VDD and 0.25-V VDDQ.

### **1.2 Thesis Organization**

This thesis is organized as follows. Chapter 2 presents the background of SerDes links and low-power memory interfaces, followed by the operation of the serializer, driver, and bootstrap technique.

Chapter 3 presents a 12.8-Gb/s/pin single-ended transmitter with a 1T1C bootstrap technique for low-power DRAM interfaces. It describes the structure, operating principle, and post-layout simulation results of the bootstrapped serializer and pre-driver, followed by the implementation and post-layout simulation of the N-over-N driver and sampler.

Chapter 4 presents the measurement results of the prototype chip, including the BER of the transceiver, the eye diagrams of the transmitter, and the power breakdown. The measured performance is compared with state-of-the-art transmitters for low-power memory interfaces in a comparison table.

Chapter 5 summarizes the proposed work and concludes this thesis.

## Chapter 2

## Backgrounds

### 2.1 SerDes Link

Fig. 2.1 illustrates data transmission in parallel and serial communication. Compared with parallel communication, serial communication avoids problems such as clock skew between channels and crosstalk among the many physical wires. Serial communication is enabled by a serializer/deserializer (SerDes) link, which converts data from parallel to serial form and back (Fig. 2.2). This enables fast transmission and processing of data in high-speed communication links.

Fig. 2.1 (a) Parallel communication and (b) serial communication.

Furthermore, SerDes links offer various advantages, including higher bandwidth, improved power efficiency, and better signal integrity. In high-speed communication systems, SerDes links are commonly used for data transmission and find applications in diverse fields, such as communication between servers in data centers, connections between network equipment, high-resolution video and audio streaming, and high-speed serial interfaces (e.g., PCI Express and USB). To meet the signal-integrity and speed requirements of high-speed data transmission, SerDes links employ various techniques and protocols, including clock recovery, signal encoding and decoding, adaptive equalization, channel coding, and techniques for reducing noise and compensating attenuation. These techniques maintain signal integrity, minimize noise and distortion, and enhance reliability and transmission performance.



Fig. 2.2 Basic architecture of SerDes link and backplane environment [8].
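As a minimal illustration of the parallel-to-serial conversion and its inverse, the Python sketch below models an ideal SerDes pair at the bit level. The function names and the MSB-first bit order are assumptions for illustration only; a real link also needs clock recovery, encoding, and equalization, which the sketch ignores.

```python
from typing import List, Sequence


def serialize(words: Sequence[int], width: int) -> List[int]:
    """Flatten parallel words into a serial bit stream, MSB first (assumed order)."""
    bits: List[int] = []
    for word in words:
        for i in reversed(range(width)):
            bits.append((word >> i) & 1)
    return bits


def deserialize(bits: Sequence[int], width: int) -> List[int]:
    """Regroup a serial bit stream into parallel words of `width` bits."""
    if len(bits) % width:
        raise ValueError("bit stream length must be a multiple of the word width")
    words: List[int] = []
    for start in range(0, len(bits), width):
        word = 0
        for bit in bits[start:start + width]:
            word = (word << 1) | bit
        words.append(word)
    return words


words = [0xA5F0, 0x1234, 0xFFFF]
stream = serialize(words, 16)
print(len(stream))                       # 48
print(deserialize(stream, 16) == words)  # True
```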

### **2.2 Low Power Memory Interface**

#### 2.2.1 Basis of Memory Interface

Modern high-speed DRAM memory systems transmit various types of signals, such as address, command, control, data, and clock signals, through the same type of PCB traces. Consequently, the propagation delay of the clock signal can affect its timing relationship with the other signals. As Fig. 2.3 shows, problems can occur when the propagation delay of the clock signal is not negligible compared with the clock period.



Fig. 2.3 System-level synchronization with distributed clock signals [7]

Therefore, synchronization must account for the phase difference caused by the propagation of the clock signal. In general, three synchronization schemes are used in memory interfaces. The first is global clocking, which is applicable when the propagation delay of clock signals between devices in the system is small compared with the clock period, so that timing differences can be absorbed within the timing margin of the signaling protocol. This approach is limited to memory systems operating at low frequencies, such as SDRAM systems. The second is source-synchronous clocking, in which the transmitter sends a clock signal along with the data signal. The timing of the unidirectional data signals is then referenced to the clock generated by the same device rather than to a global clock. Lastly, PLL- and DLL-based techniques synchronize the memory controller and the DRAM device. These circuits actively compensate for signal skew, which becomes critical as the data rate increases dramatically.
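The criterion for global clocking can be made concrete by comparing the clock flight time with the clock period. The back-of-the-envelope sketch below assumes a propagation delay of roughly 6.7 ps/mm, an assumed typical value for an FR-4 PCB trace, and a hypothetical 100-mm trace; it is not a timing analysis.

```python
def flight_time_in_periods(trace_mm: float, clock_hz: float,
                           delay_ps_per_mm: float = 6.7) -> float:
    """Clock flight time along a trace, expressed in clock periods.

    delay_ps_per_mm defaults to an assumed, typical FR-4 value.
    """
    flight_time_s = trace_mm * delay_ps_per_mm * 1e-12
    return flight_time_s * clock_hz


TRACE_MM = 100.0  # hypothetical controller-to-DRAM trace length
for clock_hz in (100e6, 800e6, 3.2e9):
    ratio = flight_time_in_periods(TRACE_MM, clock_hz)
    print(f"{clock_hz / 1e6:7.0f} MHz clock: flight time = {ratio:.2f} periods")
```

At a low clock rate the flight time is a small fraction of the period and fits within the protocol's timing margin; once it approaches or exceeds one period, source-synchronous or PLL/DLL-based schemes become necessary.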

#### 2.2.2 High Bandwidth Memory(HBM) Interface

High Bandwidth Memory (HBM) is a memory architecture in which DRAM dies are stacked vertically and connected by through-silicon vias (TSVs). Multiple HBM stacks are connected to CPUs or GPUs through a fast interconnect called a silicon interposer (Fig. 2.4), and the assembly is mounted on a circuit board. Because the HBM stack is connected to the processor so closely through the interposer, it behaves almost like on-chip integrated memory.



Fig. 2.4 Architecture of HBM interface [7]

GDDR5 has been used in almost all high-performance graphics cards. However, as graphics processors become faster, the demand for higher memory bandwidth grows, and meeting these bandwidth requirements with GDDR consumes excessive power. HBM significantly improves memory power efficiency and can provide more than three times the bandwidth per unit power of GDDR. The stacked structure of HBM improves not only power efficiency but also space efficiency (Fig. 2.5): compared with GDDR5, HBM can save up to 94% of space.



Fig. 2.5 Comparison of chip placement structure of GDDR5 and HBM [7]

#### 2.2.3 Low-Power Double Data Rate (LPDDR) Interface

LPDDR is a memory technology for mobile devices that offers high-speed data transfer, low power consumption, and optimized performance. Its primary feature is low power consumption, which it achieves through power-saving mechanisms such as power gating, dynamic voltage scaling, and deep power-down mode. With these mechanisms, the memory operates at low voltage and reduces power consumption during idle or low-utilization periods.

Up to LPDDR3, LPDDR adopted the pseudo-open-drain (POD) I/O scheme used in GDDR4/5/5X/6 to provide receiver termination (Fig. 2.6(a)) [11]. However, termination introduces additional power consumption. To minimize it, next-generation LPDDR interfaces adopted a low-voltage swing-terminated logic (LVSTL) architecture (Fig. 2.6(b)).



Fig. 2.6 (a) Pseudo-Open-Drain Interface architecture and (b)Low-Voltage swing-terminated logic architecture [7]

Most of the power consumption comes from the AC power ( $\propto$ CV<sup>2</sup>f) involved in charging and discharging the parasitic capacitance. In LVSTL, the increased pull-up resistance due to the natural saturation of the pull-up NMOS limits the voltage swing and reduces CV<sup>2</sup>f. LVSTL demonstrates a 40% reduction in power consumption compared to DDR's POD [13].
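The V² dependence is what makes swing reduction effective. The sketch below evaluates P = αCV²f for two swings; the capacitance, frequency, and activity factor α are hypothetical values chosen only to show the scaling, and V is treated as the swing, as in the text.

```python
def dynamic_power(activity: float, c_farad: float, v_volt: float, f_hz: float) -> float:
    """AC switching power P = activity * C * V^2 * f."""
    return activity * c_farad * v_volt ** 2 * f_hz


# Hypothetical values: 1 pF switched node, 1.6 GHz toggle basis, activity factor 0.5.
p_full = dynamic_power(0.5, 1e-12, 1.0, 1.6e9)
p_half = dynamic_power(0.5, 1e-12, 0.5, 1.6e9)
print(f"{p_full * 1e3:.2f} mW -> {p_half * 1e3:.2f} mW ({p_half / p_full:.0%} of the original)")
```

Halving the swing cuts the CV²f term to a quarter, regardless of the particular C, f, and α chosen.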

### 2.3 Bootstrap Technique

The conventional bootstrap technique in the driver [7] consists of a pull-up and a pull-down control pair. It boosts the gate voltage to push the driver MOSFET into the deep triode region, enhancing its driving capability. The data path drives the driver and the pass-gate switch simultaneously, alternating between precharging and driving according to the transitions of the data input.

Fig. 2.7 illustrates the conventional bootstrap technique in the driver.



Fig. 2.7 Conventional Bootstrap structure.

$C_{BP}$ and $C_{BN}$ are the bootstrap capacitors, and $M_{P1}$ and $M_{N1}$ are the precharging transistors for $C_{BP}$ and $C_{BN}$, respectively. INV<sub>P</sub> and INV<sub>N</sub> are the pre-drivers that boost $C_{BP}$ and $C_{BN}$, and $M_{PD}$ and $M_{ND}$ are the output drivers. The voltage of N<sub>BT</sub> swings between 2VDD and -VDD, which enhances the driving capability of $M_{PD}$ and $M_{ND}$. N<sub>BT</sub> also controls $M_{P1}$ and $M_{N1}$ through feedback, which improves the precharge capability and eliminates reverse leakage current.

Fig. 2.8 shows the operation of the bootstrap circuit when the input switches from high to low and from low to high. Assume that, before $V_{IN}$ transitions from high to low, the bootstrap capacitors $C_{BP}$ and $C_{BN}$ are precharged to VDD, node $N_{BP}$ is at VDD, and node $N_{BT}$ is at -VDD.



Fig. 2.8 Operation of bootstrap structure.

Ideally, when $V_{IN}$ transitions from high to low, $N_{OP}$ transitions from low to high, which boosts $N_{BP}$ to 2VDD. Simultaneously, $M_{P2}$ turns on and $M_{N2}$ turns off, so the 2VDD at $N_{BP}$ charges $N_{BT}$ through $M_{P2}$. When $N_{BT}$ rises above $V_{th}$, $M_{N1}$ turns on and precharges $N_{BN}$ to GND, charging $C_{BN}$ to a voltage of -VDD. When $V_{IN}$ transitions from low to high, a similar mechanism drives $N_{BT}$ to -VDD.

Ideally, the voltage of node $N_{BT}$ swings between 2VDD and -VDD. In practice, however, the parasitic capacitance of node $N_{BT}$ shares charge with the bootstrap capacitor, which reduces the boosted voltage [jssc-ref17].

Fig. 2.9 depicts the equivalent circuit of the upper side when $N_{BT}$ rises above VDD. $V_{BTP}$ and $C_{BTP}$ are the voltage and the total parasitic capacitance of $N_{BT}$, respectively. Assuming that $V_{BTP}$ starts at -VDD and $C_{BP}$ is boosted to 2VDD, charge conservation gives the final voltage $V_{OUT}$ of $N_{BT}$ in Eq. (2.1).



$$V_{OUT} = \frac{C_{BP}}{C_{BP} + C_{BTP}} 2VDD - \frac{C_{BTP}}{C_{BP} + C_{BTP}} VDD$$
(2.1)

Fig. 2.9 equivalent circuit of bootstrap structure.
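Eq. (2.1) can be evaluated directly to see how quickly the parasitic capacitance erodes the ideal 2VDD level. In the sketch below, the ratios $C_{BTP}/C_{BP}$ are hypothetical sweep values, not parameters extracted from any particular design.

```python
def boosted_node_voltage(c_bp: float, c_btp: float, vdd: float) -> float:
    """Eq. (2.1): N_BT (at -VDD, parasitic C_BTP) shares charge with C_BP at 2*VDD."""
    return (c_bp * 2 * vdd - c_btp * vdd) / (c_bp + c_btp)


VDD = 1.0
for ratio in (0.0, 0.1, 0.25, 0.5):  # hypothetical C_BTP / C_BP ratios
    v = boosted_node_voltage(1.0, ratio, VDD)
    print(f"C_BTP/C_BP = {ratio:.2f}: V_OUT = {v:.3f} V (ideal {2 * VDD:.3f} V)")
```

A parasitic capacitance of only half the bootstrap capacitance already pulls the boosted level from 2VDD down to VDD, which is why the bootstrap capacitor must be large relative to the node parasitics.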

### 2.4 Serializer

A serializer combines multiple low-speed parallel channels into a single high-speed serial bit stream. Because the transmitter incorporates a serializer, its inputs can be much slower than its output, which simplifies the design of the package and the PC board. Fig. 2.10 shows a 16:1 serializer that consists of four stages of 2:1 serializers [14].



Fig. 2.10 16:1 Serializer architecture.
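The four-stage 2:1 tree can be modeled behaviorally in a few lines. In the sketch below (function names hypothetical, ideal timing assumed), each 2:1 stage interleaves two streams at twice their rate; feeding the even-indexed lanes to one side and the odd-indexed lanes to the other at every level makes the output reproduce each 16-bit word in lane order.

```python
from typing import List, Sequence


def mux2(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Ideal 2:1 serializer: output a[0], b[0], a[1], b[1], ... at twice the input rate."""
    out: List[int] = []
    for x, y in zip(a, b):
        out.extend((x, y))
    return out


def tree_serialize(lanes: Sequence[Sequence[int]]) -> List[int]:
    """Serialize 2**k parallel lanes through k cascaded stages of 2:1 muxes."""
    if len(lanes) == 1:
        return list(lanes[0])
    if len(lanes) % 2:
        raise ValueError("number of lanes must be a power of two")
    even = tree_serialize(lanes[0::2])  # lanes 0, 2, 4, ...
    odd = tree_serialize(lanes[1::2])   # lanes 1, 3, 5, ...
    return mux2(even, odd)              # final (fastest) 2:1 stage


# Two 16-bit words on 16 parallel lanes (lane i carries bit i of each word).
words = [[1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1],
         [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0]]
lanes = [[w[i] for w in words] for i in range(16)]
serial = tree_serialize(lanes)
print(serial == words[0] + words[1])  # True
```

For 16 lanes the recursion is four levels deep, matching the four 2:1 stages, and the tree uses 8 + 4 + 2 + 1 = 15 mux cells in total.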

If data arrives at arbitrary times, the mux output may exhibit glitches or pulse-width distortion. Consequently, the two inputs of the mux must not transition simultaneously and should be aligned with the clock. To address this, flip-flops that retime the data with the clock are placed before the mux. Fig. 2.11 illustrates the resulting configuration: five latches are placed in front of the mux so that each data input passes through one flip-flop for stable transitions.



Fig. 2.11 Conventional 2:1 serializer structure.

Fig. 2.12 depicts the operation of these flip-flops. When the clock is high, the first four latches function as flip-flops, while L3 holds the sampled data  $D_1$  until the clock's falling edge. Meanwhile, L2 and L5 sense the outputs of L1 and L4, respectively. During this period, the output  $D_{OUT}$  of the 2:1 Mux is the sampled  $D_2$  from L5. When the clock is low, L5 holds the sampled data  $D_2$  until the next rising edge, and the output  $D_{OUT}$  of the Mux becomes the sampled  $D_1$  from L3.

Dividers are placed at each stage of the serializer and can be implemented simply as a single CMOS flip-flop in a negative feedback loop (Fig. 2.13). Inverting the output of the flip-flop and feeding it back to its input forces the output to toggle between 0 and 1 on every active clock edge, which halves the frequency. A 16:1 serializer requires a clock divided by 8, which can be obtained by cascading three divide-by-2 stages.



Fig. 2.12 Timing diagram of 2:1 serializer.



Fig. 2.13 Flip-flop-based divider structure.
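The divider behavior can be checked with a simple sampled-waveform model. In this sketch (names hypothetical), each divide-by-2 stage is an ideal D flip-flop whose inverted output is fed back to D, so it toggles on every rising edge of its input; cascading three stages yields the divide-by-8 clock.

```python
from typing import List


def divide_by_2(clock: List[int]) -> List[int]:
    """Ideal DFF with Q-bar fed back to D: Q toggles on each rising edge of `clock`."""
    q, prev, out = 0, 0, []
    for level in clock:
        if prev == 0 and level == 1:  # rising edge captures D = not Q
            q ^= 1
        out.append(q)
        prev = level
    return out


def rising_edges(signal: List[int]) -> int:
    """Count 0 -> 1 transitions in a sampled waveform."""
    return sum(1 for a, b in zip(signal, signal[1:]) if a == 0 and b == 1)


clk = [0, 1] * 32   # 32 input clock cycles, two samples per cycle
div2 = divide_by_2(clk)
div4 = divide_by_2(div2)
div8 = divide_by_2(div4)
print([rising_edges(s) for s in (clk, div2, div4, div8)])  # [32, 16, 8, 4]
```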

### **2.5 Driver**

As the required bandwidth of serial link systems increases, various interface circuits have been proposed for the design of transmitter drivers. Among them, the most commonly used topologies for transmitter drivers are current-mode drivers and voltage-mode drivers [15]. Fig. 2.14 illustrates their basic forms. Voltage-mode drivers have an output swing determined by the supply voltage, while current-mode drivers have an output swing determined by the drain current of the MOSFET.



Fig. 2.14 (a) Current mode driver and (b) Voltage mode driver.

In voltage-mode drivers, the MOSFET has a low impedance, so the MOSFET's on-resistance (plus a series resistor, if present) sets the output impedance. In current-mode drivers, the MOSFET has a high impedance, and the resistor connected in parallel sets the output impedance. To prevent reflections that impair signal integrity, the output impedance of the driver should match the characteristic impedance $Z_0$ of the transmission line, which is typically $50\Omega$.

Current-mode drivers are typically implemented as current-mode logic (CML) drivers (Fig. 2.15) [16]. The MOSFET operates in the saturation region and acts as a current source, and a parallel resistor matches the output impedance to $Z_0$. The side whose MOSFET is on has a voltage drop of $I_S Z_0$, while the side whose MOSFET is off has no drop, resulting in a voltage swing of $I_S Z_0$.



Fig. 2.15 Current-mode Logic(CML) driver.

Voltage-mode drivers typically use source-series termination (SST) drivers (Fig. 2.16) [15]. A resistor is connected in series in the basic voltage-mode driver.

The MOSFET operates in the linear region, acting as a resistor. The output impedance of the driver is equal to  $R_{ON} + R_S = Z_0$ , where  $R_{ON}$  represents the MOSFET's on-resistance and  $R_S$  is the series resistor.
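The SST matching condition and the effect of mismatch can be quantified with the source reflection coefficient $\Gamma = (Z_{out} - Z_0)/(Z_{out} + Z_0)$. In the sketch below, the 10-Ω nominal on-resistance and the ±20% PVT drift are hypothetical values chosen for illustration.

```python
def series_resistor(z0: float, r_on: float) -> float:
    """R_S that satisfies R_ON + R_S = Z0 for an SST driver."""
    if r_on >= z0:
        raise ValueError("R_ON must be smaller than Z0")
    return z0 - r_on


def reflection_coefficient(z_source: float, z0: float) -> float:
    """Source-side reflection coefficient (Z_s - Z0) / (Z_s + Z0)."""
    return (z_source - z0) / (z_source + z0)


Z0 = 50.0
R_ON_NOM = 10.0                       # hypothetical nominal on-resistance
R_S = series_resistor(Z0, R_ON_NOM)   # 40 ohm
for drift in (-0.2, 0.0, 0.2):        # hypothetical +/-20 % PVT drift of R_ON
    z_out = R_ON_NOM * (1 + drift) + R_S
    gamma = reflection_coefficient(z_out, Z0)
    print(f"R_ON drift {drift:+.0%}: Z_out = {z_out:.1f} ohm, gamma = {gamma:+.3f}")
```

Because the series resistor dominates the output impedance, a large relative drift of $R_{ON}$ produces only a small reflection, which is one reason the SST topology is attractive.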



Fig. 2.16 Source-Series Termination(SST) driver.

## **Chapter 3**

# Design of Transmitter for low power memory interface

### **3.1 Overall architecture**

The overall block diagram of the proposed low-power memory interface is shown in Fig. 3.1. The memory interface consists of two transmitters, one for LPDDR and one for HBM, and the HBM interface side also includes a receiver. The transmitter is composed of a 16-bit PRBS pattern generator, a 16:2 serializer, a 2:1 serializer, a pre-driver, and an N-over-N driver with a 2-tap feed-forward equalizer (FFE).

Two external differential 6.4-GHz clocks are forwarded to the chip: the two transmitters share one, and the receiver uses the other. The serializer and the clock divider are digitally controllable, allowing fine adjustment of the data and clock timing. The bootstrapped circuits, described in detail in Section 3.3, are used in the pre-driver in front of the driver and in the 2:1 serializer, both of which operate at high speed. The output impedance of the N-over-N driver is digitally controlled, enabling flexible impedance matching. The HBM transmitter, with a 20-Ω output impedance, is connected to a 6.6-mm-long metal channel with ground shielding that emulates a silicon interposer. The 12.8-Gb/s data is transmitted along the metal channel to the RX for testing, where two half-rate samplers built from StrongARM latches verify the data. The LPDDR transmitter is 50-Ω terminated and is measured through an external 28.5-mm-long FR-4 trace. The supply voltages of the pre-driver and the N-over-N driver are 0.75 V and 0.25 V, respectively, for the HBM interface, and 0.9 V and 0.5 V, respectively, for the LPDDR interface.
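The clock and data rates quoted above and the HBM energy figure in the abstract are related by simple arithmetic, sketched below. It assumes that the final 2:1 serializer uses both edges of the 6.4-GHz clock (half-rate operation) and that the 40.8-fJ/b/mm figure of merit is normalized to the 6.6-mm channel length; the resulting power is an implied value, not a separately reported measurement.

```python
CLOCK_GHZ = 6.4                  # forwarded clock frequency
DATA_RATE_GBPS = 2 * CLOCK_GHZ   # half-rate: one bit per clock edge (assumed)
LANES = 16                       # 16-bit parallel PRBS input
LANE_RATE_MBPS = DATA_RATE_GBPS * 1e3 / LANES


def implied_power_mw(fom_fj_per_bit_mm: float, length_mm: float,
                     rate_gbps: float) -> float:
    """Power implied by an energy-per-bit-per-mm figure of merit."""
    energy_j_per_bit = fom_fj_per_bit_mm * length_mm * 1e-15
    return energy_j_per_bit * rate_gbps * 1e9 * 1e3


print(f"data rate: {DATA_RATE_GBPS:.1f} Gb/s, lane rate: {LANE_RATE_MBPS:.0f} Mb/s")
print(f"HBM TX: {40.8 * 6.6:.1f} fJ/b, "
      f"about {implied_power_mw(40.8, 6.6, DATA_RATE_GBPS):.2f} mW")
```

The 800-Mb/s lane rate agrees with the parallel PRBS rate quoted in Section 3.2.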





### **3.2 Serializer**

The parallel PRBS digital output signal of 16 bits is converted into a 1-bit 12.8Gb/s signal by a 16:2 serializer and a 1T1C bootstrap 2:1 serializer. (Fig. 3.2) The 16:2 serializer consists of stacked 2:1 serializers, and the 2:1 serializer is a latch-less 2:1 MUX structure [17] where the latch is removed from the conventional three-latch 2:1 mux cell to reduce power consumption.



Fig. 3.2 Proposed 16:1 Serializer.

The 2:1 mux is implemented by connecting two tri-state inverters. The final stage of 2:1 serializer is capable of serializing a 2-bit 6.4Gb/s signal into a 1-bit 12.8Gb/s signal at a supply voltage of 0.75V by applying the 1T1C bootstrap technique, which will be described in section 3.3.

As shown in Fig. 3.3, a conventional 2:1 serializer consists of three latches and one 2:1 mux. L1 and L2 hold the inputs while they propagate through the stages to prevent glitches, and L3 prevents an input from changing while the clock samples it. However, if the timing of the two data inputs to the serializer is well controlled, L1 and L2 can be removed without problems [18]. This is the case when the two data inputs transition slightly after the clock edge and settle before the next clock edge, a condition that is easily met because of the clock delay introduced by the divider. As in the conventional structure, L3 prevents both inputs of the 2:1 mux from transitioning simultaneously. Reducing the number of clocked CMOS latches improves power efficiency.



Fig. 3.3 (a) 3-latches 2:1 Serializer and (b) 1-latch 2:1 serializer.

As shown in Fig. 3.4, each divider is composed of a CMOS-based flip-flop and an inverter, where the output of the FF is inverted and fed back to the input of the FF. Digital control of the inputs of the first DIV2 (clk and clkb) and of the DIV8 output aligns the timing of the divided clocks and the data, ensuring stable operation of the latch-less serializer.



Fig. 3.4 CMOS-based flip-flop divider.

Fig. 3.5 illustrates the output waveform of the serializer and the individual outputs of the dividers. The 16-bit parallel 800-Mb/s PRBS-7 signals are successfully serialized into a 1-bit 12.8-Gb/s signal (Fig. 3.5(a)), and the serialized signal matches the ideal PRBS-7 pattern (Fig. 3.5(b)).



Fig. 3.5 (a) 16:1 Serializer transient simulation and (b) eye diagram – PRBS 7 pattern.
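For reference, a PRBS-7 sequence like the one used in Fig. 3.5 can be generated with a 7-bit linear-feedback shift register. The sketch below assumes the common polynomial x^7 + x^6 + 1 and an all-ones seed; the on-chip 16-bit generator may use a different structure or seed.

```python
from typing import List


def prbs7(n_bits: int, seed: int = 0x7F) -> List[int]:
    """Fibonacci LFSR for the PRBS-7 polynomial x^7 + x^6 + 1."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    bits: List[int] = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1  # taps at stages 7 and 6
        bits.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7F
    return bits


seq = prbs7(2 * 127)
print(seq[:127] == seq[127:])  # the pattern repeats every 2**7 - 1 = 127 bits
print(sum(seq[:127]))          # 64 ones and 63 zeros per period
```

Because every non-zero 7-bit pattern appears exactly once per period, PRBS-7 exercises all run lengths up to seven identical bits, which is why it is a common stimulus for eye-diagram simulations.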

### 3.3 1T1C Bootstrap

In the conventional bootstrap topology, the pass gate that performs precharging is driven by the output of the inverter. As a result, the inverter drives a heavy load, which makes this topology unsuitable for high-speed applications. Furthermore, because the precharging pass gate remains on while the bootstrapped VDD and VSS are precharged and turns off during driving, the bootstrap efficiency is degraded.

Fig. 3.6 shows a pre-driver bootstrap topology [5] that bootstraps VDD and VSS simultaneously. The bootstrapped pre-driver output is used as the input of the N-over-N driver. As in the previous topology, the load at the output node is small, but the large parasitic resistance and capacitance introduced by its many pass gates make it unsuitable for low-supply-voltage, high-speed applications.



Fig. 3.6 Prior work of bootstrapped pre-driver.

Furthermore, although VSS bootstrapping is effective in boosting the PMOS input, it is not effective for the NMOS input; as a result, the efficiency per unit of additional power consumption is reduced.

Fig. 3.7(a) and Fig. 3.7(b) show the schematics and operation of the 1T1C bootstrap serializer and pre-driver, respectively. During the precharge phase, the bootstrap capacitors $C_H$ and $C_L$ store a voltage of VDD. When $V_{in}$ transitions, node H is boosted to VDDH or node L is boosted to VSSL, depending on the polarity of $V_{in}$. Compared with prior-art bootstrap topologies, the simple 1T1C implementation reduces the output load and increases the boosting efficiency at high data rates.



Fig. 3.7 Proposed 1T1C bootstrap serializer(a) and pre-driver(b).

Fig. 3.8 shows the equivalent circuit of the boosted-voltage driving mechanism. Ideally, the output voltage ($V_{OUT}$) should be boosted from VDD to 2VDD. However, because of charge sharing with the parasitic node capacitance, the actual output voltage is

$$V_{OUT} = \frac{2C_H + C_D}{C_H + C_D + C_{OUT}} VDD$$
(3.1)



Fig. 3.8 Equivalent circuit for evaluating 1T1C bootstrap pre-driver.
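Eq. (3.1) reduces to the ideal factor of 2 when $C_D$ and $C_{OUT}$ vanish, and the 1.36 boosting factor computed from the design parameters maps to the quoted output level by simple multiplication. The sketch below checks both; the capacitance ratios in the sweep are hypothetical and are not the design parameters of this work.

```python
def bootstrap_vout(c_h: float, c_d: float, c_out: float, vdd: float) -> float:
    """Eq. (3.1): boosted output including parasitic C_D and load C_OUT."""
    return (2 * c_h + c_d) / (c_h + c_d + c_out) * vdd


VDD = 0.75
print(bootstrap_vout(1.0, 0.0, 0.0, VDD) / VDD)  # 2.0 (ideal boosting factor)

C_D_RATIO = 0.1                 # hypothetical C_D / C_H
for c_out in (0.2, 0.4, 0.6):   # hypothetical C_OUT / C_H
    factor = bootstrap_vout(1.0, C_D_RATIO, c_out, VDD) / VDD
    print(f"C_OUT/C_H = {c_out}: factor {factor:.2f}, V_OUT {factor * VDD:.3f} V")

print(f"factor 1.36 at VDD = {VDD} V -> V_OUT = {1.36 * VDD:.2f} V")
```

The sweep shows that the load capacitance $C_{OUT}$ is the main term pulling the factor below 2, so keeping the driven gate small matters as much as sizing $C_H$.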

With the design parameters applied, the boosting factor is computed to be 1.36, resulting in an output voltage of 1.02 V at VDD = 0.75 V. The transfer function from the inverter output ($V_{IN}$) to $V_{OUT}$ is expressed as

$$\frac{V_{OUT}}{V_{IN}}(s) = \frac{C_H}{\alpha + (\beta R_{ON,INV} + \zeta)s + \beta \gamma s^2}$$
(3.2)  
$$C_D = C_{D2} + C_{S1}$$
$$C_G = C_{INV} + C_{G1}$$
$$\alpha = C_H + C_G + C_D$$
$$\beta = C_H C_D + C_H C_G + C_D C_G$$

$$\gamma = R_{ON,INV}R_{ON,2}C_{OUT}$$
$$\zeta = R_{ON,INV}(C_H + C_G) + R_{ON,2}(C_D + C_H)$$

From Eq. (3.2), the dominant pole $p_d$ of the transfer function from the output of $INV_P$ to $V_{OUT}$ is expressed as

$$p_{d} = \frac{-\beta R_{ON,INV} - \zeta + \sqrt{\left(\beta R_{ON,INV} + \zeta\right)^{2} - 4\alpha\beta\gamma}}{2\beta\gamma}$$
(3.3)

With the design parameters applied, the dominant pole is located at $|p_d| \approx 251.2$ Grad/s, so the circuit has an essentially flat response within the desired frequency range.
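Eq. (3.3) is the '+' root of the quadratic denominator of Eq. (3.2); when the roots are real and the coefficients positive, it is the root closest to the origin, hence the dominant pole. The sketch below checks this with normalized, hypothetical coefficients (the actual α, β, γ, and ζ values are not given here) and converts the reported pole magnitude of 251.2 Grad/s to hertz for comparison with the 6.4-GHz Nyquist frequency of 12.8-Gb/s NRZ data.

```python
import cmath
import math


def quadratic_poles(a: float, b: float, c: float):
    """Roots of a*s**2 + b*s + c = 0, returned as (Eq. 3.3 '+' root, '-' root).

    For Eq. (3.2): a = beta*gamma, b = beta*R_ON,INV + zeta, c = alpha.
    """
    disc = cmath.sqrt(b * b - 4 * a * c)
    return (-b + disc) / (2 * a), (-b - disc) / (2 * a)


# Normalized, hypothetical coefficients: s**2 + 3 s + 2 = (s + 1)(s + 2)
p_plus, p_minus = quadratic_poles(1.0, 3.0, 2.0)
print(p_plus.real, p_minus.real)  # the '+' root has the smaller magnitude (dominant)

# Reported dominant-pole magnitude versus the Nyquist frequency of 12.8-Gb/s NRZ
f_pole_ghz = 251.2e9 / (2 * math.pi) / 1e9
nyquist_ghz = 12.8 / 2
print(f"pole at {f_pole_ghz:.1f} GHz, {f_pole_ghz / nyquist_ghz:.1f}x the Nyquist frequency")
```

A pole near 40 GHz sits more than six times above the Nyquist frequency, which is consistent with the flat in-band response claimed above.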

Fig. 3.9 shows post-layout simulation results indicating that the 1T1C bootstrap increases the output swing by 33% at 12.8 Gb/s. As a result, when the bootstrap is used while VDD is lowered from 1 V to 0.75 V, 13% power savings are achieved while maintaining the same eye margin. With the same VDD, the bootstrap improves the eye width and height of the serializer output by 30% and 104%, respectively, compared with the transmitter without bootstrap, as shown in Fig. 3.10. The pre-driver output also shows 6% and 24% improvements in eye width and height, respectively.



Fig. 3.9. Post-layout simulation results: transient response.

| Metric | Block | W/O Bootstrap (VDD = 1 V) | W/ Bootstrap (VDD = 0.75 V) |
|---|---|---|---|
| Power consumption (mW) | 16:1 SER | 1.71 | 1.56 |
| | Buff + Pre-Drv | 1.74 | 1.46 |
| | Drv | 0.21 | 0.17 |
| | Total | 3.66 | 3.19 |
| Eye width (ps) | | 71.9 | 72.1 |
| Eye height (mV) | | 227 | 222 |

Fig. 3.10. Post-layout simulation results: power comparison.
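The 13% saving follows from the totals in Fig. 3.10; the short script below recomputes the totals and the per-block savings directly from the table entries.

```python
# Post-layout power (mW) from Fig. 3.10:
# (without bootstrap at VDD = 1 V, with bootstrap at VDD = 0.75 V)
power_mw = {
    "16:1 SER": (1.71, 1.56),
    "Buff + Pre-Drv": (1.74, 1.46),
    "Drv": (0.21, 0.17),
}

total_without = sum(p[0] for p in power_mw.values())
total_with = sum(p[1] for p in power_mw.values())
saving = 1 - total_with / total_without

for block, (p_without, p_with) in power_mw.items():
    print(f"{block:15s} {p_without:.2f} -> {p_with:.2f} mW "
          f"({1 - p_with / p_without:.1%} saved)")
print(f"{'Total':15s} {total_without:.2f} -> {total_with:.2f} mW ({saving:.1%} saved)")
```

The largest absolute reduction comes from the buffer and pre-driver, the blocks that benefit most directly from the lower VDD enabled by bootstrapping.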

### **3.4 Driver with 2-tap FFE**

A voltage-mode driver produces an output signal with a low output impedance. This topology is widely used in memory interfaces because it is more power efficient than a current-mode driver. To reduce the nonlinearity caused by changes in the input voltage, a source-series termination (SST) driver with a series resistor is commonly used (Fig. 3.11). The output impedance of the SST driver is the sum of the series resistance and the transistor's on-resistance. Despite these advantages, the transistor must be large to achieve a low on-resistance, so the pre-driver must also be enlarged to drive it, resulting in a high load capacitance.



Fig. 3.11. Source-series termination (SST) driver [19].

Because the output impedance also varies with PVT, impedance correction is necessary. There are two main methods for impedance correction [19]. The first method controls the impedance digitally by connecting a set of digitally controlled FETs in series at the top and bottom of the actual line-driver stage. Because tuning is performed through these series-stacked FET banks, a fine tuning resolution is possible. However, the additional FETs limit the voltage headroom. The second method divides the driver into slices and adjusts the number of active slices. When the driver is divided into N slices, each slice has a resistance of $N \cdot R$, so that all N slices in parallel present R. This method keeps the driver simple and small, and because unused slices are tri-stated through the data path, there is no voltage-headroom penalty. However, to reach a low impedance while covering variation and deviation, many slices are needed, which can place a significant load on the pre-driver. The two methods are usually combined, with the first used for fine tuning and the second for coarse tuning.
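The slice arithmetic above can be made concrete with a small sketch. It assumes identical slices whose resistance is the series resistor plus the transistor on-resistance (as in the SST driver), and picks the coarse slice count closest to a target impedance; any residual error would be left to the series-FET (fine) tuning. The 20-Ω target, 12-slice count, and 216-Ω corner value are hypothetical example numbers, not values extracted from this design.

```python
def sst_slice_resistance(r_series_ohm, r_on_ohm):
    """Resistance of one SST slice: series resistor plus transistor on-resistance."""
    return r_series_ohm + r_on_ohm


def nominal_slice_resistance(n_slices, target_ohm):
    """Per-slice resistance N*R so that N slices in parallel give R."""
    return n_slices * target_ohm


def driver_impedance(r_slice_ohm, active_slices):
    """Output impedance with `active_slices` identical slices in parallel."""
    if active_slices < 1:
        raise ValueError("at least one slice must be enabled")
    return r_slice_ohm / active_slices


def coarse_slice_count(r_slice_ohm, target_ohm, max_slices):
    """Coarse tuning: number of enabled slices whose impedance is closest to target."""
    return min(range(1, max_slices + 1),
               key=lambda k: abs(driver_impedance(r_slice_ohm, k) - target_ohm))


# Hypothetical example: 12 slices designed for 20 ohm -> 240 ohm per slice.
r_nom = nominal_slice_resistance(12, 20)
# At an assumed fast corner each slice drops to 216 ohm; enabling all 12 would
# give 18 ohm, so coarse tuning disables one slice.
k = coarse_slice_count(216, 20, 12)
print(k, driver_impedance(216, k))  # 11 slices, about 19.6 ohm
```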



Fig. 3.12. Impedance tuning: (a) series-stacked FET bank method and (b) slice-dividing method [19].

An n-over-n driver replaces the pull-up PMOS of a p-over-n driver with an NMOS (Fig. 3.13). Because it can use a lower driver supply voltage than a p-over-n driver for the same output impedance, it consumes less power. However, keeping the pull-up NMOS in the triode region requires a higher overdrive voltage than the PMOS. To satisfy the relaxed definition of the deep triode region (DTR), $V_{DS} < \frac{1}{2}(V_{GS} - V_{th,n})$, a differential n-over-n driver requires $V_G > \frac{5}{4}V_S + V_{th,n}$, whereas a single-ended n-over-n driver requires $V_G > \frac{3}{2}V_S + V_{th,n}$, where $V_S$ is the driver supply voltage. Meeting these requirements calls for a high pre-driver supply voltage, which can increase the power consumption of the driver's front end.
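The two gate-voltage bounds follow from substituting the output-high level into the DTR condition. Below is a minimal sketch, assuming $V_S$ is the driver supply (VDDQ), the pull-up drain sits at $V_S$, and a matched termination sets the output-high level to $V_S/2$ (single-ended, ground-terminated) or $3V_S/4$ (differential). The threshold voltage in the example is a hypothetical value, not a figure from this design.

```python
def min_gate_voltage(v_supply, v_th, single_ended=True):
    """Smallest pull-up NMOS gate voltage meeting V_DS < (V_GS - V_th) / 2.

    Illustrative assumptions (not taken from the design):
      * the pull-up drain is tied to the driver supply V_S (v_supply),
      * the output is matched, so the output-high level is V_S/2 for a
        ground-terminated single-ended driver and 3*V_S/4 for a
        differential driver.
    """
    v_out_high = v_supply / 2 if single_ended else 3 * v_supply / 4
    v_ds = v_supply - v_out_high  # drain at V_S, source at the output node
    # V_DS < (V_G - V_out_high - V_th) / 2  =>  V_G > 2*V_DS + V_out_high + V_th
    return 2 * v_ds + v_out_high + v_th


# Hypothetical V_th = 0.4 V with VDDQ = 0.25 V:
print(min_gate_voltage(0.25, 0.4))         # single-ended: 1.5 * 0.25 + 0.4
print(min_gate_voltage(0.25, 0.4, False))  # differential: 1.25 * 0.25 + 0.4
```

Under this assumed threshold, the single-ended bound at VDDQ = 0.25 V is about 0.775 V, which already exceeds a 0.75-V VDD; this illustrates why a gate drive above the pre-driver supply can help.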



Fig. 3.13. N-over-N driver.

Fig. 3.14 shows the structure of the implemented pre-driver and driver with a 2-tap feed-forward equalizer (FFE). The pre-driver, with the 1T1C bootstrap applied to both the main tap and the post tap, is connected to the N-over-N driver. The N-over-N driver consists of 12 slices for the main tap and 4 slices for the post tap, of which a total of 12 slices are activated. The digitally controlled series FETs described earlier are used to correct the impedance of the N-over-N driver. The 1T1C bootstrap pre-driver not only keeps the N-over-N driver MOSFETs in the deep triode region but also provides a high $V_{GS}$ that enables a low output impedance, reducing the size of the driver.
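To illustrate how splitting the active slices between the main and post taps shapes the output, the sketch below models a generic sliced 2-tap FFE. It assumes every active slice contributes equally and that post-tap slices are driven with the inverted, one-UI-delayed data; this is a common textbook arrangement and not necessarily how the slices in Fig. 3.14 are assigned.

```python
def ffe_2tap(bits, n_active, n_post):
    """Normalized output of a sliced 2-tap (main + post) FFE driver.

    Assumes each active slice contributes equally, main-tap slices drive
    d[n] and post-tap slices drive -d[n-1], with d in {-1, +1}.
    """
    if not 0 <= n_post < n_active:
        raise ValueError("need 0 <= n_post < n_active")
    if not bits:
        return []
    n_main = n_active - n_post
    symbols = [1 if b else -1 for b in bits]
    out = []
    prev = symbols[0]  # assume no transition before the first symbol
    for d in symbols:
        out.append((n_main * d - n_post * prev) / n_active)
        prev = d
    return out


# 3 of 12 active slices on the post tap: transitions reach full swing,
# repeated bits are de-emphasized to (12 - 2*3) / 12 = 0.5.
print(ffe_2tap([1, 1, 0, 0], n_active=12, n_post=3))
```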



Fig. 3.14. Proposed Pre-driver and N-over-N driver with 2-tap FFE.

Fig. 3.15 depicts the 12.8-Gb/s post-layout simulation results obtained by connecting the designed on-chip metal channel to the output stage and turning on the post FFE taps in sequence. A maximum voltage swing of 120mV was obtained when three post taps were turned on. Fig. shows the post-layout simulation result with a 50-Ω ground termination at the output and an external channel model with -9-dB loss. When three post FFE taps were turned on, a maximum voltage swing of 63mV was obtained.





Fig. 3.15. Post-layout simulation of N-over-N driver with 2-tap FFE.

#### **3.5 Sampler**

In the prototype chip for the HBM interface, the data is transmitted through the 6.6mm metal channel and is detected as either 0 or 1 by two half-rate samplers. As illustrated in Fig. 3.16, each sampler consists of a conventional PMOS StrongARM latch cascaded with an RS latch [20]. The StrongARM latch is widely used in comparators because it 1) consumes zero static power and 2) directly produces a rail-to-rail output [21].
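Half-rate sampling means each sampler only needs to resolve every other bit. A minimal behavioral sketch, assuming the even sampler fires on one edge of the 6.4-GHz half-rate clock and the odd sampler on the opposite edge; the function and constant names are illustrative.

```python
DATA_RATE_GBPS = 12.8
HALF_RATE_CLOCK_GHZ = DATA_RATE_GBPS / 2  # each sampler runs at 6.4 GS/s


def half_rate_sample(bits):
    """Split a full-rate bit sequence between two half-rate samplers.

    The 'even' sampler is assumed to fire on one clock edge and the 'odd'
    sampler on the opposite edge, so each captures every other bit.
    """
    return bits[0::2], bits[1::2]


def reassemble(even, odd):
    """Interleave the two half-rate outputs back into full-rate order."""
    out = []
    for i in range(max(len(even), len(odd))):
        if i < len(even):
            out.append(even[i])
        if i < len(odd):
            out.append(odd[i])
    return out


data = [1, 0, 1, 1, 0, 0, 1]
even, odd = half_rate_sample(data)
assert reassemble(even, odd) == data
```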



Fig. 3.16. Conventional PMOS strong-arm latch.

The StrongARM latch operates in four phases. In the first phase, when CLK is high, the NMOS reset transistors discharge OUT, OUTB, P, and Q to VSS. In the second phase, when CLK goes low, the tail PMOS turns on, activating the differential pair and generating a differential current proportional to $V_{IN}-V_{REF}$. In the third phase, when the voltages at P and Q reach $V_{th}$, the cross-coupled PMOS pair is activated and drain current flows into OUTB and OUT. In the last phase, $V_{OUT}$ and $V_{OUTB}$ continue to rise until they reach $V_{th}$, activating the cross-coupled NMOS pair. At this point, the cross-coupled NMOS pair provides positive feedback, which drives one output to VDD while the other returns to VSS, producing a rail-to-rail output. Fig. 3.17 shows the voltage waveforms at each node of the designed StrongARM latch.

The comparator structure generates kickback noise, which disturbs the input voltage as the parasitic capacitance of the differential pair is charged. To reduce kickback noise, complementary capacitance is added to the drains of the input transistors, as depicted in Fig. 3.16 [22]-[24]. The parasitic capacitance is then charged by the added complementary capacitance instead of by the input, so the kickback noise is canceled.



Fig. 3.17. Transient response of designed strong-arm latch.

## Chapter 4

## **Measurement Results**

#### 4.1 Die Photomicrograph

The proposed transmitters with the 1T1C bootstrap technique are fabricated in a 40-nm CMOS process. The chip photomicrograph is shown in Fig. 4.1; the total TX area is 0.00508mm<sup>2</sup>. TX<sub>1</sub> is a $50\Omega$-terminated transmitter for the LPDDR interface and is wire-bonded to a PCB to verify the transmitter performance. TX<sub>2</sub> and RX<sub>2</sub> are the on-chip channel transmitter and receiver for the HBM interface, respectively. Fig. 4.2 illustrates the cross-sectional view of the on-chip channel, a 6.6mm-long metal line that emulates a wire in a silicon interposer. To minimize the impact of noise, ground shielding is used for isolation and a ground plane is placed beneath the channel. The measured insertion loss at the Nyquist frequency of 6.4GHz is -8.41dB.
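Because the link carries 12.8-Gb/s NRZ data, the Nyquist frequency is half the bit rate, 6.4 GHz. Converting the measured loss to a voltage ratio shows how much of the Nyquist-frequency amplitude survives the channel; the helper names below are illustrative.

```python
def nyquist_ghz(data_rate_gbps):
    """Nyquist frequency of an NRZ stream: half the bit rate."""
    return data_rate_gbps / 2


def loss_db_to_ratio(loss_db):
    """Voltage ratio corresponding to an insertion loss given in dB (negative)."""
    return 10 ** (loss_db / 20)


f_nyq = nyquist_ghz(12.8)            # 6.4 GHz
remaining = loss_db_to_ratio(-8.41)  # fraction of amplitude left at f_nyq
print(f"{f_nyq} GHz: {remaining:.2f} of the input amplitude remains")
```

With -8.41 dB, roughly 38% of the Nyquist-frequency amplitude reaches the receiver.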



| Block | Sub-block         | Size        |
|-------|-------------------|-------------|
| TX    | 16:2 SER          | 32um × 28um |
| TX    | 2:1 SER & Retimer | 27um × 43um |
| TX    | Pre-Driver        | 19um × 27um |
| TX    | Driver & FFE      | 17um × 16um |
| RX    |                   | 16um × 18um |

Fig. 4.1. Chip photomicrograph.



Fig. 4.2. Cross section view and loss of the on-chip metal channel.

### 4.2 Measurement Setup

The test setup for the designed prototype chip is shown in Fig. 4.3. An external 6.4GHz differential clock is generated by a BERT (Keysight J-BERT N4903B) and supplied to both the TX and TRX DUTs. In the TRX chip, TX2 and RX2 each receive a separate 6.4GHz differential clock. The oscilloscope (Tektronix MSO 73304DX) also receives two 6.4GHz sync clocks, one from the BERT and one from the DUT. The data pattern, the serializer timing, the driver impedance calibration, and the FFE taps are controlled via I2C communication with the PC. The settings and measurements of the oscilloscope and power supply are automated via VISA communication with the PC. This makes it possible to evaluate the eye automatically by performing a 2-D sweep of the receiver reference voltage and the transmitter data timing. The output of TX1 is connected to the oscilloscope through a 28.5um FR-4 trace, enabling direct visualization of the transmitter eye. The output of TX2 is transmitted to RX2 through the 6.6mm metal channel, and the output of RX2 is routed through a 75.6mm FR-4 trace to the BERT for BER analysis.
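The automated 2-D sweep can be post-processed along the following lines. This is a hypothetical sketch, not the measurement script used here: it assumes the sweep yields a pass/fail grid indexed by reference-voltage step and data-timing step, and it reports eye width and height as the longest passing run times the step size.

```python
def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    best = run = 0
    for ok in flags:
        run = run + 1 if ok else 0
        best = max(best, run)
    return best


def eye_opening(grid, v_step_mv, t_step_ui):
    """Estimate eye width (UI) and height (mV) from a 2-D pass/fail sweep.

    grid[i][j] is True when reference-voltage step i and data-timing
    step j pass (for example, zero errors over the test pattern).
    """
    width_ui = max(longest_run(row) for row in grid) * t_step_ui
    height_mv = max(longest_run(col) for col in zip(*grid)) * v_step_mv
    return width_ui, height_mv


# Toy sweep: rows are Vref steps, columns are timing steps (hypothetical data).
grid = [
    [False, False, False, False, False],
    [False, True,  True,  True,  False],
    [False, True,  True,  True,  False],
    [False, False, True,  False, False],
]
width, height = eye_opening(grid, v_step_mv=10, t_step_ui=0.05)
```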



Fig. 4.3. Measurement Setup.

### **4.3 Measurement Results**

Fig. 4.4 depicts the measured eye diagram of TX<sub>1</sub>, which has a height of 154mV and a width of 0.69UI. Fig. 4.5 shows the eye diagram and bathtub curves of TX<sub>2</sub> with FFE turned on and off. With FFE turned on, the eye opening that satisfies BER < 10<sup>-12</sup> is 0.29UI at 12.8Gb/s.
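Eye widths quoted in UI can be converted to time using 1 UI = 1/(data rate), which is 78.125 ps at 12.8 Gb/s. A small sketch with illustrative function names:

```python
def ui_in_ps(data_rate_gbps):
    """Duration of one unit interval in ps (1 / bit rate)."""
    return 1e3 / data_rate_gbps


def ui_to_ps(width_ui, data_rate_gbps):
    """Convert an eye width in UI to picoseconds."""
    return width_ui * ui_in_ps(data_rate_gbps)


for label, width in [("TX1 eye", 0.69), ("TX2 eye at BER < 1e-12", 0.29)]:
    print(f"{label}: {width} UI = {ui_to_ps(width, 12.8):.1f} ps")
```

This gives about 53.9 ps for the 0.69-UI eye and about 22.7 ps for the 0.29-UI eye.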

Fig. 4.6 and Fig. 4.7 show shmoo plots of TX<sub>2</sub> over the supply voltages and the data rate. The VDDQ-vs-VDD shmoo plot shows that the maximum operating speed of 12.8Gb/s is achieved at 0.74-V VDD and 0.215-V VDDQ and above, so the supply voltages have sufficient margin.

Fig. 4.4. Measured eye diagrams of the TX<sub>1</sub> (VDD=0.9V / VDDQ=0.5V @12.8Gb/s).



Fig. 4.5. BER measurement result of the TRX<sub>2</sub>: eye diagram (top) and BER bathtub curve (bottom).





### **4.4 Performance Summary**

The power breakdown of the transmitters is shown in Fig. 4.7. Based on post-layout simulation results, the power dissipation of the 16:2 serializer, the 2:1 serializer and pre-driver, and the driver for the HBM interface is 0.542mW, 2.488mW, and 0.414mW, respectively. The total power dissipation of TX1 and TX2 is 6.01mW and 3.44mW, respectively. The performance of the transmitters is summarized and compared with state-of-the-art transmitters using TX equalizers for recent memory interfaces and on-chip serial links in Tables 4.1 and 4.2. Using the proposed 1T1C bootstrap serializer and pre-driver, high energy efficiency is achieved while satisfying BER < 10<sup>-12</sup>. The measured energy efficiency of TX1 is 0.47-pJ/b (Table 4.1), and the energy efficiency per channel length of TX2 is 40.8-fJ/b/mm (Table 4.2); these are the best among the state-of-the-art memory interfaces and on-chip serial links, respectively.



| Memory interface      | HBM         | LPDDR     |
|-----------------------|-------------|-----------|
| Output impedance      | 20Ω         | 50Ω       |
| 16:2 SER              | 0.542mW     | 0.758mW   |
| 2:1 SER + Pre-Driver  | 2.488mW     | 3.623mW   |
| Drv                   | 0.414mW     | 1.548mW   |
| Digital + Tx Clk path | 33.0mW      | 33.0mW    |
| Rx + Rx Clk path      | 19.8mW      | N/A       |
| Tx Total              | 3.444mW     | 5.929mW   |
| Energy efficiency     | 40.8fJ/b/mm | 0.463pJ/b |

\*Based on post-layout simulation results

Fig. 4.7. Power breakdown of the proposed transmitters.
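The energy-efficiency entries follow from dividing power by data rate (mW divided by Gb/s gives pJ/b) and, for the on-chip link, by channel length. A sketch that reproduces the table entries from the listed TX powers; the function names are illustrative.

```python
def energy_per_bit_pj(power_mw, data_rate_gbps):
    """Energy per bit in pJ/b: mW / (Gb/s) = pJ/b."""
    return power_mw / data_rate_gbps


def energy_per_bit_mm_fj(power_mw, data_rate_gbps, length_mm):
    """Energy per bit per mm of channel in fJ/b/mm."""
    return energy_per_bit_pj(power_mw, data_rate_gbps) / length_mm * 1e3


hbm = energy_per_bit_mm_fj(3.444, 12.8, 6.6)  # TX total, 6.6-mm on-chip channel
lpddr = energy_per_bit_pj(5.929, 12.8)        # TX total, LPDDR
print(f"HBM: {hbm:.1f} fJ/b/mm, LPDDR: {lpddr:.3f} pJ/b")
# prints: HBM: 40.8 fJ/b/mm, LPDDR: 0.463 pJ/b
```

Using the 6.01-mW total quoted for TX1 in the text instead gives about 0.47 pJ/b.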

Table 4.1. Comparison with memory interfaces.

|                          | Park,<br>ISSCC'23 [1]  | Jeong,<br>SOVC'20 [2]  | Moon,<br>ISSCC'22 | Kang,<br>JSSC'22 | Chiu,<br>ISSCC'20 | This work |
|--------------------------|------------------------|------------------------|-------------------|------------------|-------------------|-----------|
| Technology               | 40nm CMOS              | 65nm CMOS              | 28nm LPP          | 28nm CMOS        | 65nm CMOS         | 40nm CMOS |
| Data rate [Gb/s]         | 32                     | 28                     | 20                | 21               | 32                | 12.8      |
| Signalling               | NRZ                    | PAM-4                  | NRZ               | Duobinary        | PAM-4             | NRZ       |
| Supply voltage VDD [V]   | 1                      | 1                      | 1.1               | 1                | 1.2               | 0.9       |
| Supply voltage VDDQ [V]  | 0.6                    | 0.8                    | 1.1               | 0.8              | 1.2               | 0.5       |
| TX equalization          | 2-tap pre-<br>emphasis | 2-tap pre-<br>emphasis | 4-tap AFFE        | 3-tap FFE        | 3-tap FFE         | 2-tap FFE |
| Driver type              | PN-over-NP             | N-over-N               | Inverter          | SST              | SST               | N-over-N  |
| Energy efficiency [pJ/b] | 0.51                   | 0.58                   | 1.18              | 0.9              | 0.97              | 0.47      |

Table 4.2. Comparison with on-chip serial links.

|                             | Park,<br>SOVC'22 [3] | Lee,<br>ISSCC'22 | Wei,<br>JSSC'18 | Dehlaghi,<br>JSSC'16 | Ko,<br>ISSCC'20 | This work |
|-----------------------------|----------------------|------------------|-----------------|----------------------|-----------------|-----------|
| Technology                  | 40nm CMOS            | 65nm CMOS        | 65nm CMOS       | 28nm CMOS            | 65nm CMOS       | 40nm CMOS |
| Data rate [Gb/s]            | 12                   | 12               | 10              | 18                   | 4               | 12.8      |
| Signalling                  | PAM-4                | NRZ              | NRZ             | NRZ                  | NRZ             | NRZ       |
| Supply voltage VDD [V]      | 1.1                  | 1                | 1               | 1                    | 1.2             | 0.75      |
| Supply voltage VDDQ [V]     | 0.6                  | 1                | 0.45            | 1                    | 1.2             | 0.25      |
| TX equalization             | 2-tap FFE            | 2-tap FFE        | 3-tap FFE       | Passive EQ           | 2-tap FFE       | 2-tap FFE |
| Driver type                 | N-over-N             | Capacitive       | Capacitive      | SST                  | Inverter        | N-over-N  |
| Channel length [mm]         | 6                    | 5.6              | 5               | 10                   | 6               | 6.6       |
| Energy efficiency [fJ/b/mm] | 49.1                 | 47.1             | 97.6            | 42.9                 | 254             | 40.8      |

# Chapter 5

# Conclusion

In this thesis, 12.8-Gb/s single-ended transmitters with a 1T1C bootstrap technique for low-power memory interfaces are proposed in a 40-nm CMOS process. The proposed bootstrap serializer and pre-driver give the N-over-N driver sufficient eye margin to achieve BER < 10<sup>-12</sup> and a high data rate at low VDD and VDDQ, while reducing power consumption by 13% compared to conventional transmitters. Both the transmitter for the HBM interface driving a 6.6mm on-chip channel and the transmitter for the LPDDR interface driving a 3mm FR-4 trace achieve the highest energy efficiency compared to the state-of-the-art transmitters for memory interfaces, with energy efficiencies of 40.8-fJ/b/mm and 0.47-pJ/b, respectively.

## **Bibliography**

- [1] J. -H. Park et al., "A 32Gb/s/pin 0.51 pJ/b Single-Ended Resistor-less Impedance-Matched Transmitter with a T-Coil-Based Edge-Boosting Equalizer in 40nm CMOS," 2023 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 2023, pp. 410-412.
- [2] Y. -U. Jeong et al., "A 28-Gb/s/pin PAM-4 Single-Ended Transmitter with High-Linearity and Impedance-Matched Driver and 3-Point ZQ Calibration for Memory Interfaces," SOVC, 2020, pp. 1-2.
- [3] J. -H. Park et al., "A 68.7-fJ/b/mm 375-GB/s/mm Single-Ended PAM-4 Interface with Per-Pin Training Sequence for the Next-Generation HBM Controller," 2022 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), Honolulu, HI, USA, 2022, pp. 150-151.
- [4] K. Chae et al., "A 4nm 1.15TB/s HBM3 Interface with Resistor-Tuned Offset-Calibration and In-Situ Margin-Detection," 2023 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 2023, pp. 1-3.
- [5] J. -Y. Kim et al., "A Low Power TSV I/O with Data Rate up to 10 Gb/s for Next Generation HBM," 2022 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), Honolulu, HI, USA, 2022, pp. 152-153.

- [6] K. -S. Ha et al., "23.1 A 7.5Gb/s/pin LPDDR5 SDRAM With WCK Clocking and Non-Target ODT for High Speed and With DVFS, Internal Data Copy, and Deep-Sleep Mode for Low Power," 2019 IEEE International Solid-State Circuits Conference - (ISSCC), San Francisco, CA, USA, 2019, pp. 378-380.
- [7] Y. Ho and C. Su, "A 0.1–0.3 V 40–123 fJ/bit/ch On-Chip Data Link With ISI-Suppressed Bootstrapped Repeaters," in IEEE Journal of Solid-State Circuits, vol. 47, no. 5, pp. 1242-1251, May 2012.
- [8] J. L. Zerbe et al., "Equalization and clock recovery for a 2.5-10-Gb/s 2-PAM/4-PAM backplane transceiver cell," in IEEE Journal of Solid-State Circuits, vol. 38, no. 12, pp. 2121-2130, Dec. 2003, doi: 10.1109/JSSC.2003.818572.
- [9] B. Jacob et al., "Memory Systems: Cache, DRAM, Disk," Morgan Kaufmann, 1st ed., Sep. 2007.
- [10] AMD.com, Infographic: Introducing HBM. [Online] [Accessed on 2015] https://www.amd.com/en/technologies/hbm.
- [11] M. Mansuri et al., "A scalable 0.128-to-1Tb/s 0.8-to-2.6pJ/b 64-lane parallel I/O in 32nm CMOS," 2013 IEEE International Solid-State Circuits Conference Digest of Technical Papers, San Francisco, CA, USA, 2013, pp. 402-403.

- [12] T. M. Hollis et al., "Recent Evolution in the DRAM Interface: Mile-Markers Along Memory Lane," in IEEE Solid-State Circuits Magazine, vol. 11, no. 2, pp. 14-30, Spring 2019.
- [13] JEDEC, Advanced Mobile Memory Technology. [Online] [Accessed May 2013], https://www.jedec.org/sites/default/files/JY\_Choi\_Mobile\_Forum\_May\_2013.pdf.
- [14] D. Gong et al., "Development of A 16:1 serializer for data transmission at 5Gbps," in Topical workshop on electronics in particle physics, pp. 21-25, Sep. 2019.
- [15] H. Ju, "A study on Low-Power, High-Speed PAM-4 Transmitter with Current-Driven Feedback Driver," Ph.D. dissertation, Seoul National University.
- [16] W. Bae, "Supply-Scalable High-Speed I/O Interfaces," Electronics, vol. 9, 1315, 2020.
- [17] Y. Chang, A. Manian, L. Kong and B. Razavi, "An 80-Gb/s 44-mW Wireline PAM4 Transmitter," in IEEE Journal of Solid-State Circuits, vol. 53, no. 8, pp. 2214-2226, Aug. 2018.
- [18] J. Kim et al., "3.5 A 16-to-40Gb/s quarter-rate NRZ/PAM4 dual-mode transmitter in 14nm CMOS," 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers, San Francisco, CA, USA, 2015, pp. 1-3.

- [19] M. Kossel et al., "A T-Coil-Enhanced 8.5 Gb/s High-Swing SST Transmitter in 65 nm Bulk CMOS With < -16 dB Return Loss Over 10 GHz Bandwidth," in IEEE Journal of Solid-State Circuits, vol. 43, no. 12, pp. 2905-2920, Dec. 2008.
- [20] T. Kobayashi, K. Nogami, T. Shirotori, Y. Fujimoto and O. Watanabe, "A current-mode latch sense amplifier and a static power saving input buffer for low-power architecture," 1992 Symposium on VLSI Circuits Digest of Technical Papers, Seattle, WA, USA, 1992, pp. 28-29.
- [21] B. Razavi, "The StrongARM Latch [A Circuit for All Seasons]," in IEEE Solid-State Circuits Magazine, vol. 7, no. 2, pp. 12-17, Spring 2015.
- [22] P. M. Figueiredo and J. C. Vital, "Kickback noise reduction techniques for CMOS latched comparators," in IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 53, no. 7, pp. 541-545, July 2006.
- [23] T. Cho and P. Gray, "A 10 b 20 Msample/s 35 mW pipeline A/D converter," IEEE J. Solid-State Circuits, vol. 30, no. 3, pp. 166-172, Mar. 1995.
- [24] W. Song et al., "A 10-b 20-Msample/s low-power CMOS ADC," IEEE J. Solid-State Circuits, vol. 30, no. 5, pp. 514-521, May 1995.

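# Abstract

This thesis proposes a single-ended transmitter with a 1T1C bootstrap technique for low-power memory interfaces. The proposed bootstrap serializer and pre-driver give the N-over-N driver sufficient eye margin to achieve BER<10<sup>-12</sup> and allow a high data rate at low VDD and VDDQ. The bootstrap transmitter saves 13% of the power consumption compared to conventional transmitters. The transmitter for the high-bandwidth memory interface driving a 6.6mm on-chip channel achieves an energy efficiency of 40.8-fJ/b/mm at 0.75-V VDD and 0.25-V VDDQ. The transmitter for the low-power DDR interface driving a 3mm FR-4 trace achieves an energy efficiency of 0.47-pJ/b at 0.9-V VDD and 0.5-V VDDQ. Both transmitters achieve the highest energy efficiency compared to recently published transmitters for memory interfaces and on-chip serial links.

Keywords : high-bandwidth memory, low-power DDR, memory interface, single-ended transmitter, bootstrap serializer and pre-driver

Student Number : 2021-28627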