



PH.D. DISSERTATION

# A DESIGN OF QUARTER-RATE TRANSMITTER USING SINGLE-ENDED SIGNALING FOR MEMORY INTERFACES

A Design of a Quarter-Rate Transmitter Using Single-Ended Signaling for Memory Interfaces (translated from the Korean title)

by

Joo-Hyung Chae

February 2019

Department of Electrical Engineering and Computer Science, College of Engineering, Seoul National University

### ABSTRACT


This dissertation presents a quarter-rate transmitter using single-ended signaling for memory interfaces. To meet the increasing demand for higher memory bandwidth, we make the following contributions, each aimed at raising the data-rate per pin.

First, we adopt a quarter-rate architecture because, compared to full-rate and half-rate designs, it has a more relaxed timing margin on its critical path, lower simultaneous switching noise, lower power consumption, and a lower clock frequency.

Second, a quadrature clock corrector uses relaxation oscillators to detect duty-cycle and quadrature phase errors by transforming them into pairs of frequencies, which are then digitized and compared. It achieves good detection accuracy and can detect a wide range of duty-cycle and quadrature phase errors. The prototype is implemented in a 55nm CMOS process with a supply voltage of 1.2V and occupies an area of 0.003mm<sup>2</sup>. The experimental results show that the operation range is from 1GHz to 3GHz, the power efficiency is 0.79mW/GHz, the maximum duty-cycle error is 0.8% at 3GHz, and the maximum quadrature phase error is 1.1° at 3GHz.

Third, we present a 4:1 overlapped time-division multiplexing driver combined with a serializer timing adjuster. The final 4:1 serialization required in a quarter-rate transmitter is performed by this overlapped time-division multiplexing driver, which contains four unit drivers. Two of the four unit drivers simultaneously output two identical 1UI full-rate DQ signals, and these signals are merged during the final serialization. This reduces the output capacitance. Correct timing of this serialization process is maintained by adaptive alignment of the four phases of the clock signal. A prototype, incorporated in a 12.8Gb/s quarter-rate transmitter, has been implemented in a 55nm CMOS technology. The single-ended output swing of this transmitter is 400 to 600mV<sub>pp</sub>, and its energy efficiency is 1.8pJ/bit.

Finally, we combine the merits of 1-tap pull-up amplitude equalization and 4-tap pull-down phase equalization to compensate for channel losses without significantly raising the power consumption. This scheme has been incorporated in a quarter-rate transmitter for memory interfaces. Fabricated in a 65nm CMOS process, a prototype performs single-ended signaling at a data-rate of 16Gb/s over a channel with a loss of -14.7dB. Despite using two equalization schemes, its energy efficiency is 1.04pJ/bit.

**Keywords**: Memory interface, quarter-rate transmitter, duty-cycle corrector, quadrature clock corrector, overlapped time-division multiplexing driver, output driver, equalization

Student Number: 2012-20870

# **CONTENTS**

| Section   | Title |
|-----------|-------|
|           | ABSTRACT |
|           | CONTENTS |
|           | LIST OF FIGURES |
|           | LIST OF TABLES |
| CHAPTER 1 | INTRODUCTION |
| 1.1       | MOTIVATION |
| 1.2       | DESIGN CONSIDERATIONS |
| 1.2.1     | QUARTER-RATE ARCHITECTURE |
| 1.2.2     | DUTY-CYCLE AND QUADRATURE PHASE CORRECTION FOR QUADRATURE CLOCK SIGNALS |
| 1.2.3     | 4:1 SERIALIZATION |
| 1.2.4     | POWER-EFFICIENT EQUALIZATION FOR IMPROVED SIGNAL INTEGRITY |
| 1.2.5     | SUMMARY |
| 1.3       | THESIS ORGANIZATION |
| CHAPTER 2 | QUARTER-RATE TRANSMITTER FOR MEMORY INTERFACES |
| 2.1       | OVERALL ARCHITECTURE |
| CHAPTER 3 | QUADRATURE CLOCK CORRECTOR WITH A DUTY-CYCLE AND QUADRATURE PHASE DETECTOR BASED ON A RELAXATION OSCILLATOR |
| 3.1       | ARCHITECTURE |
| 3.2       | DUTY-CYCLE AND QUADRATURE PHASE DETECTORS BASED ON A RELAXATION OSCILLATOR |
| 3.3       | EFFECTIVENESS OF DUTY-CYCLE AND QUADRATURE PHASE DETECTOR |
| 3.4       | MISMATCH EFFECTS OF DUTY-CYCLE AND QUADRATURE PHASE DETECTOR |
| 3.5       | DUTY-CYCLE AND PHASE ADJUSTER |
| CHAPTER 4 | 4:1 OVERLAPPED TIME-DIVISION MULTIPLEXING DRIVER COMBINED WITH A SERIALIZER TIMING ADJUSTER |
| 4.1       | PROPOSED DRIVER TOPOLOGY |
| 4.1.1     | 2:1 SERIALIZER IN 4:1 OVERLAPPED MULTIPLEXING DRIVER |
| 4.2       | COMPARISON OF OUTPUT CAPACITANCE |
| 4.3       | SERIALIZER TIMING ADJUSTER |
| 4.4       | 32:4 SERIALIZER |
| CHAPTER 5 | MIXED PULL-UP AMPLITUDE AND PULL-DOWN PHASE EQUALIZATION |
| 5.1       | MIXED EQUALIZATION FOR MEMORY INTERFACE |
| 5.2       | AMPLITUDE EQUALIZATION AND PHASE EQUALIZATION CONTROL BLOCK |
| 5.3       | AMPLITUDE EQUALIZATION PULSE GENERATOR |
| 5.4       | DATA AND CLOCK DELAY LINE |
| CHAPTER 6 | EXPERIMENTAL RESULTS |
| 6.1       | QUADRATURE CLOCK CORRECTOR WITH A DUTY-CYCLE AND QUADRATURE PHASE DETECTOR BASED ON A RELAXATION OSCILLATOR |
| 6.2       | OVERLAPPED TIME-DIVISION MULTIPLEXING DRIVER COMBINED WITH A SERIALIZER TIMING ADJUSTER |
| 6.3       | MIXED PULL-UP AMPLITUDE AND PULL-DOWN PHASE EQUALIZATION |
|           | CONCLUSION |
|           | BIBLIOGRAPHY |

# **LIST OF FIGURES**

| Figure | Caption |
|--------|---------|
| Figure 1.1.1 | Classification of DRAM according to the application. |
| Figure 1.1.2 | Market trend in (a) virtual reality, (b) artificial intelligence, (c) internet of things, and (d) autonomous vehicles. |
| Figure 1.1.3 | Quarterly DRAM revenues from 2011 to 2017Q1. |
| Figure 1.1.4 | DRAM revenues and market shares (a) from 2017Q3 to 2017Q4 and (b) from 2018Q1 to 2018Q2. |
| Figure 1.1.5 | DRAM data-bandwidth trends. |
| Figure 1.1.6 | Per-pin data-rate according to year for memory interfaces. |
| Figure 1.1.7 | Architecture of the memory interface. |
| Figure 1.1.8 | Power breakdown of the memory interface. |
| Figure 1.2.1.1 | AC noise effect in single-ended signaling. |
| Figure 1.2.1.2 | Frequency relationship between the DQ and clock signal. |
| Figure 1.2.2.1 | Comparison of a half-rate and quarter-rate architecture. |
| Figure 1.2.2.2 | Effect of duty-cycle and quadrature phase errors on data window. |
| Figure 1.2.3.1 | Conventional architectures for a quarter-rate transmitter using a 4:1 serializer, a pulse generator, a pre-driver, and a driver. |
| Figure 1.2.3.2 | Conventional architectures for a quarter-rate transmitter using a pre-driver, a pulse generator, and a 4:1 multiplexing driver. |
| Figure 1.2.3.3 | Timing diagram of the architecture of Figure 1.2.3.1 and Figure 1.2.3.2. |
| Figure 1.2.4.1 | Memory interface configuration. |
| Figure 2.1.1 | Architecture of our quarter-rate transmitter, together with the clock |
| Figure 3.1.1 | Architecture of the quadrature clock corrector. |
| Figure 3.1.2 | Operation sequence of our quadrature clock corrector. |
| Figure 3.2.1 | Block diagram of a relaxation oscillator-based duty-cycle detector. |
| Figure 3.2.2 | Timing diagram of a relaxation oscillator-based duty-cycle detector of ICK/IBCK. |
| Figure 3.2.3 | Conversion of a quadrature phase to a duty-cycle using XOR and XNOR logic (a) when ICK and QCK are in perfect quadrature and (b) when the quadrature phase error occurs between ICK and QCK. |
| Figure 3.2.4 | Block diagram of a quadrature phase detector, based on a relaxation oscillator. |
| Figure 3.2.5 | Operation of a relaxation oscillator in the quadrature phase detector (a) on left, which performs XOR function, and (b) on right, which performs XNOR function. |
| Figure 3.3.1 | Simulated output frequency of a relaxation oscillator in (a) the duty-cycle detector and (b) the quadrature phase detector. |
| Figure 3.4.1 | Monte Carlo simulation results (500 runs) of the frequency difference between RCK1 and RCK2, to show mismatch effect from different random variables in our duty-cycle detector when the duty-cycle of input clock is 50% at 3GHz. |
| Figure 3.4.2 | Monte Carlo simulation results (500 runs) of the frequency difference between RCK1 and RCK2, to show mismatch effect from different random variables in our quadrature phase detector when the quadrature phase of input clock is 90° at 3GHz. |
| Figure 3.5.1 | Block diagram of (a) the duty-cycle adjuster and (b) the phase adjuster. |
| Figure 4.1.1 | Quarter-rate transmitter architecture using our 4:1 OVTDM driver. |
| Figure 4.1.2 | Operation and timing of 4:1 OVTDM driver. |
| Figure 4.1.1.1 | Tri-state inverter based 2:1 serializer. |
| Figure 4.1.1.2 | Simulated waveforms of outputs of a 2:1 serializer, which is based on tri-state inverter, and a driver. |
| Figure 4.1.1.3 | Split-load based 2:1 serializer. |
| Figure 4.1.1.4 | Simulated waveforms of outputs of a 2:1 serializer, which is based on split-load, and a driver. |
| Figure 4.2.1 | Comparison of a conventional 4:1 multiplexing driver and 4:1 OVTDM driver, in terms of size, output resistance, and output capacitance. |
| Figure 4.2.2 | Comparison of a 4:1 MUX POD driver, 4:1 OVTDM POD driver, and 4:1 OVTDM LVSTL driver, in terms of W/L, output capacitance, and output resistance. |
| Figure 4.2.3 | Comparison of the output capacitance through simulated (a) rise time and (b) fall time, of a 4:1 MUX POD driver, 4:1 OVTDM POD driver, and 4:1 OVTDM LVSTL driver. |
| Figure 4.3.1 | Timing diagrams for 4:1 overlapped multiplexing driver when (a) the timing of serialization is correct, and when (b) incorrect timing fails to provide an adequate timing margin for serialization. |
| Figure 4.3.2 | Block diagram of a serializer timing adjuster. |
| Figure 4.3.3 | Timing diagram in the serializer timing adjuster when phase drifting occurs. |
| Figure 4.3.4 | Timing diagram in the serializer timing adjuster. |
| Figure 4.3.5 | Simulated eye diagrams for 4:1 OVTDM driver at 12.8Gb/s. |
| Figure 4.4.1 | Alternative designs for the 2:1 serializers used in |
| Figure 4.4.2 | Alternative designs for the 2:1 serializers used in |
| Figure 4.4.3 | Comparison of power consumption. |
| Figure 5.1.1 | Waveforms of the transmitter output $OUT_{Tx}$, the micro-bump output $OUT_{Bump}$, and the receiver input $IN_{RX}$ of Figure 1.2.4.1 when there is no equalization and when the amplitude and phase equalization and our mixed equalization are adopted. |
| Figure 5.1.2 | Transmission of (a) transition bit and (b) non-transition bit when pull-up amplitude equalization is adopted. |
| Figure 5.1.3 | Comparison of various equalization techniques. |
| Figure 5.2.1 | Block diagrams and generation of the control signals of the (a) pull-up amplitude and (b) pull-down phase equalization, for mixed equalization. |
| Figure 5.3.1 | Block diagram of the amplitude equalization pulse generator. |
| Figure 5.3.2 | The operation example of the pull-up amplitude equalization. |
| Figure 5.4.1 | Block diagram of the data and clock delay line. |
| Figure 5.4.2 | The operation example of the pull-down phase equalization. |
| Figure 6.1.1 | Die micrograph with a magnified layout. |
| Figure 6.1.2 | Measurement setup. |
| Figure 6.1.3 | Measured waveforms of uncorrected output clock signals and corrected output clock signals (a) at 1GHz and (b) at 3GHz. |
| Figure 6.1.4 | Measured results of duty-cycle correction (a) at 1GHz and (b) at 3GHz. |
| Figure 6.1.5 | Measured results of quadrature phase correction (a) at 1GHz and (b) at 3GHz. |
| Figure 6.1.6 | RMS and peak-to-peak jitter of (a) the uncorrected clock signal ICK<sub>CH,IN</sub> and (b) the corrected clock signal ICK<sub>CH,OUT</sub>, at 3GHz. |
| Figure 6.1.7 | Power breakdown of the quadrature clock corrector, running at 3GHz. At this frequency the total power consumption is 2.08mW. |
| Figure 6.2.1 | Die micrograph with the magnified layout of our 4:1 OVTDM driver incorporated in the quarter-rate transmitter, measurement setup, and the channel loss of a 2.6" FR4 trace, an SMA connector, and a 35" SMA cable. |
| Figure 6.2.2 | Measured results of 4-phase clock correction at 12.8Gb/s. |
| Figure 6.2.3 | Measured eye diagrams of the single-ended data output signal at 12.8Gb/s. |
| Figure 6.2.4 | Measured eye width and height at different data-rates, with and without the serializer timing adjuster. |
| Figure 6.2.5 | Power breakdown of the data path in our quarter-rate transmitter at 12.8Gb/s. |
| Figure 6.3.1 | Die micrograph. |
| Figure 6.3.2 | Measurement setup. |
| Figure 6.3.3 | Measured channel loss characteristics. |
| Figure 6.3.4 | Measured output of the transmitter after passing through the channel, using a fixed data pattern at 16Gb/s without equalization. |
| Figure 6.3.5 | Measured output of the transmitter after passing through the channel, using a fixed data pattern at 16Gb/s with mixed equalization. |
| Figure 6.3.6 | Measured eye diagrams using the PRBS7 pattern, without and with duty-cycle and quadrature phase correction at 16Gb/s. |
| Figure 6.3.7 | Measured eye diagrams at 16Gb/s, without and with correction of 4:1 serialization timing by the serializer timing adjuster. |
| Figure 6.3.8 | Measured eye diagrams before and after channel, (a) without equalization, (b) with pull-up amplitude equalization, (c) with pull-down phase equalization, and (d) with mixed equalization. |
| Figure 6.3.9 | Measured eye diagrams with a PRBS7 data pattern at 16Gb/s: without equalization, duty-cycle correction, and quadrature phase correction, when $V_{DD}$=1.0V and $V_{DDQ}$=0.6V, with mixed equalization, duty-cycle correction, and quadrature phase correction, at the same voltages, and at a reduced $V_{DD}$ of 0.9V and $V_{DDQ}$ of 0.3V. |
| Figure 6.3.10 | Power breakdown of our quarter-rate transmitter at 16Gb/s, using a $V_{DD}$ of 0.9V and $V_{DDQ}$ of 0.3V. |
| Figure 6.3.11 | Area breakdown of our quarter-rate transmitter. |

# **LIST OF TABLES**

| Table | Caption |
|-------|---------|
| Table 6.1.1 | Performance Summary and Comparison with Other Recent Quadrature Clock Correctors |
| Table 6.2.1 | Performance Summary and Comparison with Other Quarter-Rate Transmitter Designs |
| Table 6.3.1 | Performance Summary and Comparison with Other State-Of-The-Art Transmitter Designs |

# **CHAPTER 1**

## INTRODUCTION

#### **1.1 MOTIVATION**



Figure 1.1.1. Classification of DRAM according to the application.

Dynamic random-access memory (DRAM) is a type of random-access memory (RAM) that stores each bit of information in a capacitor. It is volatile because it requires a constant supply of power to retain data. Recent DRAM operates its external pin interface in synchronization with an externally supplied clock signal and is therefore called synchronous dynamic random-access memory (SDRAM). DRAM can be categorized into three types depending on the application: computing DRAM, graphics DRAM, and mobile DRAM [1.1.1], as shown in Figure 1.1.1.

Figure 1.1.2. Market trend in (a) virtual reality, (b) artificial intelligence, (c) internet of things, and (d) autonomous vehicles.

In addition to these existing applications, DRAM has recently been adopted in applications such as virtual reality (VR), augmented reality (AR), artificial intelligence (AI), internet of things (IoT), and autonomous vehicles. As the demand for these applications increases and the market grows, as shown in Figure 1.1.2, the demand for DRAM is also increasing. Figure 1.1.3 shows quarterly DRAM revenues worldwide from 2011 to 2017Q1, by manufacturer, and Figure 1.1.4 shows DRAM revenues and market shares from 2017Q3 to 2018Q2. DRAM demand has continued to increase and, since 2016 in particular, has risen sharply with the growth of the application markets mentioned above. As the markets for virtual reality, augmented reality, and artificial intelligence grow, these applications are gradually supporting a variety of high-performance functions. To cope with this trend, it is necessary to increase the bandwidth of DRAM [1.1.2].

Figure 1.1.3. Quarterly DRAM revenues from 2011 to 2017Q1.

| Ranking | Company      | Revenue 4Q17 | Revenue 3Q17 | Revenue QoQ | Market share 4Q17 | Market share 3Q17 |
|---------|--------------|--------------|--------------|-------------|-------------------|-------------------|
| 1       | Samsung      | 10,066       | 8,790        | 14.5%       | 46.0%             | 45.8%             |
| 2       | SK Hynix     | 6,291        | 5,514        | 14.1%       | 28.7%             | 28.7%             |
| 3       | Micron Group | 4,562        | 4,023        | 13.4%       | 20.8%             | 21.0%             |
| 4       | Nanya        | 558          | 439          | 26.9%       | 2.5%              | 2.3%              |
| 5       | Winbond      | 173          | 177          | -2.2%       | 0.8%              | 0.9%              |
| 6       | Powerchip    | 104          | 103          | 0.6%        | 0.5%              | 0.5%              |
|         | Others       | 144          | 135          | 7.0%        | 0.7%              | 0.7%              |
|         | Total        | 21,898       | 19,181       | 14.2%       | 100.0%            | 100.0%            |

(a)

| Ranking | Company      | Revenue 2Q18 | Revenue 1Q18 | Revenue QoQ | Market share 2Q18 | Market share 1Q18 |
|---------|--------------|--------------|--------------|-------------|-------------------|-------------------|
| 1       | Samsung      | 11,207       | 10,360       | 8.2%        | 43.6%             | 44.9%             |
| 2       | SK Hynix     | 7,685        | 6,432        | 19.5%       | 29.9%             | 27.9%             |
| 3       | Micron Group | 5,541        | 5,213        | 6.3%        | 21.6%             | 22.6%             |
| 4       | Nanya        | 826          | 642          | 28.6%       | 3.2%              | 2.8%              |
| 5       | Winbond      | 190          | 175          | 8.7%        | 0.7%              | 0.8%              |
| 6       | Powerchip    | 97           | 113          | -13.5%      | 0.4%              | 0.5%              |
|         | Others       | 144          | 143          | 1.2%        | 0.6%              | 0.6%              |
|         | Total        | 25,691       | 23,076       | 11.3%       | 100.0%            | 100.0%            |

(b)

Figure 1.1.4. DRAM revenues and market shares (a) from 2017Q3 to 2017Q4 and (b) from 2018Q1 to 2018Q2.



Figure 1.1.5. DRAM data-bandwidth trends.

Figure 1.1.5 shows DRAM data-bandwidth trends. The bandwidth of DRAM is continuously increasing, and a new product called high-bandwidth memory (HBM) has been available since 2014 to meet this trend. The bandwidth of DRAM can be increased by adopting a dual-inline memory module (DIMM) configuration, by raising the number of I/O pins, or by using a higher data-rate per pin. However, a DIMM configuration requires multi-drop signaling, which causes signal attenuation, reflection, and crosstalk on the connector and motherboard channels and thus reduces performance. This approach also requires issues to be addressed at the architecture level because the interface needs to manage timing across many memory modules [1.1.3]. Introducing more I/O pins increases chip cost and routing congestion on the printed circuit board (PCB) [1.1.4]. Thus, increasing the data-rate per pin in memory interfaces has been the industry trend for many years, as shown in Figure 1.1.6, and standards have evolved in a way that focuses on interface signaling and clocking [1.1.5].

Figure 1.1.6. Per-pin data-rate according to year for memory interfaces.
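The trade-off between pin count and per-pin data-rate can be made concrete with a little arithmetic. The following is a minimal sketch, assuming peak bandwidth is simply the number of DQ pins multiplied by the per-pin data-rate; the function name, the 64-pin width, and the listed rates are illustrative assumptions, not values taken from a specific product.

```python
def peak_bandwidth_gbytes_per_s(num_dq_pins: int, rate_gbps_per_pin: float) -> float:
    """Peak interface bandwidth in GB/s: pins x per-pin rate (Gb/s) / 8 bits per byte."""
    return num_dq_pins * rate_gbps_per_pin / 8.0


if __name__ == "__main__":
    # Hypothetical 64-pin data bus; the per-pin rates are illustrative only.
    for rate in (3.2, 6.4, 12.8):
        bw = peak_bandwidth_gbytes_per_s(64, rate)
        print(f"{rate:5.1f} Gb/s/pin x 64 pins -> {bw:6.1f} GB/s")
```

Doubling the per-pin data-rate doubles the peak bandwidth without adding pins, which is why the text treats it as the preferred lever.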



Figure 1.1.7. Architecture of the memory interface.

Figure 1.1.7 shows one example of a memory interface architecture [1.1.6], [1.1.7]. In memory interfaces, single-ended signaling is adopted because of area and cost, and bidirectional transmission is used for the DQ and DQS signals. Using the DQS signal, memory interfaces can use strobe-based source-synchronous signaling. The configuration between the controller and the memory is asymmetric because they differ in architecture, termination, and signaling. Before normal operation, various training operations are performed in the controller.

In memory interfaces using single-ended signaling, it is difficult to increase the data-rate per pin because of increased channel loss, simultaneous switching noise, and crosstalk. In particular, inter-symbol interference is the biggest obstacle, so equalization is required to compensate for it. In memory interfaces, most of the required equalization is performed in the transmitter, while the receiver has a simple structure and performs little or no equalization because of noise, area, and timing issues.

Figure 1.1.8. Power breakdown of the memory interface.

Figure 1.1.8 shows an example of the power breakdown of a memory interface [1.1.8]. Significant power is consumed in the transceiver (TRX) and in the clock distribution, which includes clock-correcting blocks such as the duty-cycle corrector (DCC) and the quadrature error corrector (QEC). Within the transceiver, the power efficiency of the transmitter (TX) is worse than that of the receiver (RX) [1.1.9], [1.1.10] because most equalization is performed in the transmitter and the output driver draws a large current. Therefore, we have focused on the design of a transmitter for memory interfaces.

#### **1.2 DESIGN CONSIDERATIONS**

#### **1.2.1 QUARTER-RATE ARCHITECTURE**



Figure 1.2.1.1. AC noise effect in single-ended signaling.

As described in Chapter 1.1, one way to increase memory bandwidth is to use a higher data-rate per pin: for example, a DDR4 device operates at up to 3.2Gb/s/pin, and DDR5 devices will operate at up to 6.4Gb/s/pin. This increase in data-rate requires more data buffers and higher power consumption, which in turn increases electromagnetic interference and jitter accumulation [1.2.1]. In addition, increased simultaneous switching noise also affects signal integrity, because single-ended signaling is used in most memory interfaces to reduce the pin-count [1.2.2]. In the receiver, a reference voltage generator, which produces the voltage $V_{REF}$ that is the basis of the data decision, is shared by multiple DQ pins, and the supply noise is coupled to $V_{REF}$. However, AC noise due to simultaneous switching in the DQ path cannot be tracked by $V_{REF}$ because most of the AC noise is generated in the transmitter, as shown in Figure 1.2.1.1.



Figure 1.2.1.2. Frequency relationship between the DQ and clock signal in full-rate, half-rate, and quarter-rate designs.

Figure 1.2.1.2 shows the frequency relationship between the DQ and clock (CK) signal in full-rate, half-rate, and quarter-rate designs. Although a quarter-rate design requires four clock phases CK<sub>0</sub>, CK<sub>90</sub>, CK<sub>180</sub>, and CK<sub>270</sub>, the quarter-rate transmitter has a more relaxed timing margin on its critical path, lower simultaneous switching noise, lower power consumption, and a lower clock frequency than full-rate [1.2.3] and half-rate designs [1.2.4]. Therefore, the quarter-rate architecture is suitable for next-generation memory interfaces.
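The frequency relationship between the DQ and clock signals can be sketched numerically. The snippet below assumes the usual definitions: a full-rate clock launches one bit per clock cycle, a half-rate clock two bits (both edges), and a quarter-rate design four bits (one per clock phase). The function and dictionary names are illustrative.

```python
# Bits transmitted per clock cycle in each architecture (assumed definitions).
BITS_PER_CLOCK_CYCLE = {"full-rate": 1, "half-rate": 2, "quarter-rate": 4}


def clock_frequency_ghz(data_rate_gbps: float, architecture: str) -> float:
    """Clock frequency (GHz) needed to sustain a per-pin data-rate (Gb/s)."""
    return data_rate_gbps / BITS_PER_CLOCK_CYCLE[architecture]


if __name__ == "__main__":
    for data_rate in (12.8, 16.0):
        for arch in BITS_PER_CLOCK_CYCLE:
            f_ck = clock_frequency_ghz(data_rate, arch)
            print(f"{data_rate:4.1f} Gb/s, {arch:12s}: CK = {f_ck:5.2f} GHz")
```

At a given data-rate, the quarter-rate clock runs at a quarter of the full-rate clock frequency, which is the source of the relaxed timing and lower clocking power noted above.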

#### **1.2.2 DUTY-CYCLE AND QUADRATURE PHASE CORRECTION FOR QUADRATURE CLOCK SIGNALS**

|           | Half-rate | Quarter-rate |
|-----------|-----------|--------------|
| CK tree   | Simple    | Complex      |
| Concern   | DCC       | DCC, IQ skew |
| Frequency | High      | Low          |

Figure 1.2.2.1. Comparison of a half-rate and quarter-rate architecture.

The use of both the rising and falling edges of a multiphase clock signal and the adoption of a quarter-rate architecture have recently been considered for high-performance memory interfaces [1.2.5], [1.2.6]. Figure 1.2.2.1 compares the half-rate structure and the quarter-rate structure. Compared to the half-rate structure, the quarter-rate structure uses lower clock frequencies, which provides more timing margin, but the use of more clock phases can make the clock tree relatively complicated. In addition, errors in quadrature (IQ) skew, as well as errors in duty-cycle ratio, affect the performance of the overall memory system. Among the above issues, we focus on correcting errors in the duty-cycle and IQ phase skew.

In a memory interface with a quarter-rate architecture, high-speed differential clock signals are divided by two and become quadrature clock signals [1.2.7]. These quadrature clock signals can be subject to duty-cycle and quadrature phase errors as they pass down the long clock distribution tree through many clock buffers, where they may be affected by supply and ground noise, unbalanced PMOS and NMOS strengths, and process, voltage, and temperature (PVT) variations. Distortion of the clock signals degrades the valid data window, as shown in Figure 1.2.2.2, increasing the bit-error-rate (BER). Thus, various types of quadrature clock correctors [1.2.4], [1.2.8]-[1.2.12] have recently been used to perform both duty-cycle and quadrature phase compensation. In these quadrature clock correctors, the precision of the duty-cycle and quadrature phase detector is critical because the correction performance depends directly on the detection accuracy of the duty-cycle and quadrature phase [1.2.9], [1.2.13].

Figure 1.2.2.2. Effect of duty-cycle and quadrature phase errors on data window.
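The effect of clock distortion on the data window can be sketched with a simple timing model. The model below is an assumption made for illustration, not the circuit of the thesis: the four unit intervals (UI) of one clock period are taken to be bounded, in order, by the rising edge of the I clock, the rising edge of the Q clock, the falling edge of the I clock, and the falling edge of the Q clock. The function name and parameters are hypothetical.

```python
# Illustrative timing model (assumption): UI boundaries are set by the
# I-rise, Q-rise, I-fall, and Q-fall edges of the quadrature clocks.

def ui_widths(period_ps, duty_i=0.5, duty_q=0.5, iq_skew_ps=0.0):
    """Return the four UI widths (ps) within one clock period."""
    i_rise = 0.0
    q_rise = period_ps / 4 + iq_skew_ps        # ideal quadrature offset plus skew
    i_fall = duty_i * period_ps                # falling edge moves with duty-cycle
    q_fall = q_rise + duty_q * period_ps
    edges = [i_rise, q_rise, i_fall, q_fall, i_rise + period_ps]
    return [later - earlier for earlier, later in zip(edges, edges[1:])]


ideal = ui_widths(400.0)                               # four equal UIs
with_duty_error = ui_widths(400.0, duty_i=0.52)        # I clock at 52% duty
with_iq_skew = ui_widths(400.0, iq_skew_ps=5.0)        # Q clock late by 5 ps
print(ideal, with_duty_error, with_iq_skew)
```

In this model the UI widths always sum to the clock period, so any duty-cycle or quadrature phase error widens some data windows only by narrowing others, and the narrowest window limits the timing margin.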

Duty-cycle and quadrature phase errors can be detected by analog circuits [1.2.9], in which an integrator is combined with a comparator. However, this type of circuit can be affected by any error in the common-mode or offset voltage of the comparator, the matching between capacitors, the precision of the reference circuit, and any mismatch between the pull-up and pull-down currents in the integrator. In addition, the control range and accuracy of the duty-cycle and quadrature phase correction degrade as the supply voltage decreases in deep-submicron processes. A digital phase detector (PD) can be used to find duty-cycle and quadrature phase errors [1.2.7]. This simple design avoids the problems associated with analog circuits, but it can have a nonlinear gain and a large static phase offset. Another development [1.2.9] provides a narrow phase detection window in the digital phase detector. To solve this problem, a sense-amplifier-based phase detector [1.2.10] has been introduced. However, its intrinsic phase offset needs to be further reduced as the operating frequency increases. A time-to-digital converter (TDC) based detector [1.2.11], [1.2.12] has a finer resolution than previous digital designs, but the clock signal can be distorted within the TDC, which can degrade the accuracy. Therefore, the above issues should be addressed in the design of a quadrature clock corrector for memory interfaces.

#### 1.2.3 4:1 SERIALIZATION



Figure 1.2.3.1. Conventional architectures for a quarter-rate transmitter using a 4:1 serializer, a pulse generator, a pre-driver, and a driver.

In a quarter-rate transmitter, a 4:1 serializer can be used before the pre-driver [1.2.14]. Figure 1.2.3.1 shows one such structure: it has a 4:1 serializer (SER), a pulse generator (Gen), a pre-driver, and a driver (Drv). The use of a 4:1 serializer permits a driver with a simple structure, but it also allows inter-symbol interference to occur when data is output at full-rate, due to the large drain capacitance of the 4:1 serializer and the limited bandwidth of the pre-driver [1.2.15].

This issue can be addressed by introducing a 4:1 multiplexing (MUX) driver [1.2.16], [1.2.17]. One of these structures is shown in Figure 1.2.3.2: it has a pre-driver, a pulse generator, and a 4:1 multiplexing driver. This avoids full-rate data signals until the final output pad. The cost of this approach is a quadrupling of the output capacitance of the driver,



Figure 1.2.3.2. Conventional architectures for a quarter-rate transmitter using a pre-driver, a pulse generator, and a 4:1 multiplexing driver.

because each unit driver has to provide the correct channel impedance termination, which increases dynamic power consumption and limits the output bandwidth. Figure 1.2.3.3 shows the timing diagram of both of these structures. In both structures, the four pulse signals P0, P90, P180, and P270 are generated by the pulse generator. The 4:1 serialization can be performed using these pulse signals at the 4:1 serializer in Figure 1.2.3.1 and at the 4:1 multiplexing driver in Figure 1.2.3.2. However, the pulse width of the four pulse signals is narrower than that of the clock signals, which limits the achievable bandwidth. A level-shifting pre-driver [1.2.17] has been proposed to reduce the I/O capacitance of the 4:1 multiplexing driver, but it needs a large capacitor in the pre-driver, and it is vulnerable to process, voltage, and temperature (PVT) variations. The 4:1 multiplexing driver can also be implemented by configuring 2:1 multiplexing drivers in series to reduce the I/O capacitance, but this has the drawback of high latency [1.2.18]. All these approaches implementing the 4:1 multiplexing driver improve the timing margin of the



Figure 1.2.3.3. Timing diagram of the architecture of Figure 1.2.3.1 and Figure 1.2.3.2.

final serialization. However, as data-rates continue to increase, additional timing margin must be secured. Recent approaches use delay-matching buffers [1.2.19] or a clock phase calibration loop [1.2.15] to correct the timing of serialization, but they significantly increase power consumption. Therefore, the above issues should be alleviated in the quarter-rate transmitter design for memory interfaces.

### **1.2.4 POWER-EFFICIENT EQUALIZATION FOR IMPROVED SIGNAL INTEGRITY**



Figure 1.2.4.1. Memory interface configuration.

As the data-rate per pin increases, however, the channel loss also increases significantly, deteriorating signal integrity; thus, equalization (EQ) should be used to compensate for inter-symbol interference (ISI) due to the channel loss [1.2.20], [1.2.21]. Since a continuous-time linear equalizer amplifies crosstalk and high-frequency noise and a multi-tap decision feedback equalizer is limited by feedback time, area, and power [1.2.23], [1.2.23], in the memory interfaces shown in Figure 1.2.4.1 the required equalization is performed mainly by transmitter-side amplitude equalization [1.2.25]-[1.2.27], [1.2.16] with a minimal receiver-side equalizer [1.2.28]. Since transmitter-side amplitude equalization is known to increase the power consumption of the pre-driver and cause excessive current consumption of the output driver, the signaling and switching power is increased, leading to a large amount of simultaneous switching noise [1.2.29]. This adversely affects the single-ended signaling adopted in memory interfaces, making it difficult to lower the supply voltage of the output driver.

To alleviate the disadvantages of transmitter-side amplitude equalization, chord signaling [1.2.30], [1.2.31] and encoding-based equalization techniques such as pulse-width modulation (PWM) [1.2.32], [1.2.33] have been presented. However, chord signaling has difficulty compensating for high channel loss and is a generalization of the differential signaling structure, which makes it difficult to apply to memory interfaces using single-ended signaling. PWM encoding requires a high bandwidth in the data path to generate the narrow pulses it uses, resulting in poor energy efficiency and difficulty in raising the data-rate. Recently, various phase equalization schemes [1.2.28], [1.2.34]-[1.2.36] have been presented to mitigate the issues of the aforementioned equalization schemes. Phase equalization can reduce pattern-based data-dependent jitter (DDJ) in an energy-efficient manner, improving the signal integrity of the output signal and lowering power consumption and simultaneous switching noise. However, it is less effective than amplitude equalization in compensating for channel losses [1.2.36]. Therefore, the above issues should be overcome in the equalization design for memory interfaces.

#### 1.2.5 SUMMARY

In order to alleviate all the above issues, we have considered the following points: first, a quarter-rate architecture was adopted to improve the performance of the transmitter by relaxing the timing margin on its timing-critical path and lowering simultaneous switching noise, power consumption, and clock frequency; second, quadrature clock signals were corrected by a relaxation oscillator based quadrature clock corrector to improve signal integrity with low power in the quarter-rate architecture; third, 4:1 serialization was performed by the overlapped multiplexing driver, which reduces the input and output capacitance; finally, a mixed equalization scheme was presented to compensate for channel losses without significantly raising the power consumption. These design considerations have been verified by measurements of prototype chips, and they are also applicable to a variety of memory interfaces by slightly modifying spec-sensitive parts.

#### **1.3 THESIS ORGANIZATION**

This thesis is organized as follows: in Chapter 2, a quarter-rate transmitter for memory interfaces is introduced; in Chapter 3, the thesis presents the quadrature clock corrector with a duty-cycle and quadrature phase detector based on a relaxation oscillator; in Chapter 4, a 4:1 overlapped time-division multiplexing driver combined with a serializer timing adjuster is presented; in Chapter 5, a mixed pull-up amplitude and pull-down phase equalization scheme is introduced; in Chapter 6, experimental results are presented; and in Chapter 7, the thesis is summarized with a discussion of its contributions.

### CHAPTER 2

# QUARTER-RATE TRANSMITTER FOR MEMORY INTERFACES

#### 2.1 OVERALL ARCHITECTURE

We have implemented a quarter-rate transmitter for memory interfaces, assuming that ground termination is used on the receiver side, as in low-power double data-rate (LPDDR) memory and high-bandwidth memory (HBM) interfaces. Figure 2.1.1 shows our design for a quarter-rate transmitter, together with a clock (CK) path and a ZQ path to determine the channel impedance termination at the output driver.

The clock path consists of a clock buffer, an IQ generator (Gen), a duty-cycle corrector, and a quadrature (Quad) phase corrector. In the clock buffer, the differential clock signals CK and CKB are amplified and converted to CMOS level, and their duty-cycle is corrected to 50%. Four quadrature clock signals CK<sub>I</sub>, CK<sub>Q</sub>, CK<sub>IB</sub>, and CK<sub>QB</sub> at half the frequency are generated from incoming differential clock signals CK and CKB in the IQ generator, and



Figure 2.1.1. Architecture of our quarter-rate transmitter, together with the clock and ZQ path.

their duty-cycle and quadrature phase errors are corrected by a duty-cycle and quadrature phase corrector. These quadrature clock signals are delivered by the clock tree to each block in the transmitter.

The 32-bit pseudo-random binary sequence (PRBS) generator produces parallel PRBS data in a range of patterns. These parallel data are serialized by the 32:4 serializer, and the serialized data are then retimed in the data aligner. A serializer (SER) timing adjuster controls each phase of the quadrature clock signals to ensure timing margins between data and clock signals during final 4:1 serialization at the output driver. An amplitude equalization (AEQ) control block generates the 4-bit PU signals PU<3:0> required for the 2-tap pull-up amplitude equalization, and this equalization is performed by the amplitude equalization pulse generator. A phase equalization (PEQ) control block generates the 4-bit PD signals PD<3:0>, which support the 4-tap pull-down phase equalization by controlling the timing of the rising and falling edges of the data and clock signals in the data and clock delay line (DL). In these two equalization control blocks, the $Co_{AEQ}$ and $Co_{PEQ}$ signals adjust the equalization strength. In order to reduce the output capacitance and the area of the 4:1 multiplexing output driver, a 4:1 overlapped time-division multiplexing (OVTDM) output driver is adopted. This driver uses a power-isolated low-voltage swing terminated logic (LVSTL) to improve energy efficiency and further reduce I/O capacitance. The four parallel data signals from the data aligner pass through the pre-driver to the 4:1 OVTDM driver, which performs 4:1 serialization and transmits the DQ signal at full-rate.

### CHAPTER 3

# QUADRATURE CLOCK CORRECTOR WITH A DUTY-CYCLE AND QUADRATURE PHASE DETECTOR BASED ON A RELAXATION OSCILLATOR

We have addressed the issues described in Chapter 1.2.2, for a quadrature clock corrector which corrects duty-cycle and quadrature phase errors. Our duty-cycle and quadrature phase detector uses relaxation oscillators with a modified input stage to convert the duty-cycle and quadrature phase of the clock signals into frequencies, which are then compared in the digital frequency detector. It achieves good accuracy over a wide range of operating frequencies and can detect a wide range of duty-cycle and quadrature phase errors.

#### **3.1 ARCHITECTURE**



Figure 3.1.1. Architecture of the quadrature clock corrector.

Figure 3.1.1 is a block diagram of our quadrature clock corrector, which consists of two duty-cycle detectors, four duty-cycle adjusters, a quadrature phase detector, four phase adjusters, four cross-coupled latches, and several clock buffers. The duty-cycle and quadrature phase detector has the relaxation oscillator structure. Figure 3.1.2 shows the operation sequence of our quadrature clock corrector. First, the cross-coupled latch corrects the phase error of the differential clock signals ICK<sub>IN</sub>-IBCK<sub>IN</sub> and QCK<sub>IN</sub>-QBCK<sub>IN</sub>. Then, duty-cycle detectors measure the error in the duty-cycle of these signals and send duty-cycle control codes Code<sub>Duty,I</sub>, Code<sub>Duty,IB</sub>, Code<sub>Duty,Q</sub>, and Code<sub>Duty,QB</sub> to duty-cycle adjusters, which restore the duty-cycle of each signal to 50%. The quadrature phase errors of ICK<sub>IN</sub>-




Figure 3.1.2. Operation sequence of our quadrature clock corrector.

QCK<sub>IN</sub> and IBCK<sub>IN</sub>-QBCK<sub>IN</sub> are measured by the quadrature phase detector, and the resulting control codes Code<sub>I</sub> and Code<sub>Q</sub> are passed to phase adjusters, which correct the phase of the quadrature clock signals.

### 3.2 DUTY-CYCLE AND QUADRATURE PHASE DETECTORS BASED ON A RELAXATION OSCILLATOR



Figure 3.2.1. Block diagram of a relaxation oscillator-based duty-cycle detector.

Several types of low-frequency oscillator draw relatively little power. Among them is the relaxation oscillator, which operates by charging and discharging a capacitor between two fixed voltages. It requires only a small number of active transistors and exhibits good noise performance [3.2.2]. A modified relaxation oscillator can be used to convert clock errors to frequencies, and thus it finds application in duty-cycle and quadrature phase detectors. This approach can measure clock errors accurately in the frequency domain, and its economy in area and power consumption is particularly valuable in memory interfaces, which have many parallel data paths.

Figure 3.2.1 is a block diagram of our duty-cycle detector, based on a relaxation oscillator. It consists of a current source, a small resistor and capacitor, a Schmitt trigger, a digital



Figure 3.2.2. Timing diagram of a relaxation oscillator-based duty-cycle detector of ICK/IBCK.

frequency detector, and a binary counter. Our circuit differs from that of previous relaxation oscillators [3.2.2], [3.2.3] in that a Schmitt trigger is used for comparing voltages, and the input stage is changed so that the duty-cycle of differential clock signals can adjust the frequency. To keep the capacitor small while producing a low output frequency, each current source only produces a small current, which keeps the power consumption and area requirement low.

The operation timing diagram of our duty-cycle detector for ICK and IBCK is shown in Figure 3.2.2. The ICK and IBCK signals are differential, so the high pulse-width of the ICK signal is the same as the low pulse-width of the IBCK signal. Thus, if the ICK signal has a duty-cycle of (50-a)%, then the IBCK signal has a duty-cycle of (50+a)%. V<sub>1</sub> is generated by integrating the current I<sub>REF</sub> in the capacitor during the periods in which ICK is low and IBCK is high, and V<sub>2</sub> is generated by integrating I<sub>REF</sub> in the capacitor during the periods in which ICK is high and IBCK is low. Therefore, the frequencies of the clock signals RCK<sub>1</sub> and RCK<sub>2</sub>, generated by applying V<sub>1</sub> and V<sub>2</sub> to the Schmitt trigger, reflect duty-cycles of (50-a)% and (50+a)%. The digital frequency detector receives the RCK<sub>1</sub> and RCK<sub>2</sub> signals from the relaxation oscillator, converts their frequencies to digital values, and then compares their relative magnitudes to determine which signal is higher in frequency. If the frequency of RCK<sub>1</sub> is higher than that of RCK<sub>2</sub>, the UP/DN signal outputs UP, and in the opposite case, DN is output. The binary counter receives this UP/DN signal and increases or decreases the control codes Code<sub>Duty,I</sub> and Code<sub>Duty,IB</sub>, which are then used to adjust the duty-cycle. Once this process is completed, the digital frequency detector outputs the reset signal RST<sub>ROSC</sub>, the relaxation oscillator is initialized, and duty-cycle detection starts again. This periodic operation corrects the duty-cycle error of the clock signals.
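The frequency comparison and counter update described above can be sketched behaviorally. This is a minimal sketch under stated assumptions, not the circuit of the thesis: the frequencies are assumed to be digitized by counting oscillator periods in a fixed window, a `HOLD` outcome is added for equal counts, and the window length, code range, and all function names are hypothetical.

```python
# Behavioral sketch of a digital frequency detector followed by a binary counter.
# Assumptions: period counting in a fixed window, a HOLD state for equal counts,
# and a 6-bit control code. None of these details are taken from the thesis.

NS_PER_S = 10**9


def count_periods(freq_hz: int, window_ns: int) -> int:
    """Number of complete oscillator periods inside the counting window."""
    return (freq_hz * window_ns) // NS_PER_S


def up_dn(f_rck1_hz: int, f_rck2_hz: int, window_ns: int = 10_000) -> str:
    """Return 'UP' if RCK1 is faster than RCK2, 'DN' if slower, else 'HOLD'."""
    n1 = count_periods(f_rck1_hz, window_ns)
    n2 = count_periods(f_rck2_hz, window_ns)
    if n1 > n2:
        return "UP"
    if n1 < n2:
        return "DN"
    return "HOLD"


def update_code(code: int, decision: str, lo: int = 0, hi: int = 63) -> int:
    """Binary counter step: increment on UP, decrement on DN, clamp to [lo, hi]."""
    step = {"UP": 1, "DN": -1, "HOLD": 0}[decision]
    return max(lo, min(hi, code + step))


code = 32
code = update_code(code, up_dn(50_400_000, 50_000_000))  # RCK1 faster: code goes up
print(code)
```

In this sketch a longer counting window resolves a smaller frequency difference, at the cost of a slower update of the control code.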

The period of the output clock $T_{RCK}$ generated by the relaxation oscillator in our duty-cycle detector can be expressed as follows:

$$T_{RCK} = (T_{ICK} \cdot N_{ICK}) + (T_{IBCK} \cdot N_{IBCK}) = 2 \cdot T_{ICK} \cdot N_{ICK},$$
(1)

where  $T_{ICK}$  and  $T_{IBCK}$  are the periods of the ICK and IBCK signals, and  $N_{ICK}$  and  $N_{IBCK}$  are the numbers of clock cycles of the ICK and IBCK signals in one period of  $T_{RCK}$  respectively. Using the relation between current and voltage that applies to a capacitor, we can obtain:

$$\frac{I_{\text{REF}}}{C} \cdot (50 - a)\% \cdot T_{\text{ICK}} \cdot N_{\text{ICK}} = V_{\text{TH}(L \to H)} - V_{\text{TH}(H \to L)},$$
(2)



Figure 3.2.3. Conversion of a quadrature phase to a duty-cycle using XOR and XNOR logic (a) when ICK and QCK are in perfect quadrature and (b) when the quadrature phase error occurs between ICK and QCK.

where  $I_{REF}$  is the current generated by the current source, and  $V_{TH(L \rightarrow H)}$  and  $V_{TH(H \rightarrow L)}$  are the low-to-high and high-to-low threshold voltages of the Schmitt trigger. Thus  $T_{RCK1}$  and  $T_{RCK2}$  can be expressed as follows:

$$T_{RCK1} = \frac{2C \cdot (V_{TH(L \to H)} - V_{TH(H \to L)})}{I_{REF} \cdot (50 - a)\%} + b$$
(3)

and

$$T_{RCK2} = \frac{2C \cdot (V_{TH(L \to H)} - V_{TH(H \to L)})}{I_{REF} \cdot (50 + a)\%} + b,$$
(4)



Figure 3.2.4. Block diagram of a quadrature phase detector, based on a relaxation oscillator.

where b represents nonlinear factors such as the loop latency and the frequencies of the ICK and IBCK signals. It can be seen that $T_{RCK1}$ and $T_{RCK2}$ are inversely proportional to the duty-cycle when b is negligible, so the output frequencies of our relaxation oscillators vary approximately linearly with the duty-cycle.
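Equations (3) and (4) can be evaluated numerically to see how a duty-cycle error separates the two output frequencies. The sketch below is illustrative only: the capacitance, Schmitt trigger hysteresis, and reference current are hypothetical values chosen for readability, not the design values of the prototype, and b is set to zero by default.

```python
# Numerical sketch of Eqs. (3) and (4). All component values are hypothetical.

def t_rck(c_farad, delta_v, i_ref, duty_fraction, b=0.0):
    """Oscillator period: 2*C*(V_TH(L->H) - V_TH(H->L)) / (I_REF * duty) + b."""
    return 2.0 * c_farad * delta_v / (i_ref * duty_fraction) + b


def rck_frequencies(a_percent, c_farad=100e-15, delta_v=0.4, i_ref=10e-6, b=0.0):
    """Return (f_RCK1, f_RCK2) in Hz for duty-cycles of (50-a)% and (50+a)%."""
    duty_1 = (50.0 - a_percent) / 100.0
    duty_2 = (50.0 + a_percent) / 100.0
    f1 = 1.0 / t_rck(c_farad, delta_v, i_ref, duty_1, b)
    f2 = 1.0 / t_rck(c_farad, delta_v, i_ref, duty_2, b)
    return f1, f2


for a in (0.0, 1.0, 2.0):
    f1, f2 = rck_frequencies(a)
    print(f"a = {a:.0f}%: f_RCK1 = {f1 / 1e6:.2f} MHz, f_RCK2 = {f2 / 1e6:.2f} MHz")
```

With b set to zero, each frequency in this sketch is proportional to its duty-cycle, so the frequency difference grows in proportion to a; a nonzero b bends this relationship.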

The quadrature phase between the quadrature clock signals ICK and QCK can be converted to a duty-cycle using XOR and XNOR logic. If these clock signals have the correct quadrature phase $T_{CK}/4$, the clock signals $A_{XOR}$ and $B_{XNOR}$ have a duty-cycle of 50%, as shown in Figure 3.2.3(a). If a quadrature phase error $T_{ERR}$ occurs, the clock signals $A_{XOR}$ and $B_{XNOR}$ produced at the output of the XOR and XNOR logic have non-50% duty-cycles of (50-a)% and (50+a)%, as shown in Figure 3.2.3(b). Therefore, detecting and comparing the duty-cycle errors of these signals provides the information needed to correct the quadrature phase error.
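The conversion of a quadrature phase to a duty-cycle can be checked with a short sampled-waveform sketch. It assumes ideal square waves with 50% duty-cycle on a discrete time grid of 3600 steps per clock period (0.1° per step); the function name and grid size are illustrative choices.

```python
# Sketch: duty-cycle of XOR and XNOR of two ideal 50%-duty square waves
# as a function of the delay between them. The time grid is an assumption.

def xor_xnor_duty(delay_steps: int, steps_per_period: int = 3600):
    """Return (XOR duty, XNOR duty) for ICK and a copy delayed by delay_steps."""
    half = steps_per_period // 2
    xor_high = 0
    for t in range(steps_per_period):
        i_clk = (t % steps_per_period) < half                 # ICK high in first half
        q_clk = ((t - delay_steps) % steps_per_period) < half  # QCK is ICK delayed
        xor_high += i_clk != q_clk                            # XOR is high when they differ
    xor_duty = xor_high / steps_per_period
    return xor_duty, 1.0 - xor_duty


print(xor_xnor_duty(900))  # exact quadrature (90 degrees)
print(xor_xnor_duty(936))  # 3.6 degrees of quadrature phase error
```

For a delay of exactly one quarter period both outputs have a 50% duty-cycle, and a phase error moves the two duty-cycles in opposite directions by the same amount, which is what the detector compares.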



Figure 3.2.5. Operation of a relaxation oscillator in the quadrature phase detector (a) on left, which performs XOR function, and (b) on right, which performs XNOR function.

Figure 3.2.4 shows the block diagram of our quadrature phase detector based on the relaxation oscillator. The six transistors in the relaxation oscillator on the left perform a function similar to XOR logic, and the six on the right perform a function similar to XNOR logic. Figure 3.2.5 shows the operation of a relaxation oscillator in the quadrature clock corrector. Since the combination of the pulse-widths in $A_{XOR,O}/A_{XOR,E}$ and $B_{XNOR,O}/B_{XNOR,E}$, which corresponds to the duty-cycle of $A_{XOR}$ and $B_{XNOR}$ in Figure 3.2.3, generates the RCK<sub>1</sub> and RCK<sub>2</sub> signals, these clock signals contain the quadrature phase error information. The subsequent operations are the same as those of the duty-cycle detector described above, and control codes Code<sub>I</sub> and Code<sub>Q</sub> are generated and transmitted to phase adjusters, correcting the quadrature phase.



### **3.3 EFFECTIVENESS OF DUTY-CYCLE AND QUADRATURE PHASE DETECTOR**

Figure 3.3.1. Simulated output frequency of a relaxation oscillator in (a) the duty-cycle detector and (b) the quadrature phase detector.

To assess the effectiveness with which (3) and (4) are realized by our duty-cycle and quadrature phase detector, we performed a post-layout simulation. Figure 3.3.1(a) and (b) show the simulated output frequency of the relaxation oscillator in our detector against duty-cycle and quadrature phase at input frequencies of 1GHz, 2GHz, and 3GHz. The slopes of the frequency curves are 1.06MHz/% and 0.86MHz/°, respectively. Since the frequency detector is designed to detect a frequency difference of at least 0.4MHz, our duty-cycle and quadrature phase detector can detect a duty-cycle difference of at least 0.38% and a quadrature phase difference of at least 0.47°. The results also show that the detector works well over a wide range of input frequencies and can detect a wide range of duty-cycle and quadrature phase errors.
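The minimum detectable errors quoted above follow from dividing the frequency resolution of the detector by the slope of each frequency curve. The short sketch below reproduces this arithmetic from the figures given in the text; the function name is illustrative.

```python
# Arithmetic check: minimum detectable error = frequency resolution / slope.

def min_detectable_error(freq_resolution_mhz: float, slope_mhz_per_unit: float) -> float:
    """Smallest error (in % or degrees) that produces a detectable frequency difference."""
    return freq_resolution_mhz / slope_mhz_per_unit


duty_resolution = min_detectable_error(0.4, 1.06)    # slope in MHz per %
phase_resolution = min_detectable_error(0.4, 0.86)   # slope in MHz per degree
print(f"duty-cycle: {duty_resolution:.2f} %, quadrature phase: {phase_resolution:.2f} deg")
```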

### **3.4 MISMATCH EFFECTS OF DUTY-CYCLE AND QUADRATURE PHASE DETECTOR**

A mismatch between the left and right relaxation oscillators in our detector can cause deviation of the duty-cycle and quadrature phase after the correction. To verify the effect of the mismatch, Monte Carlo simulation of the output frequency difference between RCK<sub>1</sub> and RCK<sub>2</sub> in the relaxation oscillators was performed. Since the duty-cycle and the quadrature phase of the input clock signals are set to 50% and 90° at 3GHz, the frequency difference is zero if there is no mismatch, but a deviation, which means a residual error, can occur in a real implementation due to device mismatch. Figure 3.4.1 and Figure 3.4.2 show the results for the transistor mismatch, the mismatch in the Schmitt trigger, and the mismatch in the resistor and capacitor. Among these results, the frequency variation due to the mismatch of the Schmitt trigger is the most significant in both detectors, which indicates that the frequency of the relaxation oscillator is sensitive to $V_{TH(L \rightarrow H)}$-$V_{TH(H \rightarrow L)}$. In order to mitigate the mismatch after physical implementation, we considered the following two issues in the layout [3.2.3]. First, all the devices were kept close together and placed in the same orientation, and additional dummy fill was included, to reduce the random mismatch between devices in our detector. Second, symmetric placement of devices and equal wire lengths were used to reduce deterministic mismatch due to the asymmetry of the layout.
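The sensitivity to Schmitt trigger mismatch can be illustrated with a behavioral Monte Carlo sketch built on Eq. (3) with b set to zero. This is not the transistor-level simulation of the thesis: the component values, the Gaussian mismatch model, and the 1% standard deviation are hypothetical, and only the hysteresis voltage is varied.

```python
# Behavioral Monte Carlo sketch (hypothetical values): frequency difference
# between two nominally identical relaxation oscillators whose Schmitt trigger
# hysteresis voltages are independently perturbed.
import random
import statistics


def rck_frequency(c_farad, delta_v, i_ref, duty_fraction):
    """Oscillator frequency from Eq. (3) with b = 0."""
    return i_ref * duty_fraction / (2.0 * c_farad * delta_v)


def monte_carlo_freq_diff(runs=500, sigma_delta_v=0.01, seed=1,
                          c_farad=100e-15, delta_v=0.4, i_ref=10e-6):
    """Return a list of f_RCK1 - f_RCK2 (Hz) for an ideal 50% duty-cycle input."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(runs):
        dv1 = delta_v * (1.0 + rng.gauss(0.0, sigma_delta_v))  # left oscillator
        dv2 = delta_v * (1.0 + rng.gauss(0.0, sigma_delta_v))  # right oscillator
        f1 = rck_frequency(c_farad, dv1, i_ref, 0.5)
        f2 = rck_frequency(c_farad, dv2, i_ref, 0.5)
        diffs.append(f1 - f2)
    return diffs


diffs = monte_carlo_freq_diff()
print(f"std of frequency difference: {statistics.pstdev(diffs) / 1e6:.3f} MHz")
```

Because the frequency in Eq. (3) is inversely proportional to the hysteresis voltage, any relative mismatch in that voltage appears directly as a relative frequency offset, which then acts as a residual error after correction.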



Figure 3.4.1. Monte Carlo simulation results (500 runs) of the frequency difference between RCK<sub>1</sub> and RCK<sub>2</sub>, to show mismatch effect from different random variables in our duty-cycle detector when the duty-cycle of input clock is 50% at 3GHz.



Figure 3.4.2. Monte Carlo simulation results (500 runs) of the frequency difference between  $RCK_1$  and  $RCK_2$ , to show mismatch effect from different random variables in our quadrature phase detector when the quadrature phase of input clock is 90° at 3GHz.

#### 3.5 DUTY-CYCLE AND PHASE ADJUSTER



Figure 3.5.1. Block diagram of (a) the duty-cycle adjuster and (b) the phase adjuster.

Figure 3.5.1(a) is the block diagram of the duty-cycle adjuster, which is based on an inverter with different PMOS and NMOS widths. The control code received from the duty-cycle detector selects the width of the PMOS and NMOS required to restore the duty-cycle of the output clock signal to 50%. The resolution of the duty-cycle adjuster at 1GHz, 2GHz, and 3GHz is about 0.06%, 0.14%, and 0.21% respectively. Figure 3.5.1(b) is the block diagram of the phase adjuster, in which a MOSCAP array receives the control code from the quadrature phase detector and corrects the quadrature phase error. The resolution of the phase adjuster at 1GHz, 2GHz, and 3GHz is about 0.40°, 0.81°, and 1.22° respectively. Since the nonlinearity of these adjusters due to PVT variation can affect the correction performance and jitter, a symmetrical layout and close placement of devices were used, and the linearity was checked through various simulations.
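The resolutions quoted above are expressed relative to the clock period, so it can be helpful to convert them into time steps. The sketch below performs this conversion from the figures in the text; the function names are illustrative, and the interpretation that each adjuster has a roughly constant time step is an inference from the arithmetic, not a statement made in the thesis.

```python
# Convert adjuster resolution (in % of period or in degrees) to picoseconds.

def duty_step_ps(resolution_percent: float, freq_ghz: float) -> float:
    """Time step corresponding to a duty-cycle resolution at a given clock frequency."""
    period_ps = 1000.0 / freq_ghz
    return resolution_percent / 100.0 * period_ps


def phase_step_ps(resolution_deg: float, freq_ghz: float) -> float:
    """Time step corresponding to a phase resolution at a given clock frequency."""
    period_ps = 1000.0 / freq_ghz
    return resolution_deg / 360.0 * period_ps


for f_ghz, duty_res, phase_res in ((1, 0.06, 0.40), (2, 0.14, 0.81), (3, 0.21, 1.22)):
    print(f"{f_ghz} GHz: duty step {duty_step_ps(duty_res, f_ghz):.2f} ps, "
          f"phase step {phase_step_ps(phase_res, f_ghz):.2f} ps")
```

The converted steps stay close to 0.6ps to 0.7ps for the duty-cycle adjuster and about 1.1ps for the phase adjuster across the three frequencies, which would be consistent with adjusters that move clock edges by an approximately fixed delay per code.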

### **CHAPTER 4**

## 4:1 OVERLAPPED TIME-DIVISION MULTIPLEXING DRIVER COMBINED WITH A SERIALIZER TIMING ADJUSTER

To alleviate the issues with 4:1 serialization discussed in Chapter 1.2.3, we have presented a 4:1 OVTDM driver combined with a serializer timing adjuster for single-ended memory interfaces. Our OVTDM driver, which contains four unit drivers, performs 4:1 serialization; two of the four unit drivers output two identical 1UI full-rate DQ signals at the same time, and these signals are summed while performing final serialization. This reduces the output capacitance of the output driver. The serializer timing adjuster combined with this driver corrects the timing of the final serialization by adjusting the phase of each quadrature clock signal.

#### 4.1 PROPOSED DRIVER TOPOLOGY



Figure 4.1.1. Quarter-rate transmitter architecture using our 4:1 OVTDM driver.

The input and output capacitance mainly consists of the package and pad capacitance, the input capacitance of the receiver, and the output capacitance of the transmitter; thus, the receiver and the transmitter should be designed to have small input and output capacitance.

We have addressed the disadvantages of the previous structures of the quarter-rate transmitter in Chapter 1.2.3, by introducing an overlapped time-division multiplexing method which allows overlap among serial symbol signals [4.1.1]. Figure 4.1.1 shows a quarter-rate transmitter using a 4:1 overlapped multiplexing (OVM) driver. In this design, four 2:1 serializers are inserted before the driver, which is combined with an adaptive clock



Figure 4.1.2. Operation and timing of 4:1 OVTDM driver.

phase aligner. The 2:1 serializer has a split-load structure which produces less data-dependent jitter because the NMOS and PMOS transistors in its last stage are independently controlled [4.1.2]. The four half-rate data signals generated by the 2:1 serializers are finally serialized to the full-rate DQ signal in the 4:1 overlapped multiplexing driver. Two of the four unit drivers in our driver output two identical 1UI full-rate DQ signals at the same time, and these signals are summed to reduce the I/O capacitance. The adaptive clock phase aligner tracks the clock and data phases, and maintains the correct timing of final serialization by controlling the phase of the clock signals sent to the pull-up and pull-down MOS transistors in the driver. This driver uses 1-tap, rather than multi-tap, de-emphasis (De-Emp), in order to reduce area and power consumption. The 1-tap de-emphasis is implemented using an additional driver path, and the tap coefficient can be adjusted using a 4-bit thermometer code EQ to compensate for the channel loss at various data-rates.

Figure 4.1.2 shows the operation and timing of the 4:1 OVTDM driver. First, two unit drivers $Drv_0$ and $Drv_3$ are turned on by ICK and QBCK, and the full-rate signal D3 is transmitted to the DQ output. Next, $Drv_0$ and $Drv_1$ are turned on by ICK and QCK, and the full-rate signal D0 is transmitted to the DQ output. Then, $Drv_1$ and $Drv_2$ output the full-rate signal D1, after which $Drv_2$ and $Drv_3$ output the full-rate signal D2. These operations are performed so that final serialization and data output continue without interruption: when each unit driver is turned on, it outputs a 2UI DQ signal, and at any time two of the four unit drivers operate simultaneously to output a 1UI full-rate DQ signal.
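The overlapped schedule described above can be written out as a small behavioral sketch. It only encodes the driver pairing and symbol order stated in the text (D3 by Drv0 and Drv3, D0 by Drv0 and Drv1, D1 by Drv1 and Drv2, D2 by Drv2 and Drv3); the function names and the slot indexing are illustrative.

```python
# Behavioral sketch of the 4:1 OVTDM schedule: in every 1UI slot two adjacent
# unit drivers are on and output the same symbol, and each driver stays on
# for two consecutive slots (2UI).
from collections import Counter


def ovtdm_slots(start_symbol: int = 3, num_slots: int = 4):
    """Return a list of (symbol, active unit drivers) for consecutive 1UI slots."""
    slots = []
    for n in range(num_slots):
        s = (start_symbol + n) % 4
        active = tuple(sorted((s, (s + 1) % 4)))  # Drv_s and Drv_(s+1) drive D_s
        slots.append((f"D{s}", active))
    return slots


def on_slots_per_driver(slots):
    """Count in how many slots each unit driver is on."""
    return Counter(driver for _, active in slots for driver in active)


schedule = ovtdm_slots()
for symbol, active in schedule:
    print(symbol, "driven by", " and ".join(f"Drv{k}" for k in active))
print(on_slots_per_driver(schedule))
```

Over four consecutive slots every unit driver is on in exactly two of them and every slot has exactly two active drivers, which matches the 2UI per driver and 1UI per symbol relationship described in the text.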

#### 4.1.1 2:1 SERIALIZER IN 4:1 OVERLAPPED MULTIPLEXING DRIVER

A 2:1 serializer is placed before the 4:1 OVTDM driver. Figure 4.1.1.1 shows a tri-state inverter based 2:1 serializer. This structure produces a large amount of data-dependent jitter when creating high-speed data, which degrades signal quality, as shown in Figure 4.1.1.2. Figure 4.1.1.3 shows the split-load based serializer. With this split-load based structure, the data-dependent jitter characteristics are improved because the NMOS and PMOS of the last stage are independently controlled, as shown in Figure 4.1.1.4.



Figure 4.1.1.1. Tri-state inverter based 2:1 serializer.



Figure 4.1.1.2. Simulated waveforms of outputs of a 2:1 serializer, which is based on tri-state inverter, and a driver.



Figure 4.1.1.3. Split-load based 2:1 serializer.



Figure 4.1.1.4. Simulated waveforms of outputs of a 2:1 serializer, which is based on split-load, and a driver.

#### 4.2 COMPARISON OF OUTPUT CAPACITANCE



Figure 4.2.1. Comparison of a conventional 4:1 multiplexing driver and 4:1 OVTDM driver, in terms of size, output resistance, and output capacitance.

Figure 4.2.1 compares the size, the output resistance, and the output capacitance of the conventional 4:1 multiplexing driver of Figure 1.2.3.2 with our 4:1 OVTDM driver of Figure 4.1.1. Each of the four parallel unit drivers in our driver has an output resistance of $2R_o$, an output capacitance of $C_{on}/2$ when it is on, and an output capacitance of $C_{off}/2$ when it is off. When two of them are turned on to transmit the DQ signal, the total output resistance $R_{out}$ is equal to $R_o$, which maintains the required channel impedance termination. The size of each of our unit drivers is about half that of a conventional unit driver, and the total output capacitance $C_{out}$ is reduced by $2C_{off}$. Therefore, compared to the multiplexing (MUX) driver, the OVTDM driver can reduce its area and output capacitance, leading to an



Figure 4.2.2. Comparison of a 4:1 MUX POD driver, 4:1 OVTDM POD driver, and 4:1 OVTDM LVSTL driver, in terms of W/L, output capacitance, and output resistance.

improvement in signal integrity [1.2.6]. This 4:1 OVTDM driver architecture is adopted in our quarter-rate transmitter, and this driver uses a power-isolated LVSTL to improve energy efficiency [4.2.1].
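The output resistance and capacitance comparison can be reproduced with a few lines of arithmetic. The sketch assumes that the conventional 4:1 multiplexing driver has four unit drivers of resistance R_o with one driver on and three off at any time, which is an assumption consistent with the stated reduction of 2C_off; the numerical values and function names are hypothetical.

```python
# Sketch of the output resistance/capacitance comparison (hypothetical values).

def mux_driver(r_o: float, c_on: float, c_off: float):
    """Conventional 4:1 MUX driver (assumed): one unit driver on, three off."""
    r_out = r_o
    c_out = c_on + 3.0 * c_off
    return r_out, c_out


def ovtdm_driver(r_o: float, c_on: float, c_off: float):
    """4:1 OVTDM driver: half-size unit drivers, two on and two off."""
    r_unit = 2.0 * r_o
    r_out = (r_unit * r_unit) / (r_unit + r_unit)    # two unit drivers in parallel
    c_out = 2.0 * (c_on / 2.0) + 2.0 * (c_off / 2.0)
    return r_out, c_out


r_mux, c_mux = mux_driver(50.0, 1.0, 0.6)       # ohms, arbitrary capacitance units
r_ovtdm, c_ovtdm = ovtdm_driver(50.0, 1.0, 0.6)
print(r_mux, c_mux, r_ovtdm, c_ovtdm, c_mux - c_ovtdm)
```

Both structures present the same output resistance in this sketch, while the OVTDM capacitance is lower by twice the off-state capacitance of a conventional unit driver.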

Figure 4.2.2 compares the ratio of width to length of the transistor W/L, output capacitance  $C_{OUT}$ , and output resistance  $R_{OUT}$ , between the 4:1 MUX pseudo open drain (POD) driver, the 4:1 OVTDM POD driver, and the 4:1 OVTDM LVSTL driver. The 4:1 OVTDM LVSTL driver uses the NMOS transistor for both pull-up and pull-down drivers and requires a supply voltage that is lower than that of the pre-driver. This allows the transistor to have the smallest width among the three output driver structures while



Figure 4.2.3. Comparison of the output capacitance through simulated (a) rise time and (b) fall time, of a 4:1 MUX POD driver, 4:1 OVTDM POD driver, and 4:1 OVTDM LVSTL driver.

maintaining the required channel impedance termination. Therefore, our 4:1 output driver has reduced output capacitance.

To verify that the 4:1 OVTDM LVSTL driver has a reduced output capacitance, we simulated the rise and fall times of the output signal for the three output driver structures described above. If the three drivers have the same output resistance, the rise and fall times of the output signal are mainly determined by the output capacitance, so their output capacitances can be compared. Figure 4.2.3 shows the simulated rise and fall times of the output signal. Compared to the 4:1 MUX POD driver and the 4:1 OVTDM POD driver, our 4:1 OVTDM LVSTL driver has the shortest rise and fall times, due to the reduced output capacitance.
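The reasoning that equal output resistance makes the edge time a proxy for output capacitance can be illustrated with a first-order model. The sketch below assumes a single-pole RC output node, for which the 10% to 90% rise time is $\ln(9)\,RC$; the real driver and channel are more complex, and the resistance and capacitance values are placeholders rather than simulated results.

```python
import math


def rise_time_10_90(r_ohm, c_farad):
    """10%-90% rise time of a single-pole RC node (first-order assumption)."""
    return math.log(9.0) * r_ohm * c_farad


# Placeholder values: same resistance, different output capacitance.
R_OUT = 40.0
c_large, c_small = 500e-15, 300e-15

t_large = rise_time_10_90(R_OUT, c_large)
t_small = rise_time_10_90(R_OUT, c_small)

# With equal R, the ratio of rise times equals the ratio of capacitances.
print("rise-time ratio:", t_large / t_small)
print("capacitance ratio:", c_large / c_small)
```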

#### 4.3 SERIALIZER TIMING ADJUSTER

Figure 4.3.1(a) shows the timing diagram of our 4:1 OVTDM driver when serialization is correctly timed, and Figure 4.3.1(b) shows what happens when the serialization timing margin is not adequate. Drift in the alignment between the phases of the data and clock signals that control the timing of serialization, resulting from any mismatch between the data path and the clock path or from PVT variations, introduces inter-symbol interference and deteriorates signal integrity. The setup and hold time margins in the 4:1 multiplexing driver can also affect this margin [1.2.15]. This is why the timing of serialization must be corrected.

Figure 4.3.2 shows the serializer timing adjuster which restores the correct serialization timing. It consists of a replica 2:1 serializer, a replica DQ pulse generator, a relaxation oscillator (ROSC), a digital frequency detector (FD), and a binary counter (CNT). Four fixed data patterns, D0:0, D1:1, D2:0, and D3:1 are supplied to this circuit, and then pass through the replica 2:1 serializer, generating the four serialized data patterns D0D1, D1D2, D2D3, and D3D0. The signals containing these patterns pass to the replica DQ pulse generator which outputs the pulses A, B, C, and D. This pulse generator, composed of logic gates, mimics the pre-driver where the timing variation between the clock and the data can occur. The widths of pulses A and B are equal to the 1UI width of the DQ signal generated by the pull-up transistor in the driver; and the widths of pulses C and D are equal to the 1UI width of the DQ signal generated by the pull-down transistor in the same driver. Two relaxation oscillators of corresponding types, ROSC<sub>PU</sub> for pulses A and B, and ROSC<sub>PD</sub> for



Figure 4.3.1. Timing diagrams for 4:1 overlapped multiplexing driver when (a) the timing of serialization is correct, and when (b) incorrect timing fails to provide an adequate timing margin for serialization.



Figure 4.3.2. Block diagram of a serializer timing adjuster.

pulses C and D, accumulate the pulse widths on a capacitor as a voltage, and convert these voltages into frequencies in a power-efficient manner [4.3.1]. Therefore, the relaxation oscillator operates as a pulse-to-frequency converter. The frequencies coming from these relaxation oscillators are digitized and compared by the digital frequency detector, which produces the outputs UPDN<sub>PU</sub> and UPDN<sub>PD</sub>. These are fed to the binary counter, which generates Code<sub>PU</sub> and Code<sub>PD</sub>; these codes are applied to the clock delay cell and adjust each phase of the quadrature clock signals. The replica DQ pulse generator is located together with the 4:1 OVTDM driver, so it is affected by dynamic noise in a similar way to this driver. Since the serializer timing adjuster performs calibration based on the pulses A, B, C, and D generated by this replica DQ pulse generator, it can help ensure timing margins during 4:1 serialization, improving the signal integrity of the output DQ signal.



Figure 4.3.3. Timing diagram in the serializer timing adjuster when phase drifting occurs.



Figure 4.3.4. Timing diagram in the serializer timing adjuster when serialization is correctly timed.



Figure 4.3.5. Simulated eye diagrams for 4:1 OVTDM driver at 12.8Gb/s.

Figure 4.3.3 shows the timing diagram of the serializer timing adjuster when the alignment between the phases of the data and clock signals has drifted; this can be caused by PVT variations, device mismatches, or setup and hold time margins. Figure 4.3.4 shows how the adjuster controls the phase of the clock signals so as to equalize the pulse widths of A and B and of C and D, restoring the correct serialization timing. Figure 4.3.5 shows simulated eye diagrams of the output of the 4:1 OVTDM driver at 12.8Gb/s, without and with the serializer timing adjuster in operation. An improvement in signal integrity is clearly apparent.
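The calibration loop described in this section (pulse widths converted to frequencies, digitized, compared, and fed to an up/down counter that sets the clock delay code) can be sketched behaviorally. Everything numeric here is an assumption made for illustration: the relaxation oscillator count is modeled as proportional to the pulse width, a clock skew widens pulse A and narrows pulse B by the same amount, and the delay step, code range, and gain are hypothetical. Only one pulse pair (A and B) is modeled.

```python
UI_PS = 78.125  # 1UI at 12.8Gb/s


def rosc_count(pulse_width_ps, gain_counts_per_ps=4.0):
    """Digitized oscillator frequency, assumed proportional to pulse width."""
    return int(pulse_width_ps * gain_counts_per_ps)


def pulse_widths(code, skew_ps, step_ps, mid_code):
    """Widths of pulses A and B for a given delay code (hypothetical model)."""
    correction = (code - mid_code) * step_ps
    return UI_PS + skew_ps - correction, UI_PS - skew_ps + correction


def calibrate(skew_ps, step_ps=0.5, code_max=63, iterations=200):
    """Up/down counter loop that equalizes the widths of pulses A and B."""
    mid_code = code_max // 2
    code = mid_code
    for _ in range(iterations):
        width_a, width_b = pulse_widths(code, skew_ps, step_ps, mid_code)
        count_a, count_b = rosc_count(width_a), rosc_count(width_b)
        if count_a > count_b:        # frequency detector output: UP
            code = min(code + 1, code_max)
        elif count_a < count_b:      # frequency detector output: DOWN
            code = max(code - 1, 0)
    width_a, width_b = pulse_widths(code, skew_ps, step_ps, mid_code)
    return code, width_a - width_b


final_code, residual_ps = calibrate(skew_ps=5.0)
print("final code:", final_code, "residual width difference (ps):", residual_ps)
```

In this model the counter settles where the two digitized counts match, so the residual width difference is bounded by the delay step and the count resolution.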

If the phase alignment were performed between the clock and the incoming real data, more accurate phase alignment and better performance could be achieved. However, this would raise power and area issues. Our serializer timing adjuster alleviates these issues with a relaxation oscillator structure that corrects the clock phase using a simple fixed data pattern, 0101.

#### 4.4 32:4 SERIALIZER



Figure 4.4.1. Alternative designs for the 2:1 serializers used in the 32:4 serializer, based on flip-flops.

Our 32:4 serializer consists of a 32:16 serializer, a 16:8 serializer, and an 8:4 serializer, each of which is built from 2:1 serializers operating in parallel. We looked at alternative ways of implementing these 2:1 serializers: Figure 4.4.1 shows a 2:1 serializer based on flip-flops, and Figure 4.4.2 shows a 2:1 serializer based on tri-state inverters. The former requires two D flip-flops (DFFs), a latch, and a 2:1 multiplexer, whereas the latter requires only three tri-state inverters, which reduces both area and power consumption. Figure 4.4.3 compares the power consumption, over a range of supply voltages, of 32:4 serializers implemented using these two structures, and motivates our choice of the structure based on tri-state inverters.
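The three-stage structure can be illustrated with a purely behavioral sketch that ignores clocking and circuit implementation. The lane pairing used here (lane k combined with lane k + n/2 at each stage) is an assumption chosen so that a final 4:1 interleave of the four output lanes restores the original bit order; the dissertation does not specify the pairing, and the function names are hypothetical.

```python
def serialize_2to1(lane_a, lane_b):
    """Interleave two lanes into one lane running at twice the rate."""
    out = []
    for a, b in zip(lane_a, lane_b):
        out.extend((a, b))
    return out


def stage(lanes):
    """One serializer stage: n lanes in, n/2 lanes out (assumed pairing)."""
    half = len(lanes) // 2
    return [serialize_2to1(lanes[k], lanes[k + half]) for k in range(half)]


def serialize_32to4(word):
    """32:16, 16:8, and 8:4 stages applied to one 32-bit parallel word."""
    lanes = [[bit] for bit in word]
    for _ in range(3):
        lanes = stage(lanes)
    return lanes


# Use bit indices instead of bit values so the ordering is visible.
lanes = serialize_32to4(list(range(32)))
for k, lane in enumerate(lanes):
    print("lane", k, lane)
```

With this pairing, output lane k carries bits k, k+4, k+8, and so on, which is the order a 4:1 driver cycling through lanes 0 to 3 would need.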



Figure 4.4.2. Alternative designs for the 2:1 serializers used in the 32:4 serializer, based on tri-state inverters.



Figure 4.4.3. Comparison of power consumption.

### CHAPTER 5

## MIXED PULL-UP AMPLITUDE AND PULL-DOWN PHASE EQUALIZATION

#### 5.1 MIXED EQUALIZATION FOR MEMORY INTERFACE

Figure 5.1.1 shows the waveforms at the transmitter (TX) output $OUT_{TX}$, the micro-bump output $OUT_{Bump}$, and the receiver (RX) input $IN_{RX}$ when various equalization schemes, including our mixed pull-up amplitude and pull-down phase equalization, are applied in the memory configuration of Figure 1.2.4.1. All these equalization schemes can compensate for channel losses by reducing the data-dependent jitter, improving signal integrity. In a memory interface using VSSQ termination, such as LPDDR memory and HBM, there is no current consumption when pull-down data is transmitted if equalization is not performed [5.1.1]. However, when amplitude equalization is used, the power consumption increases because the output driver draws additional current when non-transition bits among the pull-down data are transmitted. When phase equalization is used, there is little excess current consumption in the output driver. Therefore,



Figure 5.1.1. Waveforms of the transmitter output  $OUT_{TX}$ , the micro-bump output  $OUT_{Bump}$ , and the receiver input  $IN_{RX}$  of Figure 1.2.4.1 when there is no equalization and when the amplitude and phase equalization and our mixed equalization are adopted.

we propose mixed equalization, which uses amplitude equalization for pull-up data transmission and phase equalization for pull-down data transmission. This can increase the data-rate per pin by achieving effective equalization while improving the power efficiency of the transmitter. The same effect can be expected by reversing our method, that is, by applying pull-up phase and pull-down amplitude equalization, in DDR and GDDR



Figure 5.1.2. Transmission of (a) transition bit and (b) non-transition bit when pull-up amplitude equalization is adopted.

memories using VDDQ termination.

In amplitude equalization for the pull-up data transmission, the output swing  $V_{o,post}$ , which corresponds to the non-transition bit, can be adjusted by reducing the number of pull-up drivers that are turned on, as shown in Figure 5.1.2. Using this relaxed matching of the impedance  $R_{OUT}$ , power efficiency can be further improved without deteriorating signal integrity [5.1.2].

To verify the effectiveness of mixed equalization, we simulated different equalization schemes including our mixed pull-up amplitude and pull-down phase equalization, at the data-rate of 10Gb/s by applying a supply voltage of 1.0V to the output driver. The insertion loss of the channel used in the simulation is -9.5dB at the Nyquist frequency of 5GHz. A pad capacitance and a bonding wire model were also inserted to reflect the effect of



Figure 5.1.3. Comparison of various equalization techniques.

simultaneous switching noise due to the dynamic current consumption of each scheme. Figure 5.1.3 shows simulated waveforms that compare the various equalization techniques. When there is no equalization, the eye has a width of 46.7ps and a height of 78.5mV, and the power consumption of the output driver is 4.84mW. The eye is improved when equalization is performed: 1-tap amplitude equalization improves the eye width by 86.9% and the eye height by 84.2%, but increases the power consumption of the output driver by 17.6%. When 4-tap phase equalization is used, the eye width and height are 81.2ps and 129.2mV respectively, and the power consumption is 4.93mW. This is not a significant increase in power consumption, but the effect of equalization is weaker than that of amplitude equalization. Mixed 1-tap pull-up amplitude and 4-tap pull-down phase equalization achieves an eye width of 85.4ps and an eye height of 139.3mV, which is better than phase equalization, and consumes the lowest power, 4.10mW; thus, our mixed equalization improves the eye width by 82.9% and the eye height by 77.5%, and decreases the power consumption by 15.3%. The mixed pull-up amplitude and pull-down phase equalization therefore has performance comparable to the other methods while improving power efficiency.
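The percentages quoted above follow from the simulated eye and power figures relative to the case without equalization. The short sketch below recomputes them for the two schemes whose absolute values are given in the text; the dictionary layout and function name are illustrative only.

```python
def relative_change_percent(value, reference):
    """Change of value relative to reference, in percent."""
    return 100.0 * (value - reference) / reference


# Simulated values quoted in the text (10Gb/s, 1.0V output driver supply).
no_eq = {"width_ps": 46.7, "height_mv": 78.5, "power_mw": 4.84}
schemes = {
    "4-tap phase EQ": {"width_ps": 81.2, "height_mv": 129.2, "power_mw": 4.93},
    "mixed EQ": {"width_ps": 85.4, "height_mv": 139.3, "power_mw": 4.10},
}

for name, result in schemes.items():
    changes = {
        key: relative_change_percent(result[key], no_eq[key]) for key in no_eq
    }
    print(name, {key: round(val, 1) for key, val in changes.items()})
```

A positive value means an increase over the unequalized case, so the negative power change for mixed equalization corresponds to the reported saving.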

#### 5.2 AMPLITUDE EQUALIZATION AND PHASE EQUALIZATION CONTROL BLOCK



Figure 5.2.1. Block diagrams and generation of the control signals of the (a) pull-up amplitude and (b) pull-down phase equalization, for mixed equalization.

To apply mixed equalization to the 4:1 OVTDM driver with a simple structure, the control signals for pull-up and pull-down equalization are generated from the previous data patterns in the amplitude equalization and phase equalization control block. Figure 5.2.1(a) shows the block diagram of the pull-up amplitude equalization control block and its control signal generation. When the data changes from 0 to 1, two of the pull-up amplitude equalization control signals PU<3:0> are turned on, and they are transmitted to the amplitude equalization pulse generator, providing 1-tap pull-up amplitude equalization. For other data patterns, these control signals are turned off. The pull-down phase equalization control signals PD<3:0> are turned on one by one by consecutive identical 0 data, and then reset when different data arrives, as illustrated in Figure 5.2.1(b). These control signals are transmitted to the data and clock delay lines to control the timing of their rising and falling edges, which reduces the data-dependent jitter.
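The generation of the two sets of control signals can be summarized with a bit-level behavioral sketch. It is a simplification under stated assumptions: the pull-up side is reduced to a single flag that is true on a 0-to-1 transition (the mapping to individual PU<3:0> lines is not modeled), and the pull-down side is reduced to a count of active PD signals, assuming the first 0 of a run turns on none and each further consecutive 0 turns on one more, up to four. The function name is hypothetical.

```python
def control_signals(bits, max_pd=4):
    """Return (pu_enable, pd_count) per transmitted bit for a bit sequence."""
    pu_enable = []
    pd_count = []
    zero_run = 0
    prev = None
    for bit in bits:
        # Pull-up amplitude equalization is requested on a 0 -> 1 transition.
        pu_enable.append(prev == 0 and bit == 1)
        # Pull-down phase equalization accumulates over consecutive 0s and
        # resets as soon as a 1 arrives.
        zero_run = zero_run + 1 if bit == 0 else 0
        pd_count.append(min(max(zero_run - 1, 0), max_pd))
        prev = bit
    return pu_enable, pd_count


pu, pd = control_signals([1, 0, 0, 0, 1, 1, 0])
print("PU enable:", pu)
print("PD count: ", pd)
```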

#### 5.3 AMPLITUDE EQUALIZATION PULSE GENERATOR



Figure 5.3.1. Block diagram of the amplitude equalization pulse generator.



Figure 5.3.2. The operation example of the pull-up amplitude equalization.

An amplitude equalization pulse generator, shown in Figure 5.3.1, creates a signal AEQ Pulse with a width $\Delta t$ in response to the control signals PU<3:0>. The pulse width $\Delta t$ is determined by the delay time of the inverters INV<sub>Pulse</sub>, which approximately equals 1UI at the data-rate of 16Gb/s. The AEQ Pulse signal is added to the DQ output to implement 1-tap amplitude equalization. The equalization strength control signals Co<sub>AEQ</sub><3:0> can be adjusted manually according to the channel loss. Figure 5.3.2 shows an operation example in which DQ changes from 0 to 1 and remains at 1 for 3UI. The PU<1> and PU<2> signals are turned on when the successive data D0 and D1 change from 0 to 1 and are output to DQ, and these two control signals are reset when both D0 and D1 output 0. PU<1> and PU<2> are delivered to the amplitude equalization pulse generator, which adds the pulse to the output DQ when D1 is output, so that 1-tap amplitude equalization is performed with an amplitude level that corresponds to the value of $Co_{AEQ}$.

#### 5.4 DATA AND CLOCK DELAY LINE



Figure 5.4.1. Block diagram of the data and clock delay line.



Figure 5.4.2. The operation example of the pull-down phase equalization.

A data and clock delay line controls the timing of the rising and falling edges of the data and clock signals according to the phase equalization control signals PD<3:0>. Figure 5.4.1 is a block diagram of the data and clock delay line, which contains four delay line units. The timing of the rising and falling edges is adjusted by controlling the strength of the transistor in each delay line unit. The extent of control on the timing is determined by the $Co_{PEQ}$ signal, which sets the resolution of the delay line, and the number of PD signals that are turned on. Figure 5.4.2 shows an operation example of the pull-down phase equalization. When three consecutive 0 data are transmitted to D1, D2, and D3, the control signals PD<1> and PD<2> are turned on sequentially, and the rising edge of the output signal DQ from D3 to D0 is pulled forward by $Co_{PEQ}$×(PD<1>+PD<2>).

# **CHAPTER 6**

## **EXPERIMENTAL RESULTS**

#### 6.1 QUADRATURE CLOCK CORRECTOR WITH A DUTY-CYCLE AND QUADRATURE PHASE DETECTOR BASED ON A RELAXATION OSCILLATOR



Figure 6.1.1. Die micrograph with a magnified layout.

A prototype was fabricated in a 55nm CMOS process with a supply voltage of 1.2V. Figure 6.1.1 shows the die micrograph with a magnified layout. The core area of the quadrature clock corrector is 0.003mm<sup>2</sup>. The measurement setup is shown in Figure 6.1.2. A single-ended clock signal from a CK source is converted to differential clock signals CK<sub>IN</sub> and CKB<sub>IN</sub> using a single-to-differential (StoD) converter. An IQ generator (Gen)



Figure 6.1.2. Measurement setup.

makes quadrature clock signals from $CK_{IN}$ and $CKB_{IN}$, and their duty-cycle and quadrature phase are varied by control codes in duty-cycle adjusters (DCA<sub>IN</sub>) and phase adjusters (PA<sub>IN</sub>) placed before the quadrature clock corrector. Their control ranges, which also define the distortion ranges, are measured through ICK<sub>CH,IN</sub> and QCK<sub>CH,IN</sub>. The distorted clock signals are then transmitted to our quadrature clock corrector. Duty-cycle and quadrature phase errors of the output clock signals are measured by capturing ICK<sub>CH,OUT</sub> and QCK<sub>CH,OUT</sub> with an oscilloscope. Measurements were carried out over input clock frequencies from 1GHz to 3GHz.








Figure 6.1.3. Measured waveforms of uncorrected output clock signals and corrected output clock signals (a) at 1GHz and (b) at 3GHz.

The measured waveforms at the frequency of 1GHz are shown in Figure 6.1.3(a). The duty-cycle and the quadrature phase of uncorrected output clock signals  $ICK_{CH,OUT}$  and  $QCK_{CH,OUT}$  are 47.7% and 92.1°. When the quadrature clock correction is performed, the



Figure 6.1.4. Measured results of duty-cycle correction (a) at 1GHz and (b) at 3GHz.

duty-cycle and the quadrature phase of these clock signals are modified to 49.9% and 90.7°. At the frequency of 3GHz, the measured waveforms are shown in Figure 6.1.3(b). The duty-cycle and the quadrature phase of uncorrected output clock signals are 42.7% and 85.0°. After quadrature clock correction, the duty-cycle and the quadrature phase of these signals are modified to 50.6% and 90.4°.

The duty-cycle distorted clock signals, which are generated by manually adjusting the  $DCA_{IN}$  in Figure 6.1.2, are provided to the quadrature clock corrector. The measurements presented in Figure 6.1.4(a) and (b) show the effect of duty-cycle correction by the quadrature clock corrector. When the duty-cycle correction is performed, the duty-cycle of the corrected signal only varies between 50.1% and 50.4% at 1GHz, and between 49.4% and 50.8% at 3GHz.



Figure 6.1.5. Measured results of quadrature phase correction (a) at 1GHz and (b) at 3GHz.

In order to verify the effect of the quadrature phase correction, the quadrature phase of the input clock signals is manually distorted in the $PA_{IN}$ in Figure 6.1.2. Figure 6.1.5(a) and (b) show the measured results of quadrature phase correction. After quadrature phase correction, the quadrature phase of the output clock signal lies in a range of 89.8° to 90.7° at 1GHz and in a range of 89.2° to 91.1° at 3GHz. Therefore, the maximum quadrature phase error is 1.1° at 3GHz, which corresponds to 1.03ps. In Figure 6.1.4 and Figure 6.1.5, the correction range is determined by the control range of the duty-cycle and phase adjuster in the quadrature clock corrector, as shown in Figure 3.5.1. As we mentioned in Chapter 3.5, the resolution of the duty-cycle and phase adjuster is smaller at 1GHz than at 3GHz; thus, the correction range at 1GHz is narrower than at 3GHz when the same



Figure 6.1.6. RMS and peak-to-peak jitter of (a) the uncorrected clock signal ICK<sub>CH,IN</sub> and (b) the corrected clock signal ICK<sub>CH,OUT</sub>, at 3GHz.

range of the control code is used.

The measured RMS and peak-to-peak (P-P) jitter of the uncorrected clock signal ICK<sub>CH,IN</sub> are 1.85ps and 15.75ps, respectively, as shown in Figure 6.1.6(a). The RMS and peak-to-peak jitter of the corrected clock signal ICK<sub>CH,OUT</sub> increase to 2.14ps and 19.75ps, as shown in Figure 6.1.6(b). Since the quadrature clock corrector continually responds to changes in the duty-cycle and quadrature phase, the control codes transmitted to the duty-cycle and phase adjuster can dither. Because of this dithering, supply/ground noise, and local VT variation, the jitter of the corrected clock signal can be larger than that of the uncorrected clock signal. According to [1.2.6], the corrected clock signals are adequate for memory interfaces such as LPDDR4, whose data window sampled by the clock



Figure 6.1.7. Power breakdown of the quadrature clock corrector, running at 3GHz. At this frequency the total power consumption is 2.08mW.

signal is 0.7UI.

The power breakdown of the quadrature clock corrector is shown in Figure 6.1.7. The total power consumption of this clock corrector is 2.08mW at an input clock frequency of 3GHz. Most of the power is consumed by the clock buffer, and the relaxation oscillators in the duty-cycle detector (ROSC<sub>Duty</sub>) and the quadrature phase detector (ROSC<sub>Quad</sub>) account for only 5.1% and 1.3% of the total power consumption, respectively.

The performance of our quadrature clock corrector is summarized and compared with other recent designs in Table 6.1.1.

|                             | TVLSI 17'         | TCASI 12'             | TCASII 14'           | Elec. Exp. 18'       | ISCAS 18'         | This Work                |
|-----------------------------|-------------------|-----------------------|----------------------|----------------------|-------------------|--------------------------|
| Process                     | 130nm             | 45nm                  | 130nm                | 65nm                 | 90nm              | 55nm                     |
| Supply Voltage              | 1.2V              | 1.1V                  | 1.2V                 | 1.2V                 | 1.0V              | 1.2V                     |
| Frequency                   | 0.1-2GHz          | 0.8-1.6GHz            | 0.4-0.8GHz           | 0.9-1.1GHz           | 0.1-2.7GHz        | 1-3GHz                   |
| Detector Type               | Analog Integrator | Digital PD            | Sense Amplifier      | TDC                  | TDC               | Relaxation Osc.          |
| Clock Source                | External          | N/A                   | N/A                  | N/A                  | External          | External                 |
| RMS Jitter (<sup>a</sup>Norm. Jitter) | N/A     | 2.17ps @1.6GHz (0.35) | 2.3ps @0.8GHz (0.18) | 2.98ps @1GHz (0.30)  | N/A               | 2.14ps @3GHz (0.64)      |
| P-P Jitter (<sup>a</sup>Norm. Jitter) | N/A     | 17.8ps @1.6GHz (2.85) | 20.0ps @0.8GHz (1.6) | 19.38ps @1GHz (1.94) | N/A               | 19.75ps @3GHz (5.93)     |
| Max. Duty-Cycle Error       | 3.2% @1GHz        | 1.2% @1.6GHz          | 0.97% @0.8GHz        | 1% @1GHz             | 1.9% @2.7GHz      | 0.8% @3GHz               |
| Max. Quadrature Phase Error | 3.68ps @2GHz      | 2.9ps @1.6GHz         | 6.25ps @0.8GHz       | 5ps @1GHz            | 3.71ps @2.7GHz    | 1.03ps @3GHz             |
| Lock Time                   | N/A               | N/A                   | <75-374 cycles       | <56 cycles           | N/A               | <sup>b</sup> <275 cycles |
| Area                        | 0.01mm<sup>2</sup> | 0.01mm<sup>2</sup>   | 0.025mm<sup>2</sup>  | 0.086mm<sup>2</sup>  | 0.089mm<sup>2</sup> | 0.003mm<sup>2</sup>    |
| Power Cons.                 | 6.48mW @2GHz      | 3.3mW @1.6GHz         | 7.2mW @0.8GHz        | 2.6mW @1GHz          | 49.4mW @2.7GHz    | 2.08mW @3GHz             |
| Figure of Merit             | 3.24mW/GHz        | 2.06mW/GHz            | 9.0mW/GHz            | 2.6mW/GHz            | 18.3mW/GHz        | 0.79mW/GHz               |

<sup>a</sup> Jitter/Period [%]

<sup>b</sup> Clock cycle of the relaxation oscillator

Table 6.1.1 Performance Summary and Comparison with Other Recent Quadrature Clock Correctors
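The normalized jitter entries in Table 6.1.1 are defined as jitter divided by the clock period, in percent. The sketch below recomputes a few of them from the jitter and frequency values listed in the table; the function name is illustrative.

```python
def normalized_jitter_percent(jitter_ps, freq_ghz):
    """Jitter as a percentage of the clock period."""
    period_ps = 1000.0 / freq_ghz
    return 100.0 * jitter_ps / period_ps


# (label, jitter in ps, clock frequency in GHz) taken from Table 6.1.1.
entries = [
    ("This Work, P-P", 19.75, 3.0),
    ("This Work, RMS", 2.14, 3.0),
    ("TCASI 12', P-P", 17.8, 1.6),
    ("TCASII 14', RMS", 2.3, 0.8),
]

for label, jitter_ps, freq_ghz in entries:
    value = normalized_jitter_percent(jitter_ps, freq_ghz)
    print(f"{label}: {value:.3f} % of the clock period")
```

Because the period shrinks as the frequency rises, a similar absolute jitter yields a larger normalized value at 3GHz than at the lower frequencies of the other designs.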

#### 6.2 OVERLAPPED TIME-DIVISION MULTIPLEXING DRIVER COMBINED WITH A SERIALIZER TIMING ADJUSTER



Figure 6.2.1. Die micrograph with the magnified layout of our 4:1 OVTDM driver incorporated in the quarter-rate transmitter, measurement setup, and the channel loss of a 2.6" FR4 trace, an SMA connector, and a 35" SMA cable.

A prototype chip has been implemented in a 55nm CMOS technology with a supply voltage of 1.2V. Figure 6.2.1 shows the measurement setup, together with a die micrograph in which the layout of our 4:1 OVTDM driver incorporated in a quarter-rate transmitter is identified; the core area is 0.014mm<sup>2</sup>. The single-ended clock signal from the clock source is converted to the differential clock signals CK and CKB using a single-to-differential



Figure 6.2.2. Measured results of 4-phase clock correction at 12.8Gb/s.

(StoD) converter and then transmitted to the chip. The performance of the transmitter was tested using the channel shown in Figure 6.2.1, which consists of a 2.6" FR4 trace, an SMA connector, and a 35" SMA cable. The eye diagram of the DQ signal was displayed by an oscilloscope.

Figure 6.2.2 shows the measured results of 4-phase clock correction. At 12.8Gb/s, the eye diagrams with a 0101 pattern demonstrate that the duty-cycle and quadrature phase errors are corrected.

Figure 6.2.3 shows measured eye diagrams for a 12.8Gb/s single-ended DQ with a



Figure 6.2.3. Measured eye diagrams of the single-ended data output signal at 12.8Gb/s.

PRBS-7 pattern. Without 1-tap de-emphasis equalization, 4-phase clock correction, and adaptive clock phase alignment, the eye is closed. When equalization and 4-phase clock correction are applied, the eye opens with a width of 0.38UI and a height of 37.16mV. When the adaptive clock phase aligner is also activated, the eye opens further, to a width of 0.43UI and a height of 47.76mV.

Figure 6.2.4 shows how the measured eye width and height vary with data-rate, with and without the serializer timing adjuster. Although the eye varies irregularly as the data-rate increases because of the channel loss characteristics, we can see that correcting the timing of serialization improves both the width and height of the eye.



Figure 6.2.4. Measured eye width and height at different data-rates, with and without the serializer timing adjuster.

Figure 6.2.5 provides a breakdown of power usage by the components on the data path in our quarter-rate transmitter. At a data-rate of 12.8Gb/s, the total power consumption is 23.04mW, most of which is consumed by the pre-driver and the 4:1 overlapped multiplexing driver. The adaptive clock phase aligner accounts for only 1.6% of total power consumption. Energy efficiency is 1.8pJ/bit at a data-rate of 12.8Gb/s.
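The energy efficiency figure follows directly from the total power and the data-rate, since 1mW divided by 1Gb/s equals 1pJ/bit. The sketch below recomputes it and also converts the aligner's 1.6% share into an absolute power; the function and variable names are illustrative.

```python
def energy_per_bit_pj(power_mw, data_rate_gbps):
    """Energy per bit in pJ: mW / (Gb/s) = 1e-3 W / 1e9 bit/s = 1e-12 J/bit."""
    return power_mw / data_rate_gbps


TOTAL_POWER_MW = 23.04   # total data-path power at 12.8Gb/s (from the text)
DATA_RATE_GBPS = 12.8
ALIGNER_SHARE = 0.016    # adaptive clock phase aligner: 1.6% of total power

print("energy efficiency (pJ/bit):",
      round(energy_per_bit_pj(TOTAL_POWER_MW, DATA_RATE_GBPS), 3))
print("aligner power (mW):", round(TOTAL_POWER_MW * ALIGNER_SHARE, 3))
```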

The performance of our design is summarized and compared with previous quarter-rate transmitter designs in Table 6.2.1. The power consumption of the transmitter is mostly in the output driver, and this power is proportional to the output swing. Because our architecture is designed for memory interfaces, the output swing is larger than that of the transmitters of [1.2.14]



Figure 6.2.5. Power breakdown of the data path in our quarter-rate transmitter at 12.8Gb/s.

and [1.2.17], which results in poorer energy efficiency than these two designs. However, our structure achieves a higher data-rate with single-ended signaling, which is more vulnerable to supply/ground noise than the differential signaling used in [1.2.14] and [1.2.17].

|                                  | ASSCC 16'              | TIE 18'                 | JSSC 13'               | This Work               |
|----------------------------------|------------------------|-------------------------|------------------------|-------------------------|
| Technology                       | 65nm                   | 65nm                    | 65nm                   | 55nm                    |
| Data-Rate (Gb/s)                 | 5~8                    | 6~32                    | 4.8~8                  | 0.2~12.8                |
| Signaling                        | Differential           | Differential            | Differential           | Single-Ended            |
| Equalization                     | 2-Tap Pre-Emp.         | 1-Tap Pre-Emp.          | No                     | 1-Tap De-Emp.           |
| Multiplexing Driver              | No                     | Yes                     | Yes                    | Yes (Overlapped)        |
| Serialization Time Adaptation    | No                     | No                      | No                     | Yes                     |
| Single-Ended Output Swing        | 50~150mV<sub>pp</sub>  | 250~600mV<sub>pp</sub>  | 50~100mV<sub>pp</sub>  | 400~600mV<sub>pp</sub>  |
| Channel Loss @ Nyquist Freq      | 10.7dB                 | N/A                     | N/A                    | 20.4dB                  |
| Area (mm<sup>2</sup>)            | 0.020                  | 0.173                   | 0.027                  | 0.014                   |
| Energy Efficiency (pJ/bit)       | 0.333 @8Gb/s           | 2.6 @28Gb/s             | 0.3 @6.4Gb/s           | 1.8 @12.8Gb/s           |

Table 6.2.1 Performance Summary and Comparison with Other Quarter-Rate Transmitter Designs

#### 6.3 MIXED PULL-UP AMPLITUDE AND PULL-DOWN PHASE EQUALIZATION



Figure 6.3.1. Die micrograph.

A prototype chip was fabricated in a 65nm CMOS technology, and its die micrograph is shown in Figure 6.3.1; the total area is 0.25mm<sup>2</sup> including the decoupling capacitor. Figure 6.3.2 shows a measurement setup. The differential clock signals CK and CKB are supplied by the pattern generator of the J-BERT (Agilent N4903A). The eye diagram of the DQ signal was displayed by an oscilloscope (Tektronix MSO73304DX).

The measured insertion loss of the channel, which is a 12.5-inch FR4 trace, is shown in Figure 6.3.3. At a Nyquist frequency of 8GHz, the insertion loss of the channel is -14.7dB.



Figure 6.3.2. Measurement setup.



Figure 6.3.3. Measured channel loss characteristics.



Figure 6.3.4. Measured output of the transmitter after passing through the channel, using a fixed data pattern at 16Gb/s without equalization.

Figure 6.3.4 and Figure 6.3.5 show the measured transmitter output using a fixed data pattern at 16Gb/s, without any equalization and with mixed equalization. With the simultaneous application of pull-up amplitude and pull-down phase equalization at the data-rate of 16Gb/s on the channel, the inter-symbol interference is mitigated.



Figure 6.3.5. Measured output of the transmitter after passing through the channel, using a fixed data pattern at 16Gb/s with mixed equalization.



Figure 6.3.6. Measured eye diagrams using the PRBS7 pattern, without and with duty-cycle and quadrature phase correction at 16Gb/s.

In the quarter-rate transmitter, the duty-cycle and quadrature phase error can distort the output eye. Figure 6.3.6 shows measured eye diagrams for a PRBS7 data pattern, without and with the duty-cycle correction and the quadrature phase correction at 16Gb/s. We can see how correction improves the regularity of the diagram.



Figure 6.3.7. Measured eye diagrams at 16Gb/s, without and with correction of 4:1 serialization timing by the serializer timing adjuster.

Figure 6.3.7 shows measured eye diagrams without and with the 4:1 serialization timing control by the serializer timing adjuster. Our adjuster corrects the timing of the 4:1 serialization to reduce inter-symbol interference, significantly opening the eye of the output signal.














Figure 6.3.8. Measured eye diagrams before and after channel, (a) without equalization, (b) with pull-up amplitude equalization, (c) with pull-down phase equalization, and (d) with mixed equalization.

Figure 6.3.8 shows the effect of pull-up amplitude equalization, pull-down phase equalization, and mixed equalization on eye diagrams for the transmitter output with a $V_{DD}$ of 1V and a $V_{DDQ}$ of 0.6V, using the PRBS7 data pattern at 16Gb/s before and after a channel with a loss of -14.7dB. When equalization is not used, the eye is closed, as shown in Figure 6.3.8(a). The inter-symbol interference for transitions from 1 to 0 is reduced by pull-up amplitude equalization, as shown in Figure 6.3.8(b). Pull-down phase equalization reduces the inter-symbol interference for 0 to 1 transitions, as shown in Figure 6.3.8(c). Figure 6.3.8(d) shows that mixed equalization reduces the inter-symbol interference for both types of transition, producing the largest eye by opening it in two directions at once.



Figure 6.3.9. Measured eye diagrams with a PRBS7 data pattern at 16Gb/s: without equalization, duty-cycle correction, or quadrature phase correction at $V_{DD}$=1.0V and $V_{DDQ}$=0.6V; with mixed equalization, duty-cycle correction, and quadrature phase correction at the same voltages; and at a reduced $V_{DD}$ of 0.9V and $V_{DDQ}$ of 0.3V.

Figure 6.3.9 shows eye diagrams of the transmitter output for a PRBS7 data pattern at 16Gb/s. Without equalization and clock correction, the eye is closed. With mixed equalization and clock correction, the eye has a vertical opening of 25mV and a horizontal opening of 0.49UI for a $V_{DD}$ of 1.0V and a $V_{DDQ}$ of 0.6V. With mixed equalization, the eye remains open at a reduced $V_{DD}$ of 0.9V and $V_{DDQ}$ of 0.3V: its vertical opening is 13mV and its horizontal opening is 0.47UI.
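To put the horizontal openings in absolute terms, the unit interval can be converted to time. The following is a worked conversion under the assumption that 1UI is simply the reciprocal of the 16Gb/s data-rate; the symbols $T_{UI}$ and $W_{eye}$ are introduced here for illustration only.

$$
T_{UI} = \frac{1}{16\,\text{Gb/s}} = 62.5\,\text{ps}, \qquad
W_{eye}\big|_{0.49\,\text{UI}} = 0.49 \times 62.5\,\text{ps} \approx 30.6\,\text{ps}, \qquad
W_{eye}\big|_{0.47\,\text{UI}} = 0.47 \times 62.5\,\text{ps} \approx 29.4\,\text{ps}
$$

Under this assumption, lowering the supplies from 1.0V/0.6V to 0.9V/0.3V costs roughly 1.2ps of horizontal opening.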



Figure 6.3.10. Power breakdown of our quarter-rate transmitter at 16Gb/s, using a  $V_{DD}$  of 0.9V and  $V_{DDQ}$  of 0.3V.

Figure 6.3.10 provides a power breakdown of our quarter-rate transmitter at a data-rate of 16Gb/s, using a $V_{DD}$ of 0.9V and $V_{DDQ}$ of 0.3V. Most power is consumed by the pre-driver, which includes the buffers and delay lines for the data and clock. The amplitude and phase equalization control block accounts for 9.5% of the total power consumption, and the 4:1 OVTDM driver accounts for 19.3%.



Figure 6.3.11. Area breakdown of our quarter-rate transmitter.

Figure 6.3.11 shows an area breakdown of our transmitter, whose total active area is 0.0191mm<sup>2</sup>. The pre-driver, including the buffers and delay lines for the data and clock, is the largest block. The output driver occupies only 14.5% of the total area because it is implemented as a 4:1 OVTDM LVSTL driver.
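The percentages in the two breakdowns can be turned into rough absolute figures. The sketch below assumes that the total power behind Figure 6.3.10 equals the reported 1.04pJ/bit energy efficiency multiplied by the 16Gb/s data-rate; that link is an assumption made here, and the function and variable names are illustrative rather than taken from the design.

```python
# Values quoted in the text. Treating (energy per bit) x (data-rate) as the
# total power of the breakdown is an assumption of this sketch.
DATA_RATE_GBPS = 16.0      # data-rate of the measurement
ENERGY_PER_BIT_PJ = 1.04   # energy efficiency reported at 16Gb/s
TOTAL_AREA_MM2 = 0.0191    # total active area


def total_power_mw(energy_pj_per_bit: float, data_rate_gbps: float) -> float:
    """pJ/bit times Gb/s gives mW (1e-12 J/bit * 1e9 bit/s = 1e-3 W)."""
    return energy_pj_per_bit * data_rate_gbps


def share(total: float, percent: float) -> float:
    """Absolute value of a block that takes `percent` of `total`."""
    return total * percent / 100.0


total_mw = total_power_mw(ENERGY_PER_BIT_PJ, DATA_RATE_GBPS)
eq_control_mw = share(total_mw, 9.5)             # equalization control block
ovtdm_driver_mw = share(total_mw, 19.3)          # 4:1 OVTDM driver
output_driver_mm2 = share(TOTAL_AREA_MM2, 14.5)  # output driver area

print(f"total power         : {total_mw:.2f} mW")
print(f"equalization control: {eq_control_mw:.2f} mW")
print(f"4:1 OVTDM driver    : {ovtdm_driver_mw:.2f} mW")
print(f"output driver area  : {output_driver_mm2:.5f} mm^2")
```

Under this assumption the transmitter draws about 16.6mW in total, of which roughly 1.6mW goes to equalization control and 3.2mW to the 4:1 OVTDM driver, while the output driver occupies about 0.0028mm<sup>2</sup>.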

Table 6.3.1 summarizes the performance of our transmitter with mixed equalization and compares it with state-of-the-art transmitters using other methods of equalization. Using single-ended signaling at a $V_{DD}$ of 0.9V and a $V_{DDQ}$ of 0.3V, data can be transmitted at 16Gb/s over a channel with a loss of -14.7dB, which is comparable with the best results from other designs. Although it uses both 1-tap pull-up amplitude equalization and 4-tap pull-down phase equalization, its figure-of-merit<sub>1</sub> (FoM<sub>1</sub>) and FoM<sub>2</sub> [6.3.1] are 1.04pJ/bit and 0.070pJ/bit/dB at 16Gb/s, which are significantly lower than those of the other designs in Table 6.3.1 except the designs in [1.2.35] and [6.3.2]. Although the transmitters presented in [1.2.35] and [6.3.2] have lower figures-of-merit than our design, [1.2.35] omits the power consumption of the phase equalization encoder and [6.3.2] reports the output driver only.
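The two figures-of-merit in Table 6.3.1 appear to be related by the channel loss: dividing FoM<sub>1</sub> (pJ/bit) by the magnitude of the loss (dB) reproduces the listed FoM<sub>2</sub> (pJ/bit/dB) to within rounding. The sketch below checks this for every column that lists all three values. The relation is inferred from the table entries rather than quoted from [6.3.1], and the helper name `fom2` is hypothetical.

```python
# (FoM1 in pJ/bit, channel loss in dB, FoM2 in pJ/bit/dB as listed in Table 6.3.1)
# TIE 18' is left out because its channel loss and FoM2 are N/A.
TABLE = {
    "This Work": (1.04, -14.7, 0.070),
    "ISSCC 17'": (2.92, -19.0, 0.154),
    "TCASI 17'": (1.96, -15.35, 0.128),
    "ISSCC 18'": (1.18, -22.0, 0.054),
    "JSSC 18'": (5.05, -12.0, 0.421),
    "TCAS2 18'": (1.8, -20.4, 0.088),
    "EL 17'": (0.5, -10.0, 0.050),
}


def fom2(fom1_pj_per_bit: float, channel_loss_db: float) -> float:
    """Energy per bit normalized by the magnitude of the channel loss."""
    return fom1_pj_per_bit / abs(channel_loss_db)


# Difference between the recomputed and the listed FoM2 for each design.
deviation = {
    name: abs(fom2(f1, loss) - listed)
    for name, (f1, loss, listed) in TABLE.items()
}

for name, (f1, loss, listed) in TABLE.items():
    print(f"{name:10s} computed {fom2(f1, loss):.4f}  listed {listed:.3f}")
```

Every recomputed value agrees with the listed one to within 0.001pJ/bit/dB, which suggests that the table normalizes energy per bit by the loss that each design compensates.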

Table 6.3.1. Performance Summary and Comparison with Other State-of-the-Art Transmitter Designs

| | This Work | ISSCC 17' | TIE 18' | TCASI 17' | ISSCC 18' | JSSC 18' | TCAS2 18' | EL 17' |
|---|---|---|---|---|---|---|---|---|
| Technology | 65nm | 65nm | 65nm | 40nm | 65nm | 28nm | 55nm | 45nm |
| Supply | V<sub>DD</sub>: 0.9V, V<sub>DDQ</sub>: 0.3V | 0.9/1.0/1.1V | 1.5/1.1V | 1.1V | 0.9V | 1.0V | 1.2V | V<sub>DD</sub>: 0.9V, V<sub>DDL</sub>: 0.6V |
| Architecture | Quarter-rate | Quarter-rate | Quarter-rate | Full-rate | Quarter-rate | Half-rate | Quarter-rate | Half-rate |
| Signaling | SE | Diff. | Diff. | Diff. | Diff. | Diff. | SE | SE |
| Data-Rate | 16Gb/s | 16Gb/s | 5-32Gb/s | 5Gb/s | 16Gb/s | 2.1Gb/s | 12.8Gb/s | 3.4Gb/s |
| Output Swing | 0.15V<sub>SE</sub> | 1.0V<sub>Diff</sub> | 0.4-1.3V<sub>Diff</sub> | 0.4V<sub>Diff</sub> | N/A | 1.2V<sub>Diff</sub> | 0.4V<sub>SE</sub> | 0.27V<sub>SE</sub> |
| Equalization (Taps) | PU AEQ (1-tap), PD PEQ (4-tap) | PEQ (4-tap) | AEQ (1-tap) | PEQ (N/A) | PEQ (4-tap) | AEQ (1-tap), PEQ (3-tap) | AEQ (1-tap) | AEQ (1-tap) |
| Channel Loss | -14.7dB | -19dB | N/A | -15.35dB | -22dB | -12dB | -20.4dB | -10dB |
| FoM<sub>1</sub> (pJ/bit)<sup>e</sup> | 1.04 | 2.92 | 2.1-3.45<sup>a</sup> | 1.96 | 1.18<sup>b</sup> | 5.05 | 1.8 | 0.5<sup>d</sup> |
| FoM<sub>2</sub> (pJ/bit/dB)<sup>e</sup> | 0.070 | 0.154 | N/A | 0.128 | 0.054<sup>b</sup> | 0.421 | 0.088 | 0.050<sup>d</sup> |
| Active Area<sup>e</sup> | 0.0191mm<sup>2</sup> | 0.056mm<sup>2</sup> | 0.1728mm<sup>2</sup> <sup>a</sup> | 0.075mm<sup>2</sup> | 0.13mm<sup>2</sup> <sup>c</sup> | N/A | 0.014mm<sup>2</sup> | 0.011mm<sup>2</sup> <sup>c</sup> |

<sup>a</sup> Includes the PLL. <sup>b</sup> Excludes power drawn by the PEQ encoder. <sup>c</sup> Transceiver area. <sup>d</sup> From the power consumption of the output driver only. <sup>e</sup> Compares the power consumption and area from the transmitter only.

## CHAPTER 7

## CONCLUSION

In this thesis, we have proposed a quarter-rate transmitter using single-ended signaling for memory interfaces. As demand for high-performance memory increases, memory bandwidth must increase as well. One way to increase memory bandwidth is to use a higher data-rate. However, a higher data-rate degrades the power efficiency of the memory interface and increases the frequency-dependent channel loss, making it difficult to raise the data-rate per pin. To alleviate these issues, this thesis has presented the following points in the design of a transmitter.

First, we have adopted a quarter-rate architecture. Although a quarter-rate design requires four clock phases, it has a more relaxed timing margin on its critical path and lower simultaneous switching noise, power consumption, and clock frequency than full-rate and half-rate designs.

Second, we have presented a quadrature clock corrector to correct duty-cycle and quadrature phase errors, improving signal integrity in the quarter-rate architecture. It uses relaxation oscillators with a modified input stage to convert the duty-cycle and quadrature phase of the clock signal to pairs of frequencies, while reducing area and power consumption. By comparing the frequencies in each pair, our detector obtains accurate values for the duty-cycle and quadrature phase errors, allowing for good correction performance. The circuit detects a wide range of duty-cycle and quadrature phase errors, and operates at clock frequencies from 1GHz to 3GHz. The corrected clock signal has a duty-cycle between 49.4% and 50.8%, and a quadrature phase between 89.2° and 91.1°, at 3GHz. The power efficiency of the quadrature clock corrector is 0.79mW/GHz, and its area is 0.003mm<sup>2</sup>.
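The residual errors quoted above are easier to compare with a timing budget when expressed in picoseconds. The sketch below converts the duty-cycle and quadrature phase bounds at 3GHz into time offsets from the ideal 50% and 90° values; the helper names are hypothetical, and the conversion assumes an ideal clock period of 1/f.

```python
def period_ps(freq_ghz: float) -> float:
    """Clock period in picoseconds for a frequency given in GHz."""
    return 1000.0 / freq_ghz


def duty_error_ps(duty_percent: float, freq_ghz: float) -> float:
    """Deviation of the high time from the ideal 50% duty-cycle, in ps."""
    return (duty_percent - 50.0) / 100.0 * period_ps(freq_ghz)


def phase_error_ps(phase_deg: float, freq_ghz: float) -> float:
    """Deviation of the I/Q spacing from the ideal 90 degrees, in ps."""
    return (phase_deg - 90.0) / 360.0 * period_ps(freq_ghz)


FREQ_GHZ = 3.0  # upper end of the reported operating range

# Bounds of the corrected clock reported in the text.
duty_bounds_ps = [duty_error_ps(d, FREQ_GHZ) for d in (49.4, 50.8)]
phase_bounds_ps = [phase_error_ps(p, FREQ_GHZ) for p in (89.2, 91.1)]

print("duty-cycle error bounds (ps):", [round(x, 2) for x in duty_bounds_ps])
print("quadrature error bounds (ps):", [round(x, 2) for x in phase_bounds_ps])
```

With a period of about 333ps at 3GHz, the worst-case duty-cycle error of 0.8% corresponds to roughly 2.7ps and the worst-case quadrature error of 1.1° to roughly 1.0ps.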

Third, we have presented a 4:1 overlapped time-division multiplexing driver combined with a serializer timing adjuster. The 4:1 serialization is performed by the overlapped time-division multiplexing driver containing four unit drivers. Two of the four unit drivers operate simultaneously to output two identical 1UI full-rate DQ signals, and these signals are merged. This reduces the output capacitance of the output driver. The serializer timing adjuster corrects any misalignment in the phases of the clock and data signals, to ensure an adequate timing margin for final serialization. Our driver with the serializer timing adjuster has been incorporated in the quarter-rate transmitter. The transmitter achieves a single-ended output swing of 400 to 600mV<sub>pp</sub>, and its energy efficiency is 1.8pJ/bit at 12.8Gb/s.

Finally, we have presented a 16Gb/s quarter-rate transmitter with mixed equalization for memory interfaces. It combines 1-tap pull-up amplitude equalization and 4-tap pull-down phase equalization. This compensates for channel losses without a significant cost in terms of power consumption, because there is no current consumption in pull-down data transmission and the relaxed impedance matching saves current in pull-up data transmission. A prototype chip fabricated in a 65nm CMOS process performed single-ended signaling at a data-rate of 16Gb/s over a channel with a loss of -14.7dB. At 16Gb/s, its energy efficiency is 1.04pJ/bit and its figure-of-merit is 0.070pJ/bit/dB. The quarter-rate architecture, single-ended signaling, and mixed equalization make our transmitter suitable for memory interfaces.
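As a reminder of what the final 4:1 serialization has to do, the sketch below is a purely behavioral model of a generic quarter-rate serializer driven by four quadrature clock phases. It is not a model of the overlapped time-division multiplexing driver itself: the rule that lane k is selected while phase k is high and phase k+1 is low is a common textbook choice assumed here, and all function names are hypothetical.

```python
from typing import List, Sequence


def phase_is_high(phase_index: int, ui_index: int) -> bool:
    """Ideal quarter-rate clock phase (0, 90, 180, 270 degrees for index 0..3).

    One clock period spans 4UI; each phase is high for 2UI (50% duty-cycle)
    and successive phases are shifted by 1UI.
    """
    return (ui_index - phase_index) % 4 < 2


def selected_lane(ui_index: int) -> int:
    """Lane whose 1UI window is open: phase k high and phase k+1 low."""
    for k in range(4):
        if phase_is_high(k, ui_index) and not phase_is_high((k + 1) % 4, ui_index):
            return k
    raise RuntimeError("ideal quadrature phases always select exactly one lane")


def serialize_4to1(lanes: Sequence[Sequence[int]]) -> List[int]:
    """Interleave four quarter-rate lanes into one full-rate bit stream."""
    if len(lanes) != 4 or len({len(lane) for lane in lanes}) != 1:
        raise ValueError("expected four lanes of equal length")
    n_ui = 4 * len(lanes[0])
    return [lanes[selected_lane(ui)][ui // 4] for ui in range(n_ui)]


# Two quarter-rate words per lane give eight full-rate bits.
lanes = [[1, 0], [0, 0], [1, 1], [0, 1]]
stream = serialize_4to1(lanes)
print(stream)
```

Because each 1UI window is bounded by edges of two different clock phases, any duty-cycle or quadrature phase error changes the window width directly, which is why the timing of the four phases matters for final serialization.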

## **BIBLIOGRAPHY**

- [1.1.1] M. Kim, "Design of low power memory controller with adaptive eye detection algorithm," Ph.D. dissertation, Department of Electrical Engineering and Computer Science, Seoul National University, Seoul, Korea, 2017.
- [1.1.2] J.-H. Chae, M. Kim, H. Ko, Y. Jeong, J. Park, G.-M. Hong, D.-K. Jeong, and S. Kim, "266-2133MHz phase shifter using all-digital delay-locked loop and triangular-modulated phase interpolator for LPDDR4X interface," *Electronics Letters*, vol. 53, no. 12, pp. 766-768, Jun. 2017.
- [1.1.3] J.-H. Kim, D. Oh, R. Kollipara, J. Wilson, S. Best, T. Giovannini, I. Shaeffer, M. Ching, and C. Yuan, "Challenges and solutions for next generation main memory systems," in *IEEE 18th Conference of Electrical Performance of Electronic Packaging and Systems*, Oct. 2009, pp. 93–96.
- [1.1.4] K. Sohn, T. Na, I. Song, Y. Shim, W. Bae, S. Kang, D. Lee, H. Jung, S. Hyun, H. Jeong, K.-W. Lee, J.-S. Park, J. Lee, B. Lee, I. Jun, J. Park, J. Park, H. Choi, S. Kim, H. Chung, Y. Choi, D.-H. Jung, B. Kim, J.-H. Choi, S.-J. Jang, C.-W. Kim, J.-B. Lee, and J. S. Choi, "A 1.2 V 30 nm 3.2 Gb/s/pin 4 Gb DDR4 SDRAM with dual-error detection and PVT-tolerant data-fetch scheme," *IEEE Journal of Solid-State Circuits*, vol. 48, no. 1, pp. 168–177, Jan. 2013.
- [1.1.5] J. Eble, M. Li, and W. Beyene, "An implementer's guide to low-power and high-performance memory solutions," in *DesignCon*, Jan. 2014.
- [1.1.6] W.-Y. Shin, "An impedance-matched bidirectional multi-drop memory interface," Ph.D. dissertation, Department of Electrical Engineering and Computer Science, Seoul National University, Seoul, Korea, 2013.
- [1.1.7] J.-H. Chae, G.-M. Hong, J. Park, M. Kim, H. Ko, W.-Y. Shin, H. Chi, D.-K. Jeong, and S. Kim, "A 1.74mW/GHz 0.11-2.5GHz fast-locking, jitter-reducing, 180° phase-shift digital DLL with a window phase detector for LPDDR4 memory controllers," in *IEEE Asian Solid-State Circuits Conf. (ASSCC)*, Nov. 2015, pp. 109-112.

- [1.1.8] H. Lee, K.-Y. K. Chang, J.-H. Chun, T. Wu, Y. Frans, B. Leibowitz, N. Nguyen, T. J. Chin, K. Kaviani, J. Shen, Z. Shi, W. T. Beyene, S. Li, R. Navid, M. Aleksic, F. S. Lee, F. Quan, J. Zerbe, R. Perego, and F. Assaderaghi, "A 16Gb/s/Link, 64GB/s bidirectional asymmetric memory interface," *IEEE Journal of Solid-State Circuits*, vol. 44, no. 4, pp. 1235–1247, Apr. 2009.
- [1.1.9] R. Inti, M. Mansuri, J. Kennedy, H. Venkatram, C.-M. Hsu, A. Martin, J. Jaussi, and B. Casper, "A digital-intensive 2-to-9.2Gb/s/pin memory controller I/O with fast-response LDO in 10nm CMOS," in *Symposium on VLSI Circuits*, Jun. 2018, pp. 151-152.
- [1.1.10] S. Lee, J. Seo, K. Lim, J. Ko, J.-Y. Sim, H.-J. Park, and B. Kim, "A 7.8Gb/s/pin 1.96pJ/b compact single-ended TRX and CDR with phase-difference modulation for highly reflective memory interfaces," in *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers*, Feb. 2018, pp. 272-274.
- [1.2.1] C.-K. Lee, M. Ahn, D. Moon, K. Kim, Y.-J. Eom, W.-Y. Lee, J. Kim, S. Yoon, B. Choi, S. Kwon, J.-Y. Park, S.-J. Bae, Y.-C. Bae, J.-H. Choi, S.-J. Jang, and G. Jin, "A 6.4Gb/s/pin at sub-1V supply voltage TX-interleaving technique for mobile DRAM interface," in *Symposium on VLSI Circuits*, Jun. 2015, pp. 182-183.
- [1.2.2] W.-H. Shin, Y.-H. Jun, and B.-S. Kong, "A DFE receiver with equalized VREF for multidrop single-ended signaling," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 60, no. 7, pp. 412-416, Jul. 2013.
- [1.2.3] X. Zheng, C. Zhang, F. Lv, F. Zhao, S. Yuan, S. Yue, Z. Wang, F. Li, Z. Wang, and H. Jiang, "A 40-Gb/s quarter-rate SerDes transmitter and receiver chipset in 65-nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 52, no. 11, pp. 2963-2978, Nov. 2017.
- [1.2.4] J. Kim, A. Balankutty, A. Elshazly, Y.-Y. Huang, H. Song, K. Yu, and F. O'Mahony, "A 16-to-40Gb/s quarter-rate NRZ/PAM4 dual-mode transmitter in 14nm CMOS," in *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers*, Feb. 2015, pp. 60-62.

- [1.2.5] M. Brox, M. Balakrishnan, M. Broschwitz, C. Chetreanu, S. Dietrich, F. Funfrock, M. A. Gonzalez, T. Hein, E. Huber, D. Lauber, M. Ivanov, M. Kuzmenka, C. N. Mohr, J. O. Garrido, S. Padaraju, S. Piatkowski, J. Pottgiesser, P. Pfefferl, M. Plan, J. Polney, S. Rau, M. Richter, R. Schneider, R. O. Seitter, W. Spirkl, M. Walter, J. Weller, and F. Vitale, "An 8-Gb 12-Gb/s/pin GDDR5X DRAM for cost-effective high-performance applications," *IEEE Journal of Solid-State Circuits*, vol. 53, no. 1, pp. 134-143, Jan. 2018.
- [1.2.6] J.-H. Chae, H. Ko, J. Park, and S. Kim, "A 12.8Gb/s quarter-rate transmitter using a 4:1 overlapped multiplexing driver combined with an adaptive clock phase aligner," *IEEE Transactions on Circuits and Systems II: Express Briefs*, to be published, doi: 10.1109/TCSII.2018.2858810.
- [1.2.7] Y. Kim, K. Song, D. Kim, and S. Cho, "A 2.3-mW 0.01-mm<sup>2</sup> 1.25-GHz quadrature signal corrector with 1.1-ps error for mobile DRAM interface in 65nm CMOS," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 64, no. 4, pp. 397-401, Apr. 2017.
- [1.2.8] S. Chen, H. Li, and P. Y. Chiang, "A robust energy/area-efficient forwarded-clock receiver with all-digital clock and data recovery in 28-nm CMOS for high-density interconnects," *IEEE Transactions on Very Large Scale Integration Systems*, vol. 24, no. 2, pp. 578-586, Feb. 2016.
- [1.2.9] I. Raja, V. Kharti, Z. Zahir, and G. Banerjee, "A 0.1-2-GHz quadrature correction loop for digital multiphase clock generation circuits in 130-nm CMOS," *IEEE Transactions on Very Large Scale Integration Systems*, vol. 25, no. 3, pp. 1044-1053, Mar. 2017.
- [1.2.10] H. Kang, K. Ryu, D.-H. Jung, D. Lee, W. Lee, S. Kim, J. Choi, and S.-O. Jung, "Process variation tolerant all-digital 90° phase shift DLL for DDR3 interface," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 59, no. 10, pp. 2186-2196, Oct. 2012.
- [1.2.11] K. Ryu, D.-H. Jung, and S.-O. Jung, "Process-variation-calibrated multiphase delay locked loop with a loop-embedded duty-cycle corrector," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 61, no. 1, pp. 1-5, Jan. 2014.

- [1.2.12] J. Cho and Y.-J. Min, "An all-digital duty-cycle and phase-skew correction circuit for QDR DRAMs," *IEICE Electronics Express*, vol. 15, no. 9, pp. 1-6, May 2018.
- [1.2.13] C.-W. Tsai, Y.-T. Chiu, and K.-H. Cheng, "A wide-range all-digital delay-locked loop for double data rate synchronous dynamic random access memory application," in *IEEE International Symposium on Circuits and Systems (ISCAS)*, May 2018.
- [1.2.14] C.-H. Jeong, A. Abdullah, Y.-J. Min, I.-C. Hwang, and S.-W. Kim, "All-digital duty-cycle corrector with a wide duty correction range for DRAM applications," *IEEE Transactions on Very Large Scale Integration Systems*, vol. 24, no. 1, pp. 363-367, Jan. 2016.
- [1.2.15] S.-G. Kim, T. Kim, D.-H. Kwon, and W.-Y. Choi, "A 5-8Gb/s low-power transmitter with 2-tap pre-emphasis based on toggling serialization," in *IEEE Asian Solid-State Circuits Conf. (ASSCC)*, Nov. 2016, pp. 249-252.
- [1.2.16] K. Huang, Z. Wang, X. Zheng, C. Zhang, and Z. Wang, "An 80mW 40Gb/s transmitter with automatic serializing time window search and 2-tap pre-emphasis in 65nm CMOS technology," *IEEE Trans. on Circuits and Systems I: Regular Papers*, vol. 62, no. 5, pp. 1441-1450, May 2015.
- [1.2.17] W. Bae, H. Ju, K. Park, J. Han, and D.-K. Jeong, "A supply-scalable-serializing transmitter with controllable output swing and equalization for next-generation standards," *IEEE Trans. on Industrial Electronics*, vol. 65, no. 7, pp. 5979-5989, Jul. 2018.
- [1.2.18] Y.-H. Song, R. Bai, K. Hu, H.-W. Yang, P. Y. Chiang, and S. Palermo, "A 0.47-0.66pJ/bit, 4.8-8Gb/s I/O transceiver in 65nm CMOS," *IEEE J. Solid-State Circuits*, vol. 48, no. 5, pp. 1276-1288, May 2013.
- [1.2.19] T. O. Dickson, Y. Liu, S. V. Rylov, A. Agrawal, S. Kim, P.-H. Hsieh, J. F. Bulzacchelli, M. Ferriss, H. A. Ainspan, A. Rylyakov, B. D. Parker, M. P. Beakes, C. Baks, L. Shan, Y. Kwark, J. A. Tierno, and D. J. Friedman, "A 1.4pJ/bit, power-scalable 16x12 Gb/s source-synchronous I/O with DFE receiver in 32nm SOI CMOS technology," *IEEE J. Solid-State Circuits*, vol. 50, no. 8, pp. 1917-1931, Aug. 2015.

- [1.2.20] B. Raghavan, D. Cui, U. Singh, H. Maarefi, D. Pi, A. Vasani, Z. C. Huang, V. Carli, A. Momtaz, and J. Cao, "A sub-2 W 39.8-44.6 Gb/s transmitter and receiver chipset with SFI-5.2 interface in 40 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 48, no. 12, pp. 3219-3228, Dec. 2013.
- [1.2.21] C. Gimeno, E. Guerrero, C. S.-Azqueta, G. Royo, C. Aldea, and S. Celma, "Continuous-time linear equalizer for multigigabit transmission through SI-POF in factory area networks," *IEEE Trans. on Industrial Electronics*, vol. 62, no. 10, pp. 6530-6532, Oct. 2015.
- [1.2.22] J. Aguirre, D. Bol, D. Flandre, C. S.-Azqueta, and S. Celma, "A robust 10-Gb/s duobinary transceiver in 0.13-µm SOI CMOS for short-haul optical networks," *IEEE Trans. on Industrial Electronics*, vol. 65, no. 2, pp. 1518-1525, Feb. 2018.
- [1.2.23] J. E. Proesel and T. O. Dickson, "A 20-Gb/s, 0.66-pJ/bit serial receiver with 2-stage continuous-time linear equalizer and 1-tap decision feedback equalizer in 45nm SOI CMOS," in *Symp. on VLSI Circuits*, pp. 206-207, Jun. 2011.
- [1.2.24] M. Kossel, T. Toifl, P. A. Francese, M. Brandli, C. Menolfi, P. Buchmann, L. Kull, T. M. Andersen, and T. Morf, "A 10Gb/s 8-tap 6b 2-PAM/4-PAM Tomlinson-Harashima precoding transmitter for future memory-link applications in 22-nm SOI CMOS," *IEEE J. Solid-State Circuits*, vol. 48, no. 12, pp. 3268-3284, Dec. 2013.
- [1.2.25] J.-H. Chae, M. Kim, G.-M. Hong, J. Park, and S. Kim, "A 3.2Gb/s 16-channel transmitter for intra-panel interfaces, with independently controllable output swing, common-mode voltage, and equalization," *IEEE Access*, vol. 6, pp. 78055-78064, Dec. 2018.
- [1.2.26] J. M. Wilson, W. J. Turner, J. W. Poulton, B. Zimmer, X. Chen, S. S. Kudva, S. Song, S. G. Tell, N. Nedovic, W. Zhao, S. R. Sudhakaran, C. T. Gray, and W. J. Dally, "A 1.17pJ/b 25Gb/s/pin ground-referenced single-ended serial link for off- and on-package communication in 16nm CMOS using a process- and temperature-adaptive voltage regulator," in *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers*, pp. 276-278, Feb. 2018.

- [1.2.27] K.-S. Kwak, J.-H. Ra, H.-S. Moon, S.-K. Hong, and O.-K. Kwon, "A low-power two-tap voltage-mode transmitter with precisely matched output impedance using an embedded calibration circuit," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 63, no. 6, pp. 573-577, Jun. 2016.
- [1.2.28] A. Ramachandran, A. Natarajan, and T. Anand, "A 16Gb/s 3.6pJ/b Wireline Transceiver with Phase Domain Equalization Scheme: Integrated Pulse Width Modulation (iPWM) in 65nm CMOS," in *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers*, pp. 488-490, Feb. 2017.
- [1.2.29] W.-J. Su and S.-I. Liu, "A 5Gb/s Voltage-Mode Transmitter Using Adaptive Time-Based De-Emphasis," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 64, no. 4, pp. 959-968, Apr. 2017.
- [1.2.30] A. Singh, D. Carnelli, A. Falay, K. Hofstra, F. Licciardello, K. Salimi, H. Santos, A. Shokrollahi, R. Ulrich, C. Walter, J. Fox, P. Hunt, J. Keay, R. Simpson, A. Stewart, G. Surace, and H. Cronie, "A pin- and power-efficient low-latency 8-to-12Gb/s/wire 8b8w-coded SerDes link for high-loss channels in 40nm technology," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Digest Tech. Papers*, pp. 442-444, Feb. 2014.
- [1.2.31] A. Shokrollahi, D. Carnelli, J. Fox, K. Hofstra, B. Holden, A. Hormati, P. Hunt, M. Johnston, J. Keay, S. Pesenti, R. Simpson, D. Stauffer, A. Stewart, G. Surace, A. Tajalli, O. T. Amiri, A. Tschank, R. Ulrich, C. Walter, F. Licciardello, Y. Mogentale, and A. Singh, "A pin-efficient 20.83Gb/s/wire 0.94pJ/bit forwarded clock CNRZ-5-Coded SerDes up to 12mm for MCM packages in 28nm CMOS," in *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers*, pp. 182-184, Feb. 2016.
- [1.2.32] H. Cheng and A. C. Carusone, "A 32/16Gb/s 4/2-PAM transmitter with PWM pre-emphasis and 1.2Vpp per side output swing in 0.13-μm CMOS," in *IEEE Custom Integrated Circuits Conf. (CICC)*, DOI:10.1109/CICC.2008.4672165, pp. 635-638, Sep. 2008.

- [1.2.33] S. Saxena, R. K. Nandwana, and P. K. Hanumolu, "A 5Gb/s energy-efficient voltage-mode transmitter using time-based de-emphasis," *IEEE Journal of Solid-State Circuits*, vol. 49, no. 8, pp. 1827-1836, Aug. 2014.
- [1.2.34] J. F. Buckwalter, M. Meghelli, D. J. Friedman, and A. Hajimiri, "Phase and amplitude pre-emphasis techniques for low-power serial links," *IEEE Journal of Solid-State Circuits*, vol. 41, no. 6, pp. 1391-1399, Jun. 2006.
- [1.2.35] A. Ramachandran and T. Anand, "A 0.5-to-0.9V, 3-to-16Gb/s, 1.6-to-3.1pJ/b wireline transceiver equalizing 27dB loss at 10Gb/s with clock-domain encoding using integrated pulse-width modulation (iPWM) in 65nm CMOS," in *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers*, pp. 268-270, Feb. 2018.
- [1.2.36] J. Park, J.-H. Chae, Y.-U. Jeong, J.-W. Lee, and S. Kim, "A 2.1-Gb/s 12-channel transmitter with phase emphasis embedded serializer for 55-in UHD intra-panel interface," *IEEE J. Solid-State Circuits*, vol. 53, no. 10, pp. 2878-2888, Oct. 2018.
- [2.1.1] C.-K. Lee, Y.-J. Eom, J.-H. Park, J. Lee, H.-R. Kim, K. Kim, Y. Choi, H.-J. Chang, J. Kim, J.-M. Bang, S. Shin, H. Park, S. Park, Y.-R. Choi, H. Lee, K.-H. Jeon, J.-Y. Lee, H.-J. Ahn, K.-H. Kim, J.-S. Kim, S. Chang, H.-R. Hwang, D. Kim, Y.-H. Yoon, S.-H. Hyun, J.-Y. Park, Y.-G. Song, Y.-S. Park, H.-J. Kwon, S.-J. Bae, T.-Y. Oh, I.-D. Song, Y.-C. Bae, J.-H. Choi, K.-I. Park, S.-J. Jang, and G.-Y. Jin, "A 5Gb/s/pin 8Gb LPDDR4X SDRAM with Power-Isolated LVSTL and Split-Die Architecture with 2-Die ZQ Calibration Scheme," in *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers*, pp. 390-392, Feb. 2017.
- [3.2.1] Y. Zhang, W. Rhee, T. Kim, H. Park, and Z. Wang, "A 0.35-0.5-V 18-152MHz digitally controlled relaxation oscillator with adaptive threshold calibration in 65-nm CMOS," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 62, no. 8, pp. 736-740, Aug. 2015.

- [3.2.2] Y. Tokunaga, S. Sakiyama, A. Matsumoto, and S. Dosho, "An on-chip CMOS relaxation oscillator with power averaging feedback using a reference proportional to supply voltage," in *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers*, Feb. 2009, pp. 404–405.
- [3.2.3] J. Park, G.-M. Hong, M. Kim, J.-H. Chae, and S. Kim, "A 0.13pJ/bit, referenceless transceiver with clock edge modulation for a wired intra-BAN communication," in *IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)*, Jul. 2017, pp. 1-6.
- [3.4.1] J. Yin, P.-I. Mak, F. Maloberti, and R. P. Martins, "A time-interleaved ring-VCO with reduced 1/f<sup>3</sup> phase noise corner, extended tuning range and inherent divided output," *IEEE Journal of Solid-State Circuits*, vol. 51, no. 12, pp. 2979-2016, Dec. 2016.
- [3.5.1] J.-H. Chae, M. Kim, G.-M. Hong, J. Park, H. Ko, W.-Y. Shin, H. Chi, D.-K. Jeong, and S. Kim, "0.11-2.5GHz all-digital DLL for mobile memory interface with phase sampling window adaptation to reduce jitter accumulation," *Journal of Semiconductor Tech. and Sci.*, vol. 17, no. 3, pp. 411-424, Jun. 2017.
- [4.1.1] H. Jiang, D. Li and W. Li, "Performance analysis of overlapped multiplexing techniques," in *IEEE 3rd International Workshop on Signal Design and Its Applications in Communications*, pp. 233-237, Sep. 2007.
- [4.1.2] K. Fukuda, H. Yamashita, G. Ono, R. Nemoto, E. Suzuki, N. Masuda, T. Takemoto, F. Yuki, and T. Saito, "A 12.3-mW 12.5-Gb/s complete transceiver in 65-nm CMOS process," *IEEE J. Solid-State Circuits*, vol. 45, no. 12, pp. 2838-2849, Dec. 2010.
- [4.2.1] C.-K. Lee, Y.-J. Eom, J.-H. Park, J. Lee, H.-R. Kim, K. Kim, Y. Choi, H.-J. Chang, J. Kim, J.-M. Bang, S. Shin, H. Park, S. Park, Y.-R. Choi, H. Lee, K.-H. Jeon, J.-Y. Lee, H.-J. Ahn, K.-H. Kim, J.-S. Kim, S. Chang, H.-R. Hwang, D. Kim, Y.-H. Yoon, S.-H. Hyun, J.-Y. Park, Y.-G. Song, Y.-S. Park, H.-J. Kwon, and S.-J. Bae, "A 5Gb/s/pin 8Gb LPDDR4X SDRAM with Power-Isolated LVSTL and Split-Die Architecture with 2-Die ZQ Calibration Scheme," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Digest Tech. Papers*, pp. 390-392, Feb. 2017.

- [4.3.1] J.-H. Chae, H. Ko, J. Park, and S. Kim, "A quadrature clock corrector for DRAM interfaces, with a duty-cycle and quadrature phase detector based on a relaxation oscillator," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, to be published, doi: 10.1109/TVLSI.2018.2883730.
- [5.1.1] S. Wang and E. Ipek, "Reducing data movement energy via online data clustering and encoding," in *IEEE/ACM Int. Symp. on Microarchitecture (MICRO)*, pp. 32, Oct. 2016.
- [5.1.2] M. Choi, S. Lee, M. Lee, J.-H. Lee, J.-Y. Sim, H.-J. Park, and B. Kim, "An FFE transmitter which automatically and adaptively relaxes impedance matching," *IEEE J. Solid-State Circuits*, vol. 53, no. 6, pp. 1780-1792, Jun. 2018.
- [6.3.1] K.-Y. Chen, W.-Y. Chen, and S.-I. Liu, "A 0.31-pJ/bit 20-Gb/s DFE with 1 discrete tap and 2 IIR filters feedback in 40-nm-LP CMOS," *IEEE Trans. on Circuits and Systems II: Express Briefs*, vol. 64, no. 11, pp. 1282-1286, Nov. 2017.
- [6.3.2] E. Kim and T. Oh, "Single-ended 2 ch.×3.4 Gbit/s dual-mode near-ground transmitter IO driver in 45nm CMOS process," *Electron. Lett.*, vol. 53, no. 5, pp. 308-310, Mar. 2017.

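## KOREAN ABSTRACT

In this work, a quarter-rate transmitter using single-ended signaling for memory interfaces has been presented. As the demand for higher memory bandwidth grows, we have made the following points to raise the data-rate per pin.

First, we have adopted a quarter-rate architecture because, compared with full-rate and half-rate designs, it has a more relaxed timing margin on its high-speed path and lower simultaneous switching noise, power consumption, and clock frequency.

Second, we have proposed a quadrature clock corrector that uses relaxation oscillators to detect duty-cycle and quadrature phase errors by converting them into pairs of frequencies, which are then digitized and compared. It achieves good detection accuracy and can detect a wide range of duty-cycle and quadrature phase errors. The prototype is implemented in a 55nm CMOS process with a supply voltage of 1.2V and occupies an area of 0.003mm<sup>2</sup>. The measurement results show that the operation range is from 1GHz to 3GHz, the power efficiency is 0.79mW/GHz, the maximum duty-cycle error is 0.8% at 3GHz, and the maximum quadrature phase error is 1.1° at 3GHz.

Third, we have presented a 4:1 overlapped time-division multiplexing driver combined with a serializer timing adjuster. The final 4:1 serialization required in a quarter-rate transmitter is performed by the overlapped time-division multiplexing driver, which contains four unit drivers. Two of the four unit drivers output two identical 1UI full-rate DQ signals simultaneously, and these signals are merged during final serialization. This reduces the output capacitance. Correct timing of this serialization process is maintained by adaptive alignment of the four phases of the clock signal. A prototype incorporated in a 12.8Gb/s quarter-rate transmitter has been implemented in a 55nm CMOS technology. The single-ended output swing of this transmitter is 400 to 600mV<sub>pp</sub>, and its energy efficiency is 1.8pJ/bit.

Finally, we combine the advantages of 1-tap pull-up amplitude equalization and 4-tap pull-down phase equalization to compensate for channel loss without a significant increase in power consumption. This scheme has been incorporated in a quarter-rate transmitter for memory interfaces. Our prototype, fabricated in a 65nm CMOS process, can transmit single-ended signals at a data-rate of 16Gb/s over a channel with a loss of -14.7dB. Using the two equalization schemes, our design achieves an energy efficiency of 1.04pJ/bit.

**Keywords**: memory interface, quarter-rate transmitter, quadrature clock corrector, overlapped time-division multiplexing driver, output driver, equalization

Student Number: 2012-20870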