True Single-Phase Adiabatic Circuitry

Suhwan Kim, Student Member, IEEE, and Marios C. Papaefthymiou, Member, IEEE

Abstract—Dynamic logic families that rely on energy recovery to achieve low energy dissipation control the flow of data through gate cascades using multiphase clocks. Consequently, they typically use multiple clock generators and can exhibit increased energy consumption on their clock distribution networks. Moreover, they are not attractive for high-speed design due to their high complexity and clock skew management problems. In this paper, we present TSEL, the first energy-recovering (a.k.a. adiabatic) logic family that operates with a single-phase sinusoidal clocking scheme. We also present SCAL, a source-coupled variant of TSEL with improved supply voltage scalability and energy efficiency. Optimal performance under any operating conditions is achieved in SCAL using a tunable current source in each gate. TSEL and SCAL outperform previous adiabatic logic families in terms of energy efficiency and operating speed. In layout-based simulations with 0.5 μm standard CMOS process parameters, 8-bit carry-lookahead adders (CLAs) in TSEL and SCAL function correctly for operating frequencies exceeding 200 MHz. In comparison with corresponding CLAs in alternative logic styles that operate at minimum supply voltages, CLAs designed in our single-phase adiabatic logic families are more energy efficient across a broad range of operating frequencies. Specifically, for clock rates ranging from 10 to 200 MHz, our 8-bit SCAL CLAs are 1.5 to 2.5 times more energy efficient than corresponding adders developed in PAL and 2N2P and 2.0 to 5.0 times less dissipative than their purely combinational or pipelined CMOS counterparts.

Index Terms—Adiabatic circuits, carry-lookahead adder, dynamic circuits, energy recovery logic, low-energy computing, true single-phase clocking.

I. INTRODUCTION

Energy recovery is a promising approach to the design of VLSI circuits with extremely low energy dissipation. Energy-recovering circuits achieve reduced energy consumption by steering currents across devices with low voltage differences and by gradually recycling the energy stored in their capacitive loads [1]–[5]. They are thus often referred to as adiabatic circuits, due to the resemblance of their operation to adiabatic state changes in physical systems. In general, adiabatic circuits that operate very efficiently at low operating frequencies stop functioning at high data rates. Conversely, adiabatic circuits that can function correctly across a broad range of operating frequencies tend to be dissipative at low frequencies. These circuit families rely on multiphase clocks to control cascaded gates. They are consequently unattractive for high-speed and low-power system design due to the many design problems associated with multiphase clocking, such as increased energy dissipation, layout complexity in clock distribution, clock skew, and multiple power-clock generators.

Various adiabatic circuit families have been proposed over the past few years. They all use at least two power-clock phases to control gate cascades, however. A scheme with asymptotically zero dissipation that requires reversible computations was described in [3]. Several relatively simple adiabatic logic styles that use diodes to avoid reversibility were proposed in [6]–[9]. Circuit families that use a pair of cross-coupled transistors to adiabatically charge and discharge their loads were introduced and evaluated in [10]–[14], including 2N2P, 2N-2N2P, pass-transistor adiabatic logic (PAL), and clocked CMOS adiabatic logic (CAL).

This paper presents the first ever true single-phase adiabatic circuit family that is geared toward high-speed and low-energy VLSI design. The simplest member of our family is TSEL. In TSEL, high speed operation is ensured by nonadiabatically activating a pair of cross-coupled transistors and by adjusting two dc reference voltages. Low energy consumption is achieved by adiabatically charging and discharging capacitive loads through the cross-coupled transistors. TSEL cascades are implemented straightforwardly in an NP-domino style.

The second member of our family is SCAL, an enhancement of TSEL with improved scalability and energy efficiency. SCAL is a source-coupled adiabatic logic that achieves lower energy dissipation than TSEL for a broad range of operating frequencies. SCAL gates are derived from TSEL gates by replacing each dc reference voltage line by a current source. Each current source can be individually tuned by transistor sizing to achieve optimal nonadiabatic charging/discharging rates for its operating conditions. Both TSEL and SCAL gates are dual-rail and always present a balanced load to the clock generator, regardless of the particular data computed. Moreover, they are both functionally complete.

We have designed and simulated in HSPICE a variety of arithmetic circuits, including carry-lookahead adders (CLAs) and Hadamard transform (HT) modules for wireless communication [15], [16]. This paper describes our findings with a collection of 8-bit CLAs that we designed in TSEL, SCAL, other adiabatic logic styles, and static CMOS. In layout-based simulations with 0.5 μm standard CMOS process parameters, the 8-bit CLAs in TSEL and SCAL function correctly for operating frequencies exceeding 200 MHz with minimum-size transistors used in the evaluation trees of each gate. In comparison with corresponding adders in alternative logic styles and minimum possible supply voltages, TSEL and SCAL are more energy efficient across a broad range of operating frequencies. Specifically, for clock frequencies ranging from 10 to 200 MHz, our 8-bit SCAL CLAs are 1.5 to 2.5 times more energy efficient than corresponding adders developed in PAL and 2N2P and 2.0 to 5.0 times less dissipative than their purely combinational or pipelined static CMOS counterparts. For clock frequencies ranging from 100 to 200 MHz, TSEL adders are as energy efficient as their corresponding designs in SCAL.

The remainder of this paper has six sections. In Section II, we review the adiabatic logic families most closely related to TSEL and SCAL. The structure and operation of TSEL gates and cascades are described in Section III. The structure and operation of SCAL gates and cascades are presented in Section IV. Our adder designs are described in Section V. Section VI presents simulation results from layouts of our 8-bit carry-lookahead adders that were designed using a 0.5 μm standard CMOS technology. Our contributions and ongoing research are summarized in Section VII.

II. OVERVIEW OF ADIABATIC LOGIC FAMILIES

This section highlights the operation of static CMOS and several adiabatic logic families related to TSEL and SCAL. We focus on the characteristics of the adiabatic logic families 2N2P, 2N-2N2P, PAL, and CAL that use a cross-coupled transistor structure for adiabatic operation similar to TSEL and SCAL.

The dominant factor in the dissipation of static CMOS logic is the power required to charge capacitive nodes. In the CMOS inverter of Fig. 1, when the power supply drives the output node high, the voltage \( V_{dd} \) is applied abruptly resulting in a high voltage drop across the PMOS switching device \( MP \). The total energy drawn from the power supply is \( E_c = CV_{dd}^2 \), where \( C \) is the capacitance of the output node and \( V_{dd} \) is the supply voltage. Half of this energy is dissipated on the on-resistance \( R \) of the device \( MP \), and the other half is stored in the output capacitance. When the output is driven low at a later cycle, the energy stored in the output capacitor is dissipated on the on-resistance \( R \) of the device \( MN \).
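The equal split between dissipated and stored energy described above does not depend on the value of the on-resistance. As a minimal numerical sketch (the component values and the function name `step_charge_dissipation` are illustrative assumptions, not taken from the paper), the following Python code integrates \( i^2 R \) over the step response of an ideal series RC circuit so that the result can be compared with \( \frac{1}{2} C V_{dd}^2 \):

```python
import math


def step_charge_dissipation(r_ohm, c_farad, vdd, steps=20000):
    """Energy dissipated in R when an ideal step of height vdd charges C.

    Uses the textbook RC step response i(t) = (vdd / R) * exp(-t / RC) and a
    midpoint-rule integration of i^2 * R over 20 time constants.
    """
    tau = r_ohm * c_farad
    dt = 20.0 * tau / steps
    energy = 0.0
    for n in range(steps):
        t = (n + 0.5) * dt
        current = (vdd / r_ohm) * math.exp(-t / tau)
        energy += current * current * r_ohm * dt
    return energy


if __name__ == "__main__":
    c_load = 100e-15  # assumed 100 fF output node
    vdd = 3.3         # assumed supply voltage
    for r_on in (1e3, 10e3):  # two assumed on-resistances
        dissipated = step_charge_dissipation(r_on, c_load, vdd)
        print(r_on, dissipated, 0.5 * c_load * vdd ** 2)
```

Under these assumptions, both resistances give essentially the same dissipated energy, which is why reducing the on-resistance alone does not lower the switching energy of a conventional CMOS gate.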

In contrast to conventional CMOS circuits, adiabatic circuits charge or discharge capacitors while striving to keep the potential drop across their switching devices small. Thus, only a fraction of the energy supplied during each cycle is dissipated on their resistive components. The bulk of the supplied energy is returned to the power supply and can be reused in subsequent cycles. Typically, power supplies for adiabatic circuits provide a time-varying periodic output signal, or power-clock, that swings gradually between 0 V and \( V_{dd} \). The period of this signal is long enough to maintain a small potential drop \( V_{R} \) across the on-resistance of the switching devices. A simple form of adiabatic charging can be accomplished using a power supply with a ramp output. Such a signal can be approximated using a resonant \( RLC \) oscillating structure. Other adiabatic generator designs have been presented in [3], [17], [18]. The number of power-clocks required to control cascaded gates is an important consideration in adiabatic logic design since it affects energy dissipation, operating speed, and design complexity.
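To make the benefit of ramp charging concrete, the sketch below evaluates the closed-form dissipation of an ideal series RC circuit driven by a linear ramp and compares it with the well-known slow-ramp estimate \( (RC/T)\,CV^2 \). The circuit model, the function names, and the numerical values are assumptions chosen for illustration; the paper itself does not give this formula.

```python
import math


def ramp_dissipation(r_ohm, c_farad, v_swing, t_ramp):
    """Energy dissipated in R while a linear ramp of duration t_ramp charges C.

    Ideal series RC driven by v(t) = v_swing * t / t_ramp from a discharged
    capacitor. Only the ramp interval itself is counted.
    """
    tau = r_ohm * c_farad
    i_final = c_farad * v_swing / t_ramp  # charging current once settled
    integral = (t_ramp
                - 2.0 * tau * (1.0 - math.exp(-t_ramp / tau))
                + 0.5 * tau * (1.0 - math.exp(-2.0 * t_ramp / tau)))
    return r_ohm * i_final ** 2 * integral


def slow_ramp_estimate(r_ohm, c_farad, v_swing, t_ramp):
    """Asymptotic dissipation (RC / T) * C * V^2, valid when T >> RC."""
    return (r_ohm * c_farad / t_ramp) * c_farad * v_swing ** 2


if __name__ == "__main__":
    r_on, c_load, v = 1e3, 100e-15, 3.3  # assumed values
    for t_ramp in (1e-9, 10e-9, 100e-9):
        print(t_ramp,
              ramp_dissipation(r_on, c_load, v, t_ramp),
              slow_ramp_estimate(r_on, c_load, v, t_ramp),
              0.5 * c_load * v ** 2)  # conventional step charging
```

In this model the dissipation falls roughly in proportion to the ramp duration, whereas conventional step charging always dissipates \( \frac{1}{2} C V^2 \).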

Figs. 2 and 3 show inverters from 2N2P and 2N-2N2P, two related adiabatic logic families that have no diodes and use four-phase clocks to control cascades [11]. These families exhibit a nonadiabatic dissipation proportional to \( CV_{th}^2 \), where \( V_{th} \) is the threshold voltage of the PMOS devices in the pair of cross-coupled transistors. The nonadiabatic switching event occurs during a brief interval at the beginning of the evaluation phase and provides the voltage differential that activates a cross-coupled transistor structure similar to the one used in sample-set differential logic (SSDL) [19]. Both 2N2P and 2N-2N2P present a balanced capacitive load and possess speed characteristics superior to those of the other adiabatic families in this section. The primary advantage of 2N-2N2P over 2N2P is that the pair of cross-coupled NMOS switches results in nonfloating outputs.

Fig. 4 shows an inverter in PAL, an adiabatic logic family similar to 2N2P [13]. In PAL, the ground node of 2N2P is connected to the power supply in order to eliminate nonadiabatic energy consumption. PAL achieves fully adiabatic operation at the cost of high-speed operation. Cascaded PAL gates are controlled by a two-phase clock.

Fig. 5 shows an inverter designed in CAL, an adiabatic logic related to 2N-2N2P [14]. The main structural difference between CAL and 2N-2N2P is the path control switches in the pull-down tree. In CAL, cascaded structures are controlled by a single-phase clock and two auxiliary square-wave clocks. Thus, even though this logic is referred to as single-phase, its cascades are controlled by three waveforms. In terms of their operating speed, CAL gates are comparable to 2N-2N2P gates. CAL circuits achieve half the throughput of corresponding 2N-2N2P circuits, however, because they enable logic evaluation in alternate clock cycles. Moreover, CAL designs tend to be more dissipative than 2N-2N2P, due to their higher device count.

III. TRUE SINGLE-PHASE ENERGY-RECOVERING LOGIC

TSEL is a partially adiabatic circuit family akin to 2N2P, 2N-2N2P, and CAL. Power is supplied to TSEL gates by a single-phase sinusoidal power-clock. Cascades are composed of alternating PMOS and NMOS gates. Two dc reference voltages ensure high-speed and high-efficiency operation. They also enable the cascading of TSEL gates in an NP-domino style. This section describes the structure and operation of TSEL.

A. TSEL Gates

The basic structure of a TSEL PMOS gate is shown in the PMOS inverter of Fig. 6(a). This inverter comprises a pair of cross-coupled transistors (MP1 and MP2), a pair of current control switches (MP3 and MP4), and two function blocks (MP5 and MP6). The port PC supplies the sinusoidal power-clock $V_{PC}$. The port RP supplies a constant reference voltage $V_{RP}$ to the PMOS gate. Inputs and outputs are dual-rail encoded. The current control switches and the reference voltages are the structural characteristics that differentiate TSEL from other adiabatic logic families.

The operation of a TSEL PMOS gate has two phases: discharge ($D_P$) and evaluate ($E_P$). Fig. 7 shows these phases with respect to the power-clock $V_{PC}$. During $D_P$, the energy stored in the capacitance of node $out$ or $\overline{out}$ is recovered. In the beginning of this phase, $V_{PC}$ is high. As $V_{PC}$ starts ramping down toward low, it pulls both $V_{out}$ and $V_{\overline{out}}$ down toward the PMOS threshold voltage $|V_{thp}|$. This event is adiabatic until $V_{PC}$ drops below $V_{RP} - |V_{thp}|$.

The output of the gate is evaluated during $E_P$. Let us assume that $V_{in}$ is high and $V_{\overline{in}}$ is low. Initially, $V_{PC}$ is low. As $V_{PC}$ starts rising, MP1 and MP2 turn on. While $V_{PC} < V_{RP} - |V_{thp}|$, MP3 and MP4 are conducting. Since $V_{RP}$ exceeds $V_{PC}$, a pull-up path is created from $RP$ to $\overline{out}$, and the voltage at $\overline{out}$ starts rising toward $V_{RP}$. The pair of cross-coupled transistors MP1 and MP2 functions as a sense amplifier and boosts the voltage difference between the two output nodes. As soon as this difference exceeds $|V_{thp}|$, MP1 turns off and $\overline{out}$ is charged adiabatically from that point on. When $V_{PC} \geq V_{RP} - |V_{thp}|$, MP3 and MP4 turn off and disconnect the function blocks from the outputs $out$ and $\overline{out}$. Hence, any further changes in the inputs do not propagate to the outputs. Node $out$ stays at its discharged level throughout $E_P$, and $\overline{out}$ is charged up to the peak of $V_{PC}$ at the end of $E_P$. The output values are ready to be sampled near the peak of $V_{PC}$.

The TSEL inverter in Fig. 6(b) shows the basic structure of an NMOS gate in TSEL. This gate includes only NMOS devices that are interconnected in a manner identical to its corresponding PMOS gate. The reference voltage level $V_{RN}$ can be selected to maximize the gate’s energy efficiency. The gate will still function correctly, however, even if $V_{RN}$ is set equal to $V_{RP}$.

The operation of an NMOS gate is complementary to that of a PMOS gate. During each clock cycle, each NMOS gate goes through a charge ($C_N$) phase and an evaluate ($E_N$) phase. During $E_N$, a pull-down path is formed from an output to $RN$ as long as $V_{PC} > V_{RN} + V_{thn}$, where $V_{RN}$ and $V_{thn}$ are the reference voltage and the threshold voltage for NMOS gates, respectively. When $V_{PC} \leq V_{RN} + V_{thn}$, the current control switches disconnect the function blocks from the outputs, and the recovery of the energy stored in one of the output nodes is initiated. Any further changes to the inputs do not affect the final output values, which are sampled near the negative peak of $V_{PC}$.

Fig. 7. TSEL timing. $D_P$: PMOS discharge phase; $E_P$: PMOS evaluate phase; $C_N$: NMOS charge phase; $E_N$: NMOS evaluate phase; $H_P$, $H_N$: PMOS and NMOS output held stable; $A_P$, $A_N$: adiabatic switching for PMOS and NMOS; $N_P$, $N_N$: nonadiabatic switching for PMOS and NMOS.

Fig. 8. A four-stage pipeline of TSEL inverters with two dc reference voltages.
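The switching conditions for the current control switches can be sketched as simple predicates on the power-clock voltage. The code below assumes an ideal sinusoidal power-clock swinging between 0 V and $V_{dd}$; the function names and all numerical values are hypothetical and serve only to illustrate the inequalities $V_{PC} < V_{RP} - |V_{thp}|$ (PMOS gates) and $V_{PC} > V_{RN} + V_{thn}$ (NMOS gates).

```python
import math


def power_clock(phase_rad, vdd):
    """Assumed ideal sinusoidal power-clock swinging between 0 V and vdd."""
    return 0.5 * vdd * (1.0 + math.sin(phase_rad))


def pmos_switches_conduct(v_pc, v_rp, vthp_abs):
    """PMOS current control switches conduct while V_PC < V_RP - |V_thp|."""
    return v_pc < v_rp - vthp_abs


def nmos_switches_conduct(v_pc, v_rn, vthn):
    """NMOS current control switches conduct while V_PC > V_RN + V_thn."""
    return v_pc > v_rn + vthn


def conduction_fraction(condition, vdd, samples=3600):
    """Fraction of one clock period during which `condition(v_pc)` holds."""
    hits = 0
    for n in range(samples):
        if condition(power_clock(2.0 * math.pi * n / samples, vdd)):
            hits += 1
    return hits / samples


if __name__ == "__main__":
    vdd, v_rp, v_rn = 3.0, 2.0, 1.0  # assumed supply and reference voltages
    vthp_abs, vthn = 0.9, 0.7        # assumed threshold voltages
    print(conduction_fraction(lambda v: pmos_switches_conduct(v, v_rp, vthp_abs), vdd))
    print(conduction_fraction(lambda v: nmos_switches_conduct(v, v_rn, vthn), vdd))
```

With these assumed values the two conduction windows never overlap, and the interval in between is the part of the cycle in which both gate types have their function blocks disconnected from their outputs.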

B. TSEL Energetics

The reference voltage levels $V_{RP}$ and $V_{RN}$ enable the cascading of TSEL gates without any intermediate inverters. These voltages affect the energy dissipation of the TSEL structures, and their optimal values depend primarily on the operating frequency and the output loads of the gates. As $V_{RP}$ decreases, the duration of the nonadiabatic event $N_P$ in the PMOS gates becomes shorter and the voltage difference between the discharged output nodes and $RP$ decreases. Symmetrically, as $V_{RN}$ increases, the duration of the nonadiabatic event $N_N$ and the voltage difference between the charged output nodes and $RN$ decrease. Consequently, the voltage levels $V_{RP}$ and $V_{RN}$ control both the duration of the evaluation phase and the value of the charging or discharging current, thus affecting the energy dissipation associated with the nonadiabatic intervals during evaluation.

Increasing $V_{RP}$ or decreasing $V_{RN}$ speeds up operation without incurring the area overhead associated with sizing. On the other hand, the closer $V_{RP}$ and $V_{RN}$ are to 0 V and $V_{dd}$, respectively, the lower the energy dissipation of the TSEL cascade is. If $V_{RP}$ drops below a certain value, however, or if $V_{RN}$ exceeds a certain value, the hold stages become too short, and the cascade fails. The theoretical minimum value for $V_{RP}$ is $2 \cdot |V_{thp}|$, whereas the theoretical maximum for $V_{RN}$ is $V_{dd} - 2 \cdot V_{thn}$. Due to this restriction on the scaling of $V_{RP}$ and $V_{RN}$, the energy efficiency of TSEL at low frequencies degrades. This limitation, which we discuss in more detail with the help of simulation results in Section VI, is remedied in SCAL by the introduction of a tunable current source.
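As a small worked example of these bounds, the sketch below reads the theoretical limits as twice the PMOS threshold magnitude for $V_{RP}$ and $V_{dd}$ minus twice the NMOS threshold for $V_{RN}$. The function names and the supply and threshold values are assumptions for illustration only.

```python
def tsel_reference_limits(vdd, vthp_abs, vthn):
    """Return (minimum V_RP, maximum V_RN) under the stated theoretical bounds."""
    v_rp_min = 2.0 * vthp_abs
    v_rn_max = vdd - 2.0 * vthn
    return v_rp_min, v_rn_max


def references_within_limits(v_rp, v_rn, vdd, vthp_abs, vthn):
    """Check a candidate (V_RP, V_RN) pair against the theoretical bounds."""
    v_rp_min, v_rn_max = tsel_reference_limits(vdd, vthp_abs, vthn)
    return v_rp >= v_rp_min and v_rn <= v_rn_max


if __name__ == "__main__":
    # Assumed values: 3.0 V supply, |V_thp| = 0.9 V, V_thn = 0.7 V.
    print(tsel_reference_limits(3.0, 0.9, 0.7))
    print(references_within_limits(2.0, 1.0, 3.0, 0.9, 0.7))
    print(references_within_limits(1.5, 1.0, 3.0, 0.9, 0.7))
```

The bounds do not depend on the operating frequency, so the reference voltages cannot be scaled further at low clock rates, which is the restriction the text attributes to TSEL.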

C. TSEL Cascades

TSEL cascades are built by stringing together alternating PMOS and NMOS gates, as shown in Fig. 8. The only signal required to control a TSEL cascade is a single phase of a sinusoidal power-clock. Even though a single reference voltage suffices to ensure correct operation, speed and energy efficiency can improve by using separate PMOS and NMOS reference voltages.

The relative timing of the gates in a TSEL cascade is shown in Fig. 7. At any time during the circuit’s operation, either all PMOS gates evaluate and all NMOS gates charge or all PMOS gates discharge and all NMOS gates evaluate. The brief time interval between evaluate/discharge or evaluate/charge during which the outputs of a gate are stable is called the hold ($H_P$ or $H_N$) phase in that gate’s operation. While the outputs of the odd stages are stable, their current switches are off. At the same time, the function blocks of the even stages are connected to their reference voltage and can safely evaluate their outputs. After half a cycle, while the current switches of the even stages are off, the function blocks of the odd stages are connected to their reference voltage. Thus, the outputs of the even stages are stable and the outputs of the odd stages can be evaluated.
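The alternating evaluation of odd and even stages can be illustrated with a purely behavioral model of a PNPN inverter chain. This is a logic-level sketch under the assumption that each stage is an ideal inverter that updates only during its own evaluate half-cycle; the function names are hypothetical, and no voltages or energies are modeled.

```python
def evaluate_half_cycle(outputs, data_in, pmos_evaluate):
    """Update the stages that evaluate in this half-cycle, in place.

    Stages 0, 2, ... are PMOS gates and stages 1, 3, ... are NMOS gates.
    The stages of the other type hold their outputs stable, so each
    evaluating stage reads a stable input.
    """
    start = 0 if pmos_evaluate else 1
    for stage in range(start, len(outputs), 2):
        source = data_in if stage == 0 else outputs[stage - 1]
        outputs[stage] = 1 - source  # ideal inverter
    return outputs


def run_pipeline(bits, n_stages=4):
    """Feed one input bit per clock cycle; record the last stage each cycle."""
    outputs = [0] * n_stages
    trace = []
    for bit in bits:
        evaluate_half_cycle(outputs, bit, pmos_evaluate=True)
        evaluate_half_cycle(outputs, bit, pmos_evaluate=False)
        trace.append(outputs[-1])
    return trace


if __name__ == "__main__":
    print(run_pipeline([1, 0, 1, 1, 0, 0]))
```

In this model each gate contributes half a cycle of delay, so the four-stage chain accepts a new input every cycle and takes two full cycles (four half-cycles) to pass a value from its input to its last stage.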

IV. SOURCE-COUPLED ADIABATIC LOGIC

This section describes SCAL, a partially adiabatic, dynamic logic family. SCAL retains all of TSEL's positive features, including single-phase power-clock operation. Moreover, it achieves energy-efficient operation across a broad range of operating frequencies by using an individually tunable current source at each gate.

A. SCAL Gates

Fig. 9 shows the structure of a PMOS and an NMOS inverter in SCAL. The PMOS inverter in Fig. 9(a) comprises a pair of cross-coupled transistors (MP1 and MP2), a pair of current control switches (MP3 and MP4), two function blocks (MP5 and MP6), and a current source (MP7) that is biased by a voltage $V_{BP}$. A sinusoidal power-clock $V_{PC}$ is applied through the port PC. A constant supply voltage $V_{dd}$ is required for activating the pair of cross-coupled transistors. The rate of the charge flow through the current source is controlled by the $W/L$ ratio of MP7.

A PMOS SCAL gate operates in two phases: discharge ($D_{P}$) and evaluate ($E_{P}$). Fig. 10 shows these phases with respect to the waveform $V_{PC}$, which is denoted by the bold waveform. The thin waveforms above and below $V_{PC}$ indicate two possible choices for the biasing voltages $V_{BP}$ and $V_{BN}$, respectively. These waveforms are obtained by level shifting the power-clock $V_{PC}$ and result in increased energy efficiency. SCAL gates will still work with a constant biasing voltage, although at a reduced efficiency level.

The energy stored in node $out$ or $\overline{out}$ is recovered during $D_{P}$. In this phase, $V_{PC}$ starts from high and ramps down toward low, pulling both $out$ and $\overline{out}$ down toward the PMOS threshold voltage $|V_{th}|$. This state change occurs adiabatically until $V_{PC}$ drops below $|V_{th}|$.

Each PMOS gate computes a new output during the $E_{P}$ phase. In the beginning of this phase, $V_{PC}$ is low. As $V_{PC}$ starts rising, $V_{BP}$ follows. As long as the gate-to-source voltage $|V_{BP} - V_{dd}|$ of the PMOS current source MP7 exceeds $|V_{th}|$, MP7 is turned on and the current through MP7 starts raising the voltage $V_{XP}$ of the internal node XP. While $V_{PC} < V_{XP} - |V_{th}|$, MP3 and MP4 are conducting. Therefore, assuming that $V_{in}$ is high and $V_{\overline{in}}$ is low, a pull-up path is created from $V_{dd}$ to $\overline{out}$ via XP, and $V_{\overline{out}}$ starts rising toward $V_{XP}$ as $V_{PC}$ rises. The cross-coupled transistors MP1 and MP2 turn on and boost the voltage difference between the two output nodes. As soon as this difference exceeds $|V_{th}|$, MP1 turns off and $\overline{out}$ is charged adiabatically through MP2. When $V_{PC} > V_{XP} - |V_{th}|$, MP3 and MP4 turn off and disconnect the function blocks from the output nodes $out$ and $\overline{out}$. Hence, any further changes in the inputs do not propagate to the outputs. At the end of the evaluation, $\overline{out}$ is charged up to the peak of $V_{PC}$. The voltage swing of the output is from $V_{dd}$ to $|V_{th}|$. The output logic values can be sampled near the peak of $V_{PC}$.

The inverter in Fig. 9(b) shows the basic structure of a SCAL NMOS gate. The operation of the NMOS gate is similar to that of the PMOS gate. The two phases in its operation are charge ($C_{N}$) and evaluate ($E_{N}$).

B. SCAL Energetics

The impact of the various circuit parameters on the operation of SCAL can be best understood by examining the behavior of a simple MOS model in the linear, saturation, and cutoff regions. The following equations describe the $I$–$V$ characteristics in these regions, respectively:

$$
I_{ds} =
\begin{cases}
k\left[(V_{gs} - V_{th})V_{ds} - \dfrac{V_{ds}^2}{2}\right], & V_{gs} \geq V_{th},\ V_{ds} \leq V_{gs} - V_{th} \\[1ex]
\dfrac{k}{2}(V_{gs} - V_{th})^2, & V_{gs} \geq V_{th},\ V_{ds} > V_{gs} - V_{th} \\[1ex]
0, & V_{gs} < V_{th}
\end{cases}
$$

where

$\begin{align*}
I_{ds} &\text{ drain-to-source current;} \\
V_{gs} &\text{ gate-to-source voltage or biasing voltage;} \\
V_{th} &\text{ threshold voltage of device;} \\
k &\text{ (so-called device transconductance parameter) defined as } k = \mu C_{ox}(W/L); \\
\mu &\text{ effective surface mobility of the carrier in the channel;} \\
C_{ox} &\text{ gate-oxide capacitance per unit area.}
\end{align*}$

From these equations, it follows that the biasing voltage turns a current source on or off. Moreover, in saturation, the current through a current source increases with the biasing voltage (quadratically in the overdrive $V_{gs} - V_{th}$) and is proportional to the $W/L$ ratio of the current source.
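The simple three-region MOS model can be written out directly. The sketch below uses the standard square-law form with $k = \mu C_{ox}(W/L)$ and NMOS sign conventions (for a PMOS device, voltage magnitudes would be used). The function names and the numerical process value are assumptions for illustration, not parameters of the 0.5 μm process used in the paper.

```python
def transconductance_parameter(mu_cox, w_over_l):
    """Device transconductance parameter k = mu * C_ox * (W / L)."""
    return mu_cox * w_over_l


def drain_current(vgs, vds, vth, k):
    """Square-law drain current in the cutoff, linear, and saturation regions."""
    if vgs < vth:
        return 0.0  # cutoff
    overdrive = vgs - vth
    if vds <= overdrive:
        return k * (overdrive * vds - 0.5 * vds ** 2)  # linear region
    return 0.5 * k * overdrive ** 2  # saturation


if __name__ == "__main__":
    mu_cox = 100e-6  # assumed process transconductance in A/V^2
    for w_over_l in (0.25, 0.5, 1.0):
        k = transconductance_parameter(mu_cox, w_over_l)
        print(w_over_l, drain_current(vgs=1.7, vds=2.0, vth=0.7, k=k))
```

For a fixed bias, doubling the $W/L$ ratio doubles the saturation current in this model, which is the sizing knob that SCAL uses for its current sources.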

In SCAL, the dissipative current through the current source of each gate is used to activate the cross-coupled transistors. Even though all the gates of the same type have the same biasing voltage, the amount of current through each gate’s current source is individually controlled by the $W/L$ ratio of the current source. Thus, minimum dissipation can be achieved under any given operating conditions, such as output load or operating frequency, by adjusting the individual $W/L$ ratios of the SCAL gates.

The main difference in the operation of SCAL and TSEL is in the method that controls the current that activates the cross-coupled transistors during the evaluation phase. The energy efficiency of TSEL gates at low frequencies is constrained by the bounds on the scaling of the dc reference voltages. SCAL has no such limitation, however, because the magnitude of the current flow at the beginning of the $E_P$ (or $E_N$) phase can be controlled by adjusting the $W/L$ ratio of $MP7$ (or $MN7$) in each gate. At the same time, the duration of the $E_P$ (or $E_N$) phase is determined independently of the current flow by selecting a biasing voltage that is shared by all PMOS (or NMOS) gates. Thus, for any operating frequency, each SCAL gate can be individually tuned to achieve minimum dissipation under its output load.
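One way to picture per-gate tuning is a simple sizing heuristic: pick the smallest available $W/L$ ratio whose saturation current still meets a required activation current. This is a hypothetical illustration and not the authors' sizing procedure. The required-current estimate (load capacitance times a threshold-sized voltage difference, divided by an assumed nonadiabatic window), the process value, and all function names are assumptions.

```python
def saturation_current(mu_cox, w_over_l, overdrive):
    """Square-law saturation current for a given W/L ratio and overdrive."""
    return 0.5 * mu_cox * w_over_l * overdrive ** 2


def required_activation_current(c_load, delta_v, window_s):
    """Assumed estimate: charge C * delta_v delivered within the window."""
    return c_load * delta_v / window_s


def smallest_sufficient_ratio(candidates, required_current, mu_cox, overdrive):
    """Return the smallest candidate W/L that meets the current, else None."""
    for ratio in sorted(candidates):
        if saturation_current(mu_cox, ratio, overdrive) >= required_current:
            return ratio
    return None


if __name__ == "__main__":
    candidates = [1 / 8, 1 / 4, 1 / 2, 4 / 1, 8 / 1]  # example W/L choices
    mu_cox, overdrive = 100e-6, 0.5                   # assumed values
    for c_load in (5e-15, 20e-15):                    # assumed output loads
        need = required_activation_current(c_load, 0.7, 1e-9)
        print(c_load, smallest_sufficient_ratio(candidates, need, mu_cox, overdrive))
```

Under these assumptions a lightly loaded gate ends up with a much smaller current source than a heavily loaded one, even though both share the same biasing voltage.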

C. SCAL Cascades

To build SCAL cascades, PMOS and NMOS gates are chained alternately. The only signal required to control a SCAL cascade is a power-clock $V_{PC}$. To improve the energy efficiency of the cascade, two biasing voltages $V_{BP}$ and $V_{BN}$, obtained by shifting the level of $V_{PC}$, can be used. The speed and energy efficiency of a SCAL cascade can be tuned by optimally sizing the current source in each gate. Since individual gates can be tuned independently, efficient operation can be achieved for a broad range of operating frequencies. Energy consumption is minimized by setting the $W/L$ ratio of each current source equal to the minimum possible value and by bringing the biasing voltage of PMOS and NMOS current sources close to the PMOS and NMOS threshold voltage, respectively. This section describes the operation of SCAL cascades and explains the dependence of their efficiency on the size of the current source through HSPICE simulation results.

Fig. 10 shows the timing of the signals in a SCAL cascade. At any time during the circuit’s operation, either all PMOS gates evaluate and all NMOS gates charge or all PMOS gates discharge and all NMOS gates evaluate. The brief time interval between evaluate/discharge or evaluate/charge during which the outputs of a gate are stable is called the hold ($H_P$ or $H_N$) phase in that gate’s operation. While the current switches of the odd stages are off, their outputs are stable. At the same time, the function blocks of the even stages are connected to $V_{dd}$ (or $V_{ss}$) through the current sources and can safely evaluate their outputs. After half a cycle, while the current switches of the even stages are off, the inputs of the odd stages are stable, and their function blocks are connected to $V_{ss}$ (or $V_{dd}$) through their current sources.

A PNPN cascade of SCAL inverters and its biasing circuitry for the current sources are shown in Fig. 11. The blocks denoted by $LSP$ and $LSN$ are voltage level shifters that generate the biasing voltage for the PMOS and NMOS gates, respectively.

Fig. 12. Waveforms obtained from HSPICE simulations of the four-stage SCAL pipeline from Fig. 11. From top to bottom: power-clock \( V_{PC} \); input of first stage; output of first stage; output of second stage; output of third stage; and output of fourth stage.

Fig. 13. Waveforms obtained from HSPICE simulations of the four-stage SCAL pipeline from Fig. 11. From top to bottom: power-clock \( V_{PC} \); internal voltages \( V_{XP1} \) and \( V_{XP3} \); internal voltages \( V_{XN2} \) and \( V_{XN4} \); and currents through the current sources \( I_{XP1} \), \( I_{XP3} \), \( I_{XN2} \), and \( I_{XN4} \).

Fig. 12 shows the input and output waveforms obtained from HSPICE simulations of the four-stage pipeline from Fig. 11. These simulation results were obtained when a periodic sequence \( \cdots 0111 \cdots \) was propagated through the inverter chain at 100 MHz. The $W/L$ ratio for the pair of cross-coupled transistors was $5/2$. For the function blocks, we used minimum-size transistors from a 0.5 $\mu$m standard CMOS technology with a $W/L$ ratio equal to $3/2$. For the current sources $MP17$, $MN27$, $MP37$, and $MN47$, the $W/L$ ratios were $1/2$, $1/4$, $2/1$, and $1/1$, respectively.

Fig. 13 shows the internal voltages and source currents for each stage in the four-stage pipeline of SCAL inverters from Fig. 11. The internal voltages $V_{XP1}$, $V_{XN2}$, $V_{XP3}$, and $V_{XN4}$ and the internal source currents $I_{XP1}$, $I_{XN2}$, $I_{XP3}$, and $I_{XN4}$ are proportional to the transistor ratios in the corresponding current sources. These voltages and currents affect the speed and energy dissipation of the SCAL structures. As the $W/L$ ratio of a current source transistor in a gate decreases, the current that activates its cross-coupled transistors also decreases. Conversely, as the $W/L$ ratio increases, current flow during the nonadiabatic event of the gate increases. Individual tuning of the current source sizes at each gate can thus decrease the energy dissipation of the cascade.

V. ADDER DESIGNS

In order to evaluate the energy efficiency of TSEL and SCAL, we designed a collection of CLAs using static CMOS, TSEL, SCAL, PAL, and 2N2P. All CLAs had the same gate-level structure but differed in aspects that were specific to the logic styles used in their design. This section describes our various adder designs that were all developed in 0.5 $\mu$m standard CMOS technology.

The gate-level schematic diagram used to develop the 8-bit CLA in static CMOS, PAL, 2N2P, TSEL, and SCAL is shown in Fig. 14. The buffers shown are used to propagate the correct logic values in PAL, 2N2P, TSEL, and SCAL and are not included in static CMOS. Dummy cells, indicated by dots in the schematic diagram, are included to regularize the full-custom layouts of the adiabatic logic families.

The full-custom layouts of the 8-bit CLAs we designed in TSEL and SCAL are shown in Figs. 15 and 16. Both designs have almost the same number of devices: 778 transistors for TSEL and 877 transistors for SCAL. The slight difference in the transistor counts of the two designs stems from the inclusion of a current source at each SCAL gate. The area of each of these CLAs is approximately 0.33 mm$^2$.

Both the TSEL and the SCAL CLA generate a new output on each clock cycle. Each adder has a latency of 3.0 cycles since data propagation through each gate takes half a cycle. In the SCAL adder, the $W/L$ ratio of each current source was selected individually among the values $1/8$, $1/4$, $1/2$, $4/1$, and $8/1$, according to the operating frequency, the number of inputs to the gate, and the capacitive load at the output of the gate. For the function blocks of each gate, minimum-size transistors were used with a \( W/L \) ratio equal to 3/2. The \( W/L \) ratio of the cross-coupled transistors was set to 10/2 in every gate.

The 2N2P and PAL CLAs were obtained directly from the TSEL/SCAL designs by introducing additional clock lines and replacing the TSEL/SCAL gates by their corresponding gates in the other adiabatic styles. The area taken by their full-custom layouts was comparable with that of the TSEL/SCAL designs. The \( W/L \) ratios of the function blocks and the cross-coupled transistors were the same as in the TSEL/SCAL CLAs. All primary outputs were connected to a 60 fF load.

Both the 2N2P and the PAL adders have the same throughput as the TSEL/SCAL design, generating a new output on each clock cycle. The latency of the PAL adder is 3.0 cycles. The latency of the 2N2P CLA is 1.5 cycles, however, since data propagation through each 2N2P gate takes 0.25 cycle.
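The stated latencies follow from the number of gate levels on the data path and the fraction of a cycle each gate takes. The sketch below assumes six gate levels, a value inferred here from the stated figures (3.0 cycles at half a cycle per gate, 1.5 cycles at a quarter cycle per gate) rather than given explicitly in the text. The function names are hypothetical.

```python
def latency_cycles(gate_levels, cycles_per_gate):
    """Pipeline latency in clock cycles."""
    return gate_levels * cycles_per_gate


def latency_ns(gate_levels, cycles_per_gate, clock_mhz):
    """Pipeline latency in nanoseconds at the given clock frequency."""
    return latency_cycles(gate_levels, cycles_per_gate) * 1e3 / clock_mhz


if __name__ == "__main__":
    levels = 6  # assumption inferred from the stated latencies
    print("TSEL/SCAL/PAL:", latency_cycles(levels, 0.5), "cycles")
    print("2N2P:", latency_cycles(levels, 0.25), "cycles")
    print("TSEL/SCAL at 100 MHz:", latency_ns(levels, 0.5, 100.0), "ns")
```

All four adiabatic adders accept a new input every cycle, so they differ in latency but not in throughput.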

In addition to the adiabatic CLAs, we developed two CLAs in static CMOS based on the gate-level schematic diagram shown in Fig. 14. The first CMOS design was a purely combinational CLA. The second CMOS adder was a pipelined version of the fully combinational one with three pipeline stages. The standard-cell layouts of these two designs were generated using EPOCH in a 0.5 \( \mu \text{m} \) standard CMOS technology. All transistors in the evaluation trees of both CMOS adders were minimum size. To meet the timing constraints, a buffer was introduced at the output of each gate in the fully combinational CLA. Similarly, a buffer was introduced after each register in the pipelined CLA. In both cases, the \( W/L \) ratio of each buffer was 10/2.

VI. SIMULATION RESULTS

In this section, we present HSPICE simulation results for the 8-bit CLAs we developed in static CMOS, PAL, 2N2P, TSEL, and SCAL. Our circuits were designed in a 0.5 \( \mu \text{m} \) standard CMOS technology and were simulated with distributed-RC parameters extracted from layout. The simulations accounted for the dissipation of the gates and internal clock lines, assuming a 100% energy-efficient clock generator.

Fig. 17 gives the per-cycle energy consumption of our adders operating at 10, 100, and 200 MHz. For each operating frequency, the minimum energy dissipation of each adder was obtained using the smallest supply voltage that ensured its correct function at that frequency. Reference and biasing voltages were determined by trial and error. The supply voltages used are shown next to their corresponding data points. The minimum possible supply voltage dictated by the process parameters was 1.5 V. In general, SCAL possesses excellent voltage scaling properties. For every operating point, it can operate with the lowest supply voltage among all the adders we have designed.

Our results show that SCAL is more energy efficient than the other adiabatic logic families and both static CMOS designs across the entire frequency range of our simulations. At 10 MHz, SCAL is 1.5 to 2 times more energy efficient than the other adiabatic designs. Moreover, it is two times more energy efficient than the purely combinational CMOS CLA (denoted by cCMOS) and three times more energy efficient than the three-stage pipelined CMOS CLA (denoted by pCMOS). At 100 MHz, the SCAL CLA is at least two times more energy efficient than the other adiabatic designs and two to three times more energy efficient than the two CMOS designs. At 200 MHz, the SCAL CLA is about 2.5 times more energy efficient than 2N2P and almost five times more energy efficient than pCMOS. The PAL and cCMOS CLAs do not function correctly at that frequency.
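
The supply-voltage selection used for Fig. 17 (the smallest supply at which an adder still functions at a given frequency) can be pictured as a simple upward sweep. The sketch below is only illustrative: `min_working_supply`, the `works` callback, the 3.3 V upper bound, the 0.1 V step, and the `toy_works` predicate are all hypothetical stand-ins, and in practice the callback would wrap a circuit simulation rather than a formula.

```python
from typing import Callable, Optional


def min_working_supply(
    works: Callable[[float, float], bool],
    freq_hz: float,
    v_min: float = 1.5,   # process-dictated minimum quoted in the text
    v_max: float = 3.3,   # hypothetical upper bound for the sweep
    step: float = 0.1,    # hypothetical sweep resolution
) -> Optional[float]:
    """Return the smallest swept voltage for which `works(v, freq_hz)` is True.

    Returns None if no voltage in [v_min, v_max] passes.
    """
    n_steps = int(round((v_max - v_min) / step))
    for i in range(n_steps + 1):
        # Index-based stepping avoids accumulating floating-point error.
        v = round(v_min + i * step, 6)
        if works(v, freq_hz):
            return v
    return None


def toy_works(v: float, freq_hz: float) -> bool:
    """Made-up pass/fail rule standing in for a simulation run (not real data)."""
    return v >= 1.5 + 0.25 * (freq_hz / 100e6)


v_10 = min_working_supply(toy_works, 10e6)
v_100 = min_working_supply(toy_works, 100e6)
print(v_10, v_100)
```

A linear sweep is used here rather than bisection because correct function is not guaranteed to be monotonic in the supply voltage for every logic style.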

The TSEL adder is less dissipative than the PAL, 2N2P, and both CMOS designs for operating frequencies above 100 MHz. Its dissipation increases sharply at 10 MHz, however, because at such a low operating frequency the two dc reference voltages used in the adder reach their limits, as discussed in Section III-B. Consequently, the duration of the evaluation phase increases, resulting in higher overall energy consumption.

Neither the fully adiabatic PAL adder nor the pipelined static CMOS adder performs better than SCAL. The PAL adder, which is geared for very energy-efficient operation at low frequencies, is more dissipative than SCAL even at 10 MHz and stops functioning above 100 MHz. The pipelined CMOS adder is more dissipative than every other design, except for TSEL at 10 MHz. The flip-flops used to reduce path delays and decrease the required supply voltage in pCMOS end up increasing the circuit’s effective capacitance and thus limit energy savings. Thus, SCAL presents a promising approach to further reducing the dissipation of CMOS designs that have reached their voltage scaling limits.

The internal node capacitances of the adiabatic designs we used in our experiments were roughly equal. Thus, a clock generator with efficiency less than 100% would merely shift the dissipation values of the adiabatic designs without changing their relative order or ratio. The relative energy savings with respect to CMOS would be lower, however, for all adiabatic families.

Fig. 18 reveals a paradox in the operation of TSEL: Energy consumption decreases with increasing voltage swing! The graphs in this figure show the energy consumption of the TSEL adders for 10, 100, and 200 MHz as a function of the peak power-clock voltage. Each point was obtained with optimal reference voltages that were computed by trial and error. At 200 and 100 MHz, as the peak power-clock voltage decreases, the energy dissipation also decreases. At 10 MHz, however, energy consumption decreases as the peak power-clock voltage increases.

This seemingly counterintuitive behavior can be explained with the help of the waveforms in Fig. 19. Both power-clock waveforms in this figure have a 10-MHz frequency. At such a low operating frequency, the dc reference voltages have already reached the minimum values that allow correct circuit function. Therefore, the duration of the evaluation phases cannot be controlled by adjusting the reference voltage values and can thus be excessively long, resulting in higher overall energy consumption. By increasing the peak power-clock voltage, the evaluation phases become shorter. This fact is straightforward to verify by comparing the lengths of the intervals $N_{P_{3.0V}}$ and $N_{P_{1.8V}}$ that correspond to the nonadiabatic switching stages for PMOS at 3.0 V and 1.8 V, respectively. The reduction in the length of the nonadiabatic stage offsets any increased energy losses due to the higher voltage swing. Thus, the overall energy efficiency of the TSEL gates increases.
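
The geometric part of this argument can be illustrated with a small calculation: a sinusoid with a larger peak crosses a fixed voltage window faster. The sketch below is a simplified illustration, not the analysis behind Fig. 19. It assumes a power clock of the form (V_peak/2)(1 + sin 2πft) swinging between 0 and V_peak, and a hypothetical fixed window from 0.3 V to 1.0 V standing in for the nonadiabatic interval; the function name and the window bounds are illustrative choices only.

```python
import math


def window_crossing_time(v_peak: float, v_low: float, v_high: float,
                         freq_hz: float) -> float:
    """Time in seconds that a rising sinusoidal clock spends between v_low and v_high.

    Assumes clock(t) = (v_peak / 2) * (1 + sin(2*pi*freq_hz*t)), i.e. a
    sinusoid swinging between 0 and v_peak, with 0 <= v_low < v_high <= v_peak.
    """
    if not (0.0 <= v_low < v_high <= v_peak):
        raise ValueError("need 0 <= v_low < v_high <= v_peak")
    # Invert the sinusoid on its rising half: phase = asin(2*v/v_peak - 1).
    phase_low = math.asin(2.0 * v_low / v_peak - 1.0)
    phase_high = math.asin(2.0 * v_high / v_peak - 1.0)
    return (phase_high - phase_low) / (2.0 * math.pi * freq_hz)


# Hypothetical fixed window (0.3 V to 1.0 V); these bounds are not from Fig. 19.
t_18 = window_crossing_time(1.8, 0.3, 1.0, 10e6)
t_30 = window_crossing_time(3.0, 0.3, 1.0, 10e6)
print(f"1.8 V peak: {t_18 * 1e9:.2f} ns, 3.0 V peak: {t_30 * 1e9:.2f} ns")
```

Under these assumptions the 3.0 V clock traverses the window in less time than the 1.8 V clock at the same 10 MHz frequency, which is the qualitative effect described above.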

Contrary to TSEL, the operation of the SCAL adder under voltage scaling follows common wisdom. Fig. 20 shows the energy dissipation of three SCAL adders as a function of the power-clock voltage. Each of the three adders was derived by tuning the current sources of the original adder design for minimum energy dissipation at 10, 100, and 200 MHz. For example, the SCAL adder at 10 MHz was optimized for a 1.5 V power-clock voltage. Similarly, the adders simulated at 100 MHz and 200 MHz were optimized for supply voltages of 1.8 and 2.1 V, respectively. As expected, the energy consumption of the adders increases monotonically as their peak power-clock voltage increases. Moreover, as their operating frequencies increase, their dissipation becomes more sensitive to changes in the power-clock voltage.
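
A first-order way to see why sensitivity to voltage grows with frequency is the textbook idealization of adiabatic charging. This is not a model extracted from the simulations reported here, and the symbols \( R \), \( C \), and \( T \) are illustrative rather than measured quantities. For a capacitance \( C \) charged through an effective resistance \( R \) by a ramp of duration \( T \gg RC \) to a final voltage \( V \),

\[
E \approx \frac{RC}{T}\, C V^{2},
\qquad
\frac{\partial E}{\partial V} \approx \frac{2 R C^{2} V}{T}.
\]

Since \( T \) is inversely proportional to the clock frequency, both the dissipated energy and its slope with respect to \( V \) scale linearly with frequency in this idealization, which is consistent with the trend observed in Fig. 20.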

VII. CONCLUSION

We presented the first-ever true single-phase adiabatic logic family with a broad operating range. The simplest member of our family is TSEL. The other member of our family is SCAL, a source-coupled variant of TSEL that achieves increased energy efficiency by using a tunable current source to control the rate of charge flow into or out of each gate. Our adiabatic circuitry avoids a number of problems associated with multiple power-clock schemes, including increased energy dissipation, layout complexity in clock distribution, clock skew, and multiple power-clock generators.

In HSPICE simulations of layouts in a standard 0.5 μm CMOS technology, TSEL and SCAL adders outperformed corresponding designs in static CMOS, PAL, and 2N2P that were operating with power-clock voltages scaled for minimum energy dissipation. In comparison with static CMOS, PAL, and 2N2P, SCAL was 1.5 times more energy efficient at 10 MHz and at least two times more energy efficient at higher operating frequencies. Moreover, our SCAL adders were two to five times more energy efficient than corresponding combinational and pipelined CMOS designs in the 10–200 MHz range. TSEL was less dissipative than PAL, 2N2P, and both CMOS designs for operating frequencies above 100 MHz. Although our single-phase designs were tuned manually by trial and error, the results of our investigation suggest that TSEL and SCAL are excellent candidates for high-speed and low-energy VLSI design.

We have recently designed an 8 × 8 SCAL multiplier with built-in self-test (BIST) and a single-phase sinusoidal power-clock generator with a surface mount inductor. Although the resonant clock drivers proposed by most previous papers can be used to generate a single-phase power clock, we designed our own single-phase generator that resembles a zero-voltage switched resonant power converter. A key feature of our generator is that it only conducts a small fraction of the entire inductor current. We have also developed a set of CAD tools for automating the verification and optimization of large adiabatic designs. Our chip has been fabricated in a standard 0.5 μm CMOS process and is currently being tested.

Suhwan Kim (S’97) received the B.S. and M.S. degrees in electrical engineering and computer science from Korea University, Korea, in 1990 and 1992, respectively. He is currently working toward the Ph.D. degree in electrical engineering and computer science at the University of Michigan, Ann Arbor. From 1993 to 1997, he was with LG Electronics, Korea, where he designed several multimedia systems-on-chip (SOCs) including an MPEG2 codec for audio, video, and system. His research interests encompass high-performance and low-power circuits, low-energy design methodologies for high-performance VLSI signal processing, and CAD tools for VLSI.

Mr. Kim received the 1994 Best Paper Award from the IEEE Korea Section.

Marios C. Papaefthymiou (M’93) received the B.S. degree in electrical engineering from the California Institute of Technology, Pasadena, in 1988 and the S.M. and Ph.D. degrees in computer science from the Massachusetts Institute of Technology, Cambridge, in 1991 and 1993, respectively.

After a three-year term as Assistant Professor at Yale University, New Haven, CT, he joined the University of Michigan, Ann Arbor, where he is currently an Associate Professor of Electrical Engineering and Computer Science and Director of the Advanced Computer Architecture Laboratory. His research interests include algorithmic, architectural, and circuit issues in the design of VLSI systems, with a primary focus on timing and energy optimization. He is also active in the field of parallel and distributed computing. Dr. Papaefthymiou received an ARO Young Investigator Award, an NSF CAREER Award, and several IBM Partnership Awards. He is an Associate Editor of the IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS, IEEE TRANSACTIONS ON COMPUTERS, and IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION SYSTEMS. He has served as the General Chair and as the Technical Program Chair for the ACM/IEEE International Workshop on Timing Issues in the Specification and Synthesis of Digital Systems. He has also participated several times in the Technical Program Committee of the IEEE/ACM International Conference on Computer-Aided Design.