OK, yesterday we learned how the order CV-squared dissipation per operation of normal irreversible switching techniques (using constant-voltage power supplies) could be avoided by instead using power supplies that generate a variable-voltage waveform, and following the rule that we never turn on a transistor when there is a voltage across it. (Of course, these power supplies have to be designed specially to avoid internal dissipation; we'll see how to do this next week.)
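To put rough numbers on that recap, here's a minimal sketch in Python. It assumes the simplest possible model: a load capacitance C charged through a fixed resistance R by a supply that ramps linearly from 0 to V over some ramp time and then holds. The component values and the function name are made up for illustration; only the qualitative behavior matters. A very fast ramp should give back the familiar 1/2 CV-squared, and a slow ramp should dissipate roughly (RC/T) times CV-squared.

```python
import math

def ramp_dissipation(r_ohm, c_farad, v_swing, ramp_time_s):
    """Energy lost in R when C is charged through it by a linear ramp of the supply.

    Closed form for an ideal series-RC model driven by a ramp from 0 to v_swing
    over ramp_time_s and then held at v_swing.
    """
    x = (r_ohm * c_farad) / ramp_time_s  # RC time constant relative to the ramp time
    return c_farad * v_swing ** 2 * x * (1.0 + x * math.expm1(-1.0 / x))

# Assumed, illustrative values (not from the lecture):
# 10 kOhm on-resistance, 1 fF load, 1 V swing.
R, C, V = 10e3, 1e-15, 1.0
TAU = R * C
HALF_CV2 = 0.5 * C * V ** 2

for ramp_time in (TAU / 1000, TAU, 10 * TAU, 1000 * TAU):
    energy = ramp_dissipation(R, C, V, ramp_time)
    print(f"ramp = {ramp_time / TAU:8.3f} RC   dissipation = {energy / HALF_CV2:.4f} x (1/2)CV^2")
```

The point of the sketch is just that, in this model, the dissipation keeps falling as the ramp is stretched out, which is what lets adiabatic switching beat the fixed CV-squared cost.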
But now, it's sensible to ask whether that "dry switching" constraint means that we can no longer do digital logic. For a long time, no one was able to figure out a way to do logic entirely using this adiabatic switching style. But in just the last decade, a number of advances occurred, which culminated, in a sense, with the discovery by our research group at MIT in about 1994 of a reversible circuit style capable of performing arbitrary pipelined, sequential (that is, with-feedback) Boolean logic, entirely using adiabatic switching. It can be implemented using perfectly ordinary CMOS transistors. Today, as far as I know, it's still the simplest known circuit style having all these properties.
That technique is the subject of today's lecture.
But after the circuit reaches equilibrium, there are only 2 possible logical states: those where the output is the opposite of the input. So the "transition function" of the device, mapping "before" states to "after" states, is a many-to-one function, losing information, and so we know from Landauer's principle that the energy associated with the lost bits (at least kT ln 2 of it) has to be dissipated.
  just before           after
  transition         transition
  -----------        ----------
   in   out           in   out
   --   ---           --   ---
   0    0  ___
   0    1  ___\___>   0    1
   1    0  _______>   1    0
   1    1  ___/
In fact, in this device, the dissipation occurs exactly in the two "before" cases where the input has just changed, as the output settles to equilibrium to match the inverse of the new input, thereby losing the information about the output's previous state. Moreover, the entire 1/2 CV-squared energy associated with the lost bit is dissipated, since nothing clever is done that might bring the dissipation closer to the kT ln 2 limit.
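Just to give a feel for how far apart those two numbers are, here's a quick back-of-the-envelope calculation in Python. The node capacitance, voltage swing, and temperature below are assumed, illustrative values, not figures from any particular process.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)

def landauer_limit(temperature_k):
    """Minimum energy dissipated per lost bit, kT ln 2, in joules."""
    return K_B * temperature_k * math.log(2)

def conventional_switching_energy(capacitance_f, voltage_v):
    """Energy dissipated when a node is switched abruptly: 1/2 C V^2, in joules."""
    return 0.5 * capacitance_f * voltage_v ** 2

# Assumed, illustrative values (not from the lecture): 1 fF node, 1 V swing, 300 K.
C_NODE = 1e-15
V_DD = 1.0
T_ROOM = 300.0

e_landauer = landauer_limit(T_ROOM)
e_cmos = conventional_switching_energy(C_NODE, V_DD)
ratio = e_cmos / e_landauer

print(f"kT ln 2   = {e_landauer:.3e} J")
print(f"1/2 C V^2 = {e_cmos:.3e} J")
print(f"ratio     = {ratio:.3e}")
```

With these assumed values the conventional dissipation comes out on the order of 10^5 times the kT ln 2 limit, so there is a lot of room between what an ordinary inverter throws away and what Landauer's principle actually requires.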
To do this, we will arrange that whenever a new input comes in, the output node of the inverter is always in some known, standardized state representing "no information." (We'll see later exactly how to arrange that.)
In the particular technique we'll be talking about today, we choose for our standard state a voltage level that is halfway between the 0 and 1 levels, which we'll call 1/2 or Vdd/2. But other adiabatic logic schemes are possible that would use other levels for the standard state.
Given the standardized initial state of the inverter's output, its transition function when a new input comes in will look like this:
    just
   before             after
   ------             -----
   in   out           in   out
   --   ---           --   ---
   0    1/2   -->     0    1
   1    1/2   -->     1    0
As you can see, this new transition function is one-to-one (invertible) and therefore does not suffer from the Landauer argument requiring that the energy of some bits be dissipated. Let's see how to do it electrically.
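The two transition tables can be checked mechanically. Here's a small sketch in Python that treats the inverter as a function from "before" states to "after" states and tests whether that function is one-to-one. The function names are just illustrative, and the model is purely logical: it says nothing about voltages or timing.

```python
HALF = 0.5  # the neutral "no information" level, 1/2

def inverter_equilibrium(state):
    """After settling, the input is unchanged and the output is NOT of the input."""
    inp, _old_out = state
    return (inp, 1 - inp)

def is_one_to_one(transition, before_states):
    """True if no two distinct 'before' states map to the same 'after' state."""
    after_states = [transition(s) for s in before_states]
    return len(set(after_states)) == len(after_states)

# Conventional case: the output may hold either old value when the input changes.
conventional_before = [(0, 0), (0, 1), (1, 0), (1, 1)]
# Standardized case: the output always starts at the neutral level 1/2.
standardized_before = [(0, HALF), (1, HALF)]

print(is_one_to_one(inverter_equilibrium, conventional_before))  # False
print(is_one_to_one(inverter_equilibrium, standardized_before))  # True
```

Same inverter, same logic; the only thing that changed is the set of "before" states we allow.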
"Charge recovery" is kind of a misnomer because all circuits recover charge; after all, charge is a fundamental conserved quantity. However, what's happening in SCRL is that each chunk of charge can be viewed as being returned to the power supply at almost the same voltage level that it was at when it entered the circuit, in contrast to ordinary CMOS with constant-voltage supplies where the charge enters at Vdd and leaves at 0V. Therefore almost all of the energy associated with that charge is recovered. The technique might better be termed "split-level energy recovery logic", but anyway, the old name is there and we're stuck with it... We'll just call it SCRL and be done with it.
Anyway, here's how it works (see slide). Note that the structure of the device is exactly the same as a normal CMOS inverter, except that the power/ground terminals are connected to some variable-voltage power supplies, whose waveforms we'll describe in a minute, and also that there are some constraints levied on how this device may be operated.
Here's what happens: Initially, both the input and the output, as well as the power supplies, are at the standard, neutral level Vdd/2, representing "no information." We'll just call it "1/2." Then, some previous gate adiabatically changes the input level from 1/2 to 0 or 1, corresponding to a new input bit that's coming in. This causes one of the two transistors to turn on. (For this to work, the threshold of the transistors has to be somewhat less than Vdd/2.) The other transistor, which was already off initially, is just turned off even more strongly.
Note that when this turning on and off of transistors happens, it doesn't trigger any dissipation within this structure, because there is 0 voltage across the source/drain terminals of both of these transistors.
Next, the power supplies "split" adiabatically, the top one heading up towards Vdd, and the bottom one heading down towards ground. The output follows whichever supply it is connected to, adiabatically transitioning from the neutral level to the level representing the logical negation of the input bit.
Then, the output is made use of by some later gate - we'll see how that's done in a minute - and once no one needs that output any more, we are now free to return the output to the neutral state, in order to get the gate ready to accept a new input. The rails "merge", adiabatically, returning from their zero or one states back to the central, neutral level, and the output continues to follow the rail it's connected to, and so returns to that level regardless.
Now, we are free to return the input to the neutral level. (Note that we could not have done this earlier, because that would have turned on whichever transistor was off, causing dissipation across it.) This is done by some other part of the circuit. With the input wire restored to a neutral state, it is ready to accept a new input bit, and the whole cycle can repeat.
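Here's a toy model of that whole cycle in Python, for anyone who'd like to step through it. It's only a sketch under some loud assumptions: Vdd is normalized to 1, the threshold is an assumed 0.3, each transistor is either fully on or fully off, and only the endpoints of each ramp are checked. All the names are made up for illustration. What it does check is the rule that no transistor turns on while there is a voltage across it.

```python
NEUTRAL = 0.5  # Vdd/2, with Vdd normalized to 1
V_T = 0.3      # assumed transistor threshold; must be less than Vdd/2

RAIL_OF = {"p": "top", "n": "bottom"}  # pFET ties out to the top rail, nFET to the bottom

def conducting(s):
    """Which transistors are on, given the input and rail levels."""
    return {"p": (s["top"] - s["in"]) > V_T, "n": (s["in"] - s["bottom"]) > V_T}

def run(steps):
    """Apply each (name, node updates) step; return the output trace and rule violations."""
    s = {"in": NEUTRAL, "top": NEUTRAL, "bottom": NEUTRAL, "out": NEUTRAL}
    trace, violations = [], []
    for name, updates in steps:
        was_on = conducting(s)
        s.update(updates)
        now_on = conducting(s)
        for fet, rail in RAIL_OF.items():
            # Rule: never turn a transistor on while there is a voltage across it.
            if now_on[fet] and not was_on[fet] and s[rail] != s["out"]:
                violations.append(name)
        for fet, rail in RAIL_OF.items():
            if now_on[fet]:
                s["out"] = s[rail]  # the output follows the rail it is connected to
        trace.append((name, s["out"]))
    return trace, violations

def scrl_cycle(bit):
    """The four steps of the inverter cycle, in the order described above."""
    return [
        ("input arrives", {"in": float(bit)}),
        ("rails split", {"top": 1.0, "bottom": 0.0}),
        ("rails merge", {"top": NEUTRAL, "bottom": NEUTRAL}),
        ("input returns", {"in": NEUTRAL}),
    ]

for bit in (0, 1):
    trace, violations = run(scrl_cycle(bit))
    print(bit, trace, violations)

# Wrong order: returning the input to neutral while the rails are still split.
good = scrl_cycle(0)
bad = [good[0], good[1], good[3], good[2]]
print(run(bad)[1])
```

In the correct order the model reports no violations for either input bit. Swapping the last two steps flags the "input returns" step, which is the case the parenthetical note above warns about.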
Now, you may have noticed that there is a problem when you try to just directly cascade these gates in a sequence, which is that no gate can discharge, that is, return to its neutral state, until the next gate in the sequence is finished using its result. So, the very first gate in the sequence has to wait until the entire sequence has finished before you can put a new value in. This is inefficient because in the meantime all this hardware is just sitting there doing nothing.
Well, this was a problem in ordinary old combinational logic as well - back then the reason was that you didn't want to change the original input too soon, because it might propagate through the logic and change the output before you were finished using that output - but the solution, which was discovered a long time ago, was pipelining - that is, designing circuits such that you can stuff in a new input while the results of old inputs are still propagating down the pipeline. That way, you can keep all of your logic gates busy all of the time.
So the big problem is how to pipeline these reversible circuits. We'll see in a minute how that can be done.
This works pretty much the same way in SCRL, except that actually you have to be a little bit careful because there can be dissipation in charging up some internal nodes of one of these networks to a level that the transistors in that network aren't suited to handle... But fortunately the problem is easily fixed; anyway we'll get to that later. For now, just take it for granted that you can build NAND or NOR or whatever complex inverting Boolean function you desire. Now, let's get to the pipelines.
This structure is made out of transmission gates. A transmission gate is just a pFET and an nFET wired in parallel, controlled by complementary input signals. It simply functions as a good approximation to an ideal switch. When on, the nFET's channel contains electrons, and can transmit a low voltage, and the pFET's channel contains holes, and can transmit a high voltage. Between the two of them, they are good at transmitting whatever voltage level (at least, within the 0-1 logic range) might be desired.
Anyway, initially the transmission gates are off, and the latch's input (on the upper left) and output nodes are at the neutral level.
Then, this "forward" transmission gate turns on. (Its controls are driven adiabatically.) There's no dissipation across it because the voltages match.
Then, the input, which can come from a gate like the SCRL inverter we saw earlier, adiabatically ramps from neutral to a 0 or 1 level representing an input bit. This change is transmitted through to the output.
Then, the transmission gate is turned off (gradually) and now these two nodes are isolated from each other. Now, whatever charged our input is free to discharge it back to the neutral level.
Sometime later, we will need to reset this latch to the neutral level so that it can accept a new input. This is done through this separate "reverse" pathway over here. The reason is that the original input is no longer available to match the voltage on the latch along this forward path. We'll see in a minute what provides the matched voltage on the reverse path.
Anyway, the reverse path is charged up to a voltage matching the stored voltage. Then, the reverse T-gate is turned on, causing no dissipation if the voltages match. Then, the reverse path is returned to the neutral level, pulling the latch output back with it. Now the reverse T-gate turns off and the latch is ready to accept a new input.
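The latch sequence is easy to lose track of, so here's a toy model of it in Python. This is a sketch with assumptions: the T-gates are ideal switches, levels are just 0, 1/2, and 1, and a "dissipative event" is counted whenever a T-gate is turned on across a voltage mismatch. The class and method names are hypothetical, chosen only for this illustration.

```python
NEUTRAL = 0.5  # Vdd/2, with Vdd normalized to 1

class BidirectionalLatch:
    """Toy model: one storage node, reachable through a forward and a reverse T-gate."""

    def __init__(self):
        self.stored = NEUTRAL
        # Levels of the nodes on the far side of each T-gate.
        self.level = {"forward": NEUTRAL, "reverse": NEUTRAL}
        self.on = {"forward": False, "reverse": False}
        self.dissipative_events = 0

    def drive(self, path, level):
        """Adiabatically ramp the node on the far side of a T-gate."""
        self.level[path] = level
        if self.on[path]:
            self.stored = level  # a conducting T-gate passes the ramp through

    def switch(self, path, turn_on):
        """Turn a T-gate on or off; turning on across a voltage difference dissipates."""
        if turn_on and self.level[path] != self.stored:
            self.dissipative_events += 1
            self.stored = self.level[path]
        self.on[path] = turn_on

def store_then_reset(bit):
    """The sequence described above: load through forward, unload through reverse."""
    latch = BidirectionalLatch()
    latch.switch("forward", True)    # levels match at neutral
    latch.drive("forward", bit)      # input ramps to the bit; the latch follows
    held = latch.stored
    latch.switch("forward", False)   # the latch is now isolated
    latch.drive("forward", NEUTRAL)  # the driving gate is free to reset
    latch.drive("reverse", bit)      # reverse path charged to match the stored level
    latch.switch("reverse", True)    # no dissipation: levels match
    latch.drive("reverse", NEUTRAL)  # the latch is pulled back to neutral
    latch.switch("reverse", False)
    return held, latch.stored, latch.dissipative_events

def reset_through_forward_path(bit):
    """What goes wrong if we reuse the forward path after its input has reset."""
    latch = BidirectionalLatch()
    latch.switch("forward", True)
    latch.drive("forward", bit)
    latch.switch("forward", False)
    latch.drive("forward", NEUTRAL)
    latch.switch("forward", True)    # mismatch: neutral input against the stored bit
    return latch.dissipative_events

print(store_then_reset(1))
print(reset_through_forward_path(1))
```

The second function is there to show why the separate reverse pathway is needed: once the original input has gone back to neutral, nothing on the forward side matches the stored voltage any more.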
The bidirectional latch can also be operated in a "static" mode where the output is at all times connected to one or the other input. This is useful when operating close to the minimum speed that is allowed by leakage currents, because it ensures that leakage won't destroy the logic value stored on the node.
Now, let's see how to put it all together.
Then, between these stages are bidirectional latches to hold onto the values so that the earlier stages can begin processing a new input while an old value is still being processed in later stages.
The logic blocks are operated with different phases of power supply signals, so that waves of inputs can propagate down the pipeline with appropriate timing. We'll see in detail how this works in a minute.
But the key interesting new thing is these lower blocks down here. These are constructed so as to compute exactly the inverse function to the function computed in this forward block. This is used to generate a copy of the input to the logic gate based on its output, and thereby match the voltage stored on this prior bidirectional latch, and discharge the value stored on it, restoring that input to the neutral level.
Note that for this to work, the forward function f has to be invertible, because otherwise there's no way to uncompute its input value based on its output.
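Here's a small Python sketch of what "invertible" buys you. It tabulates a Boolean function over all its inputs and tries to build the inverse table that the reverse block would have to implement. The two example functions are hypothetical, picked only to show one function that can be uncomputed and one that can't.

```python
from itertools import product

def invert(f, n_bits):
    """Return f's inverse as a lookup table, or None if f is not one-to-one."""
    table = {}
    for x in product((0, 1), repeat=n_bits):
        y = f(x)
        if y in table:
            return None  # two inputs share an output, so the input cannot be uncomputed
        table[y] = x
    return table

def nand_only(x):
    """A lone NAND: two input bits in, one output bit out."""
    a, b = x
    return (1 - (a & b),)

def not_and_xnor(x):
    """A hypothetical two-output inverting stage: (NOT a, XNOR(a, b))."""
    a, b = x
    return (1 - a, 1 - (a ^ b))

print(invert(nand_only, 2))  # None: the inputs cannot be recovered from the output

inverse = invert(not_and_xnor, 2)
for x in product((0, 1), repeat=2):
    # The reverse block recomputes the stage's input from its output.
    print(x, "->", not_and_xnor(x), "->", inverse[not_and_xnor(x)])
```

A lone NAND fails the test because several inputs collapse to the same output, so a stage built from it alone would have nothing to match the voltage on the prior latch against.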
One interesting feature of these pipelines is that if you were to simply run all the power supply signals in reverse, the data would flow backwards through the pipeline.
However, it takes 6 complete stages, or a multiple of 6, to make a complete feedback loop (sequential circuit) and have the timing of everything line up properly.
Rather than try to run through all those slides here, which is difficult without being able to point to the slides, let me just ask you to please go through the slides yourself (they are on reserve) and observe carefully what happens on each step.
Note that the output of the logic stage is available just 1 tick after it appears at the input. So essentially one can have just 6 gate delays per clock cycle, which compares well with what's done today in aggressively pipelined conventional circuits.
The next slide shows a detailed timing diagram which goes with the illustrated sequence. Please compare the timing diagram with the illustrations as you follow along with what happens.
The next slide shows a more complex timing diagram for three consecutive logic stages of another SCRL timing scheme. This version is not as highly pipelined; it has two logic stages per pipeline stage, rather than only one. (However, this makes it easier to compute non-inverting functions; otherwise, you'd have to use dual-rail signalling.) It has 24 ticks per cycle, with an 8-tick "propagation delay" per pipeline stage, which means there must be a multiple of 3 pipeline stages in any feedback loop.
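The arithmetic behind that "multiple of 3" can be sketched in a couple of lines of Python. This assumes that the constraint on a feedback loop is simply that the total delay around the loop must be a whole number of clock cycles; that is my reading of the timing requirement, and the function name is made up for illustration.

```python
from math import gcd

def loop_stage_multiple(ticks_per_cycle, delay_ticks_per_stage):
    """Smallest number of stages whose total delay is a whole number of clock cycles."""
    return ticks_per_cycle // gcd(ticks_per_cycle, delay_ticks_per_stage)

# The scheme above: 24 ticks per cycle, 8-tick propagation delay per pipeline stage.
print(loop_stage_multiple(24, 8))  # 3
```

Any multiple of that number also works, since each extra group of stages just adds whole cycles to the loop delay.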
If you use the standard structure, a problem can occur at internal nodes of the pull-down or pull-up network. (See the next slide.) The problem is that if the network is partially but not fully conducting, then these internal nodes will be charged toward the level opposite to the one that the network is able to conduct well. For example, in the slide, the internal node of the pull-down network is pulled high. However, it cannot be pulled all the way high, because the nFETs in the pull-down network don't conduct high voltage levels - they will turn off once the voltage exceeds a threshold.
This would not be a problem except that the turn-off is not immediate - there is a region in which the nFET conducts more and more poorly, and during this time there will be significant energy dissipation by the charge crossing the increasing IR voltage drop.
This points at another fundamental rule that must always be maintained in adiabatic circuits: Not only must we never turn ON a transistor when there is voltage across it, but we must never turn OFF a transistor when there is a current running through it. Either event leads to dissipation that cannot scale down with speed, and in particular cannot get below kT.
Fortunately, the problem in this case is easy to fix (as shown on the next slide) by adding one extra transistor to the pull-down network. A more general solution, for arbitrary generalized inverters, is to use dual-rail signals and transmission gates everywhere in lieu of single transistors. However, this approach gives yet another area penalty, of about a factor of 4. (Twice as many gates to generate the dual-rail signals, and twice as many transistors per gate.)
OK, that's it for now on the low-level aspects of adiabatic transistor logic. Later we'll see some other adiabatic circuit styles, but first, next lecture, we'll talk about what exactly you can do using these invertible logic stages that are required for fully-adiabatic operation.