This lecture goes over some miscellaneous points about adiabatic circuits that weren't covered in last week's lectures.
This 4-tick-per-cycle circuit style is based entirely on transmission gates. It requires only 4 clock signals and doesn't need SCRL's third, intermediate voltage level. The slide shows the detailed sequence of events in a simple buffer (identity-delay) gate over the course of a cycle, the timing diagram for the clock signals, the structure of a shift register (buffer pipeline), how to build more complex functions such as AND, and the logic encoding in this scheme.
Unfortunately, this scheme requires 4 electrical signals per logic signal. Bits are encoded as pulses. For an inverter to be possible in this scheme, a 0 must also be represented by a pulse, because that pulse is what triggers the generation of a pulse for a 1 at the output. So to distinguish 0 from 1, the two kinds of pulse have to travel on separate lines. Furthermore, each pulse has to consist of one wire that pulses high and a complementary wire that pulses low, so that it can turn on the CMOS transmission gates. Hence 4 wires per signal. This basically means each logic gate is really 4 separate gates, which considerably increases area.
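To make the 4-wire encoding concrete, here is a minimal logic-level sketch. The names FourWireSignal, encode, and invert are hypothetical, and the sketch ignores circuit timing entirely. It shows a dual-rail pair (a 1-rail and a 0-rail), each carried as a true wire plus its complement, with every wire at its idle level outside the pulse window.

```python
from typing import NamedTuple


class FourWireSignal(NamedTuple):
    """Hypothetical names for the four wires carrying one logic signal."""
    one: int     # pulses high when the bit is 1
    one_n: int   # complement of `one`: pulses low when the bit is 1
    zero: int    # pulses high when the bit is 0
    zero_n: int  # complement of `zero`: pulses low when the bit is 0


def encode(bit: int, pulse_active: bool) -> FourWireSignal:
    """Wire levels (1 = Vdd, 0 = ground) for `bit` at one instant.

    Outside the pulse window every wire sits at its idle level, so any
    transmission gate driven by these wires is off.
    """
    if bit not in (0, 1):
        raise ValueError("bit must be 0 or 1")
    one_pulse = pulse_active and bit == 1
    zero_pulse = pulse_active and bit == 0
    return FourWireSignal(
        one=int(one_pulse), one_n=int(not one_pulse),
        zero=int(zero_pulse), zero_n=int(not zero_pulse),
    )


def invert(sig: FourWireSignal) -> FourWireSignal:
    """At the logic level, inversion swaps the rail pairs: a pulse on the
    input's 0-rail becomes the pulse on the output's 1-rail."""
    return FourWireSignal(one=sig.zero, one_n=sig.zero_n,
                          zero=sig.one, zero_n=sig.one_n)
```

The inverter only works because a 0 is an actual pulse and not the absence of one. That is what forces the second rail pair and doubles the wire count.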
Last summer a student and I spent a month on an intensive search for a fully-adiabatic, pipelinable circuit style based on CMOS transistors that would need fewer "transistor-ticks" per inverter operation than either SCRL or this 4-tick circuit style.
It turned out that an exhaustive search of all the possibilities was intractable, even with the large number of optimizations we built into the search algorithm. However, we did manage to rule out all circuits with 5 or fewer transistors.
So it is still possible that some 6- or 7-transistor circuit beats SCRL (which has 8 transistors per inverter), but at this point I doubt it. I think that with CMOS transistors, SCRL is probably the best you can do.
Actually, you can get down to 6 transistors by replacing the CMOS transmission gates in the SCRL inverter with nMOS pass gates. Although this saves area, it doesn't save power, because of the high voltage swing that must be applied to each nFET's gate to turn it on strongly enough.
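The following sketch does not reproduce the area-versus-power comparison above. It only illustrates why a larger gate swing is expensive. It uses the standard first-order result for adiabatic charging, E ≈ (RC/T)·C·V² for a linear ramp of duration T much longer than RC, and shows that dissipation grows with the square of the swing. The function name and the component values (10 kΩ, 1 fF, 1 ns, and a boosted 1.5 V swing versus 1.0 V) are hypothetical and chosen only for illustration.

```python
def adiabatic_dissipation(r: float, c: float, v_swing: float, t_ramp: float) -> float:
    """First-order energy (joules) lost when capacitance c (farads) is charged
    through resistance r (ohms) by a linear ramp of duration t_ramp (seconds)
    swinging v_swing volts. Valid only for t_ramp >> r * c."""
    return (r * c / t_ramp) * c * v_swing ** 2


# Hypothetical values: a gate driven with a 1.0 V swing versus a boosted
# 1.5 V swing, with everything else held equal (RC = 10 ps << T = 1 ns).
R, C, T = 10e3, 1e-15, 1e-9
normal = adiabatic_dissipation(R, C, 1.0, T)
boosted = adiabatic_dissipation(R, C, 1.5, T)
print(boosted / normal)  # the ratio is quadratic in the swing ratio
```

Because the cost is quadratic in the swing, the extra gate drive an nFET-only pass gate needs can cancel the savings from dropping the pFETs.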
So anyway, what I mean is that there's probably no way to beat SCRL using only 0-to-Vdd voltage ranges and CMOS transistors.
First (slide): it's easy to do a simple ripple-carry N-bit add with a cascaded sequence of 1-bit adder elements, just as is often done in ordinary irreversible processors. The add leaves copies of both input numbers alongside the sum. You can then completely uncompute one of the two inputs (leaving behind just the other input and the sum) by following the add with a near-mirror-image circuit, an N-bit subtract.
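Here is a minimal logic-level sketch of that add-then-uncompute idea. It models the bits only, not the adiabatic circuit. It assumes N-bit modular arithmetic (the carry out of the top bit is dropped), and all function names are hypothetical. The adder produces s = a + b while a and b are still held. The subtract recomputes s − a, which equals b, and XORing that result into b's register clears it, leaving only a and s.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One 1-bit adder element: returns (sum bit, carry out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add(a_bits: list[int], b_bits: list[int]) -> list[int]:
    """N-bit ripple-carry add, LSB first; the final carry is dropped (mod 2^N)."""
    carry, out = 0, []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out


def ripple_sub(s_bits: list[int], a_bits: list[int]) -> list[int]:
    """N-bit ripple subtract s - a (mod 2^N), computed as s + (~a) + 1."""
    carry, out = 1, []
    for s, a in zip(s_bits, a_bits):
        d, carry = full_adder(s, 1 - a, carry)
        out.append(d)
    return out


def to_bits(x: int, n: int) -> list[int]:
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits: list[int]) -> int:
    return sum(bit << i for i, bit in enumerate(bits))


def add_then_uncompute(a: int, b: int, n: int) -> tuple[int, int, int]:
    """Returns (a, sum, contents of b's register after uncomputation)."""
    a_bits, b_bits = to_bits(a, n), to_bits(b, n)
    s_bits = ripple_add(a_bits, b_bits)                # state holds a, b, s
    b_again = ripple_sub(s_bits, a_bits)               # s - a reproduces b
    b_reg = [x ^ y for x, y in zip(b_bits, b_again)]   # XOR clears b's register
    return from_bits(a_bits), from_bits(s_bits), from_bits(b_reg)


print(add_then_uncompute(11, 6, 4))
```

With 4 bits, 11 + 6 wraps to 1, and the subtract still recovers 6 exactly, so b's register ends up all zeros.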
Unfortunately, when this sort of structure is fully pipelined, it requires order N-squared delay elements so that the inputs are provided at the time they are needed. Therefore it uses order N-squared area, and order N energy per bit processed (because the average bit goes through order N delay stages).
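The quadratic growth can be checked with a simplified counting model. The model is an assumption, not the actual SCRL layout: full adder i fires in pipeline stage i, so input bits a_i and b_i each wait i stages, and sum bit i waits N−1−i stages so that all sum bits emerge together. The extra stages needed for the uncomputing subtract are ignored, and the function names are hypothetical.

```python
def skew_delay_elements(n: int) -> int:
    """Delay stages needed to skew the inputs and deskew the outputs of a
    bit-pipelined n-bit ripple-carry adder (simplified model)."""
    input_skew = 2 * sum(range(n))                     # a_i and b_i each wait i stages
    output_deskew = sum(n - 1 - i for i in range(n))   # s_i waits n-1-i stages
    return input_skew + output_deskew                  # = 3n(n-1)/2


def avg_stages_per_input_bit(n: int) -> float:
    """Average number of delay stages an input bit passes through: (n-1)/2."""
    return sum(range(n)) / n


for n in (8, 16, 32, 64, 128):
    print(n, skew_delay_elements(n), avg_stages_per_input_bit(n))
```

Doubling N roughly quadruples the delay-element count, which gives the order N-squared area. The per-bit stage count grows linearly, and since each stage dissipates some energy, that gives the order-N energy per bit.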
Irreversible circuits don't usually bother pipelining individual bits within the add, but in the strict fully-pipelined SCRL circuit style, you have to pipeline each bit of the add, in order to get the timing right for the uncomputing.
The solution to this problem is to NOT use a strict fully-pipelined scheme like SCRL within your adders, but instead to use an alternative fully-adiabatic mechanism called a "retractile cascade" structure (described by Hall in 1992).
This structure removes the latches between logic stages, applies the inputs, then charges up each of the logic stages in turn, waiting till the last bit of output is produced before reversing the process and discharging all the stages in reverse order.
This is much better because it uses only order-N area and order-1 energy per bit processed (neglecting leakage energy).
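The charge/discharge order of a retractile cascade can be written out as an event schedule. This is a minimal sketch. It assumes one event per tick and that the inputs are already applied at tick 0, and the function names are hypothetical. The discharges run in exact reverse order of the charges (last charged, first discharged), and the nesting check confirms that property.

```python
def retractile_schedule(n: int) -> list[tuple[int, str, int]]:
    """(tick, action, stage) events for one operation of an n-stage
    retractile cascade: charge stages 0..n-1 in turn, then, once the last
    output is available, discharge them in reverse order."""
    events, tick = [], 0
    for stage in range(n):
        events.append((tick, "charge", stage))
        tick += 1
    for stage in reversed(range(n)):
        events.append((tick, "discharge", stage))
        tick += 1
    return events


def is_properly_nested(events: list[tuple[int, str, int]]) -> bool:
    """True if every discharge undoes the most recently charged stage."""
    stack = []
    for _, action, stage in events:
        if action == "charge":
            stack.append(stage)
        elif not stack or stack.pop() != stage:
            return False
    return not stack


for event in retractile_schedule(3):
    print(event)
```

One full operation takes 2N ticks in this model. That is why the structure can't accept a new add for order N clock cycles.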
The penalty is that the structure isn't pipelined, so you have to wait order-N clock cycles before the adder is available to be reused for a new add operation. So in terms of transistors × time per N-bit add operation, it is still order N squared. Oh well, it still might be considered better.
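The trade-off can be summarized with unit-constant cost models. The constants are arbitrary assumptions, and only the growth rates come from the text. The fully pipelined adder accepts a new add every cycle but occupies order N² area and spends order N energy per bit. The retractile cascade occupies order N area and spends order 1 energy per bit, but ties up its hardware for about 2N ticks per add.

```python
def pipelined_costs(n: int) -> tuple[int, int]:
    """(area x time per add, energy per bit) for the fully pipelined adder."""
    area = n * n          # order N^2 delay elements
    time_per_add = 1      # a new add can enter every clock cycle
    energy_per_bit = n    # each bit passes through order N stages
    return area * time_per_add, energy_per_bit


def retractile_costs(n: int) -> tuple[int, int]:
    """(area x time per add, energy per bit) for the retractile cascade."""
    area = n              # one stage per bit, no skewing latches
    time_per_add = 2 * n  # charge n stages, then discharge them
    energy_per_bit = 1    # each node is charged and discharged once
    return area * time_per_add, energy_per_bit


for n in (16, 64, 128):
    print(n, pipelined_costs(n), retractile_costs(n))
```

Both area-time products scale as N², so neither wins on hardware utilization. The retractile cascade wins on energy per bit, and that is the sense in which it might still be considered better.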
The only problem is: how do you generate the N nested clock signals? This seems like it could be a pain if N is large, and 128-bit operands do show up on some modern architectures. (For example, C's long double occupies 128 bits on some 64-bit platforms, and GCC and Clang provide a 128-bit __int128 integer type on 64-bit targets.)
Fortunately, it turns out to be easy. You just need 4 full-swing signals with different phases, as in the 4-tick-per-cycle pipelined logic I showed earlier, and 4 split-level signals, with their complements, in different phases. (See slide.)
Then you need a shift register for pulse signals, exactly as I showed in my slide for the 4-tick-per-cycle logic. A pulse can travel down this pipeline, dissipating energy only where the pulse is present, because everywhere else all the transistors stay turned off.
Next, you tap lines out of this pipeline at selected points and use them to gate the split-level signals, which generates split/merge transitions at the desired times on selected nodes. The pass transistors leading to those nodes are turned on only while the pulse is present. The transition occurs through those transistors, and then they turn off again.
The exact timing of a whole long sequence of split/merge events on any number of nodes can be controlled by simply tapping off control signals from the shift register at appropriate points. Essentially, the shift register is counting off time for you.
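A logic-level sketch of this tapped shift register follows. It assumes the pulse advances one stage per step and that each tap fires its transitions when the pulse reaches that stage; the tap table layout and all names are hypothetical. As an example, the taps are chosen to produce the nested clocks of an N-node retractile cascade: node i splits when the pulse reaches stage i and merges when it reaches stage 2N−1−i, so a single pulse through 2N stages generates all N nested clock signals.

```python
from collections import defaultdict


def retractile_taps(n_nodes: int) -> dict[int, list[tuple[str, int]]]:
    """Hypothetical tap table: stage index -> transitions gated by that stage.
    Node i splits at stage i and merges at stage 2*n_nodes - 1 - i."""
    taps = defaultdict(list)
    for i in range(n_nodes):
        taps[i].append(("split", i))
        taps[2 * n_nodes - 1 - i].append(("merge", i))
    return taps


def run_pulse(n_stages: int, taps: dict[int, list[tuple[str, int]]]) -> list[tuple[int, str, int]]:
    """Advance one pulse through the register, one stage per step, and
    collect the split/merge transitions gated by the tapped stages."""
    events = []
    for step in range(n_stages):
        active_stage = step  # only this stage holds the pulse; all others are idle
        for kind, node in taps.get(active_stage, []):
            events.append((step, kind, node))
    return events


for event in run_pulse(6, retractile_taps(3)):
    print(event)
```

Changing the schedule only means changing which stages are tapped. The clock hardware itself stays the same fixed set of multi-phase signals.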
Anyway, this lets you build adiabatic circuits with very flexible timing schemes. You can draw as many clock signals as you want in your timing diagram and know that you can generate them all easily whenever desired.
I guess the only drawback is the large parasitic capacitance of all the shift-register stages that the full-swing clock signals must drive. But this will be roughly comparable to the capacitance you'd have anyway on all the internal nodes that you're driving.
Anyway, armed with the knowledge that such an easy signal generation scheme is available, I also designed a cell for a fully-adiabatic DRAM array, but I haven't drawn up a slide on that one yet.