OK, in the last two weeks we summarized present-day semiconductor technology and some of the factors affecting its future development. We saw that one barrier to further improvement was the fact that power dissipation per unit area (for a single layer of circuits) to first order does not decrease as transistors shrink. As a result, one cannot take much advantage of the third dimension when packing circuits together. Just one or a few layers of circuits already dissipate enough power to tax the capabilities of the cooling systems that are reasonably inexpensive today. Although hundreds of layers of circuits could physically fit within a small package (using wafer-thinning and die-stacking techniques, or silicon-on-insulator fabrication technology), the power dissipation problem seems to preclude us from gaining any advantage in computing power from such densely packed 3-D circuitry.
Moreover, the thermodynamic limits we saw in week 2 imply that any possible computing technology that is based on logically irreversible operations (operations that erase bits) will necessarily suffer from the same scaling problem. Namely, it will produce at least some fixed minimum amount of entropy on every logic operation, and this entropy cannot be destroyed but must be physically removed from the machine. And since entropy flux is itself physically limited, assuming limits on temperatures, pressures, and so forth, the total rate of operation of any irreversible computing technology is ultimately limited in proportion to the machine's minimum outer-surface area. We'll go over theoretical scaling limits such as this in more detail later in the course.
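To make the surface-area argument concrete, here is a rough Python sketch. It assumes (our assumption, not a figure from the lecture) that the cooling system can carry away at most a fixed heat flux q per unit of outer surface at temperature T, which corresponds to an entropy flux of q/T, and that each irreversible bit operation generates at least k ln 2 of entropy. The 100 W/cm² flux and the cube geometry are hypothetical illustration choices.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def max_irreversible_rate(heat_flux_w_per_m2, area_m2, temp_k):
    """Upper bound on bit erasures per second when all entropy must leave
    through an outer surface with a (hypothetical) heat-flux limit."""
    entropy_flux = heat_flux_w_per_m2 / temp_k   # entropy removed per m^2 per second
    entropy_per_op = K_B * math.log(2)           # Landauer minimum per erased bit
    return entropy_flux * area_m2 / entropy_per_op

# For a cube of side s the bound grows like s^2 while the volume grows like s^3,
# so the allowed rate per unit volume falls off as 1/s.
for side in (0.01, 0.1, 1.0):                       # metres
    area = 6 * side**2
    rate = max_irreversible_rate(1e6, area, 300.0)  # assumed 100 W/cm^2 at 300 K
    print(f"side={side} m  rate<={rate:.2e} ops/s  per m^3: {rate / side**3:.2e}")
```

The point is the scaling, not the particular numbers: doubling the linear size of the machine quadruples the bound but multiplies the volume, and hence the potential number of devices, by eight.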
But now, let's take a detailed look at an example of a computing mechanism that can actually get around the above problems, at least to some extent. The idea behind this mechanism is to compute in a logically reversible manner, that is, without erasing bits, which avoids the fundamental thermodynamic argument requiring entropy generation and order-kT energy dissipation. Further, by virtue of the reversibility of its operation, we'll see that the technique also avoids the (potentially much larger) order-CV² dissipation per operation incurred by any ordinary voltage-based switching logic such as standard CMOS. Therefore, the technique can be used to improve the energy efficiency of computation even today; we do not have to wait until circuit energies approach the thermodynamic limit for this technique to be useful.
I will describe this technique in terms of present-day CMOS transistor technology, but we'll see later that many of the basic principles of reversible operation can be applied within the context of just about any conceivable technology for implementing low-level computing devices. In particular, transistors, switches, and voltage-encoding of logic levels are not required for the use of reversible principles. Later we'll briefly go through a variety of reversible technologies based on different mechanisms. But we'll start with the CMOS-based reversible technology because it is the most familiar.
Note also that the operation of this circuit is logically irreversible: When the switch isn't connected to either pole, the node over here could have been last connected to either pole; we don't know. It could either hold a high voltage (logic 1) or a low voltage (logic 0). It carries 1 bit of information in its state. But after we connect it to one of the supplies, it is forced to a single level - 1 or 0. This is therefore an example of bit erasure, and thus at least kT ln 2 average dissipation is inevitable for any mechanism that performs such a many-to-one transition. So we will need to somehow abandon many-to-one transitions if we want to get below kT dissipation. But we'll put that requirement aside for the moment.
Now, the more immediate question is: is there any way to change this node's voltage level without CV-squared energy dissipation, by using a different mechanism? The answer is yes. Note that this derivation depends on the assumption that the power is delivered directly from a constant-voltage supply. Let's abandon that assumption.
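The constant-voltage dissipation referred to here can be checked numerically. A minimal sketch, assuming an ideal step from 0 to V applied through a resistor R to a capacitor C (the component values are hypothetical): the supply delivers charge CV at voltage V, i.e. energy CV², of which ½CV² ends up stored on the node and ½CV² is dissipated in R, whatever the value of R.

```python
import math

def step_charging_dissipation(C, V, R, steps_per_tau=1000, n_tau=30):
    """Energy dissipated in R while charging C from 0 V through R from an
    ideal constant-voltage supply V (midpoint-rule integration of i^2 R)."""
    tau = R * C
    dt = tau / steps_per_tau
    energy = 0.0
    for k in range(steps_per_tau * n_tau):
        t_mid = (k + 0.5) * dt
        i = (V / R) * math.exp(-t_mid / tau)   # exponential charging current
        energy += i * i * R * dt
    return energy

C, V = 1e-15, 1.0                               # hypothetical 1 fF node, 1 V swing
for R in (1e3, 1e4, 1e5):
    e = step_charging_dissipation(C, V, R)
    print(f"R={R:.0e} ohm  dissipated={e:.3e} J  (0.5*C*V^2 = {0.5 * C * V * V:.3e} J)")
```

Changing R only changes how fast the node charges, not how much energy is lost, which is why a different kind of supply is needed to do better.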
You electrical engineers know there is another frequently-described type of power supply, besides the constant-voltage power supply: namely the constant-current power supply. Let's look at what the dissipation would be in that case. (Slide, see fig. 7.3, p. 172 in course text.) We have a constant-current supply charging this node through some resistance R over a time t. (We imagine that we turn on the supply at time 0, when the node is at voltage 0, and turn it off when the node voltage reaches V, which we define as time t.)
Well, I won't go through the math in detail, but you can see that the dissipation comes out to be E = CV²·(RC/t): we still have a CV² term, but it is multiplied by the factor RC/t. You electrical people know that RC is a rough measure of the time it would take to charge this capacitor through this resistance from a constant-voltage supply. With a constant-current supply, depending on the current, it might take a different time t. If t is much larger than RC, note that this whole expression is much smaller than the CV-squared dissipation in the constant-voltage case. Note that the energy stored on the node itself is still of order CV-squared (exactly ½CV²). The difference is that in this case, the energy lost in the resistance can be far smaller than the energy stored.
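The constant-current formula follows in two lines: to move charge Q = CV in time t, the supply must source I = CV/t, and the resistor dissipates I²R for time t, giving E = I²Rt = CV²·(RC/t). A short sketch comparing it with the ½CV² of the constant-voltage case (the component values are hypothetical):

```python
def constant_current_dissipation(C, V, R, t):
    """Energy lost in R when a constant current charges C from 0 to V in time t."""
    current = C * V / t          # I = Q / t with Q = C V
    return current**2 * R * t    # equals C V^2 * (R C / t)

C, V, R = 1e-15, 1.0, 1e4        # hypothetical node: RC = 10 ps
rc = R * C
for ratio in (1, 10, 100, 1000):
    e = constant_current_dissipation(C, V, R, ratio * rc)
    print(f"t = {ratio:>4} RC: E = {e:.2e} J = {e / (0.5 * C * V**2):.3f} x (C V^2 / 2)")
```

Note that at t = RC the constant-current supply is actually worse than the constant-voltage one; the savings appear only once t is several times RC.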
Note also that the operation performed is not a many-to-one transition: If the node starts out at logic level 0, the operation will take it to logic level 1. But if the node were to start out at level 1, the constant current would take it to some higher level! Probably we want to rule this case out and insist that we will never apply this operation to a node if it already contains a 1 (although we might connect the node to an opposite-current power supply if we want to discharge it). In any case, the transformation performed is one-to-one. This takes away the thermodynamic requirement for dissipation.
In fact, although the CV-squared stored energy of the node is still required to be greater than kT for reliability reasons (otherwise thermal noise on the node would randomize its logic level), there is, as far as we have been able to determine, no such fundamental constraint on the dissipation CV²·RC/t. The dissipation can be less than kT! Of course it is difficult to confirm this experimentally with our present macro-scale measuring tools, but all the theoretical analyses seem to bear this out, for this and for a wide variety of other reversible mechanisms.
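Setting CV²·RC/t < kT and solving for t gives the transition time needed to push the dissipation below kT: t > RC·CV²/(kT). A quick evaluation with hypothetical node parameters (1 fF, 1 V, 10 kΩ, so RC = 10 ps) at 300 K shows how slow such a transition would have to be for a node of roughly present-day size:

```python
K_B = 1.380649e-23   # Boltzmann constant, J/K

def min_time_for_sub_kT(C, V, R, temp_k):
    """Smallest transition time t for which C V^2 (R C / t) < k T."""
    return R * C * C * V**2 / (K_B * temp_k)

t_min = min_time_for_sub_kT(1e-15, 1.0, 1e4, 300.0)
print(f"t must exceed {t_min:.2e} s")   # about 2.4 microseconds for these values
```

Since the required time grows with the product RC·CV², smaller and lower-resistance nodes could reach the sub-kT regime at much shorter transition times.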
Now, let's make this idea a little more concrete. (Slide, fig. 7.4, p. 173 of text.) One way to approximate an ideal constant-current power supply in practice is to use a variable-voltage power supply. Initially the supply voltage matches the voltage on the node, and then increases to the target voltage over time t. The supply voltage will be constant before and after the transition, to approximate connecting and disconnecting the constant-current supply. And we'll replace the resistor with a transistor so that we can control whether the node gets charged up or not, to give us some hope of information processing.
These changes do not radically affect the energy dissipation, and in fact we can show that for large transition times, the dissipation rapidly converges on the ideal current-source dissipation. For small transition times, the dissipation approaches the dissipation in the constant-voltage case, which makes sense, because it is as if we suddenly connected the node to a high-voltage supply.
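The convergence claimed above can be checked with a closed-form model. The sketch below assumes (simplifications of ours) that the transistor behaves as a fixed resistance R, that the supply ramps linearly from 0 to V over time t and then holds at V, and that the node starts at 0 V. It adds the energy lost during the ramp to the ½C·V_lag² lost as the node settles after the ramp ends; the component values are hypothetical.

```python
import math

def ramp_dissipation(C, V, R, t):
    """Total energy dissipated in R when a linear 0->V ramp of duration t
    charges C through R, including the settling after the ramp stops."""
    tau = R * C
    slope = V / t
    e1 = -math.expm1(-t / tau)        # 1 - exp(-t/tau), computed accurately
    e2 = -math.expm1(-2 * t / tau)    # 1 - exp(-2t/tau)
    # During the ramp the current is C*slope*(1 - exp(-s/tau)); integrate i^2 R.
    during = (C * slope) ** 2 * R * (t - 2 * tau * e1 + 0.5 * tau * e2)
    # The node lags the supply by slope*tau*e1 when the ramp ends; that gap is
    # then closed from a constant-voltage source, costing (1/2) C V_lag^2.
    v_lag = slope * tau * e1
    after = 0.5 * C * v_lag**2
    return during + after

C, V, R = 1e-15, 1.0, 1e4   # hypothetical values, RC = 10 ps
rc = R * C
for ratio in (0.01, 1, 10, 100, 1000):
    t = ratio * rc
    print(f"t={ratio:>6} RC  ramp={ramp_dissipation(C, V, R, t):.3e} J  "
          f"CV^2*RC/t={C * V**2 * rc / t:.3e} J  CV^2/2={0.5 * C * V**2:.3e} J")
```

For t much larger than RC the result approaches CV²·RC/t (the current-source value), and for t much smaller than RC it approaches ½CV² (the sudden-connection value).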
(One caveat is that if we want to be able to turn this transistor strongly on and off, the voltage range of the signal passed through it cannot be as great as the voltage controlling the gate of the transistor. This is a fundamental property of field-effect transistors but is not a fatal problem.)
In some more detail (slide, fig 7.5, p. 173 of text), when t is large compared to RC, the voltage on the node lags behind the supply voltage by some small amount Vds which is roughly inversely proportional to t. For longer and longer charging times, the node voltage tracks the supply voltage more and more closely, leading to smaller and smaller dissipation. The general name for such situations is "quasistatic"; at any given time the system is close to equilibrium. When t is small, most of the charge is delivered from the constant level V and the situation reduces to the constant-voltage switching case.
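For the same simplified model (fixed resistance R, linear ramp of duration t), the lag between supply and node at the end of the ramp works out to V·(RC/t)·(1 - exp(-t/RC)), which tends to V·RC/t for t much larger than RC, so the lag is indeed inversely proportional to t. A short sketch with hypothetical component values:

```python
import math

def ramp_lag(V, R, C, t):
    """Voltage across the (resistive) transistor when a 0->V ramp of duration t ends."""
    tau = R * C
    return V * (tau / t) * -math.expm1(-t / tau)

V, R, C = 1.0, 1e4, 1e-15          # hypothetical values, RC = 10 ps
for ratio in (10, 100, 1000):
    print(f"t = {ratio} RC: lag = {ramp_lag(V, R, C, ratio * R * C) * 1e3:.2f} mV")
```

Each tenfold increase in t cuts the lag, and hence the current through the transistor, by roughly tenfold.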
Another term that is frequently seen is "adiabatic". Literally speaking, "adiabatic" means "without flow of heat," but it is also commonly used to refer to quasistatic or reversible processes that dissipate very little of their energy - that generate very little entropy.
Again, as with an ideal constant-current supply, we have to disallow the case where the node is connected to this supply when it already contains a logic 1. We have to ensure this never happens. If we didn't, we would have a many-to-one transition from "before" states to "after" states, and therefore we couldn't avoid dissipation.
The general rule we have to follow in these circuits is "never turn on a transistor when there is a voltage across it." We might call this "the fundamental rule of adiabatic switching."
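One way to see why this rule matters: if a switch is closed while a voltage ΔV appears across it, the node is charged through the switch's resistance from an effectively constant source, and, just as in the constant-voltage case, ½C·ΔV² is dissipated no matter how small that resistance is. The hypothetical helper below simply evaluates that penalty; an ideal event that obeys the rule has ΔV = 0 and no such loss.

```python
def switch_on_penalty(C, delta_v):
    """Energy lost when a switch is closed with delta_v across it,
    charging capacitance C from a fixed-voltage node (independent of R)."""
    return 0.5 * C * delta_v**2

C = 1e-15                                    # hypothetical 1 fF node
for dv in (1.0, 0.1, 0.01, 0.0):             # volts across the switch at turn-on
    print(f"dV = {dv:5.2f} V -> {switch_on_penalty(C, dv):.1e} J lost")
```

Because the loss is quadratic in ΔV, even approximately obeying the rule helps: a tenfold smaller voltage across the switch at turn-on means a hundredfold smaller loss.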
A historical note: In the old, old days of relay-based electronics, earlier in the twentieth century, there was a technique known as "dry switching," I believe, which meant "not closing a relay when there is a voltage across it." This was desirable in some contexts because it avoided sparks and high contact currents, which caused corrosive degradation of the relay contact surfaces.
Today, we know that the "dry switching" concept is also beneficial for avoiding energy dissipation.
However, there is hope that alternative, non-transistor-based electronic techniques based on superconductors, which have essentially zero resistance, could be used to radically decrease dissipation in reversible circuits. We'll look briefly at superconducting techniques later in the course.
This leads to a minimum point for total energy dissipation per operation (slide, fig. 7.12, p. 192). The minimum occurs where the switching and leakage contributions to the dissipation per operation are equal.
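A simple model reproduces that minimum. Assume (our simplification) that the adiabatic switching loss per operation is A/t, with A = CV²·RC from the formula above, and that leakage adds P_leak·t per operation because each operation occupies the node for time t. The total A/t + P_leak·t is minimized at t* = sqrt(A/P_leak), where the two terms are equal. The 1 nW leakage power below is a made-up illustrative number.

```python
import math

def energy_per_op(t, A, p_leak):
    """Switching loss A/t plus leakage energy p_leak*t for one operation of length t."""
    return A / t + p_leak * t

def optimal_time(A, p_leak):
    """Transition time minimizing energy_per_op; both terms are equal there."""
    return math.sqrt(A / p_leak)

C, V, R = 1e-15, 1.0, 1e4
A = C * V**2 * R * C          # J*s, from E_switch = C V^2 (RC / t)
p_leak = 1e-9                 # hypothetical leakage power per node, 1 nW
t_star = optimal_time(A, p_leak)
print(f"t* = {t_star:.2e} s, E_min = {energy_per_op(t_star, A, p_leak):.2e} J")
print(f"switching part = {A / t_star:.2e} J, leakage part = {p_leak * t_star:.2e} J")
```

The minimum energy is 2·sqrt(A·P_leak), so as leakage grows the best achievable energy per operation rises as the square root of the leakage power.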
Unfortunately, this becomes more of a problem with present semiconductor scaling trends, because voltages (and thus the heights of potential barriers) are decreasing while the temperature remains constant, and as a result leakage currents are increasing rapidly. (When voltages were relatively large, leakage was a completely insignificant component of power; with voltages now decreasing, leakage is rapidly becoming significant.)
One could counter this effect by lowering the operating temperature of the devices, but this is mechanically expensive and incurs its own energy costs. At this point it's still unclear how the pros and cons of reversible techniques will shape up as we approach the limits of semiconductor technology.
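The sensitivity to voltage and temperature can be illustrated with the textbook first-order model of subthreshold leakage, I_leak proportional to exp(-q·V_th/(n·k·T)). This sketch ignores the temperature dependence of V_th and of the prefactor, and the ideality factor n = 1.5 and the threshold voltages are assumed values chosen only for illustration.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C

def relative_leakage(v_th, temp_k, n=1.5):
    """Subthreshold leakage up to a constant factor: exp(-q V_th / (n k T))."""
    return math.exp(-Q_E * v_th / (n * K_B * temp_k))

base = relative_leakage(0.5, 300.0)
for v_th in (0.5, 0.4, 0.3, 0.2):
    print(f"V_th={v_th} V at 300 K: leakage x{relative_leakage(v_th, 300.0) / base:.0f}")
# Cooling lowers the leakage at a fixed threshold voltage:
print(f"V_th=0.2 V at 77 K vs 300 K: x{relative_leakage(0.2, 77.0) / relative_leakage(0.2, 300.0):.1e}")
```

In this model every 0.1 V taken off the threshold multiplies leakage by roughly 13 at room temperature, while cooling to 77 K suppresses it by several orders of magnitude at the same threshold.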
Another approach might be to use a different sort of switch that suffers less from leakage. For example, if micro-electro-mechanical switches (MEMS switches) could be made small enough, that might work; they also tend to have low resistance. However, at very small scales even they would suffer from tunneling.
But anyway, in the meantime, the reversible techniques offer some degree of potential energy savings over conventional approaches; we have estimated that with today's technology the reduction could be on the order of 1000x in energy dissipation per operation. This translates into 1000x lower power dissipation per unit performance, or 1000x higher performance per unit power consumption. But unfortunately there is a cost in terms of hardware to achieve this higher efficiency. Let's look at this more carefully.
Anyway, as we slow down the clock, conventional power decreases simply because we are performing fewer operations per second, while the CV-squared energy per operation stays the same. But in the adiabatic approach, the energy dissipated per operation scales down as well, so the power scales down with the square of the frequency. So at some point, the adiabatic circuit dissipates less power than the conventional one. Eventually it will hit a limit determined by leakage currents, but it ought to get two or three orders of magnitude more efficient than the conventional circuit before then.
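A rough sketch of this comparison, under simplifying assumptions of ours: the conventional gate dissipates ½CV² per operation at any frequency, while the adiabatic gate's ramp time is taken to be one clock period, t = 1/f, so its loss per operation is CV²·RC·f. Power is energy per operation times f, so the adiabatic power goes as f². Leakage is ignored, and the component values are hypothetical.

```python
def conventional_power(C, V, f):
    """Energy per op fixed at C V^2 / 2, so power is linear in f."""
    return 0.5 * C * V**2 * f

def adiabatic_power(C, V, R, f):
    """Energy per op C V^2 (R C f), assuming ramp time = 1/f, so power ~ f^2."""
    return C * V**2 * (R * C * f) * f

C, V, R = 1e-15, 1.0, 1e4                  # RC = 10 ps
crossover = 1 / (2 * R * C)                # where the two powers are equal
print(f"crossover at {crossover:.1e} Hz")
for f in (1e10, 1e9, 1e8, 1e7):
    ratio = conventional_power(C, V, f) / adiabatic_power(C, V, R, f)
    print(f"f = {f:.0e} Hz: adiabatic uses {ratio:.0f}x less power")
```

Below the crossover, each tenfold slowdown buys another tenfold advantage for the adiabatic circuit, until leakage (not modeled here) sets a floor.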
(Next slide.) So you might say it's bad to slow things down, but if the task being performed can be parallelized more than it was originally, you can make up for the decreased performance by increasing the amount of hardware. So the important metric is power per unit performance: in the conventional circuit it stays constant as the clock slows, while in the adiabatic circuit it decreases. But note that to maintain a fixed performance level, we have to increase the amount of hardware in proportion to the clock slowdown.
The overall tradeoff between power, performance, and hardware cost offered by the adiabatic approach is that power P is proportional to R-squared over H, where R is the rate of computation, and H is the hardware cost. So for example, one can reduce total power by 100x by either decreasing total performance by 10x, or by increasing hardware by 100x and maintaining the same performance, or something in between. Or, one can increase performance by 10x within a fixed power budget by increasing hardware by 100x.
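The R²/H relation follows from the argument above: with H units of hardware each clocked at f, the rate is R = H·f, and each unit dissipates power proportional to f², so total power is proportional to H·f² = R²/H. A minimal sketch, working with relative factors only, that checks the examples just given:

```python
def relative_power(perf_factor, hardware_factor):
    """Change in adiabatic power when performance R and hardware H are scaled,
    using P proportional to R^2 / H."""
    return perf_factor**2 / hardware_factor

# Reduce power 100x by cutting performance 10x with the same hardware:
print(relative_power(0.1, 1))      # about 0.01
# ...or by keeping performance fixed and adding 100x hardware:
print(relative_power(1, 100))      # 0.01
# Increase performance 10x at fixed power with 100x hardware:
print(relative_power(10, 100))     # 1.0
```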
This scaling is not ideal, but in extreme situations where the power and/or cooling costs are completely dominant over the cost of the computational components themselves, one might be willing to increase one's budget for transistors in order to pack in a higher total rate of computation for less power.
(a) A very limited power budget, due to limitations on the power supply or cooling system,
(b) The problem that conventional techniques are incapable of achieving the required performance within the given power budget, and
(c) The computational part of the system is a small enough fraction of the total system cost that the adiabatic hardware multiplier is not a problem.
To increase performance, the cost of simply supplying 10x as much power has to be greater than the cost of providing 100x more computational components in order for the adiabatic approach to be beneficial. This is a pretty severe constraint.
Still, I and others are looking around for potential applications in areas such as: Hand-held / wearable / portable devices, implanted medical devices, embedded systems in field equipment, and spaceborne systems, where both power and cooling systems are particularly expensive due to the high weight-related costs of space vehicles.
There are some groups working on applying adiabatic circuit techniques in the near term, but the problem of increasing leakage currents probably means that the technique won't remain viable as we approach the limits of semiconductor technology, at least not unless low-temperature operation starts to become required for other reasons as well.
But, for alternative types of technology, beyond semiconductors, we will see later that reversible approaches seem to be beneficial.
Anyway, in the next couple of lectures we will show how it is possible to build a complete computer based entirely on these adiabatic switching principles.