Stochastic bitstreams 101
A stochastic bitstream, \(X_t\), is a sequence of samples from a Bernoulli distribution:
\[X_t \sim \mathrm{Bernoulli}(p)\]
where \(p\) is the underlying value being encoded. Stochastic bitstreams are the data format used in stochastic computing.
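To make this concrete, here is a quick plain-Julia sketch (no BitSAD required; all names are illustrative) that draws such a bitstream for \(p = 0.3\) and checks that its empirical mean recovers \(p\):
p = 0.3
bits = rand(10_000) .< p  # each element is an independent Bernoulli(p) sample
sum(bits) / length(bits)  # empirical mean, ≈ 0.3
The longer the bitstream, the closer this empirical mean gets to \(p\).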
So, why stochastic bitstreams? You can refer to Gaines’ original work on the topic, but in short, stochastic computing systems tend to be much more area- and power-efficient than traditional binary systems. The reason is that any hardware operation on stochastic bitstreams need only process a single bit at a time, and this can often be done in a “streaming” fashion (eliminating the need for sequential logic).
Take multiplication as an example. For numbers represented in floating-point binary, multiplication can be an expensive hardware operation. In many ultra-low power systems, floating-point multiplication is evaluated over multiple clock cycles with integer arithmetic units. In contrast, a stochastic computing multiplier is a single AND gate. In the next section, you’ll see why this is the case.
Basics of stochastic bitstreams
First, let’s try creating a stochastic bitstream.
using BitSAD
x = SBitstream(0.3)
SBitstream{Float64}(value = 0.3)
with 0 bits.
Here, we created an SBitstream (the type in BitSAD for stochastic bitstreams) encoding the real value 0.3. SBitstream will keep track of the mean of the Bernoulli distribution, which we can recover with float.
float(x)
0.3
You’ll also notice that x was created with “0 bits” enqueued. This refers to the fact that the bitstream, x, is a sequence of samples. Currently, we have not drawn any samples from x. We can try that now:
xt = pop!(x)
SBit(pos = true, neg = false)
Now, we have a single sample, xt, which is of type SBit. An SBit is a “stochastic bit”, which is just a convenient alias for a NamedTuple with two parts: the positive part (pos) and the negative part (neg).
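Because an SBit is just a named tuple of two bits, we can inspect each channel of xt directly using the pos and neg accessors (the same ones used by multiply_sbit later in this tutorial):
pos(xt)  # true, the sample from the positive channel
neg(xt)  # false, the sample from the negative channel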
Wait, I thought stochastic bitstreams were a single bit sequence?
— You (probably)
Yes, in theory, but this definition means that we can only represent real numbers \(p \in [0, 1]\). In practice, we would like to represent signed numbers (though we still normalize them to \(p \in [-1, 1]\)). BitSAD uses a two-channel format for encoding signed numbers as two underlying bitstreams, \(X_t^+\) and \(X_t^-\). One channel is the positive part and the other is the negative part, such that
\[p = \mathbb{E}[X_t^+] - \mathbb{E}[X_t^-]\]
Samples from these two separate channels are neatly packaged into a single SBit so that we can think of SBitstreams as sequences of SBits without having to worry too much about the underlying signed encoding scheme.
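As a minimal sketch of the signed encoding (assuming, per the normalization above, that SBitstream accepts any value in \([-1, 1]\)):
w = SBitstream(-0.3)  # a signed value; the sign lives in the two-channel encoding
float(w)              # -0.3
wt = pop!(w)          # a signed sample; pos(wt) - neg(wt) is -1, 0, or 1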
If we want, we can even push SBits onto an SBitstream.
push!(x, xt)
x
SBitstream{Float64}(value = 0.3)
with 1 bits.
We see that x now has a single bit in its queue. For convenience, BitSAD provides generate! to pre-load an SBitstream with samples from the underlying distributions.
generate!(x) # add a single sample
@show length(x)
generate!(x, 1000)
x
length(x) = 2
SBitstream{Float64}(value = 0.3)
with 1002 bits.
Finally, we can see that the empirical average over the SBits in the queue matches the encoded value quite closely.
abs(estimate(x) - float(x))
0.0015968063872255356
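For reference, the standard error of an \(N\)-sample estimate of a Bernoulli mean is \(\sqrt{p (1 - p) / N}\); with \(p = 0.3\) and \(N = 1002\) that is roughly \(0.014\), so an error of this size is well within the expected range.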
Operations on SBitstreams
So far, we have not computed any meaningful results with BitSAD. Let’s go back to the multiplication example and try to multiply two SBitstreams.
y = SBitstream(0.5)
z = x * y
SBitstream{Float64}(value = 0.15)
with 0 bits.
The result, z, has an encoded value of 0.15 = 0.3 * 0.5. Recall that stochastic bitstreams encode the value in the mean of their underlying distributions. Any function applied to SBitstreams implies a function over their means. Thus,
\[\mathbb{E}[Z_t] = \mathbb{E}[X_t] \, \mathbb{E}[Y_t] = 0.3 \cdot 0.5 = 0.15\]
We can verify this in BitSAD too.
float(z) == float(x) * float(y)
true
So far, we haven’t described how this multiplication is actually executed on hardware. Certainly, multiplying the floating-point means and then drawing from the resulting distribution would be no better than traditional arithmetic. Stochastic computing takes advantage of the fact that \(X_t\) and \(Y_t\) are independent to note that
\[\mathbb{E}[X_t Y_t] = \mathbb{E}[X_t] \, \mathbb{E}[Y_t] = \mathbb{E}[Z_t]\]
In other words, we can multiply the samples at step \(t\) from each sequence to create a new sequence. The mean of this new sequence should match \(\mathbb{E}[Z_t]\). Let’s see it in action.
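# Channel-wise multiply: AND the positive samples together, and the negative samples together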
multiply_sbit(x, y) = SBit((pos(x) * pos(y), neg(x) * neg(y)))
num_samples = 1000
for t in 1:num_samples
xbit, ybit = pop!(x), pop!(y)
zbit = multiply_sbit(xbit, ybit)
push!(z, zbit)
end
abs(estimate(z) - float(z))
0.01100000000000001
We used a helper function, multiply_sbit, to multiply the positive and negative channels of each SBit separately. This resulted in a new SBit, zbit, which we pushed onto z. When we take the empirical average of all these zbits, we see that it is close to the true mean of z.
Hopefully, you can now see why stochastic computing can be so resource efficient. Each channel of multiply_sbit only needed to multiply two 1-bit numbers. This can be done with a single AND gate.
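This also maps neatly onto Julia itself: multiplying two Bools is defined as their logical AND, so each channel of multiply_sbit is literally one AND of two sample bits.
true * false == (true & false)  # true: Bool multiplication is AND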
In the next tutorial, you’ll see how to automate the SBit-level simulation we did above, and how to generate synthesizable hardware from a Julia function.