Simulating and generating hardware in BitSAD
In stochastic bitstreams 101, we multiplied two SBitstreams manually in a loop. This can be cumbersome for complex functions involving many inputs and outputs. A key feature of BitSAD is the automation of this step, which we will explore in this tutorial. As a bonus, you’ll see how the same principles enable automatic Verilog generation to create hardware for your functions.
Simulating functions on SBitstreams
Suppose we have the following function, f, which multiplies two SBitstreams.
using BitSAD
f(x, y) = x * y
x, y = SBitstream(0.3), SBitstream(0.5)
z = f(x, y)
SBitstream{Float64}(value = 0.15)
with 0 bits.
We see that the output, z, is similar to the previous tutorial. Instead of manually simulating the bit-level multiplication in f, we can use simulatable.
fsim = simulatable(f, x, y)
fsim(f, x, y)
SBitstream{Float64}(value = 0.15)
with 1 bits.
fsim is a Julia function that can be called much like f, except that fsim expects its first argument to be the function to simulate, f.
Tip
For static functions like f, it may seem redundant to pass f in. But the simulated function can be a callable struct as well. This means that you can modify the struct between invocations of the simulation object if you desire.
BitSAD generates fsim by executing f(x, y) once and storing the program execution on a trace. This trace gets transformed into a similar program except calls to operations are replaced by calls to simulators. These simulators emulate the bit-level execution, similar to multiply_sbit from the previous tutorial.
Let’s verify that fsim works like our manual simulation from before.
num_samples = 1000
foreach(1:num_samples) do t
push!(z, pop!(fsim(f, x, y)))
end
abs(estimate(z) - float(z))
0.009000000000000008
What’s actually happening inside fsim? We can take a peek under the hood with show_simulatable, which prints the Julia function being compiled by BitSAD.
BitSAD.show_simulatable(f, x, y)
:(function var"##tape_f#376"(x1::typeof(Main.anonymous.f), x2::SBitstream{Float64}, x4::SBitstream{Float64})
x3 = (BitSAD.getbit)(x2)
x5 = (BitSAD.getbit)(x4)
x6 = (*)(x2, x4)
x7 = (SSignedMultiplier(...))(x3, x5)
x8 = (BitSAD.setbit!)(x6, x7)
return x6
end)
Here, we see that fsim is a function that accepts two SBitstream{Float64}s as input. Walking through each step, we see:
- x3 = getbit(x2) pops a sample from the first input (similarly, x5 = getbit(x4)).
- The regular *(x2, x4) is called on our input SBitstreams to produce the output SBitstream, x6.
- A simulator, SSignedMultiplier, is called on the popped bits, x3 and x5.
- The resulting SBit, x7, is pushed onto the output bitstream with setbit!(x6, x7).
These four steps are the basic transformation applied to any simulatable operation on the trace.
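The four steps can be mimicked in plain Julia without BitSAD. The sketch below is only an illustration: ToyStream, toy_getbit, toy_mul, toy_multiplier, toy_setbit!, and toy_tape are hypothetical names, and the toy stream is unipolar (values in [0, 1]), so its multiplier is a single AND gate rather than BitSAD's SSignedMultiplier.

```julia
using Random

# Hypothetical stand-in for an SBitstream: a real value plus a queue of bits.
# Unlike BitSAD's signed SBitstream, this toy is unipolar (values in [0, 1]).
struct ToyStream
    value::Float64
    bits::Vector{Bool}
end
ToyStream(value) = ToyStream(value, Bool[])

# Step 1: draw one bit that is 1 with probability equal to the stream's value.
toy_getbit(rng, s::ToyStream) = rand(rng) < s.value

# Step 2: the "regular" operation works on values and returns a fresh stream.
toy_mul(a::ToyStream, b::ToyStream) = ToyStream(a.value * b.value)

# Step 3: the bit-level simulator; for unipolar streams this is an AND gate.
toy_multiplier(abit::Bool, bbit::Bool) = abit & bbit

# Step 4: store the simulated bit on the output stream.
toy_setbit!(s::ToyStream, bit::Bool) = push!(s.bits, bit)

# One simulated clock cycle, mirroring the structure of the generated tape.
function toy_tape(rng, a::ToyStream, b::ToyStream)
    abit = toy_getbit(rng, a)
    bbit = toy_getbit(rng, b)
    out = toy_mul(a, b)
    outbit = toy_multiplier(abit, bbit)
    toy_setbit!(out, outbit)
    return out
end

rng = MersenneTwister(0)
a, b = ToyStream(0.3), ToyStream(0.5)
nsamples = 100_000

# Count how often the single bit on each output stream is 1.
ones_seen = count(_ -> first(toy_tape(rng, a, b).bits), 1:nsamples)
toy_estimate = ones_seen / nsamples   # should be close to 0.3 * 0.5
```

Each call to toy_tape returns an output stream carrying the exact value alongside one simulated bit, which is the same shape as the value returned by the generated tape above.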
Single evaluation
When writing software, it is reasonable to execute the same function twice on the same set of inputs.
g(a, b) = a + b
h(x, y) = g(x, y) * g(x, y)
z = h(x, y)
SBitstream{Float64}(value = 0.6400000000000001)
with 0 bits.
h calls g(x, y) twice, and as we can see, this causes no issues when running the code. In hardware, g is a stateful operator, so it cannot be called twice: multiple invocations will produce different outputs. Instead, we want to reuse the first evaluation of g(x, y). BitSAD does this automatically.
BitSAD.show_simulatable(h, x, y)
:(function var"##tape_h#377"(x1::typeof(Main.anonymous.h), x2::SBitstream{Float64}, x4::SBitstream{Float64})
x3 = (BitSAD.getbit)(x2)
x5 = (BitSAD.getbit)(x4)
x6 = (+)(x2, x4)
x7 = (SSignedAdder(...))(x3, x5)
x8 = (*)(x6, x6)
x9 = (SSignedMultiplier(...))(x7, x7)
x10 = (BitSAD.setbit!)(x8, x9)
return x8
end)
Examining the compiled function, we see that only a single SSignedAdder is invoked on the inputs. The same resulting bit, x7, is passed to the final SSignedMultiplier.
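To make the "cannot be called twice" point concrete, here is a small sketch in plain Julia. ToyAdder is a hypothetical stateful operator that randomly selects one of its inputs (a scaled adder, assumed purely for illustration; it is not how SSignedAdder is implemented). Invoking it twice in the same cycle can give two different bits, while evaluating once and reusing the bit cannot.

```julia
using Random

# Hypothetical stateful operator: every call advances its internal random state,
# loosely like a hardware block advancing by one clock cycle.
struct ToyAdder
    rng::MersenneTwister
end

# Scaled addition: randomly pass through one of the two inputs.
# (An assumption for illustration; not BitSAD's SSignedAdder.)
(op::ToyAdder)(a::Bool, b::Bool) = rand(op.rng, Bool) ? a : b

adder = ToyAdder(MersenneTwister(1))
abit, bbit = true, false

# Calling the operator twice per cycle: the two results can disagree.
twice = map(1:1000) do _
    (adder(abit, bbit), adder(abit, bbit))
end
ndisagree = count(p -> p[1] != p[2], twice)

# Evaluating once and reusing the bit: both uses always agree.
once = map(1:1000) do _
    bit = adder(abit, bbit)
    (bit, bit)
end
nagree = count(p -> p[1] == p[2], once)
```

Reusing the bit keeps the two uses consistent, but it also makes them perfectly correlated.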
Applying decorrelation
Recall from stochastic bitstreams 101 that stochastic computing operators exploit the statistical independence of their inputs. But in the previous section, we can see clearly that the compiled function for h does not pass independent inputs to the SSignedMultiplier (it’s the exact same bit!). So, we should expect incorrect results.
z = h(x, y)
hsim = simulatable(h, x, y)
for t in 1:num_samples
push!(z, pop!(hsim(h, x, y)))
end
abs(estimate(z) - float(z))
0.16299999999999992
Note that h had no issues at the algorithmic level (float(z) == 0.64), but the bit-level output has measurable error. BitSAD was designed to make issues that appear at the hardware level easier to spot. How can we fix this? In stochastic computing circuits, we can decorrelate bitstreams to make them independent.
hfixed(x, y) = g(x, y) * decorrelate(g(x, y))
z = hfixed(x, y)
hfixed_sim = simulatable(hfixed, x, y)
for t in 1:num_samples
push!(z, pop!(hfixed_sim(hfixed, x, y)))
end
@show BitSAD.show_simulatable(hfixed, x, y)
abs(estimate(z) - float(z))
BitSAD.show_simulatable(hfixed, x, y) = :(function var"##tape_hfixed#380"(x1::typeof(Main.anonymous.hfixed), x2::SBitstream{Float64}, x4::SBitstream{Float64})
x3 = (BitSAD.getbit)(x2)
x5 = (BitSAD.getbit)(x4)
x6 = (+)(x2, x4)
x7 = (SSignedAdder(...))(x3, x5)
x8 = (BitSAD.decorrelate)(x6)
x9 = (SSignedDecorrelator(...))(x7)
x10 = (*)(x6, x8)
x11 = (SSignedMultiplier(...))(x7, x9)
x12 = (BitSAD.setbit!)(x10, x11)
return x10
end)
0.051000000000000156
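The effect of correlation can be reproduced with a toy unipolar model in plain Julia. This is a sketch under simplifying assumptions: bits are drawn directly with rand, an AND gate stands in for the multiplier, and a fresh independent sample stands in for the output of a decorrelator. None of this is BitSAD's implementation.

```julia
using Random

rng = MersenneTwister(2)
p = 0.8               # probability that a bit is 1, matching g(x, y) = 0.3 + 0.5
nsamples = 100_000

# Correlated case: the AND gate sees the same bit twice, so bit & bit == bit
# and the estimate tracks p instead of p^2.
ncorrelated = count(1:nsamples) do _
    bit = rand(rng) < p
    bit & bit
end
correlated_estimate = ncorrelated / nsamples

# Independent case: the second input is a fresh sample with the same probability,
# which is the statistical effect a decorrelator aims for.
nindependent = count(1:nsamples) do _
    bit1 = rand(rng) < p
    bit2 = rand(rng) < p
    bit1 & bit2
end
independent_estimate = nindependent / nsamples
```

In this toy model the correlated estimate lands near 0.8 and the independent one near 0.64. The gap of roughly 0.16 is consistent with the error measured for h above.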
Generating hardware
With BitSAD, we’ve been able to create functions on stochastic bitstreams, and we verified that they should work at the bit-level. The next step is to generate hardware for these functions! BitSAD can take any Julia function and generate synthesizable Verilog code.
Let’s start by creating hardware for f.
f_verilog, f_circuit = generatehw(f, x, y)
We do this by calling generatehw, which has a similar syntax to simulatable. It returns two values, f_verilog and f_circuit. f_verilog is a String of the Verilog code. You can write this to disk or examine it in the Julia REPL.
print(f_verilog)
module f (
input logic CLK,
input logic nRST,
input logic net_2_p, net_2_m,
input logic net_3_p, net_3_m,
output logic net_4_p, net_4_m
);
// Autogenerated by BitSAD
// BEGIN mult0
stoch_signed_elem_mult_mat #(
.NUM_ROWS(1),
.NUM_COL(1)
) mult0 (
.CLK(CLK),
.nRST(nRST),
.A_p(net_2_p),
.A_m(net_2_m),
.B_p(net_3_p),
.B_m(net_3_m),
.Y_p(net_4_p),
.Y_m(net_4_m)
);
// END mult0
endmodule
We see that each net has “_p” or “_m” appended for “plus” and “minus.” Recall that this is because SBitstreams are signed and represented by two channels. Handling these channels correctly to produce a single SBitstream as the output is why our hardware is so much more complex than a single AND gate. BitSAD was created to automate this complexity away.
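As a rough illustration of why two channels cost more than one AND gate, the sketch below models a signed multiply in plain Julia. The encoding and gate arrangement are assumptions made for this example (toy_signed_bit and toy_signed_mult are hypothetical names), not a description of the stoch_signed_elem_mult_mat module.

```julia
using Random

# Hypothetical two-channel encoding: a value v in [-1, 1] becomes a (plus, minus)
# bit pair whose expected difference E[plus] - E[minus] equals v.
function toy_signed_bit(rng, v)
    bit = rand(rng) < abs(v)
    return v >= 0 ? (bit, false) : (false, bit)
end

# Sign-aware multiply: the product is positive when the signs match and
# negative when they differ. This takes four ANDs and two ORs, not one AND.
function toy_signed_mult(a, b)
    ap, am = a
    bp, bm = b
    plus = (ap & bp) | (am & bm)
    minus = (ap & bm) | (am & bp)
    return (plus, minus)
end

rng = MersenneTwister(3)
va, vb = -0.3, 0.5
nsamples = 100_000

# Accumulate plus - minus over many cycles to estimate the signed product.
total = sum(1:nsamples) do _
    plus, minus = toy_signed_mult(toy_signed_bit(rng, va), toy_signed_bit(rng, vb))
    plus - minus
end
signed_estimate = total / nsamples   # should be close to -0.3 * 0.5
```

Even this simplified version needs several gates per output bit, which hints at why generating the Verilog automatically is convenient.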