Simulating and generating hardware in BitSAD

In stochastic bitstreams 101, we multiplied two SBitstreams manually in a loop. This becomes cumbersome for complex functions with many inputs and outputs. A key feature of BitSAD is automating this step, which we explore in this tutorial. As a bonus, you’ll see how the same principles enable automatic Verilog generation to create hardware for your functions.

Simulating functions on `SBitstream`s

Suppose we have the following function, f, which multiplies two SBitstreams.

using BitSAD

f(x, y) = x * y

x, y = SBitstream(0.3), SBitstream(0.5)
z = f(x, y)

SBitstream{Float64}(value = 0.15)
    with 0 bits.

We see that the output, z, is similar to the one in the previous tutorial: it holds the expected value but no bits yet. Instead of manually simulating the bit-level multiplication in f, we can use simulatable.

fsim = simulatable(f, x, y)
fsim(f, x, y)

SBitstream{Float64}(value = 0.15)
    with 1 bits.

fsim is a Julia function that can be called similarly to f (the exception being that fsim expects the first argument to be the function to simulate, f).

Tip

For static functions like f, it may seem redundant to pass f in. But the simulated function can be a callable struct as well. This means that you can modify the struct between invocations of the simulation object if you desire.

BitSAD generates fsim by executing f(x, y) once and storing the program execution on a trace. This trace gets transformed into a similar program except calls to operations are replaced by calls to simulators. These simulators emulate the bit-level execution, similar to multiply_sbit from the previous tutorial.

Let’s verify that fsim works like our manual simulation from before.

num_samples = 1000
foreach(1:num_samples) do t
    push!(z, pop!(fsim(f, x, y)))
end

abs(estimate(z) - float(z))

0.009000000000000008
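The sampling loop above can be wrapped in a small helper so that the same check is easy to repeat for other functions. This is only a sketch: `bit_level_error` and `mul` are hypothetical names that are not part of BitSAD, and the body uses only the calls already shown in this tutorial (`simulatable`, `push!`, `pop!`, `estimate`, `float`).

```julia
using BitSAD

# Hypothetical helper (not part of BitSAD): simulate `fn` at the bit level for
# `num_samples` steps and return |bit-level estimate - algorithmic-level value|.
function bit_level_error(fn, args...; num_samples = 1000)
    z = fn(args...)                        # algorithmic-level output, holds 0 bits
    sim = simulatable(fn, args...)         # bit-level simulator for fn
    for _ in 1:num_samples
        push!(z, pop!(sim(fn, args...)))   # move each simulated bit onto z
    end
    return abs(estimate(z) - float(z))
end

# Same product as f, under a different name to avoid clobbering it.
mul(a, b) = a * b
err = bit_level_error(mul, SBitstream(0.3), SBitstream(0.5))
```

The exact value of `err` changes from run to run because the bits are random, so no specific output is shown here.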

What’s actually happening inside fsim? We can take a peek under the hood with show_simulatable which will print out the Julia function being compiled by BitSAD.

BitSAD.show_simulatable(f, x, y)

:(function var"##tape_f#376"(x1::typeof(Main.anonymous.f), x2::SBitstream{Float64}, x4::SBitstream{Float64})
      x3 = (BitSAD.getbit)(x2)
      x5 = (BitSAD.getbit)(x4)
      x6 = (*)(x2, x4)
      x7 = (SSignedMultiplier(...))(x3, x5)
      x8 = (BitSAD.setbit!)(x6, x7)
      return x6
  end)

Here, we see that fsim is a function that accepts the function being simulated (x1) followed by two SBitstream{Float64}s (x2 and x4) as input. Walking through each step, we see:

x3 = getbit(x2) pops a sample from the first input (similarly, x5 = getbit(x4)).
The regular *(x2, x4) is called on our input SBitstreams to produce the output SBitstream, x6.
A simulator, SSignedMultiplier, is called on the popped bits, x3 and x5.
The resulting SBit, x7, is pushed onto the output bitstream with setbit!(x6, x7).

These four steps are the basic transformation applied to any simulatable operation on the trace.
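The four steps can be mimicked in a toy model written in plain Julia. This is not BitSAD's implementation: `ToyStream`, `toy_getbit`, `toy_setbit!`, `toy_estimate`, and `toy_multiply_step!` are hypothetical names, the streams are unsigned (a single channel), and the "simulator" is a plain AND gate rather than SSignedMultiplier.

```julia
using Random

# Hypothetical stand-in for a bitstream: a real value plus the recorded bits.
struct ToyStream
    value::Float64
    bits::Vector{Bool}
end
ToyStream(value) = ToyStream(value, Bool[])

# Step 1: draw one bit that is 1 with probability equal to the stream's value.
toy_getbit(rng, s::ToyStream) = rand(rng) < s.value

# Step 4: record a simulated bit on the output stream.
toy_setbit!(s::ToyStream, bit::Bool) = push!(s.bits, bit)

# Fraction of recorded 1s, i.e. the bit-level estimate of the value.
toy_estimate(s::ToyStream) = count(s.bits) / length(s.bits)

function toy_multiply_step!(rng, out::ToyStream, a::ToyStream, b::ToyStream)
    abit = toy_getbit(rng, a)   # step 1: sample each input
    bbit = toy_getbit(rng, b)
    obit = abit & bbit          # step 3: bit-level operator (AND = unsigned multiply)
    toy_setbit!(out, obit)      # step 4: push the bit onto the output
    return out
end

rng = MersenneTwister(0)
ta, tb = ToyStream(0.3), ToyStream(0.5)
tz = ToyStream(ta.value * tb.value)   # step 2: algorithmic-level output
for _ in 1:10_000
    toy_multiply_step!(rng, tz, ta, tb)
end
toy_error = abs(toy_estimate(tz) - tz.value)
```

As in the real trace, the algorithmic-level value and the bit-level samples are computed side by side, and the output stream carries both.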

Single evaluation

When writing software, it is reasonable to execute the same function twice on the same set of inputs.

g(a, b) = a + b
h(x, y) = g(x, y) * g(x, y)
z = h(x, y)

SBitstream{Float64}(value = 0.6400000000000001)
    with 0 bits.

h calls g(x, y) twice, and as we can see, this causes no issues when running the code. In hardware, however, g is a stateful operator: multiple invocations would produce different outputs, so it cannot simply be called twice. Instead, we want to re-use the first evaluation of g(x, y). BitSAD does this automatically.

BitSAD.show_simulatable(h, x, y)

:(function var"##tape_h#377"(x1::typeof(Main.anonymous.h), x2::SBitstream{Float64}, x4::SBitstream{Float64})
      x3 = (BitSAD.getbit)(x2)
      x5 = (BitSAD.getbit)(x4)
      x6 = (+)(x2, x4)
      x7 = (SSignedAdder(...))(x3, x5)
      x8 = (*)(x6, x6)
      x9 = (SSignedMultiplier(...))(x7, x7)
      x10 = (BitSAD.setbit!)(x8, x9)
      return x8
  end)

Examining the compiled function, we see that only a single SSignedAdder is invoked on the inputs. The same resulting bit, x7, is passed to the final SSignedMultiplier.

Applying decorrelation

Recall from stochastic bitstreams 101 that stochastic computing operators exploit the statistical independence of their inputs. But in the previous section, we can see clearly that the simulation of h does not pass independent inputs to the SSignedMultiplier (it’s the exact same bit!). So, we should expect incorrect results.

z = h(x, y)
hsim = simulatable(h, x, y)
for t in 1:num_samples
    push!(z, pop!(hsim(h, x, y)))
end

abs(estimate(z) - float(z))

0.16299999999999992

Note that the algorithmic-level evaluation of h had no issues (float(z) == 0.64), but the bit-level output has measurable error. BitSAD was designed to make issues that appear at the hardware level easier to spot. How can we fix this? In stochastic computing circuits, we can decorrelate bitstreams to make them independent.

hfixed(x, y) = g(x, y) * decorrelate(g(x, y))
z = hfixed(x, y)
hfixed_sim = simulatable(hfixed, x, y)
for t in 1:num_samples
    push!(z, pop!(hfixed_sim(hfixed, x, y)))
end

@show BitSAD.show_simulatable(hfixed, x, y)
abs(estimate(z) - float(z))

BitSAD.show_simulatable(hfixed, x, y) = :(function var"##tape_hfixed#380"(x1::typeof(Main.anonymous.hfixed), x2::SBitstream{Float64}, x4::SBitstream{Float64})
      x3 = (BitSAD.getbit)(x2)
      x5 = (BitSAD.getbit)(x4)
      x6 = (+)(x2, x4)
      x7 = (SSignedAdder(...))(x3, x5)
      x8 = (BitSAD.decorrelate)(x6)
      x9 = (SSignedDecorrelator(...))(x7)
      x10 = (*)(x6, x8)
      x11 = (SSignedMultiplier(...))(x7, x9)
      x12 = (BitSAD.setbit!)(x10, x11)
      return x10
  end)

0.051000000000000156
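The effect of correlation can be reproduced in a toy model in plain Julia. The sketch below assumes unsigned streams and uses an AND gate as the multiplier; drawing a fresh, independent stream with the same probability is only a stand-in for decorrelation and is not how SSignedDecorrelator is implemented. All variable names are hypothetical.

```julia
using Random

rng = MersenneTwister(1)
p = 0.8                      # value of g(x, y) = 0.3 + 0.5
n = 10_000

s = rand(rng, n) .< p        # one bitstream with P(bit = 1) = p
correlated = s .& s          # AND of a stream with itself is just the stream
fresh = rand(rng, n) .< p    # independent stream with the same probability
independent = s .& fresh     # AND of two independent streams

est_correlated = count(correlated) / n     # tracks p, not p^2
est_independent = count(independent) / n   # tracks p^2
```

In this model the correlated product estimates 0.8 instead of 0.64, a gap of about 0.16, which is consistent with the error of roughly 0.163 reported for the simulation of h earlier in this section.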

Generating hardware

With BitSAD, we’ve been able to create functions on stochastic bitstreams, and we verified that they should work at the bit-level. The next step is to generate hardware for these functions! BitSAD can take any Julia function and generate synthesizable Verilog code.

Let’s start by creating hardware for f.

f_verilog, f_circuit = generatehw(f, x, y)

We do this by calling generatehw, which has a similar syntax to simulatable. It returns two values, f_verilog and f_circuit. f_verilog is a String of the Verilog code. You can write this to disk or examine it in the Julia REPL.
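Writing the string to disk needs nothing beyond Julia's standard file I/O. The sketch below uses a short placeholder string in place of the real f_verilog so that it runs on its own; `verilog_src`, `out_path`, and the file name `f.sv` are assumptions made for illustration.

```julia
# Placeholder standing in for the String returned by generatehw(f, x, y).
verilog_src = "module f ();\nendmodule\n"

# Hypothetical output location inside a fresh temporary directory.
out_path = joinpath(mktempdir(), "f.sv")

# Write the Verilog source to disk.
open(out_path, "w") do io
    print(io, verilog_src)
end

# Read it back to confirm the round trip.
roundtrip = read(out_path, String)
```

In a real session, you would pass f_verilog instead of the placeholder and choose a path inside your hardware project.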

print(f_verilog)

module f (
    input logic CLK,
    input logic nRST,
    input logic   net_2_p, net_2_m,
    input logic   net_3_p, net_3_m,
    output logic   net_4_p, net_4_m
);



// Autogenerated by BitSAD
// BEGIN mult0
stoch_signed_elem_mult_mat #(
        .NUM_ROWS(1),
        .NUM_COL(1)
    ) mult0 (
        .CLK(CLK),
        .nRST(nRST),
        .A_p(net_2_p),
        .A_m(net_2_m),
        .B_p(net_3_p),
        .B_m(net_3_m),
        .Y_p(net_4_p),
        .Y_m(net_4_m)
    );
// END mult0


endmodule

We see that each net has a “_p” and “_m” appended for “plus” and “minus.” Recall, this is because SBitstreams are signed and represented by two channels. Handling these channels correctly to produce a single SBitstream as the output is why our hardware is so much more complex than a single AND gate. BitSAD was created to automate this complexity away.
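To illustrate why two channels complicate the logic, here is a toy model of a signed multiply in plain Julia. It assumes that a signed value v is encoded by putting 1s with probability |v| on the channel matching its sign, and that the product routes like-signed pairs to the plus channel and unlike-signed pairs to the minus channel. This is one plausible scheme, not the logic inside stoch_signed_elem_mult_mat, which this tutorial does not show; `ToySBit`, `toy_sample`, and `toy_signed_mult` are hypothetical names.

```julia
using Random

# Hypothetical two-channel bit: at most one of pos/neg is set per sample.
struct ToySBit
    pos::Bool
    neg::Bool
end

# Sample a signed value v in [-1, 1]: |v| is the probability of a 1 on the
# channel matching the sign of v; the other channel stays 0.
function toy_sample(rng, v::Float64)
    bit = rand(rng) < abs(v)
    return v >= 0 ? ToySBit(bit, false) : ToySBit(false, bit)
end

# Signed multiply: like signs feed the plus channel, unlike signs the minus channel.
toy_signed_mult(a::ToySBit, b::ToySBit) =
    ToySBit((a.pos & b.pos) | (a.neg & b.neg), (a.pos & b.neg) | (a.neg & b.pos))

rng = MersenneTwister(2)
n = 10_000
va, vb = 0.3, -0.5
outs = [toy_signed_mult(toy_sample(rng, va), toy_sample(rng, vb)) for _ in 1:n]

# Decode: mean of the plus channel minus mean of the minus channel.
toy_value = (count(o -> o.pos, outs) - count(o -> o.neg, outs)) / n
```

Even in this simplified model, a single multiply needs four AND gates and two OR gates instead of one AND gate, and `toy_value` lands near the expected product of -0.15.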
