home / posts


Low-depth Circuits for Computing the Hamming Weight

Suppose we have a binary string with $n$ bits. In this post, I will look at constructing (Boolean) arithmetic circuits that compute the hamming weight of a bitstring such that the circuits have minimal multiplicative depth. More precisely, we will develop algorithms handling arbitrary $n$ that generate circuits to compute the hamming weight of a bitstring with $n$ bits. Our circuits will consist of XOR and AND gates i.e. the Boolean case of arithmetic circuits. I have updated SCDF recently to include the algorithms described here and the code can be found on the SCDF github repository. So the code here is in Python, leveraging the SCDF framwork.

The asymmtotic depth of an optimal hamming weight circuit is $O(\log n)$. Both approaches I take here achieve this but have different constants. The first approach, which I will now describe, is less efficient in terms of depth but more efficient in terms of circuit size.

First Approach

Our first approach is probably the natural top-down construction. We use a Wallace-like tree structure that consists of adders. For simplicity, in this first approach, we will only handle an input bitstring whose length is a power of 2. Here is the code.

from base import *
import arith
import utils

def hamming_weight_add(v):
    n = len(v)
    if not utils.is_power_of_2(n):
        raise 'Length of vector is not a power of 2'
    elems = map(lambda x: [x], v)
    while n >= 2:
        n /= 2
        for i in range(n):
            elems[i] = arith.ripple_add(elems[2*i], elems[2*i + 1])
    w = elems[0]

    return w

As usual with SCDF, we import from the base module, but here we also rely on the arith module and utils module which you can find in the SCDF repo. Now before computing the depth, we reduce the depth of the circuit by running scdf.reduce_depth. We will consider the depth of the circuit computing the most significant bit of the hamming weight. Invariably, we see that this is $2\log{n} - 1$ where $n$ is the number of input bits. However, we can do better than that, and this brings us to our next algorithm.

Second Approach

This next approach is based on the idea that the $i$-th bit of the hamming weight can be computed as the XOR of all AND combinations of $2^i$ input bits. To see this, suppose we have 3 bits; $a, b$ and $c$. Now the hamming weight is the sum of the first 2 bits added to the third bit. Easily we see that bit 0 is the XOR of the 3 bits. Now suppose we add the first two bits. Bit 0 of this sum is the XOR of the first two bits i.e. $a + b$. Bit 1 of this sum is the carry of these first two bits i.e. both bits ANDed, $ab$. Now we add the third bit. Bit 1 of the final sum is $ab + c\cdot (a + b) = ab + ac + bc$, hence all AND combinations of the input bits. This naturally extends for a greater number of input bits.

Our algorithm builds upon and generalizes an approach by Cetin, Doroz, Sunar and Savas and handles vectors of any length. It builds up the products three at a time, attempting to minimize gates. Here is the code.

from base import *
import arith
from math import log
import utils

def hamming_weight_direct(v):
    c = v
    result_len = int(log(len(v), 2)) + 1
    print len(v), result_len
    result = [zero] * result_len
    for i in range(result_len):
        s = []
        n = len(c)
        rem = n % 3
        left_over = c[-rem:]
        for j in range(0, n - rem, 3):
              s.append(c[j] + c[j + 1] + c[j + 2])
        if rem != 0:
            s.append(sum(left_over, zero))

        result[i] = sum(s, zero)

        next_c = []

        for j in range(0, n - rem, 3):
            next_c.append(c[j]*c[j + 1] + c[j]*c[j + 2] + c[j + 1]*c[j + 2])
        if rem == 2:
            next_c.append(c[-1]*c[-2])
        for j in range(len(s)):
            t = zero
            for k in range(j + 1, len(s)):
                t += s[j]*s[k]
            next_c.append(t)

        c = next_c

    return result 

If you run scdf.reduce_depth on this circuit followed by scdf.deduplicate to remove duplicate subcircuits then obtain the gate count, you will find that this approach has a higher gate count than the previous approach for the same $n$. However, this approach yields a smaller depth of exactly $\log n$.

Raw form (ExtMarkdown)