# Bitcounting

• Dec 29th 2007, 11:06 PM
SoftwareTester
Bitcounting
I'm looking for a very fast (hopefully 'fastest') function counting bits in a UIn64 (using C#). I read about a routine called 'Hackman 169' but can't find it.

What I do have is the routine for a UInt32 :
static ushort CountBits(UInt32 BitMask)
{ // Counting bits set for UInt32
UInt32 uCount = BitMask;
uCount = (uCount & 0x55555555) + (uCount >> 1 & 0x55555555);
uCount = (uCount & 0x33333333) + (uCount >> 2 & 0x33333333);
uCount = (uCount + (uCount >> 4)) & 0x0F0F0F0F;
uCount = uCount + (uCount >> 8);
uCount = (uCount + (uCount >> 16)) & 0x0000003F;
return (ushort) uCount;
}

I prefer a similar one for UInt64 .

Currently fastest rountine I found is:
static ushort CountBits(UInt64 BitMask)
{
ushort Count = 0;
while (BitMask != 0)
{
Count++;
}
return Count;
}

The number of bits set in my UInt64 values vary from 0 to 8 so the function above will loop 1-9 times.
• Jan 12th 2008, 03:42 AM
SoftwareTester
Solution for fast bitcounting UInt64
Eventually I found the solution myself and as far as i know (I didn't find anything alike on the internet) this now is fastest bitcount routine for UInt64

public static ushort Count64(UInt64 BitMask)
{ // UInt64 (uint) counting :
// Complexity : O(1) constant number of actions
// Algorithm : 64-bit recursive reduction using SWAR
const UInt64 MaskMult = 0x0101010101010101;
const UInt64 mask1h = (~0UL) / 3 << 1;
const UInt64 mask2l = (~0UL) / 5;
const UInt64 mask4l = (~0UL) / 17;
// return (ushort)(BitMask & 0xff);

// Replacement for the >> 8 and >> 16 AND return (BitMask & 0xff)

// Timings 1 << 26 calculations AMD XP2700+ Windows XP
// BitsSet CountLoop Count SpeedUp public static ushort CountLoop(UInt64 BitMask)
// 8 4.75 2.62 81% {
// 16 7.48 2.65 182% ushort Count = 0;
// 24 10.78 2.56 321% while (BitMask != 0)
// 32 13.42 2.68 401% {
// 40 16.44 2.60 532% BitMask &= (BitMask - 1);
// 48 19.51 2.59 653% Count++;
// 56 22.33 2.67 736% }
// 64 25.56 2.61 879% return Count;
// Average 15.03 2.62 473% }

}

If anybody finds a faster function or can improve the one above, please let me know.

Jack Renet