
/* Copyright (c) 2012, Ben Cantrick All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:  Redistributions of source code must retain the above copyright notice, this list of conditions, and the following disclaimer.  Redistributions in binary form must reproduce the above copyright notice, this list of conditions, and the following disclaimer in the documentation and/or other materials provided with the distribution. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ // // intmod(y, x): return y % x, or 1 on error // #include <stdint.h> // Requires C99. int32_t intmod(uint32_t y, uint32_t x) { if(y < x) { return 1; // Dumbass caller } if(y == x) { return 0; // Smartass caller } if(x == 2) { return (y & (uint32_t)0x1); // Easy case } uint32_t guess, temp; guess = x; while(guess < y) { guess = guess << 1; // guess = guess * 2; } if(guess == y) { return 0; } // Lucky! // guess must be > y guess = guess >> 1; // newguess = guess / 2 temp = guess >> 1; // temp = newguess / 2 while(temp > x) { if(guess + temp < y) { guess = guess + temp; } temp = temp >> 1; // temp = temp / 2 } return (y  guess); }
A fully working sample program can be downloaded at http://www.gully.org/~mackys/lj/integer
I created this algorithm as a result of reading a post on Reddit that showed some shockingly poor performance for intrinsic software "%" operator in some C compilers for microcontrollers which don't have hardware divide.
How does the algorithm work? Well, the mathematical definition of modulo is:
y % x = y  (x * int(y/x))
The trick this algorithm employs is to do a kind of weirdo binary search to find
(x * int(y/x))
, without ever directly computing int(y/x)
, or using division.The first loop keeps multiplying
x
by 2 until it's larger than y
. We call this guess
. Once guess
becomes larger than y
, we stop and divide guess
by 2, essentially backing up one step. At this point we know that (x * int(y/x))
is between guess
and guess * 2
.To actually compute
(x * int(y/x))
, the second loop goes backward and keeps adding smaller poweroftwo multiples of x
(held in temp
) to guess
. But it checks each time to make sure that (guess + (x * 2^n)) < y
, and does not do the addition if this would be the case. It keeps going until n = 1, at which point we have added all appropriate (2^n * x)
terms to guess. The resulting number in guess
is exactly (x * int(y/x))
. And now it's a simple matter to get the modulus by doing (y  guess)
. Quickly Ends Dat.All aforementioned divisions and multiplications are accomplished with bit shifts, which are fast operations implemented by hardware on every CPU I've ever seen. (Unlike integer div or modulo.) They also have the nice property that most CPUs can bit shift through a CPU flag or register, allowing the use of multipleCPUword size integers without a large performance penalty.
The code does have some notable limitations. First of all, it expects unsigned ints only. Mathematically it is allowable to take the modulo of negative numbers, but this code won't do that. Since the function's arguments are defined as unsigned, your compiler should throw a warning or error if you attempt to pass in negative numbers. I'm sure the function could be extended to handle negative numbers, but I don't think I've done a modulo on a negative number in 20 years of writing code, so I'm not going to bother. Secondly, the function doesn't work with floats, doubles or any other exponent/mantissa numeric representation, because bitshifting those doesn't necessarily result in a multiply or divide by 2.
Performance analysis: The first while loop will run at most log2(y/3) times (when x=3), and the same is true of the second loop. 2 * log2(y/3) translates to O(log2(y)). The average case will be much faster than this worst case. Analysis with
gprof
and gcov
on the sample program shows 51 lines of code executed to compute 31952 % 99. I don't have valgrind installed so I can't give better numbers. If someone wants to do a real "number of instructions executed" analysis, send me a copy so I can put it up here.Is it possible to do this faster? I strongly suspect so. As the Wikipedia link given below notes, it is possible to reduce (y % x) to (y & (x1)) when x is a power of 2. Seems to me that trick should be exploitable to bring the time to compute modulo down even farther. But in any event, the above unsophisticated and unoptimized algorithm should still be a very large improvement over modulo implemented via a software divide routine... as some people's compilers are probably doing now.
See also:
http://en.wikipedia.org/wiki/Modulo_ope
http://graphics.stanford.edu/~seander/b
http://stackoverflow.com/questions/2566