
Say you're trying to use a computer to factor a number, N. N is large and you want to stop as soon as possible in the case that it's a prime.
If you don't have a lot of memory to work with, then the obvious thing is some kind of optimized trial division. (Which is not the same as  and not to be confused with  the Sieve of Eratosthenes. SoE has a lower time complexity, but requires far more space.)
The first key point about such trial division algorithms is that you should never divide by an even number other than 2. The reasoning for this is, I hope, obvious. In short, after trying to divide N by 2, thereafter only try dividing by odd numbers. (Of course in theory we would like to try only dividing by primes. But if we're in a space constrained situation, we probably don't have a list of all primes between 0 and N sitting around.)
The second key point about trial division algorithms is that they need to stop trying possible factors when they reach sqrt(N). If you've tried every (odd) number up to sqrt(N), and N isn't evenly divisible by any of them, then give up: N is prime.
You can compute sqrt(N) outright, but honestly I wouldn't even do that. Some compilers/languages have an isqrt(N) builtin that gives you the nearest integer larger than, or equal to, sqrt(N). This is exactly what you want. In addition, most isqrt()s are constant run time. (It's usually a small constant, even.)
If for some bizarre reason you don't have isqrt(n), you can synthesize it by using NewtonRapheson (see Wikipedia page on isqrt). However, even doing that may not be worth the trouble. Why? Because to do NewtonRapheson, you need to start the NR process with a value near the root you're trying to find. So one way or another, you're going to need a reasonable estimate of isqrt(N).
So, what are other ways to get a decent estimate of sqrt(N)? Again subject to the constraint that the estimate must not in any circumstances be less than sqrt(N). Hacker's Delight (which you should read) suggests just using the next power of two greater than sqrt(n), and gives four lines of branchless C code to compute it. However, I think we can do better...
Observation: If N contains D digits, then sqrt(N) contains D/2 digits. This gives us order of magnitude.
Then, we can do a table lookup on the first two digits of N:
0003: 2
0408: 3
915: 4
1624: 5
2535: 6
3648: 7
4963: 8
6480: 9
8199: 10
(If you actually implement this algorithm, I suggest you express the above table as a nine item array. 210 are your array indices and 3, 8, 15, 24, etc are the values in the corresponding array slot. Walk the array and compare the value in each slot with your first two digits of N. Stop when first two digits of N are greater, and then go back one index. The index value itself is your table lookup number. Why do it this way? Because we're on a spaceconstrained system. So a worstcase twenty instruction lookup is less bad than a hundred item array.)
Now we take the table lookup number, slap on D/2 zeros, and voila  we have our estimate. And due to the way the table is structured, it's guaranteed to be an overestimate. Exactly what we want.
Example: Estimate the square root of N = 4,294,967,197 which is a prime number near 2^32. sqrt(N) = 65535.999244689937, so our estimate must not be below 65536.
There are 10 digits in N, so there will be 5 digits in the estimate. The first two digits of N are 42, so our lookup is 7. Thus our estimate is 7 * 10e5 = 70000.
How good an estimate is that? 70000 / 65536 = 1.068115..., so our estimate is within 7%. And, as required, not less than the real square root.
If you're still sweating the difference between the estimate and the correct number, then go ahead and do a couple iterations of NewtonRapheson with the estimate. NR converges quadratically, so even three iterations should get you almost an order of magnitude improvement.
There are many ways to optimize the above for binary representation, but I leave that as an exercise to the reader.
(Eh? You want to know an efficient way to determine the number of digits in N? Fine, ya lazy bum, I'll google it for you: http://stackoverflow.com/questions/1489830/efficientwaytodeterminenumberofdigitsinaninteger
What? You can't figure out an efficient way to determine the first two digits of a binary number on a CPU without hardware divide? 2^4 = 16, mang! Find the most significant 1 bit, and the next 3 after that. Those four bits determine the first two baseten digits of the number. Finding most significant 1 bit? Binary search by masking half the bits (the least significant half) and comparing with zero.
Jeez, you ask a lot of questions!)
If you don't have a lot of memory to work with, then the obvious thing is some kind of optimized trial division. (Which is not the same as  and not to be confused with  the Sieve of Eratosthenes. SoE has a lower time complexity, but requires far more space.)
The first key point about such trial division algorithms is that you should never divide by an even number other than 2. The reasoning for this is, I hope, obvious. In short, after trying to divide N by 2, thereafter only try dividing by odd numbers. (Of course in theory we would like to try only dividing by primes. But if we're in a space constrained situation, we probably don't have a list of all primes between 0 and N sitting around.)
The second key point about trial division algorithms is that they need to stop trying possible factors when they reach sqrt(N). If you've tried every (odd) number up to sqrt(N), and N isn't evenly divisible by any of them, then give up: N is prime.
You can compute sqrt(N) outright, but honestly I wouldn't even do that. Some compilers/languages have an isqrt(N) builtin that gives you the nearest integer larger than, or equal to, sqrt(N). This is exactly what you want. In addition, most isqrt()s are constant run time. (It's usually a small constant, even.)
If for some bizarre reason you don't have isqrt(n), you can synthesize it by using NewtonRapheson (see Wikipedia page on isqrt). However, even doing that may not be worth the trouble. Why? Because to do NewtonRapheson, you need to start the NR process with a value near the root you're trying to find. So one way or another, you're going to need a reasonable estimate of isqrt(N).
So, what are other ways to get a decent estimate of sqrt(N)? Again subject to the constraint that the estimate must not in any circumstances be less than sqrt(N). Hacker's Delight (which you should read) suggests just using the next power of two greater than sqrt(n), and gives four lines of branchless C code to compute it. However, I think we can do better...
Observation: If N contains D digits, then sqrt(N) contains D/2 digits. This gives us order of magnitude.
Then, we can do a table lookup on the first two digits of N:
0003: 2
0408: 3
915: 4
1624: 5
2535: 6
3648: 7
4963: 8
6480: 9
8199: 10
(If you actually implement this algorithm, I suggest you express the above table as a nine item array. 210 are your array indices and 3, 8, 15, 24, etc are the values in the corresponding array slot. Walk the array and compare the value in each slot with your first two digits of N. Stop when first two digits of N are greater, and then go back one index. The index value itself is your table lookup number. Why do it this way? Because we're on a spaceconstrained system. So a worstcase twenty instruction lookup is less bad than a hundred item array.)
Now we take the table lookup number, slap on D/2 zeros, and voila  we have our estimate. And due to the way the table is structured, it's guaranteed to be an overestimate. Exactly what we want.
Example: Estimate the square root of N = 4,294,967,197 which is a prime number near 2^32. sqrt(N) = 65535.999244689937, so our estimate must not be below 65536.
There are 10 digits in N, so there will be 5 digits in the estimate. The first two digits of N are 42, so our lookup is 7. Thus our estimate is 7 * 10e5 = 70000.
How good an estimate is that? 70000 / 65536 = 1.068115..., so our estimate is within 7%. And, as required, not less than the real square root.
If you're still sweating the difference between the estimate and the correct number, then go ahead and do a couple iterations of NewtonRapheson with the estimate. NR converges quadratically, so even three iterations should get you almost an order of magnitude improvement.
There are many ways to optimize the above for binary representation, but I leave that as an exercise to the reader.
(Eh? You want to know an efficient way to determine the number of digits in N? Fine, ya lazy bum, I'll google it for you: http://stackoverflow.com/questions/1489
What? You can't figure out an efficient way to determine the first two digits of a binary number on a CPU without hardware divide? 2^4 = 16, mang! Find the most significant 1 bit, and the next 3 after that. Those four bits determine the first two baseten digits of the number. Finding most significant 1 bit? Binary search by masking half the bits (the least significant half) and comparing with zero.
Jeez, you ask a lot of questions!)