Ben Cantrick (mackys) wrote,

JWS 0.3 - now with dynamic thread pools!

I've added dynamic thread pools to the JWS, using the java.util.concurrent thread pool APIs. The way these APIs work is not necessarily intuitive, so let me explain a bit. Basically, there's a distinction made between threads (which actually execute tasks) and Runnable objects (which are the tasks to be executed). In the JWS, the main thread creates a Runnable object for each incoming connection. That object goes into a thread-safe queue (the task queue) and waits for a thread to run it.

And this is where the thread pool comes in. The thread pool (a ThreadPoolExecutor) has N threads in it. Its job is to find an idle thread, pull a Runnable object from the task queue, and use that thread to execute the object's run() method. The thread keeps executing run() until the method returns. The object is then presumed to be completely processed, so it is thrown away, and the thread is idle again. So now the thread pool can recycle that thread to process the next object from the task queue. And so it goes.
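Here is a minimal sketch of that "one Runnable per connection" model. It is not the JWS code: the class names `TaskPerConnectionDemo` and `ConnectionTask` are hypothetical, a counter stands in for servicing a real Socket so the example is self-contained, and it uses a plain fixed-size pool just to show the task/thread split.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class TaskPerConnectionDemo {
    // Hypothetical stand-in for "service one connection". A real server would
    // hold the accepted java.net.Socket here instead of just a shared counter.
    static final class ConnectionTask implements Runnable {
        private final AtomicInteger handled;

        ConnectionTask(AtomicInteger handled) {
            this.handled = handled;
        }

        @Override
        public void run() {
            // "Service" the connection. When run() returns, this object becomes
            // garbage and the worker thread is free to take the next queued task.
            handled.incrementAndGet();
        }
    }

    /** Creates one new Runnable per simulated connection and returns how many ran. */
    static int handleConnections(int connections, int threads) {
        AtomicInteger handled = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < connections; i++) {
            pool.execute(new ConnectionTask(handled)); // a fresh task object per connection
        }
        pool.shutdown(); // accept no new tasks; already-queued ones still run
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return handled.get();
    }
}
```

Calling `TaskPerConnectionDemo.handleConnections(100, 4)` pushes 100 task objects through 4 recycled threads.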

Now, I chose to use a dynamic thread pool, which can do things like noticing that there aren't enough threads to keep up with the objects coming in and starting up more, or noticing that there are way too many threads and killing some of them. You control all of this through the parameters you pass when you construct the ThreadPoolExecutor: the core pool size, the maximum pool size, and the keep-alive time after which idle threads above the core size are shut down.
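One non-obvious detail of ThreadPoolExecutor's sizing rules: once the core threads are busy, new tasks go into the queue, and the pool only starts extra threads (up to the maximum) when the queue refuses a task because it is full. The sketch below makes that visible. The class name `PoolGrowthDemo` and the sizes (2 core, 4 max, 2 queue slots) are illustrative assumptions, not the JWS's settings; each task just blocks on a latch to simulate a slow connection.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PoolGrowthDemo {
    /** Submits `tasks` blocking tasks and reports the largest number of threads the pool used. */
    static int largestPoolSize(int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2,                            // corePoolSize: threads kept even when idle
                4,                            // maximumPoolSize: hard ceiling on threads
                30, TimeUnit.SECONDS,         // keepAliveTime for idle threads above the core size
                new ArrayBlockingQueue<>(2)); // bounded task queue with 2 slots
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // keep this thread busy until released
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return pool.getLargestPoolSize();
        } finally {
            release.countDown(); // let every accepted task finish
            pool.shutdown();
        }
    }
}
```

With 4 tasks the pool stays at 2 threads (2 running, 2 queued); with 6 tasks the queue fills and the pool grows to its maximum of 4. A seventh concurrent task would be rejected with a RejectedExecutionException, the default policy.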

(Incidentally, I also used an ArrayBlockingQueue for the JWS's thread-safe task queue, because I wanted a hard upper bound on the number of incoming connections that could be queued before the JWS decided it was too busy and began rejecting new connections.)
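A minimal sketch of how a bounded ArrayBlockingQueue turns "too busy" into an explicit rejection. The class name `BoundedQueueDemo`, the pool sizes, and the counting handler are assumptions for illustration; in a real server the RejectedExecutionHandler is where you would close (or send an error on) the hypothetical client socket.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedQueueDemo {
    /** Offers `connections` slow tasks to a 2..4 thread pool with 3 queue slots; returns how many were refused. */
    static int rejectedCount(int connections) {
        AtomicInteger rejected = new AtomicInteger();
        // Runs on the submitting thread when the queue is full AND the pool is
        // already at its maximum size. A server would close the client socket here.
        RejectedExecutionHandler tooBusy = (task, executor) -> rejected.incrementAndGet();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 4, 30, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(3), // hard upper bound on queued connections
                tooBusy);
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < connections; i++) {
            pool.execute(() -> {
                try {
                    release.await(); // simulate a slow connection
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        release.countDown(); // let the accepted tasks finish
        pool.shutdown();
        return rejected.get();
    }
}
```

Here at most 7 connections are in flight at once (4 running, 3 queued), so 10 simultaneous slow connections leave 3 rejected.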

When I set out to add dynamic thread pools to the JWS, I didn't think it would be that hard. I was familiar with the Runnable interface from previous programs, and was pretty sure thread pools wouldn't be much more difficult. As it turns out, thread pools aren't hard, but you'd never know that, because the docs on Java thread pools really aren't very good! Putting "java thread tutorial" into Google will give you Sun's concurrency tutorial page. This is a great tutorial as far as concepts go, but it's rather light on actual code examples. It also glosses over a lot of practical issues that you run into when you actually start writing thread pools in Java. Particularly wrongheaded, IMO, is its recommendation to use Executors.newCachedThreadPool() to create your dynamic thread pool. While newCachedThreadPool() is a great convenience method, I think it does a couple of things very wrong...

First and foremost, when you create a dynamic thread pool with newCachedThreadPool(), you don't get to specify either the minimum or maximum number of threads in the pool. (Under the hood, the minimum is zero and the maximum is Integer.MAX_VALUE, which for practical purposes means no limit at all.) The lack of a minimum is not so annoying - it might mean a few hundred ms delay to kick off new threads when the load suddenly spikes. But that lack of an upper bound on the number of threads that can be dynamically created? Absolutely unacceptable in my mind. If someone DoSes your web server and it's written using newCachedThreadPool(), then it could theoretically spawn off an unbounded number of threads! And that's a recipe for thrash 'n crash disaster.

Secondly, there's a related resource starvation issue. newCachedThreadPool() creates a ThreadPoolExecutor internally, but its docs don't say what kind of task queue it gives to that ThreadPoolExecutor. It turns out to be a SynchronousQueue, which has no storage capacity at all: every task offered to it must be handed directly to a waiting thread, and if no idle thread is waiting, the pool simply creates a new one. So the queue itself never grows, but every connection that can't be picked up by an idle thread gets a brand-new thread, each with its own stack. So now you're looking at potential unbounded thread growth AND unbounded memory growth!
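You can check what newCachedThreadPool() actually builds by inspecting it at runtime. This sketch relies on an implementation detail: OpenJDK's factory returns a plain ThreadPoolExecutor equivalent to `new ThreadPoolExecutor(0, Integer.MAX_VALUE, 60L, TimeUnit.SECONDS, new SynchronousQueue<Runnable>())`, while the declared return type is only ExecutorService, so the cast is not guaranteed by the API contract. The class name `CachedPoolInspector` is hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class CachedPoolInspector {
    /** Describes the ThreadPoolExecutor behind newCachedThreadPool() on the running JDK. */
    static String describe() {
        ExecutorService cached = Executors.newCachedThreadPool();
        try {
            // Implementation detail, not API contract: works on OpenJDK.
            ThreadPoolExecutor tpe = (ThreadPoolExecutor) cached;
            return "core=" + tpe.getCorePoolSize()
                    + " max=" + tpe.getMaximumPoolSize()
                    + " keepAlive=" + tpe.getKeepAliveTime(TimeUnit.SECONDS) + "s"
                    + " queue=" + tpe.getQueue().getClass().getSimpleName()
                    + " queueCapacity=" + tpe.getQueue().remainingCapacity();
        } finally {
            cached.shutdown();
        }
    }
}
```

On OpenJDK this reports a core size of 0, a maximum of 2147483647 threads, a 60-second keep-alive, and a SynchronousQueue whose remaining capacity is always 0: the only thing standing between a connection flood and a thread flood is the maximum pool size.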

Now, all this wouldn't be so bad if the docs would just give an example of how to create your own ThreadPoolExecutor objects, instead of relying on newCachedThreadPool(). But no such example is in Sun's tutorial! And such examples are also pretty rare elsewhere. It's not hard to construct your own ThreadPoolExecutor if you understand the concepts. But with no real documentation on what those concepts are... that learning curve gets steep fast. Fortunately, there are a few sources that give examples of how to use ThreadPoolExecutor directly. Including (now) the JWS.
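As one example of building a ThreadPoolExecutor yourself, here is a minimal sketch of a "cached pool with a ceiling": the same shape as newCachedThreadPool() (threads created on demand, idle ones reaped after 60 seconds, direct hand-off queue) but with a caller-chosen maximum. The class and method names are hypothetical, and this is not the JWS's configuration (which, as described above, uses an ArrayBlockingQueue).

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BoundedCachedPool {
    /** Like Executors.newCachedThreadPool(), but never runs more than maxThreads threads. */
    static ThreadPoolExecutor create(int maxThreads) {
        return new ThreadPoolExecutor(
                0,                                      // keep no threads around when idle
                maxThreads,                             // hard cap instead of Integer.MAX_VALUE
                60L, TimeUnit.SECONDS,                  // idle threads exit after 60 s
                new SynchronousQueue<>(),               // hand each task straight to a thread
                new ThreadPoolExecutor.AbortPolicy());  // over the cap: RejectedExecutionException
    }
}
```

Because a SynchronousQueue holds nothing, a burst larger than `maxThreads` is rejected immediately rather than queued. Also note that a core size of 0 only makes sense with a hand-off queue like this one: paired with an ArrayBlockingQueue, the pool would run on a single thread until that queue filled up.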

I have one last objection about the thread pool APIs. It's one that will get me called grey haired and crotchety, but that's alright. I enjoy yelling at those darn kids to get off my lawn.

As a resource-conscious embedded systems programmer, I don't like the idea of allocating a new Runnable for each incoming connection. Now, I realize that the way the Socket API works, you have to make a new object for each incoming connection anyway. So complaining that we make not just one but (gasp!) two objects per connection is a little silly. Like complaining that some welds in the hull of the Titanic were weak, after the iceberg had already hit. Still, I feel the need to argue the point. And I'll tell you why: I believe I have a more efficient architecture for doing multi-threaded computing in Java.

Basically, what I'd advocate is that you make one Runnable class. Its run() method is an infinite loop that grabs objects to process from a BlockingQueue of some sort, which is easy with the java.util.concurrent classes. It keeps running and processing whatever it's supposed to process until it's told to stop, say by receiving a ThreadTerminationException or something similar. Then you start up N threads, each running one instance of your Runnable class. (Since run() never returns on its own, each runnable holds onto its thread for as long as it runs, so you want one runnable per thread.) And that's all there is to it. This is, in fact, the exact architecture that I used when I wrote a version of the JWS between 0.2 and 0.3, which used a static array of threads and a BlockingQueue of Sockets to do its work.
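A minimal sketch of that long-lived-worker architecture, under a few assumptions: Strings stand in for Sockets, the names `LongLivedWorkers`, `Worker`, and `processAll` are hypothetical, and since ThreadTerminationException isn't a standard Java class, workers are stopped with a "poison pill" sentinel placed on the queue (interrupting the thread also stops it).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

class LongLivedWorkers {
    // Unique sentinel object; compared by identity so no real job can match it.
    private static final String POISON = new String("POISON");

    static final class Worker implements Runnable {
        private final BlockingQueue<String> queue;
        private final AtomicInteger handled;

        Worker(BlockingQueue<String> queue, AtomicInteger handled) {
            this.queue = queue;
            this.handled = handled;
        }

        @Override
        public void run() {
            try {
                while (true) {
                    String job = queue.take(); // blocks until work arrives
                    if (job == POISON) {       // identity check on the sentinel
                        return;                // told to stop: leave the loop
                    }
                    handled.incrementAndGet(); // "process" the job; no per-job Runnable
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // interruption also ends the worker
            }
        }
    }

    /** Starts `threads` workers (one Runnable each), feeds them `jobs` items, then stops them. */
    static int processAll(int threads, int jobs) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(64);
        AtomicInteger handled = new AtomicInteger();
        List<Thread> pool = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(new Worker(queue, handled), "worker-" + i);
            t.start();
            pool.add(t);
        }
        try {
            for (int i = 0; i < jobs; i++) {
                queue.put("job-" + i); // blocks while the queue is full
            }
            for (int i = 0; i < threads; i++) {
                queue.put(POISON);     // one pill per worker, queued after all jobs
            }
            for (Thread t : pool) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return handled.get();
    }
}
```

Because the queue is FIFO and the pills go in after every job, no worker stops until all jobs have been taken. The only per-item allocation left is the item itself (the Socket, in the server case).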

I like this approach better for two reasons. One is that you're not constantly new'ing and then throwing away any more objects than you absolutely have to. If you new two objects for every connection and the connection rate gets high, garbage collection churn will start taking cycles away from servicing incoming connections. The other thing I like about having a bunch of infinite loops all pulling off the same queue is that it improves cache locality. And that's often a big key to execution speed.

Yeah, yeah, I know - only an old C programmer would be so distrustful of TEH OBJEKT ORIENTATED PARADIGMZ OMG!!!!1!!! as to suggest that, hey, maybe we don't have to create a new Runnable object for every single connection just because we can. Just because Java can be used inefficiently doesn't mean we have to use it that way. (Yeah, I know - "That's crazy talk!")

So just be quiet and bring me my walker, sonny! When I wuz your age, we didn't have none of these fancy garbage collectors! We hand-tweaked the microcode in our floating point co-processors! With a paper clip! And we liked it!!! I remember back in the summer of '87... we had just gotten in a shiny new 386-DX2 40 MHz...


Edit 2007/06/05: Yay, LJ finally beat the DDoSers, so this post now contains the full text as I originally intended!