JWS 0.3 - now with dynamic thread pools! - Adventures in Engineering — LiveJournal
The wanderings of a modern ronin.

Ben Cantrick
  Date: 2007-06-01 18:53
  Subject:   JWS 0.3 - now with dynamic thread pools!
  Mood: der uber-nerd
  Music: MC Plus+ - Syntax and Semantics


I've added dynamic thread pools to the JWS, using the java.util.concurrent thread pool APIs. The way these APIs work is not necessarily intuitive, so let me explain a bit. Basically, there's a distinction between threads (which actually execute tasks) and Runnable objects (which are the tasks to be executed). In the JWS, the main thread creates a Runnable object for each incoming connection. That object goes into a thread-safe queue (the task queue) and waits for a thread that will run it.

And this is where the thread pool comes in. The thread pool (ThreadPoolExecutor) has N threads in it. Its job is to match an idle thread with a Runnable object from the task queue, and use that thread to execute the object's run() method. The thread keeps executing run() until the method returns. That object is then presumed to be completely processed, so it is thrown away, and the thread is idle again. So now the thread pool can recycle that thread to process the next object from the task queue. And so it goes.
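Here's a minimal sketch of that recycling in action. It's not code from the JWS, and RecycleDemo/distinctThreadsUsed are made-up names. It pushes 20 Runnables through a pool of 3 threads and records which threads actually ran them. For brevity it uses Executors.newFixedThreadPool(), which builds a ThreadPoolExecutor with a fixed thread count.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class RecycleDemo {
    /** Runs `tasks` Runnables on a pool of `poolSize` threads; returns how many distinct threads ran them. */
    public static int distinctThreadsUsed(int poolSize, int tasks) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        Set<String> threadNames = ConcurrentHashMap.newKeySet();

        for (int i = 0; i < tasks; i++) {
            // Each task is just a Runnable; the pool decides which thread runs it.
            pool.execute(() -> threadNames.add(Thread.currentThread().getName()));
        }

        pool.shutdown();                              // accept no new tasks, finish queued ones
        pool.awaitTermination(10, TimeUnit.SECONDS);  // wait for the queue to drain
        return threadNames.size();
    }

    public static void main(String[] args) throws InterruptedException {
        // 20 tasks, but never more than 3 threads: idle threads are reused.
        System.out.println("threads used: " + distinctThreadsUsed(3, 20));
    }
}
```

However many tasks you submit, the printed count should never exceed the pool size.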

Now, I chose to use a dynamic thread pool, which can also do things like notice when there aren't enough threads to keep up with the objects coming in and start up more, or notice that there are way too many threads and kill some of them. These behaviors are all controlled by the parameters you pass when you construct the ThreadPoolExecutor.

(Incidentally, I also used an ArrayBlockingQueue for the JWS's thread-safe task queue, because I wanted a hard upper bound on the number of incoming connections that could be queued before the JWS decided it was too busy and began rejecting new connections.)
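To make the "start more threads" and "reject when too busy" behavior concrete, here's a small sketch that builds a ThreadPoolExecutor by hand with a bounded ArrayBlockingQueue. The sizes (2 core threads, 4 max, a queue of 3) are illustrative assumptions, not the JWS's real settings, and BoundedPoolDemo/saturate are hypothetical names. One non-obvious rule from the ThreadPoolExecutor docs: once the core threads are busy, new tasks go into the queue, and threads beyond the core count are only started when the queue is full.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedPoolDemo {
    /** Submits `tasks` blocking tasks; returns {poolSize, queued, rejected} once the pool is saturated. */
    public static int[] saturate(int tasks) throws InterruptedException {
        // Illustrative sizes, much smaller than a real server would use.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2,                                    // core threads
                4,                                    // maximum threads
                30, TimeUnit.SECONDS,                 // idle threads above core die after this
                new ArrayBlockingQueue<Runnable>(3)); // at most 3 tasks wait in line
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;

        for (int i = 0; i < tasks; i++) {
            try {
                pool.execute(() -> {
                    try {
                        release.await();              // hold the thread so nothing drains
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                rejected++;                           // default AbortPolicy: threads and queue are full
            }
        }

        int[] result = {pool.getPoolSize(), pool.getQueue().size(), rejected};
        release.countDown();                          // let every task finish
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] r = saturate(10);
        System.out.println("threads=" + r[0] + " queued=" + r[1] + " rejected=" + r[2]);
    }
}
```

Submitting 10 tasks should report 4 threads, 3 queued tasks and 3 rejections: two tasks start the core threads, three fill the queue, two more start the extra threads, and the rest are refused with a RejectedExecutionException.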

When I set out to add dynamic thread pools to the JWS, I didn't think it would be that hard. I was familiar with the Runnable interface from previous programs, and was pretty sure thread pools wouldn't be too much more difficult. As it turns out, thread pools aren't hard, but you'd never know that, because the docs on Java thread pools really aren't very good! Putting "java thread tutorial" into Google will give you Sun's concurrency tutorial page. This is a great tutorial as far as concepts go, but it's rather light on actual code examples. It also glosses over a lot of practical issues that you run into when you actually start writing thread pools in Java. Particularly wrongheaded, IMO, is its recommendation to use Executors.newCachedThreadPool() to create your dynamic thread pool. newCachedThreadPool() is a great convenience method, but I think it does a couple of things very wrong...

First and foremost, when you create a dynamic thread pool with newCachedThreadPool(), you don't get to specify either the minimum or maximum number of threads in the pool. The lack of a minimum is not so annoying - it might mean a few hundred ms delay to kick off new threads when the load suddenly spikes. But that lack of an upper bound on the number of threads that can be dynamically created? Absolutely unacceptable in my mind. If someone DoS's your web server and it's written using newCachedThreadPool(), then it could theoretically spawn off an unbounded number of threads! And that's a recipe for thrash 'n crash disaster.

Secondly, there's a related resource starvation issue. newCachedThreadPool() creates a ThreadPoolExecutor internally, but the docs don't say what kind of task queue it gives to that ThreadPoolExecutor. The JDK source shows it's a SynchronousQueue. That sounds like it might be an unbounded queue, but a SynchronousQueue actually has no capacity at all: each task is handed straight to an idle thread, and if there isn't one, the pool spawns a new thread on the spot. So the queue itself can't grow, but that is exactly why the thread count can. And every one of those threads has its own stack, so unbounded thread growth is also unbounded memory growth!
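If you'd rather check than guess what newCachedThreadPool() builds, you can cast the result and ask it. This sketch assumes an OpenJDK-derived JVM, where the factory returns a plain ThreadPoolExecutor; the cast is not something the ExecutorService return type guarantees, and CachedPoolInspect is a hypothetical name.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CachedPoolInspect {
    /** Returns the executor behind newCachedThreadPool(), cast so its settings can be read. */
    public static ThreadPoolExecutor cachedPool() {
        // Assumption: on OpenJDK-derived JDKs this factory returns a plain
        // ThreadPoolExecutor; the ExecutorService return type does not promise that.
        return (ThreadPoolExecutor) Executors.newCachedThreadPool();
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = cachedPool();
        System.out.println("core threads:   " + pool.getCorePoolSize());
        System.out.println("max threads:    " + pool.getMaximumPoolSize());
        System.out.println("keep-alive (s): " + pool.getKeepAliveTime(TimeUnit.SECONDS));
        System.out.println("queue type:     " + pool.getQueue().getClass().getSimpleName());
        // A SynchronousQueue always reports 0 here: it never holds tasks.
        System.out.println("queue capacity: " + pool.getQueue().remainingCapacity());
        pool.shutdown();
    }
}
```

On OpenJDK-based JVMs this should report 0 core threads, a maximum of Integer.MAX_VALUE, a 60-second keep-alive, and a SynchronousQueue whose remaining capacity is 0.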

Now, all this wouldn't be so bad if the docs would just give an example of how to create your own ThreadPoolExecutor objects, instead of relying on newCachedThreadPool(). But no such example is in Sun's tutorial! And such examples are also pretty rare elsewhere. It's not hard to construct your own ThreadPoolExecutor if you understand the concepts. But with no real documentation on what those concepts are... that learning curve gets steep fast. Fortunately, there are a few sources that give examples of how to use ThreadPoolExecutor directly. Including (now) the JWS.

I have one last objection about the thread pool APIs. It's one that will get me called grey haired and crotchety, but that's alright. I enjoy yelling at those darn kids to get off my lawn.

As a resource-conscious embedded systems programmer, I don't like the idea of allocating a new Runnable for each incoming connection. Now, I realize that the way the Socket API works, you have to make a new object for each incoming connection anyway. So complaining that we make not just one but (gasp!) two objects per connection is a little silly. Like complaining that some welds in the hull of the Titanic were weak, after the iceberg had already hit. Still, I feel the need to argue the point. And I'll tell you why: I believe I have a more efficient architecture for doing multi-threaded computing in Java.

Basically, what I'd advocate is that you make one Runnable class. Its run() method is an infinite loop. This run() method grabs objects to process from a BlockingQueue of some sort, which is easy with the java.util.concurrent classes. It will keep running and processing whatever it's supposed to process until it receives a ThreadTerminationException or similar signal. Then, the thread pool starts up N threads, and M instances of your Runnable class. Finally, the thread pool shares the threads among the runnables. And that's all there is to it. This is, in fact, the exact architecture that I used when I wrote a version of the JWS between 0.2 and 0.3, which used a static array of threads and a BlockingQueue of Sockets to do its work.
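Here's a rough sketch of that architecture under a few assumptions of my own. ThreadTerminationException isn't a real JDK class, so these workers stop on a "poison pill" sentinel pulled from the queue instead (a swapped-in technique). The jobs are plain Strings standing in for Sockets, and the names and thread counts are hypothetical. Since each Runnable loops forever, it owns its thread for life, so this sketch simply gives each of the N threads one Worker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class LoopingWorkers {
    // Sentinel that tells a worker to leave its loop. Compared by identity,
    // so an ordinary job string can never be mistaken for it.
    private static final String POISON = new String("STOP");

    /** One long-lived Runnable: loops forever, taking jobs from the shared queue. */
    static final class Worker implements Runnable {
        private final BlockingQueue<String> jobs;
        private final AtomicInteger processed;

        Worker(BlockingQueue<String> jobs, AtomicInteger processed) {
            this.jobs = jobs;
            this.processed = processed;
        }

        @Override
        public void run() {
            try {
                while (true) {
                    String job = jobs.take();         // blocks until a job arrives
                    if (job == POISON) {
                        return;                       // clean shutdown
                    }
                    processed.incrementAndGet();      // stand-in for handling a Socket
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();   // interruption also ends the loop
            }
        }
    }

    /** Starts nThreads workers, feeds them nJobs jobs, then stops them. Returns jobs processed. */
    public static int runJobs(int nThreads, int nJobs) throws InterruptedException {
        BlockingQueue<String> jobs = new ArrayBlockingQueue<>(64); // bounded, like the JWS queue
        AtomicInteger processed = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();

        for (int i = 0; i < nThreads; i++) {
            Thread t = new Thread(new Worker(jobs, processed), "worker-" + i);
            t.start();
            threads.add(t);
        }
        for (int i = 0; i < nJobs; i++) {
            jobs.put("job-" + i);                     // blocks while the queue is full
        }
        for (int i = 0; i < nThreads; i++) {
            jobs.put(POISON);                         // one pill per worker
        }
        for (Thread t : threads) {
            t.join();
        }
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("processed " + runJobs(4, 1000) + " jobs");
    }
}
```

Because the queue is FIFO and the pills go in after all the jobs, every job is taken before any worker sees a pill, so runJobs(4, 1000) should return 1000. No Runnable is allocated per job; the only per-job object here is the job itself.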

I like this approach better for two reasons. One is that you're not constantly new'ing and then throwing away any more objects than you absolutely have to. If you new two objects for every connection and the connection rate gets high, garbage collection churn will start to take cycles away from servicing incoming connections. The other thing I like about having a bunch of infinite loops all pulling off the same queue is that it improves locality of reference, which keeps the CPU caches warm. And that's often a big key to execution speed.

Yeah, yeah, I know - only an old C programmer would be so distrustful of TEH OBJEKT ORIENTATED PARADIGMZ OMG!!!!1!!! to suggest that, hey, maybe we don't have to create a new Runnable object for every single connection just because we can. Just because Java can be used inefficiently doesn't mean we have to use it that way. (Yeah, I know - "That's crazy talk!")

So just be quiet and bring me my walker, sonny! When I wuz your age, we didn't have none of these fancy garbage collectors! We hand-tweaked the microcode in our floating point co-processors! With a paper clip! And we liked it!!! I remember back in the summer of '87... we had just gotten in a shiny new 386-DX2 40 MHz...


Edit 2007/06/05: Yay, LJ finally beat the DDoSers, so this post now contains the full text as I originally intended!

Ben Cantrick
  User: mackys
  Date: 2007-06-02 05:01 (UTC)
  Subject:   (no subject)
Dunno. Too many possibilities.

  User: (Anonymous)
  Date: 2010-04-28 23:47 (UTC)
  Subject:   Reg CachedThreadPool

I like your post and analysis of the CachedThreadPool problem.
I have a similar scenario where we are trying to build a tool for load testing web services.

So to implement concurrent processing I am using CachedThreadPool to handle the number of requests coming in.

Say this request number may varies : from few 100 to few thousand...
So do you think if the request goes to the upper limit...the system may crash due to the no of thread being created by CachedthreadPool ?

Waiting for your response...

Ben Cantrick
  User: mackys
  Date: 2010-05-03 21:23 (UTC)
  Subject:   Re: Reg CachedThreadPool
> request number may varies : from few 100 to few thousand...
> So do you think if the request goes to the upper limit...the system may crash due to the no of thread being created by CachedthreadPool ?

Depends almost entirely on the hardware. There are systems out there that would handle a couple thousand threads. There are also systems that would crash bad with only a couple hundred threads.

So the question is too vague for me to be able to answer with any accuracy.

Ben Cantrick
  User: mackys
  Date: 2010-05-03 21:34 (UTC)
  Subject:   Re: Reg CachedThreadPool
The relevant code from JWS.java is:

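    // Needs java.util.concurrent.ArrayBlockingQueue, ThreadPoolExecutor and TimeUnit.
    private final int WORKERPOOL_MINTHREADS = 5;
    private final int WORKERPOOL_MAXTHREADS = 50;
    // Hypothetical name and value: the excerpt omits the keep-alive arguments.
    private final long WORKERPOOL_KEEPALIVE_SECS = 60;

    // Create and initialize the worker thread pool.
    // WORKERPOOL_TASKQ_MAXSIZE is defined elsewhere in JWS.java (not shown here).
    try {
        workerPool = new ThreadPoolExecutor(
                WORKERPOOL_MINTHREADS, WORKERPOOL_MAXTHREADS,
                WORKERPOOL_KEEPALIVE_SECS, TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(WORKERPOOL_TASKQ_MAXSIZE));
    } catch (Exception e) {
        System.err.println("Can't init thread pool!!");
    }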

If you want to limit how many threads there can be at once, just make WORKERPOOL_MAXTHREADS smaller.

If you're concerned about memory usage, make WORKERPOOL_TASKQ_MAXSIZE smaller.
