The coding project has definitely not been dull.
One of the difficult things about doing AI-ish stuff is that as human beings we have a couple of hundred million years' worth of evolutionary wetware in our brains to do tasks that are useful to a being that walks around on a plane amongst a profusion of other objects and beings.
We do edge detection on objects in our field of view so quickly we don't even know we do it. Or if you look at a box from above and off to one side, before you can even think about it your brain has already filled in the sides and bottom that you can't see but somehow "know" are there.
A great example of this for me was last night when I stared for hours, over and over again, at this image on my computer screen:
Our wetware operates at such a low level that it's sometimes hard for us to even access it consciously. I mean, you can probably see what's going on in this graph. Simple, right? But try and put it into words. You have to think for a minute.
So here was my task last night, and it turned out to be a rather difficult one. Make a computer recognize the fact that there's no real activity until about 25, then sustained activity with no pauses until about 105, and then (relative) silence again. Or, to put it another way, chop the graph up into 5 or 10 sample intervals and then tell me if there's activity happening in each interval or not. Easy for the human brain. Considerably harder for a computer program. How do you even start attacking this problem?
I tried a whole bunch of different things. I kept a running average. No dice. I ran several kinds of Finite Impulse Response filters over the thing. (A FIR is a kind of digital filter where you take several past data points' Y values and multiply each of them times a different magic constant, then sum the results.) When that didn't work, I ran an IIR filter on them. (An Infinite Impulse Response filter is a lot like an FIR filter, but you can use the results of previous filter calculations (instead of just data points and weights) to come up with the next filter output value.) Then I tried just a simple running average over the last ten points (which is really just a high-order FIR). All of these failed somewhere. And the reason they did is interesting...
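For the curious, here's roughly what those three filters look like in C. This is a minimal sketch, not my actual code: the function names, the tap weights and the `alpha` smoothing constant are all made up for illustration, and the IIR shown is just the simplest one-pole kind.

```c
#include <assert.h>
#include <stddef.h>

/* FIR: weighted sum of the last ntaps input samples.
   x[0] is the newest sample, x[1] the one before it, and so on.
   The taps are the "magic constants" (hypothetical values). */
double fir_step(const double *x, const double *taps, size_t ntaps)
{
    assert(x != NULL && taps != NULL && ntaps > 0);
    double y = 0.0;
    for (size_t i = 0; i < ntaps; i++)
        y += taps[i] * x[i];
    return y;
}

/* One-pole IIR: mixes the new sample with the PREVIOUS FILTER OUTPUT,
   so every past sample keeps contributing a shrinking amount.
   alpha is an assumed smoothing constant in [0, 1]. */
double iir_step(double prev_y, double x, double alpha)
{
    assert(alpha >= 0.0 && alpha <= 1.0);
    return alpha * x + (1.0 - alpha) * prev_y;
}

/* Running average: the same thing as an FIR whose taps all equal 1/n. */
double running_average(const double *x, size_t n)
{
    assert(x != NULL && n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum / (double)n;
}
```

Calling `fir_step` with ten taps of 0.1 each gives the same answer as `running_average` over ten points, which is the sense in which the running average is "really just a high-order FIR".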
Basically, I'd compute this running average/filter value/etc. for the last N points. (Or more in the case of the IIR, since an IIR can theoretically keep adding smaller and smaller pieces of every past data point to every filter output value.) And then I'd take the next data point, and compare its Y value to the value of the average/etc. The idea here being that if the next point in the data stream is far away (for some arbitrarily chosen value of "far away") from the average of points in the recent past, that means the signal is changing and thus active. This worked a lot of the time, but it seemed that there was always eventually a place where bad luck intervened, where by chance the value of the next point was within my defined distance of the average/filter output value. For instance, where a cluster of three points grouped up near the peak or trough of a sine-like wave. Or where the points were evenly distributed around the peak or trough of such a wave, and the next point wasn't very far down the slope.
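The compare-to-the-average test boils down to something like the sketch below. The function name and the threshold are invented for illustration, and the plain mean stands in for whichever filter output was being used at the time.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if `next` lies farther than `threshold` from the mean of
   recent[0..n-1], else 0. "threshold" is the arbitrarily chosen
   value of "far away" (hypothetical parameter). */
int differs_from_average(const double *recent, size_t n,
                         double next, double threshold)
{
    assert(recent != NULL && n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += recent[i];
    double diff = next - sum / (double)n;
    if (diff < 0.0)
        diff = -diff;           /* only the distance matters, not the sign */
    return diff > threshold;
}
```

The bad-luck case is easy to reproduce with made-up numbers: take three points clustered at a peak, say 900, 1000, 900, and a next point of 880. The mean is about 933, the next point is only about 53 away, and with a threshold of 100 the test reports "no activity" even though the signal is in the middle of a big swing.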
I tweaked and tuned the algorithms. I tried extending the number of points considered back a ridiculous amount - as far as two seconds into the past, which is 40 samples. I tried equal weighting of the samples, weighting to emphasize far and/or recent past samples, randomly throwing away data that I thought looked weird, etc, etc, etc. Damn thing would STILL randomly tell me, once in every ten or fifteen data sets, that there was no activity during a five or ten sample period when the graph was jumping around more enthusiastically than a 5th grader on meth.
(Back to work for another six hours. Feels like it's another day...)
So, long story short, I decided this averaging stuff was stupid and getting me nowhere fast. Eventually I decided that I needed to look at the slope of the curve during the sampling period. If the average slope was greater than some delta, there was activity. If not, no activity. This method has a couple of advantages. First, it's dead easy to compute. Normally you'd compute slope as dy / dx, but in this case dx was always = 1 (one sample per time step). Buncha math saved right there. Also, since I didn't care about whether the slope was going up fast or down fast, as long as it was going somewhere fast, I could do most of the math in whatever order I wanted and just take the absolute value of things at the end and it'd all come out okay.
And the last consequence of using a method like this that I like is that it seems to be pretty well bi-stable. Meaning, it always goes one way or the other, never lingering in the middle. In the intervals where there's activity, the computed "activity factor" is in the low thousands. In intervals where there's no or only very trivial little jitter activity, the activity factor generally doesn't crack a couple hundred or so. This is pretty good considering the relatively low sampling rate I have to work with. (Only 20 samples a second, and I want to be able to recognize the onset of motion within a quarter second. E.g.: Here's 5 numbers from a sensor. Now tell me if the person holding the sensor is going from still to motion, or just kinda standing there.)
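One way to sketch the slope idea in C is below. Treat it as an approximation under assumptions, not the real code: it assumes integer sensor samples, it sums the absolute sample-to-sample differences instead of averaging them (for a fixed window size that only changes the scale), and the names `activity_factor`, `is_active` and the threshold are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of absolute differences between consecutive samples.
   With dx fixed at 1 sample, each difference IS the slope, so no
   division is needed. Assumes integer sensor readings. */
long activity_factor(const int *samples, size_t n)
{
    assert(samples != NULL && n >= 2);
    long total = 0;
    for (size_t i = 1; i < n; i++) {
        long d = (long)samples[i] - (long)samples[i - 1];
        total += (d < 0) ? -d : d;   /* up fast or down fast, same thing */
    }
    return total;
}

/* Activity if the factor exceeds a tuned threshold (hypothetical value
   chosen by the caller, somewhere between "a couple hundred" and
   "low thousands"). */
int is_active(const int *samples, size_t n, long threshold)
{
    return activity_factor(samples, n) > threshold;
}
```

With invented readings, a jittery-but-still window like 512, 515, 510, 514, 511 scores 15, while a swinging one like 200, 900, 300, 1000, 150 scores 2850, so a threshold of 500 or so would separate them cleanly.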
There are still some problems. The accelos are maxing out at certain times when a movement is too strong and/or fast. When the accelo pegs, you get a line of several consecutive samples nailed to the maximum (or minimum) sample value. This looks like a straight line to the algo. A human mind sort of throws a smoother, less square curve over this "chopped off" part. But the program is still too literal, and if the accelo is pegged for too many samples, it says that the accelo is inactive for that period. But hey, that's a hardware problem. ;]
Anyway, all that accomplished, I could get down to making a state machine to watch which trace went active where and when, and react accordingly. The state machine isn't perfect yet. In particular I stupidly assumed that both hands would start moving at the same time when someone did a two-handed gesture. This would mean that the traces start exhibiting "activity" at very nearly the same time. Turns out not to be true. Indeed there can be as much as half a second lag between the start of one hand's movement and the start of the other hand's movement.
Also, the slope/"active" calculation code needs to be put into a separate module. No reason for mixing that up with the stuff around main(). And if we're going to debug this and extend it to more commands in any reasonable time frame, I think I need to make it so that the state machine can be written as some kind of text file external to the program, and parsed and loaded at startup. Right now if you want to tweak the states, you have to go into actual source code and tinker with a big, nasty bunch of crap like this:
case THIS_STATE: do_blah(); break;
case THAT_STATE: if (active) do_blah(); else state = THIS_STATE; break;
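A table-driven version might look something like the sketch below. Nothing here exists yet: the file format (one line per state, giving the state's name, where to go when the trace is active, and where to go when it's idle), the struct layout and every function name are assumptions made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_STATES 16
#define NAME_LEN   32

/* One row of the table: the state and its two possible successors. */
struct state_row {
    char name[NAME_LEN];
    char on_active[NAME_LEN];
    char on_idle[NAME_LEN];
};

struct state_table {
    struct state_row rows[MAX_STATES];
    int count;
};

/* Parse one line of the form "STATE ACTIVE_TARGET IDLE_TARGET".
   Returns 1 if a row was added, 0 if the line was a comment or blank,
   -1 if the line is malformed or the table is full. */
int add_row(struct state_table *t, const char *line)
{
    assert(t != NULL && line != NULL);
    if (line[0] == '#' || line[0] == '\n' || line[0] == '\0')
        return 0;
    if (t->count >= MAX_STATES)
        return -1;
    struct state_row *r = &t->rows[t->count];
    if (sscanf(line, "%31s %31s %31s", r->name, r->on_active, r->on_idle) != 3)
        return -1;
    t->count++;
    return 1;
}

/* Load the whole table at startup. Returns the row count, or -1 on error. */
int load_table(FILE *in, struct state_table *t)
{
    assert(in != NULL && t != NULL);
    char line[128];
    t->count = 0;
    while (fgets(line, sizeof line, in) != NULL)
        if (add_row(t, line) < 0)
            return -1;
    return t->count;
}

/* Index of the named state, or -1 if there is no such state. */
int find_state(const struct state_table *t, const char *name)
{
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->rows[i].name, name) == 0)
            return i;
    return -1;
}

/* One step of the machine: pick the successor based on activity. */
int next_state(const struct state_table *t, int current, int active)
{
    assert(t != NULL && current >= 0 && current < t->count);
    const struct state_row *r = &t->rows[current];
    return find_state(t, active ? r->on_active : r->on_idle);
}
```

A two-line file such as "THIS_STATE THAT_STATE THIS_STATE" followed by "THAT_STATE THAT_STATE THIS_STATE" would then describe a machine that moves to THAT_STATE on activity and drops back to THIS_STATE when things go quiet, and tweaking it means editing the text file, not the switch statement. The actions ("do blah") are left out of this sketch; they'd need their own column or a lookup by state name.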
But these are relatively trivial problems. Another ten hours of development (no more than another week for sure) and I bet I'll have code to recognize the five most easily recognized gestures. And that'll be pretty darn cool, the use to which the code is going to be put notwithstanding. And, of course, the money I will get for doing the work is very, very welcome. Retail hell is taking up 40 hours a week of my life, and it's just not paying my bills. Time to get the resume in serious circulation again...