Math question - filtering set of input

woogie · 2006-10-24 01:45:25

Here's my question, which may or may not have an appropriate answer available:

I have a repeatable experiment which gives back a simple integer value greater than zero. Sometimes (and I don't know when) this experiment won't work correctly, and I'll get back a value much higher (order of magnitude higher) than the "correct" result.

Given a set of values that come from repeating this experiment, how would you go about getting the mean of the "correct" values?

My thoughts on this: calculate the mean and the standard deviation of the full set, then use these values to pare the set down to the "correct" results. The problem is that with a sufficient number of bad results, the bad values inflate both the mean and the standard deviation, so this method can end up keeping the bad results along with the good ones.
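To illustrate the concern, here is a minimal Python sketch of the mean/standard-deviation idea. The function name `pare_by_std`, the cutoff `k=2.0`, and both sample data sets are made up for illustration; they are not from a real experiment.

```python
from statistics import mean, pstdev

def pare_by_std(values, k=2.0):
    """Keep values within k standard deviations of the mean of the full set.

    k=2.0 is an arbitrary, hypothetical cutoff.
    """
    m = mean(values)
    s = pstdev(values)
    return [v for v in values if abs(v - m) <= k * s]

# Made-up data: "good" readings near 10, "bad" readings near 100.
few_bad = [10, 11, 9, 10, 12, 10, 11, 9, 10, 100]
many_bad = [10, 11, 9, 10, 12, 100, 110, 95, 105, 100]

kept_few = pare_by_std(few_bad)    # the single 100 is dropped
kept_many = pare_by_std(many_bad)  # nothing is dropped

print(mean(kept_few))
print(mean(kept_many))
```

With one bad reading the cut works, but when half the readings are bad, the mean lands between the two clusters and the standard deviation grows large enough that every value passes.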

I'm assuming this is a common enough problem that there are accepted mathematical means of tackling it that I'm simply not aware of. Can anyone enlighten me?

EDIT: Because I can't spell

Snowman · 2006-10-24 05:50:27

Unless I misunderstood the problem, you could just remove the bad values before calculating the mean. Give yourself a threshold, assume that any result above it is incorrect, and leave those results out when calculating the mean.
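A minimal Python sketch of the fixed-threshold idea, assuming you already know roughly where the good values end. The function name, the sample readings, and the threshold of 50 are all hypothetical.

```python
def mean_below_threshold(values, threshold):
    """Average only the values at or below a hand-picked threshold."""
    kept = [v for v in values if v <= threshold]
    if not kept:
        raise ValueError("no values at or below the threshold")
    return sum(kept) / len(kept)

# Made-up data: good readings near 10, bad readings near 100.
readings = [10, 11, 9, 10, 12, 100, 110, 95, 105, 100]
result = mean_below_threshold(readings, threshold=50)
print(result)  # 10.4
```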

Mandor · 2006-10-24 22:01:52

Well, I am really in programming (other than MATLAB, duh), but filtering is a thing I often have to do.

If you don't want to use complicated statistical machinery and the bad values really are about an order of magnitude higher, you can make something like a primitive digital filter: reject values that change too much from the previous one (or from the mean of the last N correct values). If the bad values are that high, taking a logarithm of all the values (just for the criterion) will help. This does add calculations, since you'll have to do floating-point operations. It depends on the resources you have, but nowadays even embedded systems have no trouble with such calculations. To add some more confidence that you will not skip a valid value, you can (besides the logarithm) add a constant (the greatest of them, for example) to the N past correct values and to the one being checked, before applying the logarithm.

I hope that by 'order of magnitude' you mean a power of 10, because if it's binary, then this is quite white-ish noise (a bad thing). You should experiment with one set (apply the algorithm to the whole set) and see what increase works as the criterion for invalidity. I suppose something between 1.5 and 1.8 times (i.e. a 0.5 to 0.8 increase) the mean of the logarithms, but it is too subjective.

This is an online algorithm: you can apply it in real time. The downside is that you need to know at least the order of magnitude of the valid values, because you need a starting criterion.
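One possible reading of this filter, as a rough Python sketch. The class name `OnlineLogFilter`, the window size `n=5`, the use of base-10 logarithms, and the sample stream are assumptions made for illustration. The `factor=1.5` default is only one interpretation of the "1.5 to 1.8 times the mean of the logarithms" suggestion, and the optional added constant is left out.

```python
import math
from collections import deque

class OnlineLogFilter:
    """Reject a value whose log10 exceeds factor * mean(log10 of last n accepted)."""

    def __init__(self, seed, n=5, factor=1.5):
        # seed: a rough guess at a valid value (the starting criterion).
        self.accepted = deque([seed], maxlen=n)
        self.factor = factor

    def offer(self, value):
        """Return True and remember the value if it passes, else False."""
        mean_log = sum(math.log10(v) for v in self.accepted) / len(self.accepted)
        if math.log10(value) > self.factor * mean_log:
            return False  # jumped too far above the recent accepted values
        self.accepted.append(value)
        return True

# Made-up stream: good readings near 10, bad readings near 100.
stream = [10, 11, 9, 100, 10, 12, 95, 10]
flt = OnlineLogFilter(seed=10)
good = [v for v in stream if flt.offer(v)]
print(good)
print(sum(good) / len(good))
```

Note that this criterion scales with the logarithms themselves, so it stops being useful when the valid values are close to 1 (logarithms near 0). The threshold needs tuning against a real data set, as suggested above.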

Kopsis · 2006-10-31 13:37:14

If you can characterize the "expected" output, then a Kalman Filter (http://www.cs.unc.edu/~welch/kalman/) is often an excellent way to remove noise. Easy to implement, and very computationally efficient.
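For a feel of what that involves, here is a minimal scalar Kalman filter sketch in Python for the simplest case, estimating a single constant value from noisy readings. The function name and the defaults for `p0`, `q`, and `r` are hypothetical and would need tuning for a real experiment.

```python
def kalman_constant(measurements, x0, p0=1.0, q=1e-5, r=1.0):
    """Scalar Kalman filter for a value assumed to be constant.

    x0: initial estimate, p0: initial estimate variance,
    q: process noise variance, r: measurement noise variance.
    All defaults are illustrative guesses.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + q              # predict: the state is constant, uncertainty grows by q
        k = p / (p + r)        # Kalman gain
        x = x + k * (z - x)    # update the estimate with the new measurement
        p = (1 - k) * p        # update the estimate variance
        estimates.append(x)
    return estimates

# Made-up readings scattered around 10.
estimates = kalman_constant([10, 11, 9, 10, 12], x0=10)
print(estimates[-1])
```

A plain filter like this smooths ordinary noise, but it does not throw away a gross outlier outright; each measurement is only down-weighted by the gain.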

Arch Linux
