Maximum entropy! Is this "sufficiently" random?

Xyne · 2012-03-28 03:17:58

note: Yeah, it's a long post. I tend to do that when tired. The first part just gives some context. Skip down to the code if you're not interested.

Threads keep popping up about pacman-key "not working" when it's waiting for entropy from /dev/random to generate keys. The advice in those threads is to move the mouse around (if you have Xorg), run some cpu intensive apps, and wait.

I started to wonder if the advice about running apps was actually good. I really don't know how the kernel collects entropy but the little that I've read only mentions mouse, keyboard and disk IO. I did some really basic testing with different commands (CPU intensive, disk IO intensive) to see if they increased the available entropy (/proc/sys/kernel/random/entropy_avail). They didn't. At least not enough to have an immediate impact from what I could tell.

So, what else can we do? The first thing that came to mind was using the microphone as an entropy source. I found audio-entropyd (old and new) but there was no package for it. Instead of creating one I decided to test some some ideas. I spent some time reading up on how to actually inject bits into the entropy pool before I ended up installing rng-tools as recommended in the wiki.

At first I tried piping the microphone output from arecord into rngtest but that failed the FIPS test (the first few times it failed because my microphone was plugged into the headphones jack, but it still failed after that ).

I read something about hashing inputs while searching for ways to increase entropy, so I wrote the following script:

#!/usr/bin/env python3

from time import time
from hashlib import sha512
from sys import stdin, stdout, stderr, argv


h = sha512()

if argv[1:]:
  RATIO = int(argv[1])
else:
  RATIO = h.block_size

buf = stdin.buffer.read(h.block_size)
n = 1
while buf:
  h.update(buf)
  buf = stdin.buffer.read(h.block_size)
  n += 1
  if n >= RATIO:
    stdout.buffer.write(h.digest())
    stderr.write('\r%f' % time())
    h = sha512()
    n = 0

It takes an input stream and hashes it with sha512 at a configurable ratio. Let's call it "hashpipe" for now. Piping the microphone output through it passes rng's FIPS test. Great.

So I tried this:

mkfifo foo
# change hw as needed
arecord -c 1 -f cd -t raw -D hw:1,0 -  |  hashpipe  > foo 

# in another terminal
sudo rngd -f -r foo -W 4096 -t 5

Lo and behold, maximum entropy! The entire pool is refreshed as soon as it's depleted. 4096-bit gpg keys take no time at all.

Now, accepting that my crypto-fu is weak, I ask you... does passing rng-tool's FIPS test mean that the data is "sufficiently" random for secure cryptography? If not, what about if the input/output hash ratio is increased?

I'm thinking that hashing the input stream is either masking its regularity or condensing its entropy. I'm expecting the former as the authors of audio-entropyd do much more to extract the entropy of the audio input, and that project has been around long enough to receive scrutiny. I haven't actually looked at what they do though. I've just seen some messages around the internet. I didn't see any hashing in randomsound, but there is a debias function. Of course, I have no idea if either of those generate "good" entropy to begin with.

Gcool · 2012-03-28 06:03:43

Interesting idea.

I would expect/assume that the input stream from the microphone is random enough to be considered safe/sufficient for collecting entropy. But that's all that is, an assumption. Will leave it up to someone with more insight to provide a definitive answer on that.

Keeping my eye on this thread.

Blµb · 2012-03-28 12:53:58

Definitely an interesting idea.

Whether it is sufficient or not, you can definitely be improved, for example, by only running it when you need new entropy, and then humming, hitting keys on the keyboard, or doing random other things your microphone can detect. After all, even if you keep hitting the same key, the audio data will be different due to noise. The "small" difference caused by noise (and the fact that you don't hit the exact same spot at the exact same speed etc. all the time) will become a big difference in the hashpipe.

You could also use an actual hashpipe and start singing into your mic

A human *can* probably produce better randomness than a HW device built for just that. Unless you ask people to type random stuff... then people start becoming predictable... even if it's only on a big scale. Or they keep typing 'beer beer beer beer beer beer beer WINE beer beer beer beer SCHNAPS'...

I'd generally feel rather bad if it was running in the background all the time, since even the noise in the audio data isn't perfect, after all, you can filter it out "pretty well".

The real question is though: if the actual input fails the tests, does using any modifying functions, like a hash, increase security *at all*?
If your input is only zeros, and you use a hash, the data will still have a pattern, but it's a bigger pattern, and a signle repetition by itself could look like random data, and if the hash is big enough, you might just not *see* the pattern anymore, and your tests fail to recognize it, but it will still be there.

Xyne · 2012-03-28 13:57:53

Blµb wrote:

The real question is though: if the actual input fails the tests, does using any modifying functions, like a hash, increase security *at all*?
If your input is only zeros, and you use a hash, the data will still have a pattern, but it's a bigger pattern, and a signle repetition by itself could look like random data, and if the hash is big enough, you might just not *see* the pattern anymore, and your tests fail to recognize it, but it will still be there.

This is what I'm thinking. I've just tried turning the microphone off and reducing the ratio to 1. Even then it only fails about 1/1000 tests. The rng-tools FIPS test uses a 20000-bit window so it is probably unable to detect any combined periodicity of the hash algorithm and the input.

At the same time, the hashed output bits should all be effected by every entry bit, so as long as there are about as many random entry bits as output bits, it should be relatively random, even if you have to blow on the microphone, hum, etc.

There are probably some improvements that can be made with arecord options too.

I could even use the mouse device to seen a random number generator and use that to permute the input.

If that doesn't work, I'll create a GUI interface using visual basic...

keenerd · 2012-03-28 17:59:31

Cute, but not good enough for generating keys. The bits that go into the entropy pool need to have 100% entropy. While you are collecting entropy from the microphone, simply hashing it does not improve the quality of the entropy.

Usually you go about this by removing the non-random components, making fewer high quality bits. When the input has been pared down to a few bits of 100% quality, these go into the entropy pool.

Instead this is taking those few good bits (mixed in with a large number of worthless bits) and hashing them into more bits, dividing the quality. Even if the hash ratio is high, the quality is still being divided. Let's say out of every 1024 bits there was one bit of high quality entropy. If you collect 128k bits and feed this into sha512, you've taken 128 bits of good entropy and smeared it among 512 bits.

A better source of pure entropy would be the least significant bit of the audio input. That throws out 15 of the 16 bits right there and greatly improves the quality. But depending on the mic's filtering and the resolution of the ADC, there might be long runs of ones and zeros. Maybe there is some bias and 0 comes up more common than 1. So run length encode this bit stream, and use the least significant bit from the RLE. This should probably be a good source of entropy, but it would need to be tested.

Testing true RNGs is kind of involved. The simplest way is to collect a very large number of samples, then make histograms at various bit widths. Given enough samples, the standard deviation of the histograms should be very tight. So at the 1 bit level, 0 and 1 should occur in equal amounts. At the 2 bit level, 0/1/2/3 should occur equally. Taking raw mic input, there would be a very strong peak at 0b00000000 in the 8 bit width (all the leading zeros created by quiet surrounding).

But if you are lazy and want a fairly good quality RNG, just grab a webcam.

alphaniner · 2012-03-28 18:47:23

I just thought anyone interested in this topic might be interested in the new random number generator in the Intel Ivy Bridge line. I never would have imagined that this is something Intel would be spending research dollars on. Live and learn!

Xyne · 2012-03-28 20:12:44

@keenerd
Oh well. Do you have any opinions on the approach taken by audio-entropy or randomsound?

@alphaniner
That is interesting.

Barrucadu · 2012-03-28 23:48:20

You could always source your utterly random data from random.org, they get it from atmospheric noise.

keenerd · 2012-03-29 16:21:02

Both of those apps seem to be doing the correct thing. I'd have to study their source code more closely and get a PhD to give a definitive answer (-:

Xyne · 2012-03-29 17:18:28

@Barracadu
The data itself may be random, but getting it form a third party is insecure. For all we know, it could be an NSA puppet site.

@keenerd
Well, you'd better hurry up then... you know the necrobumping policy. ++

a7 · 2012-04-01 09:47:02

I like seeing clever work like this but it would be remiss of me not to mention havege; specifically, haveged (in community) for this.

Tes-- · 2012-04-01 12:59:52

There actually is an audio entropy daemon: http://www.vanheusden.com/aed/
It uses some other tricks to remove bias, but I couldn't find any papers on it when I searched for them a few years ago. I know it's at least available for Debian (not that that means anything).

Last edited by Tes-- (2012-04-01 13:00:48)

wes · 2012-04-01 20:53:37

keenerd wrote:

The bits that go into the entropy pool need to have 100% entropy.

Not sure what 100% entropy would mean. You mean that each bit is sampled according to the uniform distribution on {0,1}? That's a lot to ask, but I guess there are ways to get close.

keenerd wrote:

While you are collecting entropy from the microphone, simply hashing it does not improve the quality of the entropy.

I don't think "quality" quite applies to entropy. Replace with "amount"? It is an objective measurement like mass. Grams are grams. We just need to figure out how many grams we have. But you are absolutely right hashing does not increase the entropy- it can only reduce it. And thus (as was conjectured), it is not increasing security. It is only fooling some simple statistical tests.

I think the thing you are after is quality of randomness, meaning how far away the distribution (say of the mic data) is from the uniform distribution. In this case, hashing actually does come into play! (Cf. the leftover hash lemma.) But this doesn't really answer our question.

keenerd wrote:

A better source of pure entropy would be the least significant bit of the audio input. That throws out 15 of the 16 bits right there and greatly improves the quality.

It sounds like you know way more about the format than I do, but I would be wary of this statement. Example: for some insane reason, whenever I read the time stamp counter (rdtsc) on my old core 2, I would get a value which was identical modulo 20 until the low-order 32 bits wrapped around, at which point it starts again. So after seeing one reading, the low order digit (base 10 or 20) of the subsequent TSC reads would have zero entropy! Not what I would expect from the TSC, but that was how it worked. Anyway, a better idea than just throwing some out, would be to take an XOR of all the bits. Then, if ANY of them are unpredictable, so is the result.

keenerd wrote:

But if you are lazy and want a fairly good quality RNG, just grab a webcam.

<3 I always loved that one...

Last edited by wes (2012-04-01 22:49:20)

wes · 2012-04-01 21:48:21

Xyne wrote:

Now, accepting that my crypto-fu is weak, I ask you... does passing rng-tool's FIPS test mean that the data is "sufficiently" random for secure cryptography? If not, what about if the input/output hash ratio is increased?

Increasing the input/output hash ratio can be made to work. Again, see the leftover hash lemma, or randomness extractors. The ingredients you would need to make it work in a kind of provable way, would be:

1. A lower bound on how many bits of data are unpredictable from the mic output (say out of every 256 bits, is at least 1 unpredictable?). You might be able to make a really conservative estimate and still have things work out efficiently.
2. A small amount of randomness to run the extractor or the leftover hash lemma construction. Maybe use the output of the non-blocking /dev/urandom ??? There is also work on deterministic extractors that might be useful, but I'm not up to date on it.

Does anyone have insights about #1? I guess this of course depends on the input. If it were a silent room, all bits might be predictable, right? (Note: you don't have to identify the unpredictable bit, I just mean that the probability of some (perhaps very clever) person guessing an entire 256 bit block correctly should be 1/2 or less. And if you think that isn't the case, how big of a block would be required before this is true?)

Looking forward to your replies... I like Xyne's idea. It would be cool to make it work.

(Off topic: All this /dev/random stuff reminded me of one other thing that you might enjoy: http://eprint.iacr.org/2005/029 )

keenerd · 2012-04-02 00:19:46

Not sure what 100% entropy would mean.

It means each bit is completely independent of all the others. Quality does indeed apply. Low quality means that the measured entropy is near 0%, high quality means it is near 100%. For example, all pseudo RNGs have 0% entropy because they can be perfectly predicted. Entropy (in computing) is just a fancy way of saying completely unpredictable information. So in this context, "percent entropy" is the ratio of kolmogorov size divided by input size. Sources of perfect entropy don't exist, so improving the quality (debiasing) of entropy sources is a huge deal.

Personally I avoid the term "high entropy", because it can either mean high quality or high rate depending on context. Though it does look like I need to do a little research on when it is appropriate to use the word entropy vs randomness and I may be using both improperly.

a better idea than just throwing some out, would be to take an XOR of all the bits.

It was just an armchair example of various means of improving the quality at the expense of bandwidth. That is another good general purpose method.

When I mentioned extracting high quality bits, I did not mean in a literal sense of pointing at a 0 or 1 and saying "You, you there are completely random." Rather in the vague sense of a bit as a component of information, discarding all the components that can be predicted or (more specifically) do not add to the Kolmogorov complexity of the random sequence. Hashing (and XORing) muddle which pieces of information contribute to randomness and which do not.

Increasing the input/output hash ratio can be made to work.

The leftover hash lemma just proves that debiasing is possible. It does not let you gain any extra randomness. Randomness extractors are just debiasing by a different name.

I can see your point about how you could technically avoid debiasing with a sufficiently high hash ratio, one that matches the quality of the input. But provable does not mean practical. Overestimate the quality (hash too little) and the generator is no good. Most likely the act of measuring the exact percent quality would require running a real debias function in parallel with the hash.

If you want to learn about collecting entropy, then experiment with it. It is fun and challenging. It was fun and challenging enough just getting up to speed on some of these terms :-) But nothing that I wrote is meant to be actually used. It is too easy to overlook a detail. As an educational thing, I like Xyne's idea. As a practical thing, use one of the several that have been heavily reviewed.

wes · 2012-04-02 01:26:32

keenerd wrote:

It means each bit is completely independent of all the others...

Hmmm. I am refering to "entropy" in the sense of Shannon's information theory, in which case it is just a concrete measurement on a probability distribution. Apologies if I'm not understanding -- I'm new here. Anyway, using my definitions, you can have a funny distribution on strings in which each bit is independent of the others, but it still has very low entropy. Say if the bits are usually 0, but their likelyhood of being 0 has nothing to do with any of their neighbors. In this sense, independent =/=> lots of entropy. Anyway, differences in language aside, I think we actually agree on the meaning: "perfect" <==> the value is sampled uniformly and independently at random from {0,1}. Right?

For example, all pseudo RNGs have 0% entropy because they can be perfectly predicted.

Provided you know the seed. In general, entropy <= size of seed. But again, this is using my definition...

So in this context, "percent entropy" is the ratio of kolmogorov size divided by input size. Sources of perfect entropy don't exist, so improving the quality (debiasing) of entropy sources is a huge deal.

Intuitively I like the definition. Although Kolmogorov complexity is not so easy to compute and also not so easy to estimate. Well, I guess it is easy to come up with upper bounds, but we are interested more in lower bounds, which I think Shannon is more suited towards.

Personally I avoid the term "high entropy", because it can either mean high quality or high rate depending on context.

I think I'm coming from a different background. Perhaps I mistook my world for the universe. Sorry if that's the case.

When I mentioned extracting high quality bits, I did not mean in a literal sense of pointing at a 0 or 1 and saying "You, you there are completely random." Rather in the vague sense of a bit as a component of information, discarding all the components that can be predicted or (more specifically) do not add to the Kolmogorov complexity of the random sequence.

I guess the bit you mentioned about the LSB sent me down that path.

The leftover hash lemma just proves that debiasing is possible. It does not let you gain any extra randomness. Randomness extractors are just debiasing by a different name.

Regarding the extra randomness, that's exactly what I said. Regarding the lemma and extractors, I wouldn't say that it just proves it is possible: it gives you a generic and concrete way to do it. Bonus: it comes with formal statements of what it accomplishes, and proofs that it works! But to be honest, I am unfamiliar with the literature on debiasing. I would be interested in any references if you have them.

But provable does not mean practical.

Unfortunate, but true. But if the intersection is NOT the empty set... win! I looked (briefly) at some of the other libraries out there, and didn't find any that claimed an information-theoretic kind of proof. Meh. In reality, most of this is probably overkill, and the ad-hoc method is probably fine. But people have been burned before, and although we certainly aren't doing anything that dumb nowadays, I still would prefer to be on the safe side if possible.

If you want to learn about collecting entropy, then experiment with it.

I did once upon a time. Maybe I will bring that back to life and share the results (if they don't suck). But I thought Xyne's idea had some promise...

Last edited by wes (2012-04-02 01:32:12)

Arch Linux

#1 2012-03-28 03:17:58

Maximum entropy! Is this "sufficiently" random?

#2 2012-03-28 06:03:43

Re: Maximum entropy! Is this "sufficiently" random?

#3 2012-03-28 12:53:58

Re: Maximum entropy! Is this "sufficiently" random?

#4 2012-03-28 13:57:53

Re: Maximum entropy! Is this "sufficiently" random?

#5 2012-03-28 17:59:31

Re: Maximum entropy! Is this "sufficiently" random?

#6 2012-03-28 18:47:23

Re: Maximum entropy! Is this "sufficiently" random?

#7 2012-03-28 20:12:44

Re: Maximum entropy! Is this "sufficiently" random?

#8 2012-03-28 23:48:20

Re: Maximum entropy! Is this "sufficiently" random?

#9 2012-03-29 16:21:02

Re: Maximum entropy! Is this "sufficiently" random?

#10 2012-03-29 17:18:28

Re: Maximum entropy! Is this "sufficiently" random?

#11 2012-04-01 09:47:02

Re: Maximum entropy! Is this "sufficiently" random?

#12 2012-04-01 12:59:52

Re: Maximum entropy! Is this "sufficiently" random?

#13 2012-04-01 20:53:37

Re: Maximum entropy! Is this "sufficiently" random?

#14 2012-04-01 21:48:21

Re: Maximum entropy! Is this "sufficiently" random?

#15 2012-04-02 00:19:46

Re: Maximum entropy! Is this "sufficiently" random?

#16 2012-04-02 01:26:32

Re: Maximum entropy! Is this "sufficiently" random?

Board footer