
#1 2012-09-30 16:32:40

Registered: 2010-11-08
Posts: 93

[SOLVED] Reading sound samples

Hello everyone !

I'm having a small problem with pygame where I get two contradictory results. What I want to do is build a graphic equalizer.
I already had it working, but with lots of packages that provide functionality in a different way than pygame does.
Now I want to see whether it is possible to reduce the overhead.

In order to retrieve the samples from the audio file I relied on the wave package.
As pygame provides that capability as well (pygame.mixer and pygame.sndarray) I want to use these modules.

But when I compare the results I get from the samples, I notice that they differ.
For testing I used a dual-channel .wav file with a 44.1 kHz sampling rate.

The method I used with the wave package was the following:

# Method used with the wave package
import wave, struct

musicfile = wave.open('test.wav', 'rb')
frames = musicfile.readframes(256)

# convert the little-endian byte pairs into signed 16-bit integers
frames = [ struct.unpack("<h", frames[i] + frames[i+1])[0] for i in range(0, len(frames), 2) ]
# mix the two channels into one by averaging
frames = [ float(frames[i] + frames[i+1])/2 for i in range(0, len(frames), 2) ]
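As an aside, struct can also unpack the whole buffer in a single call instead of pair by pair; a minimal sketch with made-up sample bytes (the values here are just for illustration):

```python
import struct

# assuming 16-bit little-endian signed samples
raw = b'\x00\x01\xff\x7f\x00\x80'            # three example samples
count = len(raw) // 2                        # number of 16-bit values
values = struct.unpack("<%dh" % count, raw)  # -> (256, 32767, -32768)
print(values)
```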

Compared to the code with the pygame package:

import pygame


musicfile = pygame.mixer.Sound('test.wav')
frames = pygame.sndarray.samples(musicfile)[0:256]

# pygame samples are an n-channel-tuple, so we'll mix the channels together as well
frames = [ float(frames[i][0] + frames[i][1])/2 for i in range(0, len(frames)) ]

When comparing the two 'frames' variables, I notice that there are huge differences in the values. What might be the reason for this?
I am really at a loss as to what went wrong. I suspect it has something to do with dtype=int16, which is displayed when I check the output of pygame.sndarray.samples.

Any help or hints are greatly appreciated!

Last edited by n0stradamus (2012-10-01 17:54:54)


#2 2012-10-01 00:19:16

rockin turtle
From: Montana, USA
Registered: 2009-10-22
Posts: 218

Re: [SOLVED] Reading sound samples

Well, I've never worked with .wav files, so the following may be incorrect; you'll have to verify this on your own.

As I read things, .wav files may be compressed but the python Wave module only supports uncompressed files, so you should test that your file is uncompressed.

if 'NONE' != musicfile.getcomptype(): sys.exit()

If 'getcomptype' instead returns something like 'PCM' I think you'll still be ok.  Since I've never done this, I really don't know what will be returned.

It appears that you are assuming that your file has 2 channels (stereo), but wave files can also contain a single channel. It would be better to read the channel count from the file than to assume that it is stereo.

ch_count = musicfile.getnchannels()

Now determine how many bytes per sample, bps:

bps = musicfile.getsampwidth()

Now that you have this info, you can unpack your frames.

According to the spec, each sample is recorded to the file using an even number of bytes. If your bytes per sample is odd (i.e. 1 byte per sample), then an additional 0 byte is padded after each sample. Unfortunately, the python documentation for the wave module doesn't say whether or not they strip these extra bytes out.  They just say that the data is returned as a "string of bytes".

So most likely, you have to remove these extra bytes yourself.

I'm not sure that the 'struct' module is really appropriate here, just read the bytes (little endian: least significant byte first) and convert yourself.

from functools import reduce

sample_size = bps + (0 if bps % 2 == 0 else 1)

convert = lambda z: reduce(lambda x, y: x*256 + int(y), z[::-1], 0)
sample = [ convert(frames[i:i+bps]) for i in range(0, len(frames), sample_size) ]

The above code removes all extra padding bytes (if there are any).  If it turns out that the python wave module has already done this for you, then the above code is wrong.  To fix, just change the above line to:

sample = [ convert (frames[i:i+bps]) for i in range (0,len(frames),bps) ]

and it should just work.
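To sanity-check the 'convert' helper, here it is again on a couple of made-up byte sequences (little-endian, so the low byte comes first):

```python
from functools import reduce

# same helper as above: little-endian bytes -> unsigned integer
convert = lambda z: reduce(lambda x, y: x*256 + int(y), z[::-1], 0)

print(convert(b'\x34\x12'))  # 0x1234 == 4660
print(convert(b'\xff\xff'))  # 65535 (still unsigned at this point)
```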

Now the list 'sample' contains your .wav data with the channels interleaved.

There is one remaining problem.  The .wav file spec says that if the samples are one byte each, they are unsigned quantities; but if the samples are more than one byte each, then they are signed quantities. Again, the python documentation doesn't say whether they have taken care of this for you.  Since they are returning a "string of bytes", it seems unlikely that they have done any conversion for you.

The above code has converted all the data assuming that the samples are unsigned values.

If the sample data is multibyte, then we need to convert to signed values. Here, I assume two's-complement integers.

if bps > 1:
    midpoint = [0 for i in range(bps)]
    midpoint[-1] = 0x80
    midpt = convert(midpoint)
    bias = -2 * midpt
    sample = [ s if s<midpt else s+bias for s in sample ]
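A quick check of the sign conversion for bps = 2 (the sample values are made up): midpt comes out as 0x8000, so 0xffff maps back to -1 while 0x7fff stays put:

```python
from functools import reduce

convert = lambda z: reduce(lambda x, y: x*256 + int(y), z[::-1], 0)

bps = 2
sample = [0x7fff, 0xffff, 0x8000]  # unsigned values straight from the file

midpoint = [0 for i in range(bps)]
midpoint[-1] = 0x80
midpt = convert(midpoint)          # 0x8000 == 32768
bias = -2 * midpt                  # -65536
sample = [ s if s < midpt else s + bias for s in sample ]
print(sample)                      # [32767, -1, -32768]
```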

Now, just average as you desire.

frames = [ float(sum(sample[i:i+ch_count]))/ch_count for i in range(0,len(sample),ch_count) ]

I'm not sure this makes sense if you have more than 2 channels.  That's for you to decide.

Edit: spelling

Last edited by rockin turtle (2012-10-01 00:23:41)


#3 2012-10-01 16:04:08

Registered: 2010-11-08
Posts: 93

Re: [SOLVED] Reading sound samples

First of all, thank you very much for the time you spent to make this long post!

I'm sorry to tell you that you wrote it in vain :)
I did not check for the correctness of the way I converted the data from bytes to int.
From the output of the `file' command, which told me I was dealing with 16-bit PCM, I assumed that two consecutive bytes were the value of one channel.
Logically, the next two bytes would be the value of the other channel.
So I didn't really question whether this way was incorrect, but thanks for mentioning it!

The real mistake had to do with dtype=int16, because Python failed to tell me that it was numpy.int16!
With that in mind, the incorrect conversion to a float was easily fixed:

import pygame
import numpy


musicfile = pygame.mixer.Sound('test.wav')
frames = pygame.sndarray.samples(musicfile)[0:256]

# pygame samples are an n-channel-tuple, so we'll mix the channels together as well
frames = [ (numpy.asscalar(frames[i][0]) + numpy.asscalar(frames[i][1]))/2 for i in range(0, len(frames)) ]
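For anyone running into the same thing: numpy.int16 arithmetic wraps around on overflow, so summing two loud samples silently produces garbage, which is exactly what broke my first version. A minimal demonstration:

```python
import numpy

a = numpy.int16(30000)
b = numpy.int16(30000)

# int16 addition wraps: 60000 doesn't fit into 16 signed bits
print(a + b)            # -5536, not 60000
print(int(a) + int(b))  # 60000, once converted to Python ints
```

(In current numpy versions, numpy.asscalar has been removed; frames[i][0].item() does the same job.)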

To verify, I wrote a small script that does the test automatically, which I'll just post here for lack of webspace :/
It relies on the file being in a dual-channel, 16-bit, 44.1kHz .wav format.

import struct
import wave
import pygame
import numpy

# first read in as wave
wvfile = wave.open('test.wav', 'rb')
wvsamps = wvfile.getnframes()
wvraw = wvfile.readframes(wvsamps)
wvdata = [ struct.unpack("<h", wvraw[i] + wvraw[i+1])[0] for i in range(0,len(wvraw),2) ]
wvdata = [ float(wvdata[i] + wvdata[i+1])/2 for i in range(0,len(wvdata),2) ]
print('handling data read with wave module complete!')

# then read in as pygame
pgfile = pygame.mixer.Sound('test.wav')
pgraw = pygame.sndarray.samples(pgfile)
pgdata = [ float(numpy.asscalar(pgraw[i][0]) + numpy.asscalar(pgraw[i][1]))/2.0 for i in range(0,len(pgraw)) ]
del pgraw, pgfile
print('handling data read with pygame.mixer module complete!')

# then compare the two
if len(pgdata) == len(wvdata):
    print('LENGTH matches!')
    mismatch = False

    for i in range(0,len(wvdata)):
        if wvdata[i] != pgdata[i]:
            print('mismatch...{0} vs. {1}'.format(wvdata[i],pgdata[i]))
            mismatch = True

    if not mismatch:
        print('No mismatches! It worked! :)')
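As a side note, the element-by-element comparison loop could also be collapsed into a single numpy call (a sketch with made-up stand-in lists, not the real sample data):

```python
import numpy

wvdata = [0.0, 12.5, -3.0]   # stand-ins for the real sample lists
pgdata = [0.0, 12.5, -3.0]

# array_equal checks both shape and every element at once
if numpy.array_equal(wvdata, pgdata):
    print('No mismatches! It worked! :)')
```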

