I'm building a podcasting system for a community radio station, and need to archive all the audio that comes off the mixing board into one hour chunks, ideally as MP3. These chunks later get stitched together into 2 hour shows. The problem is we never know exactly which 2 hours will be a show, hence recording everything to one hour sections and stitching together later.
I'm using arecord to record the audio, and it works but there's a slight but noticeable glitch at the join points.
Here's a simplified version of what I'm doing, modified for testing to produce one minute chunks:
for i in 1 2 3 ; do
    fname="/home/wrybread/streams/RadioValencia.$(date +%Y-%m-%d.%H%M).wav"
    arecord --format cd --duration 60 --file-type wav "$fname"
done
When I stitch the files together, whether I use Audacity, Sound Forge or my Python script, I get a slight hiccup at the join point.
My theory is that this is the millisecond-long period in which arecord is re-initializing. If my theory is correct, maybe there's a way to keep arecord running, but send its output to new files every hour?
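A rough sketch of that idea, in case it helps picture it: keep a single reader on the capture stream and only swap the output file, cutting on an exact frame count rather than on the wall clock, so no samples are dropped or duplicated at the boundary. This is an illustration in Python 3, not something tested against a real sound card. The function name `split_pcm_stream` and the `chunk-NNNN.wav` naming are made up for the example, and it assumes raw CD-format PCM (16-bit, stereo, 44100 Hz) such as `arecord --format cd --file-type raw` writes to stdout.

```python
import os
import wave

CHANNELS = 2          # --format cd is stereo
SAMPLE_WIDTH = 2      # 16-bit samples
RATE = 44100          # frames per second
FRAME_BYTES = CHANNELS * SAMPLE_WIDTH


def split_pcm_stream(stream, out_dir, seconds_per_file, read_size=4096):
    """Copy raw PCM from `stream` into consecutive WAV files.

    The input is never closed or reopened between files; only the output
    changes, and each cut falls on an exact frame count.
    Returns the list of files written (hypothetical chunk-NNNN.wav names).
    """
    bytes_per_file = seconds_per_file * RATE * FRAME_BYTES
    written = []
    wf = None
    remaining = 0
    while True:
        data = stream.read(read_size)
        if not data:
            break
        while data:
            if wf is None:
                # Open the next output file; the input stream keeps flowing.
                path = os.path.join(out_dir, "chunk-%04d.wav" % len(written))
                wf = wave.open(path, "wb")
                wf.setnchannels(CHANNELS)
                wf.setsampwidth(SAMPLE_WIDTH)
                wf.setframerate(RATE)
                written.append(path)
                remaining = bytes_per_file
            # Write only what still fits in this file, keep the rest for the next.
            piece, data = data[:remaining], data[remaining:]
            wf.writeframesraw(piece)
            remaining -= len(piece)
            if remaining == 0:
                wf.close()
                wf = None
    if wf is not None:
        wf.close()
    return written
```

Fed from a real recorder this would presumably read `sys.stdin.buffer` at the end of a pipe from arecord, with `seconds_per_file=3600`. File naming by timestamp is left out to keep the sketch short.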
In this thread someone is trying to do something similar, though not concerned with the output stitching together seamlessly, just that the output is complete:
https://bbs.archlinux.org/viewtopic.php?pid=462036
The last post shows a method of recording audio to a buffer and splitting the output. I couldn't get it to work: my system pauses on the arecord statement and never enters the loop. But maybe that's a promising method?
Thanks for any help.
Offline
A fixed version of the script in the post you linked to seems to be working fine:
rm -f /tmp/audiorec
mkfifo /tmp/audiorec
arecord --format cd --file-type raw > /tmp/audiorec &
while true ; do
    FNAME=`date +%Y%m%d-%a-%H00_%s`
    oggenc /tmp/audiorec -r -o $FNAME.ogg &
    sleep 3600
    pkill oggenc
done
But you'll have to run it more times than I did to make sure there are no stitching problems.
Last edited by MadCatMk2 (2011-06-29 07:39:12)
Offline
Thanks, that records, but when I stitch the files together there's a bad gap. Here's a version that records three one-minute pieces and then exits:
rm -f /tmp/audiorec
mkfifo /tmp/audiorec
arecord --format cd --file-type raw > /tmp/audiorec &
ARECORD_PID=$!
for i in 1 2 3 ; do
    FNAME="/home/wrybread/streams/RadioValencia."`date +%Y-%m-%d.%H%M`
    oggenc /tmp/audiorec -r -o $FNAME.ogg &
    OGG_PID=$!
    sleep 60
    kill $OGG_PID
done
kill $ARECORD_PID
I'd like to remove oggenc from the setup for testing. Do you know how I'd modify that to simply write wav output, without encoding?
Offline
I don't have a solution unfortunately. I did try to do an extremely similar thing a while ago (recording emergency services radio for investigative purposes) but never got a solution. Obviously even a small block of missing audio would be unacceptable for the purpose.
IIRC, I started looking at piping the audio to a named pipe or FIFO buffer and encoding from that. My theory was that the buffer would cache while the encoder is changing files.
FWIW, the community radio station I used to do some work at had the same kind of recorder you're talking about set up, running Windows 95, so surely if Win95 can do it, we can do it on Linux!
Are you familiar with our Forum Rules, and How To Ask Questions The Smart Way?
BlueHackers // fscanary // resticctl
Offline
Not sure how you'd implement this in the script above (I'm clueless on bash scripting I'm afraid), but why not record a large file and split it afterwards? Will you be recording for several hours?
arecord -f cd -t wav > wat.wav
Edit: From the arecord manpage:
--max-file-time
    While recording, when the output file has been accumulating
    sound for this long, close it and open a new output file.
    Default is the maximum size supported by the file format: 2 GiB
    for WAV files. This option has no effect if
    --separate-channels is specified.
I assume they have implemented it right (arecord was written by people who actually know what's going on) - give it a try.
Last edited by MadCatMk2 (2011-06-29 08:33:14)
Offline
I didn't look through the other thread completely, but I would go for two audio recording processes (e.g. oggenc, arecord, mencoder, ffmpeg): one which starts at the full hour and runs for, let's say, an hour, and one which starts a minute before each full hour and runs for 62 minutes.
You could set up two cron jobs which start your recording tool with a timestamp in the filename.
Afterwards you could automatically trim the files with sox and concatenate them. Have a look here:
http://billposer.org/Linguistics/Comput … orial.html
Using Sox to Extract Subparts of a File
The trim effect copies the portion of the input starting at start and ending at start plus length to the output. Both parameters may be specified either as numbers of samples, consisting of an integer followed by the letter s, e.g. "8700s" or a time value. Time values are of the form ((hh:)mm:)ss(.fs). A bare integer is therefore a time value in seconds. For example, suppose that you have a recording 1 hour long and wish to cut it into two halves. The following two commands will leave the first half in Half1.wav and the second half in Half2.wav.
sox Input.wav Half1.wav trim 0 30:00
sox Input.wav Half2.wav trim 30:00 30:00
The original file is unaffected, so once you have confirmed that the two output files contain what they should, you may delete the original if you wish to.
Using Sox to Concatenate Files
You can concatenate two or more input files into a single file simply by giving multiple input file names. The following command concatenates Half1.wav and Half2.wav into Full.wav.
sox Half1.wav Half2.wav Full.wav
Using sox should make it possible to merge the overlapping recordings pretty much seamlessly, after some experiments to determine the right cut points.
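For anyone who would rather do the trimming from a script than shell out to sox, here is a minimal Python 3 sketch of the same "start plus length" cut using only the standard `wave` module. It is not sox's implementation, just an illustration; `trim_wav` and the file names in the note below are hypothetical, and it only handles plain PCM WAV files.

```python
import wave


def trim_wav(src, dst, start_s, length_s, block=65536):
    """Copy `length_s` seconds of `src`, starting at `start_s`, into `dst`.

    Mirrors the argument order of `sox in.wav out.wav trim start length`.
    """
    with wave.open(src, "rb") as r, wave.open(dst, "wb") as w:
        w.setnchannels(r.getnchannels())
        w.setsampwidth(r.getsampwidth())
        w.setframerate(r.getframerate())
        rate = r.getframerate()
        frame_bytes = r.getnchannels() * r.getsampwidth()
        # Seek to the first frame to keep (clamped to the end of the file).
        r.setpos(min(int(start_s * rate), r.getnframes()))
        remaining = int(length_s * rate)
        # Copy in blocks so an hour-long file is never held in memory at once.
        while remaining > 0:
            frames = r.readframes(min(remaining, block))
            if not frames:
                break
            w.writeframesraw(frames)
            remaining -= len(frames) // frame_bytes
```

With the 62-minute overlapping job described above, something like `trim_wav("overlap.wav", "hour.wav", 60, 3600)` would keep the hour in the middle. The exact cut points would still need the experimenting mentioned above, since the two jobs will not start on exactly the same sample.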
I am not sure how a sound card deals with two recording programs accessing the same device, but as an alternative there are sound systems such as JACK or PulseAudio. The dumb approach would be using two sound cards.
Last edited by Darksoul71 (2011-06-29 08:52:30)
My archlinux x86_64 host:
AMD E350 (2x1.6GHz) / 8GB DDR3 RAM / GeForce 9500GT (passive) / Arch running from 16GB USB Stick
Offline
fukawi2 - I found your previous thread here:
https://bbs.archlinux.org/viewtopic.php?pid=462036
There were some mistakes in the code sample someone gave you, but MadCatMk2 just posted a fixed version in this thread.
MadCatMk2 - Agreed, there should be a way to record a single giant file, maybe one per day. But the problem is that right after people's shows they type in the hours they want podcasted, and often we don't know those hours beforehand. But maybe it's possible to pull whatever hours we need from a giant wav file that's still being written to? I'm still hoping it's possible to record to hour chunks, though, since we have a lot of infrastructure built around that.
Offline
Linux enables you to work on files that are currently open (just did it in audacity to make sure) but that's definitely not the best solution as you pointed out.
I edited the post above. See if it helps.
Offline
Agreed, there should be a way to record a single giant file, maybe one per day. But the problem is that right after people's shows they type in the hours they want podcasted, and often we don't know those hours beforehand. But maybe it's possible to pull whatever hours we need from a giant wav file that's still being written to? I'm still hoping it's possible to record to hour chunks, though, since we have a lot of infrastructure built around that.
Hm, people should learn patience.
24 hours of recording in WAV format translates to a file of roughly 14 GB. To me that is not exactly a big file nowadays, when you can grab a 1 TB HDD for 40 bucks.
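A quick back-of-the-envelope check of that figure, as a small Python 3 snippet. It assumes CD-format audio as used elsewhere in this thread (44100 Hz, stereo, 16-bit) and ignores the small WAV header; `wav_bytes` is just a helper name for the example.

```python
RATE = 44100        # frames per second
CHANNELS = 2        # stereo
SAMPLE_WIDTH = 2    # bytes per sample (16-bit)


def wav_bytes(seconds):
    """Size of the PCM data for `seconds` of CD-format audio, header excluded."""
    return seconds * RATE * CHANNELS * SAMPLE_WIDTH


per_hour = wav_bytes(3600)
per_day = wav_bytes(24 * 3600)
print("per hour: %.2f GiB" % (per_hour / 2**30))
print("per day:  %.2f GiB" % (per_day / 2**30))
```

That works out to about 0.59 GiB per hour and about 14.2 GiB per day. Note that a full day is well above the 2 GiB WAV default quoted from the arecord manpage earlier in the thread, so a single day-long WAV file may not be straightforward, while one-hour files fit comfortably.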
Other options for compression are FLAC or a lossy format at a high bitrate. Personally I never noticed a big difference between an edited 128 kBit MP3 file re-encoded from a 256 kBit MP3 and one encoded from a plain wave source.
To me the approach with two alternating recording jobs, which overlap seems to be the most elegant one.
Offline
I've been doing more testing and, surprisingly, my Python script is currently doing the best job of recording chunks of audio that stitch together smoothly. It produces both MP3s and WAVs, and the MP3s are glitchy, but the WAVs are really smooth, at least in the tests I've done so far.
Posting my script in case it gives anyone any ideas:
#!/usr/bin/python
_version = ".1"

####################
# SET SOME VARIABLES
####################
# set the output directory for the audio files
output_directory = "streams/"
# encode to MP3 after recording each chunk?
encode_to_mp3 = True
# if encoding to MP3, delete the wav file afterwards?
delete_wav = False
# Length of recordings (choices are minute or hour).
# Set to "minute" if testing.
record_length = "hour"
# if using windows, where's lame.exe? If Linux, make sure lame is installed.
lame = "c:\\lame\\lame.exe"

import threading, time, os, sys, wave
import pyaudio
import process # a somewhat rare module that I use instead of subprocess. Can't find the download link, use radiovalencia.fm/dropbox/process.zip

# write to the log file and console
def write(message):
    # compose our time nice and purdy - January 3, 2010 - 3:52pm
    month = time.strftime("%B")
    day = time.strftime("%d").lstrip("0")
    year = time.strftime("%Y")
    hour = time.strftime("%I").lstrip("0")
    minute = time.strftime("%M")
    second = time.strftime("%S")
    ampm = time.strftime("%p").lower()
    timestamp = month + " " + day + ", " + year + " " + hour + ":" + minute + ":" + second + ampm
    if message == "SPACER": message = "\n"
    else: message = timestamp + ": " + message
    print message
    # write it to a file
    fname = "sound_recorder_log.txt"
    try: contents = open(fname, "r").read()
    except: contents = ""
    # restrict the log file to X number of characters
    limit = 115000
    contents = contents[-limit:]
    message = contents + message + "\n"
    handle = open(fname, "w")
    handle.write(message)
    handle.close()

# get the filename of the wav file
def get_fname():
    # we want it to look like this: RadioValencia.11-06-06.0500.mp3
    fname = time.strftime("RadioValencia.%y-%m-%d.%H%M") #%S
    fname = os.path.abspath(output_directory + "/" + fname + ".wav")
    try: os.makedirs( os.path.dirname(fname) )
    except: pass
    return fname

class MP3Encoder(threading.Thread):
    def __init__(self, wav_fname):
        self.wav_fname = wav_fname
        threading.Thread.__init__(self)

    def encode_to_mp3(self, wav_fname):
        wav_fname = os.path.abspath(wav_fname)
        options = "-b 192 -h --quiet" # add some more options for better quality? # ubuntu version doesn't support --priority 1
        output_fname = os.path.splitext(wav_fname)[0] + ".mp3"
        options += " " + wav_fname + " " + output_fname
        if "linux" in sys.platform: command = "lame " + options
        else: command = lame + " " + options # Windows
        time.sleep(1)
        write("Starting to encode " + os.path.basename(wav_fname) + " to MP3...")
        start_time_local = time.time()
        try:
            p = process.ProcessOpen(command) # using the process module, somewhat rare, see comment up top for location
            p.wait(timeout=2100)
            output = p.stdout.read()
            total_time = time.time() - start_time_local
            total_time = round(total_time, 2)
            write("Done encoding to MP3 in " + str(total_time) + " seconds.")
            return output_fname
        except Exception, e:
            write("*** ERROR encoding " + os.path.basename(wav_fname) + " to mp3: " + str(e))
            return False

    def run(self):
        fname = self.encode_to_mp3(self.wav_fname)
        # delete the WAV file?
        if fname and delete_wav:
            try: os.unlink(self.wav_fname)
            except Exception, e:
                write("*** ERROR deleting " + os.path.basename(self.wav_fname) + ": " + str(e))

############################################################################
# START DOING STUFF
############################################################################
write("SPACER")
write("----- Sound Recorder launched -----")

##########################
# initialize our soundcard
##########################
p = pyaudio.PyAudio()
chunk = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
# http://people.csail.mit.edu/hubert/pyaudio/docs/
# for non-default soundcard, set input_device_index
stream = p.open(format = FORMAT,
                channels = CHANNELS,
                rate = RATE,
                input = True,
                frames_per_buffer = chunk)

# keep track of all hours that have been recorded
all_recorded_hours = []
counter = 1
while True:
    # set the wave file's properties
    wav_fname = get_fname()
    #write("SPACER")
    write("Starting to record " + os.path.basename(wav_fname) + " (" + str(counter) + ")")
    wf = wave.open(wav_fname, 'wb')
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(p.get_sample_size(FORMAT))
    wf.setframerate(RATE)
    # start recording
    start_time = time.time()
    error = False
    while True:
        try:
            data = stream.read(chunk)
            wf.writeframes(data)
        except Exception, e:
            write("*** ERROR writing wav chunk! " + str(e))
            error = True
        # for debugging, we can set "record_length" to a minute so we record smaller sections
        if record_length == "minute":
            current_minute = time.strftime("%S")
        else:
            current_minute = time.strftime("%M")
        #print current_minute
        unique_timestamp = time.strftime("%y-%m-%d %H:%M") # a unique timestamp for this hour (with seconds), so we can see if we've already recorded it
        # This is where we check to see if an hour has passed. If the minute is "00" and its the first time we've seen 00 for this hour, its safe to say its a new hour. Not the best method since there could be some microsecond overlap, but works for now.
        # For testing, change the "00" to some approaching minute. For example if its 2:30, change it to "33" and it'll record until 2:33.
        # check to see if we should stop recording this file.
        if (current_minute == "00" and unique_timestamp not in all_recorded_hours) or error:
            all_recorded_hours.append(unique_timestamp) # so we know we recorded this hour already
            break
    try: wf.close() # close the wav file
    except Exception, e: write("*** ERROR closing the wav file! " + str(e))
    write("Done recording " + os.path.basename(wav_fname))
    # send it to be encoded to MP3
    if encode_to_mp3:
        MP3Encoder(wav_fname).start()
    counter += 1

stream.close()
p.terminate()
write("Gar, we shouldn't have reached this point...")
Offline
It produces both MP3s and WAVs, and the MP3s are glitchy, but the WAVs are really smooth, at least in the tests I've done so far.
There is one thing I currently do not get from your script:
How are the MP3 files merged? Typically MP3 files are "glitchy" / show hiccups at merge points if they are not corrected / re-encoded afterwards.
In other words: If your wave files merged together sound smooth but the MP3s not, you should merge the wave files first and encode them afterwards.
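To illustrate the "merge first, encode afterwards" order, here is a minimal Python 3 sketch that joins WAV files end to end with the standard `wave` module. It is only a sketch under the assumption that all chunks are plain PCM WAV files with the same channel count, sample width and rate; `concat_wavs` is a made-up name, not part of the script posted above.

```python
import wave


def concat_wavs(sources, dst, block=65536):
    """Join WAV files end to end; all must share channels, width and rate."""
    fmt = None
    out = None
    try:
        for path in sources:
            with wave.open(path, "rb") as r:
                this_fmt = (r.getnchannels(), r.getsampwidth(), r.getframerate())
                if out is None:
                    # The first file decides the format of the output.
                    fmt = this_fmt
                    out = wave.open(dst, "wb")
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif this_fmt != fmt:
                    raise ValueError("%s does not match the first file's format" % path)
                # Copy the audio frames block by block, headers excluded.
                while True:
                    frames = r.readframes(block)
                    if not frames:
                        break
                    out.writeframesraw(frames)
    finally:
        if out is not None:
            out.close()
```

The joined file would then be encoded once, for example with the same `lame -b 192 -h` options the posted script already uses, so the encoder never sees a boundary between chunks.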
Last edited by Darksoul71 (2011-06-29 09:32:54)
Offline
In other words: If your wave files merged together sound smooth but the MP3s not, you should merge the wave files first and encode them afterwards.
I'm thinking that might be the way to go.
In other words, keep the wav files for a few weeks, and have a script that archives anything older than that to MP3. If anyone didn't make their podcast by then they deserve a slight hiccup at the join points...
Offline
Hm, the other approach would be a "re-encode" via ffmpeg.
The Nautilus script I wrote for merging several MP3 tracks into one (e.g. to make a single MP3 file from an audio play) looks like this:
cat *.mp3 > tmp_merge.mp3
ffmpeg -i tmp_merge.mp3 -acodec copy final.mp3
rm tmp_merge.mp3
Hopefully this is completely right but I can crosscheck this when I return home from work. ffmpeg corrects the merged MP3 files (something about timecode, headers and stuff).
Offline
Hopefully this is completely right but I can crosscheck this when I return home from work. ffmpeg corrects the merged MP3 files (something about timecode, headers and stuff).
Very interesting...
I just did a test, and I still get a little bump at the join point, but it's less than I was getting when merging with Python.
But in this code, I don't see how ffmpeg is merging the MP3s... Isn't the merge just happening from cat, headers and all?
cat *.mp3 > tmp_merge.mp3
ffmpeg -i tmp_merge.mp3 -acodec copy final.mp3
rm tmp_merge.mp3
Offline
Hm, sorry, I didn't describe the process completely.
Yes, you are correct. cat is merging the MP3 files together; ffmpeg is only doing a stream copy of the resulting binary-merged file. You need to do something similar when merging MPEG2 video files. Those can also be merged at the binary level, but they then have incorrect timestamps, which get corrected when you do a stream copy with Mencoder or ffmpeg.
In your case the MP3 files could still be merged in Python, with the timecodes inside the merged MP3 then corrected by calling ffmpeg.
You might want to check out this:
http://lyncd.com/2009/02/how-to-merge-mp3-files/
Last edited by Darksoul71 (2011-06-29 10:09:58)
Offline
Closing the loop on this extremely old problem... I finally found a turnkey solution in LiquidSoap (http://savonet.sourceforge.net/). There is also rotter (https://www.aelius.com/njh/rotter/)
Offline