I'm currently writing an interactive awk script which must update certain lines of a file without changing others. I'm not sure if awk is the best tool for the job, but I'm writing it in awk anyways and I'd like an opinion on the best way to go about it. Here are some of the options I'm considering:
1. Write back to the file as the lines are read in. Cons: awk doesn't have very good support for this, and if the program is killed prematurely then data gets lost.
2. Write to a temporary file and copy it back to the original at the end. Cons: I don't know, but it seems like I shouldn't need a temporary file.
3. Call an external program like ed or sed to change each line as I come to it. Cons: lots of calls to external programs.
4. Wait until the end of execution to write back everything to the file at once. Cons: the entire file must be held in memory and I'm not sure how common or harmful that's considered. The files are potentially very big (tens of thousands of lines), but usually much smaller than that. If the program is killed prematurely then the file is not updated.
I'd appreciate any help or opinions on the matter. Thanks.
EDIT: Oh, I probably should have mentioned that stdin and stdout are being used for other purposes, which is why I can't go the usual awk route of not touching the input file at all.
Last edited by fflarex (2010-10-16 23:18:00)
you can use the sponge tool from moreutils
http://bashcurescancer.com/prepend-to-a … utils.html
sed's apparent "inline" editing behavior with -i uses a temporary file behind your back. The advantage is that since it's a stream editor, it can write as it reads, making it ideal for large files. While sponge from moreutils is a nifty utility, it creates an expanding buffer which stores the data in memory until it reaches EOF and only then dumps it back into the file you were reading from. Definitely not ideal for large files.
tl;dr: you can't avoid using a temporary file. Solution #2 is your best bet. It gives you a backup of the original in the process, which is a plus.
Oops, I didn't see your edit, so this is a moot point. You can always open a new file descriptor, tie it to a file of your choice, and send awk's output there. Quick and dirty example:
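#!/bin/bash
# Open file descriptor 3 for writing to a file of your choice.
exec 3>"$HOME/somefile"
echo "this is a message to stdout"
echo "this is going to stderr" >&2
# awk's output is redirected to fd 3, leaving stdout and stderr alone.
awk '{print}' <<< "and this is going to my file" >&3
exec 3>&-  # close the descriptor when done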
Last edited by falconindy (2010-10-17 14:33:20)
Looks like I'm going with a temporary file.
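For reference, a minimal sketch of the temporary-file route, assuming the file is passed to awk by name so that stdin and stdout stay free. The file name data.txt and the old/new substitution are made up for illustration:

```shell
#!/bin/sh
# Hypothetical names: data.txt is the file being edited, old -> new is the edit.
file=data.txt
printf '%s\n' 'keep me' 'old value' 'keep me too' > "$file"   # sample input

# Create the temporary file next to the original so mv is a plain rename.
tmp=$(mktemp "$file.XXXXXX") || exit 1
trap 'rm -f "$tmp"' EXIT

# awk reads the file by name and writes to the temp file itself,
# so its stdin and stdout are not used for the file contents.
awk -v out="$tmp" '
    /old/ { sub(/old/, "new") }   # change only the lines that need it
    { print > out }               # every line, changed or not, goes to the temp file
' "$file" || exit 1

# Replace the original only after awk has finished successfully.
mv "$tmp" "$file"
```

If the script is killed before the mv, the original is untouched and only the temp file is left behind. One thing to watch: mktemp creates the file with restrictive permissions, so the replaced file may not keep the original's mode unless it is copied over first.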
I'm disappointed to hear that sed -i actually uses a temporary file. Whatever, it's not a very portable option anyway (I think only GNU sed has it).
I'm confused by your example with file descriptors. How is
exec 3>somefile
awk '{print}' somefile >&3
different than the first option I listed?
awk '{print > "somefile"}' somefile
Clearly keeping the whole file in memory is not an option with very large files, but I'm still curious if this practice is common for programs that usually deal with smaller files.
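For what it's worth, the hold-everything-in-memory variant (option 4) could look roughly like this. It is only a sketch: notes.txt and the uppercase-line-2 edit are placeholders, and it assumes the file fits comfortably in memory.

```shell
#!/bin/sh
file=notes.txt   # hypothetical file name
printf '%s\n' 'alpha' 'beta' 'gamma' > "$file"   # sample input

awk -v out="$file" '
    { line[NR] = $0 }                    # hold every line in memory
    NR == 2 { line[NR] = toupper($0) }   # example edit: uppercase line 2
    END {
        # All input has been read by now, so truncating the file is safe.
        # The first print truncates it, later prints append.
        for (i = 1; i <= NR; i++) print line[i] > out
        close(out)
    }
' "$file"
```

The file is only rewritten in END, so an early kill leaves it as it was. A kill during the END loop could still leave it truncated, which is the main thing the temporary file protects against.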
Last edited by fflarex (2010-10-18 02:37:34)
I read your initial post and interpreted your 1st method to be a filthy hack along the lines of:
{ rm file && awk '{ print }' > file; } < file
Which is frightfully disturbing, but will indeed appear to edit the file in place.
I should have clarified my example with exec: somefile is not the file you're editing, but another file. You're making a backup in the process, and if all is well, you overwrite the original file with the newly created one.
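Spelled out, that flow might look like the following sketch. The names config.txt and config.txt.new and the two -> 2 edit are invented for the example:

```shell
#!/bin/bash
orig=config.txt        # hypothetical file being edited
new=config.txt.new     # hypothetical scratch copy
printf '%s\n' 'one' 'two' 'three' > "$orig"   # sample input

exec 3> "$new"         # fd 3 now points at the scratch file
echo "the shell's stdout is still free for messages"
awk '{ sub(/two/, "2"); print }' "$orig" >&3   # awk's output goes to fd 3
exec 3>&-              # close fd 3 before touching the files

mv "$new" "$orig"      # all is well: overwrite the original with the new copy
```

Note that awk's own stdout is what gets pointed at fd 3 here, so this layout only fits if the awk script does its talking somewhere else, for example on stderr.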
Really, I think the proper solution here is ed(1).
Last edited by falconindy (2010-10-18 02:34:45)
Hmm, I'll try it with a temporary file and with something like ed or sam -d and see what I like best.
By the way, I edited the examples in my last post to better reflect what I actually meant; I think you understood just fine though.