What at first seemed like a two-minute job turned into a three-hour Google session and a shattered ego, because I failed to find a solution.
The task looks pretty simple: remove all empty (i.e. untranslated) messages from a po file. A quick grep -v '\#:.*msgstr ""\n\n' file.po should have done the job...if grep didn't work line by line. The same applies to sed and awk. I failed to make perl do the job, and none of my awk or sed hacks worked the way I wanted. One sed line came really close but was too greedy: it matched the longest possible span and wiped out 99% of the file instead of removing each individual occurrence of the regexp.
Do you have any idea how to do it properly? Here's some example file content (the first two entries are empty, the last one is translated):
#: src/libs/ec/cpp/RemoteConnect.cpp:91
msgid "Invalid password, not a MD5 hash!"
msgstr ""

#: src/libs/ec/cpp/RemoteConnect.cpp:136
msgid "Connection failure"
msgstr ""

#: src/libs/ec/cpp/RemoteConnect.cpp:194
msgid "EC Connection Failed. Empty reply."
msgstr "EC connection failed: empty reply."
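The greedy behavior mentioned in the question is simply how sed's regexes work: they take the leftmost match and extend it as far as possible, so .* runs to the last quote it can reach. A minimal sketch contrasting .* with the usual workaround, a negated bracket expression that cannot cross a quote (greedy and bounded are made-up names for illustration):

```bash
#!/bin/bash
# Hypothetical demo functions: greedy .* versus a negated bracket expression.

greedy() {
    # ".*" starts at the first quote and runs to the LAST quote on the line
    printf '%s\n' 'msgid "a" msgstr "b"' | sed 's/".*"/""/'
}

bounded() {
    # "[^"]*" cannot contain a quote, so it stops at the next one
    printf '%s\n' 'msgid "a" msgstr "b"' | sed 's/"[^"]*"/""/'
}

greedy    # msgid ""
bounded   # msgid "" msgstr "b"
```

Presumably the same thing happens on a larger scale once several entries have been joined into one pattern space: a greedy pattern then spans from the first possible start to the last possible end, taking everything in between with it.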
Maybe this script will help: http://www.unix.com.ua/orelly/unix/upt/ch27_11.htm
So lines that are msgstr "" need to be removed? Why did grep -v fail?
Wait, I understand it better now (the two lines before it have to go too). Hold on.
Yeah, grep -v -B 2 'msgstr ""' doesn't do it.
Last edited by Procyon (2008-06-07 18:58:38)
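Why grep -v -B 2 can't work here: -B adds context lines before each selected line, and with -v the selected lines are the ones that do not match. The empty msgstr lines therefore come back as context for the blank line that follows them, and on the sample entries the whole file is printed unchanged. A small check, assuming the entries are separated by blank lines as in a normal .po file (sample_po is a made-up helper):

```bash
#!/bin/bash
# Hypothetical helper: the three sample entries, separated by blank lines
sample_po() {
    printf '%s\n' \
        '#: src/libs/ec/cpp/RemoteConnect.cpp:91' \
        'msgid "Invalid password, not a MD5 hash!"' \
        'msgstr ""' \
        '' \
        '#: src/libs/ec/cpp/RemoteConnect.cpp:136' \
        'msgid "Connection failure"' \
        'msgstr ""' \
        '' \
        '#: src/libs/ec/cpp/RemoteConnect.cpp:194' \
        'msgid "EC Connection Failed. Empty reply."' \
        'msgstr "EC connection failed: empty reply."'
}

# -v selects every line except the two msgstr "" lines, and -B 2 then
# re-adds those two lines as context for the blank line after each one
sample_po | grep -v -B 2 'msgstr ""'
```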
Does this do it?
sed '/^#:/ {:get_msgid;N;s/"$/&/;T get_msgid;:get_msgstr;N;s/"$/&/;T get_msgstr;s/msgstr ""/&/;T nodelete;d;:nodelete}' file.txt
It leaves some extra blank lines, probably because N misbehaves when it hits EOF, but "cat -s" gets rid of them (except the first one).
Another problem is that the entries don't have a fixed length. Some of these strings span multiple lines, so I can't just say "drop the previous two lines when you hit an empty string".
Zepp's link looks very promising; I was close to writing a simple C program to do the job myself. (I really need to improve my Python or learn some Perl...seriously...)
Edit: Procyon: your second post looks very promising, thanks a lot! I just have to add a whitespace remover and my problem is gone! I'll figure out how the command works later.
Edit2: Unfortunately it fails to remove strings similar to the following:
#: src/amule.cpp:971
#, c-format
msgid ""
"Port %u is not available!\n"
"\n"
"This means that you will be LOWID.\n"
"\n"
"Check your network to make sure the port is open for output and input."
msgstr ""
Last edited by wuischke (2008-06-07 19:31:15)
Oh, so that's what multi-line looks like. I thought it would be
msgid "foo
bar"
The extra comment lines (like #, c-format) confuse it too. Maybe it should be paragraph-based.
I gave up on sed and friends a while ago; I mostly use perl. Here's what I've got:
#!/usr/bin/env perl
use strict;
use warnings;
my @lines;
# print the buffered group unless its message is empty, then start a new group
sub flush_group {
    # a group ending in a bare msgstr "" is untranslated; a translated
    # multi-line msgstr "" is followed by more lines, so it is kept
    if (@lines and $lines[-1] ne 'msgstr ""') {
        print join("\n", @lines), "\n\n";
    }
    @lines = ();
}
while (my $line = <STDIN>) {
    chomp $line;
    if ($line eq '') {
        # if we have a blank line, I'm assuming it's the end of a group
        flush_group();
    } else {
        # otherwise, add line to group buffer
        push @lines, $line;
    }
}
# EOF reached, but there might still be stuff to be printed
flush_group();
Ok, how about this one:
sed -ne ':get_paragraph;H;n;s/^$//;T get_paragraph;x;s/msgstr ".\+"/&/p' file.txt
It gets a paragraph and prints if it has something in the msgstr.
It ignores the last paragraph because of EOF, so add a blank line to the end of the file first: echo >> file.txt (two >>'s, not one > like I just did)
Last edited by Procyon (2008-06-07 20:10:55)
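The trailing blank line can also be supplied in the pipeline instead of editing the file, which avoids the > versus >> accident entirely. A minimal sketch wrapping Procyon's sed command in a function; the name keep_translated is hypothetical, and the script relies on GNU sed (T and \+ are GNU extensions):

```bash
#!/bin/bash
# Hypothetical wrapper: feed the file plus one extra blank line to sed,
# so the last paragraph is also terminated and gets checked.
keep_translated() {
    { cat "$1"; echo; } |
        sed -ne ':get_paragraph;H;n;s/^$//;T get_paragraph;x;s/msgstr ".\+"/&/p'
}

# usage: keep_translated file.txt > translated.txt
```

Each printed paragraph starts with the newline that H adds when appending to the (empty) hold space, so the surviving entries stay separated by blank lines.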
Bash script; invoke it as ./script | tac (it reads the file backwards, so its output comes out reversed):
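#!/bin/bash
# Read the file bottom-up: each msgstr line is then seen before the
# msgid and comment lines of its own entry.
skipline=0
tac "file.po" |
while IFS= read -r i; do
    # empty msgstr: skip lines until the next translated msgstr
    if [[ "$i" =~ ^msgstr\ \"\"$ ]]; then
        skipline=1
    fi
    # non-empty msgstr: this entry is translated, print it
    if [[ "$i" =~ ^msgstr\ \".+\" ]]; then
        skipline=0
    fi
    if [ "$skipline" -eq 1 ]; then
        continue
    fi
    printf '%s\n' "$i"
done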
I spent a while on awk but couldn't find a decent way to do it, even with obscure RS and FS fiddling.
edit: maybe I just had an idea for awk
nay, I lost it
awk processing is record-based, not line-based. You only have to assign a proper value to the record separator; with RS="" awk treats each blank-line-separated paragraph as one record:
awk 'BEGIN{RS=""; ORS="\n\n"} /msgstr ""$/ {next} {print}' file.po
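A minimal sketch wrapping the paragraph-mode idea in a reusable function (the name drop_untranslated is hypothetical). It anchors the match with $, which in an awk regex means the end of the whole record rather than the end of a line. That way only entries whose final line is msgstr "" are dropped, while the PO header and translated multi-line messages, which both begin with msgstr "" followed by continuation lines, are kept:

```bash
#!/bin/bash
# Hypothetical helper: print only the translated entries of a .po file.
# RS="" is awk's paragraph mode: records are separated by blank lines.
drop_untranslated() {
    awk 'BEGIN { RS = ""; ORS = "\n\n" }
         /msgstr ""$/ { next }   # record ENDS with an empty msgstr: skip it
         { print }' "$@"
}

# usage: drop_untranslated file.po > file.new.po
```

Plural entries (msgstr[0] "" and so on) are not matched by this pattern and would need an extra rule.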
briest: Argh, I'm feeling very stupid now. Thanks a lot for this information!
I'll go through the other solutions too, because there's a lot I can learn from them. "I know regexp" is not always enough...