
#1 2014-06-13 15:42:55

alphaniner
Member
From: Ancapistan
Registered: 2010-07-12
Posts: 2,810

[Solved] (Python3) Ignoring lines with control characters

Long story short, I log the output of an in-house program with tee. The program displays a progress meter. When I read the file into a list, I end up with lines like:

>>> temp_file[18]
'Write:   1% [>                                                  ] 1 MB\x1b[2K\n'

I can filter these out with:

for line in temp_file:
  if '\x1b' not in line:
    do_something_with(line)

This leaves me with exactly what I want, but maybe someone knows of a better way? Thanks.
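
For anyone who wants to try this without the log file, here is a minimal self-contained sketch of the same filter. The sample lines and the name keep_plain_lines are made up for illustration; only the '\x1b' test comes from the snippet above.

```python
# Hypothetical sample data standing in for the tee'd log file.
sample_lines = [
    'Starting write\n',
    'Write:   1% [>     ] 1 MB\x1b[2K\n',
    'Write: 100% [=====>] 90 MB\x1b[2K\n',
    'Done\n',
]


def keep_plain_lines(lines):
    """Return only the lines that contain no ESC (0x1b) character."""
    return [line for line in lines if '\x1b' not in line]


kept = keep_plain_lines(sample_lines)
print(kept)  # ['Starting write\n', 'Done\n']
```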

Last edited by alphaniner (2014-06-17 16:51:05)


But whether the Constitution really be one thing, or another, this much is certain - that it has either authorized such a government as we have had, or has been powerless to prevent it. In either case, it is unfit to exist.
-Lysander Spooner

Offline

#2 2014-06-13 15:51:53

firecat53
Member
From: Lake Stevens, WA, USA
Registered: 2007-05-14
Posts: 1,542
Website

Re: [Solved] (Python3) Ignoring lines with control characters

Perhaps look at string.printable

Scott

Edit: fixed link

Last edited by firecat53 (2014-06-13 16:28:22)

Offline

#3 2014-06-13 16:18:49

alphaniner
Member
From: Ancapistan
Registered: 2010-07-12
Posts: 2,810

Re: [Solved] (Python3) Ignoring lines with control characters

Something tells me you didn't test that link. ;)



Offline

#4 2014-06-13 16:23:24

firecat53
Member
From: Lake Stevens, WA, USA
Registered: 2007-05-14
Posts: 1,542
Website

Re: [Solved] (Python3) Ignoring lines with control characters

Oops! Posting from phone :-P
https://docs.python.org/3.4/library/string.html
Sorry!

Edit:

From here:

# Python 3: filter() returns an iterator, so join the result back into a str.
filtered_string = ''.join(filter(lambda x: x in string.printable, myStr))

Last edited by firecat53 (2014-06-13 16:31:53)

Offline

#5 2014-06-13 16:45:12

alphaniner
Member
From: Ancapistan
Registered: 2010-07-12
Posts: 2,810

Re: [Solved] (Python3) Ignoring lines with control characters

I figured that's where you were going. But I don't want to filter unwanted stuff out of strings, I want to filter out entire strings if they contain unwanted stuff.

IOW, it should go ding when there's stuff. :D



Offline

#6 2014-06-13 17:17:30

firecat53
Member
From: Lake Stevens, WA, USA
Registered: 2007-05-14
Posts: 1,542
Website

Re: [Solved] (Python3) Ignoring lines with control characters

Here's a one-liner. You might be able to use itertools somehow, but I have to leave now :) I guess you could try regex character classes as well. I'm not sure how this would handle Unicode.

[line for line in temp_file if not any([i for i in line if i not in string.printable])]
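
On the regex idea, a rough sketch of what a character class could look like. The assumption here (not from the thread) is that "unwanted" means ASCII control characters other than tab, newline and carriage return; the names CONTROL_CHARS and has_control_chars are made up. Unlike string.printable, this lets non-ASCII text through.

```python
import re

# Assumption: reject ASCII control characters except \t, \n and \r.
# ESC (0x1b) falls inside the \x0e-\x1f range.
CONTROL_CHARS = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]')


def has_control_chars(line):
    """True if the line contains at least one unwanted control character."""
    return CONTROL_CHARS.search(line) is not None


print(has_control_chars('Write: 1%\x1b[2K\n'))  # True
print(has_control_chars('plain\ttext\n'))       # False
print(has_control_chars('caf\u00e9\n'))         # False
```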

Offline

#7 2014-06-15 13:38:05

Trent
Member
From: Baltimore, MD (US)
Registered: 2009-04-16
Posts: 990

Re: [Solved] (Python3) Ignoring lines with control characters

TBH, I think your original solution is clearer and to the point -- in a word, Pythonic. If it works just the way you want, leave it.

If you want to filter out all nonprinting characters, I might combine the approaches (untested):

for line in temp_file:
    if any(c not in string.printable for c in line):
        continue
    do_something_with(line)

Just be sure to leave a comment explaining why you're doing such a thing.

N.B. firecat's solution walks through the string creating a list of all the nonprinting characters, then passes that list to any(). Since I left off the [], mine stops the first time it sees a nonprinting character. You might even accelerate the process more by doing "for c in reversed(line)" which will tend to find the nonprinting characters faster when they're near the end of the string.

Last edited by Trent (2014-06-15 21:46:41)

Offline

#8 2014-06-15 16:14:34

firecat53
Member
From: Lake Stevens, WA, USA
Registered: 2007-05-14
Posts: 1,542
Website

Re: [Solved] (Python3) Ignoring lines with control characters

@Trent thanks! I knew there was a way to stop iteration when the first non-printable character is found. I apparently didn't stare at it long enough to figure it out :) I only moved to string.printable instead of alphaniner's original solution because it seems more flexible...assuming that at some point there might be other non-printable characters introduced into the input.

Scott

Last edited by firecat53 (2014-06-15 16:14:57)

Offline

#9 2014-06-15 19:53:34

progandy
Member
Registered: 2012-05-17
Posts: 5,184

Re: [Solved] (Python3) Ignoring lines with control characters

I believe writing it this way might be cleaner:

printset = set(string.printable)
for line in temp_file:
    if printset.issuperset(line):
        do_something_with(line)
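
A quick check of how the set version behaves, using made-up sample strings. It relies on string.printable including the whitespace characters (so a trailing newline is fine) and being ASCII-only (so accented characters are rejected).

```python
import string

printset = set(string.printable)

# issuperset() accepts any iterable, so a str is treated as its characters.
plain_ok = printset.issuperset('plain line\n')
escape_ok = printset.issuperset('Write: 1%\x1b[2K\n')
accent_ok = printset.issuperset('caf\u00e9\n')

print(plain_ok)   # True
print(escape_ok)  # False
print(accent_ok)  # False
```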

| alias CUTF='LANG=en_XX.UTF-8@POSIX ' |

Offline

#10 2014-06-15 20:23:03

Trent
Member
From: Baltimore, MD (US)
Registered: 2009-04-16
Posts: 990

Re: [Solved] (Python3) Ignoring lines with control characters

You could do it that way, but I think it begins to fall afoul of "Readability counts".

Speaking of which, Guido's time machine pulls through:

for line in temp_file:
    if line.isprintable():
        do_something_with(line)

Offline

#11 2014-06-15 20:43:34

progandy
Member
Registered: 2012-05-17
Posts: 5,184

Re: [Solved] (Python3) Ignoring lines with control characters

Trent wrote:

You could do it that way, but I think it begins to fall afoul of "Readability counts".

In my book printableset.issuperset is more readable than an any(for c in line), but that might be my mathematical knowledge about sets.

Speaking of which, Guido's time machine pulls through:

for line in temp_file:
    if line.isprintable():
        do_something_with(line)

That is right. python3 finally has this function. I am still partially thinking in python2.7
Edit: Take care, since str.isprintable interprets tabs and newlines as non-printable. Maybe do str.expandtabs(1).isprintable().
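
A small sketch of that caveat. Note that expandtabs(1) only deals with tabs, so a line read from a file still fails because of its trailing newline. The helper name is_printable_line is made up, and stripping the newline first is my assumption about what is wanted.

```python
print('\t'.isprintable())                    # False
print('plain text\n'.isprintable())          # False
print('a\tb\n'.expandtabs(1).isprintable())  # False, the newline is still there


def is_printable_line(line):
    """Ignore the trailing newline and treat tabs as spaces before testing."""
    return line.rstrip('\n').expandtabs(1).isprintable()


print(is_printable_line('a\tb\n'))       # True
print(is_printable_line('x\x1b[2K\n'))   # False
```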

Last edited by progandy (2014-06-15 21:38:19)


| alias CUTF='LANG=en_XX.UTF-8@POSIX ' |

Offline

#12 2014-06-15 21:32:08

firecat53
Member
From: Lake Stevens, WA, USA
Registered: 2007-05-14
Posts: 1,542
Website

Re: [Solved] (Python3) Ignoring lines with control characters

I love Python :) So many ways to skin the proverbial cat! This is a great learning thread...

Scott

Offline

#13 2014-06-15 21:46:56

Trent
Member
From: Baltimore, MD (US)
Registered: 2009-04-16
Posts: 990

Re: [Solved] (Python3) Ignoring lines with control characters

progandy wrote:

In my book printableset.issuperset is more readable than an any(for c in line), but that might be my mathematical knowledge about sets.

To each his own. A string is not a set and I would never ask a question like "Is the set of characters in this string a subset of the set of characters in string.printable?" On the other hand, "Is it true, for any character c in this string, that c is not in string.printable?" seems relatively straightforward. But that's a subjective measure.

Edit: Take care, since str.isprintable interprets tabs and newlines as non-printable. Maybe do str.expandtabs(1).isprintable().

Hrm. Yuck. No, in that case I'll fall back to my original version. Disclaimer removed.

Actually, on second thought, I don't know why I went for any() in the first place, since its complement/counterpart is also provided:

for line in temp_file:
    if all(c in string.printable for c in line):
        do_something_with(line)

Last edited by Trent (2014-06-15 21:51:05)

Offline

#14 2014-06-15 22:56:50

firecat53
Member
From: Lake Stevens, WA, USA
Registered: 2007-05-14
Posts: 1,542
Website

Re: [Solved] (Python3) Ignoring lines with control characters

Trent wrote:

Actually, on second thought, I don't know why I went for any() in the first place, since its complement/counterpart is also provided:

for line in temp_file:
    if all(c in string.printable for c in line):
        do_something_with(line)

Didn't you go for it because the for loop would terminate after the first non-matching character? Possibly speeding it up a little? Or does any have to wait for the entire result from the for loop first?

Scott

Last edited by firecat53 (2014-06-15 22:57:44)

Offline

#15 2014-06-16 02:18:19

Trent
Member
From: Baltimore, MD (US)
Registered: 2009-04-16
Posts: 990

Re: [Solved] (Python3) Ignoring lines with control characters

That's true for all() as well. If the first character in line is nonprinting, all() won't continue to request more from the iterator because it already knows the final result.

>>> i = iter('hello, world')
>>> all(c in 'ehlo' for c in i)
False
>>> ''.join(i)
' world'

You could still do the reversed(line) trick, but on the whole it probably buys you very little.

Edit -- I wasn't very clear earlier. The difference between my solution and your earlier one is the lack of square brackets around the comprehension expression. If you were to write instead of the above

all([c in 'ehlo' for c in i])

then Python would construct a list [True, True, True, True, True, False, False, False, True, False, True, False] and hand the entire thing to all(). Without the brackets, Python creates a generator, which is lazy -- it doesn't calculate whether the next value is True or False until it's needed.
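
To make the difference visible, here is a sketch that counts how many characters actually get tested in each form. The helper in_ehlo and the checked list exist only for this illustration.

```python
checked = []


def in_ehlo(c):
    """Record every character that gets tested, then test it."""
    checked.append(c)
    return c in 'ehlo'


# Generator expression: all() stops at the first False (the comma).
checked.clear()
all(in_ehlo(c) for c in 'hello, world')
lazy_count = len(checked)

# List comprehension: every character is tested before all() even starts.
checked.clear()
all([in_ehlo(c) for c in 'hello, world'])
eager_count = len(checked)

print(lazy_count, eager_count)  # 6 12
```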

Last edited by Trent (2014-06-16 02:27:39)

Offline

#16 2014-06-16 14:50:36

alphaniner
Member
From: Ancapistan
Registered: 2010-07-12
Posts: 2,810

Re: [Solved] (Python3) Ignoring lines with control characters

Thanks for all the suggestions everyone! Regarding reversal, I had thought of that as well but in a different context:

for line in temp_file:
  if line.rfind('\x1b', -5) < 0:
    do_something_with(line)

I guess that's rather unpythonic, but only ~20 of ~1000 will pass the test and I'm an efficiency fiend.

Everything else is new to me so I'll have to do some research and mull things over.



Offline

#17 2014-06-16 23:18:37

Trent
Member
From: Baltimore, MD (US)
Registered: 2009-04-16
Posts: 990

Re: [Solved] (Python3) Ignoring lines with control characters

Slicing is quicker (in this case) and more pythonic:

if '\x1b' in line[-5:]:

But then perhaps you might as well just

if line[-5] == '\x1b':

Personally, I think that's fine, if it works and you're not expecting the in-house program to change its output format. Don't do more work to no advantage. But if you think you might need it to be more flexible, by all means go for one of the other suggestions.

Offline

#18 2014-06-16 23:37:03

progandy
Member
Registered: 2012-05-17
Posts: 5,184

Re: [Solved] (Python3) Ignoring lines with control characters

Trent wrote:

Personally, I think that's fine, if it works and you're not expecting the in-house program to change its output format. Don't do more work to no advantage. But if you think you might need it to be more flexible, by all means go for one of the other suggestions.

You have the right mindset. Just document that stuff, otherwise you won't understand your code if you ever have to change it later on.



Offline

#19 2014-06-17 16:50:05

alphaniner
Member
From: Ancapistan
Registered: 2010-07-12
Posts: 2,810

Re: [Solved] (Python3) Ignoring lines with control characters

Trent wrote:

Slicing is quicker (in this case) and more pythonic:

if '\x1b' in line[-5:]:

But then perhaps you might as well just

if line[-5] == '\x1b':

Personally, I think that's fine, if it works and you're not expecting the in-house program to change its output format. Don't do more work to no advantage. But if you think you might need it to be more flexible, by all means go for one of the other suggestions.

Doh, I didn't even consider slicing. Definitely clearer and more pythonic, though I'll have to take your word about quicker. If I use the index it will be necessary to check the length:

if len(line) < 5 or line[-5] != '\x1b':

For now I'll just stick with the slice to be safe.

Thanks again to everyone.



Offline

#20 2014-06-18 00:32:30

Trent
Member
From: Baltimore, MD (US)
Registered: 2009-04-16
Posts: 990

Re: [Solved] (Python3) Ignoring lines with control characters

alphaniner wrote:

Doh, I didn't even consider slicing. Definitely clearer and more pythonic, though I'll have to take your word about quicker.

I know this thread is approaching EOL, but you don't have to take my word for it.

% python -m timeit -s "s='hello, world'" "'\x1b' in s[-5:]"
10000000 loops, best of 3: 0.194 usec per loop
% python -m timeit -s "s='hello, world'" "s.rfind('\x1b', -5)"
1000000 loops, best of 3: 0.304 usec per loop

(using timeit)
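
The same comparison can also be run from inside Python with the timeit module, roughly like this. The iteration count of 100000 is an arbitrary choice for the sketch, and the numbers will differ from machine to machine, so no output is shown.

```python
import timeit

setup = "s = 'hello, world'"
statements = {
    'slice': "'\\x1b' in s[-5:]",
    'rfind': "s.rfind('\\x1b', -5)",
}

# timeit.timeit() returns the total seconds for `number` executions.
results = {
    name: timeit.timeit(stmt, setup=setup, number=100000)
    for name, stmt in statements.items()
}

for name, seconds in results.items():
    print(name, seconds)
```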

If I use the index it will be necessary to check the length

Oh, true. Hadn't thought of that.

Offline

#21 2014-06-18 01:01:02

progandy
Member
Registered: 2012-05-17
Posts: 5,184

Re: [Solved] (Python3) Ignoring lines with control characters

Trent wrote:

If I use the index it will be necessary to check the length

Oh, true. Hadn't thought of that.

Then slice a single character:

$ python -m timeit -s "s='hello, world'" "'\x1b' in s[-5:]"
1000000 loops, best of 3: 0.601 usec per loop
$ python -m timeit -s "s='hello, world'" "s.rfind('\x1b', -5)"
1000000 loops, best of 3: 0.817 usec per loop
$ python -m timeit -s "s='hello, world'" "s[-5:-4] == '\x1b'"
1000000 loops, best of 3: 0.546 usec per loop
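
A short sketch of why the one-character slice needs no length check: slicing past the start of a short string yields an empty string, whereas indexing raises IndexError. The sample strings and the name ends_with_erase are made up for illustration.

```python
def ends_with_erase(line):
    """True if the fifth character from the end is ESC; safe for short lines."""
    return line[-5:-4] == '\x1b'


progress = 'Write:   1% [>] 1 MB\x1b[2K\n'
short = 'ok\n'

print(ends_with_erase(progress))  # True
print(ends_with_erase(short))     # False

try:
    short[-5]
except IndexError:
    print('indexing a short line raises IndexError')
```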


Offline
