#1 2011-07-02 19:15:04

Gullible Jones
Member
Registered: 2004-12-29
Posts: 4,863

Utility for de-wrapping text files?

I have a bunch of text documents that I converted from doc format using wv. They all converted fine, but they're now hard-wrapped at 80 columns, making them difficult to edit in a text editor (especially if I want to use "virtual" word wrapping).

Is there any utility that will let me remove the wrapping, without removing *all* line break characters - i.e. still keeping the spacing between paragraphs?

Edit: duh, it just struck me that I could do this with sed. Unfortunately I am not a regex wizard. How could I tell sed to remove all line breaks, *except* for line breaks followed by or following another line break? E.g.

foo.\nBar baz --> line break is removed
foo.\n\nBar baz --> both line breaks are kept

Last edited by Gullible Jones (2011-07-02 19:19:07)

#2 2011-07-02 19:24:17

falconindy
Developer
From: New York, USA
Registered: 2009-10-22
Posts: 4,111

Re: Utility for de-wrapping text files?

A naive usage of awk might do the trick...

awk 'NF > 0 { printf "%s",$0; next } { printf "\n\n" }' file > newfile
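
Note that the one-liner above joins wrapped lines with nothing between them, so "foo.\nBar baz" comes out as "foo.Bar baz", and the last paragraph gets no trailing newline. A minimal sketch of a variant that puts a single space between joined lines might look like the following. The function name dewrap is made up for illustration, and it assumes paragraphs are separated by empty or whitespace-only lines. Unlike the one-liner, it also collapses a run of several blank lines into one.

```shell
# dewrap: hypothetical helper name, not an existing utility.
# Reads hard-wrapped text on stdin and writes one line per paragraph,
# with paragraphs separated by a single blank line.
dewrap() {
    awk '
        # Blank (or whitespace-only) line: flush the paragraph collected so far.
        NF == 0 {
            if (buf != "") { print buf; print ""; buf = "" }
            next
        }
        # Non-blank line: append to the current paragraph, separated by one space.
        { buf = (buf == "" ? $0 : buf " " $0) }
        # End of input: flush the last paragraph, if any.
        END { if (buf != "") print buf }
    '
}

printf 'foo.\nBar baz\n\nNext para\n' | dewrap
# prints:
# foo. Bar baz
#
# Next para
```

With the function defined, something like dewrap < file > newfile would apply it to a whole document.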

#3 2011-07-02 19:54:00

Gullible Jones
Member
Registered: 2004-12-29
Posts: 4,863

Re: Utility for de-wrapping text files?

Hmm, that mostly works... Thanks. Also, how could I reduce the amount of whitespace? e.g. some words are separated by two or more spaces, how can I reduce any chain of spaces to only one space?

#4 2011-07-02 20:01:32

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: Utility for de-wrapping text files?

Gullible Jones wrote:

Hmm, that mostly works... Thanks. Also, how could I reduce the amount of whitespace? e.g. some words are separated by two or more spaces, how can I reduce any chain of spaces to only one space?

Have you tried

tr -s ' '

?
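
For what it's worth, tr only reads standard input, so the file has to be redirected or piped in. A rough sketch of how that could look is below. The function name squeeze_spaces is made up for illustration. It only squeezes runs of the space character, so tabs are left alone.

```shell
# squeeze_spaces: hypothetical wrapper name around tr -s.
# Collapses every run of consecutive spaces on stdin into a single space.
squeeze_spaces() {
    tr -s ' '
}

printf 'foo   bar  baz\n' | squeeze_spaces
# prints: foo bar baz
```

On a real file it would be something like tr -s ' ' < newfile > squeezed, where both file names are placeholders. Don't redirect the output back to the input file, since the shell truncates it before tr gets to read it.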

#5 2011-07-02 20:23:01

Gullible Jones
Member
Registered: 2004-12-29
Posts: 4,863

Re: Utility for de-wrapping text files?

Ah, thanks! I didn't even know that command existed.
