Separate words in a string without spaces

PeteMo · 2008-06-11 18:11:26

I'm doing some shell scripting and have run into the following problem. Say you have a string like "OneTwoThree", i.e. a string consisting of several words jammed together, with each word starting with a capital letter. You want to insert a space between each word, making the above example "One Two Three". Using Perl's split or awk -F[A-Z] doesn't work, because both remove the separator. I've managed it with the script below, which is very similar to how I would do this in C: look at each character and, if it is a capital letter, insert a space. My method works; I'm just wondering about other ways to achieve this.

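#!/bin/bash

# Build $string from $1, inserting a space before each capital letter
# (except at the very start of the word).
function sep_words() {
    string=
    for ((i=0; i < ${#1}; i++)); do
        char=${1:i:1}
        if [[ $char =~ [A-Z] && -n $string ]]; then
            string="${string} "
        fi
        string="${string}${char}"
    done
}

# Quote "$@" and the arguments so they are passed through unchanged.
for arg in "$@"; do
    sep_words "$arg"
    echo "$string"
done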

As an example, the above script gives the following output:

[pmorris@barium ~] $ ./split.sh OneTwoThree HereIsAString
One Two Three
Here Is A String
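
For comparison, awk does not have to be used with -F here. A minimal sketch, assuming a POSIX awk: gsub keeps the matched capital via "&" in the replacement, so nothing is thrown away. The function name sep_words_awk is made up for illustration, and LC_ALL=C is set so that [A-Z] means only the ASCII capitals.

```shell
#!/bin/sh
# Hypothetical helper: put a space before every ASCII capital letter,
# then drop the space that ends up at the start of the line.
sep_words_awk() {
    printf '%s\n' "$1" |
        LC_ALL=C awk '{ gsub(/[A-Z]/, " &"); sub(/^ /, ""); print }'
}

sep_words_awk OneTwoThree
sep_words_awk HereIsAString
```

This should print the same two lines as the script above.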

kishd · 2008-06-11 18:39:03

I have been learning regular expressions recently and

$ echo "OneTwoThree HereIsAString" | sed -e 's/\([A-Z][a-z]*\)/ &/g'

produces

$ One Two Three  Here Is A String

Last edited by kishd (2008-06-11 18:44:58)

PeteMo · 2008-06-11 19:41:35

kishd wrote:

I have been learning regular expressions recently and
$ echo "OneTwoThree HereIsAString" | sed -e 's/\([A-Z][a-z]*\)/ &/g'
produces
$ One Two Three  Here Is A String

Ah, that is much simpler. I struggled with a regex first and couldn't come up with a good one. The only difference I see is that a space is inserted before the first "word": "OneTwoThree" becomes " One Two Three". Not a big deal, though. On a side note, you don't need the parentheses to capture the match when using '&'.
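
If the leading space does matter, one possible tweak is a second sed expression that strips it again. This is only a sketch: sep_words_sed is a made-up name, and it uses [[:upper:]] instead of [A-Z] and drops the parentheses since '&' already holds the match.

```shell
#!/bin/sh
# Hypothetical helper: insert a space before each uppercase letter,
# then remove the single space left at the start of the line.
sep_words_sed() {
    printf '%s\n' "$1" | sed -e 's/[[:upper:]]/ &/g' -e 's/^ //'
}

sep_words_sed OneTwoThree
sep_words_sed HereIsAString
```

With one word per call this gives "One Two Three" and "Here Is A String". Input that already contains spaces between words would still end up with doubled spaces there.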

shining · 2008-06-11 21:45:03

Hm, weird stuff, I came up with a similar sed rule, and it is not working :

> echo OneTwoThree | sed 's/\([A-Z]\)/ \1/g'
 O n e T w o T h r e e

Works fine with C locale though :

> echo OneTwoThree | LANG=C sed 's/\([A-Z]\)/ \1/g'
 One Two Three

So fun locale stuff again (I am using fr_FR.utf8).

carlocci · 2008-06-12 12:59:44

shining wrote:

Hm, weird stuff, I came up with a similar sed rule, and it is not working :
> echo OneTwoThree | sed 's/\([A-Z]\)/ \1/g'
 O n e T w o T h r e e
Works fine with C locale though :
> echo OneTwoThree | LANG=C sed 's/\([A-Z]\)/ \1/g'
 One Two Three
So fun locale stuff again (I am using fr_FR.utf8).

Another reason to use posix regexp:

echo OneTwoThree | sed 's/\([[:upper:]]\)/ \1/g'

It's strange anyway, as upper- and lowercase letters sit in separate ranges in UTF-8, but I don't really know how these things work.
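
As far as I understand it, what a range like [A-Z] covers depends on the locale's collation order rather than on the byte values, and some locales interleave upper- and lowercase letters, so the range can end up matching lowercase letters too. A minimal sketch of the two workarounds seen in this thread, side by side (with_range and with_class are made-up names): either force the C locale for the range, or use the character class, which should not need a locale override.

```shell
#!/bin/sh
# Hypothetical helper: [A-Z] with the C locale forced for sed only.
with_range() {
    printf '%s\n' "$1" | LC_ALL=C sed 's/[A-Z]/ &/g'
}

# Hypothetical helper: POSIX character class, locale left untouched.
with_class() {
    printf '%s\n' "$1" | sed 's/[[:upper:]]/ &/g'
}

with_range OneTwoThree
with_class OneTwoThree
```

Both calls should print " One Two Three" (with the leading space), whatever LANG is set to in the calling shell.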

rson451 · 2008-06-12 15:32:49

OP: completely off-topic, but I'm a Morris too!

Arch Linux
