
#1 2011-11-11 18:39:12

Cotton
Member
From: Cornwall, UK
Registered: 2004-09-17
Posts: 568

[Solved] Regex help - replacement string not identified correctly

Hi,

Using regular expressions, I want to extract a list of filenames from the following log file (i.e. remove the path names):

08/11/2011 09:52:21  C:\Program Files\Import Server\TempFiles\630222-0300_1_XLS_SSS_PPC BOM.xls successfully imported
08/11/2011 09:52:21  C:\Program Files\Import Server\TempFiles\630222-0301_1_XLS_SSS_PPC BOM.xls successfully imported
08/11/2011 09:52:21  C:\Program Files\Import Server\TempFiles\630222-0302_1_XLS_SSS_PPC BOM.xls successfully imported

The regex that matches the filenames is:  [^\\]+\.\w+

My first problem is that I don't quite understand why the expression isn't actually:  [^\\].+\.\w+
and why this second expression selects everything back to the beginning of each line.

The second problem is that no matter which regex-enabled editor I use (medit, PSPad or Notepad++), I can't separate the match from the rest of the line.

With the search criteria set to:  .+([^\\]+\.\w+).+
and the replace criteria set to:  $1   (or \1, depending on the editor)

The result is:
M.xls
M.xls
M.xls

What am I missing?

Last edited by Cotton (2011-11-13 12:06:53)


#2 2011-11-11 21:21:50

rockin turtle
Member
From: Montana, USA
Registered: 2009-10-22
Posts: 227

Re: [Solved] Regex help - replacement string not identified correctly

The regex [^\\]+\.\w+ matches the longest run of characters that contains no '\', followed by a '.' and then one or more word characters (letters, digits or underscore). Thus for the first line you gave, it would match

630222-0300_1_XLS_SSS_PPC BOM.xls

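The regex [^\\].+\.\w+ matches the longest string whose first character is not a '\', that contains at least one '.', and that has one or more word characters after the last '.'. The [^\\] only restricts that first character; the '.+' after it is free to match backslashes. Since the first character of the line is not a '\', the match starts at the beginning of the line. For your first line, it would match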

08/11/2011 09:52:21  C:\Program Files\Import Server\TempFiles\630222-0300_1_XLS_SSS_PPC BOM.xls
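The two matches above can be reproduced with a short sketch in Python's `re` module. Python is an assumption here, standing in for the regex engines of the editors mentioned, which may differ in details; the variable names are only illustrative.

```python
import re

# First log line from the original post (note the two spaces after the time).
line = (r"08/11/2011 09:52:21  C:\Program Files\Import Server\TempFiles"
        r"\630222-0300_1_XLS_SSS_PPC BOM.xls successfully imported")

# [^\\]+ cannot cross a backslash, so a match that reaches the '.'
# has to start after the last backslash in the line.
short = re.search(r"[^\\]+\.\w+", line).group(0)

# [^\\] only constrains the first character; the greedy .+ that follows
# crosses backslashes freely, so the match starts at the first character.
long_match = re.search(r"[^\\].+\.\w+", line).group(0)

print(short)
print(long_match)
```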

The problem with your .+([^\\]+\.\w+) regex is that quantifiers such as '+' are greedy: each one takes as much as it can while still allowing the rest of the pattern to match. On its own, your ([^\\]+\.\w+) group can match any of the following strings:

'M.xls', 'OM.xls', 'BOM.xls', ' BOM.xls', 'C BOM.xls', etc

but because it is preceded by the '.+', the '.+' part is tried first and takes as much of the line as it can, leaving the group only the shortest of the above strings. Thus it will match:

08/11/2011 09:52:21  C:\Program Files\Import Server\TempFiles\630222-0300_1_XLS_SSS_PPC BO

and the '([^\\]+\.\w+)' will then match

M.xls
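To see the split described above, here is a minimal sketch in Python's `re` module (an assumption; the editors' engines may behave slightly differently). The extra parentheses around the leading `.+` are added only so its share of the match can be inspected; they are not part of the original pattern.

```python
import re

line = (r"08/11/2011 09:52:21  C:\Program Files\Import Server\TempFiles"
        r"\630222-0300_1_XLS_SSS_PPC BOM.xls successfully imported")

# Group 1 is the greedy .+ (captured here for inspection only),
# group 2 is the part that was meant to capture the filename.
m = re.match(r"(.+)([^\\]+\.\w+)", line)
print(repr(m.group(1)))  # everything the greedy .+ consumed
print(repr(m.group(2)))  # what is left over for the filename group

# The full search/replace from the first post, with \1 as the replacement.
result = re.sub(r".+([^\\]+\.\w+).+", r"\1", line)
print(result)
```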

I'm not familiar with the editors you mention, but I believe the regex you need is:

.+\\(.+\.\w+)

The '.+\\' will match the longest string that ends in a '\', thus

08/11/2011 09:52:21  C:\Program Files\Import Server\TempFiles\

and the '(.+\.\w+)' will match everything up to the last '.', plus the word characters that follow it, thus

630222-0300_1_XLS_SSS_PPC BOM.xls
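A quick way to check this split, sketched with Python's `re` module (an assumption, since the editors in question are untested here):

```python
import re

line = (r"08/11/2011 09:52:21  C:\Program Files\Import Server\TempFiles"
        r"\630222-0300_1_XLS_SSS_PPC BOM.xls successfully imported")

m = re.search(r".+\\(.+\.\w+)", line)

# Everything before the capture group is what '.+\\' consumed.
prefix = line[:m.start(1)]
filename = m.group(1)

print(repr(prefix))
print(repr(filename))
```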

Last edited by rockin turtle (2011-11-11 21:48:48)


#3 2011-11-12 22:53:05

Cotton
Member
From: Cornwall, UK
Registered: 2004-09-17
Posts: 568

Re: [Solved] Regex help - replacement string not identified correctly

Many thanks for your detailed response.

I've modified the expression you provided to take account of the text at the end of each line, so it now looks like: 

.+\\(.+\.\w+).+

and it works perfectly.
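For reference, the same search and replace can be sketched over all three log lines with Python's `re.sub`. Python is an assumption here, not one of the editors above; it writes the back-reference as `\1`, where some editors use `$1`.

```python
import re

# The three sample log lines from the first post.
log = "\n".join([
    r"08/11/2011 09:52:21  C:\Program Files\Import Server\TempFiles\630222-0300_1_XLS_SSS_PPC BOM.xls successfully imported",
    r"08/11/2011 09:52:21  C:\Program Files\Import Server\TempFiles\630222-0301_1_XLS_SSS_PPC BOM.xls successfully imported",
    r"08/11/2011 09:52:21  C:\Program Files\Import Server\TempFiles\630222-0302_1_XLS_SSS_PPC BOM.xls successfully imported",
])

# '.' does not match a newline by default, so each line is matched
# and replaced separately; the trailing .+ swallows the status text.
names = re.sub(r".+\\(.+\.\w+).+", r"\1", log)
print(names)
```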

I now understand the greediness of the [^\\].+\.\w+ expression, and that the [^\\] criterion only applies to the first character and is overridden by the .+ that follows it.

I'd assumed that the [^\\] would cause the match to stop at the \ before the filename.

However, making the first part of my original expression lazy now does what it's supposed to:

.+?([^\\]+\.\w+).+
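A small comparison of the two patterns, sketched in Python's `re` module (an assumption; editor engines may differ). The second input line, with a dot in a directory name, is hypothetical and not from the log; it is included only to show where the two patterns would diverge.

```python
import re

greedy = r".+\\(.+\.\w+).+"     # greedy prefix up to the last backslash
lazy = r".+?([^\\]+\.\w+).+"    # lazy prefix, filename may not contain '\'

line = (r"08/11/2011 09:52:21  C:\Program Files\Import Server\TempFiles"
        r"\630222-0300_1_XLS_SSS_PPC BOM.xls successfully imported")

# Hypothetical line: a directory name that itself contains a '.'.
dotted = r"08/11/2011 09:52:21  C:\My.Dir\file.xls successfully imported"

lazy_result = re.sub(lazy, r"\1", line)
greedy_result = re.sub(greedy, r"\1", line)
lazy_dotted = re.sub(lazy, r"\1", dotted)
greedy_dotted = re.sub(greedy, r"\1", dotted)

print(lazy_result, greedy_result, sep="\n")
print(lazy_dotted, greedy_dotted, sep="\n")
```

On the sample log both patterns give the same filename. On the hypothetical dotted path, the lazy prefix stops at the first backslash-free run that contains a '.', so it picks up the directory name rather than the file.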

So now there are two solutions, where before there were none ;)

But I still think your solution is more intuitive.  Thanks again.

BTW If anyone's interested, I'm using the online regex tester at: http://myregexp.com/signedJar.html
