Hi,
Using regular expressions, I want to be able to extract a list of filenames from the following log file (i.e. remove the path names and the rest of each line):
08/11/2011 09:52:21 C:\Program Files\Import Server\TempFiles\630222-0300_1_XLS_SSS_PPC BOM.xls successfully imported
08/11/2011 09:52:21 C:\Program Files\Import Server\TempFiles\630222-0301_1_XLS_SSS_PPC BOM.xls successfully imported
08/11/2011 09:52:21 C:\Program Files\Import Server\TempFiles\630222-0302_1_XLS_SSS_PPC BOM.xls successfully imported
The regex that matches the filenames is: [^\\]+\.\w+
My first problem is that I don't quite understand why the expression isn't actually: [^\\].+\.\w+
and why this second expression selects everything to the beginning of each line.
The second problem is that no matter which regex-enabled editor I use (medit, pspad or Notepad++), I can't separate out the match from the rest of the line.
With the search criteria set to: .+([^\\]+\.\w+).+
and the replace criteria set to: $1 (or \1, depending on the editor)
The result is:
M.xls
M.xls
M.xls
What am I missing?
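For anyone who wants to reproduce this outside an editor, here is a minimal sketch using Python's re module. That is an assumption on my part: the editors above use their own regex engines, although for a pattern this simple they should behave the same way. The names LOG_LINES, PATTERN and replace_with_group are made up for the illustration.

```python
import re

# The three sample lines from the log file (raw strings keep the backslashes literal).
LOG_LINES = [
    r"08/11/2011 09:52:21 C:\Program Files\Import Server\TempFiles\630222-0300_1_XLS_SSS_PPC BOM.xls successfully imported",
    r"08/11/2011 09:52:21 C:\Program Files\Import Server\TempFiles\630222-0301_1_XLS_SSS_PPC BOM.xls successfully imported",
    r"08/11/2011 09:52:21 C:\Program Files\Import Server\TempFiles\630222-0302_1_XLS_SSS_PPC BOM.xls successfully imported",
]

# The search pattern from the question; group 1 is what the replacement keeps.
PATTERN = re.compile(r".+([^\\]+\.\w+).+")


def replace_with_group(line):
    """Replace the whole match with the contents of capture group 1."""
    return PATTERN.sub(r"\1", line)


results = [replace_with_group(line) for line in LOG_LINES]
for result in results:
    print(result)
```

Each line comes back as just M.xls, the same as in the editors, so the behaviour comes from the pattern rather than from any one tool.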
Last edited by Cotton (2011-11-13 12:06:53)
The regex [^\\]+\.\w+ matches the longest run of characters that contains no '\', followed by a '.', followed by one or more word characters (letters, digits or underscores). Thus for the first line you gave, it would match:
630222-0300_1_XLS_SSS_PPC BOM.xls
The regex [^\\].+\.\w+ matches a string that starts with any single character other than '\', followed by the longest possible run of any characters, then a '.' and one or more word characters. Only the first character is barred from being a '\'; the '.+' will happily match backslashes. For your first line, it would match:
08/11/2011 09:52:21 C:\Program Files\Import Server\TempFiles\630222-0300_1_XLS_SSS_PPC BOM.xls
The problem with your .+([^\\]+\.\w+) regex is that a greedy quantifier matches the LONGEST string it can while still letting the rest of the pattern match. On its own, your ([^\\]+\.\w+) regex could match any of the following strings:
'M.xls', 'OM.xls', 'BOM.xls', ' BOM.xls', 'C BOM.xls', etc.
but because it is preceded by the '.+', the '.+' part will match the longest string that precedes any of the above strings. Thus it will match:
08/11/2011 09:52:21 C:\Program Files\Import Server\TempFiles\630222-0300_1_XLS_SSS_PPC BO
and the '([^\\]+\.\w+)' will then match:
M.xls
I'm not familiar with the editors you mention, but I believe the regex you need is:
.+\\(.+\.\w+)
The '.+\\' will match the longest string that ends in a '\', thus:
08/11/2011 09:52:21 C:\Program Files\Import Server\TempFiles\
and the '(.+\.\w+)' will match everything up to the last '.' plus the word characters that follow it, thus:
630222-0300_1_XLS_SSS_PPC BOM.xls
Last edited by rockin turtle (2011-11-11 21:48:48)
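The matches described above can be checked with a short sketch. It assumes Python's re module, which is not one of the editors mentioned in this thread, and the names LINE and first_match are hypothetical helpers for the illustration.

```python
import re

# First sample line from the log (raw string keeps the backslashes literal).
LINE = r"08/11/2011 09:52:21 C:\Program Files\Import Server\TempFiles\630222-0300_1_XLS_SSS_PPC BOM.xls successfully imported"


def first_match(pattern, text):
    """Return the text of the first match of pattern in text, or None."""
    match = re.search(pattern, text)
    return match.group(0) if match else None


# No backslash anywhere in the match: only the filename qualifies.
short = first_match(r"[^\\]+\.\w+", LINE)

# Only the FIRST character must not be a backslash; '.+' then runs greedily.
long_ = first_match(r"[^\\].+\.\w+", LINE)

# '.+\\' swallows everything up to the last backslash; group 1 is the filename.
captured = re.search(r".+\\(.+\.\w+)", LINE).group(1)

print(short)
print(long_)
print(captured)
```

The first and third patterns give the bare filename, while the second gives everything from the start of the line through .xls.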
Many thanks for your detailed response.
I've modified the expression you provided to take account of the text at the end of each line, so it now looks like:
.+\\(.+\.\w+).+
and it works perfectly.
I now understand the greediness of the [^\\].+\.\w+ expression and that the [^\\] criterion only applies to a single character, so it is overruled by the .+ that follows it.
I'd assumed that the [^\\] should cause the match to stop at the \ before the filename.
However, making the first part of my original expression lazy now does what it's supposed to:
.+?([^\\]+\.\w+).+
So now there are two solutions, where before there were none.
But I still think your solution is more intuitive. Thanks again.
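For reference, here is a minimal sketch of both solutions applied as a search-and-replace over the three sample lines. It assumes Python's re module rather than any of the editors, and the names LOG_LINES, GREEDY_TO_LAST_BACKSLASH, LAZY_PREFIX and extract_filenames are invented for the example.

```python
import re

# The three sample lines from the log file (raw strings keep the backslashes literal).
LOG_LINES = [
    r"08/11/2011 09:52:21 C:\Program Files\Import Server\TempFiles\630222-0300_1_XLS_SSS_PPC BOM.xls successfully imported",
    r"08/11/2011 09:52:21 C:\Program Files\Import Server\TempFiles\630222-0301_1_XLS_SSS_PPC BOM.xls successfully imported",
    r"08/11/2011 09:52:21 C:\Program Files\Import Server\TempFiles\630222-0302_1_XLS_SSS_PPC BOM.xls successfully imported",
]

# Solution 1: the greedy prefix is forced to end at the last backslash.
GREEDY_TO_LAST_BACKSLASH = re.compile(r".+\\(.+\.\w+).+")

# Solution 2: the lazy prefix takes as little as possible, so group 1 gets
# the longest backslash-free run that ends in '.' plus word characters.
LAZY_PREFIX = re.compile(r".+?([^\\]+\.\w+).+")


def extract_filenames(pattern, lines):
    """Replace each whole line with capture group 1 of pattern."""
    return [pattern.sub(r"\1", line) for line in lines]


greedy_result = extract_filenames(GREEDY_TO_LAST_BACKSLASH, LOG_LINES)
lazy_result = extract_filenames(LAZY_PREFIX, LOG_LINES)

for name in greedy_result:
    print(name)
```

Both patterns produce the same list of filenames on this input. The lazy version relies on there being no '.' earlier in the line (the date and time here use '/' and ':'), so it would need rechecking against logs with a different layout.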
BTW If anyone's interested, I'm using the online regex tester at: http://myregexp.com/signedJar.html