29/06/2009
Greedy and Non-Greedy Matching using Perl Regular Expression
In this article I will show the difference between the default greedy quantifiers (.+ and .*) and their non-greedy counterparts (.+? and .*?) in Perl regular expressions.
For example, I have the following string on a single line (single-quoted, so the double quotes inside it need no escaping):
$line = 'Seq[03] Command : CREATE("A") Seq[04] Command : CREATE("B") Seq[04] error: 5006 Seq[03] error: 5006 Seq[05] Command : DELETE("A") Seq[05] error: 5006 ';
I thought I could capture all the Command entries, without the error codes, using the following regular expression:
m/Seq\[[0-9]{2}\].+Command.+\(.+\)/gi
and the following Perl code to print every matching string:
while ( $line =~ m/Seq\[[0-9]{2}\].+Command.+\(.+\)/gi ) {
    print $& . "\n";
}
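But it doesn’t work as I expected because of the “greediness” of .+ and .*: each one grabs as many characters as it can and gives them back only when the rest of the pattern would otherwise fail. The above code prints a single match that runs from the first Seq[03] to the last closing parenthesis:
Seq[03] Command : CREATE("A") Seq[04] Command : CREATE("B") Seq[04] error: 5006 Seq[03] error: 5006 Seq[05] Command : DELETE("A")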
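One way to confirm that this is a single match rather than several is to look at where the match starts and ends. The following is a minimal sketch: greedy_span is a hypothetical helper introduced here for illustration, and it relies on Perl's built-in @- and @+ arrays, which hold the offsets of the last successful match.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (start, end, text) for the first match of the
# greedy pattern in $line, or an empty list if nothing matches.
sub greedy_span {
    my ($line) = @_;
    if ( $line =~ m/Seq\[[0-9]{2}\].+Command.+\(.+\)/i ) {
        # $-[0] and $+[0] are the start and end offsets of the whole match.
        my ( $start, $end ) = ( $-[0], $+[0] );
        return ( $start, $end, substr( $line, $start, $end - $start ) );
    }
    return;
}

my $line = 'Seq[03] Command : CREATE("A") Seq[04] Command : CREATE("B") '
         . 'Seq[04] error: 5006 Seq[03] error: 5006 Seq[05] Command : DELETE("A") '
         . 'Seq[05] error: 5006 ';

my ( $start, $end, $text ) = greedy_span($line);
print "match runs from offset $start to offset $end\n";
print "$text\n";
```

The match starts at offset 0 and ends right after DELETE("A"), so the /g loop has nothing left to find except the trailing error entry, which does not match.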
The output I expected is:
Seq[03] Command : CREATE("A")
Seq[04] Command : CREATE("B")
Seq[05] Command : DELETE("A")
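To get the output I expected, I need to turn the greedy quantifiers into non-greedy ones. How? Add a question mark ? after .+ (or .*), so that it matches as few characters as possible. The regular expression becomes:
m/Seq\[[0-9]{2}\].+?Command.+?\(.+?\)/gi
There is one catch. A non-greedy quantifier still lets the match begin at the leftmost possible position, so after the second command the first .+? would start at Seq[04] error: 5006 and stretch across both error entries until it reaches the next Command. Only whitespace separates the sequence number from Command, so I replace the first .+? with \s+:
m/Seq\[[0-9]{2}\]\s+Command.+?\(.+?\)/gi
The full Perl code is:
#!/usr/bin/perl
use strict;
use warnings;

my $line = 'Seq[03] Command : CREATE("A") Seq[04] Command : CREATE("B") Seq[04] error: 5006 Seq[03] error: 5006 Seq[05] Command : DELETE("A") Seq[05] error: 5006 ';
while ( $line =~ m/Seq\[[0-9]{2}\]\s+Command.+?\(.+?\)/gi ) {
    print $& . "\n";
}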
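The effect of the question mark is easiest to see on a smaller string. The sketch below is only an illustration and not part of the script above: all_matches is a hypothetical helper name, and the test string is made up from the command names used in this article. It runs the same parenthesis pattern once with .+ and once with .+? and collects the matches in list context.

```perl
use strict;
use warnings;

# Hypothetical helper: collect every match of $pattern in $text.
# In list context, m//g without capture groups returns all whole matches.
sub all_matches {
    my ( $pattern, $text ) = @_;
    my @found = $text =~ m/$pattern/g;
    return @found;
}

my $text = 'CREATE("A") CREATE("B") DELETE("A")';

my @greedy = all_matches( qr/\(.+\)/,  $text );    # as many characters as possible
my @lazy   = all_matches( qr/\(.+?\)/, $text );    # as few characters as possible

print scalar(@greedy), " greedy match(es):\n";
print "  $_\n" for @greedy;
print scalar(@lazy), " non-greedy match(es):\n";
print "  $_\n" for @lazy;
```

The greedy pattern returns one match that runs from the first opening parenthesis to the last closing one, while the non-greedy pattern returns the three argument lists separately.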




