Regular Expression Syntax Explained

Source: http://www.onejava.com  Updated: 2008-06-22 07:20

Regular expressions allow more complex search and replace functions to be performed in a single operation.

 

 

Regular Expressions Syntax:

 

Symbol

Function

%

Matches the start of line - Indicates the search string must be at the beginning of a line but does not include any line terminator characters in the resulting string selected.

$

Matches the end of line - Indicates the search string must be at the end of line but does not include any line terminator characters in the resulting string selected.

?

Matches any single character except newline.

*

Matches any number of occurrences of any character except newline.

+

Matches one or more of the preceding character/expression.  At least one occurrence of the character must be found.  Does not match repeated newlines.

++

Matches the preceding character/expression zero or more times.  Does not match repeated newlines.

^b

Matches a page break.

^p

Matches a newline (CR/LF) (paragraph) (DOS Files)

^r

Matches a newline (CR Only) (paragraph) (MAC Files)

^n

Matches a newline (LF Only) (paragraph) (UNIX Files)

^t

Matches a tab character

[ ]

Matches any single character or range in the brackets

^{A^}^{B^}

Matches expression A OR B

^

Overrides the following regular expression character

^(...^)

Brackets or tags an expression to use in the replace command.  A regular expression may have up to 9 tagged expressions, numbered according to their order in the regular expression.
 
The corresponding replacement expression is ^x, for x in the range 1-9.  Example: If ^(h*o^) ^(f*s^) matches "hello folks", ^2 ^1 would replace it with "folks hello".

 

Note - ^ refers to the character '^' NOT Control Key + value.

 

Examples:

m?n matches "man", "men", "min" but not "moon".

 

t*t matches "test", "tonight" and "tea time" (the "tea t" portion) but not "tea

time" (newline between "tea " and "time").

 

Te+st matches "test", "teest", "teeeest" etc. but does not match "tst".

 

[aeiou] matches every lowercase vowel

[,.?] matches a literal ",", "." or "?".

[0-9a-z] matches any digit, or lowercase letter

[~0-9] matches any character except a digit (~ means NOT the following)
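As a rough illustration of how the symbols above relate to a more common regex dialect, the following sketch translates a subset of this syntax into a pattern for Python's `re` module. The function name `to_python_regex` and the mapping table are hypothetical, the page break is assumed to be a form feed character, and the `^{A^}^{B^}` alternation is not handled here. Case-insensitive matching is also an assumption, inferred from the example that pairs `Te+st` with "test".

```python
import re

# Caret sequences with a fixed meaning, mapped to Python regex fragments.
CARET_ESCAPES = {
    "t": r"\t",    # tab
    "p": r"\r\n",  # CR/LF newline
    "r": r"\r",    # CR-only newline
    "n": r"\n",    # LF-only newline
    "b": r"\f",    # page break (assumed here to be a form feed)
    "(": "(",      # ^( opens a tagged expression
    ")": ")",      # ^) closes a tagged expression
}

# Single-character symbols and their Python equivalents.
SIMPLE = {"%": "^", "$": "$", "?": ".", "*": ".*", "+": "+"}


def to_python_regex(pattern):
    """Translate a subset of the caret-style syntax into a Python regex string."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "^" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            # Known caret sequence; otherwise ^ overrides (escapes) the next character.
            out.append(CARET_ESCAPES.get(nxt, re.escape(nxt)))
            i += 2
        elif ch == "+" and pattern[i + 1:i + 2] == "+":
            out.append("*")  # ++ means zero or more of the preceding item
            i += 2
        elif ch == "[":
            end = pattern.find("]", i + 1)
            if end == -1:
                out.append(re.escape(ch))  # unterminated set: treat [ literally
                i += 1
            else:
                body = pattern[i + 1:end]
                # A leading ~ negates the set, like a leading ^ in Python.
                if body.startswith("~"):
                    out.append("[^" + body[1:] + "]")
                else:
                    out.append("[" + body + "]")
                i = end + 1
        else:
            out.append(SIMPLE.get(ch, re.escape(ch)))
            i += 1
    return "".join(out)


# "t*t" selects the "tea t" portion of "tea time" but nothing across a newline.
portion = re.search(to_python_regex("t*t"), "tea time").group()
across_newline = re.search(to_python_regex("t*t"), "tea \ntime")

# Tagged expressions: ^1 and ^2 in the replacement correspond to \1 and \2.
swapped = re.sub(to_python_regex("^(h*o^) ^(f*s^)"), r"\2 \1", "hello folks")
```

Patterns that use `%` or `$` should be compiled with `re.MULTILINE` so that the anchors apply to every line rather than only to the whole text.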

 

You may search for an expression A or B as follows:

 

"^{John^}^{Tom^}"

 

This will search for an occurrence of John or Tom.  There should be nothing between the two expressions.

 

You may combine A or B and C or D in the same search as follows:

 

"^{John^}^{Tom^} ^{Smith^}^{Jones^}"

 

This will search for John or Tom followed by Smith or Jones.
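The A-or-B form above can be pictured as an ordinary alternation group. The sketch below rewrites each run of adjacent `^{...^}` expressions into a Python `(?:A|B)` group. The helper name `translate_alternation` is hypothetical, and each alternative is assumed to be literal text rather than a nested expression.

```python
import re


def translate_alternation(pattern):
    """Rewrite runs of ^{A^}^{B^} as the group (?:A|B); other text is left unchanged."""

    def build_group(match):
        # Collect the text inside every ^{...^} of this run.
        alternatives = re.findall(r"\^\{(.*?)\^\}", match.group(0))
        return "(?:" + "|".join(re.escape(a) for a in alternatives) + ")"

    # A run is one or more ^{...^} expressions with nothing between them.
    return re.sub(r"(?:\^\{.*?\^\})+", build_group, pattern)


names = re.compile(translate_alternation("^{John^}^{Tom^} ^{Smith^}^{Jones^}"))
found = names.search("Ask Tom Jones about it").group()
missing = names.search("John Brown")
```

Because a run ends at the first character that is not part of a `^{...^}` expression, the space in the combined search separates the two groups, which mirrors the rule that nothing may appear between the alternatives themselves.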

 

The table below shows the syntax for the "Unix" style regular expressions.

 

Regular Expressions (Unix Syntax):

 

Symbol

Function

\

Indicates the next character has a special meaning. "n" on its own matches the character "n". "\n" matches a linefeed or newline character.  See examples below (\d, \f, \n etc).

^

Matches/anchors the beginning of line.

$

Matches/anchors the end of line.

*

Matches the preceding character zero or more times.

+

Matches the preceding character one or more times. Does not match repeated newlines.

.

Matches any single character except a newline character. Does not match repeated newlines.

(expression)

Brackets or tags an expression to use in the replace command.  A regular expression may have up to 9 tagged expressions, numbered according to their order in the regular expression.
 
The corresponding replacement expression is \x, for x in the range 1-9.  Example: If (h.*o) (f.*s) matches "hello folks", \2 \1 would replace it with "folks hello".

[xyz]

A character set. Matches any one of the characters between the brackets.

[^xyz]

A negative character set. Matches any one character that is NOT between the brackets.

\d

Matches a digit character. Equivalent to [0-9].

\D

Matches a nondigit character. Equivalent to [^0-9].

\f

Matches a form-feed character.

\n

Matches a linefeed character.

\r

Matches a carriage return character.

\s

Matches any whitespace character, including space, tab and form-feed, but not newline.

\S

Matches any non-whitespace character but not newline.

\t

Matches a tab character.

\v

Matches a vertical tab character.

\w

Matches any word character including underscore.

\W

Matches any nonword character.

\p

Matches CR/LF (same as \r\n) to match a DOS line terminator
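Most of the escape sequences in this table also exist in Python's `re` module, so a quick way to see them in action is the sketch below. Two differences are worth keeping in mind when trying it: Python's `\s` also matches newline characters, unlike the description above, and Python does not define `\p`, so the DOS line terminator is written as `\r\n`. The sample string is made up for illustration.

```python
import re

# Made-up sample: a word, a tab, another word, and a DOS line terminator.
sample = "id_42\tok\r\n"

digits = re.findall(r"\d", sample)            # digit characters
non_word = re.findall(r"\W", sample)          # everything that is not a word character
first_word = re.match(r"\w+", sample).group() # word characters include the underscore
has_dos_eol = re.search(r"\r\n", sample) is not None

# In Python, \s covers the tab and both newline characters.
python_whitespace = re.findall(r"\s", sample)
```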

 

Note - ^ refers to the character '^' NOT Control Key + value.

 

Examples:

m.n matches "man", "men", "min" but not "moon".

 

Te+st matches "test", "teest", "teeeest" etc. BUT NOT "tst".

 

Te*st matches "test", "teest", "teeeest" etc. AND "tst".

 

[aeiou] matches every lowercase vowel

[,.?] matches a literal ",", "." or "?".

[0-9a-z] matches any digit, or lowercase letter

[^0-9] matches any character except a digit (^ means NOT the following)
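The examples above, together with the tagged-expression swap from the table, can be checked with Python's `re` module, whose syntax is close to the "Unix" style described here. This is only a sketch: the helper `matches` is hypothetical, and case-insensitive matching is an assumption made because the examples pair `Te+st` with "test".

```python
import re

# Assumption: the search ignores case, as the "Te+st" / "test" example suggests.
FLAGS = re.IGNORECASE


def matches(pattern, text):
    """Return True when the whole of text matches pattern."""
    return re.fullmatch(pattern, text, FLAGS) is not None


vowels = re.findall(r"[aeiou]", "regular")   # every lowercase vowel, in order
non_digits = re.findall(r"[^0-9]", "a1b2")   # every character that is not a digit

# Tagged expressions: \1 and \2 refer to the first and second bracketed parts.
swapped = re.sub(r"(h.*o) (f.*s)", r"\2 \1", "hello folks")
```

For instance, `matches("Te*st", "tst")` is true while `matches("Te+st", "tst")` is false, which is the difference between "zero or more" and "one or more".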

 

You may search for an expression A or B as follows:

 

"(John|Tom)"

 

This will search for an occurrence of John or Tom.  There should be nothing between the two expressions.

 

You may combine A or B and C or D in the same search as follows:

 

"(John|Tom) (Smith|Jones)"

 

 

This will search for John or Tom followed by Smith or Jones.
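The combined search can be tried directly in Python, which uses the same `(A|B)` notation. The sample sentences below are invented for illustration.

```python
import re

full_name = re.compile(r"(John|Tom) (Smith|Jones)")

hit = full_name.search("Please call Tom Jones today.")
first, last = hit.group(1), hit.group(2)  # each bracketed part is also a tagged expression

# A first name followed by any other surname does not match.
miss = full_name.search("Please call John Brown today.")
```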

 

The following special characters are also valid when Regular Expression is not selected for the find/replace, and in the Replace field:

 

Symbol

Function

^^

Matches a "^" character

^s

Is substituted with the selected (highlighted) text of the active file window.

^c

Is substituted with the contents of the clipboard.

^b

Matches a page break

^p

Matches a newline (CR/LF) (paragraph) (DOS Files)

^r

Matches a newline (CR Only) (paragraph) (MAC Files)

^n

Matches a newline (LF Only) (paragraph) (UNIX Files)

^t

Matches a tab character

 

Note - ^ refers to the character '^' NOT Control Key + value.
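To make the substitution rules in this last table concrete, here is a minimal sketch of how a replacement string could be expanded. It is not the editor's actual implementation: the function `expand_special` is hypothetical, the selected text and clipboard contents are passed in as plain arguments, and the page break is assumed to be a form feed character.

```python
import re

# Caret codes that always expand to the same text.
FIXED_CODES = {
    "^": "^",     # ^^ is a literal caret
    "t": "\t",    # tab
    "p": "\r\n",  # CR/LF newline (DOS files)
    "r": "\r",    # CR-only newline (MAC files)
    "n": "\n",    # LF-only newline (UNIX files)
    "b": "\f",    # page break (assumed here to be a form feed)
}


def expand_special(text, selection="", clipboard=""):
    """Expand the caret codes from the table in a replacement string."""

    def expand(match):
        code = match.group(1)
        if code == "s":
            return selection   # ^s: the selected text
        if code == "c":
            return clipboard   # ^c: the clipboard contents
        # Unknown codes are left exactly as typed.
        return FIXED_CODES.get(code, match.group(0))

    return re.sub(r"\^(.)", expand, text)


line = expand_special("name^tvalue^p")
wrapped = expand_special("[^s]", selection="chosen text")
```

Scanning left to right means `^^t` becomes a literal caret followed by the letter "t", not a tab, because the first caret consumes the second.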