Positive and Negative Lookahead
Negative lookahead is indispensable if you want to match something not followed by something else. When
discussing character classes, I explained why you cannot use a negated character class to match a “q”
not followed by a “u”. Negative lookahead provides the solution: « q(?!u) ». The negative lookahead
construct is the pair of round brackets, with the opening bracket followed by a question mark and an
exclamation point. Inside the lookahead, we have the trivial regex « u ».
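
As a rough illustration, here is how « q(?!u) » behaves in Python's re module. The choice of Python and the sample words are my own assumptions for this sketch; any flavor that supports lookahead should behave the same way.

```python
import re

# q(?!u): a "q" that is not followed by a "u".
negative = re.compile(r"q(?!u)")

# Sample words chosen for illustration only.
for word in ["Iraq", "Iraqi", "quit"]:
    m = negative.search(word)
    # When there is a match, it is just the "q"; the lookahead
    # only tests what follows and adds nothing to the match.
    print(word, "->", m.group() if m else None)

# For comparison, the negated character class needs an actual
# character after the "q", so it fails when "q" ends the string.
negated_class = re.compile(r"q[^u]")
print(negated_class.search("Iraq"))
```

The last line shows the difference: « q[^u] » finds nothing in “Iraq”, while « q(?!u) » matches the final q.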
Positive lookahead works just the same. « q(?=u) » matches a q that is followed by a u, without making the u
part of the match. The positive lookahead construct is a pair of round brackets, with the opening bracket
followed by a question mark and an equals sign. You can use any valid regular expression inside the lookahead.
(Note that this is not the case with lookbehind. I will explain why below.) If the regex inside the lookahead
contains capturing parentheses, their backreferences will be saved.
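
A small sketch in Python's re module may make this concrete. The sample words and the second, longer pattern are hypothetical examples of mine, picked only to show that the lookahead can hold more than a single character.

```python
import re

# q(?=u): a "q" that is followed by a "u".
positive = re.compile(r"q(?=u)")
m = positive.search("quit")
# The match covers only the "q"; the "u" is tested but not consumed.
print(m.group(), m.span())

# The lookahead can hold a larger regex, here an alternation.
longer = re.compile(r"q(?=u(?:it|een))")
for word in ["quit", "queen", "quote"]:
    found = longer.search(word)
    print(word, "->", found.group() if found else None)
```

In both cases the overall match stops right after the q, so whatever follows the lookahead in a longer regex would start matching at the u.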
Note that the lookahead itself does not create a backreference. So it is not included in the count towards
numbering the backreferences. If you want to store the match of the regex inside a backreference, you have
to put capturing parentheses around the regex inside the lookahead, like this: « (?=(regex)) ».
The other way around, « ((?=regex)) », will not work, because the lookahead will already have discarded
the regex match by the time the backreference is to be saved. The group then stores only an empty string.
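
The difference between the two placements can be checked with a short Python sketch. The names inside and outside, and the use of « u » as the regex, are assumptions made for this illustration.

```python
import re

# A bare lookahead adds nothing to the group count.
plain = re.compile(r"q(?=u)")
print(plain.groups)

# Capturing parentheses inside the lookahead: group 1 holds the "u".
inside = re.compile(r"q(?=(u))")
print(inside.groups, repr(inside.search("quit").group(1)))

# Capturing parentheses around the lookahead: the lookahead has a
# zero-length result, so group 1 holds an empty string.
outside = re.compile(r"q((?=u))")
print(outside.groups, repr(outside.search("quit").group(1)))
```

Both patterns have exactly one numbered group, which comes from the explicit capturing parentheses and not from the lookahead itself.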