Carriage return: isolating a paragraph title (beginner)

**Vanessa87** · 22/03/2010, 11:30

Hello everyone,

I'm new to regular expressions, and here is my problem.

In the screenshot, I would like to select only sentences like number 1, that is, a sentence "isolated" by one (or two?) carriage return(s) above and below it. In other words, I want to select only the title of a paragraph within the flow of text.

I think I need to use \r (or maybe \v), but unfortunately, after many attempts, I still can't get it to work...

Many thanks in advance for your help!
Best regards to all,
Vanessa

**dariumis** · 22/03/2010, 11:41

Hi, actually it depends on the form of the text. If you're working with HTML, you need to write:

Code:

<br/>

If you're working with a plain character string, you need to use:

Code:

\r\n
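Which form applies depends on how the line break is actually encoded. Plain-text line endings vary: `\r\n` (CR+LF) is the Windows convention, Unix uses a lone `\n`, and classic Mac OS used a lone `\r`. A minimal sketch, using Python's `re` module as a stand-in for the plugin's engine (an assumption) on made-up sample strings, shows that `\r\n` only finds CRLF endings while an alternation covers all three:

```python
import re

# Hypothetical sample: the same three lines with two different line-ending conventions.
windows_text = "Intro line\r\nPlato's Aims in the Republic\r\nBody text"
unix_text = "Intro line\nPlato's Aims in the Republic\nBody text"

crlf = re.compile(r"\r\n")
# Try CRLF first so that a CR+LF pair is counted as one line break, not two.
any_newline = re.compile(r"\r\n|\n|\r")

print(len(crlf.findall(windows_text)))         # 2
print(len(crlf.findall(unix_text)))            # 0: no carriage returns at all
print(len(any_newline.findall(windows_text)))  # 2
print(len(any_newline.findall(unix_text)))     # 2
```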

**Vanessa87** · 22/03/2010, 11:50

Hi dariumis,

Actually, I'm working with an Adobe Acrobat plugin (automatic bookmark creation) that uses regular expressions to do this.

I tried your code, but unfortunately it doesn't work.
I also tried this:

Code:

\r\n.+

Here is how I "understand" it:
carriage return (\r)
newline (\n)
take "all" the text (.+) ==> in my example, Plato's Aims in the Republic

It seems to me that this pattern does what I want, but unfortunately it doesn't work. Sigh... I don't know what else to try.

So, if you have a solution, I'd love to hear it!
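Vanessa's reading of `\r\n.+` is essentially right, but the pattern matches after *every* line break, not only around a title. The sketch below uses Python's `re` (assumed to behave like the plugin for these constructs) on a hypothetical CRLF sample. It also shows a Python-specific trap: `.` matches any character except `\n`, so it swallows a `\r`; the character class `[^\r\n]` avoids that.

```python
import re

# Hypothetical sample: one title line followed by two body lines, Windows line endings.
text = "Introduction\r\nPlato's Aims in the Republic\r\nFirst body line\r\nSecond body line"

# Capture the rest of each line that follows a CRLF, excluding CR and LF themselves.
after_break = re.findall(r"\r\n([^\r\n]+)", text)
print(after_break)
# ["Plato's Aims in the Republic", 'First body line', 'Second body line']

# With '.+', the trailing '\r' is swallowed, which breaks the next CRLF and skips a line.
dot_version = re.findall(r"\r\n(.+)", text)
print(dot_version)
# ["Plato's Aims in the Republic\r", 'Second body line']
```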

**ThomasR** · 22/03/2010, 14:19

The question is: how is this line break represented in the source, by a <br/> or by a carriage return?

Code:

\r\n(.*?)\r\n

<br/?>(.*?)<br/?>

Also, you don't want to capture everything that follows a line break, but everything between two line breaks.
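ThomasR's first pattern does capture what lies between two consecutive line breaks. When the title is separated from the surrounding text by blank lines, though, the nearest pair of breaks often has nothing between them. A Python sketch on hypothetical CRLF text illustrates this, along with one possible variant (not from the thread) that requires a blank line both above and below a single line of text:

```python
import re

# Hypothetical sample: a title isolated by one blank line above and below it.
text = ("end of the previous paragraph.\r\n"
        "\r\n"
        "Plato's Aims in the Republic\r\n"
        "\r\n"
        "First line of the new paragraph\r\n"
        "second line of the new paragraph")

# ThomasR's pattern: the lazy (.*?) stops at the first following CRLF,
# so each blank line yields an empty capture and the title is never captured.
between = re.findall(r"\r\n(.*?)\r\n", text)
print(between)   # ['', '']

# Variant: a blank line (two CRLFs) on each side of one non-empty line.
isolated = re.findall(r"\r\n\r\n([^\r\n]+)\r\n\r\n", text)
print(isolated)  # ["Plato's Aims in the Republic"]
```

Whether the text the plugin extracts from the PDF uses `\r\n`, `\n` or `\r` is not known from the thread; the same idea applies with `\n\n` or `\r\r` in place of `\r\n\r\n`.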

**Vanessa87** · 22/03/2010, 15:35

I'm working from a PDF document, so I can't tell you how line breaks are represented in that kind of document...

I tried both of your patterns, but unfortunately they don't work.

I don't understand why...

**ThomasR** · 22/03/2010, 15:40

The expressions are correct, though.

We need to see how you are trying to use them.

Can you show us your code, please?

**Vanessa87** · 22/03/2010, 16:11

I'm afraid I don't understand... (and I'm not blonde!)

I simply tried the two patterns you posted. I also tried this one:

Code:

\v\n(.*?)\v\n

Presumably \v means vertical tab, but it still doesn't work. I can't manage to isolate the test sentence from the screenshot (Plato's Aims in the Republic).
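What `\v` means depends on the engine. In Python's `re` (used below only as an illustration), `\v` is the vertical tab character U+000B, which never matches an ordinary line break. In Perl 5.10 and later, and in PCRE, `\v` instead matches any vertical whitespace, including `\r` and `\n`; which convention the plugin follows is not stated in the thread.

```python
import re

# In Python's re, \v stands for the single control character U+000B (vertical tab).
vertical_tab = re.compile(r"\v")

crlf_hit = vertical_tab.search("line one\r\nline two")
vt_hit = vertical_tab.search("line one\x0bline two")

print(crlf_hit)            # None: neither CR nor LF is a vertical tab
print(vt_hit is not None)  # True
```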

**John Blobsmith** · 22/03/2010, 19:46

Yeah, it may well be \n, but sometimes the \ is used to escape characters...
Try \\n
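Doubling the backslash is only needed when a pattern is written inside a string literal of a programming language that processes escapes itself. When the pattern is typed as-is into a tool (which the plugin's dialog presumably does; an assumption), `\\n` means a literal backslash followed by the letter `n`. A Python sketch, using raw strings `r"..."` to stand for patterns typed verbatim:

```python
import re

sample = "first line\nsecond line"     # contains a real line feed
literal = "first line\\nsecond line"   # contains a backslash followed by the letter n

# \n in a pattern matches a line feed.
print(re.search(r"\n", sample) is not None)    # True
# \\n is an escaped backslash followed by 'n': it looks for the two characters \ and n.
print(re.search(r"\\n", sample) is not None)   # False
print(re.search(r"\\n", literal) is not None)  # True
```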

**Vanessa87** · 23/03/2010, 08:31

I tried, but without success...

After all, a PDF does contain carriage returns; that's what I don't understand!

**Thes32** · 23/03/2010, 10:46

Hi,

what about

Code:

chr(10).chr(13)

or each of the two separately?
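`chr(10).chr(13)` reads like PHP, where `chr()` builds a one-character string from a character code and `.` concatenates strings. Code 10 is the line feed (`\n`) and code 13 the carriage return (`\r`), so this expression is LF followed by CR, the reverse of the Windows sequence `\r\n`. Python's built-in `chr()` gives the same characters:

```python
# Character codes 10 and 13 are the line feed and the carriage return.
LF = chr(10)
CR = chr(13)

print(LF == "\n", CR == "\r")  # True True
print(CR + LF == "\r\n")       # True: the Windows line ending is CR then LF
print(LF + CR == "\r\n")       # False: chr(10).chr(13) is the reversed order
```

In a pattern typed into a regex field, the same characters are normally written with the escapes `\r` and `\n` rather than built with `chr()`.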

**ThomasR** · 23/03/2010, 12:16

Originally posted by Vanessa87

I tried, but without success...

After all, a PDF does contain carriage returns; that's what I don't understand!

In what environment do you run your regex? A search inside the PDF? A PHP script?

**Vanessa87** · 23/03/2010, 16:34

I'm working with Adobe Acrobat Pro v9 and a dedicated plugin that creates bookmarks automatically and uses regular expressions to do so.

Here is a screenshot of the environment:

**s.n.a.f.u** · 24/03/2010, 15:05

There's a big "help" button; could you share what it says?

**Vanessa87** · 24/03/2010, 15:22

Hi,

Here is an excerpt from the plugin's help on regular expressions:

Regular Expressions for Pattern Matching

What is a regular expression?

A regular expression is a pattern that is matched against a subject string from left to right. Most characters stand for themselves in a pattern and match the corresponding characters in the subject. Regular expressions are also described in the Perl documentation and in a number of other books and online resources, some of which have copious examples. There are many websites that serve as online repositories of useful regular expressions. The description here is intended as introductory documentation only.

Introduction

A regular expression, or regex for short, is a pattern describing a certain amount of text. In this document, regular expressions are highlighted in bold red as regex. The term "string" is used to indicate the text that a regular expression is applied to. Text strings are highlighted as follows: “Text string”.

The simplest form of regular expression is actual literal text. For example, the regex Chapter matches text strings containing the sub-string Chapter. The power of regular expressions comes from the ability to include alternatives and repetitions in the pattern. These are encoded in the pattern by the use of meta-characters, which do not stand for themselves but instead are interpreted in a different way.

Character types

Backslash can be used to specify generic character types:

\d any decimal digit
\D any character that is not a decimal digit
\s any whitespace character
\S any character that is not a whitespace character
\w any "word" character (A "word" character is any letter or digit or the underscore character)
\W any "non-word" character

For example: \d{8} matches exactly 8 digits.
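These classes behave the same way in Python's `re`, used below as an illustration on made-up inputs. Note in particular that `\s` also matches `\r` and `\n`, which bears on the line-break question in this thread:

```python
import re

eight_digits = re.compile(r"\d{8}")
print(eight_digits.fullmatch("20100324") is not None)  # True: exactly 8 digits
print(eight_digits.fullmatch("2010032") is not None)   # False: only 7 digits

# \s matches spaces, tabs, carriage returns and line feeds alike.
whitespace = re.findall(r"\s", "a b\tc\r\nd")
print(whitespace)  # [' ', '\t', '\r', '\n']

# \w+ stops at the apostrophe, which is not a "word" character.
words = re.findall(r"\w+", "Plato's Aims")
print(words)  # ['Plato', 's', 'Aims']
```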

Matching alternatives

Vertical bar characters are used to separate alternative patterns. For example, the pattern Configuration|Settings matches either "Configuration" or "Settings". Any number of alternatives may appear, and an empty alternative is permitted (matching the empty string). The matching process tries each alternative in turn, from left to right, and the first one that succeeds is used.
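A Python check (illustrative, with made-up strings) of the alternation rules above, including the left-to-right order and the empty alternative:

```python
import re

# Either word matches; findall reports them in the order they appear in the text.
found = re.findall(r"Configuration|Settings", "Open Settings, then Configuration.")
print(found)  # ['Settings', 'Configuration']

# Alternatives are tried left to right and the first one that succeeds wins,
# so the shorter "Set" is chosen even though "Settings" would also match.
first = re.match(r"Set|Settings", "Settings").group()
print(first)  # Set

# An empty alternative is allowed and matches the empty string.
empty = re.match(r"x|", "abc").group()
print(repr(empty))  # ''
```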

Sub-Patterns

Sub-patterns are delimited by parentheses (round brackets), which can be nested. For example, the pattern ((red|white) (BMW|Volvo)) matches all combinations of "red" and "white" with the words "BMW" and "Volvo" (e.g. "red BMW" or "white Volvo"). Another example: (sens|respons)e and \1ibility matches "sense and sensibility" and "response and responsibility", but not "sense and responsibility". If instead the pattern (sens|respons)e and (?1)ibility is used, it does match "sense and responsibility" as well as the other two strings. The meta-character \1 here serves as a back reference to the first capturing sub-pattern. Such references must, however, follow the sub-pattern to which they refer.
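The nested-group and back-reference examples can be reproduced with Python's `re`. The `(?1)` form, however, is a subroutine call supported by PCRE but not by Python's standard `re` module, so it is left out of this sketch:

```python
import re

# Nested groups: findall returns one tuple per match, one entry per group.
cars = re.findall(r"((red|white) (BMW|Volvo))", "a red BMW and a white Volvo")
print(cars)  # [('red BMW', 'red', 'BMW'), ('white Volvo', 'white', 'Volvo')]

# \1 must repeat exactly the text captured by the first group.
backref = re.compile(r"(sens|respons)e and \1ibility")
results = {s: backref.fullmatch(s) is not None
           for s in ["sense and sensibility",
                     "response and responsibility",
                     "sense and responsibility"]}
print(results)
# {'sense and sensibility': True, 'response and responsibility': True, 'sense and responsibility': False}
```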

Matching whole words

Simple text patterns such as Alert will also match words such as Alerts, Alerted, etc. If you want your pattern to match only whole words, surround it with \b meta-characters. For example, use \bAlert\b to match only the word Alert and exclude all other words that might contain it as a sub-string.
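A quick Python illustration of the difference \b makes, on a made-up sentence:

```python
import re

text = "Alert: Alerts were raised. Alerted staff read the Alert."

plain = re.findall(r"Alert", text)
whole = re.findall(r"\bAlert\b", text)
print(len(plain))  # 4: also matches inside Alerts and Alerted
print(len(whole))  # 2: only the standalone word
```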

Matching sub-string

If the text you want to match should appear only inside a bigger word, use the \B meta-character. For example, the pattern \Bword\B will match the word "swordfish" but will ignore the words "word", "words" and "password".
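The \Bword\B example checked in Python: \B requires a word character on both sides, so a match at the very start or end of a word fails.

```python
import re

inner = re.compile(r"\Bword\B")
results = {w: inner.search(w) is not None
           for w in ["swordfish", "word", "words", "password"]}
print(results)
# {'swordfish': True, 'word': False, 'words': False, 'password': False}
```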

Repetitions

The general repetition quantifier specifies a minimum and maximum number of permitted matches, by giving the two numbers in curly brackets (braces), separated by a comma. The numbers must be less than 65536, and the first must be less than or equal to the second. For example: z{2,4} matches "zz", "zzz", or "zzzz". A closing brace on its own is not a special character. If the second number is omitted, but the comma is present, there is no upper limit; if the second number and the comma are both omitted, the quantifier specifies an exact number of required matches.
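The repetition forms described above, checked in Python with a small helper `full()` defined here for brevity (it is not part of any library):

```python
import re

def full(pattern, s):
    """Return True if the whole string s matches pattern."""
    return re.fullmatch(pattern, s) is not None

in_range = [s for s in ["z", "zz", "zzz", "zzzz", "zzzzz"] if full(r"z{2,4}", s)]
print(in_range)    # ['zz', 'zzz', 'zzzz']

open_ended = full(r"z{2,}", "zzzzzz")  # comma present, no upper limit
exact = full(r"z{3}", "zzzz")          # no comma: exactly 3 required
brace = full(r"a}", "a}")              # a lone closing brace is literal
print(open_ended, exact, brace)        # True False True
```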

Character Classes or Character Sets

A "character class" matches only one out of several characters. To match an “a” or an “e”, use [ae]. You could use this in gr[ae]y to match either gray or grey. A character class matches only a single character. gr[ae]y will not match graay, graey or any such thing. The order of the characters inside a character class does not matter. You can use a hyphen inside a character class to specify a range of characters. [0-9] matches a single digit between 0 and 9. You can use more than one range. [0-9a-fA-F] matches a single hexadecimal digit, case insensitively. You can combine ranges and single characters. [0-9a-fxA-FX] matches a hexadecimal digit or the letter X.
Typing a caret after the opening square bracket will negate the character class. The result is that the character class will match any character that is not in the character class. q[^x] matches qu in question. It does not match Iraq since there is no character after the q for the negated character class to match.
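The character-class examples checked in Python. The last line shows a negated class used as "anything except a line break", which is one way to capture a single line of text:

```python
import re

colours = re.findall(r"gr[ae]y", "gray grey graay")
hex_ok = re.fullmatch(r"[0-9a-fA-F]", "B") is not None
qu = re.search(r"q[^x]", "question").group()
iraq = re.search(r"q[^x]", "Iraq")  # no character after the q to match [^x]
lines = re.findall(r"[^\r\n]+", "Title\r\nBody line")

print(colours)  # ['gray', 'grey']
print(hex_ok)   # True
print(qu)       # qu
print(iraq)     # None
print(lines)    # ['Title', 'Body line']
```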

Using Anchors to Match Text Lines

Anchors do not match any characters. They match only a particular text position in the string. The meta-character ^ matches at the start of the string, and $ matches at the end of the string; e.g. ^b matches only the first b in bob. The symbol \b matches at a word boundary. A word boundary is a position between a character that can be matched by \w and a character that cannot be matched by \w. The meta-character \b also matches at the start and/or end of the string if the first and/or last characters in the string are word characters. \B matches at every position where \b cannot match.

Examples:

Chapter \d$ - matches Chapter 1, but does not match Chapter 1 Appendix
^Chapter \d - matches Chapter 1, but does not match In the Chapter 1
Chapter\b - matches Chapter or Chapter 1, but does not match Chapters
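The examples above can be checked in Python. The help describes ^ and $ as matching at the start and end of the *string*; many Perl-style engines also offer a multiline mode in which they match at every line, shown below with Python's `re.MULTILINE`. Whether and how the plugin exposes such a mode is not stated in the thread.

```python
import re

checks = [
    bool(re.search(r"Chapter \d$", "Chapter 1")),           # True
    bool(re.search(r"Chapter \d$", "Chapter 1 Appendix")),  # False
    bool(re.search(r"^Chapter \d", "In the Chapter 1")),    # False
    bool(re.search(r"Chapter\b", "Chapter 1")),             # True
    bool(re.search(r"Chapter\b", "Chapters")),              # False
]
print(checks)

text = "Intro\nChapter 2\nBody"
# Without MULTILINE, ^ and $ refer to the whole string only.
plain_hits = re.findall(r"^Chapter \d$", text)
# With MULTILINE, they also match just after and just before each \n.
multi_hits = re.findall(r"^Chapter \d$", text, re.MULTILINE)
print(plain_hits)  # []
print(multi_hits)  # ['Chapter 2']
```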

**s.n.a.f.u** · 24/03/2010, 17:53

Thanks. So it closely resembles classic Perl regular expressions.
The game, then, is to find how the line break is represented.

What do these three patterns give you?

Code:
