Simplifying the syntax

**MaxwellCoste** · 29/11/2019, 17:04

Hello everyone,

I'm using this script that I found:

```php
<?php
//01 get the html returned from the following url
$html = file_get_contents('https://fr.wikipedia.org/wiki/Anaconda,_le_prédateur'); 
//init DOMDocument
$scriptDocument = new DOMDocument();
//disable libxml errors
libxml_use_internal_errors(TRUE); 
//check if any html is actually returned
if(!empty($html)){ 
	//loadHTML
	$scriptDocument->loadHTML($html);
	//clear errors for yucky html
	libxml_clear_errors(); 
	//init DOMXPath
	$scriptDOMXPath = new DOMXPath($scriptDocument);
	//get all the h1's
	$scriptRow = $scriptDOMXPath->query('//h1');
	//check
	if($scriptRow->length > 0){
		foreach($scriptRow as $row){
			echo $row->nodeValue . "<br/>";
		}
	}
}
?>
<?php
//02 get the html returned from the following url
$html = file_get_contents('https://fr.wikipedia.org/wiki/Eunectes_murinus'); 
//init DOMDocument
$scriptDocument = new DOMDocument();
//disable libxml errors
libxml_use_internal_errors(TRUE); 
//check if any html is actually returned
if(!empty($html)){ 
	//loadHTML
	$scriptDocument->loadHTML($html);
	//clear errors for yucky html
	libxml_clear_errors(); 
	//init DOMXPath
	$scriptDOMXPath = new DOMXPath($scriptDocument);
	//get all the h1's
	$scriptRow = $scriptDOMXPath->query('//h1');
	//check
	if($scriptRow->length > 0){
		foreach($scriptRow as $row){
			echo $row->nodeValue . "<br/>";
		}
	}
}
?>
<?php
//03 get the html returned from the following url
$html = file_get_contents('https://fr.wikipedia.org/wiki/Eunectes'); 
//init DOMDocument
$scriptDocument = new DOMDocument();
//disable libxml errors
libxml_use_internal_errors(TRUE); 
//check if any html is actually returned
if(!empty($html)){ 
	//loadHTML
	$scriptDocument->loadHTML($html);
	//clear errors for yucky html
	libxml_clear_errors(); 
	//init DOMXPath
	$scriptDOMXPath = new DOMXPath($scriptDocument);
	//get all the h1's
	$scriptRow = $scriptDOMXPath->query('//h1');
	//check
	if($scriptRow->length > 0){
		foreach($scriptRow as $row){
			echo $row->nodeValue . "<br/>";
		}
	}
}
?>
<?php
//04 get the html returned from the following url
$html = file_get_contents('https://fr.vikidia.org/wiki/Grand_anaconda'); 
//init DOMDocument
$scriptDocument = new DOMDocument();
//disable libxml errors
libxml_use_internal_errors(TRUE); 
//check if any html is actually returned
if(!empty($html)){ 
	//loadHTML
	$scriptDocument->loadHTML($html);
	//clear errors for yucky html
	libxml_clear_errors(); 
	//init DOMXPath
	$scriptDOMXPath = new DOMXPath($scriptDocument);
	//get all the h1's
	$scriptRow = $scriptDOMXPath->query('//h1');
	//check
	if($scriptRow->length > 0){
		foreach($scriptRow as $row){
			echo $row->nodeValue . "<br/>";
		}
	}
}
?>
<?php
//05 get the html returned from the following url
$html = file_get_contents('https://fr.vikidia.org/wiki/Anaconda'); 
//init DOMDocument
$scriptDocument = new DOMDocument();
//disable libxml errors
libxml_use_internal_errors(TRUE); 
//check if any html is actually returned
if(!empty($html)){ 
	//loadHTML
	$scriptDocument->loadHTML($html);
	//clear errors for yucky html
	libxml_clear_errors(); 
	//init DOMXPath
	$scriptDOMXPath = new DOMXPath($scriptDocument);
	//get all the h1's
	$scriptRow = $scriptDOMXPath->query('//h1');
	//check
	if($scriptRow->length > 0){
		foreach($scriptRow as $row){
			echo $row->nodeValue . "<br/>";
		}
	}
}
?>
```

As you can see, only the URLs change.

I have 200 wiki URLs to process.

Is there a way to write this more simply, without copying the same code again for every URL?

Thanks a lot for your help,
Maxwell

Guest · 29/11/2019, 18:30

Hello,

```php
<?php // 1- Function: get the html returned from the following url
function get_the_html_returned_from_the_following_url( $url )
{
	$html = file_get_contents($url); 
	$return = '';
	//init DOMDocument
	$scriptDocument = new DOMDocument();
	//disable libxml errors
	libxml_use_internal_errors(TRUE); 
	//check if any html is actually returned
	if(!empty($html)){ 
		//loadHTML
		$scriptDocument->loadHTML($html);
		//clear errors for yucky html
		libxml_clear_errors(); 
		//init DOMXPath
		$scriptDOMXPath = new DOMXPath($scriptDocument);
		//get all the h1's
		$scriptRow = $scriptDOMXPath->query('//h1');
		//check
		if($scriptRow->length > 0){
			foreach($scriptRow as $row){
				$return .= $row->nodeValue . "<br/>";
			}
		}
	}
	return $return;
}
?>
```
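As an optional refinement (a sketch, not part of the original answer), the parsing can be moved into its own helper that works on an HTML string, so it can be checked without any network access. The name `extract_h1_titles()` is hypothetical:

```php
<?php
// Hypothetical helper (not from the thread): returns the text of every
// <h1> element found in an HTML string, without any network access.
function extract_h1_titles(string $html): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects an empty string
    }

    // Collect libxml warnings about messy real-world HTML instead of
    // printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $document = new DOMDocument();
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titles = [];
    $xpath = new DOMXPath($document);
    foreach ($xpath->query('//h1') as $node) {
        $titles[] = trim($node->nodeValue);
    }
    return $titles;
}
```

For example, `extract_h1_titles('<h1>Eunectes</h1>')` returns an array holding `Eunectes`; the answer's function could then fetch the page with `file_get_contents()` and hand the result to this helper.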

```php
<?php // 2- Display
echo get_the_html_returned_from_the_following_url( 'https://fr.wikipedia.org/wiki/Anaconda,_le_prédateur' );
echo get_the_html_returned_from_the_following_url( 'https://fr.wikipedia.org/wiki/Eunectes_murinus' );
echo get_the_html_returned_from_the_following_url( 'https://fr.wikipedia.org/wiki/Eunectes' );
echo get_the_html_returned_from_the_following_url( 'https://fr.vikidia.org/wiki/Grand_anaconda' ); 
echo get_the_html_returned_from_the_following_url( 'https://fr.vikidia.org/wiki/Anaconda' ); 
?>
```
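With 200 URLs, calling the function still means writing 200 `echo` lines. A minimal sketch, assuming the URLs are stored one per line in a plain text file (the file name `urls.txt` and the helpers `load_urls()` and `echo_titles()` are hypothetical, not from the thread), replaces them with a single loop that reuses the answer's `get_the_html_returned_from_the_following_url()`:

```php
<?php
// Hypothetical helper: reads a text file containing one URL per line
// and returns the trimmed, non-empty lines as a list.
function load_urls(string $path): array
{
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return []; // file missing or unreadable
    }
    $urls = array_map('trim', $lines);
    return array_values(array_filter($urls, function ($url) {
        return $url !== '';
    }));
}

// Hypothetical helper: calls the answer's function once per URL.
// PHP resolves the function name at call time, so
// get_the_html_returned_from_the_following_url() only has to be defined
// when echo_titles() is actually called.
function echo_titles(array $urls): void
{
    foreach ($urls as $url) {
        echo get_the_html_returned_from_the_following_url($url);
    }
}
```

Usage, once the answer's function is defined in the same script: `echo_titles(load_urls(__DIR__ . '/urls.txt'));`. Keeping the list in a file means adding a URL no longer touches the PHP code; a plain `$urls = [ ... ];` array passed to `echo_titles()` works just as well.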

**MaxwellCoste** · 29/11/2019, 22:05

Good evening, thanks for your answer, but it gives me an HTTP ERROR 500.

**mathieu** · 30/11/2019, 00:35

I have a vague memory that you need to send a "User-Agent" header to be able to fetch a Wikipedia page, but I can no longer find any information on why.

To add the header to the request, you can pass the third argument, `$context`, when calling `file_get_contents`.
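mathieu's suggestion can be sketched as follows. It assumes the missing header is what the server objects to; Wikimedia does publish a User-Agent policy asking automated clients to identify themselves, which may be what he remembers. The helper names `make_http_context()` and `fetch_with_user_agent()` and the agent string are placeholders, not something from the thread:

```php
<?php
// Hypothetical helper: builds a stream context that adds a User-Agent
// header to HTTP(S) requests made through PHP's http:// stream wrapper.
function make_http_context(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'header' => "User-Agent: " . $userAgent . "\r\n",
        ],
    ]);
}

// Hypothetical helper: fetches a page, passing the context as the third
// argument of file_get_contents(). Returns '' if the request failed.
function fetch_with_user_agent(string $url, string $userAgent): string
{
    $html = file_get_contents($url, false, make_http_context($userAgent));
    return $html === false ? '' : $html;
}
```

Usage (this performs a real request): `echo fetch_with_user_agent('https://fr.wikipedia.org/wiki/Eunectes', 'MyScript/1.0 (me@example.org)');`. The `http` context options also apply to `https://` URLs, and PHP's `http` context has a dedicated `user_agent` option that can be used instead of the raw header line.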

Guest · 30/11/2019, 10:20

Hello,

My code was just missing a brace (since corrected):

```php
function get_the_html_returned_from_the_following_url( $url )
{
....
```

The output is as expected:

```
Anaconda, le prédateur
Eunectes murinus
Eunectes
Grand anaconda
Anaconda
```

**MaxwellCoste** · 01/12/2019, 18:57

Thanks a lot for your help, it works very well.
Max
