Hello, I would like to parse an HTML file that I download from the internet, using JDOM.
However, I cannot parse the file because of these lines, which appear in it:
<!DOCTYPE html>
<html lang="fr">
Is it possible to ignore these declarations when parsing the HTML?
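
To make the question concrete, here is a minimal sketch of the kind of pre-processing I have in mind, assuming the DOCTYPE declaration really is what trips the parser (I have not confirmed that). The class name `DoctypeStripper` and the method `stripDoctype` are hypothetical names chosen for illustration; the code only uses the JDK and removes the first simple DOCTYPE declaration from the markup before it would be handed to JDOM.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of JDOM: removes a DOCTYPE declaration from markup.
class DoctypeStripper {
    // Matches a simple DOCTYPE declaration without an internal subset,
    // e.g. <!DOCTYPE html>, plus any whitespace that follows it.
    private static final Pattern DOCTYPE =
            Pattern.compile("<!DOCTYPE[^>]*>\\s*", Pattern.CASE_INSENSITIVE);

    /** Returns the markup with the first DOCTYPE declaration removed. */
    static String stripDoctype(String markup) {
        return DOCTYPE.matcher(markup).replaceFirst("");
    }

    public static void main(String[] args) {
        String html = "<!DOCTYPE html>\n"
                + "<html lang=\"fr\"><head><title>t</title></head><body/></html>";
        String cleaned = stripDoctype(html);
        System.out.println(cleaned);
        // Assumption: the cleaned string would then be passed to JDOM, e.g.
        // new SAXBuilder().build(new java.io.StringReader(cleaned));
    }
}
```

This leaves the `lang="fr"` attribute in place, since an ordinary attribute should not by itself prevent parsing; only the DOCTYPE line is dropped.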

For reference, my HTML file looks like this:

<!DOCTYPE html>
<html lang="fr">
  <head>
    <meta name="generator"
    content="HTML Tidy for Java (vers. 2009-12-01), see jtidy.sourceforge.net" />
    <meta http-equiv="content-type"
    content="text/html; charset=ISO-8859-1" />
<script type="text/javascript">
(function(){var h=document.documentElement;h.className+=" js";(new
Image()).src='http://a.l.yimg.com/a/i/eu/sch/fr_srp_metro_20090914.png';})();
    </script>
    <link rel="alternate" type="application/rss+xml"
    title="Yahoo! France Search results for itin"
    href="http://api.search.yahoo.com/WebSearchService/rss/webSearch.xml?appid=yahoosearchwebrss&amp;query=itin&amp;adult_ok=1&amp;region=fr" />
    <title>itin - Yahoo! France Résultats de
    recherche</title>
    <link rel="stylesheet" type="text/css"
    href="http://a.l.yimg.com/a/lib/s8/srp-core-css_201102221437.css" />
<style type="text/css">
s,.sprt,#logo,#att_icon,#verizon_icon
.ico,#at-tog,.h-gui-icon,.ssbang,.sschk,.ssochk,.ssbx,#ss
a.ss-remove,.ads a.pp-l,.news-nph,.stars-sm span,.stars-lg
span,.thmbplay,.sc-promo-img,.sc-close{background-image:url(http://a.l.yimg.com/a/i/eu/sch/fr_srp_metro_20090914.png);}.alsotry{font-size:1.1em;}.msg404{background-color:#7B0099;color:#FFF;margin-left:197px;padding:5px;line-height:1.5em}.msg404
h1{font-size:138.5%;display:inline;margin-left:5px;*margin-left:0}.msg404
.ico{float:left;width:22px;height:22px;top:-2px;left:-2px;border:none;margin-right:2px}.msg404
.ico img{width:22px;height:22px;vertical-align:top;border:1px solid
#7B0099}.msg404 a,.custom a:visited{color:#FFF}.msg404
p{margin-left:28px}.msg404 .relmsg{font-weight:bold}.msg404 .relmsg
cite{font-weight:normal}
    </style>
<style type="text/css">
.sm-plain
.thmbplay,span.sm-ldg,a.sm-cls,span.sm-flg,a.sm-flg,a.sm-snd,a.sm-hlp,a.sm-ext,a.sm-ext-u,.bbl-cls,#ysch
.yui-dialog .container-close,#ysch .yui-dialog
div.on,.sm-ctl-b,.sm-ctl-t{background-image:url(http://a.l.yimg.com/a/i/us/sch/gr4/smsprt_20090407.png);background-repeat:no-repeat;font-size:
0;}.ads.horiz.bot{background-color:#ecf4fa;}.rais{min-width:516px}.rais
h2 a{margin-right:7px}.rais
.abst{max-width:42em;margin-right:8px;_margin-right:0;_width:508px}.rais
p{margin-top:1px;*margin-top:0}.spns li.rais div
h3{margin-bottom:1px}.spns li.rais div h3 a{font-size:123%}.spns
li.rais div a{font-size:100%}body.opera .rais h3,body.safari .rais
h3{margin-bottom:0}.rais .thmb{float:right;margin:4px 0 0 5px}.rais
.btm{clear:both}.rais em{color:#008000}.rais_dl{color:#888}.rais_dl
a{white-space:nowrap}.rais_vert{max-width:28em;margin:0;padding:3px
0 0
43px}.rais_vert:after{clear:both;content:'.';display:block;height:0;visibility:hidden}.rais_vert_dl{max-width:20em;float:left}.rais_vert_frm{max-width:20em;float:left;margin-left:35px}.ads
li.rais .rais_dl a{display:inline}.ads.horiz{zoom:1}.ads .rais
.vidmsg{width:94px;margin-left:4px}.ads .rais .thmb
span.thmbplay{background-position:-183px
-376px}#rais_cr_pdt,#rais_cr_ddt,#rais_cr_loc,#rais_go{font-size:12px}#rais_cr_pt,#rais_cr_dt{bottom:1px;font-size:11px;position:relative}.ads.horiz
ul.reducepx-spnslist{padding-bottom:4px;padding-top:3px}.rais
a.thmb{width:82px;height:62px;border:1px solid
#999;background-color:#FFF}.rais .thmb
img{width:80px;height:60px;border:0;margin:1px}.rais
img.thmb{width:80px;height:60px}.rais .logo{top:57px}.rais
div.clr{clear:both}.rais ul.pharma{float:right;padding:0}.rais
ul.pharma li{padding:0;margin:0}.rais ul.pharma
.thmb{float:left;width:87px;height:67px}#warn_lnk{cursor:pointer}#warn_lnk
a{margin-left:5px;margin-right:2px;color:#0000DE;font-size:12px}#warn_lnk
s{background-repeat:no-repeat;cursor:pointer;font-size:0}#warn_lnk
s.ico-colps{background-position:-192px
-66px;height:5px;width:8px;cursor:pointer;display:inline-block}#warn_lnk
s.ico-expnd{-moz-transform:rotate(180deg);-webkit-transform:rotate(180deg);filter:progid:DXImageTransform.Microsoft.BasicImage(rotation=2);_filter:none;top:-2px;position:relative}.rais
.warnmsg{font:12px arial;float:left;height:0;width:490px;margin:4px
15px 7px 0;background-color:#fff;border:1px solid
#999;padding:10px;overflow:auto}.rais .warnmsg
h4{color:#0000DE;text-decoration:underline;font-weight:bold;margin-bottom:1em}.rais
.hide{border:medium
none;overflow:hidden;padding:0;margin:0;position:relative}
    </style>
  </head>
  <body id="ysch" class=" netscape windows v5_0">
    <div id="doc">
      <h1 class="off-left">Yahoo! France Recherche Web</h1>
      <div id="uh">
        <ul class="l">
          <li class="hi">Bonjour, 
          <strong>Invité</strong></li>
          <li>
            <a
            href="http://fr.wrs.yahoo.com/_ylt=A7x9Qb5hr6VNIVAA7UVjAQx./SIG=15ft0suel/EXP=1302725569/**http%3a//login.europe.yahoo.com/config/login%3f.src=srch%26.intl=fr%26.done=http%253A%252F%252Ffr.search.yahoo.com%252Fsearch%253Fei%253DUTF-8%2526q%253Ditin"
             target="_top">Ouvrir une session</a>
          </li>
          <li>
            <a
            href="http://fr.wrs.yahoo.com/_ylt=A7x9Qb5hr6VNIVAA7kVjAQx./SIG=11utmp3ug/EXP=1302725569/**http%3a//eur.help.yahoo.com/help/fr/ysearch/">Aide</a>
          </li>
etc....
Thank you very much.