lxml and non-ASCII characters
Hi,
I'm trying to write and then read back an XML file containing non-ASCII characters.
Here is what I did:
Code:
# -*- coding: utf-8 -*-
from lxml import etree

def write_xml(filename):
    root_elt = etree.Element("Repository", {"version": str(1),
                                            "val": "éléà"})
    elt_tree = etree.ElementTree(root_elt)
    tmp = etree.tostring(elt_tree, pretty_print=True, encoding="utf-8", xml_declaration=True)
    with open(filename, 'w') as f:
        f.write(str(tmp, encoding="utf-8"))

def read_xml(filename):
    tree = etree.parse(filename)
    root = tree.getroot()
    version = int(root.get("version"))
    val = root.get("val")

filename = "fic.xml"
write_xml(filename)
read_xml(filename)
Here is the traceback I get:
Code:
Traceback (most recent call last):
  File "test.py", line 22, in <module>
    read_xml(filename)
  File "test.py", line 13, in read_xml
    tree = etree.parse(filename)
  File "lxml.etree.pyx", line 2954, in lxml.etree.parse (src/lxml\lxml.etree.c:56220)
  File "parser.pxi", line 1533, in lxml.etree._parseDocument (src/lxml\lxml.etree.c:82303)
  File "parser.pxi", line 1562, in lxml.etree._parseDocumentFromURL (src/lxml\lxml.etree.c:82596)
  File "parser.pxi", line 1462, in lxml.etree._parseDocFromFile (src/lxml\lxml.etree.c:81635)
  File "parser.pxi", line 1002, in lxml.etree._BaseParser._parseDocFromFile (src/lxml\lxml.etree.c:78544)
  File "parser.pxi", line 569, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml\lxml.etree.c:74488)
  File "parser.pxi", line 650, in lxml.etree._handleParseResult (src/lxml\lxml.etree.c:75379)
  File "parser.pxi", line 590, in lxml.etree._raiseParseError (src/lxml\lxml.etree.c:74712)
lxml.etree.XMLSyntaxError: Input is not proper UTF-8, indicate encoding !
Bytes: 0xE9 0x6C 0xE9 0xE0, line 2, column 25
I tried playing with the encoding parameter, but could not get anything to work.
Does anyone have an idea?
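
A possible explanation, offered as a sketch rather than a confirmed diagnosis: the bytes in the error message (0xE9 0x6C 0xE9 0xE0) are what "éléà" looks like when encoded as cp1252/Latin-1, not UTF-8. That would be consistent with `open(filename, 'w')` using the platform's default text encoding (an assumption here, suggested by the Windows-style paths in the traceback) while the XML declaration still announces UTF-8. One way around it is to let lxml write the bytes itself with `ElementTree.write`. The temporary directory and the returned tuple below are illustrative choices, not part of the original code.

```python
import os
import tempfile

from lxml import etree


def write_xml(filename):
    root_elt = etree.Element("Repository", {"version": str(1), "val": "éléà"})
    elt_tree = etree.ElementTree(root_elt)
    # lxml serializes to UTF-8 bytes and writes them directly, so no
    # locale-dependent text encoding is applied on the way to disk.
    elt_tree.write(filename, pretty_print=True, encoding="utf-8",
                   xml_declaration=True)


def read_xml(filename):
    root = etree.parse(filename).getroot()
    # Returned here only so the round trip can be checked.
    return int(root.get("version")), root.get("val")


# The bytes reported by the parser match "éléà" encoded as cp1252.
assert "éléà".encode("cp1252") == b"\xe9\x6c\xe9\xe0"

# Illustrative location; the original script used "fic.xml" in the
# current directory.
filename = os.path.join(tempfile.mkdtemp(), "fic.xml")
write_xml(filename)
version, val = read_xml(filename)
print(version, val)
```

If the `etree.tostring` call is kept as in the original, two alternatives should have the same effect: open the file in binary mode (`open(filename, 'wb')`) and write `tmp` as is, or keep text mode but pass `encoding="utf-8"` to `open`.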