java.io.UTFDataFormatException when parsing an XML document containing accents
Hello,
I cannot read an XML document like the one below, which is declared as UTF-8.
Code:
<?xml version="1.0" encoding="UTF-8"?>
<ClinicalDocument xmlns="urn:hl7-org:v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" classCode="DOCCLIN" moodCode="EVN">
<id root="2.16.840.1.113883.3.31.3.1.2.2.1" extension="dev_ser_400" displayable="false"/>
<code xsi:type="CD" code="120" displayName=" CR de sꫯur hospitalier" codeSystem="2.16.840.1.113883.3.31.4.1"/>
...
As you can see, this document contains an accented character that shows up as ꫯ.
When I parse this XML document, which contains an accent, with the following Java code:
Code:
Object LoadMessage(String messagetypestr, String message) {
    Object rim = null;
    MessageTypeLoader<MessageType> mtl = org.hl7.meta.mif.MessageTypeLoaderAdapter.getInstance();
    MessageType messageType = mtl.loadMessageType(messagetypestr);
    try {
        File f = new File(message);
        FileInputStream in = new FileInputStream(f);
        ApplicationContext ac = new ContextForThis();
        rim = MessageContentHandler.parseMessage(ac, in, messageType);
    } catch (Exception e) {
        System.out.println("Cant load message file");
    }
    return rim;
}
The error occurs at this call: MessageContentHandler.parseMessage(ac, in, messageType);
I get the following error:
Quote:
java.lang.Error: java.io.UTFDataFormatException: Invalid byte 2 of 3-byte UTF-8 sequence.
at org.hl7.xml.parser.MessageContentHandler.parseMessage(MessageContentHandler.java:101)
at fr.aphp.mediweb.hl7.HL7ImportTest.LoadMessage(HL7ImportTest.java:31)
at fr.aphp.mediweb.hl7.HL7ImportTest.testCDABasicExample(HL7ImportTest.java:38)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at junit.framework.TestCase.runTest(TestCase.java:154)
at junit.framework.TestCase.runBare(TestCase.java:127)
at junit.framework.TestResult$1.protect(TestResult.java:106)
at junit.framework.TestResult.runProtected(TestResult.java:124)
at junit.framework.TestResult.run(TestResult.java:109)
at junit.framework.TestCase.run(TestCase.java:118)
at junit.framework.TestSuite.runTest(TestSuite.java:208)
at junit.framework.TestSuite.run(TestSuite.java:203)
at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
Caused by: java.io.UTFDataFormatException: Invalid byte 2 of 3-byte UTF-8 sequence.
at org.apache.xerces.impl.io.UTF8Reader.invalidByte(Unknown Source)
at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.scanLiteral(Unknown Source)
at org.apache.xerces.impl.XMLScanner.scanAttributeValue(Unknown Source)
at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanAttribute(Unknown Source)
at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1198)
at org.hl7.xml.parser.MessageContentHandler.parseMessage(MessageContentHandler.java:93)
... 20 more
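A note on reading this trace: "Invalid byte 2 of 3-byte UTF-8 sequence" is what a strict UTF-8 decoder reports when a byte in the range 0xE0 to 0xEF (the lead byte of a three-byte sequence) is not followed by a continuation byte. In ISO-8859-1 an é is the single byte 0xE9, so one plausible explanation (an assumption, not confirmed by the post) is that the file was saved as ISO-8859-1 while its declaration says UTF-8. A minimal sketch for checking that, using only the JDK; the class name `Utf8Check` is made up for illustration:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

public class Utf8Check {
    /** Returns true only if the bytes decode as well-formed UTF-8. */
    public static boolean isValidUtf8(byte[] bytes) {
        try {
            // REPORT makes the decoder throw instead of silently
            // substituting a replacement character.
            Charset.forName("UTF-8").newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

Running this over the raw bytes of the file would tell whether the UTF-8 declaration matches the actual content.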
One way to work around the problem is to change the declared character encoding from UTF-8 to ISO-8859-1. I would like a solution that handles this without modifying the document. How can I do that?
Thanks
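One possible approach, sketched under the assumption that the file's bytes really are ISO-8859-1 despite the UTF-8 declaration: leave the file on disk untouched and transcode its bytes to UTF-8 in memory before handing the stream to the parser, so that the content matches the declaration. The class and method names below (`Latin1ToUtf8`, `transcode`, `asUtf8Stream`) are hypothetical, and only JDK classes are used. `Charset.forName` is used instead of newer helpers because the stack trace suggests an older JVM.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

public class Latin1ToUtf8 {
    private static final Charset LATIN1 = Charset.forName("ISO-8859-1");
    private static final Charset UTF8 = Charset.forName("UTF-8");

    /** Re-encodes bytes assumed to be ISO-8859-1 as UTF-8. */
    public static byte[] transcode(byte[] latin1Bytes) {
        // Every byte value is a valid ISO-8859-1 character, so decoding never fails.
        String text = LATIN1.decode(ByteBuffer.wrap(latin1Bytes)).toString();
        ByteBuffer encoded = UTF8.encode(text);
        byte[] out = new byte[encoded.remaining()];
        encoded.get(out);
        return out;
    }

    /** Reads the whole stream into memory and returns a UTF-8 view of it. */
    public static InputStream asUtf8Stream(InputStream in) throws IOException {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            raw.write(buf, 0, n);
        }
        return new ByteArrayInputStream(transcode(raw.toByteArray()));
    }
}
```

In the code from the question, the call would then become `MessageContentHandler.parseMessage(ac, Latin1ToUtf8.asUtf8Stream(in), messageType)`. Two caveats: a file that is already valid UTF-8 would be double-encoded by this, so it is worth checking the bytes first; and if the file was actually written as windows-1252 or another encoding, that charset name would have to be used instead.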