The database supports implicit and explicit validation of XML documents. Implicit validation can be executed automatically when documents are inserted into the database; explicit validation can be performed using XQuery extension functions.
To enable this feature, the eXist-db configuration must be changed by editing the file conf.xml. The following items must be configured:
Validation mode
Catalog Entity Resolver
<validation mode="auto">
<entity-resolver>
<catalog uri="${WEBAPP_HOME}/WEB-INF/catalog.xml" />
</entity-resolver>
</validation>
<validation mode="auto">
<entity-resolver>
<catalog uri="xmldb:exist:///db/grammar/catalog.xml" />
<catalog uri="${WEBAPP_HOME}/WEB-INF/catalog.xml" />
</entity-resolver>
</validation>
The mode parameter switches the validation capabilities of the (Xerces) XML parser on or off. The possible values are:
yes : Switch on validation. All XML documents will be validated. When the grammar (XML schema, DTD) documents cannot be resolved, the document is rejected.
no : Switch off validation. All well-formed XML documents will be accepted.
auto : Validation of an XML document is performed based on the contents of the document. When a document contains a reference to a grammar (XML schema or DTD) document, the XML parser tries to resolve this grammar and the XML document is validated against it, as if mode="yes" were configured. If the grammar cannot be resolved, the XML document is rejected. When the XML document does not contain a reference to a grammar, it is parsed as if mode="no" were configured.
All grammar (XML schema, DTD) files that take part in the implicit validation process must be registered with the database using OASIS catalog files. These catalog files can be stored on disk or in the database.
In the examples above, ${WEBAPP_HOME} is substituted by a file:// URL pointing to the 'webapp' directory of eXist (e.g. '$EXIST_HOME/webapp/') or to the equivalent directory of a deployed WAR file when eXist is deployed in a servlet container (e.g. '${CATALINA_HOME}/webapps/exist/').
A catalog that is stored in the database can be addressed by a URL like 'xmldb:exist:///db/mycollection/catalog.xml' (note the three slashes) or the shorter equivalent '/db/mycollection/catalog.xml'.
<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
<public publicId="-//PLAY//EN" uri="entities/play.dtd"/>
<system systemId="play.dtd" uri="entities/play.dtd"/>
<system systemId="mondial.dtd" uri="entities/mondial.dtd"/>
<uri name="http://exist-db.org/samples/shakespeare" uri="entities/play.xsd"/>
<uri name="http://www.w3.org/XML/1998/namespace" uri="entities/xml.xsd"/>
<uri name="http://www.w3.org/2001/XMLSchema" uri="entities/XMLSchema.xsd"/>
<uri name="urn:oasis:names:tc:entity:xmlns:xml:catalog" uri="entities/catalog.xsd" />
</catalog>
Any number of catalog entries can be configured in the entity-resolver section of conf.xml. Relative uri attribute values are resolved relative to the location of the catalog document.
The validation mode for each individual collection can be configured using collection.xconf documents, in the same way these are used for configuring indexes. These documents need to be stored in '/db/system/config/db/....'.
<?xml version='1.0'?>
<collection xmlns="http://exist-db.org/collection-config/1.0">
<validation mode="yes"/>
</collection>
The database provides two extension functions to perform XML validation from an XQuery script:
validate(...)
validate-report(...)
The first function returns a simple true or false, while the second generates an XML validation report. Both functions accept either one or two parameters:
validate-report($a) : Validate $a using the same catalog as used by the implicit validation process.
validate-report($a,$b) : Validate $a using $b.
Explanation of parameters:
| Parameter | Description |
|---|---|
| $a | XML document as |
| $b | xs:anyURI pointing to |
<report>
<status>invalid</status>
<time>62</time>
<message level="Error" line="12" column="15">cvc-complex-type.2.4.a: Invalid content was
found starting with element 'name'. One of '{"http://jmvanel.free.fr/xsd/addressBook":cname}' is expected.</message>
</report>
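The one- and two-parameter forms described above might be called from a query roughly as follows. This is a minimal sketch, not a definitive recipe: it assumes the functions are exposed through a validation module bound to the namespace shown (the source lists the function names without a prefix), that a document can be passed as an xs:anyURI, and that $b may point to a catalog file. The document path and the results wrapper element are hypothetical.

```xquery
xquery version "1.0";

(: Assumption: the validation functions are provided by a module with this
   namespace URI; the prefix "validation" is an arbitrary choice. :)
import module namespace validation = "http://exist-db.org/xquery/validation";

(: Hypothetical locations, for illustration only. :)
let $doc     := xs:anyURI("/db/mycollection/addressbook.xml")
let $catalog := xs:anyURI("xmldb:exist:///db/grammar/catalog.xml")

(: One-parameter form: uses the same catalog as implicit validation. :)
let $is-valid := validation:validate($doc)

(: Two-parameter form: validate $doc using the resource that $catalog points to. :)
let $report := validation:validate-report($doc, $catalog)

return
    <results>
        <valid>{ $is-valid }</valid>
        { $report }
    </results>
```

If the assumptions hold, the valid element carries the boolean result and the embedded report has the shape of the sample report shown above.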
The XML parser (Xerces) compiles all grammar files (DTD, XSD) when they are used. For efficiency, these compiled grammars are cached for reuse, which considerably increases validation speed. Under certain conditions (e.g. during grammar development), however, this cache must be cleared. Two grammar management functions are available:
clear-grammar-cache() : removes all cached grammars and returns the number of removed grammars
show-grammar-cache() : returns an XML report about all cached grammars
<?xml version='1.0'?>
<report>
<grammar type="http://www.w3.org/2001/XMLSchema">
<Namespace>http://www.w3.org/XML/1998/namespace</Namespace>
<BaseSystemId>file:/Users/guest/existdb/trunk/webapp//WEB-INF/entities/XMLSchema.xsd</BaseSystemId>
<LiteralSystemId>http://www.w3.org/2001/xml.xsd</LiteralSystemId>
<ExpandedSystemId>http://www.w3.org/2001/xml.xsd</ExpandedSystemId>
</grammar>
<grammar type="http://www.w3.org/2001/XMLSchema">
<Namespace>http://www.w3.org/2001/XMLSchema</Namespace>
<BaseSystemId>file:/Users/guest/existdb/trunk/schema/collection.xconf.xsd</BaseSystemId>
</grammar>
</report>
Note: the element BaseSystemId typically does not provide useful information.
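During grammar development, the two cache functions could be combined in a single query along these lines. This is a sketch under stated assumptions: the module namespace and prefix are not given in the text above, the cache-reset wrapper element is hypothetical, and the query relies on the let clauses being evaluated in the order written.

```xquery
xquery version "1.0";

(: Assumption: the grammar management functions live in the same
   validation module; the prefix "validation" is an arbitrary choice. :)
import module namespace validation = "http://exist-db.org/xquery/validation";

(: Take a snapshot of the cache before clearing it. :)
let $before  := validation:show-grammar-cache()

(: Clear the cache; the function returns the number of removed grammars. :)
let $removed := validation:clear-grammar-cache()

return
    <cache-reset>
        <cached-before>{ count($before//grammar) }</cached-before>
        <removed>{ $removed }</removed>
    </cache-reset>
```

The next validation after such a reset recompiles the grammars it needs, so edited schema files are picked up at the cost of one slower run.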
The interactive shell mode of the Java client provides a simple validate command that accepts arguments similar to those of the explicit validation functions.
This section provides a number of XML fragments demonstrating the required format of the XML documents. Note that a root element should always have a reference to a namespace.
The simplest reference to an XML schema. The xmlns declaration is used by the parser to resolve the grammar document.
<?xml version='1.0'?>
<addressBook xmlns="http://jmvanel.free.fr/xsd/addressBook">
.....
</addressBook>
xsi:schemaLocation provides additional information to the parser on how to resolve the grammar file. According to the XML schema specification, this information is considered a hint and may be ignored. eXist ignores this information; the grammar is resolved as in the previous example.
<?xml version='1.0'?>
<addressBook xsi:schemaLocation="http://jmvanel.free.fr/xsd/addressBook http://myshost/schema.xsd"
xmlns="http://jmvanel.free.fr/xsd/addressBook"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
.....
</addressBook>
Taken from: conf.xml. The xsi:noNamespaceSchemaLocation is honoured by the parser during implicit validation.
<?xml version='1.0'?>
<exist xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="schema/conf.xsd">
.....
</exist>
Taken from 'samples/validation/dtd'. eXist resolves the grammar by searching catalog files for the PUBLIC identifier.
<?xml version='1.0'?>
<!DOCTYPE PLAY PUBLIC "-//VALIDATION//EN" "hamlet.dtd">
<PLAY>
.....
</PLAY>
Tomcat has a long-standing bug which makes it impossible to register a custom protocol handler (object URLStreamHandler) with the JVM. The alternative is to register the object by setting the system property java.protocol.handler.pkgs, but this fails as well. As a result, the validation features are only partly usable in Tomcat. There are two alternatives: (1) switch to a recent version of Jetty, or (2) use absolute URLs pointing to the REST interface, e.g. http://localhost:8080/exist/rest/db/mycollection/schema.xsd.
eXist relies heavily on the features provided by the Xerces XML parser. By default, the eXist IzPack installer provides all required jar files. However, when eXist is installed in another container such as Tomcat, the required parser libraries need to be copied manually from the lib/endorsed directory into the server's 'endorsed' directory.
Required endorsed files: resolver-*.jar xalan-*.jar xml-apis.jar serializer-*.jar xercesImpl-*.jar
To prevent potential deadlocks, it is advisable to store XML instance documents and grammar documents in separate collections.