When deploying eXist in a production environment, I really want to make sure that the database is in a consistent state and that potential problems are detected as early as possible. Even if the database is running well, bad things can happen that are outside of eXist's reach (e.g. an OutOfMemoryError in the servlet container, which can be fatal).

eXist 1.2.1 will thus offer an automatic consistency and sanity checker. Its main job is to detect inconsistencies or damage in the core database files: the document and collection storage (dom.dbx, collections.dbx) and the symbol table (symbols.dbx). While all the indexes can be rebuilt after a crash, corruption in the core files can lead to real data loss.

The tool will first check the collection hierarchy, then scan through the stored node tree of every document in the db, testing node properties such as the node's id, child count, attribute count and node relationships. Unlike normal database operations, the checker examines each dbx file independently. This means that even if a collection is no longer readable, the tool will still be able to scan the documents in the damaged collection (this becomes important for emergency backups, see below).
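To make the per-document checks more concrete, here is a minimal Java sketch of the kind of structural tests described above. It is not eXist's code: the StoredNode model, the "parent id plus position" numbering rule and the NodeTreeChecker class are hypothetical simplifications. The point is only that each node's recorded child count, attribute count, id and parent link can be verified against the tree as it is actually read back.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified node model; eXist's real on-disk DOM layout differs.
// Each stored node records the counts it claims to have, and the checker
// compares those claims with what is actually reachable in the tree.
class StoredNode {
    final String id;                 // simplified DLN-style id, e.g. "1.2.3"
    final int declaredChildren;      // child count as recorded in storage
    final int declaredAttributes;    // attribute count as recorded in storage
    StoredNode parent;
    final List<StoredNode> children = new ArrayList<>();
    final List<String> attributes = new ArrayList<>();

    StoredNode(String id, int declaredChildren, int declaredAttributes) {
        this.id = id;
        this.declaredChildren = declaredChildren;
        this.declaredAttributes = declaredAttributes;
    }

    StoredNode addChild(StoredNode child) {
        child.parent = this;
        children.add(child);
        return child;
    }
}

class NodeTreeChecker {
    // Walks the tree depth-first and returns a list of human-readable problems.
    static List<String> check(StoredNode root) {
        List<String> errors = new ArrayList<>();
        checkNode(root, null, errors);
        return errors;
    }

    private static void checkNode(StoredNode node, StoredNode expectedParent, List<String> errors) {
        if (node.parent != expectedParent) {
            errors.add(node.id + ": parent link does not match tree position");
        }
        if (node.children.size() != node.declaredChildren) {
            errors.add(node.id + ": declared " + node.declaredChildren
                    + " children, found " + node.children.size());
        }
        if (node.attributes.size() != node.declaredAttributes) {
            errors.add(node.id + ": declared " + node.declaredAttributes
                    + " attributes, found " + node.attributes.size());
        }
        for (int i = 0; i < node.children.size(); i++) {
            StoredNode child = node.children.get(i);
            // Assumption: the i-th child's id is the parent id + "." + (i + 1).
            String expectedId = node.id + "." + (i + 1);
            if (!child.id.equals(expectedId)) {
                errors.add(child.id + ": expected node id " + expectedId);
            }
            checkNode(child, node, errors);
        }
    }
}
```

Because the checker only reads the stored structure and compares it with itself, it needs no serialization step, which fits the observation that checking is much cheaper than exporting.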

Checking documents is very fast, much faster than serializing or exporting the data.

Emergency Backups

The consistency check service is coupled with a new emergency backup class, called SystemExport. If an error is reported by the consistency checker, the service will immediately trigger an emergency export of the entire database. Unlike the standard backup/restore, the SystemExport class operates at a much lower level of the db, directly accessing the core database files.

It uses the information provided by the consistency check to work around damage in the db. SystemExport tries to export as much data as possible, even if parts of the collection hierarchy are corrupted or documents are damaged. Features:

  • Descendant collections will be exported properly even if their ancestor collection is corrupted
  • Documents which are intact but belong to a destroyed collection will be stored into a special collection /db/lost_and_found
  • Damaged documents are detected and excluded from the backup
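The three rules above can be illustrated with a small, hypothetical planning step. This Java sketch is not the SystemExport implementation; the Doc record, the set of damaged collections and the plan method are assumptions made for illustration. It shows how intact documents from a destroyed collection end up under /db/lost_and_found, how damaged documents are dropped, and why descendant collections keep their paths.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical input: what a consistency check might report about each document.
record Doc(String collection, String name, boolean damaged) {}

class EmergencyExportPlan {
    static final String LOST_AND_FOUND = "/db/lost_and_found";

    // Maps target path -> document for everything that will be exported.
    // Only the damaged collection itself is rerouted, so descendant collections
    // such as /db/a/b keep their own path even when /db/a is damaged.
    static Map<String, Doc> plan(List<Doc> docs, Set<String> damagedCollections) {
        Map<String, Doc> out = new LinkedHashMap<>();
        for (Doc d : docs) {
            if (d.damaged()) {
                continue; // damaged documents are left out of the backup
            }
            String target = damagedCollections.contains(d.collection())
                    ? LOST_AND_FOUND   // intact document, destroyed collection
                    : d.collection();
            // Note: name clashes inside lost_and_found are not handled in this sketch.
            out.put(target + "/" + d.name(), d);
        }
        return out;
    }
}
```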

The format of the emergency backup is compatible with the standard backup/restore tools.

Testing the Consistency Check and Export Tools

A standalone version of the consistency check and export tools can be launched with

bin/run.sh org.exist.backup.ExportGUI
[Screenshot: ExportGUI]

The main purpose of this utility is to create emergency backups from a database that can no longer be started normally. The utility opens the database in embedded mode, so you have to shut down any running eXist instance before starting the scan; otherwise you will get an exception.

For every check run, an error report is written to the directory specified in "Output Directory". If you click on "Check Export", the utility also exports the database into a zip file in the same directory. This backup can be restored via the standard backup/restore tools (see the backup/restore documentation for more info).

Scheduling the Service

To run the consistency checker in a live environment, it needs to be set up as a system task. System tasks are launched by eXist's core scheduler. The scheduler waits for all active transactions to complete, then starts the system task. New transactions are blocked while the task is running. The system task can be configured in conf.xml (it is also possible to schedule it via XQuery). Add the following job definition to the scheduler section:
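The "wait for active transactions, then block new ones" behaviour resembles a fair read/write lock: ordinary transactions share the read side, and the system task takes the write side. The Java sketch below models that idea with ReentrantReadWriteLock from java.util.concurrent; the TransactionGate class is hypothetical, and eXist's scheduler and transaction manager may implement this quite differently.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical model of the scheduling rule, not eXist's implementation.
class TransactionGate {
    // fair = true: once the system task is waiting for the write lock,
    // newly arriving transactions queue behind it instead of overtaking it.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);

    // Ordinary transactions share the read lock and may run concurrently.
    <T> T inTransaction(Supplier<T> work) {
        lock.readLock().lock();
        try {
            return work.get();
        } finally {
            lock.readLock().unlock();
        }
    }

    // The system task needs the exclusive write lock: it waits until all
    // active transactions have released the read lock, and blocks new ones
    // until it has finished.
    void runSystemTask(Runnable task) {
        lock.writeLock().lock();
        try {
            task.run();
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Fairness matters here: with a non-fair lock, a steady stream of new transactions could keep delaying the consistency check.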

<job type="system" class="org.exist.storage.ConsistencyCheckTask"
     cron-trigger="0 0 0/12 * * ?">
    <!-- the output directory. paths are relative to the data dir -->
    <parameter name="output" value="sanity"/>
    <parameter name="backup" value="no"/>
</job>

This will launch a consistency check every 12 hours, at midnight and at noon. If the "backup" parameter is set to "yes", a complete backup of the system will be triggered after each check run. If it is set to "no", a backup will only be created if consistency errors are detected.
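For readers unfamiliar with the cron syntax: the expression uses Quartz-style fields (seconds, minutes, hours, day of month, month, day of week), so "0 0 0/12 * * ?" fires at second 0, minute 0, every 12 hours starting at hour 0. The Java sketch below is a deliberately tiny, hypothetical interpreter for just that hour field, together with the backup decision described above. It is not a general cron parser; in eXist the actual cron handling is done by the scheduler.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, only for illustrating the job definition above.
class ConsistencySchedule {
    // Next fire time for an hour field of the form "start/step" with seconds
    // and minutes fixed at 0, e.g. "0/12" fires at 00:00 and 12:00.
    static LocalDateTime nextRun(LocalDateTime after, int startHour, int stepHours) {
        if (stepHours <= 0) {
            throw new IllegalArgumentException("step must be positive");
        }
        LocalDateTime candidate = after.truncatedTo(ChronoUnit.HOURS);
        if (!candidate.isAfter(after)) {
            candidate = candidate.plusHours(1); // fire times must be strictly later
        }
        // Advance hour by hour until the hour matches start, start+step, ...
        while (candidate.getHour() < startHour
                || (candidate.getHour() - startHour) % stepHours != 0) {
            candidate = candidate.plusHours(1);
        }
        return candidate;
    }

    // backup="yes": export after every run; backup="no": only when errors were found.
    static boolean shouldExport(String backupParam, boolean errorsFound) {
        return "yes".equals(backupParam) || errorsFound;
    }
}
```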

Using JMX for Remote Access

I also thought about sending out an email to the admin if a consistency check fails. However, I finally chose a more generic approach based on JMX and notifications. If needed, it should be easy to implement your own email alerter class which registers for the JMX notification.

[Screenshot: jconsole]

If Java Management Extensions (JMX) are enabled in the Java VM that is running eXist, you can use a JMX client to see the latest consistency check reports. The screenshot shows jconsole, which is included with the Java 5 and 6 JDKs.

You can also subscribe to the notifications published by the SanityReport MBean to be informed of sanity check results. Please consult the JMX documentation on how to configure JMX.
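As an illustration of the email alerter idea, the following Java sketch implements a standard javax.management NotificationListener that collects alert messages instead of sending mail. SanityAlerter is hypothetical, and the ObjectName of eXist's SanityReport MBean and its notification types are not given here, so look them up in a JMX client such as jconsole before subscribing.

```java
import java.util.ArrayList;
import java.util.List;
import javax.management.MBeanServerConnection;
import javax.management.Notification;
import javax.management.NotificationListener;
import javax.management.ObjectName;

// Hypothetical alerter: it records messages; a real one would send an email.
class SanityAlerter implements NotificationListener {
    final List<String> alerts = new ArrayList<>();

    @Override
    public void handleNotification(Notification notification, Object handback) {
        alerts.add(notification.getType() + ": " + notification.getMessage());
    }

    // Registers a new alerter with a local or remote MBean server.
    // The ObjectName of the SanityReport MBean is an input you must supply.
    static SanityAlerter subscribe(MBeanServerConnection conn, ObjectName sanityReport)
            throws Exception {
        SanityAlerter alerter = new SanityAlerter();
        conn.addNotificationListener(sanityReport, alerter, null, null);
        return alerter;
    }
}
```

To subscribe remotely, obtain an MBeanServerConnection via JMXConnectorFactory.connect(new JMXServiceURL(...)).getMBeanServerConnection() and pass it to subscribe; the service URL depends on how JMX is configured for the JVM running eXist.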