Production Use - Good Practice

(1Q20)


We have learned a number of lessons from developing and using eXist-db in production environments. This Good Practice guide is an attempt to cover some of the considerations that should be taken into account when deploying eXist-db for use in a production environment.

The concepts laid out within this document should not be considered absolute or accepted wholesale - they should rather be used as suggestions to guide users in their eXist-db deployments.

The Server

  • Ensure that your server is up-to-date and patched with any necessary security fixes.

  • eXist-db is written in Java, so for performance and security reasons please ensure that you run the latest JDK release that is compatible with your version of eXist-db. The latest version can always be found at: http://java.sun.com and the recommended major version for a given eXist-db release can be found at: https://bintray.com/existdb/releases/exist#read

  • For dockerized production systems we strongly recommend using the semantic-version tag of an official release, e.g. 5.3.0, instead of the release or latest tags.

Install from Source or Release?

Most users will install an officially released version of eXist-db on their production systems and usually this is perfectly fine. However, for production systems there can be advantages to installing eXist-db from source code.

eXist-db may be built from source code (see Building eXist-db) and installed on a production system in one of two ways:

Via Local Build Machine (preferred)

You check out the eXist-db code for a release branch (or the develop branch) from our GitHub repository to a local machine. From here you build a distribution, which you test and then deploy to your live server.

Directly from GitHub

In this case you don't use a local machine for building an eXist-db distribution. Instead, you check out the code from a release branch (or the develop branch) directly from our GitHub repository on your server and build it in situ.

Some advantages of installing eXist-db from source code are:

Patches

If patches or fixes are developed that are relevant to your specific needs, you can update your code and re-build eXist.

Features

If you are following trunk and new features are developed which you are interested in, you can update your code and re-build to take advantage of these.

Warning:

eXist's code trunk is generally not recommended for production use! Although it should always compile and be relatively stable, it may also contain as yet unrecognised regressions or result in unexpected behaviour.

Upgrading

If you are upgrading the version of eXist-db that you use in your production system, please always follow these two points:

Backup

Always make sure you have a full database backup before you upgrade.

Test

Always test your application in the new version of eXist-db in a development environment to ensure expected behaviour before you upgrade your production system.

Configuring eXist

There are four main things to consider here:

Security - Permissions

Ensure that eXist-db is installed in a secure manner.

Security - Attack Surface

Configure eXist-db so it provides only what you need for your application.

Resources

Configure your system and eXist-db so that eXist-db has access to enough resources and the system starts and stops eXist-db in a clean manner.

Performance

Configure your system and eXist-db so that you get the maximum performance possible.

Security - Permissions

The permission requirements for development and deployment servers are rather different. Here we explain what to look out for starting with the default configuration.

eXist-db Permissions

eXist-db ships with fairly relaxed permissions to facilitate rapid application development. However, for production systems these should be constrained:

admin account

The password of the admin account is blank by default! Ensure that you set a strong password.

default permissions

The default permissions for creating resources and collections in eXist-db are 0666 for resources, and 0777 for collections. From these default permissions, the user's umask is subtracted (its bits are masked out) to give the permissions assigned to new resources and collections. By default each new user has the umask 022, which leads to new resources having the mode 0644, and collections 0755. You may wish to modify the umask of some of your users to further restrict the default permissions when they create resources and collections.
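
The arithmetic described above can be sketched in shell. This is only an illustration: the function name effective_mode is hypothetical and not part of eXist-db, and the 027 umask in the last call is an example value, not a recommendation from this guide.

```shell
#!/bin/sh
# Hypothetical helper (not part of eXist-db): apply a umask to a base mode.
# Both arguments are octal strings such as 0666 and 022.
effective_mode() {
    base=$1
    mask=$2
    # A leading 0 forces octal interpretation; the umask bits are then cleared.
    printf '%04o\n' $(( 0$base & ~0$mask ))
}

effective_mode 0666 022   # new resources with the default umask
effective_mode 0777 022   # new collections with the default umask
effective_mode 0666 027   # a stricter, purely illustrative umask
```

With the default umask the first two calls reproduce the modes quoted above (0644 and 0755).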

/db permissions

The default permissions for /db are 0755, which should be sufficient in most cases. Should you need to change this, you can do so as follows (here setting 0775):

sm:chmod(xs:anyURI("/db"), "rwxrwxr-x")

Operating System Permissions

eXist-db should be deployed and configured to run whilst following the security best practices of the operating system on which it is deployed.

Typically we would recommend creating an exist user account and exist user group with no login privileges (no shell and no usable password), changing the ownership of the eXist-db installation to that user and group, and then running eXist-db using those credentials. An example of this on OpenSolaris might be:

$ pfexec groupadd exist
$ pfexec useradd -c "eXist Native XML Database" -d /home/exist -g exist -m exist
$ pfexec chown -R exist:exist /opt/eXist
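
After changing ownership, it can be worth confirming that nothing under the installation directory still belongs to another account. The following is a minimal sketch using POSIX find; the function name owned_entirely_by is hypothetical, and /opt/eXist and exist are simply the names used in the example above.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every entry under DIR is owned by USER.
# Returns 2 if DIR is missing or find itself fails (e.g. unknown user).
owned_entirely_by() {
    dir=$1
    user=$2
    [ -d "$dir" ] || return 2
    # find prints every entry NOT owned by $user; empty output means all good.
    strays=$(find "$dir" ! -user "$user" -print 2>/dev/null) || return 2
    [ -z "$strays" ]
}

# Example with the names used above (uncomment on a real installation):
# owned_entirely_by /opt/eXist exist || echo "unexpected ownership under /opt/eXist" >&2
```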

Security - Attack Surface

For any live application it is best practice to keep the attack surface of the application as small as possible. There are three aspects to this:

  1. Limiting means of arbitrary code execution.

  2. Reducing the application itself to the absolute essentials.

  3. Limiting access routes to the application.

eXist-db is no exception and should be configured for your production systems so that it provides only what you need and no more. For example, the majority of applications will be unlikely to require the WebDAV or SOAP Admin features for operation in a live environment. These and other services can be disabled easily.

Means for anonymous users to execute arbitrary code require special attention. There are three means of code execution in eXist, which make sense during development, but should be reconsidered for production systems:

Java binding

The ability to execute Java code from inside the XQuery processor is disabled by default in conf.xml:

<xquery enable-java-binding="no"/>

It is strongly recommended to keep it disabled on production systems.

XML external entities

In order to ensure a secure environment, the external-general-entities, external-parameter-entities, and secure-processing feature flags should be set in conf.xml:

<parser>
  <xml>
    <features>
      <feature name="http://xml.org/sax/features/external-general-entities" value="false"/>
      <feature name="http://xml.org/sax/features/external-parameter-entities" value="false"/>
      <feature name="http://javax.xml.XMLConstants/feature/secure-processing" value="true"/>
    </features>
  </xml>
</parser>

REST server

We recommend preventing eXist's REST server from directly receiving web requests, and using only URL Rewriting to control code execution. The REST server feature is enabled by default in $EXIST_HOME/etc/webapp/WEB-INF/web.xml. Changing the param-value of the hidden parameter to true allows you to filter requests via your own XQuery controller.

<init-param>
  <param-name>hidden</param-name>
  <param-value>true</param-value>
</init-param>

The following options allow a more fine-grained control over the REST server's functionality:

XQuery submissions

We recommend restricting the REST server's ability to execute XQuery code to authenticated users, by modifying $EXIST_HOME/etc/webapp/WEB-INF/web.xml:

<init-param>
  <param-name>xquery-submission</param-name>
  <param-value>authenticated</param-value>
</init-param>

XUpdate statements

In addition, we recommend restricting the REST server's ability to execute XUpdate statements. Simply modify $EXIST_HOME/etc/webapp/WEB-INF/web.xml by changing the param-value from enabled to disabled:

<init-param>
  <param-name>xupdate-submission</param-name>
  <param-value>disabled</param-value>
</init-param>
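
As a rough way to confirm that the REST server settings above are actually present in a deployed web.xml, a text-level check can be scripted. The sketch below is an assumption-laden illustration, not an eXist-db tool: the function name init_param_is is hypothetical, and because it ignores XML structure it will also match a setting that sits inside a comment.

```shell
#!/bin/sh
# Hypothetical helper: succeed if FILE contains an init-param NAME whose
# param-value is VALUE. All whitespace is stripped first, so the check works
# whether the elements are written on one line or spread over several.
init_param_is() {
    file=$1
    name=$2
    value=$3
    [ -r "$file" ] || return 2
    tr -d ' \t\r\n' < "$file" |
        grep -Fq "<param-name>$name</param-name><param-value>$value</param-value>"
}

# Example (uncomment on a real installation):
# init_param_is "$EXIST_HOME/etc/webapp/WEB-INF/web.xml" xupdate-submission disabled \
#     || echo "XUpdate submissions are not disabled" >&2
```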

Further considerations for a live environment:

Services

eXist-db provides services for accessing the database. You should reduce these to the absolute minimum you need for your production application. This is done via etc/webapp/WEB-INF/web.xml. You should look at each configured service, servlet or filter and ask yourself: do we use this? Most production environments are unlikely to need WebDAV.

Extension Modules

eXist-db loads several XQuery and Index extension modules by default. You should modify the <builtin-modules> section of etc/conf.xml and only load what you need for your application.

If you make use of the Cache Module, you should make sure that it has either a maximumSize or expireAfterAccess bound configured. This ensures that the Cache cannot consume all memory.

Resources

You should ensure that you have enough memory and disk space in your system so that eXist-db can cope with peak demands.

-Xmx

However you decide to deploy and start eXist-db, please ensure that you allocate enough maximum memory to it using the Java -Xmx setting. See bin/backup.sh and bin/startup.sh.

cacheSize and collectionCache

These two settings in <db-connection> of etc/conf.xml should be adjusted appropriately based on your -Xmx setting (see above). See the tuning guide for advice on sensible values.

Disk space

Please ensure that you have plenty of space for your database to grow. Unsurprisingly, running out of disk space can result in database corruption or in having to roll back the database to a known state.
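
A simple free-space check, run from your monitoring or scheduler, can give early warning before the disk fills up. The following is a minimal sketch using POSIX df; the function name has_free_kb, the path and the 10 GiB threshold are all assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the filesystem holding PATH has at least
# MIN_KB kilobytes available. `df -Pk` gives a stable, one-line-per-filesystem
# layout in which column 4 is the available space in 1024-byte blocks.
has_free_kb() {
    path=$1
    min_kb=$2
    avail_kb=$(df -Pk "$path" 2>/dev/null | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$min_kb" ]
}

# Example: warn when less than about 10 GiB remain (arbitrary threshold):
# has_free_kb /opt/eXist 10485760 || echo "low disk space for eXist-db" >&2
```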

Performance

Keeping the eXist-db application, data and journal on separate disks, connected to different I/O channels, can have a positive impact on performance. The location of the data files and journals can be changed in etc/conf.xml.

In addition, to gain the best possible performance with eXist-db 5.0.0 or newer, it may be beneficial to disable Lock Event Tracking in the Lock Table. The Lock Table can be disabled in the etc/conf.xml configuration file.

Backups

This is fundamental: Make sure you have them, that they are up-to-date and that a restore is possible!

eXist-db provides three different mechanisms for performing backups:

  1. Full database backup.

  2. Differential database backup.

  3. Snapshot of the database data files.

Each of these backup mechanisms can be scheduled, either with eXist-db or with your operating system scheduler. See the backup article and conf.xml for further details.
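
Whichever mechanism and scheduler you choose, a small freshness check can catch a schedule that has silently stopped producing backups. The sketch below assumes that backups are written as files into a single directory; the function name has_recent_backup and the example path are hypothetical. It does not verify that a restore is possible, which still needs to be tested separately.

```shell
#!/bin/sh
# Hypothetical helper: succeed if DIR contains at least one regular file
# modified within the last DAYS days (find's -mtime counts whole 24h periods).
has_recent_backup() {
    dir=$1
    days=$2
    [ -d "$dir" ] || return 2
    recent=$(find "$dir" -type f -mtime "-$days" -print | head -n 1)
    [ -n "$recent" ]
}

# Example (path is an assumption; adjust to your backup location):
# has_recent_backup /var/backups/exist 1 || echo "no backup in the last day" >&2
```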

Web Deployments

eXist-db, like any Web Application Server, should not be directly exposed to the Web. Instead, we strongly recommend proxying eXist-db through a Web Server such as Nginx or Apache HTTPD. See here for further details.

If you proxy eXist-db through a Web Server, you can also configure your firewall to allow external access to the Web Server only. If done correctly, this means that web users will not be able to access any eXist-db services directly, except your application, which is proxied into the Web Server's namespace.

Enable GZip Compression

eXist-db by default operates inside the Jetty Application Server. Jetty (and most other Java Application Servers) provides a mechanism for enabling dynamic GZip compression of resources. In other words: Jetty can be configured to dynamically GZip compress any resource received from the server by HTTP. Enabling dynamic GZip compression can reduce the size of transfers, and as such reduce the transfer time of resources from the server to the client, hopefully resulting in a faster experience for the end-user.

GZip Compression can be enabled in web.xml, which can be found in either $EXIST_HOME/etc/webapp/WEB-INF/web.xml for default deployments or $EXIST_HOME/etc/jetty/standalone/WEB-INF/web.xml for standalone deployments.
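
Once compression has been enabled, one way to confirm that it is active is to inspect the response headers for a request that advertises gzip support. The following is a minimal sketch: the function name is_gzip_encoded is hypothetical, and the URL and port in the usage line are assumptions about a local default installation.

```shell
#!/bin/sh
# Hypothetical helper: read HTTP response headers on stdin and succeed if
# they declare gzip content encoding (header names are case-insensitive).
is_gzip_encoded() {
    tr -d '\r' | grep -iq '^content-encoding:.*gzip'
}

# Example against a local instance (URL and port are assumptions):
# curl -s -o /dev/null -D - -H 'Accept-Encoding: gzip' http://localhost:8080/exist/ \
#     | is_gzip_encoded && echo "gzip compression is active"
```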