REST-Style Web API

(2Q19)


eXist-db provides a REST-style (or RESTful) API over HTTP, which offers a simple and quick way to access the database. To use this API, all one needs is an HTTP client, which nearly all programming languages and environments provide. Alternatively, simply use a web browser.

Introduction

In the standard eXist-db configuration, the system listens for REST requests at:

http://localhost:8080/exist/rest/

The server treats all HTTP request paths as paths to a database collection (instead of the file system). Relative paths are resolved relative to the database root collection. For instance:

http://localhost:8080/exist/rest/db/shakespeare/plays/hamlet.xml

The server will receive an HTTP GET request for the resource hamlet.xml in the collection /db/shakespeare/plays. It will look for this collection and check whether the resource is available. If so, it will retrieve its contents and send them back to the client. If the document does not exist, an HTTP 404 (Not Found) status response will be returned.
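As a minimal sketch of this behaviour, the following Python 3 snippet fetches a resource and turns a 404 response into None. The helper names resource_url and fetch_resource are illustrative only and not part of eXist-db; the base URL assumes the standard configuration shown above.

```python
import urllib.error
import urllib.parse
import urllib.request

# Base URL of the REST interface in the standard configuration.
BASE = "http://localhost:8080/exist/rest"


def resource_url(db_path):
    """Map a database path such as /db/a/b.xml to its REST URL."""
    # quote() keeps "/" as-is and percent-encodes characters such as spaces.
    return BASE + "/" + urllib.parse.quote(db_path.lstrip("/"))


def fetch_resource(db_path):
    """Return the resource content as bytes, or None on HTTP 404."""
    try:
        with urllib.request.urlopen(resource_url(db_path)) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise
```

With a running server, fetch_resource("/db/shakespeare/plays/hamlet.xml") would return the serialized document, provided it has been stored there.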

To keep the interface simple, the basic database operations are directly mapped to HTTP request methods wherever possible:

GET

Retrieves a resource or collection from the database. XQuery and XPath queries may also be specified using GET's optional parameters applied to the selected resource. See GET Requests.

PUT

Uploads a resource to the database. If required, collections are automatically created and existing resources overwritten. See PUT Requests.

DELETE

Removes a resource (document or collection) from the database. See DELETE Requests.

POST

Submits an XML fragment in the content of the request. This fragment specifies the action to take. The fragment can be either an XUpdate document or a query request. Query requests are used to pass complex XQuery expressions too large to be URL-encoded. See POST Requests.

When running eXist-db as a stand-alone server (that is, when the database has been started using the shell script bin/server.sh (Unix) or the batch file bin/server.bat (Windows/DOS)), HTTP access is supported through a simple, built-in web server. This web server has limited capabilities, restricted to the basic operations defined by eXist's REST API (GET, POST, PUT and DELETE).

When running in a servlet-context (the usual way of starting eXist-db), this same server functionality is provided by the EXistServlet.

Both the stand-alone server and the servlet rely on the Java class org.exist.http.RESTServer to do the actual work.

HTTP Authentication

Authentication is done through the basic HTTP authentication mechanism so only authenticated users can access the database. If no username and password are specified, the server assumes a "guest" user identity, which has limited capabilities. If the username submitted is not known, or an incorrect password is submitted, an error page (403 - Forbidden) is returned.
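A sketch of how a client might attach basic authentication credentials to a request is shown below. The helper names and the guest/guest credentials are placeholders chosen for illustration; substitute an account that exists in your database.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def authenticated_request(url, username, password):
    """Return a Request object that carries Basic credentials."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request
```

The returned object can be passed to urllib.request.urlopen(). Because basic authentication only base64-encodes the credentials, it offers no confidentiality on an unencrypted connection.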

GET Requests

If the server receives an HTTP GET request, it first checks the request for known parameters. If no parameters are given, or none of them is recognized, it tries to locate the collection or document specified in the URI database path and returns a representation of this resource to the client.

When the located resource is XML, the Content-Type of the response is application/xml; for binary resources it is application/octet-stream.

If the path resolves to a database collection, the retrieved results are returned as an XML fragment. For example:

<exist:result xmlns:exist="http://exist.sourceforge.net/NS/exist">
  <exist:collection name="/db/xinclude" owner="guest" group="guest" permissions="rwur-ur-u">
    <exist:resource name="disclaimer.xml" owner="guest" group="guest" permissions="rwur-ur--"/>
    <exist:resource name="sidebar.xml" owner="guest" group="guest" permissions="rwur-ur--"/>
    <exist:resource name="xinclude.xml" owner="guest" group="guest" permissions="rwur-ur--"/>
  </exist:collection>
</exist:result>
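A client can read such a listing with any namespace-aware XML parser. The sketch below uses Python's standard xml.etree.ElementTree on the sample fragment above; list_resources is a hypothetical helper name, and the code assumes the response contains exactly one exist:collection element.

```python
import xml.etree.ElementTree as ET

EXIST_NS = "http://exist.sourceforge.net/NS/exist"

# The collection listing shown above, as a client would receive it.
SAMPLE = """<exist:result xmlns:exist="http://exist.sourceforge.net/NS/exist">
  <exist:collection name="/db/xinclude" owner="guest" group="guest" permissions="rwur-ur-u">
    <exist:resource name="disclaimer.xml" owner="guest" group="guest" permissions="rwur-ur--"/>
    <exist:resource name="sidebar.xml" owner="guest" group="guest" permissions="rwur-ur--"/>
    <exist:resource name="xinclude.xml" owner="guest" group="guest" permissions="rwur-ur--"/>
  </exist:collection>
</exist:result>"""


def list_resources(listing_xml):
    """Return (collection name, [resource names]) from a listing."""
    root = ET.fromstring(listing_xml)
    collection = root.find(f"{{{EXIST_NS}}}collection")
    resources = collection.findall(f"{{{EXIST_NS}}}resource")
    return collection.get("name"), [r.get("name") for r in resources]


name, resources = list_resources(SAMPLE)
print(name, resources)
```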

If an xml-stylesheet processing instruction is found in a requested XML document, the database will try to apply this stylesheet before returning the document. A relative path is resolved relative to the location of the source document. For example, suppose the document hamlet.xml, which is stored in collection /db/shakespeare/plays, contains the XSLT processing instruction:

<?xml-stylesheet type="application/xml" href="shakes.xsl"?>

The database will load the stylesheet from /db/shakespeare/plays/shakes.xsl.
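The resolution rule can be modelled with plain path arithmetic. The function below is a hypothetical helper that only mimics the behaviour described here; it is not eXist-db's own code.

```python
import posixpath


def resolve_stylesheet(document_path, href):
    """Resolve a stylesheet href against the document's collection."""
    collection = posixpath.dirname(document_path)
    # join() leaves an absolute href untouched; normpath() collapses "..".
    return posixpath.normpath(posixpath.join(collection, href))


print(resolve_stylesheet("/db/shakespeare/plays/hamlet.xml", "shakes.xsl"))
```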

GET accepts the following optional request parameters (which must be URL-encoded):

_xsl=XSL Stylesheet

Applies an XSL stylesheet to the requested resource. A relative path is considered relative to the database root collection. This option will override any XSL stylesheet processing instructions found in the source XML file.

Setting _xsl to no disables any stylesheet processing. This is useful for retrieving unprocessed XML from documents that have a stylesheet declaration.

Warning:

If your document has a valid XSL stylesheet declaration, the web browser may still decide to apply the XSL. In this case, passing _xsl=no has no visible effect, though the XSL is now rendered by the browser, not eXist.

_query=XPath/XQuery Expression

Executes the query specified. The collection or resource referenced in the request path is added to the set of statically known documents for the query.

_indent=yes | no

Specifies whether to return indented, pretty-printed XML. The default value is yes.

_encoding=Character Encoding Type

Sets the character encoding for the resulting XML. The default value is UTF-8.

_howmany=Number of Items

Specifies the maximum number of items to return from the result sequence. The default value is 10.

_start=Starting Position in Sequence

Specifies the index position of the first item in the result sequence to return. The default value is 1.

_wrap=yes | no

Specifies whether the returned query results must be wrapped in a parent <exist:result> element. The default value is yes.

_source=yes | no

Specifies whether the query should display its source code instead of being executed. The default value is no. See the <allow-source> section in descriptor.xml about explicitly allowing this behaviour.

_cache=yes | no

If set to yes, the results of the current query are stored in a session on the server. A session id will be returned with the response. Subsequent requests can pass this session id via the _session parameter. If the server finds a valid session id, it will return the cached results instead of re-evaluating the query. See below.

_session=session id

Specifies a session id returned by a previous query request. Query results will be read from the cached session.

_release=session id

Releases the session identified by session id.

For example, the following URI finds all <SPEECH> elements in the collection /db/shakespeare that have "JULIET" as the <SPEAKER>. As specified, it returns at most 5 items from the result sequence, starting at position 3:

http://localhost:8080/exist/rest/db/shakespeare?_query=//SPEECH[SPEAKER=%22JULIET%22]&_start=3&_howmany=5
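Rather than encoding such a URI by hand, a client can let its HTTP library do the URL-encoding. The following sketch builds an equivalent request URI; query_url is an illustrative helper name, and the base URL assumes the standard configuration.

```python
import urllib.parse

BASE = "http://localhost:8080/exist/rest"


def query_url(collection, xquery, start=1, howmany=10):
    """Build a GET URL that runs xquery against a collection."""
    params = urllib.parse.urlencode(
        {"_query": xquery, "_start": start, "_howmany": howmany}
    )
    return f"{BASE}/{collection.strip('/')}?{params}"


print(query_url("/db/shakespeare", '//SPEECH[SPEAKER="JULIET"]', start=3, howmany=5))
```

urlencode() percent-encodes more characters than the hand-written URI above (for example "/" and "["), but the server decodes both forms to the same parameter values.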

PUT Requests

Documents can be stored or updated in the database using an HTTP PUT request. The request URI points to the location where the document must be stored. As defined by the HTTP specifications, an existing document at the specified path will be updated. Any collections defined in the path that do not exist are created automatically.

For example, the following Python script stores a document in a database collection. Both the collection (for example /db/test) and the file name are given on the command line. The collection is created if it does not exist. Note that the HTTP header field Content-Type is specified as application/xml, since otherwise the document would be stored as a binary resource.

import http.client
import os
import sys

# Usage: python store.py <collection> <file>
collection = sys.argv[1]
path = sys.argv[2]

# Read the file as bytes so that its declared encoding is preserved.
print("reading file %s ..." % path)
with open(path, 'rb') as f:
    xml = f.read()

# Use the file name without its directory as the resource name.
doc = os.path.basename(path)
print(doc)
print("storing document to collection %s ..." % collection)
con = http.client.HTTPConnection('localhost', 8080)
con.putrequest('PUT', '/exist/rest/%s/%s' % (collection.strip('/'), doc))
con.putheader('Content-Type', 'application/xml')
con.putheader('Content-Length', str(len(xml)))
con.endheaders()
con.send(xml)

response = con.getresponse()

# Any 2xx status code indicates that the document was stored.
if 200 <= response.status < 300:
    print("Ok.")
else:
    print('An error occurred: %d %s' % (response.status, response.reason))
con.close()

DELETE Requests

DELETE removes a collection or resource from the database.
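A minimal sketch of a DELETE call in Python 3 follows. The helper names are illustrative, and the path in the usage note is hypothetical. The sketch sends no credentials, so it only succeeds if the guest user is allowed to remove the target; otherwise an Authorization header must be added.

```python
import urllib.request

BASE = "http://localhost:8080/exist/rest"


def delete_request(db_path):
    """Build a DELETE request for a document or collection path."""
    return urllib.request.Request(f"{BASE}/{db_path.lstrip('/')}", method="DELETE")


def delete(db_path):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(delete_request(db_path)) as response:
        return response.status
```

For example, delete("/db/test/old.xml") would remove that document; urlopen() raises urllib.error.HTTPError if the server answers with an error status.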

POST Requests

POST requests require an XML fragment in the content of the request. This fragment specifies the action to take.

  • If the root node of the fragment uses the XUpdate namespace (http://www.xmldb.org/xupdate), the fragment is sent to the XUpdateProcessor to be processed.

  • Otherwise the root node must have the namespace for eXist requests (http://exist.sourceforge.net/NS/exist). The fragment is interpreted as an extended query request. Extended query requests can be used to post complex XQuery scripts that are too large to be encoded in a GET request.

The structure of the POST XML request is as follows:

<query xmlns="http://exist.sourceforge.net/NS/exist" start="[first item to be returned]" max="[maximum number of items to be returned]" cache="[yes|no: create a session and cache results]" session-id="[session id as returned by previous request]">
  <text>
    [XQuery expression]
  </text>
  <properties>
    <property name="[name1]" value="[value1]"/>
  </properties>
</query>

The root element query identifies the fragment as an extended query request. The XQuery expression for this request is enclosed in the text element. The start, max, cache and session-id attributes have the same meaning as the corresponding GET parameters (see GET Requests).

You may have to enclose the XQuery expression in a CDATA section (i.e. <![CDATA[ ... ]]>) to avoid parsing errors.

Optional output properties, such as pretty-print, can be passed in the <properties> element.
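As an alternative to writing the request by hand, the fragment can be generated with an XML library, which escapes markup characters in the query text and so makes a CDATA section unnecessary. The sketch below assumes Python 3 and its standard ElementTree module; build_query and query_request are hypothetical helper names.

```python
import urllib.request
import xml.etree.ElementTree as ET

EXIST_NS = "http://exist.sourceforge.net/NS/exist"

# Serialize the eXist namespace as the default namespace (xmlns="...").
ET.register_namespace("", EXIST_NS)


def build_query(xquery, start=1, maximum=10, properties=None):
    """Return an extended query request as UTF-8 encoded bytes."""
    root = ET.Element(
        f"{{{EXIST_NS}}}query", {"start": str(start), "max": str(maximum)}
    )
    text = ET.SubElement(root, f"{{{EXIST_NS}}}text")
    text.text = xquery  # "<", ">" and "&" are escaped on serialization
    props = ET.SubElement(root, f"{{{EXIST_NS}}}properties")
    for name, value in (properties or {}).items():
        ET.SubElement(props, f"{{{EXIST_NS}}}property", {"name": name, "value": value})
    return ET.tostring(root, encoding="utf-8")


def query_request(url, body):
    """Wrap the fragment in a POST request with an XML content type."""
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/xml"}, method="POST"
    )
```

A call such as build_query('//SPEECH[SPEAKER = "JULIET"]', maximum=20, properties={"indent": "yes"}) yields a body that can be sent with urllib.request.urlopen(query_request(url, body)).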

A Perl example of a POST request is provided below:

require LWP::UserAgent;

$URL = 'http://localhost:8080/exist/rest/db/';
$QUERY = <<END;
<?xml version="1.0" encoding="UTF-8"?>
<query xmlns="http://exist.sourceforge.net/NS/exist"
    start="1" max="20">
    <text>
        for \$speech in //SPEECH[LINE &= 'corrupt*']
        order by \$speech/SPEAKER[1]
        return
            <hit>{\$speech}</hit>
    </text>
    <properties>
        <property name="indent" value="yes"/>
    </properties>
</query>
END

$ua = LWP::UserAgent->new();
$req = HTTP::Request->new(POST => $URL);
$req->content_type('application/xml');
$req->content($QUERY);

$res = $ua->request($req);
if($res->is_success) {
    print $res->content . "\n";
} else {
    print "Error:\n\n" . $res->status_line . "\n";
}

The returned query results are enclosed in an <exist:result> element:

<exist:result xmlns:exist="http://exist.sourceforge.net/NS/exist" hits="2628" start="1" count="10">
  <SPEECH>
    <SPEAKER>
      BERNARDO
    </SPEAKER>
    <LINE>
      Who's there?
    </LINE>
  </SPEECH>
  ... more items follow ...
</exist:result>
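A client typically reads the hits, start and count attributes to decide whether more items remain. The sketch below parses an abbreviated copy of the response above; parse_result is a hypothetical helper. Because the attributes appear without a prefix in this example but may also be namespace-qualified, the helper accepts both forms.

```python
import xml.etree.ElementTree as ET

EXIST_NS = "http://exist.sourceforge.net/NS/exist"

# Abbreviated copy of the response above, reduced to its first item.
SAMPLE = """<exist:result xmlns:exist="http://exist.sourceforge.net/NS/exist" hits="2628" start="1" count="10">
  <SPEECH>
    <SPEAKER>BERNARDO</SPEAKER>
    <LINE>Who's there?</LINE>
  </SPEECH>
</exist:result>"""


def parse_result(result_xml):
    """Extract the paging attributes and the result items."""
    root = ET.fromstring(result_xml)

    def attr(name):
        # Accept both hits="..." and exist:hits="...".
        value = root.get(name)
        if value is None:
            value = root.get(f"{{{EXIST_NS}}}{name}")
        return int(value)

    return {
        "hits": attr("hits"),
        "start": attr("start"),
        "count": attr("count"),
        "items": list(root),
    }


result = parse_result(SAMPLE)
print(result["hits"], result["items"][0].findtext("SPEAKER"))
```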

Calling Stored XQueries

The REST interface supports executing stored XQueries on the server. If the target resource of a GET or POST request is a binary resource with the mime-type application/xquery, the REST server will try to compile and execute it as an XQuery script. The script has access to the entire HTTP context, including parameters and session attributes.
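The following sketch shows one way to store and then call such a script from Python 3. It rests on several assumptions: the path /db/test/hello.xq, the parameter name "name" and the script body are made up for illustration, and the server is assumed to take the resource's mime-type from the Content-Type header of the PUT request.

```python
import urllib.parse
import urllib.request

BASE = "http://localhost:8080/exist/rest"
SCRIPT_PATH = "/db/test/hello.xq"  # hypothetical location

# Illustrative script that echoes an HTTP request parameter.
SCRIPT = b"""xquery version "1.0";
declare namespace request="http://exist-db.org/xquery/request";
<greeting>Hello, {request:get-parameter("name", "world")}!</greeting>
"""


def store_script_request():
    """Build a PUT request that stores the script as an XQuery resource."""
    return urllib.request.Request(
        BASE + SCRIPT_PATH,
        data=SCRIPT,
        headers={"Content-Type": "application/xquery"},
        method="PUT",
    )


def call_url(**params):
    """Build the URL that executes the stored script with parameters."""
    query = urllib.parse.urlencode(params)
    return BASE + SCRIPT_PATH + ("?" + query if query else "")
```

After urllib.request.urlopen(store_script_request()) has succeeded (most likely with credentials added), a GET request to call_url(name="Juliet") would execute the script instead of returning its source.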

Stored XQueries are a good way to provide dynamic views on data or create small services. However, they can do more: because you can also store binary resources such as images, CSS stylesheets or JavaScript files in a database collection, it is entirely possible to serve a complex application out of the database. For instance, have a look at the example Using XQuery for Web Applications on the demo server.

Cached Query Results

When executing queries using GET or POST, the server is able to cache query results in a server-side session. These results are cached in memory.

Memory consumption will be low for query results which reference nodes stored in the database and high for nodes constructed within the XQuery itself.

To create a session and store query results, pass _cache=yes with a GET request or set attribute cache="yes" within the XML payload of a POST query request. The server will execute the query as usual. If the result sequence contains more than one item, the entire sequence will be stored into a newly created session.

The id of the created session is included in the response. For requests which return an <exist:result> wrapper element, the session id will be specified in the exist:session attribute. The session id is also available in the HTTP header X-Session-Id.

The following example shows the HTTP headers and the <exist:result> tag returned by the server:

HTTP/1.1 200 OK
Date: Thu, 01 May 2008 16:28:16 GMT
Server: Jetty/5.1.12 (Linux/2.6.22-14-generic i386 java/1.6.0_03
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Last-Modified: Tue, 29 Apr 2008 20:34:33 GMT
X-Session-Id: 2
Content-Type: application/xml; charset=UTF-8
Content-Length: 4699

<exist:result xmlns:exist="http://exist.sourceforge.net/NS/exist" 
    exist:hits="3406" exist:start="1" exist:count="10" 
    exist:session="2">
...
</exist:result>

The session id can be passed with subsequent requests to retrieve further chunks of data without re-evaluating the query. For a GET request, pass the session id with parameter _session. For a POST request, add an attribute session="sessionId" to the XML content of the request.

If the session does not exist or has timed out, the server will re-evaluate the query. The timeout is set to 2 minutes.

A session can be deleted by sending a GET request to an arbitrary collection URL. Pass the session id in the _release parameter:

http://localhost:8080/exist/rest/db?_release=0
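The whole session life cycle for GET requests can be sketched as three URL builders. The function names are illustrative. Later pages repeat the _query parameter on the assumption that the server needs it to re-evaluate the query when the session has expired.

```python
import urllib.parse

BASE = "http://localhost:8080/exist/rest"


def _url(collection, params):
    """Join a collection path and URL-encoded parameters."""
    return f"{BASE}/{collection.strip('/')}?{urllib.parse.urlencode(params)}"


def first_page_url(collection, xquery, howmany=10):
    """Run the query and ask the server to cache the results."""
    return _url(collection, {"_query": xquery, "_cache": "yes", "_howmany": howmany})


def next_page_url(collection, xquery, session_id, start, howmany=10):
    """Read a further chunk from the cached session."""
    return _url(
        collection,
        {"_query": xquery, "_session": session_id, "_start": start, "_howmany": howmany},
    )


def release_url(session_id):
    """Release the session through an arbitrary collection URL."""
    return _url("/db", {"_release": session_id})
```

After fetching first_page_url(...), a client would read the session id from the X-Session-Id response header, for instance with response.headers.get("X-Session-Id") when using urllib.request, and pass it to next_page_url() and finally to release_url().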