<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<!-- $Id$ --><html xmlns:exist="http://exist.sourceforge.net/NS/exist" xmlns:sidebar="http://exist-db.org/NS/sidebar">
<head>
<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>Google Summer of Code 2008</title>
<link rel="stylesheet" href="../../styles/SyntaxHighlighter.css" type="text/css">
<link href="../../styles/default-style.css" type="text/css" rel="stylesheet">
<script src="../../styles/niftycube.js" type="text/javascript"></script><script src="../../scripts/syntax/sh-min.js" type="text/javascript"></script><script type="text/javascript">
                    window.onload = function() {
                        Nifty("h1.chaptertitle", "transparent");
                        Nifty("div.note", "transparent");
                        Nifty("div.example", "transparent");
                        Nifty("div.important", "transparent");
                        Nifty("div.block div.head", "top");
                        Nifty("div.block ul", "bottom");
                        
                        dp.SyntaxHighlighter.HighlightAll('code');
                    }
                </script>
</head>
<body bgcolor="#FFFFFF">
<div id="page-head">
<img src="../../logo.jpg"><div id="quicksearch">
<form method="GET" action="">
<input name="q" size="20" type="text"><input value="Search" type="submit">
</form>
</div>
<div id="navbar">
<h1>Open Source Native XML Database</h1>
</div>
</div>
<div id="content2col">
<div class="chapter">
<h1 class="chaptertitle">
<a name="N10020"></a>Google Summer of Code 2008</h1>
<ul class="toc">
<li>
<a href="#N1003B">1. Timeline</a>
</li>
<li>
<a href="#N10068">2. Accepted Projects</a>
<ul>
<li>
<a href="#N1006D">2.1. Distributed Search for eXist</a>
</li>
<li>
<a href="#N10086">2.2. Implement an XSLT 2.0 Processor</a>
</li>
</ul>
</li>
<li>
<a href="#N100A0">3. Suggested Project Ideas</a>
<ul>
<li>
<a href="#N100A5">3.1. Remote Debugging Interface</a>
</li>
<li>
<a href="#N100CB">3.2. Implement an XSLT 2.0 Processor</a>
</li>
<li>
<a href="#N10109">3.3. Distributed Search</a>
</li>
<li>
<a href="#N10118">3.4. Improved dynamic configuration/process control</a>
</li>
<li>
<a href="#N10137">3.5. Add index-support for order-by, distinct-values and aggregate functions</a>
</li>
</ul>
</li>
<li>
<a href="#N1014A">4. Mentors</a>
</li>
</ul>
<p>Last year eXist participated in the Google Summer of Code for the first time. After
        what we feel was a successful summer, we have decided to apply again this year.</p>
<p>In 2007 we had two students, one working on an XQJ implementation and the other on Fulltext extensions for XQuery.
        The first version of the XQJ implementation is almost complete and waiting to be merged with the main code base. The
        Fulltext extensions for XQuery are waiting for our XQuery parser refactorings to be merged.</p>
<p>Suggested projects for 2008 are listed below; however, students may also propose their own projects. We
        suggest discussing your proposal with us before submitting your application to ensure that it is suitable and viable within the Google Summer of Code 2008 time frame.</p>
<p>For all questions concerning the Summer of Code,
        contact our GSoC administrators <a href="mailto:existadmin@gmail.com">eXistAdmin@gmail.com</a>, send an email to the
        exist-open <a href="http://sourceforge.net/mail/?group_id=17691">mailing list</a>
        or meet us in <a href="http://irc.exist-db.org/">IRC</a>. For short questions, IRC
        is the preferred medium.</p>
<h2>
<a name="N1003B"></a>1. Timeline</h2>
<ul>
   	    	
<li>3rd - 12th March - eXist applies to Google Summer of Code</li>
   	    	
<li>13th - 14th March - eXist's application considered by Google</li>
	    	
<li>14th - 26th March - Student Applications</li>
	    	
<li>26th March - 10th April - Appraisal and evaluation of student applications</li>
	    	
<li>11th April - Successful student applications announced</li>
	    	
<li>11th April - 27th May - Students learn about eXist community and project</li>
	    	
<li>28th May - Students begin coding</li>
	    	
<li>9th July - Students upload work in progress / Mentors start mid-term evaluation</li>
	    	
<li>16th July - Mentors finish mid-term evaluation</li>
	    	
<li>20th August - Students finish coding and start evaluation / Mentors start final evaluation</li>
	    	
<li>31st August - Students and Mentors finish final evaluation</li>
	    	
<li>3rd September - Students submit required code samples to Google</li> 
    	
</ul>
<h2>
<a name="N10068"></a>2. Accepted Projects</h2>
<h3>
<a name="N1006D"></a>2.1. <a href="http://code.google.com/soc/2008/exist/app.html?csaid=KxYWFA5bHTMMNhAkFAZQWi1LOxwJSVlTBnAGYEBXSVgFUnAHaksC%0A">Distributed Search for eXist</a>
</h3>
<p>
                
<ol>
                    
<li>Student: Sergej Rinc</li>
                    
<li>Mentor: Leif-J&ouml;ran Olsson</li>
                    
<li>
<a href="soc-dsrch.xml">Statement of Work</a>
</li>
                
</ol>
            
</p>
<h3>
<a name="N10086"></a>2.2. <a href="http://code.google.com/soc/2008/exist/app.html?csaid=FRIKHDlQXSsENl0mEgdQXSACLTMDHgpYX28GNx5eQ14DB3UDaUpeXl8BUHAGPUtc%0A">Implement an XSLT 2.0 Processor</a>
</h3>
<p>
                
<ol>
                    
<li>Student: Leela Manoranjan</li>
                    
<li>Mentor: Pierrick Brihaye</li>
                    
<li>
<a href="soc-xslt.xml">Statement of Work</a>
</li>                  
                
</ol>
            
</p>
<h2>
<a name="N100A0"></a>3. Suggested Project Ideas</h2>
<h3>
<a name="N100A5"></a>3.1. Remote Debugging Interface</h3>
<p>XQuery programs can get quite complex (scripts with more than 1000 lines are not uncommon), especially
	if they use a lot of modules. However, debugging the code is currently a tedious, time-consuming job due to the lack
	of tool support. While some commercial XML editors do already include XQuery debuggers (e.g. Oxygen), eXist lacks an appropriate
	debugging API to interface with them.</p>
<p>A remote debugging API should be implemented on top of the eXist server. This should at least include the
	ability to stop XQuery execution at predefined breakpoints, inspect the current query context and switch into single-step
	execution. A basic command-line or graphical debugging interface should be shipped with eXist. The Oxygen team
	has already expressed interest in supporting eXist from their commercial XQuery debugger.</p>
<p>
<em>Resources:</em>
</p>
<ul>
	  
<li>
	    
<p>See the debugging API in <a href="http://saxon.sf.net/">Saxon</a>
</p>
	  
</li>
	  
<li>
	    
<p>XQuery debugger in <a href="http://www.oxygenxml.com/xquery_debugger.html">Oxygen</a> (commercial product)</p>
	  
</li>
	
</ul>
<h3>
<a name="N100CB"></a>3.2. Implement an XSLT 2.0 Processor</h3>
<p>eXist aims to be compliant with the XQuery 1.0 specification. It would be valuable to implement its "sister" recommendation, XSLT 2.0, as well,
	thus allowing XSLT 2.0 processing on (possibly huge) persistent documents. Most of the code is already in place, since both recommendations are built on top of XPath 2.0. </p>
<p>However, this is still to be implemented:</p>
<ul>
	  
<li>
	    
<p>Clean separation of XPath 2.0 and XQuery 1.0 code. 
	    eXist used to have a dedicated package for XPath: it has to be revived, and the XQuery 1.0 specific classes have to be moved to a dedicated package. 
	    Functions, including the experimental grouping ones (whose performance has to be improved), have to be moved as well.</p>
	  
</li>
	  
<li>
	    
<p>Write a dedicated XSLT 2.0 frontend to the existing XQuery 1.0 parser that would be used to build the expression tree.</p>
	  
</li>
	  
<li>
	    
<p>Attention should be paid to performance concerns. Recent code is definitely friendlier to the programmer with regard to performance. Implementing an XSLT 2.0 processor could help bring even more improvements in this area.</p>
	  
</li>
	
</ul>
<p>
<em>Resources:</em>
</p>
<ul>
	  
<li>
	    
<p>
<a href="http://www.w3.org/TR/xslt20/">XSL Transformations (XSLT) Version 2.0</a>. W3C Recommendation.</p>
	  
</li>
	  
<li>
	    
<p>
<a href="http://monet.nag.co.uk/xq2xml//xsltest-20061026.zip">XQ2XML: XML syntaxes for XQuery</a>. A test suite by David Carlisle that provides an XSLT 2.0 syntax for some of the <a href="http://www.w3.org/XML/Query/test-suite/">XQuery test suite</a> tests.</p>
	  
</li>
	
</ul>
<h3>
<a name="N10109"></a>3.3. Distributed Search</h3>
<p>Implement a federated search service over distributed eXist databases. There are various reasons why users may
	have more than one database instance deployed, for example, to distribute load or to keep sensitive data in its own
	data store. Another important area of application would be in the context of grid computing.</p>
<p>Unfortunately, there's no simple way to combine results from distributed data stores in a single XQuery. eXist's
	query engine can only operate on local resources. It can retrieve data from external locations, but only to parse them into a
	local DOM tree, which is then used for querying. A distributed search facility would allow eXist to directly forward parts of an expression to a remote database
	instance. The XQuery specification already provides the necessary framework: the collection() and doc() functions
	both accept arbitrary URIs, so collections as well as resources can be at external locations.</p>
<p>The main challenge will be to properly merge intermediate results from different database instances and track references
	to remote node sequences throughout the query.</p>
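<p>The merge step described above could look roughly like the following Java sketch. It is only an illustration under stated assumptions: <code>FederatedMergeSketch</code>, <code>RemoteNodeRef</code> and the opaque string node ids are hypothetical names invented for this page and are not part of eXist. Each intermediate result is tagged with the instance it came from, so that later steps of the query can be routed back to the right database.</p>

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types exist in eXist.
class FederatedMergeSketch {

    /** A node reference that remembers which database instance owns the node. */
    static final class RemoteNodeRef {
        final String instanceUri; // base URI of the owning instance
        final String nodeId;      // opaque id, meaningful only on that instance

        RemoteNodeRef(String instanceUri, String nodeId) {
            this.instanceUri = instanceUri;
            this.nodeId = nodeId;
        }

        @Override
        public String toString() {
            return instanceUri + "#" + nodeId;
        }
    }

    /** Merges per-instance results into one sequence, tagging every item with its origin. */
    static List<RemoteNodeRef> merge(Map<String, List<String>> resultsByInstance) {
        List<RemoteNodeRef> merged = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : resultsByInstance.entrySet()) {
            for (String nodeId : entry.getValue()) {
                merged.add(new RemoteNodeRef(entry.getKey(), nodeId));
            }
        }
        return merged;
    }

    /** Regroups references by owner so a follow-up step needs one request per instance. */
    static Map<String, List<String>> groupByInstance(List<RemoteNodeRef> refs) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (RemoteNodeRef ref : refs) {
            grouped.computeIfAbsent(ref.instanceUri, uri -> new ArrayList<>()).add(ref.nodeId);
        }
        return grouped;
    }
}
```

<p>The sketch simply concatenates results in the order the instances are listed. A real implementation would also have to define an ordering across instances and handle instances that are unreachable.</p>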
<h3>
<a name="N10118"></a>3.4. Improved dynamic configuration/process control</h3>
<p>Currently, most database parameters are configured in a central XML configuration file. This file is
        read only once, during database startup. While many parameters should indeed not be changed at runtime, there are a few
        settings which could be modified without requiring a db restart. Examples include the current cache size settings, job scheduling,
        or index plugins. However, eXist does not currently provide an interface to modify settings at runtime. Some read-only settings are
        already exposed via Java Management Extensions (JMX), though.</p>
<p>The goal of the project would be to provide a common interface to dynamically configure certain aspects of the database
        instance at runtime. Ideally, access to this interface should be provided via JMX. Additionally, the existing JMX MBeans should be extended
        to provide more control over jobs running on the db instance. An administrator should be able to view all running queries or 
        jobs, and modify their access permissions or even kill a process.
        </p>
<ul>
            
<li>
                
<p>
<a href="http://jcp.org/aboutJava/communityprocess/final/jsr003/index3.html">JSR 003</a>
</p>
            
</li>
            
<li>
                
<p>
<a href="http://java.sun.com/javase/technologies/core/mntr-mgmt/javamanagement/">JMX resources</a>
</p>
            
</li>
        
</ul>
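<p>To make the idea of a writable JMX setting more concrete, here is a minimal Java sketch that uses only the standard <code>javax.management</code> API. The class <code>RuntimeConfigSketch</code>, the interface <code>CacheSettingsMBean</code>, the attribute <code>CacheSizeMb</code> and the object name <code>org.example.sketch:type=CacheSettings</code> are assumptions made for illustration; they do not describe eXist's existing MBeans.</p>

```java
import java.lang.management.ManagementFactory;
import javax.management.Attribute;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

// Hypothetical sketch: the names below are illustrative and not part of eXist.
class RuntimeConfigSketch {

    /** Management interface: the getter/setter pair becomes a writable attribute "CacheSizeMb". */
    public interface CacheSettingsMBean {
        int getCacheSizeMb();
        void setCacheSizeMb(int sizeMb);
    }

    /** Holds one setting that may change while the database is running. */
    static final class CacheSettings implements CacheSettingsMBean {
        private volatile int cacheSizeMb;

        CacheSettings(int initialSizeMb) {
            this.cacheSizeMb = initialSizeMb;
        }

        @Override
        public int getCacheSizeMb() {
            return cacheSizeMb;
        }

        @Override
        public void setCacheSizeMb(int sizeMb) {
            if (sizeMb <= 0) {
                throw new IllegalArgumentException("cache size must be positive");
            }
            this.cacheSizeMb = sizeMb;
        }
    }

    /** Registers the settings object, changes it through JMX, and returns the value read back. */
    static int resizeViaJmx(int initialSizeMb, int newSizeMb) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        CacheSettings settings = new CacheSettings(initialSizeMb);
        try {
            ObjectName name = new ObjectName("org.example.sketch:type=CacheSettings");
            server.registerMBean(new StandardMBean(settings, CacheSettingsMBean.class), name);
            try {
                // This is what a JMX console does when an administrator edits the attribute.
                server.setAttribute(name, new Attribute("CacheSizeMb", newSizeMb));
                return (Integer) server.getAttribute(name, "CacheSizeMb");
            } finally {
                server.unregisterMBean(name);
            }
        } catch (JMException e) {
            throw new IllegalStateException("JMX operation failed", e);
        }
    }
}
```

<p>In a running server the MBean would stay registered so that a JMX console can connect to it. The sketch unregisters it immediately only to keep the example self-contained.</p>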
<h3>
<a name="N10137"></a>3.5. Add index-support for order-by, distinct-values and aggregate functions</h3>
<p>Sorting a set of nodes is a frequent operation in many XQuery applications. The "order by" clause in XQuery is
            very powerful and allows the definition of an arbitrary number of ordering specifications to be applied to the tuple stream
            returned by a FLWOR expression.</p>
<p>However, ordering is quite expensive: for each tuple in the return sequence we have to evaluate all ordering expressions once
                and atomize the result, i.e. transform it into an atomic sequence. Atomization requires access to the actual node stored
                in the db, thus generating a huge amount of IO. As a result, "order by" expressions should always be applied with care. Query execution times will increase linearly with
                the size of the return sequence.</p>
<p>To improve this, eXist should at least provide indexed access to the atomized values needed for the ordering.
                Unfortunately, the existing index structures cannot be used directly: the range index maps atomized node values to
                a sequence of node ids, while order by would need to order node ids by their node value. So either the existing range
                index has to be extended to support value lookups by node id, or a new index structure has to be implemented.</p>
<p>Other XQuery operations could benefit from such an index as well: this includes the aggregate functions min, max
            and sum, as well as distinct-values.</p>
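<p>The difference between the two lookup directions can be illustrated with a small Java sketch. It rests on several assumptions: plain in-memory maps stand in for the on-disk index, node ids are simple <code>long</code> values, atomized values are compared as strings, and the class <code>OrderIndexSketch</code> with all of its methods is hypothetical rather than an existing eXist API.</p>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: in-memory maps stand in for eXist's on-disk index structures.
class OrderIndexSketch {
    // Range-index direction: atomized value -> ids of the nodes holding it.
    private final TreeMap<String, List<Long>> nodesByValue = new TreeMap<>();
    // Reverse direction needed by "order by": node id -> atomized value.
    private final Map<Long, String> valueByNode = new HashMap<>();

    void put(long nodeId, String value) {
        nodesByValue.computeIfAbsent(value, v -> new ArrayList<>()).add(nodeId);
        valueByNode.put(nodeId, value);
    }

    /** What a value-to-nodes index answers: which nodes have this value? */
    List<Long> lookup(String value) {
        return nodesByValue.getOrDefault(value, Collections.emptyList());
    }

    /** Sorts node ids by value without loading the nodes; assumes every id is indexed. */
    List<Long> orderBy(List<Long> nodeIds) {
        List<Long> sorted = new ArrayList<>(nodeIds);
        sorted.sort(Comparator.comparing(valueByNode::get)); // stable: ties keep input order
        return sorted;
    }

    /** distinct-values over a node set, again answered from the index alone. */
    SortedSet<String> distinctValues(List<Long> nodeIds) {
        SortedSet<String> values = new TreeSet<>();
        for (Long id : nodeIds) {
            values.add(valueByNode.get(id));
        }
        return values;
    }
}
```

<p>With the reverse map in place, a minimum or maximum over a node set is just the first or last element of the set returned by <code>distinctValues</code>. Typed comparison of atomic values, which XQuery requires, is left out of this sketch.</p>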
<h2>
<a name="N1014A"></a>4. Mentors</h2>
<p>The following people are available as mentors. Once your project has been accepted,
        you will be assigned one or two mentors to support you directly; however, all mentors will provide support where required.</p>
<ul>
            
<li>
                
<p>Adam Retter (GSoC Administrator)</p>
            
</li>
            
<li>
                
<p>Wolfgang Meier</p>
            
</li>
            
<li>
                
<p>Leif-J&ouml;ran Olsson</p>
            
</li>
            
<li>
                
<p>Dannes Wessels</p>
            
</li>
            
<li>
                
<p>Pierrick Brihaye</p>
            
</li>
            
<li>
                
<p>Andrzej Jan Taramina</p>
            
</li>
            
<li>
                
<p>Piotr Kaminski</p>
            
</li>
        
</ul>
</div>
<div class="authors">
<div class="author first">Adam Retter<br>
</div>
</div>
</div>
</body>
</html>
