How to use WAQL-PP

This guide introduces the general architecture of the preprocessor and shows how to use it in common scenarios. It is divided into three chapters: parsing and transforming a query, resolving Data Dependencies inside a query, and extending the printer pipeline to format objects.

The central API of the preprocessor is the so-called PreprocessorEngine, an object that carries the state of your WAQL query through the different preprocessing steps and also allows you to modify that state. A WAQL query goes through three steps, in the following order.

  1. Parsing: Reads and parses the WAQL query in textual form from the input and constructs an intermediate representation out of it. The input is read as a whole and any syntactical or grammatical anomalies will be discovered during this phase.
  2. Resolving of data dependencies: At this point the engine has generated a list of all unresolved data dependencies contained in the WAQL query. Those dependencies have to be resolved by the application before the actual transformation can be done.
  3. Transformation: Writes the final XQuery (without any WAQL extensions) to the output. This is the last phase in the preprocessor life-cycle.

The following chapters explain how to take the engine through these steps and what the implications for your WAQL query are when doing so.

1. Parsing and transforming a query

In this first section of the guide we are dealing with a WAQL query that contains several Template Lists but is free of any Data Dependencies. As an example let’s look at the following query.

<method>
   <type>$(<call/>,<mail/>)</type>
   <id>$1(1 to 3)</id>
   <name>$1("Oliver","John","Mark")</name>
</method>

The first Template List in line 2 contains two direct element constructors. This illustrates that the elements inside a Template List can be arbitrary WAQL expressions of any kind. The next two Template Lists in lines 3 and 4 are correlated because they are annotated with the same integer identifier right after the dollar sign. This means that their elements will be used in pairs rather than individually. It also requires them to have the same length (see footnote 1). The final result of this query will be a set of six XML documents: two types combined with three id/name pairs.
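To make the pairing rule more tangible, the expansion of the example query can be reproduced by hand in plain Java. This is only an illustrative sketch and does not use the preprocessor at all; the class name TemplateListExpansion and its expand() method are made up for this guide.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written expansion of the example query; not part of WAQL-PP.
class TemplateListExpansion {

    static List<String> expand() {
        List<String> types = List.of("<call/>", "<mail/>");      // $( ... ), uncorrelated
        List<Integer> ids = List.of(1, 2, 3);                     // $1( ... ), first list of set 1
        List<String> names = List.of("Oliver", "John", "Mark");   // $1( ... ), second list of set 1

        List<String> documents = new ArrayList<>();
        // An uncorrelated list is combined with every other choice.
        for (String type : types) {
            // Correlated lists share one index, driven by the first list of the set,
            // so ids.get(i) is always paired with names.get(i).
            for (int i = 0; i < ids.size(); i++) {
                documents.add("<method><type>" + type + "</type><id>" + ids.get(i)
                        + "</id><name>" + names.get(i) + "</name></method>");
            }
        }
        return documents;
    }

    public static void main(String[] args) {
        for (String document : expand()) {
            System.out.println(document);
        }
    }
}
```

The outer loop contributes two choices and the shared index contributes three, which gives the six documents mentioned above instead of the eighteen that three independent lists would produce.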

First we instantiate a new PreprocessorEngine, which will perform the necessary steps. Note that each engine instance is usually responsible for just one query and should be thrown away afterwards; reusing engine objects is not recommended. The instantiation can be accomplished with the PreprocessorFactory like this.

PreprocessorEngine engine = PreprocessorFactory.getEngine();

Next we can let the engine perform the parsing step which will read the given query as a whole, construct an internal intermediate representation and report any syntactical or grammatical anomalies in the query. If the query is given as a String object, the following snippet will accomplish this task.

InputStream input = new ByteArrayInputStream(query.getBytes());
try {
    engine.parse(input);
} catch (MalformedQueryException e) {
    // Perform error handling for anomalies in the query ...
}

Finally we can transform the constructed intermediate representation back into a valid XQuery without any further WAQL language extensions. You might also want to have the final result stored inside a String object, as done with the following lines.

OutputStream output = new ByteArrayOutputStream();
try {
    engine.transform(output);
} catch (UnresolvedDependencyException e) {
    // Will only happen if Data Dependencies are present in the query ...
}
String result = output.toString();
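If your application works with String objects on both ends, the two stream conversions above can be wrapped into one helper. The following is a rough sketch under stated assumptions: StreamEngine is a hypothetical stand-in that only mirrors the parse and transform calls shown above (it is not the real PreprocessorEngine type and declares IOException instead of the engine's own exceptions), and the choice of UTF-8 is an assumption, since this guide does not say which encoding the engine expects.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class StringPreprocessing {

    /** Hypothetical stand-in for the two engine calls; not the WAQL-PP API. */
    interface StreamEngine {
        void parse(InputStream input) throws IOException;
        void transform(OutputStream output) throws IOException;
    }

    /** Feeds a query string through the engine and returns the result as a string. */
    static String preprocess(StreamEngine engine, String query) {
        try {
            // Naming the charset avoids depending on the platform default.
            engine.parse(new ByteArrayInputStream(query.getBytes(StandardCharsets.UTF_8)));
            ByteArrayOutputStream output = new ByteArrayOutputStream();
            engine.transform(output);
            return new String(output.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Toy engine that echoes its input, so the helper can be tried without WAQL-PP. */
    static StreamEngine echoEngine() {
        return new StreamEngine() {
            private byte[] buffered = new byte[0];
            public void parse(InputStream input) throws IOException {
                buffered = input.readAllBytes();
            }
            public void transform(OutputStream output) throws IOException {
                output.write(buffered);
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(preprocess(echoEngine(), "<id>$(1 to 3)</id>"));
    }
}
```

With the real engine you would call engine.parse and engine.transform directly inside such a helper and handle MalformedQueryException and UnresolvedDependencyException as shown in the snippets above.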

Now that we know how to perform the parsing and transformation steps, let’s look at the final result the preprocessor produces. As mentioned earlier, the query below will result in six XML documents when fed into a third-party XQuery engine.

for $_waql_2 in (<call/>,<mail/>)
let $_waql_1_1 := ("Oliver","John","Mark")
for $_waql_1 at $_waql_1_cnt in (1 to 3)
return (
<method>
   <type>{$_waql_2}</type>
   <id>{$_waql_1}</id>
   <name>{$_waql_1_1[$_waql_1_cnt]}</name>
</method>
)

Now we know how to transform WAQL queries not containing any Data Dependencies.

2. Resolving Data Dependencies inside a query

In this next example we want to take a look at Data Dependencies and how they can be resolved with the preprocessor. Note that the preprocessor only discovers those dependencies in the query; retrieving the actual data to resolve them with is the responsibility of the application. Once the data to resolve a specific dependency is collected, it can be fed back into the preprocessor engine. Let’s look at the following WAQL query.

<result>
    $23{//person[@id=${//user/id/text()} and @name=$7{//user/name/text()}]}
</result>

This WAQL query contains three Data Dependencies: one outermost dependency containing an XPath expression, and two dependencies nested inside that expression. Two of the dependencies have an integer identifier attached to them, which can be used to refer to a specific data source in your application. The first observation is that those dependencies have to be resolved from innermost to outermost. If you parse this query as described in the previous chapter, the engine will report two available Data Dependencies (i.e. the two nested ones) through the getDependencies() method.

Collection<DataDependency> deps = engine.getDependencies();
for (DataDependency dep : deps)
	System.out.printf(">> #%d, %s\n", dep.getIdentifier(), dep.getRequest());

Output of the snippet:

>> #null, //user/id/text()
>> #7, //user/name/text()

Only after those two Data Dependencies have been resolved by calling the resolveDependency() method will the outermost one appear in the list of available dependencies. The reason is that the XPath expression inside the outermost one is incomplete until then. Let’s say we resolve the first dependency with the integer value 42 and the second one with the string data “John”. We then get an intermediate WAQL query looking like the following.

<result>
    $23{//person[@id=42 and @name='John']}
</result>

Note that this WAQL query is never actually produced; only the intermediate representation inside the preprocessor engine is adapted accordingly. This should illustrate why calling getDependencies() just once is insufficient to discover all dependencies. Instead, the getter should be called again after dependencies have been resolved. There are several patterns to achieve this, such as using a Set or a Queue to manage unresolved dependencies. One simplified scenario might be the following.

SortedSet<DataDependency> deps = new TreeSet<DataDependency>();
deps.addAll(engine.getDependencies());
while (!deps.isEmpty()) {
    DataDependency dep = deps.first();
    Object data = myFindData(dep.getIdentifier(), dep.getRequest());
    engine.resolveDependency(dep, data);
    deps.remove(dep);
    deps.addAll(engine.getDependencies());
}
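The Queue pattern mentioned above can be sketched in a similar way. The following example is a minimal, self-contained illustration and not code from the preprocessor: the three callbacks stand in for engine.getDependencies(), your own data lookup and engine.resolveDependency(), and the toy model in demo() only imitates the behaviour described in this chapter (the outermost dependency becomes available once both nested ones are resolved). The sketch also assumes that dependency objects have usable equals() and hashCode() implementations, which this guide does not confirm for DataDependency.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.Function;
import java.util.function.Supplier;

class DependencyWorklist {

    /** Resolves dependencies until no new ones are reported; returns the resolution order. */
    static <D> List<D> resolveAll(Supplier<Collection<D>> available,
                                  Function<D, Object> lookup,
                                  BiConsumer<D, Object> resolve) {
        Deque<D> queue = new ArrayDeque<>();
        Set<D> seen = new HashSet<>();
        List<D> order = new ArrayList<>();
        enqueueNew(available.get(), queue, seen);
        while (!queue.isEmpty()) {
            D dep = queue.poll();
            resolve.accept(dep, lookup.apply(dep));
            order.add(dep);
            // Resolving may have completed an enclosing dependency, so ask again.
            enqueueNew(available.get(), queue, seen);
        }
        return order;
    }

    /** Adds only those dependencies that were never queued before. */
    private static <D> void enqueueNew(Collection<D> deps, Deque<D> queue, Set<D> seen) {
        for (D dep : deps) {
            if (seen.add(dep)) {
                queue.add(dep);
            }
        }
    }

    /** Toy model: "outer" is reported only after "id" and "name" have been resolved. */
    static List<String> demo() {
        Set<String> resolved = new HashSet<>();
        Supplier<Collection<String>> available = () -> {
            List<String> open = new ArrayList<>();
            for (String inner : List.of("id", "name")) {
                if (!resolved.contains(inner)) {
                    open.add(inner);
                }
            }
            if (open.isEmpty() && !resolved.contains("outer")) {
                open.add("outer");
            }
            return open;
        };
        return resolveAll(available, dep -> "data for " + dep, (dep, data) -> resolved.add(dep));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Compared to the SortedSet variant, the queue resolves dependencies in the order in which they were discovered and does not require them to be comparable, at the price of the extra set that remembers what has already been queued.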

If we finally resolve the outermost dependency with some data object, let’s say an org.w3c.dom.Element object containing the personal data, all dependencies are resolved and we can continue with the transformation step as described in the previous chapter. This will produce a final XQuery looking like the following.

<result>
    <person id="42" name="John">Some personal data!</person>
</result>

Now we know how to resolve Data Dependencies inside our WAQL queries.

3. Extending the printer pipeline to format objects

In this last chapter we want to customize the resolving process a bit. Up to this point we only used well-known data object types to resolve the Data Dependencies with. For example, we used java.lang.String, java.lang.Integer and org.w3c.dom.Element in the previous chapter. The preprocessor engine knows how to convert those into valid XQuery expressions. But how about a custom data type like the following?

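public class MyDataPair {
	private Integer id;
	private String name;

	// Accessors used by the printer later in this chapter.
	public Integer getId() { return id; }
	public String getName() { return name; }
}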

One could use the usual toString() method to do the conversion, but that would not be flexible at all. Fortunately the preprocessor provides a mechanism to solve this issue. The conversion of an object into its textual representation is done by the Printer Pipeline, a list (see footnote 2) of predefined DataPrinter instances which all handle different object types. It is easy to extend the predefined Printer Pipeline by adding a new DataPrinter instance at the top of the list with the following lines.

DataPrinter printer = new MyDataPairPrinter();
engine.addDataPrinter(printer);
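The lookup rule of such a pipeline (first matching printer wins, with a toString() fall-back at the end) can be illustrated with a small stand-alone model. Everything in this sketch is hypothetical: Printer, Pipeline, Pair and addOnTop are names invented for the illustration and are not the WAQL-PP types, and the real pipeline may well format values differently.

```java
import java.util.ArrayList;
import java.util.List;

class PrinterPipelineSketch {

    /** Hypothetical stand-in for a printer; not the WAQL-PP DataPrinter interface. */
    interface Printer {
        boolean canHandle(Object object);
        String print(Object object, Pipeline pipeline);
    }

    /** Ordered list of printers; the first one that can handle an object wins. */
    static class Pipeline {
        private final List<Printer> printers = new ArrayList<>();

        Pipeline() {
            // Fall-back at the end of the list: accepts anything and uses toString().
            printers.add(new Printer() {
                public boolean canHandle(Object object) { return true; }
                public String print(Object object, Pipeline pipeline) {
                    return String.valueOf(object);
                }
            });
        }

        /** New printers go to the top so they take precedence over existing ones. */
        void addOnTop(Printer printer) {
            printers.add(0, printer);
        }

        String print(Object object) {
            for (Printer printer : printers) {
                if (printer.canHandle(object)) {
                    return printer.print(object, this);
                }
            }
            throw new IllegalStateException("no printer found"); // not reached with the fall-back
        }
    }

    /** Simple custom type used only in this sketch. */
    static class Pair {
        final Integer id;
        final String name;
        Pair(Integer id, String name) { this.id = id; this.name = name; }
    }

    static Pipeline pipelineWithPairPrinter() {
        Pipeline pipeline = new Pipeline();
        pipeline.addOnTop(new Printer() {
            public boolean canHandle(Object object) { return object instanceof Pair; }
            public String print(Object object, Pipeline p) {
                Pair pair = (Pair) object;
                // Hand the parts back to the pipeline instead of formatting them here.
                return "<pair><id>" + p.print(pair.id) + "</id><name>"
                        + p.print(pair.name) + "</name></pair>";
            }
        });
        return pipeline;
    }

    public static void main(String[] args) {
        System.out.println(pipelineWithPairPrinter().print(new Pair(23, "John")));
    }
}
```

Because the custom printer sits at the top of the list, it sees Pair objects before the fall-back does, while all other objects still fall through to the printers further down.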

The implementation of the DataPrinter can easily rely on the already defined printers by recursively calling into the pipeline. This is demonstrated in the following example, which converts our MyDataPair into a well-known org.w3c.dom.Element type, which in turn is fed into the pipeline again.

public class MyDataPairPrinter implements DataPrinter {
	public boolean canHandle(Object object) {
		return (object instanceof MyDataPair);
	}
	public String printAsText(Object object, DataPrinter pipeline) {
		MyDataPair pair = (MyDataPair) object;
		Element ePair = createElement("pair");
		Element eId = createElement("id");
		Element eName = createElement("name");
		eId.setTextContent(String.valueOf(pair.getId())); // setTextContent expects a String
		eName.setTextContent(pair.getName());
		ePair.appendChild(eId);
		ePair.appendChild(eName);
		return pipeline.printAsText(ePair);
	}
}

With the above printer you can resolve the Data Dependency in the following WAQL query with a MyDataPair object and it will be transformed correctly.

<result>
    ${//people/self}
</result>

The transformation then produces the following result.

<result>
    <pair><id>23</id><name>John</name></pair>
</result>

Now we know how to customize the conversion of data objects while resolving Data Dependencies. Having read this guide you should now be able to use the WAQL Preprocessor in your application. Have fun coding!

Footnotes

1 To be more specific, it requires the first Template List in a correlated set to be shorter than or equal in length to all other Template Lists in the same set.

2 The Printer Pipeline iterates through the list until it reaches a DataPrinter which can handle the given object type. The pipeline then uses the output of that printer. Note that the last DataPrinter can handle all java.lang.Object objects and will use the usual toString() method as a fall-back for unknown object types.