WAQL-PP: Preprocessor for a Data Aggregation Query Language
This week I started to design and implement a preprocessor for the Web-service Aggregation Query Language (WAQL) which is an extension of XQuery. This language is used as part of the WS-Aggregation framework developed at the Distributed Systems Group of the Vienna University of Technology. With this text I want to explain the motivation behind WAQL and how the preprocessor will be designed. The motivation is nicely stated as part of my task description.
The key idea of WAQL is that it provides a convenience syntax for XQuery, which otherwise tends to become complex and hardly comprehensible in bigger scenarios. WAQL queries are transformed into valid XQuery expressions, which are finally executed by a (third-party) XQuery engine.
First of all we need to get a grasp of what the WAQL extensions to the XQuery language are. Since WAQL is still in its experimental stages, there is no exact specification of the language and it may change or grow over time. At the moment WAQL consists of two language constructs:
- Template Lists: This extension tries to simplify the specification of generated inputs. It basically is syntactical sugar representing a XQuery for-loop construct and as such can be transformed easily.
- Data Dependencies: This second extension is the interesting one, it can express dependencies between several different queries. The framework has to identify these dependencies and execute the queries in a valid order, so that all dependencies can be resolved.
The above two constructs should explain why the actual transformation has to be split into several phases which can be triggered by the framework at different points in time. The separate steps the preprocessor has to perform are as follows:
- Parsing: The textual WAQL query is parsed and an intermediate representation is constructed. Since WAQL is an extension which enhances the set of expressions for the XQuery language, the actual parser has to understand the full XQuery grammar. This may sound like a lot of work, but the XQuery specification provides a detailed description of the grammar in about 140 EBNF rules. So defining a valid parser is a doable job.
- Resolving of data dependencies: At this point the preprocessor has generated a list of all unresolved data dependencies. However the preprocessor has no idea which other queries are linked to the one currently being processed. So the actual resolving has to be done by the framework, the preprocessor just adapts the intermediate representation to the data provided by the framework.
- Transformation: Once all dependencies have been resolved the intermediate representation can be transformed back into a textual XQuery (without any WAQL extensions), which can then be passed on to a third-party XQuery engine.
Now that the basic operations are defined, we are able to give a rough description of the WAQL preprocessor and how it can be embedded into the existing framework. The two basic modules are a generated parser (obviously performing the parsing step) and a driving engine (performing the resolving and transformation steps). The parser will most certainly be generated using the JavaCC parser generator. The below graphic should explain the architecture.
Note that the above explanation is written from the compiler-constructor point of view, it just covers the preprocessor as part of the framework. All the other nasty details of WS-Aggregation are beyond the scope of this text. If you are interested you should read the paper or contact Waldemar Hummer who was kind enough to explain it to me. Also I will continue to write about the ongoing development of the preprocessor, so stay tuned.