Creating, Importing, Querying, and Saving XML Documents with Semantic XML

XML is one of the most widely used formats to exchange data between applications. In order to lift this XML-based data into a format that is processable by semantic technology tools, a suitable translation mechanism is required.

Semantic XML is a technology from TopQuadrant for mapping arbitrary XML documents to RDF. It can be used to convert XML files to OWL so that you can run SPARQL queries and similar operations on them. You can also manipulate the resulting OWL model and then save it back to XML. To control the Semantic XML mappings, you can use process ontologies, which can usually be generated by importing XML Schemas. (For information on importing XML Schemas, see the XSD to OWL Importer.)

The following example screenshot shows how a simple example XML file has been rendered into OWL using TopBraid:

[Screenshot: XMap Example]

How does Semantic XML work?

Semantic XML is based on a small ontology that defines various classes and attributes to enable round-tripping between XML and OWL. In particular, it introduces two annotation properties, sxml:element and sxml:attribute, that are used to define the mapping. When an XML file is opened with TopBraid, it will apply the following mapping strategy:

If a process ontology is used to control the Semantic XML mapping, then the following mapping strategy will be applied:

For more information on process ontology mapping strategies, see the XSD to OWL Importer. If no XML Schema is available, it is also possible to build a process ontology manually or with SPIN rules.
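To make the general idea concrete, here is a minimal Python sketch of an XML-to-triples mapping. It is not TopBraid's implementation, and it rests on assumptions that go beyond this page: one class per distinct tag name (recorded with sxml:element), one property per attribute name (recorded with sxml:attribute), and composite:child links from parent to child. The ex: prefix, the ex:r1, ex:r2 resource names, the tag-attribute property naming and the function name xml_to_triples are all hypothetical. Text content and namespaces are ignored.

```python
import xml.etree.ElementTree as ET

# Hypothetical prefix for generated classes, properties and instances.
EX = "ex:"


def xml_to_triples(xml_text):
    """Return a list of (subject, predicate, object) string tuples.

    Illustrative only: a simplified stand-in for a Semantic XML style
    mapping, not the algorithm TopBraid actually applies.
    """
    triples = []
    counter = 0

    def add(triple):
        # Schema-level triples repeat for every element; keep one copy.
        if triple not in triples:
            triples.append(triple)

    def visit(element, parent_id):
        nonlocal counter
        counter += 1
        node_id = f"{EX}r{counter}"
        class_id = f"{EX}{element.tag}"
        # Assumed rule: one class per tag name, annotated with sxml:element.
        add((class_id, "sxml:element", element.tag))
        add((node_id, "rdf:type", class_id))
        # Assumed rule: one property per attribute, annotated with sxml:attribute.
        for name, value in element.attrib.items():
            prop_id = f"{EX}{element.tag}-{name}"
            add((prop_id, "sxml:attribute", name))
            add((node_id, prop_id, value))
        if parent_id is not None:
            add((parent_id, "composite:child", node_id))
        for child in element:
            visit(child, node_id)

    visit(ET.fromstring(xml_text), None)
    return triples


if __name__ == "__main__":
    for triple in xml_to_triples('<library><book id="b1"/><book id="b2"/></library>'):
        print(triple)
```

A process ontology plays the role of the class and property triples here: if they already exist, an importer can reuse them instead of generating new ones.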

Importing XML Files

When you double-click an XML file in the Project Explorer, Eclipse is usually configured to open it in an XML editing mode. To tell TopBraid Composer to open the file as Semantic XML, right-click it and pick Open With, which displays a cascade menu. On the cascade menu, pick TopBraid (Semantic XML Documents).

Any workspace file ending with .xml can be opened this way. The system will automatically convert its XML contents into Semantic XML format. The XML file can also be imported into other OWL ontologies using the Imports View. This enables you to define customized OWL mappings. With the example XML file above, you could define a separate process ontology that has sxml:element, sxml:attribute and/or sxml:tag annotations attached to it, and then import the XML instances file into an ontology that imports the process ontology as well. The importer will then not generate new classes on the fly, but will try to reuse the existing ones from the process ontology. You can use the XML Schema Import engine to create such ontologies with associated SXML annotations.

When you open an .xml file, the system will ask you whether you have a schema with Semantic XML annotations somewhere in your workspace. This schema could be the result of a previous execution of the XML Schema importer (as explained in Import XSD and corresponding XML) or of generalizing the SXML created by opening a similar XML file in the past. If you do have such a file, you can select it and specify whether you want this schema to be used for this .xml file only, for all files in the same folder, or for all files in the project. When you finish this dialog, the system will create a file ending with .sxml in your workspace. If such a file is present, you will not be asked about the schema file again; instead, the SXML loader will automatically instantiate the existing classes.

Some file types have been fine-tuned to have an optimized mapping to Semantic XML format. Currently, these are XHTML and XML Schema. For example, files ending with .html can be opened directly with TopBraid and will be instantiated into a dedicated XHTML ontology. The HTML importer will run an HTML tidy algorithm if the HTML file is not well-formed XHTML.

Viewing and querying Semantic XML models

Although the results of the Semantic XML importer are normal RDF resources, TopBraid provides some customized capabilities to display XML elements. In particular, you can use the Associations View to view (and edit) the XML hierarchy. In order to open it, right-click on composite:child in the Properties View, and select Show in Associations View. As shown in the middle of the screenshot above, the Associations View renders the XML element instances in an XML-like notation, so that it becomes easier to see the resulting XML structure.

Another way of exploring the XML elements and their relationships is through the form. The items listed under composite:child can be recursively expanded using the small + button over the icon.

All the generic RDF querying and inferencing capabilities of TopBraid Composer are available to analyze the XML elements. For example, you can use the SPARQL view to find all XML elements that match a given pattern.
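As a rough illustration of what "matching a pattern" means over such a model, the following Python sketch evaluates a small set of triple patterns, in the spirit of a SPARQL basic graph pattern, against hand-written triples. It is a toy evaluator, not the SPARQL engine in TopBraid. The sample data, the ex: names and the functions match and query are hypothetical; only composite:child comes from this page. The comment in the code shows the roughly equivalent SPARQL pattern, with prefix declarations omitted.

```python
def match(triples, pattern):
    """Yield variable bindings for one (s, p, o) pattern.

    Terms starting with '?' are variables; all terms are plain strings.
    """
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


def query(triples, patterns):
    """Join several patterns on their shared variables."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for solution in solutions:
            # Substitute variables that earlier patterns already bound.
            bound = tuple(solution.get(term, term) for term in pattern)
            for binding in match(triples, bound):
                next_solutions.append({**solution, **binding})
        solutions = next_solutions
    return solutions


# Hypothetical sample data shaped like an imported XML document.
SAMPLE = [
    ("ex:r1", "rdf:type", "ex:library"),
    ("ex:r1", "composite:child", "ex:r2"),
    ("ex:r2", "rdf:type", "ex:book"),
    ("ex:r2", "ex:book-id", "b1"),
    ("ex:r1", "composite:child", "ex:r3"),
    ("ex:r3", "rdf:type", "ex:note"),
]

# Roughly: SELECT ?parent ?child WHERE {
#            ?parent composite:child ?child . ?child a ex:book }
BOOK_CHILDREN = [
    ("?parent", "composite:child", "?child"),
    ("?child", "rdf:type", "ex:book"),
]

if __name__ == "__main__":
    print(query(SAMPLE, BOOK_CHILDREN))
```

In TopBraid Composer itself you would type the SPARQL query into the SPARQL view instead of writing any evaluation code.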

Creating XML Files

OWL models imported from XML files with Semantic XML are editable. When a model is modified and saved, the system will overwrite the original XML file. This enables you, for example, to delete or replace certain attribute values in TopBraid. Note that the output by default will be another XML file that may not contain all triples. For example, if you have added values of properties that are not mapped into XML (such as an owl:versionInfo) to some element, then this information will be lost. However, you can save any XML model imported with Semantic XML to a triple format such as RDF/XML or Turtle, and then export the XML data as needed. This enables you to store additional data alongside the XML structure without losing information. To export a Semantic XML model (or parts of it) to an XML file, select the root resource and execute the item Export to XML File... in the resource context menu. This approach can also be used if XML needs to be created from scratch, i.e. without starting from an existing XML document.
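The information loss described above can be sketched in Python. The example below rebuilds XML from a handful of triples and writes out only what has an sxml:element or sxml:attribute mapping, so the owl:versionInfo value does not appear in the output. This is an assumption-laden illustration, not TopBraid's exporter: the triple layout, the ex: names and the function triples_to_xml are hypothetical.

```python
import xml.etree.ElementTree as ET


def triples_to_xml(triples, root_id):
    """Serialize the element tree rooted at root_id to an XML string.

    Illustrative only: properties without an sxml mapping are skipped.
    """
    # Mapping tables read from the annotation triples.
    tag_of_class = {s: o for s, p, o in triples if p == "sxml:element"}
    attr_of_prop = {s: o for s, p, o in triples if p == "sxml:attribute"}

    def build(node_id):
        # Pick the type of this node that is mapped to an XML tag.
        class_id = next(
            o for s, p, o in triples
            if s == node_id and p == "rdf:type" and o in tag_of_class
        )
        element = ET.Element(tag_of_class[class_id])
        for s, p, o in triples:
            if s != node_id:
                continue
            if p in attr_of_prop:
                element.set(attr_of_prop[p], o)
            elif p == "composite:child":
                element.append(build(o))
            # Anything else (e.g. owl:versionInfo) has no XML mapping
            # in this sketch and is dropped.
        return element

    return ET.tostring(build(root_id), encoding="unicode")


# Hypothetical model: one library element with one book child.
MODEL = [
    ("ex:library", "sxml:element", "library"),
    ("ex:book", "sxml:element", "book"),
    ("ex:book-id", "sxml:attribute", "id"),
    ("ex:r1", "rdf:type", "ex:library"),
    ("ex:r1", "owl:versionInfo", "draft"),
    ("ex:r1", "composite:child", "ex:r2"),
    ("ex:r2", "rdf:type", "ex:book"),
    ("ex:r2", "ex:book-id", "b1"),
]

if __name__ == "__main__":
    print(triples_to_xml(MODEL, "ex:r1"))
```

Keeping the full model in a triple format and exporting XML on demand avoids this loss, because the unmapped triples remain in the saved RDF file.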