XML is one of the most widely used formats to exchange data between applications. In order to lift this XML-based data into a format that is processable by semantic technology tools, a suitable translation mechanism is required.
Semantic XML is a technology from TopQuadrant for mapping arbitrary XML documents to RDF. It can convert XML files to OWL so that you can, for example, run SPARQL queries on them. You can also manipulate the resulting OWL model and then save it back to XML. To control the Semantic XML mappings, you can use process ontologies, which can usually be generated by importing XML Schemas. (For information on importing XML Schemas, see the XSD to OWL Importer.)
The following example screenshot shows how a simple example XML file has been rendered into OWL using TopBraid:
Semantic XML is based on a small ontology that defines various classes and attributes to enable round-tripping between XML and OWL. In particular, it introduces two annotation properties, sxml:element and sxml:attribute, that are used to define the mapping. When an XML file is opened with TopBraid, the following mapping strategy is applied:
- […] Process.
- […] name.
- […] sxml:element that points back to the XML element name or the full URI of the element.
- […] sxml:attribute that points back to the XML attribute name.
- […] Process.
- […] composite:child from the composite ontology. The ordering of children is kept using composite:index property values.
- […] sxml:prefix.
If a process ontology is used to control the Semantic XML mapping, then the following mapping strategy will be applied:
- […] sxml:element or a sxml:tag property value.
- […] sxml:attribute property value. The attribute or child element instance value will be mapped into a typed literal, where the type is specified as the owl:allValuesFrom property value in a restriction on the OWL class of the mapped instance.
- […] sxml:tag property value. The element instance value will be mapped into an anonymous instance of the OWL class, which is specified as the owl:allValuesFrom property value in a restriction on the OWL class of the mapped parent instance.
- […] sxml:tag property value. The attribute or child element instance value will be mapped into an existing instance that has the same dtype:value property value and whose OWL class is a subclass of EnumeratedValue.
- […] composite:index property values of the mapped instances or sxml:order property values in the restrictions on properties related to the child element.
- […] xsi:type, then the mapping from that type will override the default type.
For more information on process ontology mapping strategies, see the XSD to OWL importer. If there is no XML Schema available, then it is also possible to build a process ontology manually or using SPIN rules.
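As a rough illustration of the default strategy (the one used when no process ontology is present), the following Python sketch walks an XML document and emits triples as plain tuples. It is not TopBraid's implementation: the function name `xml_to_triples`, the `_:eN` node labels, the `ex:` naming scheme, and the zero-based index are assumptions made for this example. Only the terms sxml:element, sxml:attribute, composite:child and composite:index come from the description in this section. Text content and XML namespaces are ignored to keep the sketch short.

```python
import xml.etree.ElementTree as ET
from itertools import count


def xml_to_triples(xml_text):
    """Map an XML document to (subject, predicate, object) tuples.

    Hypothetical sketch: node labels and the "ex:" naming scheme are
    assumptions for illustration, not TopBraid's actual output.
    """
    triples = []
    ids = count(1)
    seen_classes, seen_props = set(), set()

    def visit(elem, parent=None, index=None):
        node = f"_:e{next(ids)}"
        cls = f"ex:{elem.tag}"
        if cls not in seen_classes:
            # Annotate the class so the element name can be recovered later.
            seen_classes.add(cls)
            triples.append((cls, "sxml:element", elem.tag))
        triples.append((node, "rdf:type", cls))
        for name, value in elem.attrib.items():
            prop = f"ex:{name}"
            if prop not in seen_props:
                # Annotate the property with the original attribute name.
                seen_props.add(prop)
                triples.append((prop, "sxml:attribute", name))
            triples.append((node, prop, value))
        if parent is not None:
            # Nesting and sibling order are kept as explicit triples.
            triples.append((parent, "composite:child", node))
            triples.append((node, "composite:index", index))
        for i, child in enumerate(elem):
            visit(child, node, i)
        return node

    visit(ET.fromstring(xml_text))
    return triples


doc = '<order id="7"><item sku="a"/><item sku="b"/></order>'
for triple in xml_to_triples(doc):
    print(triple)
```

Each distinct element name yields one annotated class, while every element occurrence becomes its own instance, which is what makes a later round trip back to XML possible.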
When you double-click an XML file in the Project Explorer, Eclipse is usually configured to open it in an XML editing mode. To tell TopBraid Composer to open the file as Semantic XML, right-click it and pick Open With, which displays a cascade menu. On the cascade menu, pick TopBraid (Semantic XML Documents).
Any workspace file ending with .xml can be opened this way. The system will automatically convert its XML contents into Semantic XML format. The XML file can also be imported into other OWL ontologies using the Imports View. This enables you to define customized OWL mappings. With the example XML file above, you could define a separate process ontology that has sxml:element, sxml:attribute and/or sxml:tag annotations attached to it, and then import the XML instances file into an ontology that imports the process ontology as well. The importer will then not generate new classes on the fly, but will try to reuse the existing ones from the process ontology. You can use the XML Schema Import engine to create such ontologies with associated SXML annotations.
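The reuse behavior can be pictured with a short Python sketch. It assumes the process ontology's sxml:element annotations are available as a dictionary from class name to element name. The helper `resolve_class`, the `po:` and `ex:` names, and the fallback naming are hypothetical and only illustrate the idea of preferring an existing class over generating a new one.

```python
def resolve_class(tag, annotations, generated):
    """Pick the class for an XML tag, preferring annotated classes.

    annotations: dict mapping class name -> sxml:element value
                 (hypothetical stand-in for a process ontology).
    generated:   set that collects classes created on the fly.
    """
    for cls, element_name in annotations.items():
        if element_name == tag:
            # An existing class claims this element name: reuse it.
            return cls
    # No annotated class found: fall back to generating a new one.
    cls = f"ex:{tag}"
    generated.add(cls)
    return cls


process_ontology = {"po:PurchaseOrder": "order", "po:LineItem": "item"}
generated = set()
print(resolve_class("order", process_ontology, generated))
print(resolve_class("memo", process_ontology, generated))
print(sorted(generated))
```

Here only the unknown `memo` element leads to a newly generated class; `order` is resolved to the class that already exists in the hypothetical process ontology.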
When you open an .xml file, the system will ask you whether you have a schema with Semantic XML annotations somewhere in your workspace. This schema could have been produced by a previous execution of the XML Schema importer (as explained in Import XSD and corresponding XML) or by generalizing the SXML created by opening a similar XML file in the past. If you do have such a file, you can select it and specify whether you want this schema to be used for this .xml file only, all files in the same folder, or all files in the project. When you finish this dialog, the system will create a file ending with .sxml in your workspace. If such a file is present, you will not be asked about the schema file again; instead, the SXML loader will automatically instantiate the existing classes.
Some file types have been fine-tuned to have an optimized mapping to Semantic XML format. Currently, these are XHTML and XML Schema. For example, files ending with .html can be opened directly with TopBraid and will be instantiated into a dedicated XHTML ontology. The HTML importer will run an HTML tidy algorithm if the HTML file is not well-formed XHTML.
Although the results of the Semantic XML importer are normal RDF resources, TopBraid provides some customized capabilities to display XML elements. In particular, you can use the Associations View to view (and edit) the XML hierarchy. In order to open it, right-click on composite:child in the Properties View, and select Show in Associations View. As shown in the middle of the screenshot above, the Associations View renders the XML element instances in an XML-like notation, so that it becomes easier to see the resulting XML structure.
Another way of exploring the XML elements and their relationships is through the form. The items listed under composite:child can be recursively expanded using the small + button over the icon.
All the generic RDF querying and inferencing capabilities of TopBraid Composer are available to analyze the XML elements. For example, you can use the SPARQL view to find all XML elements that fulfill a certain pattern.
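To make the idea of pattern matching over imported elements concrete, here is a small Python sketch that mimics such a query on plain tuples instead of an RDF store. The sample triples, the `ex:` names, and the helper `children_of_type` are hypothetical. In TopBraid Composer you would express the same pattern as a SPARQL query, roughly like the one shown in the comment.

```python
# Roughly equivalent SPARQL (prefix declarations omitted):
#   SELECT ?child WHERE {
#     ?parent a ex:order .
#     ?parent composite:child ?child .
#     ?child a ex:item .
#   }

# Hypothetical triples, in the shape a Semantic XML import might produce.
TRIPLES = [
    ("_:e1", "rdf:type", "ex:order"),
    ("_:e2", "rdf:type", "ex:item"),
    ("_:e3", "rdf:type", "ex:note"),
    ("_:e1", "composite:child", "_:e2"),
    ("_:e1", "composite:child", "_:e3"),
    ("_:e2", "ex:sku", "a"),
]


def children_of_type(triples, parent_type, child_type):
    """Return children of child_type whose parent has parent_type."""
    types = {(s, o) for s, p, o in triples if p == "rdf:type"}
    return [
        o
        for s, p, o in triples
        if p == "composite:child"
        and (s, parent_type) in types
        and (o, child_type) in types
    ]


print(children_of_type(TRIPLES, "ex:order", "ex:item"))  # ['_:e2']
```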
OWL models imported from XML files with Semantic XML are editable. When a model is modified and saved, the system will overwrite the original XML file. This enables you, for example, to delete or replace certain attribute values in TopBraid. Note that the output by default will be another XML file that may not contain all triples. For example, if you have added values of properties that are not mapped into XML (such as an owl:versionInfo) to some element, then this information will be lost. However, you can save any XML model imported with Semantic XML to a triple format such as RDF/XML or Turtle, and then export the XML data as needed. This enables you to store additional data in the XML structure without losing information. To export a Semantic XML model (or parts of it) to an XML file, select the root resource and execute the item Export to XML File... in the resource context menu. This approach can also be used if XML needs to be created from scratch, i.e. without starting from an existing XML document.
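The information loss on export can be illustrated with a small Python sketch that serializes a list of triples back to XML. Only properties carrying an sxml:attribute annotation, plus the composite:child nesting, reach the output; anything else, such as owl:versionInfo, is dropped. The function `triples_to_xml`, the `ex:` names, and the sample data are assumptions for this example and do not reflect TopBraid's actual exporter.

```python
import xml.etree.ElementTree as ET


def triples_to_xml(triples, root):
    """Serialize the element tree rooted at `root` back to an XML string."""
    # Annotations that say how classes and properties map to XML names.
    tag_of = {s: o for s, p, o in triples if p == "sxml:element"}
    attr_of = {s: o for s, p, o in triples if p == "sxml:attribute"}
    index = {s: o for s, p, o in triples if p == "composite:index"}

    def build(node):
        cls = next(o for s, p, o in triples if s == node and p == "rdf:type")
        elem = ET.Element(tag_of[cls])
        for s, p, o in triples:
            # Only properties annotated with sxml:attribute are written out.
            if s == node and p in attr_of:
                elem.set(attr_of[p], o)
        children = [
            o for s, p, o in triples if s == node and p == "composite:child"
        ]
        # Restore document order from composite:index.
        for child in sorted(children, key=lambda c: index[c]):
            elem.append(build(child))
        return elem

    return ET.tostring(build(root), encoding="unicode")


triples = [
    ("ex:order", "sxml:element", "order"),
    ("ex:item", "sxml:element", "item"),
    ("ex:sku", "sxml:attribute", "sku"),
    ("_:e1", "rdf:type", "ex:order"),
    ("_:e1", "owl:versionInfo", "draft"),  # not mapped to XML, so it is lost
    ("_:e2", "rdf:type", "ex:item"),
    ("_:e2", "ex:sku", "b"),
    ("_:e2", "composite:index", 1),
    ("_:e3", "rdf:type", "ex:item"),
    ("_:e3", "ex:sku", "a"),
    ("_:e3", "composite:index", 0),
    ("_:e1", "composite:child", "_:e2"),
    ("_:e1", "composite:child", "_:e3"),
]
print(triples_to_xml(triples, "_:e1"))
```

The printed XML contains the two `item` elements in index order but no trace of the `owl:versionInfo` value, which is why saving to a triple format first is the safer choice when extra data must be preserved.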