TopBraid Suite (TBS) implements named graphs as specified by the relevant W3C specifications. A TBS workspace defines a dataset consisting of multiple named graphs. Each named graph is defined by a file in the workspace and is identified by its base URI. Thus, the best way to understand how a workspace works is to view it as a quad store that directly or indirectly stores a variety of named graphs.
Only one workspace (dataset) can be loaded for any TBS product, including TopBraid Composer (TBC), TopBraid Live (TBL) and Enterprise Data Governance (EDG). Multiple workspaces can be defined, each representing a separate dataset, but only one can be loaded at a time. Since the workspace is a file directory, it can be located anywhere, but to avoid network latency, it is best for the workspace to be on the same machine as the TBS product.
Each named graph must have a name that uniquely identifies it. All RDF graphs managed by TBS use a base URI as a unique identifier for a graph within the scope of a workspace. The graphs may be files that store RDF data directly, named graphs stored within a TDB or RDBMS model, or connector files to the remotely stored RDF data. These include:
In most cases (with the exception of non-RDF files that TopBraid auto-converts to RDF on the fly as described above), the base URI that identifies a graph is saved within a file using one of the following types of statements:
For RDF/XML serialization:
xml:base="http://www.example.org/myExample/baseURI"
For Turtle and N-Triples serializations:
# baseURI: http://www.example.org/myExample/baseURI
Since there is no way to directly persist this statement within the remote RDF or relational databases, the statement is captured only in TopBraid connectors (the workspace graphs that stand as a proxy for the remote graphs).
TopBraid will assign base URIs to XML, HTML and Excel files based on their location within the TopBraid workspace and the file name. Thus, changing the location or name of such a file will effectively change its identity, while making such changes to other files will have no bearing on their identity.
All graphs (with the exception of non-RDF files that TopBraid auto-converts to RDF on the fly) should also contain the following triple, which is shown here in the Turtle syntax:
<http://www.example.org/myExample/baseURI> a owl:Ontology .
At system startup, TBS products must scan the files to name the graphs, that is, to associate the base URIs with the files in the workspace. TopBraid will first look for either a # baseURI or xml:base statement (depending on serialization) at the beginning of the file. This avoids having to look through the entire file to find the owl:Ontology statement. When # baseURI/xml:base and owl:Ontology statements are both present and conflict, precedence is given to the # baseURI/xml:base statement.
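The lookup order described above can be sketched in Python. This is only an illustration, not TopBraid's implementation: the function name find_base_uri, the regular expressions, and the HEADER_LINES limit are all assumptions, and the owl:Ontology fallback only recognizes simple Turtle triples.

```python
import re
from typing import Optional

# Header statements described in the text.
TURTLE_BASE = re.compile(r"^\s*#\s*baseURI:\s*(\S+)")
XML_BASE = re.compile(r'xml:base\s*=\s*"([^"]+)"')
# Fallback: a Turtle triple such as <uri> a owl:Ontology .
ONTOLOGY = re.compile(r"<([^>]+)>\s+(?:a|rdf:type)\s+owl:Ontology\b")

# Hypothetical limit: how far into a file TopBraid looks is not stated.
HEADER_LINES = 20


def find_base_uri(text: str) -> Optional[str]:
    """Return the base URI of a serialized graph, or None if none is found."""
    # Fast path: check only the first few lines for a header statement.
    for line in text.splitlines()[:HEADER_LINES]:
        match = TURTLE_BASE.match(line) or XML_BASE.search(line)
        if match:
            return match.group(1)
    # Slow path: scan the whole text for an owl:Ontology declaration.
    match = ONTOLOGY.search(text)
    return match.group(1) if match else None
```

Because the header check runs first, a conflicting owl:Ontology subject further down the file is never consulted, which mirrors the precedence rule stated above.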
Thus, each file must:
Include a # baseURI or xml:base statement, depending on the serialization. TopBraid cannot open a graph that is not identified by a name (the base URI).
Use a unique URI in the # baseURI/xml:base statement (see more on this topic below).
Preferably, ensure that the subject in the owl:Ontology statement and the URI in the # baseURI/xml:base statement align.
If for some reason a # baseURI/xml:base statement cannot be provided, include one (and only one) triple with owl:Ontology as the object, rdf:type as the predicate, and a unique URI as the subject. This approach is not recommended, as it is likely to hurt performance on system startup.
When graphs are created using TopBraid, TopBraid will ensure that the appropriate statements exist. Authors of graphs created outside of TopBraid must take care to follow the rules specified above.
Other important points:
Two RDF graphs containing exactly the same set of triples can have different identities—that is, different base URIs. For example, using TopBraid:
One can take an RDF serialized file called graph1.ttl with the base URI http://www.example.org/graph1 and export it as a new RDF serialized file called graph2.rdf, giving it the base URI http://www.example.org/graph2, and/or export it into an RDF database such as Jena TDB, giving it the base URI http://www.example.org/graph3 and naming the connector graph1db.tdb. These three graphs will contain exactly the same triples, but each will have its own identity: each is a different named graph.
One can create two SPARQL Endpoint connectors to the entire DBPedia content, giving the connectors the identities (base URIs) http://www.example.org/dbpedia1 and http://www.example.org/dbpedia2 respectively. As long as the same SPARQL Endpoint access URL is specified in both cases, every time either of those graphs is mentioned in TopBraid, requests for data will go to the same place.
When a graph is hosted by an external system (for example, an external SPARQL Endpoint or RDF database), the identity of a graph within TopBraid can be different from the identity of a graph within its host system. For example:
The DBPedia SPARQL Endpoint provides SPARQL access to multiple graphs including http://www.ontologyportal.org/WordNet#, http://www.ontologyportal.org/SUMO#, and others.
One can create a SPARQL Endpoint connector for DBPedia with identity http://www.example.org/dbpedia3 and specify as the named graph for this connector http://www.ontologyportal.org/WordNet#.
With this, the http://www.example.org/dbpedia3 graph essentially becomes a proxy for the http://www.ontologyportal.org/WordNet# graph. All requests to TopBraid for data from http://www.example.org/dbpedia3 will request data from DBPedia, limiting it to data that is part of the http://www.ontologyportal.org/WordNet# graph.
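The separation between a connector's identity in the workspace and the remote data it stands for can be modelled as follows. This is a minimal sketch: the SparqlConnector class, its field names, and the remote_target helper are hypothetical and do not reflect TopBraid's connector file format, and the endpoint URL is an assumed value used for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SparqlConnector:
    """Hypothetical model of a SPARQL Endpoint connector graph."""
    base_uri: str                        # identity inside the TopBraid workspace
    endpoint_url: str                    # where data requests are sent
    remote_graph: Optional[str] = None   # named graph on the host, if restricted


def remote_target(connector: SparqlConnector) -> Tuple[str, Optional[str]]:
    """Return what the connector actually points at on the remote host."""
    return (connector.endpoint_url, connector.remote_graph)


ENDPOINT = "http://dbpedia.org/sparql"  # assumed access URL, for illustration

dbpedia1 = SparqlConnector("http://www.example.org/dbpedia1", ENDPOINT)
dbpedia2 = SparqlConnector("http://www.example.org/dbpedia2", ENDPOINT)
dbpedia3 = SparqlConnector(
    "http://www.example.org/dbpedia3",
    ENDPOINT,
    remote_graph="http://www.ontologyportal.org/WordNet#",
)
```

Here dbpedia1 and dbpedia2 are distinct named graphs within the workspace but share one remote target, while dbpedia3 is a proxy for a single named graph on the host.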
Base URIs (named graphs) are used by TopBraid in accordance with the standards set in SPARQL and RDF 1.1. Here is a representative, though not comprehensive, list:
In SPARQL queries that use the GRAPH keyword, the named graph specified is the base URI. For example, GRAPH <http://topbraid.org/examples/kennedys> {} will operate over the graph named http://topbraid.org/examples/kennedys.
Various SPARQLMotion modules use the base URI. For example, the graph to be loaded (in case of a file) or connected to (in case of a remote resource such as RDF or Relational DB) is identified by the base URI (named graph) by which it is known to TopBraid.
When using an owl:imports statement, the identity of a graph mentioned in it is resolved according to the base URI (named graph) provided in the owl:imports statement.
baseURI is an optional parameter for TopBraid built-in servlets such as describe and template. It identifies the named graph that a service will be executed against.
When a TBS product encounters a base URI, it will always first try to locate the named graph in the workspace. If the base URI specified is not associated with one of the graphs managed by TBS, TBS will attempt to access the graph in the location provided by the base URI. For example, if no TBS named graphs have the base URI http://www.example.org/graph1, TBS will attempt to connect to the web address http://www.example.org/graph1 to retrieve the graph data. Performance will depend on the response time of that web server. If the graph in question is not hosted in the appropriate manner by the server at the specified location, users will experience some delay as TBS confirms that there is no response (which would be the case with any example.org URIs).
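The resolution order described above (workspace first, web address second) can be sketched as below. The function resolve_graph, the dictionary standing in for the workspace, and the injected fetch_from_web callable are assumptions made for illustration. The real lookup is internal to TopBraid.

```python
from typing import Callable, Dict, Tuple


def resolve_graph(
    base_uri: str,
    workspace: Dict[str, str],
    fetch_from_web: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (source, payload) for a base URI.

    workspace maps base URIs to workspace file paths (a stand-in for the
    list of graphs managed by TBS). fetch_from_web is called only when
    the workspace has no graph with the requested base URI.
    """
    if base_uri in workspace:
        return ("workspace", workspace[base_uri])
    # Not a managed graph: treat the base URI as a web address.
    return ("web", fetch_from_web(base_uri))


# Usage with a fake fetcher, so that no network access is needed.
workspace = {"http://www.example.org/graph1": "/myProject/graph1.ttl"}
web_requests = []


def fake_fetch(uri: str) -> str:
    web_requests.append(uri)
    return "<remote data>"


local = resolve_graph("http://www.example.org/graph1", workspace, fake_fetch)
remote = resolve_graph("http://www.example.org/other", workspace, fake_fetch)
```

Only the second call reaches the fetcher, which is where the delay described above would occur if the server does not respond.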
TopBraid includes a component called FileRegistry that keeps an up-to-date list of all graphs managed by TopBraid. TopBraid Composer provides a view that shows FileRegistry content for a workspace. For more info see the File Registry View help panel. For TBS server products, a list of named graphs in the workspace is found through the Administrative Console at the Base URI Management link at the URL http://[host]:[port]/{evn,tbl}/tbl/admin/baseURIMgmt.
TopBraid can only operate over a single graph for a given base URI. This is identical to any other named graph implementation and consistent with the named graph specification—each graph must have a unique identity. This policy is enforced in TBS servers and by default in Composer. A preference is provided in TopBraid Composer to allow base URI conflicts (see below). However, the best and highly recommended practice is to avoid using the same base URI for two different files. TopQuadrant cannot provide support to resolve problems resulting from users attempting to keep and use multiple graphs with the same identity.
TopBraid Composer provides warnings when the uniqueness of the base URIs for the workspace graphs is violated. It also prevents users from uploading projects to TopBraid Live that either contain files with conflicting base URIs or contain files with base URIs that conflict with those already present in the TopBraid Live workspace.
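A uniqueness check of this kind can be approximated outside TopBraid, for example before uploading a project. The sketch below assumes the base URI of each file has already been extracted. The function name find_conflicts and the input mapping are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def find_conflicts(file_base_uris: Dict[str, str]) -> Dict[str, List[str]]:
    """Group files by base URI and return only the URIs used more than once.

    file_base_uris maps a workspace file path to the base URI it declares.
    """
    files_by_uri: Dict[str, List[str]] = defaultdict(list)
    for path, base_uri in file_base_uris.items():
        files_by_uri[base_uri].append(path)
    return {
        base_uri: sorted(paths)
        for base_uri, paths in files_by_uri.items()
        if len(paths) > 1
    }
```

An empty result means every base URI in the mapping is unique.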
The preference to allow base URI conflicts in Composer means that multiple files can use the same base URI, but only one file is defined as the "primary file" for the base URI. This can be used by advanced users to manipulate how data is imported, etc. When two or more files share the same base URI, the conflict is resolved in the following way:
TopBraid Composer will decide that the base URI is associated with the file that has been updated most recently.
In TopBraid Composer only, the end user can alter this decision by opening the file that they want to be the "main" one for the URI. They will get a message asking them to confirm that this is what they want.
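The two rules above amount to "most recently updated wins unless the user chooses otherwise". A minimal sketch, assuming each candidate file is represented by its path and a last-modified timestamp. The function primary_file and its parameters are hypothetical.

```python
from typing import List, Optional, Tuple


def primary_file(
    candidates: List[Tuple[str, float]],
    user_choice: Optional[str] = None,
) -> str:
    """Pick the primary file among files that share one base URI.

    candidates holds (path, last_modified) pairs; a larger timestamp
    means a more recent update. user_choice models the user opening a
    specific file and confirming it as the main one.
    """
    if not candidates:
        raise ValueError("no candidate files")
    if user_choice is not None:
        if user_choice not in [path for path, _ in candidates]:
            raise ValueError("user choice is not one of the candidates")
        return user_choice
    # Default rule: the most recently updated file wins.
    return max(candidates, key=lambda item: item[1])[0]
```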