Database Performance and Caching

Access to RDF data stored in databases other than Apache Jena TDB is significantly slower than access to data held in memory (loaded from files). Much of the performance difference is due to network effects, especially the number of transactions. As a general rule, performance is better when the number of database queries is kept to a minimum.

If you have large files (i.e., 500K triples or more), we recommend saving them as TDB databases. You can do this by using Import > TopBraid Composer > Import RDF files into a new TDB. This avoids loading the entire graph into memory, so you can open large graphs much faster and with less allocated RAM.

Start-up Inferencing

Graphs that contain many classes (and subclass relationships) may take a while to open with TBC. This is because the system attempts to infer missing superclass relationships to make sure that each class shows up in the Classes View. The TopBraid preferences page contains an option to bypass this step to improve start-up performance. Note that if you use this option, you need to make sure that each named class in your model has at least one other named class as a superclass. You can achieve this by running the start-up inferences normally once, and then asserting all inferred rdfs:subClassOf triples of the direct subclasses of owl:Thing or rdfs:Resource. Then save the database and check the option to suppress the default superclass inference.
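To make the requirement concrete, the following Python sketch finds named classes that have no named superclass and produces the rdfs:subClassOf triples that would need to be asserted. It is only an illustration under stated assumptions: triples are modeled as plain string tuples, blank nodes are assumed to start with "_:", and the function name missing_superclass_triples is hypothetical. It is not part of TopBraid and does not reproduce its inference step.

```python
RDF_TYPE = "rdf:type"
RDFS_SUBCLASS_OF = "rdfs:subClassOf"
OWL_CLASS = "owl:Class"
OWL_THING = "owl:Thing"


def is_named(node):
    # Assumption for this sketch: blank nodes are written with a "_:" prefix.
    return not node.startswith("_:")


def missing_superclass_triples(triples, root=OWL_THING):
    """Return rdfs:subClassOf triples for named classes lacking a named superclass."""
    # Named classes: every named subject explicitly typed as owl:Class.
    classes = {s for (s, p, o) in triples
               if p == RDF_TYPE and o == OWL_CLASS and is_named(s)}
    # Classes that already have at least one named superclass asserted.
    with_super = {s for (s, p, o) in triples
                  if p == RDFS_SUBCLASS_OF and is_named(o)}
    orphans = classes - with_super - {root}
    return sorted((c, RDFS_SUBCLASS_OF, root) for c in orphans)


graph = {
    ("ex:Animal", RDF_TYPE, OWL_CLASS),
    ("ex:Dog", RDF_TYPE, OWL_CLASS),
    ("ex:Dog", RDFS_SUBCLASS_OF, "ex:Animal"),
}
to_assert = missing_superclass_triples(graph)
graph |= set(to_assert)  # assert the missing triples before saving
```

Once these triples are part of the graph, a second call returns an empty list, which is the state the model should be in before the default superclass inference is suppressed.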

Incremental Caching

By default, TopBraid wraps each database (except Jena TDBs) with an in-memory cache. This in-memory cache is built up incrementally and remembers the results of every past query, unless a result exceeds the capacity of the cache. For example, if a user has clicked on an instance to get all triples related to that instance, then the system will have cached these SPO patterns in memory, so that future requests against the same instance can be handled almost instantaneously, without a round trip to the database. To facilitate this, TopBraid Composer pre-builds certain caches such as (*, rdf:type, *) so that the most commonly needed queries can be handled by the cache. The properties in that category can be configured in the TopBraid Composer Preferences.
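The general idea of an incrementally built pattern cache can be sketched in a few lines of Python. This is a simplified illustration, not TopBraid's implementation: the class names PatternCache and FakeDatabase, the max_matches limit, and the preload parameter are all hypothetical, and None stands in for the * wildcard.

```python
class PatternCache:
    """Caches the matches of each (s, p, o) pattern the first time it is asked."""

    def __init__(self, backend, max_matches=1000, preload=()):
        self.backend = backend          # callable: pattern -> list of triples
        self.max_matches = max_matches  # results larger than this are not cached
        self.cache = {}
        for pattern in preload:         # pre-build commonly needed patterns
            self.find(*pattern)

    def find(self, s=None, p=None, o=None):
        pattern = (s, p, o)
        if pattern in self.cache:
            return self.cache[pattern]      # answered without a round trip
        matches = self.backend(pattern)     # one round trip to the database
        if len(matches) <= self.max_matches:
            self.cache[pattern] = matches
        return matches


class FakeDatabase:
    """Stand-in for a remote triple store that counts round trips."""

    def __init__(self, triples):
        self.triples = list(triples)
        self.round_trips = 0

    def __call__(self, pattern):
        self.round_trips += 1
        return [t for t in self.triples
                if all(want is None or want == got
                       for want, got in zip(pattern, t))]


db = FakeDatabase([
    ("ex:rex", "rdf:type", "ex:Dog"),
    ("ex:rex", "ex:name", "Rex"),
    ("ex:tom", "rdf:type", "ex:Cat"),
])
cached = PatternCache(db, max_matches=2, preload=[(None, "rdf:type", None)])
first = cached.find(s="ex:rex")   # goes to the database
second = cached.find(s="ex:rex")  # served from the cache
```

In this sketch the preload and the first lookup each cost one round trip, while the repeated lookup costs none. A pattern with more than max_matches results, such as the fully wildcarded pattern here, is never cached and goes to the database every time. A real cache would also have to handle updates to the underlying data, which this sketch ignores.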

There is an optional global setting (specified as the system property topbraid.cacheAll=true) which might be particularly useful for TopBraid Live applications. In this mode, the system caches the whole database at start-up and thus never has to read from the database again. This, of course, requires a lot of memory and significantly slows down start-up.

One limitation of this caching approach is that certain triple patterns may have too many matches and exceed the cache size. Another limitation is that it is often hard to anticipate which query patterns should be cached in advance, so the system still needs to issue many small queries.