
Using SPARQL Subgraphs to Segment TBox and ABox

  • Writer: Michael DeBellis
  • 6 hours ago

First, a minor point about comments: I encourage comments but thanks to Rob Rovetto I realized that someone was spamming the comments (thanks Rob for pointing that out). There was so much spam it just wasn't worth it to delete each one manually and I didn't see a way in Wix to just delete from a specific user. So I just deleted all the comments. Sorry if that deleted any comments you made and that I never replied to. I've set up a moderation rule now that I think should detect and automatically delete spam in the future.


One thing I like to do is work on problems that I've seen people mention on the Protégé user support email list. A common question I've seen is how to separate TBox entities from ABox entities. One reason to do this is that when you are working with a triplestore graph database you often end up with so much data that there is no way you can load the entire graph into Protégé. Being able to separate the definitions of entities from the data allows you to still view the ontology in Protégé without all the data, which typically you don't want to see in Protégé anyway. The next step would be to find a way to edit the ontology in Protégé and load the changes back into the triplestore. But one thing at a time. I think this may be interesting because, in addition to being useful, the SPARQL code below shows how to really dig into things like the axioms that make up defined classes, which (this always seems to happen) turned out to be a lot more difficult than I thought.

The file with all the SPARQL queries can be found here: https://github.com/mdebellis/SPARQL_Tools/blob/main/TBox%20and%20ABox%20Partitioning.txt

The example ontology I'll be using is called the People Ontology. It shows various kinds of social and family relations as well as other easy-to-understand concepts such as pets. It's an ontology I find useful for people learning OWL, in addition to the Pizza tutorial. That file is here: https://github.com/mdebellis/SPARQL_Tools/blob/main/ontologies/People_Ontology.ttl

I'm going to go through all the SPARQL code in that first file and explain what it is doing and why.


I'll be using the AllegroGraph SPARQL tools, but these should work in any tool that has a complete implementation of the SPARQL specification (which unfortunately doesn't include either of the SPARQL tools in Protégé). I'll be using a concept that I've never worked with before in SPARQL called a subgraph (the SPARQL specification calls it a named graph, which is what the GRAPH keyword in the code below refers to). When Protégé saves an OWL file it creates an RDF graph. A subgraph is simply a way to delineate a specific part of the larger graph that you want to keep separate from the entire graph. The SPARQL code creates two subgraphs: one for the TBox entities and one for the ABox entities. However, the TBox is the hard one, so that is the one I'll describe.


Also, a caveat about terminology: a very small number of things that are needed to create the complete ontology in Protégé may technically not be in the TBox. I asked both ChatGPT and the Protégé email list, and in both cases the answers weren't certain. For example, in the People Ontology I have a defined class called Gender that is an enumerated class with the definition: {Female, Male, Non_Binary}. Those are each individuals, so one might argue they go in the ABox. I think that they define the terminology rather than specific data, so I would put them in the TBox. Some people on the Protégé list agreed with me; others said no, they are individuals and individuals go into the ABox. The bottom line is that it doesn't matter for this use case: you need them to give you the complete ontology model, so they will go in the TBox subgraph regardless of whether or not the formal answer is that they are in the ABox.


Step 0: Prefixes and Graph initialization

The beginning of the file has the standard prefixes. Note: if you use it you will want to change the last two prefixes:

PREFIX : <http://michaeldebellis.com/people/>

The first prefix provides an IRI for your subgraphs that is different from that of the ontology. Keeping graph IRIs in their own namespace avoids mixing ontology IRIs with graph IRIs. The second is of course the IRI for the ontology, which you'll want to replace with the IRI of your own ontology. The next expression is:

DROP SILENT GRAPH g:TBox ;

This removes the named graph g:TBox if it exists; “SILENT” suppresses the error if it doesn’t.

This ensures that re-running the script won't leave stale triples from an earlier run in the graph; every run starts from a clean TBox.
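If it helps to picture what a named graph and DROP SILENT do, here is a toy Python model (it is not part of the SPARQL file). It assumes a dataset is nothing more than a dictionary from graph names to sets of triples; the `drop_graph` and `insert_into` helpers and the `:Teen` class are made up for illustration, and a real triplestore may treat missing or empty graphs differently.

```python
# Toy dataset: graph name -> set of (subject, predicate, object) triples.
# All names here are illustrative assumptions, not taken from the People Ontology file.
dataset = {
    "default": {(":Adult", "rdfs:subClassOf", ":Person")},
    "g:TBox": {(":Teen", "rdfs:subClassOf", ":Person")},  # stale triple from an earlier run
}


def drop_graph(dataset, name, silent=False):
    """Remove a named graph; with silent=True a missing graph is not an error."""
    if name not in dataset:
        if silent:
            return
        raise KeyError(f"no such graph: {name}")
    del dataset[name]


def insert_into(dataset, name, triples):
    """Add triples to a named graph, creating the graph if it does not exist yet."""
    dataset.setdefault(name, set()).update(triples)


drop_graph(dataset, "g:TBox", silent=True)          # start from a clean TBox
insert_into(dataset, "g:TBox", dataset["default"])
insert_into(dataset, "g:TBox", dataset["default"])  # a graph is a set: nothing is duplicated
```

After these calls the toy `g:TBox` holds only the `:Adult` triple. Without the drop, the stale `:Teen` triple would still be there.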


Step 1: Core schema triples

INSERT { GRAPH g:TBox { ?s ?p ?o } }
WHERE {
  { # schema subjects
    ?s a ?type .
    VALUES ?type {
      owl:Class rdfs:Class
      rdf:Property owl:ObjectProperty owl:DatatypeProperty owl:AnnotationProperty owl:OntologyProperty
      owl:Ontology
    }
    ?s ?p ?o .
  }
  UNION
  { # schema predicates (expanded)
    ?s ?p ?o .
    VALUES ?p {
      rdfs:subClassOf rdfs:subPropertyOf rdfs:domain rdfs:range
      owl:equivalentClass owl:equivalentProperty owl:disjointWith owl:disjointUnionOf
      owl:inverseOf owl:propertyChainAxiom
      owl:hasKey
      owl:intersectionOf owl:unionOf owl:complementOf
      owl:onProperty owl:someValuesFrom owl:allValuesFrom owl:hasValue
      owl:minQualifiedCardinality owl:maxQualifiedCardinality owl:qualifiedCardinality
      owl:minCardinality owl:maxCardinality owl:cardinality
      owl:onDatatype owl:withRestrictions
      owl:onClass owl:onDataRange # <-- add these
      owl:hasSelf # <-- and this (just in case)
    }
  }
  UNION
  { # nodes already typed as restrictions: copy that type triple
    ?s a owl:Restriction .
    BIND(rdf:type AS ?p) BIND(owl:Restriction AS ?o)
  }
};

Step 1 scoops up everything you need to rebuild class/property definitions, and nothing that looks like instance data. We’re trying to grab all schema-defining triples, and there isn’t one pattern that catches them all, so we combine three different patterns using UNION:

  1. If the subject is declared as a schema thing (a class or property), copy all its outgoing triples.

    This gets triples such as: :Adult rdfs:subClassOf :Person, :has_age rdfs:range xsd:integer, labels/comments on classes & properties, the ontology header, etc.

    It won’t touch ABox entities, because we only accept subjects typed owl:Class, owl:ObjectProperty,...

  2. Copy any triple whose predicate is one of the OWL/RDFS axiom predicates. This catches the stuff with blank-node subjects or subjects that aren’t explicitly typed: e.g., owl:onProperty :has_age, owl:someValuesFrom, owl:intersectionOf, owl:withRestrictions, etc.

  3. The final part of the union copies the rdf:type owl:Restriction triple of every node that is already typed as a restriction. Restriction nodes that are missing that type are handled in Step 5.
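To make the three branches concrete, here is a minimal Python sketch of the same idea over a handful of toy triples. It is only a model of the query's logic, not something from the SPARQL file: the type and predicate sets are trimmed down, and `:Michael`, `:has_pet` and the blank node labels are made-up examples.

```python
# Toy model of Step 1. Triples are (subject, predicate, object) tuples of strings.
# The sets below are shortened versions of the VALUES lists in the real update.
SCHEMA_TYPES = {"owl:Class", "owl:ObjectProperty", "owl:DatatypeProperty"}
SCHEMA_PREDICATES = {"rdfs:subClassOf", "owl:onProperty", "owl:someValuesFrom"}

default_graph = {
    (":Adult", "rdf:type", "owl:Class"),
    (":Adult", "rdfs:subClassOf", ":Person"),
    (":Adult", "rdfs:label", "Adult"),
    ("_:r1", "owl:onProperty", ":has_age"),    # restriction node with no rdf:type
    ("_:r2", "rdf:type", "owl:Restriction"),
    ("_:r2", "owl:onProperty", ":has_pet"),    # hypothetical property
    (":Michael", "rdf:type", ":Adult"),        # ABox: a hypothetical individual
    (":Michael", ":has_age", "42"),            # ABox: instance data
}


def step1(graph):
    """Union of the three branches: schema subjects, schema predicates, typed restrictions."""
    schema_subjects = {s for (s, p, o) in graph if p == "rdf:type" and o in SCHEMA_TYPES}
    branch1 = {(s, p, o) for (s, p, o) in graph if s in schema_subjects}
    branch2 = {(s, p, o) for (s, p, o) in graph if p in SCHEMA_PREDICATES}
    branch3 = {(s, p, o) for (s, p, o) in graph
               if p == "rdf:type" and o == "owl:Restriction"}
    return branch1 | branch2 | branch3


tbox = step1(default_graph)
```

In this toy graph the two `:Michael` triples are the only ones left out. The untyped node `_:r1` is picked up by the second branch even though nothing declares what it is.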

If you've never used VALUES before, it is a powerful way to supply a fixed set of values for a variable directly in the query. E.g., VALUES ?z { "abc" "def" } produces two solutions, one with ?z bound to "abc" and one with ?z bound to "def", and only matches that agree with one of them survive. You can also use VALUES to bind tuples. E.g., VALUES (?x ?y) { (:Person 1) (:Gender 2) } will bind (?x ?y) to (:Person 1) and (:Gender 2).
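One way to think about VALUES is as a small inline table that gets joined with the rest of the pattern. The Python sketch below models that with lists of dictionaries. The `join` helper and the sample rows are assumptions made for illustration, not how any particular SPARQL engine implements it.

```python
def join(left, right):
    """Join two lists of solutions, keeping pairs that agree on every shared variable."""
    joined = []
    for a in left:
        for b in right:
            shared = a.keys() & b.keys()
            if all(a[v] == b[v] for v in shared):
                joined.append({**a, **b})
    return joined


# Solutions for the pattern  ?s ?p ?o  over three toy triples.
matches = [
    {"s": ":Adult", "p": "rdfs:subClassOf", "o": ":Person"},
    {"s": ":has_age", "p": "rdfs:range", "o": "xsd:integer"},
    {"s": ":Michael", "p": ":has_age", "o": "42"},   # hypothetical instance data
]

# VALUES ?p { rdfs:subClassOf rdfs:range }
values_p = [{"p": "rdfs:subClassOf"}, {"p": "rdfs:range"}]

# Only the rows whose ?p appears in the VALUES table survive the join.
schema_matches = join(matches, values_p)

# VALUES (?x ?y) { (:Person 1) (:Gender 2) } is a table with two columns.
values_xy = [{"x": ":Person", "y": 1}, {"x": ":Gender", "y": 2}]
```

Here `schema_matches` keeps the `:Adult` and `:has_age` rows and drops the `:Michael` row, which is the effect the `VALUES ?p` list has in Step 1.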


Step 2: List Elements (oneOf / unionOf / intersectionOf ...)

INSERT { GRAPH g:TBox { ?lnode ?lp ?lo } }
WHERE {
  ?owner ?schemaListPred ?head .
  VALUES ?schemaListPred {
    owl:oneOf owl:unionOf owl:intersectionOf
    owl:propertyChainAxiom owl:hasKey
    owl:withRestrictions # <-- include datatype restriction lists
  }
  # walk the whole list, not just the head
  ?head (rdf:rest*) ?lnode .
  ?lnode ?lp ?lo .
  FILTER (?lp IN (rdf:first, rdf:rest))
};

RDF models lists as a chain of blank nodes with two pointers per node: rdf:first → the item at this position, rdf:rest → the next node (or rdf:nil at the end). This is a common way to implement lists going back to Lisp. This step gets all the nodes in the various lists used to define TBox axioms. It copies only the rdf:first and rdf:rest triples of the list nodes, not the triples that have the list items themselves as a subject. That's done in the next step.
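The chain of rdf:first and rdf:rest pointers can be sketched in a few lines of Python. This is a toy model of what the `rdf:rest*` path visits, using the Gender enumeration as the example; the blank node labels and function names are invented for illustration.

```python
LIST_PREDICATES = {"owl:oneOf", "owl:unionOf", "owl:intersectionOf"}

# {Female, Male, Non_Binary} as an RDF list hanging off :Gender.
graph = {
    (":Gender", "owl:oneOf", "_:l1"),
    ("_:l1", "rdf:first", ":Female"),
    ("_:l1", "rdf:rest", "_:l2"),
    ("_:l2", "rdf:first", ":Male"),
    ("_:l2", "rdf:rest", "_:l3"),
    ("_:l3", "rdf:first", ":Non_Binary"),
    ("_:l3", "rdf:rest", "rdf:nil"),
    (":Female", "rdfs:label", "Female"),   # about a list item, not a list node
}


def list_nodes(graph, head):
    """Return head plus every node reachable by following rdf:rest zero or more times."""
    rest = {s: o for (s, p, o) in graph if p == "rdf:rest"}
    nodes, node = [], head
    while node is not None and node not in nodes:
        nodes.append(node)
        node = rest.get(node)
    return nodes


def copy_list_triples(graph):
    """Collect the rdf:first / rdf:rest triples of every list used by a schema predicate."""
    copied = set()
    for (owner, pred, head) in graph:
        if pred in LIST_PREDICATES:
            nodes = set(list_nodes(graph, head))
            copied |= {(s, p, o) for (s, p, o) in graph
                       if s in nodes and p in ("rdf:first", "rdf:rest")}
    return copied


copied = copy_list_triples(graph)
```

The walk also reaches rdf:nil, but rdf:nil has no rdf:first or rdf:rest of its own, so it contributes nothing. The label on `:Female` is not copied here because `:Female` is an item, not a list node.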


Step 3a: Annotations for all items in owl:oneOf

INSERT { GRAPH g:TBox { ?enum ?ep ?eo } }
WHERE {
  ?c owl:oneOf ?L .
  ?L (rdf:rest*/rdf:first) ?enum .
  ?enum ?ep ?eo .
  VALUES ?ep { rdfs:label skos:prefLabel rdfs:comment }
};

For every class defined by an owl:oneOf list, it walks the list and copies just the annotations (labels/comments) of each member into g:TBox.
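The path `rdf:rest*/rdf:first` reads as "follow rdf:rest any number of times, then take rdf:first", which yields the list items rather than the list nodes. A small Python sketch of that, again as a toy model with invented blank node labels and helper names:

```python
ANNOTATION_PREDICATES = {"rdfs:label", "skos:prefLabel", "rdfs:comment"}

graph = {
    (":Gender", "owl:oneOf", "_:l1"),
    ("_:l1", "rdf:first", ":Female"),
    ("_:l1", "rdf:rest", "_:l2"),
    ("_:l2", "rdf:first", ":Male"),
    ("_:l2", "rdf:rest", "_:l3"),
    ("_:l3", "rdf:first", ":Non_Binary"),
    ("_:l3", "rdf:rest", "rdf:nil"),
    (":Female", "rdfs:label", "Female"),
    (":Female", "rdf:type", ":Gender"),
    (":Male", "rdfs:label", "Male"),
}


def list_members(graph, head):
    """Items of an RDF list, in order: the rdf:first of each node along the rdf:rest chain."""
    first = {s: o for (s, p, o) in graph if p == "rdf:first"}
    rest = {s: o for (s, p, o) in graph if p == "rdf:rest"}
    members, seen, node = [], set(), head
    while node in first and node not in seen:
        seen.add(node)
        members.append(first[node])
        node = rest.get(node)
    return members


def member_annotations(graph):
    """Annotation triples of every item that appears in an owl:oneOf list."""
    copied = set()
    for (cls, pred, head) in graph:
        if pred == "owl:oneOf":
            members = set(list_members(graph, head))
            copied |= {(s, p, o) for (s, p, o) in graph
                       if s in members and p in ANNOTATION_PREDICATES}
    return copied


members = list_members(graph, "_:l1")
annotations = member_annotations(graph)
```

In this sketch the two labels are copied, while the `:Female rdf:type :Gender` triple is not, because rdf:type is not in the annotation list.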


Step 3b: Triples for datatype restrictions in owl:withRestrictions

INSERT { GRAPH g:TBox { ?item ?facet ?val } }
WHERE {
  ?dr owl:withRestrictions ?L .
  ?L (rdf:rest*/rdf:first) ?item .
  ?item ?facet ?val .
  VALUES ?facet {
    xsd:minInclusive xsd:maxInclusive xsd:minExclusive xsd:maxExclusive
    xsd:length xsd:minLength xsd:maxLength xsd:pattern
    xsd:totalDigits xsd:fractionDigits
  }
};

Copies the XSD datatype restriction triples used inside owl:withRestrictions lists (e.g., xsd:minInclusive 18, xsd:pattern "[A-Z]{3}") into g:TBox.


Step 4: Type the datarange nodes (optional polish)

INSERT { GRAPH g:TBox { ?dr rdf:type rdfs:Datatype } }
WHERE { ?dr owl:onDatatype ?dt . };

This adds rdf:type rdfs:Datatype to any node used with owl:onDatatype (i.e., the datarange blank node inside a datatype restriction). This doesn’t change semantics. It just adds metadata for readability and interoperability.


Step 5: Type missing restriction nodes

INSERT { GRAPH g:TBox { ?r a owl:Restriction } }
WHERE {
  GRAPH g:TBox {
    { ?r owl:onProperty ?p . }
    UNION { ?r owl:someValuesFrom ?x . }
    UNION { ?r owl:allValuesFrom ?x . }
    UNION { ?r owl:hasValue ?x . }
    UNION { ?r owl:minQualifiedCardinality ?n . }
    UNION { ?r owl:maxQualifiedCardinality ?n . }
    UNION { ?r owl:qualifiedCardinality ?n . }
    UNION { ?r owl:minCardinality ?n . }
    UNION { ?r owl:maxCardinality ?n . }
    UNION { ?r owl:cardinality ?n . }
    FILTER NOT EXISTS { ?r a owl:Restriction }
  }
}


This adds rdf:type owl:Restriction to any node in g:TBox that acts like a restriction (has owl:onProperty, owl:someValuesFrom or owl:allValuesFrom, owl:hasValue, a cardinality, etc.) but isn’t explicitly typed as one.
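The combination of a UNION over restriction predicates and FILTER NOT EXISTS amounts to a set difference: nodes that look like restrictions, minus nodes already typed as restrictions. A minimal Python sketch of that logic follows; it is a toy model, and `_:r1`, `_:r2` and `:has_pet` are made-up examples.

```python
RESTRICTION_PREDICATES = {
    "owl:onProperty", "owl:someValuesFrom", "owl:allValuesFrom", "owl:hasValue",
    "owl:minQualifiedCardinality", "owl:maxQualifiedCardinality", "owl:qualifiedCardinality",
    "owl:minCardinality", "owl:maxCardinality", "owl:cardinality",
}

tbox = {
    ("_:r1", "owl:onProperty", ":has_age"),        # acts like a restriction, but untyped
    ("_:r1", "owl:someValuesFrom", "xsd:integer"),
    ("_:r2", "rdf:type", "owl:Restriction"),       # already typed
    ("_:r2", "owl:onProperty", ":has_pet"),
    (":Adult", "rdfs:subClassOf", "_:r1"),
}


def missing_restriction_types(graph):
    """Type triples to add: restriction-like subjects that lack rdf:type owl:Restriction."""
    typed = {s for (s, p, o) in graph if p == "rdf:type" and o == "owl:Restriction"}
    candidates = {s for (s, p, o) in graph if p in RESTRICTION_PREDICATES}
    return {(r, "rdf:type", "owl:Restriction") for r in candidates - typed}


to_add = missing_restriction_types(tbox)
tbox |= to_add
```

Running it a second time on the updated `tbox` adds nothing, which mirrors what the FILTER NOT EXISTS clause does in the update.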


In the SPARQL file (see the link to GitHub above) there are additional SPARQL queries to validate that it worked, and there is also an update to put the individuals and their properties in an ABox graph. They are pretty straightforward, so I'll leave those for people to explore on their own. Also, one other point about the file: you will notice that many of the INSERT updates have a semicolon after the closing "}". This enables you to copy multiple updates and paste them all into your SPARQL editor. So we can paste all of the updates above and the SPARQL tool can execute them all at once. I usually like to do them one at a time so it is easier to see any output, warnings, or errors and know which update they are associated with. If you do that, take off the semicolon after the closing curly bracket. Finally, after you've done all of the above you can do the following:


CONSTRUCT { ?s ?p ?o }
WHERE { GRAPH g:TBox { ?s ?p ?o } }


and SPARQL will return all the triples in the TBox. You can save those to a file as Turtle and read it back into Protégé, and you should see all the class and property definitions as well as annotations for labels, etc. The image below shows the results of running this final CONSTRUCT in AllegroGraph after running the updates that created the TBox subgraph.
