Watch Your S: A Subtle "Gotcha" of IRIs

Michael DeBellis
Oct 30

One thing I like to do on this blog is write about issues that took me way too long to figure out, in the hope that I can save other developers from the same fate. This is one of those examples.

For a long time, I believed that the important part of an IRI (Internationalized Resource Identifier) was everything after the //. After all, that’s how it works when you type URLs into your browser, right? The part before that (http or https) is just about security. And as someone who had to go through the hassle of identity theft once, I take security very seriously.

Well, that’s all true for web browsing. But when you enter the world of RDF and Semantic Web technologies, things get trickier. RDF doesn’t just look at the part of the IRI after the //. It treats the entire IRI, protocol and all, as a single identifier and compares it character by character. That means:

http://mydomain/my_awesome_ontology/

is not the same as

https://mydomain/my_awesome_ontology/

They’re two completely different identifiers, as far as RDF (and OWL, SPARQL, SHACL, etc.) are concerned.
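To make that concrete, here is a minimal sketch in plain Python. It assumes nothing more than what RDF itself does, which is to compare IRIs as strings. The toy graph and the subjects_in helper are made up for illustration and are not part of any RDF library.

```python
# Minimal sketch: model IRIs as plain strings and triples as tuples.
HTTP_NS = "http://mydomain/my_awesome_ontology/"
HTTPS_NS = "https://mydomain/my_awesome_ontology/"

# A toy "graph": one triple whose subject lives in the http namespace.
graph = {
    (HTTP_NS + "Person", "http://www.w3.org/2000/01/rdf-schema#label", "Person"),
}

def subjects_in(graph, namespace):
    """Return the subjects whose IRI starts with the given namespace (hypothetical helper)."""
    return sorted({s for (s, p, o) in graph if s.startswith(namespace)})

# The two IRIs differ by one character, so they are different identifiers.
same_identifier = (HTTP_NS + "Person") == (HTTPS_NS + "Person")
found_with_http = subjects_in(graph, HTTP_NS)
found_with_https = subjects_in(graph, HTTPS_NS)

print(same_identifier)    # False
print(found_with_http)    # the one subject stored under http
print(found_with_https)   # []
```

Looking things up under the https namespace finds nothing, even though the data is sitting right there under http.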

Here’s where it gets even trickier. Many of the standard vocabularies we all use, such as OWL, RDF, RDFS, Dublin Core, SKOS, and XSD, were created long before HTTPS became the default on the web. Their official namespaces still use HTTP, not HTTPS.

So, if you’re like me, and you’ve gotten into the habit of always using https, you might look at something like:

http://purl.org/dc/terms/

and think, “That’s not secure—I’ll just fix it.”

Don’t. If you change that to:

https://purl.org/dc/terms/

it may look like an improvement, but you’ve just created a brand-new namespace that doesn’t actually exist in the RDF world. That’s how I once found myself staring at a SPARQL query that refused to recognize some of my entities. I kept checking my prefixes, convinced I had a typo or had mixed up / and #. The problem was that sneaky little “S”. RDF was doing exactly what it’s supposed to do. It saw two different IRIs.
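If I had had a small check like the following back then, it would have saved me a lot of staring. This is only a sketch: the list of known namespaces is a short sample, and strip_scheme, find_scheme_mismatches, and my_prefixes are names I made up for the example. It flags any prefix whose IRI matches an official namespace in everything except the scheme.

```python
# A few official namespaces (all plain http). Extend the sample as needed.
KNOWN_NAMESPACES = {
    "dcterms": "http://purl.org/dc/terms/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "owl": "http://www.w3.org/2002/07/owl#",
}

def strip_scheme(iri):
    """Drop a leading http:// or https:// so IRIs can be compared without it."""
    for scheme in ("https://", "http://"):
        if iri.startswith(scheme):
            return iri[len(scheme):]
    return iri

def find_scheme_mismatches(prefixes, known=KNOWN_NAMESPACES):
    """Return {prefix: (declared, official)} where only the scheme differs."""
    official_by_rest = {strip_scheme(ns): ns for ns in known.values()}
    mismatches = {}
    for prefix, declared in prefixes.items():
        official = official_by_rest.get(strip_scheme(declared))
        if official is not None and official != declared:
            mismatches[prefix] = (declared, official)
    return mismatches

# Hypothetical prefix declarations copied from a misbehaving query.
my_prefixes = {
    "dcterms": "https://purl.org/dc/terms/",         # the sneaky "s"
    "skos": "http://www.w3.org/2004/02/skos/core#",  # correct
}

for prefix, (declared, official) in find_scheme_mismatches(my_prefixes).items():
    print(f"{prefix}: declared {declared} but the official namespace is {official}")
```

Run against the prefixes above, it reports dcterms and leaves skos alone.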

So, the moral of the story is simple but important:

RDF cares about every character in an IRI.
Don’t “upgrade” legacy namespaces to HTTPS.
When defining your own IRIs, use HTTPS, and then ALWAYS use HTTPS. Never mix the two schemes for the same namespace.

And next time you’re debugging a SPARQL query that seems haunted, take a quick glance over your shoulder… maybe your problem is that you forgot to watch your S.

Pro Tip: Those XML Catalog Files are Actually Useful

I've been using OWL since 2017, and until recently I never knew what those catalog files that Protégé and other tools generate when you save an ontology are for. You know the ones: they are typically named catalog-v001.xml. Catalog files are basically little maps that tell your tools where to find an ontology file without changing its IRI. They let you say: “Whenever you see this IRI, don’t go out on the Internet looking for it. I’ve got a nice clean copy right here.”

This can be extremely useful if you don't want to depend on the Internet to load your ontologies, but you also don't want to merge imported ontologies into yours, because merging makes it much harder to incorporate future versions of the ontology you are reusing. In the past, I would always merge imported ontologies precisely because I didn't want to rely on Internet access to load an ontology. For namespaces that haven't changed in forever and aren't likely to change, such as Dublin Core, I still think merging is the way to go. However, other ontologies, such as the Gist upper model from Semantic Arts, are being very actively developed, and for most use cases you will want to be able to load future versions (which is very hard to do once you merge a vocabulary into your ontology). That's one of the primary use cases for the catalog file.

For example, the SKOS namespace is still http://www.w3.org/2004/02/skos/core# (plain HTTP). But you can keep your own copy of the SKOS ontology to avoid lookups at w3.org, or to work offline if the copy is on your machine. You can add a catalog entry like this:

<uri name="http://www.w3.org/2004/02/skos/core#"
     uri="https://mydomain/ontologies/skos_local_copy.ttl"/>

That tells your tools:

Whenever you need SKOS, use my secure HTTPS copy but keep treating it as the same SKOS everyone else uses.

This way you get the best of both worlds:

You still identify everything with the correct SKOS IRI (so your data stays interoperable).
You load it from your preferred HTTPS or local source (so your setup stays secure and reliable).
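To show what a tool does with such an entry, here is a rough sketch using only the Python standard library. It is not how Protégé or any particular tool implements catalogs. The load_catalog and resolve functions are hypothetical helpers, and the catalog text is just the entry above wrapped in a standard OASIS catalog element.

```python
import xml.etree.ElementTree as ET

# XML namespace used by OASIS XML Catalog files.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Example catalog text containing the single SKOS mapping.
CATALOG_XML = """
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <uri name="http://www.w3.org/2004/02/skos/core#"
       uri="https://mydomain/ontologies/skos_local_copy.ttl"/>
</catalog>
""".strip()

def load_catalog(xml_text):
    """Return {iri: location} for every <uri> entry in the catalog."""
    root = ET.fromstring(xml_text)
    mapping = {}
    for entry in root.iter(f"{{{CATALOG_NS}}}uri"):
        mapping[entry.get("name")] = entry.get("uri")
    return mapping

def resolve(iri, mapping):
    """Where to load from: the catalog location if listed, else the IRI itself."""
    return mapping.get(iri, iri)

mapping = load_catalog(CATALOG_XML)

# The identifier stays the same; only the place we fetch it from changes.
print(resolve("http://www.w3.org/2004/02/skos/core#", mapping))
# An IRI with no catalog entry falls through unchanged.
print(resolve("http://purl.org/dc/terms/", mapping))
```

Notice that the lookup is an exact string match again, so the name in the catalog has to be the IRI exactly as your ontology refers to it, S or no S.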

Bottom line: If you are using a vocabulary like Gist that is still actively being developed, use the catalog. That way, you can keep your ontologies organized and independent of access to the Internet but still update to new versions when they are available.