"What's in a name? A rose with IRI OWLClass_19862dfc_691eWh_4643_9c7c_65775f1eeb49 would smell as sweet."
One concept that some new users of Protégé find confusing is the name of an entity. An entity is any ontology object: a class, property, or individual. When talking about the name of an entity, there are two related but very different things that can be meant. The first is the last part of the IRI. IRI is the newer term that replaces URI. A URI is like a URL, but whereas a URL is typically meant to be something that can be displayed in a web browser, a URI (Uniform Resource Identifier) is meant to be a unique identifier for a particular resource, whether or not anything can be retrieved at that address. An IRI (Internationalized Resource Identifier) is simply a URI with support for character sets such as Kanji rather than only ASCII, as was the case with URIs.
Typically an ontology has a base IRI such as https://semanticweb.org/FOAF. Each entity in the ontology has its own unique IRI that typically starts with the base IRI followed by a # sign. For example: https://semanticweb.org/FOAF#Agent. The text that comes after the # sign is often referred to simply as the name of the entity.
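As a minimal sketch of the split described above, the following Python function separates an IRI into its base and the short name after the # sign. The function name split_iri is just an illustration, and it assumes the ontology uses # as the separator (some ontologies use / instead).

```python
def split_iri(iri: str) -> tuple[str, str]:
    """Split an IRI into (base IRI, short name) at the last '#'."""
    base, sep, short_name = iri.rpartition("#")
    if not sep:
        # No '#' present: treat the whole string as the base, with no short name.
        return iri, ""
    return base, short_name


base, name = split_iri("https://semanticweb.org/FOAF#Agent")
print(base)  # https://semanticweb.org/FOAF
print(name)  # Agent
```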
However, since OWL is meant to model large domains in an intuitive manner, it is often the case that the same entity may be referenced by many names. For example, email and mailbox may be different names used to refer to the same data property. In addition, having spaces or special characters in an IRI can make utilizing the IRI more complex. Thus, one of the core properties that is included in every Protégé ontology (just as the class owl:Thing is) is the annotation property rdfs:label. A good way to think of the difference is that an IRI is typically used by machines and developers to identify an entity, whereas the label is the string that tools use to present a readable name to users.
An entity can have multiple rdfs:label values. For example, the property https://semanticweb.org/FOAF#hasEmail could have the labels email and mailbox. Also, the Simple Knowledge Organization System (SKOS) has a useful property called skos:altLabel which, used alongside skos:prefLabel, distinguishes the primary display name from the alternative display names.
An important decision when creating a new ontology in Protégé is whether to use UUIDs or user supplied names for the IRI of each entity. The figure below shows the Protégé dialogue for selecting this. This can be found in File>Preferences>New entities. The following image shows how the dialogue would be configured when setting new entities to use user supplied names rather than UUIDs.
The default in Protégé now is to use UUIDs rather than user supplied names. A UUID is a Universally Unique Identifier. This is a string generated by an algorithm designed so that, for all practical purposes, the same value is never produced twice.
The advantage of using a UUID is that it is effectively guaranteed to be a unique identifier, similar to a unique key column in a relational database. However, for T-Box entities (classes and properties) there is also a significant downside to using UUIDs: SPARQL queries become much less intuitive to read. As a simple example, here is a query from our Covid CODO ontology that finds all the patients and their family relations:
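To make the idea concrete, here is a small Python sketch that mints an identifier with the same shape as the auto-generated names shown in this post (a prefix such as OWLClass_ followed by a UUID with underscores). The function name new_entity_iri and its parameters are hypothetical, and this is not Protégé's actual implementation.

```python
import uuid


def new_entity_iri(base_iri: str, kind: str = "OWLClass") -> str:
    """Return base_iri#kind_<uuid>, with the UUID's hyphens turned into underscores."""
    # uuid4() produces a random UUID; collisions are extremely unlikely in practice.
    suffix = str(uuid.uuid4()).replace("-", "_")
    return f"{base_iri}#{kind}_{suffix}"


# Each call yields a different, human-unfriendly name.
print(new_entity_iri("https://semanticweb.org/FOAF"))
print(new_entity_iri("https://semanticweb.org/FOAF", kind="OWLNamedIndividual"))
```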
SELECT ?p ?r
WHERE {?p a codo:Patient;
codo:hasFamilyRelationship ?r.}
With UUIDs this query would look something like:
SELECT ?p ?r
WHERE {?p a codo:OWLClass_f861e81c_661a_4243_a9be_cb9c780cb78a;
codo:OWLProperty_c744v9fv_594j_3640_a9be_dge5305fe45v ?r }
Obviously, this is far less intuitive. So using UUIDs for T-Box entities comes with a non-trivial cost: the effort to define and maintain SPARQL queries increases significantly. One needs to constantly look up specific IRIs. Also, rather than having self-documenting code as with the first query, the developer must add comments to clarify the actual entity or perform lookups in SPARQL via the label (which adds additional time to each query).
Although I think I may be in the minority, my preference is still to use intuitive IRI names for T-Box entities (classes and properties) but to use UUIDs for A-Box entities (individuals). I think this is analogous to how one typically uses keys in a relational database. The key column in an RDB should ideally not be a field that is subject to change or that yields information about the specific row. For example, it is bad practice to use social security numbers as keys. Using UUIDs for the key in an RDB is far preferable, and the same usually applies to individuals in an ontology. However, one typically doesn't bother with UUIDs for the names of Entities or Relations in an RDB; instead one uses intuitive names. In my experience, this is the better idea for most ontologies as well.
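The label-based lookup mentioned above can be sketched as a small Python helper that builds the SPARQL text. This is only an illustration: the helper names and the labels "Patient" and "has family relationship" are assumptions, not the actual labels in CODO, and a plain string literal like this will not match labels that carry a language tag such as @en.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def quote(text: str) -> str:
    """Wrap text as a SPARQL string literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def label_lookup_query(class_label: str, property_label: str) -> str:
    """Build a query that first resolves the class and property by rdfs:label."""
    return "\n".join([
        f"PREFIX rdfs: <{RDFS}>",
        "SELECT ?p ?r",
        "WHERE {",
        f"  ?cls rdfs:label {quote(class_label)} .",
        f"  ?prop rdfs:label {quote(property_label)} .",
        "  ?p a ?cls ;",
        "     ?prop ?r .",
        "}",
    ])


# Hypothetical labels; the real ones depend on the ontology.
print(label_lookup_query("Patient", "has family relationship"))
```

The query stays readable without knowing the UUID-based IRIs, but it now needs two extra triple patterns, which is the additional cost per query described above.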
When making this choice, the other preference setting to know about in Protégé is File>Preferences>Renderer. If you ever see a screen that looks like the following, it probably means you need to change this preference:
If you see something like this it probably means you have New Entities set to Auto-generated ID but you have Rendering set to Render by entity IRI short name (ID).
To fix this, go back to preferences, but this time select the Preferences>Renderer tab. If you set Rendering to Render by annotation property, the ontology should look readable again. In general, though, what I prefer to do is to set the New entities preference to User supplied name. In that case you want the rendering to be Render by entity IRI short name (ID).
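The interaction between the two settings can be illustrated with a short Python sketch. It is not how Protégé is implemented; display_name, the labels dictionary, and the https://example.org/codo# namespace are hypothetical stand-ins used only to show why a UUID-based IRI rendered by short name looks unreadable.

```python
def display_name(iri: str, labels: dict[str, list[str]], by_label: bool) -> str:
    """Pick the text a tool might show for an entity."""
    # The short name is whatever follows the last '#' in the IRI.
    short_name = iri.rpartition("#")[2]
    if by_label and labels.get(iri):
        # "Render by annotation property": use the first label if one exists.
        return labels[iri][0]
    # "Render by entity IRI short name (ID)", or no label available.
    return short_name


iri = "https://example.org/codo#OWLClass_f861e81c_661a_4243_a9be_cb9c780cb78a"
labels = {iri: ["Patient"]}

print(display_name(iri, labels, by_label=False))  # OWLClass_f861e81c_661a_4243_a9be_cb9c780cb78a
print(display_name(iri, labels, by_label=True))   # Patient
```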
I use Protégé to maintain a knowledge graph, then import the graph into a relational database. I have to keep both in sync, so I think I will use a fixed UUID IRI identifier. Then my database uses an integer for the primary key and a string field iri to refer to Protégé. This way I can keep track of changes on both sides.
Nice explanation.