In a previous post I talked about SKOS and how punning is needed to use the SKOS object properties on OWL classes. Some people find punning confusing. People who have done Object-Oriented Programming tend to understand puns because puns are similar to meta-classes. However, many people who use OWL aren't programmers. So in this post I'm going to discuss what punning is and why we need it, and then walk through an example of how to use it.
Why We Need OWL Puns
To begin with, remember that OWL is built on top of RDF. RDF is a language for defining graphs, i.e., nodes and links. To support automated reasoning, OWL has to put restrictions on the kinds of graphs we create. One of those restrictions is a strict demarcation between classes and instances. Restrictions like this keep the work of the reasoner within a manageable computational space; if a resource could be either a class or an individual, things would get too difficult for the reasoner. Usually that's not much of a sacrifice, because usually we want a distinction between individuals and classes. However, there are use cases where it is useful to treat a class as an individual, e.g., to use the SKOS object properties. That's where punning comes in. Punning allows us to maintain the individual/class distinction but still treat classes as individuals. When we pun a class we create an Individual with the same IRI as a Class. You may be thinking "but you just said we can't do that".
Well, yes I did, but puns allow a loophole (or a hack, depending on who you talk to) around this. A pun is a node in RDF whose IRI is used for both a class and an individual. From the standpoint of the OWL reasoner, however, this doesn't matter. The OWL reasoner thinks in terms of sets (classes) and elements of sets (individuals). Typically, there is a 1:1 map between a class or individual and an RDF node. With puns you have a 2:1 relation: both a class and an individual have the same IRI (the same RDF node). But OWL doesn't know or care about RDF nodes. To the reasoner, the fact that the same node represents both doesn't matter as long as it can always tell from the context (which it can) whether the IRI refers to a class or an individual. In this way you can assert values on a class: you assert the values on the individual that has the same IRI as the class, and those values can be used in various ways. E.g., sometimes you want to just assert something about a whole class.
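To make the 2:1 relation concrete, here is a minimal sketch in plain Python that models a graph as a set of triples. It is only an illustration, not how a real OWL reasoner or RDF library works. The ex: prefix, the individual ex:Sam, and the helper function names are assumptions made up for this example.

```python
# Hypothetical IRIs and helper names, used only for illustration.
RDF_TYPE = "rdf:type"
OWL_CLASS = "owl:Class"
OWL_INDIVIDUAL = "owl:NamedIndividual"

# A tiny RDF-style graph as a set of (subject, predicate, object) triples.
triples = {
    ("ex:Eagle", RDF_TYPE, OWL_CLASS),             # Eagle declared as a class
    ("ex:Eagle", RDF_TYPE, OWL_INDIVIDUAL),        # the pun: same IRI, declared as an individual
    ("ex:Eagle", "ex:isEndangeredSpecies", True),  # value asserted on the pun
    ("ex:Sam", RDF_TYPE, OWL_INDIVIDUAL),          # an ordinary individual
    ("ex:Sam", RDF_TYPE, "ex:Eagle"),              # here ex:Eagle is used as a class
}


def declared_as(iri, kind):
    """True if the graph declares this IRI as the given kind of entity."""
    return (iri, RDF_TYPE, kind) in triples


def is_pun(iri):
    """A pun is one IRI declared as both a class and an individual."""
    return declared_as(iri, OWL_CLASS) and declared_as(iri, OWL_INDIVIDUAL)


def members_of(class_iri):
    """Class view: the individuals typed with this IRI."""
    return {s for (s, p, o) in triples if p == RDF_TYPE and o == class_iri}


def property_values(individual_iri):
    """Individual view: property assertions whose subject is this IRI."""
    return {p: o for (s, p, o) in triples if s == individual_iri and p != RDF_TYPE}
```

Calling members_of("ex:Eagle") reads the IRI as a class, while property_values("ex:Eagle") reads the very same IRI as an individual. The position of the IRI in a triple is the context that keeps the two views apart.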
Other times you may treat the pun as what's called in Cognitive Science a prototype. This is a different meaning from what we typically mean by prototype. In Cognitive Science a prototype is some default object or set of property values we associate with a class. E.g., when we think of a dog most people have an image in their mind of a medium sized dog. There have been fascinating cognitive science experiments on this. Eleanor Rosch is the researcher who first discovered prototypes, but they have since been embraced by many linguists and cognitive scientists. Probably the most famous advocate of prototype theory is George Lakoff, the linguist at UC Berkeley, in his book Women, Fire, and Dangerous Things. For more on prototype theory here is a good paper by Rosch. There are times when you have values that apply to most but not all of the instances of a class. E.g., most dogs have 4 legs but it is possible to have a dog with 3 legs, so you often wouldn't want a restriction that says every member of the class Dog has exactly 4 legs. Instead you could put that information on a pun and write some SPARQL or Python code that looks at the pun for each class when it creates a new instance and fills in property values with those from the pun. Puns used this way are also similar to class variables (static fields) in Java.
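As a rough sketch of that last idea, the Python below copies default values from a pun when a new instance is created and lets the instance override them. The dictionary layout, the property name ex:numberOfLegs, the instance names, and the function new_instance are all assumptions invented for this illustration; a real project would read the pun's values from the ontology itself.

```python
# Values asserted on puns, keyed by class IRI (hypothetical names).
pun_values = {
    "ex:Dog": {"ex:numberOfLegs": 4},
}


def new_instance(iri, class_iri, overrides=None):
    """Create an instance record that starts from the defaults on the class's pun."""
    values = dict(pun_values.get(class_iri, {}))  # copy, so the pun itself is never mutated
    values.update(overrides or {})                # instance-specific values win over defaults
    return {"iri": iri, "type": class_iri, "values": values}


# A typical dog simply inherits the prototype value.
rex = new_instance("ex:Rex", "ex:Dog")

# An atypical dog overrides it without contradicting any class restriction.
tripod = new_instance("ex:Tripod", "ex:Dog", {"ex:numberOfLegs": 3})
```

Because the default lives on the pun rather than in a restriction on Dog, the three-legged dog does not make the ontology inconsistent.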
Puns are a convention that allows the developer to assert values on classes but as far as the reasoner understands a pun is just like any other individual. The convention that the pun has the same IRI as the class is useful for the developer but not something the reasoner understands because the reasoner works at the logic level not the graph level.
How to Use OWL Puns
The first thing to remember is to always think carefully before you use puns. Remember that annotation properties can also add data to classes and properties. So for data such as comments, labels, and creators, i.e., meta-data, annotations are usually the better way to go because you don't need puns. Also, there are often other ways to achieve the same result using other features of OWL rather than puns.
For example, the classic use case for puns in the OWL documentation is adding a boolean to classes of animals that records whether the class is an endangered species. The example in the OWL documentation is the Eagle class, which gets the value True for the data property isEndangeredSpecies. Personally, I've never really liked that example because there are other ways to get the semantic equivalent without puns. For example, you could have a subclass of Species called EndangeredSpecies which is a defined class with the axiom:
Species and isEndangeredSpecies value True
Someone gave me the counterargument that it is the species that is endangered, not the individual animals. That's not a compelling argument to me. The way I see it, saying a species is endangered is just linguistic shorthand for saying that each animal that is a member of that species is endangered. When a species is endangered we don't worry about the species as an abstraction; we worry about the individual animals, where they live, laws about hunting them, etc. E.g., we might have rules relating to specific animals, so that if the animal belongs to an endangered species we want certain rules to fire. In that use case it would be easier to have the value directly on the animal rather than having to check all its superclasses.
I'm not saying my answer is necessarily the correct one, rather that I think both are legitimate, and in this use case I would often go with the defined class rather than a pun. An argument that would convince me, however, is that with the approach I describe we end up setting a boolean on every instance of every endangered species. For most ontologies that wouldn't be a big deal, but I could imagine some models where we have perhaps thousands or more instances for each species. In such a case we would be adding thousands or more extra triples rather than one extra triple with the pun, and if space were an issue that would be a compelling reason to use puns in this use case. My point isn't that the classic example is wrong, just that there are often alternatives to puns that should be considered.
The first question that comes up when using a pun is: what class should the pun be an instance of? I've asked this on forums with some of the best minds in ontology design and the universal consensus is ¯\_(ツ)_/¯. As far as I know there is no consensus on what class to make a pun an instance of. However, there is a consensus on what class not to make a pun an instance of, and that is the class itself. I.e., if we pun Eagle, the pun is not an instance of the Eagle class because it represents the set of all Eagles.
Back to more practical matters. One convention that I've started to (mostly) use is to make puns instances of the skos:Concept class as described in a previous post. To the extent that there is a "right" semantic answer (and there really isn't) I would argue that this is the best option. This is because skos:Concept is itself a vague notion. This is on purpose. The designers of SKOS wanted their vocabulary to apply to models other than formal ontologies so they wanted the concepts to be very generic and applicable to different knowledge representations.
However, I'm currently working on a project where we had this design discussion and decided that we would use puns. We also decided that in this case there was a more useful standard than making puns instances of skos:Concept. We are building an ontology to do semantic search on documents. The topic for a document can be any entity in the ontology: classes, properties, or instances. The figure below is a much simpler example than the actual ontology. In the case of the figure below we have a Topic class with subclasses. This example is a bit too simple because we could just make all the topics instances. However, in the actual ontology the domain is much broader and the topics include classes (with many levels of sub-classes). We considered having an annotation property called hasTopic. That would alleviate the need for puns. However, the example below shows why we decided to go with puns. Even though the reasoner doesn't know that a punned instance is the same node in the graph as the class with the same name, we can still utilize the reasoner with puns. In this case we make the puns instances of their superclass. I.e., the pun for Linguistics is an instance of the class AcademicTopic. You can see this in the Usage information for the class Linguistics. Note that the instance Linguistics is shown in the usage right under the class itself. This is because they both have the same IRI. This isn't the first time I've used this convention. I've found using puns in this way can often be very useful.
In this way, we can create defined classes that utilize the puns. In this simple example I've created a defined class called AcademicBook with the axiom:
Book and (hasTopic some AcademicTopic)
In the figure below, you can see the definition for the defined class and that several books that have puns for the subclasses of AcademicTopic as their topic have been classified by the reasoner as AcademicBooks:
The figure below is another view of this small demonstration ontology. In this view we see one of the instances: the book Syntactic Structures by Chomsky. In the property assertions we see that it hasTopic the instance Linguistics, which is a pun for the class Linguistics. As a result, as shown in the Description, the reasoner has classified Syntactic Structures as an instance of AcademicBook. Another advantage of this approach is that by utilizing puns we can define inverses for the object property hasTopic. So in the ontology there is another property called isTopicOf whose values the reasoner infers automatically whenever we assert hasTopic. That way we have a relation from all the topics to the documents that have them as topics. Of course, since annotation properties aren't seen by the reasoner, you can't define things like inverses on them.
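The two inferences described above can be imitated in a few lines of plain Python. This is a hand-rolled sketch, not an OWL reasoner: it hard-codes the one defined class and the one inverse property, and it ignores subclass hierarchies and everything else a real reasoner handles. The ex: prefix, the second book, and the Cooking topic are hypothetical additions so that there is something that should not be classified.

```python
# Asserted classes for each individual (hypothetical ex: IRIs).
types = {
    "ex:Linguistics": {"ex:AcademicTopic"},   # the pun, typed with its superclass
    "ex:Cooking": {"ex:Topic"},               # hypothetical non-academic topic
    "ex:SyntacticStructures": {"ex:Book"},
    "ex:CookingBasics": {"ex:Book"},          # hypothetical second book
}

# Asserted hasTopic links as (document, topic) pairs.
has_topic = {
    ("ex:SyntacticStructures", "ex:Linguistics"),
    ("ex:CookingBasics", "ex:Cooking"),
}


def is_topic_of():
    """Inverse of hasTopic: every (document, topic) pair flipped to (topic, document)."""
    return {(topic, doc) for (doc, topic) in has_topic}


def academic_books():
    """Individuals matching: Book and (hasTopic some AcademicTopic)."""
    return {
        doc
        for (doc, topic) in has_topic
        if "ex:Book" in types.get(doc, set())
        and "ex:AcademicTopic" in types.get(topic, set())
    }
```

Here academic_books() picks out ex:SyntacticStructures only because the pun ex:Linguistics is an ordinary individual typed as AcademicTopic. With an annotation property neither the classification nor the inverse would be available to a reasoner.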
Astute readers (and especially logical purists) are thinking: "but the class Linguistics isn't really an (instance of) AcademicTopic either!" And they are correct. There is no question this is a hack but puns in general are a hack and in the real world hacks often have their place. That's why it is better not to use them if there are alternatives. However, there are use cases such as these where you can bend logical purity a bit and get some useful functionality.