I try to focus most of my posts on practical topics that will help people developing ontologies and knowledge graphs. However, every once in a while (like now) I like to go into some less practical areas. There are two related proofs that provided the formal foundation for modern programmable digital computers, and those proofs are what I'm going to discuss in this post. When I worked at the Software Engineering lab at Accenture, one of the consultants who worked with us loved the concept of meta-objects and meta-models, because he had done very practical work up to that point and hadn't heard the term "meta" applied to computer science. He liked to say "let's get meta" whenever we were about to go into fairly esoteric but interesting discussions. So let's get meta.
There are some reasons, which I'll get to, why the proof is especially relevant to OWL users, but it also has major implications for computer science in general. The proof is that the Entscheidungsproblem is unsolvable. The Entscheidungsproblem gained prominence at the beginning of the 20th century thanks to David Hilbert, who was considered one of the most important mathematicians of his time. In a famous keynote speech at a mathematics conference, Hilbert described what he considered the most important open questions facing mathematics, and the Entscheidungsproblem grew out of that program. Entscheidungsproblem is simply German for "decision problem".
The decision the problem refers to is the validity of a formula: the problem asks for an algorithm that could take in any arbitrary formula of First Order Logic (FOL) and answer whether or not it is valid. Most people thought such an algorithm must exist; we just hadn't found it yet. The reason people thought so was that such an algorithm existed for propositional logic, and FOL is just an extension of propositional logic with two new operators (the quantifiers described below). If you've taken an introductory class in logic, you've used propositional logic. Propositional logic consists of the logical connectors or (∨), and (∧), if-then (⊃), and not (¬), along with variables such as P and Q. E.g.,
P ∨ ¬P ("P or not P")
is a valid formula in propositional logic. Valid means always true regardless of the values assigned to the variables: in this case, whether P is true or false, "P or not P" is always true. The algorithm for propositional logic is the truth table: a table that lists all the possible values for each variable and the truth of the whole statement under each alternative binding. Below is the simple truth table for "P or not P". Since the middle column, representing the overall truth of the formula, is always true, the formula is valid.
P | P ∨ ¬P | ¬P
T |   T    | F
F |   T    | T
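The truth-table procedure is mechanical enough to write down directly. Here is a minimal sketch in Python (the function and variable names are my own): it represents a formula as a function from variable bindings to booleans and declares it valid only if every row of the table comes out true:

```python
from itertools import product

def is_valid(formula, variables):
    """Brute-force truth table: a formula is valid iff it evaluates
    to True under every possible assignment of its variables."""
    for values in product([True, False], repeat=len(variables)):
        binding = dict(zip(variables, values))
        if not formula(binding):
            return False
    return True

# "P or not P" is valid: true in both rows of its truth table.
print(is_valid(lambda b: b["P"] or not b["P"], ["P"]))   # True
# "P and not P" is not valid (in fact it is unsatisfiable).
print(is_valid(lambda b: b["P"] and not b["P"], ["P"]))  # False
```

Note that the table has 2^n rows for n variables, so this is exponential, but it always terminates; that termination guarantee is exactly what makes propositional logic decidable.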
The idea behind truth tables goes back to the ancient Greeks. First Order Logic (FOL) is propositional logic with the addition of two quantifiers: universal ("for all") and existential ("there exists"). These should be familiar to OWL users: universal quantification is the only restriction in Description Logic and existential quantification is the some restriction. The symbols are ∀ for universal quantification and ∃ for existential quantification.
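To make the correspondence concrete (the class and property names below are invented for illustration):

∀x (hasChild(p, x) → Person(x))   corresponds to p being an instance of: hasChild only Person
∃x (hasChild(p, x) ∧ Person(x))   corresponds to p being an instance of: hasChild some Person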
In 1936 an American and an Englishman found very different proofs that the Entscheidungsproblem has no solution. The American was Alonzo Church; the Englishman was Alan Turing. They both proved that it is impossible to have a general algorithm that can determine the validity of any arbitrary FOL formula. Of course, this doesn't mean you can't prove things in FOL. It doesn't even mean there are valid FOL formulas that have no proof; by Gödel's completeness theorem every valid formula is provable (Gödel's famous incompleteness results, which do establish unprovable truths, apply to theories of arithmetic and are a different proof). All it means is that there is no one algorithm that can decide the validity of every FOL formula.
As sometimes happens in mathematics (e.g., Newton and the calculus), the formalisms that Turing and Church created for their proofs turned out to be as important as, or more important than, the proofs themselves. Turing and Church created the first mathematical models of computation: Church created the Lambda calculus and Turing the Turing Machine. It was later proven that the two are equivalent: anything that can be defined in the Lambda calculus can be computed on a Turing Machine and vice versa.
The Lambda calculus was extremely important for the early years of Artificial Intelligence, because one of the first and most important languages for AI was Lisp, and Lisp is an implementation of the Lambda calculus. It is called the Lambda calculus because its main formalism is the lambda expression, which is similar to a function definition except that a lambda expression can also be treated as data and can itself be passed as an argument to a function. In the Lambda calculus, as well as in Lisp and Python, arbitrary pieces of code called lambda expressions can be passed to functions as data and then be interpreted inside the context of another function. This allows all sorts of manipulations: code can be generated on the fly, and the language (in Lisp) can literally redefine itself as it goes.
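Here is a tiny illustration in Python (the function names are mine) of lambda expressions being treated as data and built on the fly:

```python
# A lambda expression is just data: it can be stored, passed, and returned.
twice = lambda f: lambda x: f(f(x))   # takes a function, returns a new one
add3 = lambda x: x + 3

add6 = twice(add3)                    # a new function, generated on the fly
print(add6(10))                       # 16

def apply_all(fns, value):
    """Interpret a list of lambda expressions inside another function's context."""
    for f in fns:
        value = f(value)
    return value

print(apply_all([add3, twice(add3), lambda x: x * 2], 1))  # ((1+3)+6)*2 = 20
```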
While Church defined the first modern computer language (before there were any computers to run it), Turing defined the model that was, and still is, the model for all programmable digital computers: the Turing Machine. Turing took the concept of a Finite State Machine and added the ability for the machine to read from and write to memory. Turing's and Church's genius realization was that data and process are two sides of the same coin. The information read from the memory of a Turing Machine can change the behavior of that machine. Thus, rather than a Finite State Machine, Turing created a model of a state machine with, in effect, unlimited states, since the machine can keep changing its memory, which changes its behavior, which changes its memory, which changes its behavior,...
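A Turing Machine is simple enough to sketch directly. Below is a toy simulator in Python (my own minimal encoding, not Turing's original formalism): a finite transition table plus an unbounded tape the machine can read and write, which is exactly the "memory changes behavior" loop described above. The example machine increments a binary number.

```python
def run_turing_machine(rules, tape, state, head, halt_states):
    """A Turing Machine: a finite state machine that can also read and
    write an unbounded tape.  `rules` maps (state, symbol) to
    (new_state, symbol_to_write, head_move)."""
    tape = dict(enumerate(tape))          # unbounded tape as a sparse dict
    while state not in halt_states:
        symbol = tape.get(head, ' ')      # unwritten cells read as blank
        state, write, move = rules[(state, symbol)]
        tape[head] = write
        head += move
    cells = [tape.get(i, ' ') for i in range(min(tape), max(tape) + 1)]
    return ''.join(cells).strip()

# Toy machine: increment a binary number (head starts on the last bit).
rules = {
    ('carry', '1'): ('carry', '0', -1),   # 1 plus carry = 0, carry moves left
    ('carry', '0'): ('done',  '1',  0),   # 0 plus carry = 1, stop
    ('carry', ' '): ('done',  '1',  0),   # ran off the left end: new digit
}
print(run_turing_machine(rules, '1011', 'carry', 3, {'done'}))  # 1100
```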
Aesthetics are subjective of course; however, to me Turing's proof has the same kind of beauty as a great musical composition. Aesthetics aside, this proof is especially relevant for those of us who use OWL, because an OWL reasoner is an algorithm that (among other things) decides the consistency of any set of OWL axioms (aka any OWL ontology) and is guaranteed to terminate. How is this possible given the Turing/Church proof? It is possible because OWL is not FOL. OWL is a subset of FOL called Description Logic. The creation of Description Logic goes back much further than the research on OWL and was driven by the desire to find a subset of First Order Logic that was not subject to the Turing/Church proof.
In some ways the early history of AI was a struggle with the Turing/Church proof. FOL was a very logical model for AI researchers to use to model AI systems: it is very expressive and powerful. However, the Turing/Church proof meant that no language with the expressivity of FOL could support a reasoner that was guaranteed to terminate. So AI researchers experimented with various subsets of FOL. One of the first successful examples was rule-based systems and their inference engines. Rules are a very limited subset of FOL; however, precisely because they are so limited, it was possible to build inference engines (aka reasoners) that operate very efficiently, which was critical back in the days when the most powerful computers had less CPU power and memory than modern phones.
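To give a feel for why such limited subsets are attractive, here is a minimal sketch of a forward-chaining inference engine in Python (a toy of my own devising, far simpler than any production rule engine). Because facts only accumulate and the rule set is finite, it is guaranteed to terminate:

```python
def forward_chain(facts, rules):
    """Naive forward-chaining inference: keep applying every rule whose
    premises are all known facts until no new facts appear (a fixed point).
    Each rule is a (premises, conclusion) pair."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

rules = [
    ({'has_fur', 'gives_milk'}, 'mammal'),
    ({'mammal', 'eats_meat'},   'carnivore'),
]
print(forward_chain({'has_fur', 'gives_milk', 'eats_meat'}, rules))
# Derives 'mammal' and then 'carnivore' from the starting facts.
```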
An interesting side note: why are people so concerned about decidability? After all, we use languages that can define non-terminating programs all the time, e.g., a loop in a Python program that never terminates. This is the position of John Sowa and others who think that we should just use First Order Logic, decidability be damned; what matters is expressivity. I don't think there is a right or wrong answer to this question, but Sowa makes a very interesting point: many people just take it as a given that reasoners must be guaranteed to terminate, and that isn't necessarily a constraint we have to abide by. Like so much in software engineering, it is a trade-off, and the decision should be dictated by requirements, not something we assume a priori.
Turing's proof is presented in his paper On Computable Numbers, with an Application to the Entscheidungsproblem. In this paper Turing defines what it means to be a computable number. This is another important contribution of the Turing/Church work: it provided the first rigorous definition of what we mean by saying something is computable, i.e., that it can be the output of a Turing Machine or a deduction of the Lambda calculus. Before this, everyone had an intuitive understanding of what "computable" means, but mathematicians don't like to rely on intuitions; they like rigorous definitions. This is known as the Church-Turing thesis; it is a thesis rather than a theorem because it is a definition and so cannot be proven. However, most mathematicians believe it is the best definition of computability, and since Turing and Church no one has come up with a definition that goes beyond it: every serious alternative has turned out to be equivalent.
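To make "computable number" concrete: a real number is computable if a single procedure can produce as many of its digits as you ask for. For example, √2 is computable; here is a small sketch (my own, using only exact integer arithmetic) that yields its first n decimal digits on demand:

```python
from math import isqrt

def sqrt2_digits(n):
    """Return the first n decimal digits of sqrt(2), illustrating a
    computable number: one procedure yields as many digits as asked.
    Uses the integer square root of 2 * 10^(2(n-1)), so there is no
    floating-point rounding error."""
    return str(isqrt(2 * 10 ** (2 * (n - 1))))

print(sqrt2_digits(10))  # 1414213562
```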
One of the key ideas that made Turing's proof possible is that there are different sizes of infinity. This is counterintuitive, because we know that "infinity + 1" is still infinity. To understand different sizes of infinity we need to tease apart the difference between cardinality and ordinality. With finite numbers these ideas are mostly bundled together; with infinite sets we need to separate them. Ordinality is an ordering imposed on a set via some relation, the most common of course being > and its inverse <, e.g., 1 < 2 < 3. Cardinality is the size of a set: how many elements it contains. So the cardinality of {1, 2, 3} is 3. We can't say one infinity is greater than another in the sense of ordinality, because there is no largest infinite number. But we can say that the cardinality of one infinite set is larger than the cardinality of another infinite set.
For example, the cardinality of the integers is the same as the cardinality of the natural numbers (aka the non-negative integers). We determine this for infinite sets by a mapping: if we can map the elements of one infinite set one-to-one onto another infinite set, then they have the same cardinality. An example mapping of the integers to the natural numbers (expressed as tuples with the integer first and the natural number second) is: <0,0>, <1,1>, <-1,2>, <2,3>, <-2,4>, <3,5>, <-3,6>,... I.e., map 0 to 0, then map all the negative integers to the even natural numbers and all the positive integers to the odd natural numbers. You will always be much further along in the ordering of the natural numbers, but for any integer you can always find the natural number it maps to. You can do the same with the rational numbers and the integers. This kind of infinity has a cardinality called Aleph 0 or Aleph null, represented by the symbol ℵ0 (Aleph is the first letter of the Hebrew alphabet). Where things get interesting is with the real numbers. A mathematician named Cantor proved that you can't map the reals onto the rationals (or the integers or naturals). Cantor used a diagonalization proof, a common tool in mathematical proofs; for details see the Wikipedia article on Cantor's diagonal argument. The cardinality of the real numbers is known as the cardinality of the continuum, often identified with Aleph 1 (ℵ1); whether the two are actually equal is the Continuum Hypothesis mentioned in the addendum.
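That mapping is easy to write down explicitly. Here is a sketch in Python of the same pairing (and its inverse, which shows that nothing is missed):

```python
def int_to_nat(i):
    """Map every integer to a unique natural number:
    0 -> 0, positives -> odd naturals, negatives -> even naturals."""
    return 2 * i - 1 if i > 0 else -2 * i

def nat_to_int(n):
    """The inverse mapping: every natural number comes from exactly
    one integer, so the two sets have the same cardinality."""
    return (n + 1) // 2 if n % 2 else -(n // 2)

print([(i, int_to_nat(i)) for i in [0, 1, -1, 2, -2, 3, -3]])
# [(0, 0), (1, 1), (-1, 2), (2, 3), (-2, 4), (3, 5), (-3, 6)]
```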
Turing used Cantor's technique in his proof. He showed that the set of all possible Turing Machines is countable (cardinality ℵ0), while the set of all real numbers, and hence of all the infinite sequences of digits a machine might be asked to produce, has the larger cardinality of the continuum. Both are infinite, but one infinity is bigger, so there must be numbers that no Turing Machine can compute. Building on that diagonal argument, Turing showed that no machine can decide in general what other machines will do, and from this it follows that there is no solution to the Entscheidungsproblem: even the most obvious algorithm we could think of (enumerate proofs until you find one that proves your formula) is not guaranteed to terminate when the formula is not valid.
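Cantor's diagonal trick itself fits in a few lines. Representing an infinite binary sequence as a Python function from positions to bits, flipping the diagonal yields a sequence that provably appears nowhere in any proposed enumeration (this is an illustration of the idea, not Turing's actual construction):

```python
def diagonal(enumeration):
    """Given an (attempted) enumeration of infinite binary sequences,
    build a sequence that differs from the n-th sequence at position n,
    so it cannot appear anywhere in the enumeration."""
    return lambda n: 1 - enumeration(n)(n)

# An attempted enumeration: sequence n is the binary expansion of n.
seqs = lambda n: (lambda k: (n >> k) & 1)
d = diagonal(seqs)
print([d(n) for n in range(8)])
# By construction d differs from seqs(n) at position n, for every n,
# so d is missing from the enumeration no matter how it was chosen.
```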
The work of Turing and Church was part of what is now known as the theory of computation, which began with Gödel, who proved that any consistent formal system rich enough to express arithmetic contains true statements it cannot prove. Turing and Church built on Gödel's work. The theory of computation was continued by Chomsky, who defined the Chomsky hierarchy: a classification of computing models and the ever more complex sets of languages they can parse. Chomsky's first book, Syntactic Structures, was written at a time (the 1950s) when computers were just being applied to symbolic problems and people had great hopes that finite state machines could parse natural language. Chomsky showed that finite state machines can only parse regular languages and that human language is more complex than that. At the top of the hierarchy are the recursively enumerable languages, which can only be parsed by Turing Machines, the most powerful model (which makes sense if the Turing Machine is the definition of anything that can be computed).
There are videos on YouTube which refer to the proofs of Gödel, Turing, and Church as showing "holes" in mathematics, or even worse, that "mathematics is inconsistent". I've also seen philosophers say such things. The latter is of course simply ridiculous. If math were inconsistent we couldn't prove anything. Or rather, we could prove anything, because any theorem follows from a contradiction ("if False then..."), and if we can prove anything then we can't usefully prove anything. As for saying that these proofs show "holes" or "problems", that is a subjective interpretation, but IMO quite an improper one. On the contrary, I think one of the signs that a discipline is mature is that we begin to understand not just what we know but what we can't know. Another good example of this is the Heisenberg Uncertainty Principle in physics. Understanding the limits of our knowledge is an essential part of knowledge, which ultimately is why I think these proofs are so important: they help us realize that even reason has fundamental limits.
Addendum: These are very complex topics and I'm not an expert. So keep in mind that everything I said above is 1) overly simplified (e.g., there are issues such as the Continuum Hypothesis that are important but would have required a much longer article) and 2) simplified to the point that people who are experts will disagree with some of the things I said, especially about Gödel. So if you find this interesting, don't take anything above as irrefutable; instead, click on some of the links and check things out for yourself. I linked mostly to the Stanford Encyclopedia of Philosophy because, on these issues, their articles tend to be a bit better than Wikipedia's.