I try to focus most of my posts on practical topics that will help people developing ontologies and knowledge graphs. However, every once in a while (like now) I like to go into some less practical areas. There are two related proofs that provided the formal foundation for modern programmable digital computers, and those proofs are what I'm going to discuss in this post. When I worked at the Software Engineering lab at Accenture, one of the consultants who worked with us loved the concept of meta-objects and meta-models, because he had done very practical work up to that point and hadn't heard the term "meta" applied to computer science. He liked to say "let's get meta" whenever we were about to go into fairly esoteric but interesting discussions. So let's get meta.
There are some reasons, which I'll get to, why the proof is especially relevant to OWL users, but it also has major implications for computer science in general. The proof is that the Entscheidungsproblem is unsolvable. The Entscheidungsproblem gained prominence at the beginning of the 20th century thanks to David Hilbert, who was considered one of the most important mathematicians of his time. In a famous keynote speech at a mathematics conference, Hilbert described what he considered the most important open questions facing mathematics, and the Entscheidungsproblem grew out of that program. Entscheidungsproblem is simply German for "decision problem".
The decision the problem refers to is the validity of a formula: the problem asks for an algorithm that could take in any arbitrary formula of First Order Logic (FOL) and answer whether or not it is valid. Most people thought such an algorithm must exist; we just hadn't found it yet. The reason people thought so was that such an algorithm existed for propositional logic, and FOL is just an extension of propositional logic with two new operators (the quantifiers described below). If you've taken an introductory class in logic, you've used propositional logic. Propositional logic consists of the logical connectors or (∨), and (∧), if-then (⊃), and not (¬), along with variables such as P and Q. E.g.,
P ∨ ¬P ("P or not P")
is a valid formula in propositional logic. Valid means always true regardless of the values assigned to the variables: in this case, whether P is true or false, "P or not P" is always true. The algorithm for propositional logic is the truth table: a table that lists all the possible values for each variable and the truth of the whole statement under each alternative binding. Below is the simple truth table for "P or not P". Since the middle column, representing the overall truth of the formula, is always true, the formula is valid.
P | P ∨ ¬P | ¬P
T |   T    | F
F |   T    | T
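The truth-table procedure is mechanical enough to write down directly. Here is a minimal sketch in Python (the function and variable names are my own): it represents a formula as a function from variable bindings to booleans and declares it valid only if every row of the table comes out true:

```python
from itertools import product

def is_valid(formula, variables):
    """Brute-force truth table: a formula is valid iff it evaluates
    to True under every possible assignment of its variables."""
    for values in product([True, False], repeat=len(variables)):
        binding = dict(zip(variables, values))
        if not formula(binding):
            return False
    return True

# "P or not P" is valid: true in both rows of its truth table.
print(is_valid(lambda b: b["P"] or not b["P"], ["P"]))   # True
# "P and not P" is not valid (in fact it is unsatisfiable).
print(is_valid(lambda b: b["P"] and not b["P"], ["P"]))  # False
```

Note that the table has 2^n rows for n variables, so this is exponential, but it always terminates; that termination guarantee is exactly what makes propositional logic decidable.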
The idea behind truth tables goes back to the ancient Greeks. First Order Logic (FOL) is propositional logic with the addition of two quantifiers: universal ("for all") and existential ("there exists"). These should be familiar to OWL users: universal quantification is the only restriction in Description Logic and existential quantification is the some restriction. The symbols are ∀ for universal quantification and ∃ for existential quantification.
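To make the correspondence concrete (the class and property names below are invented for illustration):

∀x (hasChild(p, x) → Person(x))   corresponds to p being an instance of: hasChild only Person
∃x (hasChild(p, x) ∧ Person(x))   corresponds to p being an instance of: hasChild some Person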
In 1936 an American and an Englishman found very different proofs that the Entscheidungsproblem has no solution. The American was Alonzo Church; the Englishman was Alan Turing. They both proved that it is impossible to have a general algorithm that can determine the validity of any arbitrary FOL formula. Of course, this doesn't mean you can't prove things in FOL. It doesn't even mean there are valid FOL formulas that have no proof; by Gödel's completeness theorem every valid formula is provable (Gödel's famous incompleteness results, which do establish unprovable truths, apply to theories of arithmetic and are a different proof). All it means is that there is no one algorithm that can decide the validity of every FOL formula.
As sometimes happens in mathematics (e.g., Newton and the calculus), the formalisms that Turing and Church created for their proofs turned out to be as important as, or more important than, the proofs themselves. Turing and Church created the first mathematical models of computation: Church created the Lambda calculus and Turing the Turing Machine. It was later proven that the two are equivalent: anything that can be defined in the Lambda calculus can be computed on a Turing Machine and vice versa.
The Lambda calculus was extremely important for the early years of Artificial Intelligence, because one of the first and most important languages for AI was Lisp, and Lisp is an implementation of the Lambda calculus. It is called the Lambda calculus because its main formalism is the lambda expression, which is similar to a function definition except that a lambda expression can also be treated as data and can itself be passed as an argument to a function. In the Lambda calculus, as well as in Lisp and Python, arbitrary pieces of code called lambda expressions can be passed to functions as data and then be interpreted inside the context of another function. This allows all sorts of manipulations: code can be generated on the fly, and the language (in Lisp) can literally redefine itself as it goes.
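Here is a tiny illustration in Python (the function names are mine) of lambda expressions being treated as data and built on the fly:

```python
# A lambda expression is just data: it can be stored, passed, and returned.
twice = lambda f: lambda x: f(f(x))   # takes a function, returns a new one
add3 = lambda x: x + 3

add6 = twice(add3)                    # a new function, generated on the fly
print(add6(10))                       # 16

def apply_all(fns, value):
    """Interpret a list of lambda expressions inside another function's context."""
    for f in fns:
        value = f(value)
    return value

print(apply_all([add3, twice(add3), lambda x: x * 2], 1))  # ((1+3)+6)*2 = 20
```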
While Church defined the first modern computer language (before there were any computers to run it), Turing defined the model that was, and still is, the model for all programmable digital computers: the Turing Machine. Turing took the concept of a Finite State Machine and added the ability for the machine to read from and write to memory. Turing's and Church's genius realization was that data and process are two sides of the same coin. The information read from the memory of a Turing Machine can change the behavior of that machine. Thus, rather than a Finite State Machine, Turing created a model of a state machine with, in effect, unlimited states, since the machine can keep changing its memory, which changes its behavior, which changes its memory, which changes its behavior,...
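A Turing Machine is simple enough to sketch directly. Below is a toy simulator in Python (my own minimal encoding, not Turing's original formalism): a finite transition table plus an unbounded tape the machine can read and write, which is exactly the "memory changes behavior" loop described above. The example machine increments a binary number.

```python
def run_turing_machine(rules, tape, state, head, halt_states):
    """A Turing Machine: a finite state machine that can also read and
    write an unbounded tape.  `rules` maps (state, symbol) to
    (new_state, symbol_to_write, head_move)."""
    tape = dict(enumerate(tape))          # unbounded tape as a sparse dict
    while state not in halt_states:
        symbol = tape.get(head, ' ')      # unwritten cells read as blank
        state, write, move = rules[(state, symbol)]
        tape[head] = write
        head += move
    cells = [tape.get(i, ' ') for i in range(min(tape), max(tape) + 1)]
    return ''.join(cells).strip()

# Toy machine: increment a binary number (head starts on the last bit).
rules = {
    ('carry', '1'): ('carry', '0', -1),   # 1 plus carry = 0, carry moves left
    ('carry', '0'): ('done',  '1',  0),   # 0 plus carry = 1, stop
    ('carry', ' '): ('done',  '1',  0),   # ran off the left end: new digit
}
print(run_turing_machine(rules, '1011', 'carry', 3, {'done'}))  # 1100
```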
Aesthetics are subjective of course; however, to me Turing's proof has the same kind of beauty as a great musical composition. Aesthetics aside, this proof is especially relevant for those of us who use OWL, because an OWL reasoner is an algorithm that (among other things) decides the consistency of any set of OWL axioms (aka any OWL ontology) and is guaranteed to terminate. How is this possible given the Turing/Church proof? It is possible because OWL is not FOL. OWL is a subset of FOL called Description Logic. The creation of Description Logic goes back much further than the research on OWL and was driven by the desire to find a subset of First Order Logic that was not subject to the Turing/Church proof.
In some ways the early history of AI was a struggle with the Turing/Church proof. FOL was a very logical model for AI researchers to use to model AI systems: it is very expressive and powerful. However, the Turing/Church proof meant that no language with the expressivity of FOL could support a reasoner that was guaranteed to terminate. So AI researchers experimented with various subsets of FOL. One of the first successful examples was rule-based systems and their inference engines. Rules are a very limited subset of FOL; however, precisely because they are so limited, it was possible to build inference engines (aka reasoners) that operate very efficiently, which was critical back in the days when the most powerful computers had less CPU power and memory than modern phones.
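To give a feel for why such limited subsets are attractive, here is a minimal sketch of a forward-chaining inference engine in Python (a toy of my own devising, far simpler than any production rule engine). Because facts only accumulate and the rule set is finite, it is guaranteed to terminate:

```python
def forward_chain(facts, rules):
    """Naive forward-chaining inference: keep applying every rule whose
    premises are all known facts until no new facts appear (a fixed point).
    Each rule is a (premises, conclusion) pair."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

rules = [
    ({'has_fur', 'gives_milk'}, 'mammal'),
    ({'mammal', 'eats_meat'},   'carnivore'),
]
print(forward_chain({'has_fur', 'gives_milk', 'eats_meat'}, rules))
# Derives 'mammal' and then 'carnivore' from the starting facts.
```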
An interesting side note: why are people so concerned about decidability? After all, we use languages that can define non-terminating programs all the time, e.g., a loop in a Python program that never terminates. This is the position of John Sowa and others who think that we should just use First Order Logic, decidability be damned; what matters is expressivity. I don't think there is a right or wrong answer to this question, but Sowa makes a very interesting point: many people just take it as a given that reasoners must be guaranteed to terminate, and that isn't necessarily a constraint we have to abide by. Like so much in software engineering, it is a trade-off, and the decision should be dictated by requirements, not something we assume a priori.
Turing's proof is presented in his paper On Computable Numbers, with an Application to the Entscheidungsproblem. In this paper Turing defines what it means to be a computable number. This is another important contribution of the Turing/Church work: it provided the first rigorous definition of what we mean by saying something is computable, i.e., that it can be the output of a Turing Machine or a deduction of the Lambda calculus. Before this, everyone had an intuitive understanding of what "computable" means, but mathematicians don't like to rely on intuitions; they like rigorous definitions. This is known as the Church-Turing thesis; it is a thesis rather than a theorem because it is a definition and so cannot be proven. However, most mathematicians believe it is the best definition of computability, and since Turing and Church no one has come up with a definition that goes beyond it: every serious alternative has turned out to be equivalent.
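To make "computable number" concrete: a real number is computable if a single procedure can produce as many of its digits as you ask for. For example, √2 is computable; here is a small sketch (my own, using only exact integer arithmetic) that yields its first n decimal digits on demand:

```python
from math import isqrt

def sqrt2_digits(n):
    """Return the first n decimal digits of sqrt(2), illustrating a
    computable number: one procedure yields as many digits as asked.
    Uses the integer square root of 2 * 10^(2(n-1)), so there is no
    floating-point rounding error."""
    return str(isqrt(2 * 10 ** (2 * (n - 1))))

print(sqrt2_digits(10))  # 1414213562
```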
One of the key ideas that made Turing's proof possible is that there are different sizes of infinity. This is counterintuitive, because we know that "infinity + 1" is still infinity. To understand different sizes of infinity we need to tease apart the difference between cardinality and ordinality. With finite numbers these ideas are mostly bundled together; with infinite sets we need to separate them. Ordinality is an ordering imposed on a set via some relation, the most common of course being > and its inverse <, e.g., 1 < 2 < 3. Cardinality is the size of a set: how many elements it contains. So the cardinality of {1, 2, 3} is 3. We can't say one infinity is greater than another in the sense of ordinality, because there is no largest infinite number. But we can say that the cardinality of one infinite set is larger than the cardinality of another infinite set.
For example, the cardinality of the integers is the same as the cardinality of the natural numbers (aka the non-negative integers). We determine this for infinite sets by a mapping: if we can map the elements of one infinite set one-to-one onto another infinite set, then they have the same cardinality. An example mapping of the integers to the natural numbers (expressed as tuples with the integer first and the natural number second) is: <0,0>, <1,1>, <-1,2>, <2,3>, <-2,4>, <3,5>, <-3,6>,... I.e., map 0 to 0, then map all the negative integers to the even natural numbers and all the positive integers to the odd natural numbers. You will always be much further along in the ordering of the natural numbers, but for any integer you can always find the natural number it maps to. You can do the same with the rational numbers and the integers. This kind of infinity has a cardinality called Aleph 0 or Aleph null, represented by the symbol ℵ0 (Aleph is the first letter of the Hebrew alphabet). Where things get interesting is with the real numbers. A mathematician named Cantor proved that you can't map the reals onto the rationals (or the integers or naturals). Cantor used a diagonalization proof, a common tool in mathematical proofs; for details see the Wikipedia article on Cantor's diagonal argument. The cardinality of the real numbers is known as the cardinality of the continuum, often identified with Aleph 1 (ℵ1); whether the two are actually equal is the Continuum Hypothesis mentioned in the addendum.
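That mapping is easy to write down explicitly. Here is a sketch in Python of the same pairing (and its inverse, which shows that nothing is missed):

```python
def int_to_nat(i):
    """Map every integer to a unique natural number:
    0 -> 0, positives -> odd naturals, negatives -> even naturals."""
    return 2 * i - 1 if i > 0 else -2 * i

def nat_to_int(n):
    """The inverse mapping: every natural number comes from exactly
    one integer, so the two sets have the same cardinality."""
    return (n + 1) // 2 if n % 2 else -(n // 2)

print([(i, int_to_nat(i)) for i in [0, 1, -1, 2, -2, 3, -3]])
# [(0, 0), (1, 1), (-1, 2), (2, 3), (-2, 4), (3, 5), (-3, 6)]
```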
Turing used Cantor's technique in his proof. He showed that the set of all possible Turing Machines is countable (cardinality ℵ0), while the set of all real numbers, and hence of all the infinite sequences of digits a machine might be asked to produce, has the larger cardinality of the continuum. Both are infinite, but one infinity is bigger, so there must be numbers that no Turing Machine can compute. Building on that diagonal argument, Turing showed that no machine can decide in general what other machines will do, and from this it follows that there is no solution to the Entscheidungsproblem: even the most obvious algorithm we could think of (enumerate proofs until you find one that proves your formula) is not guaranteed to terminate when the formula is not valid.
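Cantor's diagonal trick itself fits in a few lines. Representing an infinite binary sequence as a Python function from positions to bits, flipping the diagonal yields a sequence that provably appears nowhere in any proposed enumeration (this is an illustration of the idea, not Turing's actual construction):

```python
def diagonal(enumeration):
    """Given an (attempted) enumeration of infinite binary sequences,
    build a sequence that differs from the n-th sequence at position n,
    so it cannot appear anywhere in the enumeration."""
    return lambda n: 1 - enumeration(n)(n)

# An attempted enumeration: sequence n is the binary expansion of n.
seqs = lambda n: (lambda k: (n >> k) & 1)
d = diagonal(seqs)
print([d(n) for n in range(8)])
# By construction d differs from seqs(n) at position n, for every n,
# so d is missing from the enumeration no matter how it was chosen.
```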
The work of Turing and Church was part of what is now known as the theory of computation, which began with Gödel, who proved that any consistent formal system rich enough to express arithmetic contains true statements it cannot prove. Turing and Church built on Gödel's work. The theory of computation was continued by Chomsky, who defined the Chomsky hierarchy: a classification of computing models and the ever more complex sets of languages they can parse. Chomsky's first book, Syntactic Structures, was written at a time (the 1950s) when computers were just being applied to symbolic problems and people had great hopes that finite state machines could parse natural language. Chomsky showed that finite state machines can only parse regular languages and that human language is more complex than that. At the top of the hierarchy are the recursively enumerable languages, which can only be parsed by Turing Machines, the most powerful model (which makes sense if the Turing Machine is the definition of anything that can be computed).
There are videos on YouTube which refer to the proofs of Gödel, Turing, and Church as showing "holes" in mathematics, or even worse, that "mathematics is inconsistent". I've also seen philosophers say such things. The latter is of course simply ridiculous. If math were inconsistent we couldn't prove anything. Or rather, we could prove anything, because any theorem follows from a contradiction ("if False then..."), and if we can prove anything then we can't usefully prove anything. As for saying that these proofs show "holes" or "problems", that is a subjective interpretation, but IMO quite an improper one. On the contrary, I think one of the signs that a discipline is mature is that we begin to understand not just what we know but what we can't know. Another good example of this is the Heisenberg Uncertainty Principle in physics. Understanding the limits of our knowledge is an essential part of knowledge, which ultimately is why I think these proofs are so important: they help us realize that even reason has fundamental limits.
Addendum: These are very complex topics and I'm not an expert. So keep in mind that everything I said above is 1) overly simplified (e.g., there are issues such as the Continuum Hypothesis that are important but would have required a much longer article) and 2) simplified to the point that people who are experts will disagree with some of the things I said, especially about Gödel. So if you find this interesting, don't take anything above as irrefutable; instead, click on some of the links and check things out for yourself. I linked mostly to the Stanford Encyclopedia of Philosophy because, on these issues, their articles tend to be a bit better than Wikipedia's.