Posted: March 27, 2023.
Or, an introduction to metalogic. Note: this article is aimed at those who already have some exposure to logic and mathematics, as it explores ideas that require some prior familiarity with proof systems and proof-writing.
If you've taken a proofs course, a logic course from your philosophy department, or had any introduction to symbolic logic, you may have seen an inference rule that looks something like this: \[ \begin{aligned} & \begin{aligned} 1.\quad& A \\ \vdots\quad& \\ n-1.\quad& B \\ n.\quad& A \to B \end{aligned} & \qquad & \begin{aligned} & \text{Assumption} \\ & \\ & \text{Some Rule} \\ & \text{Conditional Proof: 1$\dots n-1$} \end{aligned} \end{aligned} \] Conditional proofs are an essential part of proof systems, and of mathematics in general. They allow us to prove conditional statements without yet knowing whether the assumption is actually true, and to string some of these proofs together in such a way that the proof of one conditional potentially leads to the truth of several other statements.
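For a concrete instance, here is a short Fitch-style proof of \(A \to (A \vee B)\): we assume \(A\), derive \(A \vee B\) by disjunction introduction, then discharge the assumption. \[ \begin{aligned} & \begin{aligned} 1.\quad& A \\ 2.\quad& A \vee B \\ 3.\quad& A \to (A \vee B) \end{aligned} & \qquad & \begin{aligned} & \text{Assumption} \\ & \vee\text{I: 1} \\ & \text{Conditional Proof: 1$\dots$2} \end{aligned} \end{aligned} \] Notice that nothing in the proof requires \(A\) itself to be true; the conditional is established regardless.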
The Wikipedia page for conditional proof gives one such example of this that computer scientists (or mathematicians) may recognize: the NP-complete problems. By definition, any problem in NP (including the other NP-complete problems) can be reduced in polynomial time to any NP-complete problem. So, if it turns out that a polynomial-time solution exists for even one NP-complete problem, then all other problems in NP can be solved in polynomial time as well.
The page also briefly mentions the Riemann hypothesis, whose truth is assumed without proof in many practical applications, particularly in cryptography. The hypothesis concerns the distribution of the prime numbers, which is closely tied to the prime factorization problems underlying much of modern cryptography. I quite liked reading this article by the University of Alabama, as it makes a good introduction for anyone unfamiliar with the Riemann hypothesis and its connections to security and cryptography.
Getting back to the main point of this article, we can then ask ourselves this: how do we know that conditional proof is actually a valid argument, even before considering its implications? Mathematics is based on a specific first-order logic, but a philosopher may study systems of logic where conditional proof is not valid (such as relevance logic). So how are we so sure that conditional proof is valid in our system?
To answer this, we have to zoom out from doing proofs within our logical system and instead ask how we can prove something about the logical system itself. So, I will continue from here by first introducing some basic terminology and syntax for talking about a logic from a meta-theoretical perspective, and then showing how we can prove the validity of conditional proof, a result called the Deduction Theorem, for basic propositional logic.
There are two symbols that are crucial to understanding metalogic. Those are \(\vDash\), standing for truth, and \(\vdash\), standing for provability. More formally, \(\Gamma \vDash \gamma\) means that the sentence \(\gamma\) is true in all assignments of truth values that make the set of sentences \(\Gamma\) wholly true. Similarly, \(\Gamma \vdash \gamma\) is also a type of logical consequence, but in this case means that there is a proof of \(\gamma\) from \(\Gamma\). Put another way, \(\vDash\) concerns semantic entailment (truth values), whereas \(\vdash\) is about syntactic entailment (proof rules and provability). When the set of premises is empty, we can simply write \(\vDash \varphi\) and \(\vdash \varphi\). Often, we will want to use these notions with respect to a specific formal system, let's say \(\mathcal{S}\), in which case we will write each turnstile with a subscript, \(\Gamma \vDash_{\mathcal{S}} \gamma\) and \(\Gamma \vdash_{\mathcal{S}} \gamma\), to mean that \(\Gamma\) makes \(\gamma\) true in \(\mathcal{S}\), and \(\Gamma\) proves \(\gamma\) in \(\mathcal{S}\), respectively. However, for the sake of simplicity, we will use the non-subscripted symbols with respect to standard propositional logic.
To make this more concrete, we can look at an example and compare the two notions to see how they differ. Let's look at this simple proposition: \(A \land B\). The only assignment of values that makes \(A \land B\) true is the one where \(A\) is true and \(B\) is true (this can be found simply by looking at a truth table). From this, since every assignment that makes \(A \land B\) true also makes \(A\) true, we conclude that \(A \land B \vDash A\). On the other hand, we can do a simple proof in a Fitch-style proof system like so \[ \begin{aligned} & \begin{aligned} 1.\quad& A \wedge B \\ 2.\quad& A \end{aligned} & \qquad & \begin{aligned} & \text{Premise} \\ & \land\text{E: 1} \end{aligned} \end{aligned} \] and because we have a proof from \(A \land B\) to \(A\), \(A \land B \vdash A\). Notice how each notion of entailment differs in its justification. In one, we discussed the values that we can assign to each propositional variable, whereas in the other, we gave a proof. This is the key point to understanding why both are necessary, and not at all redundant.
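Semantic entailment for propositional logic is mechanical enough that we can check it by brute force. Here is a minimal Python sketch (the function names and the encoding of formulas as functions are my own choices for illustration): it enumerates every truth assignment and looks for a countermodel, that is, an assignment making all premises true and the conclusion false.

```python
from itertools import product

def entails(premises, conclusion, variables):
    """Check semantic entailment by brute force: the conclusion must be
    true under every assignment that makes all of the premises true."""
    for values in product([True, False], repeat=len(variables)):
        v = dict(zip(variables, values))
        if all(p(v) for p in premises) and not conclusion(v):
            return False  # found a countermodel
    return True

# A ∧ B ⊨ A: formulas are represented as functions from an assignment
# (a dict of variable names to booleans) to a truth value.
premise = lambda v: v["A"] and v["B"]
conclusion = lambda v: v["A"]
print(entails([premise], conclusion, ["A", "B"]))  # True
```

Establishing \(\vdash\), by contrast, means exhibiting a proof rather than surveying truth assignments.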
Simply put, the deduction theorem states that if \(B\) is a logical consequence of \(A\), then \(A \to B\) is derivable within the system. Written with our meta-logical symbols from before, this would be expressed as \(A \vdash B\) implies \(\vdash A \to B\). Now, we can see how this theorem gives grounds for conditional proof to take place. Note that I write "implies" rather than \((A \vdash B) \to (\vdash A \to B)\). This distinction matters: the arrow \(\to\) belongs to the object language, which is what is used within the system itself, so it cannot connect claims made in the meta-language, which is what we use to talk about logic from a meta-theoretical perspective.
So how do we go about actually proving a metatheorem? It's important to pay attention to which notion of entailment I used to express the deduction theorem. Remember, \(\vdash\) is syntactic consequence, and concerns provability. So we need to say something about the proof system, which can look different depending on the system you're using. The one I have used for examples is specifically Fitch-style natural deduction, but some may use other methods of proof as well, such as proof trees, which can be better for certain objectives, like building counter-models. Regardless, natural deduction will make our metalogic proof much easier by providing an intuitive way to formally represent a conditional proof.
If you've taken a proofs course, you should know what mathematical induction is, but for those who do not (likely philosophers, or if you've only taken a logic course so far), induction is a method of proving that something holds over all natural numbers (\({0, 1, 2\dots}\)). It does this in two steps, called the base case and the inductive step. Suppose you have a property \(P\) that you want to prove holds for all \(n \in \mathbb{N}\). Then, the base case is first proving \(P(0)\), and the inductive step is showing that \(P(k) \to P(k + 1)\) for an arbitrary \(k \in \mathbb{N}\). During the inductive step, \(P(k)\) is called the inductive hypothesis. When written like a chain: \[P(0) \to P(1) \to P(2) \dots\] we can intuitively see how this, with \(P(0)\), ends up proving \(P\) for all natural numbers. Here's an elementary example: let \(P(n)\) be \(3~|~3n+3\), or \(3\) divides \(3n+3\).
Base Case: \(P(0)\) is true, because \(3\) divides \(3\).
Inductive Step: Suppose \(P(k)\) is true for an arbitrary \(k \in \mathbb{N}\). We won't actually have to use the inductive hypothesis in this case; instead, we just use algebra to get the following: \[ \begin{aligned} 3(k+1) + 3 &= 3(k + 1 + 1) \\ &= 3(k + 2) \end{aligned} \] and so, by definition, \(3\) divides \(3(k+1) + 3\). Therefore, we can conclude that \(3~|~3n+3\) for all \(n \in \mathbb{N}\) by induction, and the proof is complete.
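If you want to convince yourself numerically before trusting the induction, a quick brute-force check is easy to write. This is only an empirical sanity check over finitely many values; the induction above is what covers all of \(\mathbb{N}\).

```python
# Empirical sanity check of P(n): 3 | 3n + 3 (not a proof; the induction
# is what establishes the property for every natural number).
for n in range(10_000):
    assert (3 * n + 3) % 3 == 0
print("P(n) held for every n checked")
```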
Finally, we can take a look at how this can be applied to our natural deduction proofs. Let's say we want to show that a property holds for all proofs (in our case, we want to show conditional proof is valid for an arbitrary proof). Have a look at how we enumerate a length \(n\) Fitch-style proof: \[ \begin{aligned} & \begin{aligned} 1.\quad& \Gamma \\ \dots\quad& \dots \\ n.\quad& \gamma \end{aligned} & \qquad & \begin{aligned} & \text{Premises} \\ & \dots \\ & \text{Some Rule: 1$\dots n-1$} \end{aligned} \end{aligned} \] And take note of the numbering used for each line of the proof. Perhaps that will give you a good idea of how to prove something over arbitrary proofs: we do induction on proofs of arbitrary length \(n\).
Before we get to actually proving the deduction theorem for propositional logic, we will do some setup by way of defining terms, introducing useful axioms of propositional logic, and formalizing a statement of the deduction theorem in the meta-language. Firstly, these are some particular axiom schemas (patterns) that will make proving the deduction theorem as easy as possible for us: \[ \begin{aligned} \text{Axiom 1: }& \alpha \to (\beta \to \alpha) \\[0.7em] \text{Axiom 2: }& (\alpha \to (\beta \to \gamma)) \to ((\alpha \to \beta) \to (\alpha \to \gamma)) \\[0.7em] \text{Modus Ponens: }& \text{from $\alpha$ and $\alpha \to \beta$, infer $\beta$} \end{aligned} \] Where \(\alpha\), \(\beta\), and \(\gamma\) are formulas, and substitution can be applied as long as the schema matches. Although we don't prove axioms, these can be confirmed to be tautologies simply by building their truth tables. We can also somewhat intuit their meanings by saying them in plain English. For axiom 1, if \(A\), then anything implies \(A\). For axiom 2, if \(A\) implies that \(B\) implies \(C\), then if \(A\) implies \(B\), \(A\) implies \(C\).
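Rather than drawing the truth tables by hand, we can let a short script enumerate them. Here is a minimal Python sketch (again with my own encoding of formulas as functions) confirming that both axiom schemas are tautologous:

```python
from itertools import product

# Material implication: p -> q is false only when p is true and q is false.
def imp(p, q):
    return (not p) or q

def tautology(formula, variables):
    """A formula is a tautology iff it is true under every assignment."""
    return all(formula(dict(zip(variables, values)))
               for values in product([True, False], repeat=len(variables)))

# Axiom 1: a -> (b -> a)
axiom1 = lambda v: imp(v["a"], imp(v["b"], v["a"]))
# Axiom 2: (a -> (b -> g)) -> ((a -> b) -> (a -> g))
axiom2 = lambda v: imp(imp(v["a"], imp(v["b"], v["g"])),
                       imp(imp(v["a"], v["b"]), imp(v["a"], v["g"])))

print(tautology(axiom1, ["a", "b"]))       # True
print(tautology(axiom2, ["a", "b", "g"]))  # True
```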
Now, the deduction theorem, again. I (sort of) lied in the first section about the deduction theorem. The full deduction theorem is actually stronger than what I said at that time, and can be applied to any set of sentences. Like before, we will let \(\Gamma\) be an arbitrary theory, or set of sentences/formulas. Then, we will also let \(A\) and \(B\) be formulas. We will use the set union operator, \(\cup\), which "puts together" two sets, and write \(\{A\}\) for the set containing \(A\). The full statement of the deduction theorem is \(\Gamma \cup \{A\} \vdash B\) implies \(\Gamma \vdash A \to B\).
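For a concrete instance, take \(\Gamma = \{A \to B, B \to C\}\). From \(\Gamma \cup \{A\}\) we can derive \(B\) and then \(C\) by two applications of modus ponens, so \(\Gamma \cup \{A\} \vdash C\), and the deduction theorem gives \(\Gamma \vdash A \to C\). This is exactly the familiar hypothetical syllogism.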
At last, we have all the information and setup that we need to prove the deduction theorem for propositional logic. So, using all the definitions introduced in the previous section, suppose \(\Gamma \cup \{A\} \vdash B\). Our goal is to use this and our axiom schemas to show \(\Gamma \vdash A \to B\) by induction on proof length \(n \in \mathbb{N}\). That is, our property, \(P(n)\), is that \(\Gamma \cup \{A\} \vdash B\) implies \(\Gamma \vdash A \to B\) for proofs of length \(n\).
Base Case: When proof length is \(n = 1\), \(B\) must be a member of \(\Gamma \cup \{A\}\) (or an instance of an axiom schema). One possibility is that \(B = A\), so \(A \to B\) is the same as \(A \to A\); this is a theorem of the system, derivable from Axioms 1 and 2 with two uses of modus ponens (the full derivation is below), so \(\Gamma \vdash A \to B\). The other is that \(B\) is a member of \(\Gamma\) (or an axiom), so \(\Gamma \vdash B\), in which case we use substitution with Axiom 1, getting \(\Gamma \vdash B \to (A \to B)\), and then \(\Gamma \vdash A \to B\) by modus ponens.
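Here is that derivation of \(\vdash A \to A\) in full, using only the two axiom schemas and modus ponens (this is the standard Hilbert-style proof): \[ \begin{aligned} & \begin{aligned} 1.\quad& (A \to ((A \to A) \to A)) \to ((A \to (A \to A)) \to (A \to A)) \\ 2.\quad& A \to ((A \to A) \to A) \\ 3.\quad& (A \to (A \to A)) \to (A \to A) \\ 4.\quad& A \to (A \to A) \\ 5.\quad& A \to A \end{aligned} & \qquad & \begin{aligned} & \text{Axiom 2} \\ & \text{Axiom 1} \\ & \text{MP: 1, 2} \\ & \text{Axiom 1} \\ & \text{MP: 3, 4} \end{aligned} \end{aligned} \] In line 1, Axiom 2 is instantiated with \(\alpha = A\), \(\beta = A \to A\), and \(\gamma = A\).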
Inductive Step: Now, suppose that \(\Gamma \cup \{A\} \vdash B\) implies \(\Gamma \vdash A \to B\) for all proofs of length \(k \in \mathbb{N}\) where \(1 \leq k \leq n\). This is our inductive hypothesis. This time, suppose \(\Gamma \cup \{A\} \vdash B\) in \(n + 1\) steps. Similar to before, there are two potential ways this came to be:
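Case 1: \(B\) is a member of \(\Gamma \cup \{A\}\) (or an axiom instance), and the last line simply states it. Then \(\Gamma \vdash A \to B\) exactly as in the base case.

Case 2: \(B\) was derived by modus ponens from two earlier lines, say \(C\) and \(C \to B\), each of which has a proof from \(\Gamma \cup \{A\}\) of length at most \(n\). By the inductive hypothesis, \(\Gamma \vdash A \to C\) and \(\Gamma \vdash A \to (C \to B)\). Substituting into Axiom 2 gives \[ \Gamma \vdash (A \to (C \to B)) \to ((A \to C) \to (A \to B)) \] and two applications of modus ponens yield \(\Gamma \vdash A \to B\).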
Therefore, we get that \(\Gamma \vdash A \to B\) in every case for proofs of length \(n + 1\), and the deduction theorem is proven by induction.
Starting at 1: Whether or not 0 is a natural number is actually a point of dispute for some. I, personally, would argue that 0 is, in fact, a natural number. However, the proof I showed started at 1. This is simply because a proof cannot have length 0. So since we are doing induction on the length of the proof, it makes sense for the base case to be \(n = 1\) instead of \(n = 0\). We could actually amend the proof very slightly to instead say that \(n\) is the number of lines of the proof after \(\Gamma\). If that were the case, we would use \(n = 0\) as the base case, but nothing else changes.
Induction: You may have noticed that the induction used in the proof doesn't quite look like the induction that I showed prior to it, but the idea is still the same. Formally, the induction I first showed uses the following argument (the bar and three dots mean "therefore"): \[\frac{P(0), \forall k \in \mathbb{N} (P(k) \to P(k + 1))}{\therefore \forall n \in \mathbb{N}(P(n))}\] Whereas the argument used in the proof goes like this: \[\frac{P(0), \forall k \in \mathbb{N} (\forall j \in \mathbb{N}(0 \leq j \leq k \to P(j)) \to P(k + 1))}{\therefore \forall n \in \mathbb{N}(P(n))}\] Or, in English: given the base case \(P(0)\), if assuming that \(P\) holds for all values between \(0\) and an arbitrary \(k\) (inclusive) lets us show that \(P(k+1)\) holds, then we have proven \(P\) for all natural numbers. This is called strong induction, whereas the first version that I showed is called ordinary induction. The difference it makes in the actual proof is simple. In ordinary induction, we assume \(P(k)\) for some arbitrary \(k\). In strong induction, we assume \(P(j)\) for all \(j\) between \(0\) and \(k\), inclusive.
Why \(\Gamma \cup \{A\}\)? Initially, we are only starting with the set of sentences \(\Gamma\) as premises, or givens. When you begin a conditional proof, you make an assumption, say \(A\), and that assumption is added to the set of premises for the duration of the subproof. After the conditional proof is complete, that is, once we have arrived at some conclusion \(B\), we must discharge the assumption \(A\), and can no longer use it. But what can we then do with the subproof that we have completed? As is well known by now, we infer \(A \to B\), and that is what the deduction theorem seeks to justify, as the example below shows.
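Here is the hypothetical syllogism example from earlier written out in this style, with \(\Gamma = \{A \to B, B \to C\}\) as premises, \(A\) assumed on line 3, and the assumption discharged on line 6: \[ \begin{aligned} & \begin{aligned} 1.\quad& A \to B \\ 2.\quad& B \to C \\ 3.\quad& A \\ 4.\quad& B \\ 5.\quad& C \\ 6.\quad& A \to C \end{aligned} & \qquad & \begin{aligned} & \text{Premise} \\ & \text{Premise} \\ & \text{Assumption} \\ & \to\text{E: 1, 3} \\ & \to\text{E: 2, 4} \\ & \text{Conditional Proof: 3$\dots$5} \end{aligned} \end{aligned} \] Lines 3 through 5 are the subproof: within it, \(A\) functions as a premise, which is exactly what \(\Gamma \cup \{A\}\) captures.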
Chicken or egg? You might be wondering how these meta-logical proofs are valid to begin with. After all, it seems like we're using some logic to prove statements about the logic. Since the logic is not yet solidly verified at the time of the meta-logical proof, how are we able to use it? This StackExchange post was where I went the first time I asked myself this question, and those who answered offered some viewpoints that can help in understanding this. If you're reading that and have no idea what the incompleteness theorems are, they are basically a pair of theorems that concern the provability of truths in a formal system. You can do more reading on them in An Introduction to Gödel's Theorems (IGT) by Peter Smith.