# Closures

• We have considered the reflexive, symmetric, and transitive properties of relations.
• We know that some relations have these properties and some don't.
• The question here is: how can we turn a relation into one that does have these properties?
• That is, given a relation $$R$$, what is the least we can add to it to make it reflexive, symmetric, or transitive?
• … or maybe so it has some other property?
• More formally, for a property $$\mathbf{P}$$ of relations, the closure of $$R$$ for property $$\mathbf{P}$$ is a relation $$R_c$$ with…
• $$R_c$$ has property $$\mathbf{P}$$.
• $$R\subseteq R_c$$.
• For every relation $$S$$ with $$R\subseteq S$$, if $$S$$ has property $$\mathbf{P}$$ then $$R_c\subseteq S$$.
• Note: not every relation and property has a closure, but we can find them for the ones we're interested in.

# Reflexive Closure

• To make a relation reflexive, all we need to do is add the “self” pairs that would make it reflexive.
• For a relation on a set $$A$$, we will use $$\Delta$$ to denote the set $$\{(a,a)\mid a\in A\}$$.
• Theorem: The reflexive closure of a relation $$R$$ is $$R\cup \Delta$$.

Proof Idea: Any reflexive relation containing $$R$$ must contain $$R\cup \Delta$$. If not, it would either not be a superset of $$R$$ or not be reflexive.

• For example, the reflexive closure of the $$<$$ relation is: $\{(a,b)\mid a<b\}\cup \{(a,b)\mid a=b\} = \{(a,b)\mid a\le b\}\,.$
• That is, the reflexive closure of $$<$$ is $$\le$$.
• Corollary: If $$R$$ is a reflexive relation, the reflexive closure of $$R$$ is $$R$$.

Proof: If $$R$$ is reflexive, then $$\Delta\subseteq R$$. Thus, $$R\cup \Delta= R$$.

• For example, $$\le$$ is its own reflexive closure.
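The theorem $$R\cup\Delta$$ translates directly into code. A minimal sketch, with relations represented as Python sets of pairs (the function names are my own, not from the notes):

```python
def reflexive_closure(R, A):
    """Reflexive closure of a relation R on the set A: R ∪ Δ."""
    delta = {(a, a) for a in A}   # Δ = {(a,a) | a ∈ A}
    return R | delta

# The < relation on {1, 2, 3}:
A = {1, 2, 3}
less = {(a, b) for a in A for b in A if a < b}

# Its reflexive closure is exactly ≤ on the same set:
leq = {(a, b) for a in A for b in A if a <= b}
assert reflexive_closure(less, A) == leq
```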

# Symmetric Closure

• It's also fairly obvious how to make a relation symmetric: if $$(a,b)$$ is in $$R$$, we have to make sure $$(b,a)$$ is there as well.
• We already have a way to express all of the pairs in that form: $$R^{-1}$$.
• Theorem: The symmetric closure of a relation $$R$$ is $$R\cup R^{-1}$$.

Proof Idea: Any symmetric relation containing $$R$$ must contain $$R\cup R^{-1}$$. If not, it would either not be a superset of $$R$$ or not be symmetric.

• For example, the symmetric closure of $$<$$ is: $\{(a,b)\mid a<b\} \cup \{(a,b)\mid a>b\} =\{(a,b)\mid a\ne b\}\,.$
• So, the symmetric closure of $$<$$ is $$\ne$$.
• Corollary: If $$R$$ is a symmetric relation, the symmetric closure of $$R$$ is $$R$$.

Proof: If $$R$$ is symmetric, then by an earlier theorem, $$R=R^{-1}$$. Thus, $$R\cup R^{-1}= R$$.

• For example, $$=$$ is its own symmetric closure.
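The theorem $$R\cup R^{-1}$$ is just as easy to compute. A minimal sketch in the same set-of-pairs representation (names are mine):

```python
def inverse(R):
    """R⁻¹: every pair of R, reversed."""
    return {(b, a) for (a, b) in R}

def symmetric_closure(R):
    """Symmetric closure of R: R ∪ R⁻¹."""
    return R | inverse(R)

# The < relation on {1, 2, 3}:
A = {1, 2, 3}
less = {(a, b) for a in A for b in A if a < b}

# Its symmetric closure is exactly ≠ on the same set:
neq = {(a, b) for a in A for b in A if a != b}
assert symmetric_closure(less) == neq
```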

# Transitive Closure

• Based on the above, your first thought about how to make the transitive closure might be: if both $$(a,b)$$ and $$(b,c)$$ are in the relation, then add $$(a,c)$$ to make it transitive.
• But that doesn't work. Consider the relation with this graph:
• If we add all of the edges as described, we get this:
• That's still not transitive: we still need $$(a,d)$$ in the relation to make it transitive.
• We're going to have to be a little more clever…
• If we have a directed graph $$G$$, then a path in that graph is a sequence of edges $$(x_0,x_1),(x_1,x_2),\ldots,(x_{n-1},x_n)$$.
• We will say that this is a path of length $$n$$ from $$x_0$$ to $$x_n$$.
• If $$x_0=x_n$$, then we will call it a cycle or circuit.
• In the first example graph above, there is one path from $$a$$ to $$d$$.
• In the second, there are three: $$a,b,c,d$$ and $$a,b,d$$ and $$a,c,d$$.
• Note that we haven't said anything about the $$x_k$$ being unique in our paths.
• Paths are allowed to cross themselves, or even use the same edges multiple times.
• For example, in this graph, to get from $$a$$ to $$d$$, there are paths of length 3, 5, 7, 9, 11,…
• Theorem: Let $$R$$ be a relation on $$A$$. There is a path from $$a$$ to $$b$$ of length $$n$$ in the graph of $$R$$ iff $$(a,b)\in R^n$$.

Proof: For $$n=1$$, the path must be a single edge. Such a path will exist if and only if $$(a,b)\in R$$.

Now assume for induction that the theorem is true for paths of length $$n$$ with $$n\ge 1$$. If there is a path of length $$n+1$$ from $$a$$ to $$b$$, then let $$c$$ be the second-to-last vertex in that path (the one just before $$b$$), so $$(c,b)$$ is the last edge, and the remainder of the path is length $$n$$ from $$a$$ to $$c$$. Thus $$(a,c)\in R^n$$ and by definition of relation composition, $$(a,b)\in R^{n+1}$$.

If $$(a,b)\in R^{n+1}$$, then there is a $$c$$ such that $$(a,c)\in R^{n}$$ and $$(c,b)\in R$$. From the inductive hypothesis, there is a path of length $$n$$ from $$a$$ to $$c$$ and adding $$(c,b)$$ gives a path of length $$n+1$$ from $$a$$ to $$b$$.

So, we have a path of length $$n+1$$ iff $$(a,b)\in R^{n+1}$$, and by induction for all $$n\ge 1$$, we have a path of length $$n$$ iff $$(a,b)\in R^n$$.
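The theorem can be checked experimentally. A minimal sketch, again with relations as sets of pairs, where `compose(R, S)` pairs up $$(a,b)\in R$$ with $$(b,c)\in S$$ (these helper names are my own):

```python
def compose(R, S):
    """All (a, c) such that (a, b) ∈ R and (b, c) ∈ S for some b."""
    return {(a, c) for (a, b) in R for (b2, c) in S if b == b2}

def power(R, n):
    """R^n by repeated composition, for n ≥ 1."""
    result = R
    for _ in range(n - 1):
        result = compose(result, R)
    return result

# The graph a → b → c → d has one path from a to d, of length 3:
R = {("a", "b"), ("b", "c"), ("c", "d")}
assert ("a", "d") in power(R, 3)      # the length-3 path appears in R^3
assert ("a", "d") not in power(R, 2)  # no length-2 path from a to d
```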

• For a relation $$R$$, we define the connectivity relation $$R^{*}$$ as the relation of pairs $$(a,b)$$ such that you can get from $$a$$ to $$b$$ in $$R$$ using a path of any length.
• That is, $$R^{*} = \bigcup_{n=1}^{\infty} R^n$$.
• In the flights-between-cities example, finding $$R^2$$, $$R^3$$, etc told us something useful.
• $$R^n$$ was the cities you can get between in exactly $$n$$ flights.
• The theorem above proves that this is really true.
• It's then perfectly reasonable to ask what cities we can get between in any number of flights.
• $$R^{*}$$ gives us a notation for that (and formal definition of what we're talking about).
• If $$(a,b)\in R^{*}$$, then it is possible to take a series of flights to get from $$a$$ to $$b$$ (without taking the bus in between or something).
• Lemma: For a transitive relation $$R$$, all $$R^n$$ are transitive.

Proof: Consider any $$(a,b),(b,c)\in R^n$$. From the above theorem, there is a path of length $$n$$ from $$a$$ to $$b$$ and from $$b$$ to $$c$$ in $$R$$. Then we have a path of length $$2n$$ from $$a$$ to $$c$$ in $$R$$.

Since $$R$$ is transitive, we can replace each two consecutive edges in this path with the single edge that transitivity guarantees must exist. The $$n=3$$ case would look like this:

This constructs a path of length $$n$$ from $$a$$ to $$c$$. Thus, $$(a,c)\in R^n$$ and $$R^n$$ is transitive.

• Theorem: The transitive closure of a relation $$R$$ is $$R^{*}$$.

Proof: In order for $$R^{*}$$ to be the transitive closure, it must contain $$R$$, be transitive, and be a subset of any transitive relation that contains $$R$$. By the definition of $$R^{*}$$, it contains $$R$$.

If there are $$(a,b),(b,c)\in R^{*}$$, then there are $$j$$ and $$k$$ such that $$(a,b)\in R^j$$ and $$(b,c)\in R^k$$. Then $$(a,c)\in R^{j+k}\subseteq R^{*}$$. Thus, $$R^{*}$$ is transitive.

Now suppose we have a transitive relation $$S$$ with $$R\subseteq S$$. Since $$S$$ is transitive, $$S^n$$ is transitive for all $$n$$ (from the lemma) and then from an earlier theorem, $$S^n\subseteq S$$. It then follows that $$S^{*}=\bigcup_{n=1}^{\infty} S^n\subseteq S$$.

Since $$R\subseteq S$$, we know that any path in $$R$$ must also be a path in $$S$$ so $$R^{*}\subseteq S^{*}$$. Thus, $$R^{*}\subseteq S^{*}\subseteq S$$. So, $$R^{*}$$ is a subset of any transitive relation that contains $$R$$ and it meets all of the criteria required to be the transitive closure.

• Corollary: If $$R$$ is a transitive relation, the transitive closure of $$R$$ is $$R$$.

Proof: From the earlier theorem, if $$R$$ is transitive, then $$R^n\subseteq R$$ for all $$n$$. Thus $$R^{*}=\bigcup_{n=1}^{\infty} R^n\subseteq R$$. We also obviously have $$R\subseteq R^{*}$$, so $$R=R^{*}$$.

• Example: We had a relation with this matrix earlier: $\mathbf{M}_R=\left[\begin{smallmatrix}1&0&1&0 \\ 0&0&1&0 \\ 1&1&1&0 \\ 0&0&0&1 \end{smallmatrix}\right]$
• We found that the matrix of $$R^2$$ was: $\mathbf{M}_{R^2}=\left[\begin{smallmatrix}1&1&1&0 \\ 1&1&1&0 \\ 1&1&1&0 \\ 0&0&0&1 \end{smallmatrix}\right]$
• This is also the transitive closure: $$R\subseteq R^2$$, and if we continue to compose with $$R$$ the matrix doesn't change: $${R^2}={R^3}={R^4}=\cdots$$.
• Our “plus one” relation over the integers was $$R=\{(a,a+1)\mid a\in\mathbf{Z}\}$$.
• It's not transitive, so $$R^*$$ might be interesting.
• We established that $$R^2$$ was the relation containing all pairs $$(a,a+2)$$.
• Similarly, $$R^k$$ is the set of $$(a,a+k)$$ for each integer $$a$$.
• The union of all of these is $$R^{*}=\{(a,b)\mid a<b\}$$.
• That is, it is the relationship “less than” over the integers.
• That's a little surprising.
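The matrix example a few bullets up can be verified directly with Boolean matrix composition. A minimal sketch (the function name is mine):

```python
def compose_matrices(A, B):
    """Entry (i,j) of A ⊙ B: OR over all k of (A[i][k] AND B[k][j])."""
    n = len(A)
    return [[int(any(A[i][k] and B[k][j] for k in range(n)))
             for j in range(n)] for i in range(n)]

M_R = [[1, 0, 1, 0],
       [0, 0, 1, 0],
       [1, 1, 1, 0],
       [0, 0, 0, 1]]

M_R2 = compose_matrices(M_R, M_R)
assert M_R2 == [[1, 1, 1, 0],
                [1, 1, 1, 0],
                [1, 1, 1, 0],
                [0, 0, 0, 1]]

# Composing again changes nothing: R^2 = R^3 = R^4 = ...
assert compose_matrices(M_R2, M_R) == M_R2
```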

# Calculating the Transitive Closure

• The above theorems give us a method to find the transitive closure of a relation.
• Unfortunately, since it's a union of infinitely many things, it's not exactly practical to compute.
• But it turns out that we don't actually need to compute an infinite number of $$R^n$$ to get the transitive closure (of a finite set).
• Theorem: Let $$R$$ be a relation on $$A$$ with $$|A|=n$$. If there is a path from $$a$$ to $$b$$ in $$R$$, then there is a path with at most $$n$$ steps.

Proof: Suppose there is a path from $$a$$ to $$b$$ in $$R$$. Let $$m$$ be the length of the shortest such path. Assume for contradiction that $$m> n$$.

Since there are only $$n$$ vertices in the graph of $$R$$, by the pigeonhole principle, the path from $$a$$ to $$b$$ must visit at least one of them twice. That is, there must be a loop in the graph:

We can form a shorter path from $$a$$ to $$b$$ by simply removing the loop. This contradicts the assumption that the path was the shortest. Thus, $$m\le n$$.

• This theorem tells us that if we calculate $$R,R^2,R^3,\ldots,R^n$$ then we have found every connection within $$R$$.
• There may be more paths of length $$>n$$, but they connect elements that were already connected by some other path.
• Thus, for a relation on $$n$$ elements, the transitive closure of $$R$$ is $$\bigcup_{k=1}^{n} R^k$$.
• We can finally write an algorithm to compute the transitive closure of a relation that will complete in a finite amount of time.
• Let's assume we're representing our relation as a matrix as described earlier.
• When we use $$R$$ in the algorithm, we mean $$\mathbf{M}_R$$: the $$n\times n$$ matrix described earlier.
• $$\odot$$ denotes the operation to compute the composition of two relations.
• Calculating one entry in the matrix for $$A\odot B$$ requires computing $$(a_{i1}\wedge a_{1j})\vee (a_{i2}\wedge a_{2j})\vee\cdots \vee (a_{in}\wedge a_{nj})$$.
• So computing one entry takes $$\Theta(n)$$ steps, and calculating the whole matrix for a composition takes $$\Theta(n^3)$$.
• Our algorithm is:
```
procedure transitive_closure(R)
    C = R
    P = R
    for k from 2 to n:
        P = P ⊙ R             # P = R^k
        C = C ∨ P             # C = R ∪ R^2 ∪ ... ∪ R^k
    return C
```
• Since we do the $$\odot$$ operation $$n$$ times here, this algorithm has running time $$\Theta(n^4)$$.
• That's pretty bad, but the best I could do.
• Fortunately, there are people smarter than me.
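Here is a minimal Python sketch of that naive algorithm, using the Boolean matrix representation (the function name is mine):

```python
def transitive_closure(M):
    """Naive transitive closure: C = R ∪ R^2 ∪ ... ∪ R^n, Θ(n^4) overall."""
    n = len(M)
    C = [row[:] for row in M]   # C accumulates the union
    P = [row[:] for row in M]   # P holds the current power R^k
    for _ in range(2, n + 1):
        # P = P ⊙ M: Boolean matrix composition, Θ(n^3) per step
        P = [[int(any(P[i][k] and M[k][j] for k in range(n)))
              for j in range(n)] for i in range(n)]
        # C = C ∨ P: entrywise OR
        C = [[C[i][j] | P[i][j] for j in range(n)] for i in range(n)]
    return C

# The earlier 4×4 example: its transitive closure equals M_{R^2}.
M_R = [[1, 0, 1, 0],
       [0, 0, 1, 0],
       [1, 1, 1, 0],
       [0, 0, 0, 1]]
assert transitive_closure(M_R) == [[1, 1, 1, 0],
                                   [1, 1, 1, 0],
                                   [1, 1, 1, 0],
                                   [0, 0, 0, 1]]
```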

# Warshall's Algorithm

• It turns out that we can do better than $$\Theta(n^4)$$ to find the transitive closure.
• Consider a relation $$R$$ on a set $$A$$ with $$|A|=n$$.
• Label the elements of $$A$$ as $$a_1,a_2,\ldots,a_n$$.
• If there is a path from $$a$$ to $$b$$ in that relation, then…
• The path will be something like $$a,i_1,i_2,\ldots,i_k,b$$, where $$(a,i_1),(i_1,i_2),\ldots,(i_k,b)\in R$$.
• We will call the $$i_j$$ interior vertices of that path.
• That is, things in the path that aren't the endpoints.
• We can now look at a relation and ask if there is a path from $$a$$ to $$b$$ using only a particular set of interior vertices.
• In particular, we are interested in paths that use only the first $$k$$ elements of $$A$$ as interior vertices.
• There is a path from $$a$$ to $$b$$ using only $$a_1,a_2,\ldots,a_k$$ as interior vertices if either:
• There is a path using $$a_1,a_2,\ldots,a_{k-1}$$, or
• There is a path that uses $$a_k$$: $$a,\ldots,a_k,\ldots,b$$.
• We can create an algorithm using this.
• Let $$\mathbf{W}_k$$ be the binary matrix where there is a 1 in entry $$i,j$$ if you can get from $$a_i$$ to $$a_j$$ using only interior vertices $$a_1,\ldots,a_k$$.
• We'll call the $$i,j$$ element of this matrix $$w_{ij}^{(k)}$$.
• $$\mathbf{M}_R$$ (the matrix of the relation) is the matrix of edges in the relation
• It is the elements you can get between using no interior vertices.
• That is, $$\mathbf{M}_R=\mathbf{W}_0$$.
• If we can find $$\mathbf{W}_n$$, it is the transitive closure.
• It gives the nodes we can get between using any combination of interior vertices.
• From the above, $$w_{ij}^{(k)}$$ is one if either:
• $$w_{ij}^{(k-1)}$$ is one, or
• $$w_{ik}^{(k-1)} \wedge w_{kj}^{(k-1)}$$ is one.
• Now, it's easy enough to find $$\mathbf{W}_n$$. This is the Warshall Algorithm to find the transitive closure:
```
procedure warshall(R):
    W = R
    for k from 1 to n:
        for i from 1 to n:
            for j from 1 to n:
                W[i,j] = W[i,j] ∨ (W[i,k] ∧ W[k,j])
    return W
```
• At the end of each iteration of the “for k”, W is $$\mathbf{W}_k$$
• [Actually, it might be somewhere between $$\mathbf{W}_k$$ and $$\mathbf{W}_{k+1}$$ since we might overwrite other zeros with ones as we process the matrix.]
• The running time here is pretty obvious: $$\Theta(n^3)$$.
• Our naïve algorithm above was $$\Theta(n^4)$$.
• A pretty good improvement.
• Let's try it with an example relation: $\mathbf{M}_R=\left[\begin{smallmatrix}1&1&0&0 \\0&0&1&0 \\0&0&1&1 \\0&0&0&0 \end{smallmatrix}\right]$
• Aside: with fairly minor modifications, the Warshall Algorithm can find the shortest path, not just decide whether one exists or not.
• Not very useful for transitive closure, but useful as we move on to graph theory.
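A direct Python translation of the pseudocode, run on the example matrix above (the function name is mine):

```python
def warshall(M):
    """Warshall's algorithm: Θ(n^3) transitive closure of a Boolean matrix."""
    n = len(M)
    W = [row[:] for row in M]   # W starts as W_0 = M_R
    for k in range(n):          # after iteration k, W is (at least) W_{k+1}
        for i in range(n):
            for j in range(n):
                W[i][j] = W[i][j] or (W[i][k] and W[k][j])
    return W

# The example relation above: a1→a1, a1→a2, a2→a3, a3→a3, a3→a4.
M_R = [[1, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 1, 1],
       [0, 0, 0, 0]]

# In the closure, a1 reaches everything and a2, a3 reach a3 and a4.
assert warshall(M_R) == [[1, 1, 1, 1],
                         [0, 0, 1, 1],
                         [0, 0, 1, 1],
                         [0, 0, 0, 0]]
```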