# Paths

• A path is a sequence of adjacent edges.
• That is, a path of length $$n$$ is formed by edges between vertices $$v_0,v_1,\ldots,v_n$$.
• We'll say that this is a path from $$v_0$$ to $$v_n$$.
• The path passes through the vertices $$v_1,\ldots,v_{n-1}$$.
• It traverses the edges $$\{v_0,v_1\},\{v_1,v_2\},\ldots,\{v_{n-1},v_n\}$$.
• We might also write a path as its vertex sequence ($$v_0,v_1,\ldots,v_n$$) or its edge sequence ($$e_1,e_2,\ldots,e_n$$).
• For digraphs, we must follow the edges in the correct direction.
• We would probaby have to write that traverses edges $$(v_0,v_1),(v_1,v_2),\ldots,(v_{n-1},v_n)$$.
• That is, digraph edges are ordered pairs. Graph edges are essentially two-element sets (or one element for a loop) since order doesn't matter.
• It is a circuit if $$v_0=v_n$$.
• It is a simple path if it does not contain the same edge more than once.
• In general, when we say “path”, it might loop back on itself.
• A simple path is allowed to contain the same vertex more than once, just not the same edge.
• For example, in this graph there is a path of length 3 from $$a$$ to $$d$$ highlighted.
• There are also paths of length 2: $$a\rightarrow c\rightarrow d$$ and $$a\rightarrow b\rightarrow d$$.
• And a path or length 7: $$a\rightarrow b\rightarrow c\rightarrow a\rightarrow e\rightarrow b\rightarrow c\rightarrow d$$.
• In this digraph, there is no path from $$a$$ to $$d$$.
• There are paths from $$c$$ to $$e$$: $c\rightarrow e \\ c\rightarrow d\rightarrow b\rightarrow e \\ c\rightarrow d\rightarrow d\rightarrow d\rightarrow b\rightarrow c\rightarrow d\rightarrow b\rightarrow e$
• For example…
• In the digraphs of relations, a path of length $$n$$ from $$a$$ to $$b$$ meant that $$(a,b)$$ was in $$R^n$$.
• In a social network's graph, a path of length $$n$$ means you know this person through $$n-1$$ connections.
• i.e. I have a friend $$X$$ and they know $$Y$$. I am connected to $$Y$$ by a path of length 2.
• LinkedIn actually shows you these paths: I am a “2nd degree contact” when there is a path of length 2.
• (Apparently, I'm degree 2 to the Flickr founders, degree 3 to basically everybody else in the tech world.)
• For the LinkedIn graph, $$|V|\approx 2.25\times 10^{8}$$ so whatever algorithm finds those connections had better be low-big-O.
• In fact, it can't even be $$\Theta(|V|)$$: that would be too big to compute when I look at somebody's profile.

# Connectedness

• An undirected graph is called connected if there is a path between every pair of nodes in the graph.
• So obviously, unconnected graphs are ones where there are some sections of the graph with no edges between them.
• Most of our examples have been connected. The graph of an equivalence relation wasn't:
• A connected component in a graph is a subgraph that is connected.
• … and is not a subgraph of a larger connected subgraph of the original.
• For example, in the above, the subgraph with vertices $$\{2,5,8,11\}$$ (and all of the edges between) is a connected component.
• The subgraph with vertices $$\{2,5,8\}$$ is not a connected component since is is a proper subgraph of a connected part of the original.
• Is the graph of a social network connected?
• Probably not: there are users who sign up and never make any friends. They form a connected component with one vertex.
• There are probably some size 2 components: people who sign up, friend each other, and then do nothing else.
• If we exclude those not-really-active users, then probably the rest is connected.
• If not, then that's really interesting.
• Theorem: In a connected (undirected) graph, there is a simple path between every pair of vertices.

Proof: Consider any two vertices $$u$$ and $$v$$. We know from the definition of connectedness that there is a shortest path $$u,x_1,x_2,\ldots,x_{n-1},v$$ between them. Suppose for contradiction that it is non-simple path with a loop.

If there is a loop, then the same vertex must appear twice in the path: for some $$i,j$$, we have $$x_i=x_j$$. We can remove the vertices $$x_i,x_{i+1},\ldots,x_j$$ from the path to create a shorter path from $$u$$ to $$v$$. This is a contradiction, so a simple path must exist.

• [That proof actually shows that the shortest path from $$u$$ to $$v$$ is simple, which is not much of a surprise.]
• Theorem: If a graph has exactly two vertices with odd degree, then there must be a path between them.

Proof: Suppose there are two odd-degree vertices $$u$$ and $$v$$ with no path between them. Then by definition, they are in separate connected components. The subgraph consisting of each component has a single vertex of odd degree. This is a contradiction: we saw earlier that each graph has an even number of odd-degree vertices.

• For a digraph, we have to be more careful with our definition of connectedness: it is possible that there is a path from $$a$$ to $$b$$ but not from $$b$$ to $$a$$.
• Or that there are edges between $$a$$ and $$b$$ but they are in the wrong direction, so there is no path.
• A directed graph is called strongly connected if for any vertices $$a$$ and $$b$$, there is a path from $$a$$ to $$b$$ and from $$b$$ to $$a$$.
• A directed graph is called weakly connected if for any vertices $$a$$ and $$b$$, there is a path from $$a$$ to $$b$$ in the corresponding undirected graph.
• That is, if we take the direction away from each edge, can we get everywhere?
• For example, here is a directed graph we saw earlier and its undirected version:
• This graph is weakly connected (since the graph on the right is connected).
• It is strongly connected because we can find a path from any node to any other on the left.
• Theorem: If a digraph is strongly connected, then it is weakly connected.

Proof idea: Take the path from the directed graph and use it as a path in the undirected graph.

• In the digraph, what does it mean if it is strongly/weakly connected?
• If not strongly connected, in $$R^{*}$$ there is no edge between some vertices.
• If not weakly connected, there are disconnected components where no pair of things from the components can be compared.
• Example: we can model a game using graphs. For example, tic-tac-toe (井字脚趾).
• Each possible state of the game will be a vertex. (e.g. empty board, board where X has moved in the upper-left,…)
• A (directed) edge will indicate that there is a legal move to get from one state to another.
• Here is a small part of the graph:
• If I imagine every possible state of the board, there are some that you can't actually reach by playing. (Like the ones adjacent to the “?”: I don't think there is a path from the empty board to them. Either it would require illegal moves, or the game would end first.)
• So if we include those states, the graph is neither strongly nor weakly connected.
• If we only include the reachable states, then it's strongly but not weakly connected. (I can't get from anywhere else to the initial empty board.)
• If I was writing a program to analyze the game, I should ignore states that aren't actually possible to reach.
• Also learned from this small part of the graph: there are some states it's possible to reach in more than one way. (e.g. the other one in the fourth row)
• So in my program, I should be careful to not analyze them many times: that would be more computation than actually needed.
• So we have learned a little about the game from drawing a graph. Other questions we could ask:
• Is the graph bipartite? Yes (if we ignore edge direction). It can be partitioned into “X about to move” and “O about to move”.
• Are there loops in the graph? No: since we can't erase moves, it's not possible to return to a previous state.
• What is $$|V|$$? Can we actually analyze the game? $$|V|<3^9=19683$$, so totally possible to check every state.
• That means that I can write a program that plays “perfectly”.
• Where are the “good” states for a player? What can X do so that every state below is a win for X?
• Example: we can model links on web pages with a digraph.
• Each page is a node; each link is an edge from the page where the link is to its destination.
• Apparently there is a giant strongly connected component in the graph: the big collection of pages that both have and receive links are all mutually-reachable.
• But there are also big chunks disconnected from that.
• e.g. a corporate site where they aren't allowed to link to outside sites (because marketing wants users to stay on the company's site and not leave) won't be strongly connected to the rest of the web since there are no paths out.
• There are also probably many small clusters: sites that link out but nobody noticed them to link in.
• Paths also give us something else to look for when checking for isomorphism.
• If there is a path of length 3 from $$a$$ to $$b$$, then after the isomorphism is applied, there will be a path of length 3 from $$f(a)$$ to $$f(b)$$.
• … and if there are three paths of length 4 before, there will be three paths of length 4 after.
• In other words, paths in the graph are invariants.
• A graph's adjacency matrix can tell us a lot about its paths.
• For example, this digraph:
• … and its adjacency matrix: $M=\left[\begin{smallmatrix}1&1&0&0&1 \\ 1&0&0&0&0 \\ 0&1&0&1&0 \\ 0&1&1&0&0 \\ 1&0&0&0&1 \end{smallmatrix}\right]$
• If we multiply the matrix by itself, we are going to calculate in the $$i,j$$ position the value $$m_{i1}m_{1j}+m_{i2}m_{2j}+\cdots+m_{in}m_{nj}$$
• Each of the terms there will be 1 if there is an edge from $$i$$ to $$k$$ and from $$k$$ to $$j$$.
• So their total is the number of paths of length 2 from $$i$$ to $$j$$.
• This matrix gives us the number of paths of length 2 between each pair of vertices: $M^2=\left[\begin{smallmatrix}3&1&0&0&2 \\ 1&1&0&0&1 \\ 1&1&1&0&0 \\ 1&1&0&1&0 \\ 2&1&0&0&2\end{smallmatrix}\right]$
• And if we keep raising the power, we get the number of paths of that length: $M^3=\left[\begin{smallmatrix}6&3&0&0&5 \\ 3&1&0&0&2 \\ 2&2&0&1&1 \\ 2&2&1&0&1 \\ 5&2&0&0&4\end{smallmatrix}\right] \quad M^4=\left[\begin{smallmatrix}14&6&0&0&11 \\ 6&3&0&0&5 \\ 5&3&1&0&3 \\ 5&3&0&1&3 \\ 11&5&0&0&9\end{smallmatrix}\right] \quad M^5=\left[\begin{smallmatrix}31&14&0&0&25 \\ 14&6&0&0&11 \\ 11&6&0&1&8 \\ 11&6&1&0&8 \\ 25&11&0&0&20\end{smallmatrix}\right]$
• So with matrix multiplication, we can
• Find shortest paths: smallest power where the entry is $$\ge 1$$.
• Find out if the graph is connected: any zeros in $$\sum_{k=1}^{n}M^k$$?
• But those might not be the best algorithms.
• Calculating $$\sum_{k=1}^{n}M^k$$ will take $$\Theta(|V|^4)$$.
• I bet we could do better than that.

# Shortest Paths

• Several of the things we have seen that can be done with graphs make paths useful:
• Graphs modelling a road map: paths are ways to get from one place to another.
• Mazes: paths are a way through the maze (or from one part of the maze to another).
• Computer networks: paths are a way to get information from one computer to another.
• In all of those cases, we're probably interested in the shortest path from one node to another.
• But “shortest” might not mean “fewest edges”.
• e.g. on a road map, a long stretch of bad road takes much longer than a short piece of highway.
• We can label the edges with a weight to create a weighted graph:
• For roads, these would be the distance/time/fuel needed to take that route.
• Our goal is now to find the path with the lowest possible edge weight along it.
• Dijkstra's Algorithm finds the shortest path between two vertices in a (connected) graph.
• We will use $$w(u,v)$$ to denote the weight of the edge from $$u$$ to $$v$$.
• We assume here that $$w(u,v)=\infty$$ if there is no edge between the two nodes, just to keep the pseudocode simple.
• The algorithm assumes there are no negative weights: $$w(u,v)\ge 0$$.
procedure Dijkstra's_Algorithm(V, w, a, z)
set distance_to[v] = ∞ for all vertices v ∈ V
distance_to[a] = 0
processed = {}
until z ∈ processed
u = a vertex in V-processed with distance_to[u] minimal
processed = processed ∪ {u}
for each v in V-processed
if distance_to[u] + w(u,v) < distance_to(v)
distance_to[v] = distance_to[u] + w(u,v)
arrived_from[v] = u
return distance_to(z)

• The basic idea:
• For each node you have looked at, keep track of the shortest path you have found to get there.
• At each step, find the node closest to $$a$$ that you haven't examined closely.
• We know there is no closer way to get there: any un-examined path would be longer than the one we have already found.
• See if that node can be used to form a shorter path to anywhere.
• Since it was the closest to $$a$$, after we examine all of its edges, there's no need to look at it again.
• Let's try it with the graph above…
• The algorithm is actually really close to finding the shortest path from $$a$$ to everywhere.
• Running time?
• The inner loop runs $$(|V|-1) + (|V|-2)+\cdots+2+1$$ times (or until the “until” ends the outer loop).
• So, it takes $$O(|V|^2)$$ time worst case.
• Its memory usage: processed and distance_to both take $$O(|V|)$$ extra storage.
• But actually…
• We don't actually have to examine every vertex in the inner loop.
• If we had an adjacency list, we could just look at those vertices and ignore the $$w(u,v)=\infty$$ cases that don't matter.
• That change would get us $$O(|E|)$$ time in the worst case.
• The same in general (if there are a lot of edges), but probably better for most graphs.
• And another example to try: