Tarjan’s Algorithm: Finding Strongly Connected Components (SCC) step-by-step
Master graph partitioning by deriving and implementing Tarjan's linear-time SCC algorithm in Java.

Abstract Algorithms
Helping engineers master software engineering topics.
TLDR: In directed graphs, a Strongly Connected Component (SCC) is a maximal subset of vertices where every vertex is reachable from any other vertex in the subset. Finding these components is crucial for resolving circular dependencies. Tarjan's algorithm finds all SCCs in a single Depth-First Search (DFS) pass, achieving an optimal $O(V + E)$ time complexity. This guide details its mechanics, mathematical models, and provides a full Java implementation.
📖 Concept: Graph Reachability and Strongly Connected Components
Graph reachability is a fundamental concept in computer science. In a directed graph, we say a vertex $v$ is reachable from a vertex $u$ if there exists a directed path starting at $u$ and ending at $v$. Reachability is not always symmetric; in a directed graph, the existence of a path from $u$ to $v$ does not guarantee a return path from $v$ to $u$.
However, when mutual reachability does exist between a set of vertices, they form a Strongly Connected Component (SCC). Formally, an SCC of a directed graph $G = (V, E)$ is a maximal subgraph $G' = (V', E')$ such that for every pair of vertices $u, v \in V'$, there is a directed path from $u$ to $v$ and a directed path from $v$ to $u$.
Identifying SCCs is essential for analyzing networks and solving software engineering challenges. For instance:
- Circular Dependency Resolution: In build systems (like Maven or Gradle) or module bundlers (like Webpack), finding circular references between packages or files is equivalent to finding SCCs containing more than one vertex.
- Social Network Subgroups: Identifying highly cohesive communities where information can flow mutually between all members.
- Simplifying Complex Networks: We can compress any directed graph into a Directed Acyclic Graph (DAG) by collapsing each SCC into a single super-vertex, simplifying downstream topological routing.
⚙️ Mechanics: DFS Traversal and Stack Constraints
To find SCCs, we run a Depth-First Search (DFS) traversal. The key challenge is to identify the "entry point" of each SCC during our traversal. We call this entry point the root of the strongly connected component.
Tarjan's algorithm achieves linear time execution by tracking two integer values for each node during a single DFS pass:
- DFS Index (
ids[u]): A unique, incrementing ID assigned to vertex $u$ in the order it is first discovered by DFS. - Low-Link Value (
low[u]): The smallest DFS index reachable from $u$, including $u$ itself, through a path of zero or more tree edges followed by at most one back-edge.
As we traverse the graph, we push nodes onto an auxiliary stack. When the DFS returns from traversing the neighbors of a vertex $u$, if its low-link value equals its DFS index (low[u] == ids[u]), it means $u$ is the root of an SCC. We then pop all nodes off the stack down to and including $u$. All these popped nodes belong to the same SCC.
📊 Flow: Graph Partitioning Sequence
The flowchart below visualizes how Tarjan's algorithm traverses a directed graph, updates node attributes, and extracts SCCs using the stack:
flowchart TD
Start[Start DFS on unvisited Node u] -->|1. Assign ids and low-link| Assign[ids u = counter, low u = ids u]
Assign -->|2. Push to Stack| Push[Push u to stack]
Push -->|3. For each neighbor v| Loop[Check neighbor state]
Loop -->|v not visited| Recurse[Recurse DFS v]
Recurse -->|On return| UpdateUnvisited[low u = min low u, low v]
Loop -->|v on stack| UpdateOnStack[low u = min low u, ids v]
Loop -->|Loop finished| RootCheck{low u == ids u?}
RootCheck -->|Yes| PopStack[Pop stack until u retrieved]
PopStack -->|Collect SCC| SaveSCC[Save popped nodes as SCC]
RootCheck -->|No| Return[Return to parent DFS]
To see this in action, let us trace a query on a directed graph with 4 nodes: 0 -> 1 -> 2 -> 0 (forming a cycle) and an edge 2 -> 3.
The table below traces the state modifications of each node during execution:
| Traversal Step | Node | Operation | Stack State | IDs Table | Low-Link Table | Action |
| 1 | 0 | Visit | [0] | [0, -, -, -] | [0, -, -, -] | Recurse to 1 |
| 2 | 1 | Visit | [0, 1] | [0, 1, -, -] | [0, 1, -, -] | Recurse to 2 |
| 3 | 2 | Visit | [0, 1, 2] | [0, 1, 2, -] | [0, 1, 2, -] | Recurse to 3 |
| 4 | 3 | Visit | [0, 1, 2, 3] | [0, 1, 2, 3] | [0, 1, 2, 3] | No neighbors. low[3] == ids[3] |
| 5 | 3 | Pop | [0, 1, 2] | [0, 1, 2, 3] | [0, 1, 2, 3] | SCC 1 collected: {3} |
| 6 | 2 | Check 2 -> 0 | [0, 1, 2] | [0, 1, 2, 3] | [0, 1, 0, 3] | 0 is on stack. low[2] = min(2, ids[0]) = 0 |
| 7 | 1 | Return | [0, 1, 2] | [0, 1, 2, 3] | [0, 0, 0, 3] | low[1] = min(1, low[2]) = 0 |
| 8 | 0 | Return | [0, 1, 2] | [0, 1, 2, 3] | [0, 0, 0, 3] | low[0] = min(0, low[1]) = 0. low[0] == ids[0] |
| 9 | 0 | Pop | [] | [0, 1, 2, 3] | [0, 0, 0, 3] | SCC 2 collected: {2, 1, 0} |
🧠 Deep Dive: Tree Back-Edges and Cycle Identification
Implementing Tarjan's algorithm efficiently requires understanding how back-edges detect cycles.
Low-Link Value Tracking Internals
The low-link value of a node is updated dynamically based on whether a neighbor is currently on the stack.
- If a neighbor $v$ has not been visited, we recursively traverse it. When the traversal returns, we update the parent's low-link value:
$$ \text{low}[u] = \min(\text{low}[u], \text{low}[v]) $$
This propagates low-link values back up the DFS tree. - If a neighbor $v$ has already been visited and is on the stack, it means we have found a back-edge (a cycle). We update the parent's low-link value using the neighbor's DFS index:
$$ \text{low}[u] = \min(\text{low}[u], \text{ids}[v]) $$
Mathematical Model of Tarjan's Equivalence Classes
Let $G = (V, E)$ be a directed graph. The mutual reachability relation $\sim$ is defined on $V$ as: $$ u \sim v \iff (u \to^ v) \land (v \to^ u) $$ where $\to^$ denotes a directed path of length $\ge 0$. The relation $\sim$ is an *equivalence relation because it satisfies:
- Reflexivity: $u \sim u$ (a node can always reach itself via a path of length 0).
- Symmetry: If $u \sim v$, then $v \sim u$.
- Transitivity: If $u \sim v$ and $v \sim w$, then $u \sim w$.
Therefore, the mutual reachability relation partitions the vertex set $V$ into disjoint equivalence classes. These equivalence classes are exactly the strongly connected components of the graph. Tarjan's algorithm calculates these equivalence classes in linear time by mapping them to subtree roots in the DFS execution tree.
Performance Analysis of Linear Time DFS
The time complexity of Tarjan's algorithm is $O(V + E)$, where $V$ is the number of vertices and $E$ is the number of edges. This is optimal because the algorithm visits each vertex and examines each edge exactly once during the DFS traversal.
The space complexity is $O(V)$ to store the array states (ids, low, and stack flags) and the recursion stack, matching the memory requirements of standard DFS.
🏗️ Advanced Concepts: Cross-Edges and Stack Containment Rules
During DFS, we categorize edges into four types: tree edges, forward edges, back-edges, and cross-edges.
- Back-Edges: Point from a node to one of its ancestors in the DFS tree. These edges always represent cycles, so we use them to update low-link values.
- Cross-Edges: Point from a node to a previously visited node that is not its ancestor.
A cross-edge $u \to v$ can connect different subtrees. If the target node $v$ has already been processed and popped from the stack, it belongs to a completed SCC. In this case, we must ignore the edge and not update the low-link value of $u$, because there is no path back from $v$ to $u$.
This is why we maintain a boolean onStack[] array. We only update low-link values if the target node is currently on the stack (onStack[v] == true).
🌍 Applications: From Circular Dependencies to Social Graphs
- Dependency Analysis in Package Managers: Checking for dependency cycles in package structures (e.g. npm dependencies) to prevent compilation loops.
- Garbage Collection Reference Tracing: Modern runtimes use graph reachability to find cyclic object references and clean up unreferenced memory groups.
- Optimizing Database Queries: Breaking recursive SQL queries down into independent DAG components to parallelize database executions.
⚖️ Trade-offs and Failure Modes
- Recursion Depth Limits: For massive graphs (e.g., millions of nodes in a social network), a standard recursive DFS can exhaust the JVM call stack, causing a
StackOverflowError. - Mitigation: Implement Tarjan's algorithm using an iterative DFS structure. This replaces the recursive JVM stack with a custom heap-allocated array stack.
🧭 Decision Guide: Tarjan vs. Kosaraju vs. Gabow
Use this table to choose the right SCC algorithm for your graph application:
| Algorithm | Single DFS Pass? | Space Complexity | Practical Advantage |
| Tarjan's | Yes | $O(V)$ | Fast execution; finds components in a single DFS pass. |
| Kosaraju's | No (requires 2 passes) | $O(V)$ | Easier to conceptualize; requires transposing the graph. |
| Gabow's | Yes | $O(V)$ | Bypasses low-link integer tracking, using a second stack to track path limits. |
🧪 Practical Implementation: Java Reference Code
Here is a complete, production-ready Java implementation of Tarjan's Strongly Connected Components algorithm.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Stack;
public class TarjansSCC {
private int idCounter = 0;
private int[] ids;
private int[] low;
private boolean[] onStack;
private Stack<Integer> stack;
private List<List<Integer>> sccs;
public List<List<Integer>> findSCCs(int numVertices, List<List<Integer>> adjacencyList) {
if (numVertices <= 0 || adjacencyList == null) {
return new ArrayList<>();
}
ids = new int[numVertices];
low = new int[numVertices];
onStack = new boolean[numVertices];
stack = new Stack<>();
sccs = new ArrayList<>();
// Initialize discovery IDs with a sentinel value (-1)
Arrays.fill(ids, -1);
for (int i = 0; i < numVertices; i++) {
if (ids[i] == -1) {
dfs(i, adjacencyList);
}
}
return sccs;
}
private void dfs(int u, List<List<Integer>> adj) {
ids[u] = idCounter;
low[u] = idCounter;
idCounter++;
stack.push(u);
onStack[u] = true;
for (int v : adj.get(u)) {
if (ids[v] == -1) {
// Neighbor not visited yet: recurse
dfs(v, adj);
low[u] = Math.min(low[u], low[v]);
} else if (onStack[v]) {
// Back-edge found: update low-link value using neighbor's index
low[u] = Math.min(low[u], ids[v]);
}
// If ids[v] != -1 and !onStack[v], this is a cross-edge to a completed SCC. Ignore it.
}
// If u is the root of a strongly connected component, collect the nodes
if (low[u] == ids[u]) {
List<Integer> component = new ArrayList<>();
while (true) {
int node = stack.pop();
onStack[node] = false;
component.add(node);
if (node == u) {
break;
}
}
sccs.add(component);
}
}
}
📚 Lessons Learned: Common Graph Traversal Bugs
- Forgetting to check the
onStackconstraint: If you update the low-link value of a node using a cross-edge to a completed SCC (whereonStack[v]is false), you will corrupt the low-link values, leading to incorrect SCC partitions. Always checkonStack[v]before updatinglow[u]. - Resetting
idCounterinside DFS loops: TheidCountermust be a global variable that increments continuously across all DFS calls. Resetting it for each unvisited root component will cause duplicate IDs, breaking the low-link verification logic. - Array index out of bounds on Graph Representation: When mapping nodes to arrays, ensure the vertex IDs are normalized to the range
[0, numVertices - 1]. If vertex IDs are arbitrary integers, use aMap<Integer, Integer>to map them to contiguous indices before running the algorithm.
📌 Summary: The Tarjan's Algorithm Cheatsheet
- Definition: An SCC is a maximal subset of vertices where every vertex can reach every other vertex.
- Complexity: Time complexity is $O(V + E)$; space complexity is $O(V)$.
- Internal State Variables:
ids[u]: The order index in which vertexuwas first visited.low[u]: The lowest DFS index reachable fromu.
- SCC Root Identification: Occurs during traversal backtracking when
low[u] == ids[u]. - Stack Collection: Pop nodes off the stack down to and including the root node to collect all members of the SCC.
AI-generated article quiz
Test your understanding
Ready to test what you just learned?
Generate four focused questions from this article. Answers include immediate explanations.
Guided series path
Data Structures and Algorithms
Reader feedback
Was this article useful?
Rate it if it helped, then continue with the next deep dive when you are ready.
Sign in to save your rating.
Article metadata