Graphs in Python: Implementation & Key Algorithms

Graph: Non-Linear Data Structure

A Graph is a fundamental non-linear data structure consisting of a set of vertices (nodes) and a set of edges that connect pairs of vertices. Unlike trees (which are acyclic graphs with a hierarchical structure), graphs can have cycles, multiple connections between nodes, and no fixed root—making them ideal for modeling real-world relationships like social networks, road maps, and computer networks.

I. Core Concepts & Terminology

1. Basic Definitions

Vertex (Node): A fundamental unit of the graph (e.g., a city in a road map, a user in a social network).
Edge: A connection between two vertices (e.g., a road between two cities, a friendship between two users).
Weight: A numerical value assigned to an edge (e.g., distance between cities, bandwidth of a network link).
Path: A sequence of vertices where each consecutive pair is connected by an edge (e.g., A → B → C).
Cycle: A path that starts and ends at the same vertex (e.g., A → B → C → A).

2. Graph Types

Type	Description
Undirected Graph	Edges have no direction (e.g., friendships—if A is friends with B, B is friends with A).
Directed Graph (Digraph)	Edges have a direction (e.g., Twitter follows—A follows B does not mean B follows A).
Weighted Graph	Edges have numerical weights (e.g., road maps with distance).
Unweighted Graph	Edges have no weights (only presence/absence of connection).
Cyclic Graph	Contains at least one cycle (e.g., A → B → C → A).
Acyclic Graph	No cycles (e.g., DAG—Directed Acyclic Graph, used in task scheduling).
Connected Graph	Every pair of vertices is connected by a path (undirected).
Disconnected Graph	Contains at least two vertices with no path between them (undirected).

3. Graph Representation

Two common ways to represent graphs in code:

A. Adjacency List (Most Common)

Uses a dictionary (or list) where each vertex maps to a list of adjacent vertices (and weights, for weighted graphs).
Pros: Efficient for sparse graphs (few edges), uses O(V + E) space.
Cons: Less efficient for dense graphs (many edges) when checking adjacency (O(degree(V)) time).

B. Adjacency Matrix

Uses a V×V matrix where matrix[i][j] is 1 (or weight) if there is an edge between vertex i and j, else 0.
Pros: O(1) time for adjacency checks, simple to implement for dense graphs.
Cons: Uses O(V²) space (inefficient for sparse graphs).

II. Graph Implementation (Python)

We’ll implement an undirected, weighted graph using an adjacency list (the most practical choice for most applications). The implementation supports core operations like adding vertices/edges, and traversal algorithms.

1. Graph Class

python

运行

class Graph:
    def __init__(self, directed=False):
        self.graph = {}  # Adjacency list: {vertex: [(neighbor, weight), ...]}
        self.directed = directed  # True for digraph, False for undirected

    # -------------------------- Add Vertices & Edges --------------------------
    def add_vertex(self, vertex):
        """Add a vertex to the graph (no duplicates)"""
        if vertex not in self.graph:
            self.graph[vertex] = []
            return True
        return False  # Vertex already exists

    def add_edge(self, v1, v2, weight=1):
        """Add an edge between v1 and v2 (with optional weight)"""
        # Check if both vertices exist
        if v1 not in self.graph or v2 not in self.graph:
            raise ValueError("Both vertices must be in the graph")
        # Add edge v1 → v2
        self.graph[v1].append((v2, weight))
        # If undirected, add edge v2 → v1
        if not self.directed:
            self.graph[v2].append((v1, weight))

    # -------------------------- Traversal Algorithms --------------------------
    def dfs_recursive(self, start_vertex, visited=None):
        """Depth-First Search (DFS) → recursive (returns traversal order)"""
        if visited is None:
            visited = set()  # Track visited vertices (avoids cycles)
        if start_vertex not in self.graph:
            raise ValueError("Start vertex not in graph")
        
        visited.add(start_vertex)
        traversal = [start_vertex]

        # Recursively visit all unvisited neighbors
        for neighbor, _ in self.graph[start_vertex]:
            if neighbor not in visited:
                traversal.extend(self.dfs_recursive(neighbor, visited))
        
        return traversal

    def dfs_iterative(self, start_vertex):
        """Depth-First Search (DFS) → iterative (uses stack)"""
        if start_vertex not in self.graph:
            raise ValueError("Start vertex not in graph")
        
        visited = set()
        stack = [start_vertex]
        traversal = []

        while stack:
            vertex = stack.pop()  # LIFO: pop last element
            if vertex not in visited:
                visited.add(vertex)
                traversal.append(vertex)
                # Push neighbors in reverse order to maintain same order as recursive DFS
                for neighbor, _ in reversed(self.graph[vertex]):
                    if neighbor not in visited:
                        stack.append(neighbor)
        
        return traversal

    def bfs(self, start_vertex):
        """Breadth-First Search (BFS) → iterative (uses queue)"""
        if start_vertex not in self.graph:
            raise ValueError("Start vertex not in graph")
        
        visited = set()
        queue = [start_vertex]
        traversal = []

        while queue:
            vertex = queue.pop(0)  # FIFO: pop first element
            if vertex not in visited:
                visited.add(vertex)
                traversal.append(vertex)
                # Enqueue all unvisited neighbors
                for neighbor, _ in self.graph[vertex]:
                    if neighbor not in visited:
                        queue.append(neighbor)
        
        return traversal

    # -------------------------- Shortest Path (Unweighted Graph) --------------------------
    def shortest_path_unweighted(self, start_vertex, end_vertex):
        """Find shortest path in unweighted graph (BFS) → returns path as list of vertices"""
        if start_vertex not in self.graph or end_vertex not in self.graph:
            raise ValueError("Both vertices must be in the graph")
        if start_vertex == end_vertex:
            return [start_vertex]
        
        visited = set()
        queue = [(start_vertex, [start_vertex])]  # (current_vertex, path_so_far)

        while queue:
            vertex, path = queue.pop(0)
            visited.add(vertex)

            for neighbor, _ in self.graph[vertex]:
                if neighbor == end_vertex:
                    return path + [neighbor]  # Return path when end is found
                if neighbor not in visited:
                    queue.append((neighbor, path + [neighbor]))
        
        return None  # No path exists

    # -------------------------- Helper Methods --------------------------
    def get_vertices(self):
        """Return list of all vertices"""
        return list(self.graph.keys())

    def get_edges(self):
        """Return list of all edges (as tuples: (v1, v2, weight))"""
        edges = []
        for v1 in self.graph:
            for v2, weight in self.graph[v1]:
                # Avoid duplicate edges in undirected graphs (store (v1, v2) only if v1 < v2)
                if not self.directed and v1 > v2:
                    continue
                edges.append((v1, v2, weight))
        return edges

    def __str__(self):
        """String representation of the graph (adjacency list)"""
        result = []
        for vertex in self.graph:
            neighbors = ", ".join([f"{neighbor} (w={weight})" for neighbor, weight in self.graph[vertex]])
            result.append(f"{vertex}: [{neighbors}]")
        return "\n".join(result)

2. Test the Graph

python

运行

if __name__ == "__main__":
    # Create an undirected, weighted graph
    g = Graph(directed=False)

    # Add vertices
    vertices = ["A", "B", "C", "D", "E", "F"]
    for v in vertices:
        g.add_vertex(v)

    # Add edges (v1, v2, weight)
    g.add_edge("A", "B", 2)
    g.add_edge("A", "C", 1)
    g.add_edge("B", "D", 3)
    g.add_edge("C", "D", 1)
    g.add_edge("C", "E", 4)
    g.add_edge("D", "E", 1)
    g.add_edge("D", "F", 2)
    g.add_edge("E", "F", 1)

    # Print graph
    print("=== Graph Adjacency List ===")
    print(g)
    print("\nVertices:", g.get_vertices())
    print("Edges:", g.get_edges())

    # Traversals
    print("\n=== DFS Recursive (Start: A) ===")
    print(g.dfs_recursive("A"))  # Output: ['A', 'B', 'D', 'C', 'E', 'F'] (order may vary)
    print("\n=== DFS Iterative (Start: A) ===")
    print(g.dfs_iterative("A"))  # Output: ['A', 'C', 'E', 'F', 'D', 'B'] (order may vary)
    print("\n=== BFS (Start: A) ===")
    print(g.bfs("A"))  # Output: ['A', 'B', 'C', 'D', 'E', 'F'] (level-order)

    # Shortest path (unweighted)
    print("\n=== Shortest Path (A → F) ===")
    print(g.shortest_path_unweighted("A", "F"))  # Output: ['A', 'C', 'D', 'F'] (length 3 edges)

III. Key Algorithms for Graphs

Graphs power many critical algorithms—below are the most important ones, with use cases and high-level explanations:

1. Traversal Algorithms

A. Depth-First Search (DFS)

Strategy: Explore as far as possible along a branch before backtracking (uses a stack or recursion).
Time Complexity: O(V + E) (visits each vertex and edge once).
Space Complexity: O(V) (stack/visited set stores up to V vertices).
Use Cases:
- Finding connected components (e.g., identifying clusters in social networks).
- Detecting cycles (e.g., in task scheduling to avoid circular dependencies).
- Topological sorting (for DAGs).
- Maze solving and backtracking problems (e.g., Sudoku).

B. Breadth-First Search (BFS)

Strategy: Explore all vertices at the current depth level before moving to vertices at the next depth level (uses a queue).
Time Complexity: O(V + E).
Space Complexity: O(V) (queue/visited set stores up to V vertices).
Use Cases:
- Finding the shortest path in unweighted graphs (e.g., shortest path between two cities with unweighted roads).
- Level-order traversal (e.g., analyzing hierarchical relationships in a graph).
- Flood fill algorithms (e.g., image editing to fill a region with color).

2. Shortest Path Algorithms (Weighted Graphs)

A. Dijkstra’s Algorithm

Purpose: Find the shortest path from a single source vertex to all other vertices in a weighted graph with non-negative weights.
Strategy: Uses a priority queue to always expand the shortest known path first.
Time Complexity: O((V + E) log V) (with a binary heap).
Use Cases:
- GPS navigation (finding shortest/fastest route).
- Network routing (finding path with minimum latency).

B. Bellman-Ford Algorithm

Purpose: Find the shortest path from a single source vertex to all other vertices in a weighted graph with negative weights (but no negative cycles).
Strategy: Relaxes all edges V-1 times (since the longest shortest path has V-1 edges).
Time Complexity: O(V×E).
Use Cases:
- Detecting negative cycles (e.g., in financial models to identify arbitrage opportunities).
- Graphs with negative weights (e.g., energy networks with gain/loss).

C. Floyd-Warshall Algorithm

Purpose: Find the shortest path between all pairs of vertices in a weighted graph (supports negative weights, no negative cycles).
Strategy: Uses dynamic programming to build a distance matrix where dist[i][j] is the shortest path from i to j.
Time Complexity: O(V³).
Use Cases:
- Small graphs where all-pairs shortest paths are needed (e.g., network design).

3. Minimum Spanning Tree (MST) Algorithms

Purpose: Find a subset of edges that connects all vertices with no cycles and minimum total edge weight (for undirected, connected graphs).
Use Cases:
- Designing minimum-cost networks (e.g., laying fiber-optic cables, electrical grids).
- Cluster analysis (machine learning).

A. Kruskal’s Algorithm

Strategy: Sorts all edges by weight, then adds edges one by one (avoiding cycles using Union-Find).
Time Complexity: O(E log E) (dominated by edge sorting).

B. Prim’s Algorithm

Strategy: Builds MST incrementally from a starting vertex, adding the shortest edge connecting the current MST to a non-MST vertex.
Time Complexity: O((V + E) log V) (with a priority queue).

IV. Time & Space Complexity Summary

Operation/Algorithm	Time Complexity	Space Complexity
Add Vertex/Edge	O(1)	O(1)
DFS (Recursive/Iterative)	O(V + E)	O(V)
BFS	O(V + E)	O(V)
Dijkstra’s Algorithm	O((V + E) log V)	O(V)
Bellman-Ford Algorithm	O(V×E)	O(V)
Floyd-Warshall Algorithm	O(V³)	O(V²)
Kruskal’s Algorithm	O(E log E)	O(V + E)
Prim’s Algorithm	O((V + E) log V)	O(V)

V. Practical Applications of Graphs

Graphs are ubiquitous in computer science and real-world systems—here are key use cases:

Social Networks: Model users (vertices) and friendships/follows (edges) for recommendations, community detection, and pathfinding (e.g., “mutual friends”).
Navigation Systems: Model roads (edges) and intersections (vertices) with weights (distance/time) to find shortest/fastest routes (Dijkstra’s Algorithm).
Computer Networks: Model routers (vertices) and connections (edges) to optimize data routing, detect bottlenecks, and ensure redundancy.
Web Crawling: Model web pages (vertices) and hyperlinks (edges) to crawl the web (BFS/DFS) and build search engine indexes.
Task Scheduling: Use DAGs to model tasks (vertices) and dependencies (edges) to determine the order of execution (topological sorting).
Recommendation Systems: Use graph-based collaborative filtering to suggest products/movies (e.g., “users who liked X also liked Y”).
Biology: Model protein interactions (vertices = proteins, edges = interactions) to study biological pathways.

VI. Graph vs. Tree (Key Differences)

Feature	Graph	Tree
Structure	Non-hierarchical (no root, cycles allowed)	Hierarchical (single root, no cycles)
Edges	Can have 0 or more edges per vertex	Exactly V-1 edges (connected tree)
Cycles	Allowed (cyclic graphs)	Not allowed (acyclic)
Connectivity	Can be disconnected	Must be connected (by definition)
Direction	Directed (digraph) or undirected	Usually undirected (rooted trees have implicit direction)