The Mathematics of Attack Surfaces: Graph Theory and Adversarial Path Analysis

"Infrastructure modeled as a graph reflecting how access, trust, and control operate across systems reveals attack paths that no checklist-based audit will surface." — Trail of Bits, "Threat Modeling the TRAIL of Bits Way," blog.trailofbits.com, February 2025 [1]

Abstract

Attack surface analysis has historically been practiced as an art: experienced practitioners enumerated assets, drew network diagrams, and annotated potential entry points through a process that scaled inversely with network complexity. The mathematical formalization of attack surfaces as directed graphs, originating with Sheyner et al. at Carnegie Mellon in 2002, transformed this art into a tractable computational problem and seeded a research agenda that now spans formal verification, probabilistic risk quantification, combinatorial optimization, and large-scale graph database systems. This article develops the complete mathematical framework for attack surface analysis, from the formal definition of logical attack graphs through probabilistic reachability models, Markov chain formulations of attacker behavior, and the NP-hard network hardening problem. The practical implementation of these models, using tools including BloodHound, NetworkX, and Neo4j, is demonstrated with a fully functional Python implementation that computes shortest paths, betweenness centrality, and probability-weighted reachability over a realistic multi-hop exploitation chain. The article closes by examining the commercial landscape where Wiz reached a $32 billion acquisition valuation in 2025 partly on the strength of cloud attack path analysis, and by cataloguing the open problems that still constrain graph-based security analysis in dynamic, cloud-native environments.

1. Introduction

"Incoming SMS and RCS audio attachments in Google Messages are now automatically decoded with no user interaction, placing audio decoders in the 0-click attack surface of most Android phones." — Google Project Zero, "2025 Zero-Days in Review," Google Cloud Blog, 2025 [2]

The Project Zero observation above captures the essential difficulty of attack surface reasoning in 2026. The audio decoder was not added to Google Messages because a developer wanted to create a vulnerability. It was added because decoding media in the background before a user opens a message improves perceived performance. The security consequence, placing a complex parser into the zero-click attack surface of hundreds of millions of Android devices, was an emergent property of the system's reachability structure, not of any individual design decision. No CVSS score exists for this architectural choice, because no CVE number was assigned. No compliance checklist would have flagged it, because the feature passed every access control review. Detecting this class of risk requires a structural analysis of how privilege levels, trust boundaries, and reachable code relate to one another across the entire system. Graph theory provides exactly that structural analysis, and the formal development of it over the past two decades has produced one of the most practically important branches of applied security mathematics.

The core intuition is direct. Network systems, identity and access management structures, software dependency trees, and cloud permission graphs are all fundamentally relational: their security properties depend not on any single component but on the paths that exist between components. An attacker who can reach a privilege escalation exploit from a low-privilege entry point, and from that escalated state reach a data exfiltration channel, has assembled an attack chain from pieces that are individually insufficient. Detecting and measuring these chains requires a formalism that captures reachability, path composition, and the cumulative probability of multi-step exploitation. A directed graph, with nodes representing security states and edges representing exploitable transitions, is the natural representation.

This article proceeds through four technical layers. The first layer, covered in Sections 2 and 3, establishes the formal graph model and the classical algorithms applied to it: shortest path analysis, betweenness centrality, and all-paths enumeration. The second layer, in Section 4, extends the deterministic model to probabilistic attack graphs, where edges carry likelihoods derived from CVSS scores, and attacker progression becomes a stochastic process amenable to Markov chain analysis. The third layer, in Section 5, frames network hardening as a combinatorial optimization problem, showing that finding minimum-cost defensive interventions reduces to minimum vertex cut and weighted hitting set, both of which are NP-hard in general but well-approximated in practice. The fourth layer, in Sections 6 and 7, grounds this mathematics in production tooling and commercial product strategy, examining how the theoretical framework has been operationalized in tools reshaping the vulnerability management market.

The intended audience is the security engineer who has used BloodHound or read about attack path scoring in a CSPM product and wants to understand what mathematical machinery is running underneath. The mathematical prerequisites are undergraduate linear algebra and graph theory; familiarity with probability theory is helpful but not required, because every probabilistic construction is introduced from first principles.

2. Formalizing Attack Surfaces: States, Exploits, and Graph Structure

2.1 Historical Development and the Sheyner Contribution

The formalization of attack surfaces as directed graphs predates modern cloud security by more than two decades, yet the core framework remains the conceptual foundation for every commercial attack path tool shipping today. The pioneering formalization was done by Oleg Sheyner, Jeannette Wing, and collaborators at Carnegie Mellon University, published at the 2002 IEEE Symposium on Security and Privacy [3]. Their critical contribution was the definition of an attack graph as a directed graph in which each node represents a security state of the network and each edge represents an exploit action that transitions the network from one state to another. The CMU paper operationalized this definition using the model checker NuSMV, automatically generating attack graphs by encoding network topology, vulnerability data, and attacker capabilities as a symbolic model and then computing all states reachable from an initial configuration satisfying a target predicate. This approach yielded the first provably complete attack graphs, meaning every possible attack chain was represented, but at severe computational cost: the state space grows exponentially with the number of hosts and vulnerabilities, a problem that dominated the subsequent decade of research.

The companion paper by Jha, Sheyner, and Wing, also published in 2002 at the IEEE Computer Security Foundations Workshop [3], provided formal complexity results for attack graph analysis. Deciding whether there exists any path from the initial state to a target state is PSPACE-complete in the general case, establishing that the exponential state space encountered empirically was not an artifact of the formulation but a provable lower bound on the difficulty of the problem. This negative result, published in the same year as the original formulation, is important context for understanding every subsequent research effort in attack graph scalability: the goal was not to solve an open problem but to identify tractable special cases and approximations that cover practically relevant instances.

2.2 Two Families of Attack Graphs

Contemporary attack graph research distinguishes two families of models with fundamentally different computational properties. The first family, state enumeration graphs (the Sheyner formulation), represents every possible system configuration as a node, producing a graph whose size is exponential in the number of hosts. The second family, logical attack graphs (also called dependency graphs or condition-exploit graphs), factors the state space by representing individual security conditions and exploit actions as separate node types, with edges connecting pre-conditions to exploits and exploits to post-conditions. The logical representation was developed by Ingols, Lippmann, and colleagues at MIT Lincoln Laboratory in a series of reports beginning in 2005 [4] and codified in the ACSAC 2006 paper on practical attack graph generation [5], and it remains the dominant model in production tools because its size scales polynomially rather than exponentially with the number of hosts and vulnerabilities.

The practical significance of this distinction is large. For a network of $n$ hosts each with $k$ exploitable vulnerabilities and $m$ possible privilege levels, the state enumeration graph can have up to $m^n$ nodes in the worst case. The logical attack graph for the same network has at most $n \cdot k$ exploit nodes and $n \cdot m$ condition nodes, a polynomial rather than exponential count. MIT Lincoln Laboratory demonstrated this tractability advantage empirically in work that examined enterprise networks of several hundred hosts, where logical attack graphs remained analyzable while state enumeration graphs could not be constructed at all [4]. The tradeoff is expressiveness: logical attack graphs represent dependencies between exploits rather than full system states, which means they cannot represent all possible attack orderings but do capture the set of reachable states and the dependency structure between exploits, which is sufficient for most defensive analyses.

2.3 Formal Definitions

A logical attack graph is a bipartite directed graph. Let $\mathcal{H}$ denote the set of hosts in a network, $\mathcal{V}$ the set of vulnerabilities (identified by CVE or equivalent), and $\mathcal{C}$ the set of security conditions (network facts such as "host $h$ is reachable from host $h'$ on port $p$," or "an attacker possessing credential set $k$ can access service $s$"). The logical attack graph is:

$$G = (V, E, \lambda, w) \tag{1}$$

where $V = \mathcal{C} \cup \mathcal{X}$ is the disjoint union of condition nodes $\mathcal{C}$ and exploit nodes $\mathcal{X}$, $E \subseteq V \times V$ is the set of directed edges encoding pre-condition and post-condition relationships, $\lambda: V \to \{\textit{condition}, \textit{exploit}\}$ is the node-type labeling function, and $w: \mathcal{X} \to \mathbb{R}_{\geq 0}$ assigns cost or difficulty weights to exploit nodes. For each exploit node $x \in \mathcal{X}$, the in-edges $(c, x)$ encode conditions that must hold for $x$ to be applicable (pre-conditions), and the out-edges $(x, c')$ encode conditions established by successful execution of $x$ (post-conditions). A path from the attacker's initial capability set $S_0 \subseteq \mathcal{C}$ to a target condition $c^* \in \mathcal{C}$ constitutes a viable attack chain, and the set of all such paths is the attack surface of target $c^*$ relative to initial position $S_0$.

The bipartite structure of Equation (1) is not merely a notational convenience; it separates two analytically distinct concerns that update at different rates. The network topology layer (which conditions are reachable from which hosts) changes with infrastructure modifications: adding a firewall rule removes condition edges; deploying a new server adds condition nodes. The exploit catalog layer (which vulnerabilities enable which transitions) changes with CVE publications and patch deployment: a new CVE adds an exploit node and its connecting edges; applying a patch removes the corresponding exploit node. MulVAL [6], the Datalog-based attack graph generator developed at Kansas State University, exploits this separation by encoding network topology and exploit pre-conditions as facts and rules in a logic programming engine, deriving the attack graph as the fixpoint of these rules, and supporting incremental updates as new facts are asserted or retracted. The fixpoint computation terminates and is complete by the semantics of Datalog under the closed-world assumption, giving MulVAL the strongest formal completeness guarantee of any practical attack graph tool.
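
The fixpoint construction can be illustrated without a Datalog engine. The following toy sketch (written in Python rather than MulVAL's actual Datalog syntax, with illustrative fact and rule names) forward-chains over condition facts until no rule adds anything new, recording each rule firing as a (pre-conditions, exploit, post-condition) edge of the logical attack graph:

def derive_fixpoint(initial_facts, rules):
    """Forward-chain until no rule can add a new fact; record each
    (pre-conditions, exploit, post-condition) triple that fired as a graph edge."""
    facts = set(initial_facts)
    edges = []
    changed = True
    while changed:
        changed = False
        for preconds, exploit, postcond in rules:
            if preconds <= facts and postcond not in facts:
                facts.add(postcond)
                edges.append((preconds, exploit, postcond))
                changed = True
    return facts, edges

# Toy rule base: two chained exploits expressed as pre-condition sets.
rules = [
    (frozenset({'reachable(web, 80)', 'vuln(web, CVE-2021-44228)'}),
     'exploit_log4shell', 'code_exec(web)'),
    (frozenset({'code_exec(web)', 'cred(db_password, web)'}),
     'reuse_db_password', 'access(db)'),
]
initial = {'reachable(web, 80)', 'vuln(web, CVE-2021-44228)', 'cred(db_password, web)'}
facts, edges = derive_fixpoint(initial, rules)
print(facts)   # now includes 'code_exec(web)' and 'access(db)'

Because each iteration can only add facts drawn from a finite universe, the loop terminates, mirroring the Datalog fixpoint guarantee described above.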

3. Reachability, Path Analysis, and Centrality Metrics

3.1 Reachability as the Fundamental Security Predicate

Every question in attack surface analysis reduces, at some level, to a reachability question: can an attacker at initial privilege level $s_0$ reach target state $t$ through any sequence of applicable exploits? Formally, define the reachability relation $\mathcal{R} \subseteq V \times V$ as the transitive closure of the edge relation $E$. The attack surface of target $t$ from initial position $s_0$ is the subgraph of $G$ induced by all nodes lying on any directed path from $s_0$ to $t$. Computing this subgraph takes $O(|V| + |E|)$ time using two linear traversals: a forward search from $s_0$, a backward search from $t$, and the intersection of the two reachable sets. In practice, the more useful operation is not computing the full reachability subgraph but identifying the minimal paths within it, because shorter paths require fewer sequential exploit successes and are therefore more likely to succeed under real-world conditions.
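
A minimal NetworkX sketch of this two-traversal construction, assuming the same DiGraph conventions used by the implementation in Section 6.3:

import networkx as nx

def attack_surface_subgraph(G, s0, target):
    """Induced subgraph of every node lying on some directed path from s0 to target:
    the intersection of forward reachability from s0 and backward reachability from target."""
    forward = nx.descendants(G, s0) | {s0}          # first O(|V| + |E|) traversal
    backward = nx.ancestors(G, target) | {target}   # second O(|V| + |E|) traversal
    return G.subgraph(forward & backward).copy()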

Binary reachability (exists/does not exist) is the correct primitive for compliance questions such as "is the payment database reachable from the internet?" but is insufficient for risk prioritization. For prioritization, the analyst needs to know which paths are easiest to traverse, which intermediate nodes are critical to the largest number of paths, and what is the probability that a persistent adversary will succeed. The remainder of this section develops the classical graph algorithms that answer these questions: shortest-path analysis (Section 3.2), betweenness centrality (Section 3.3), and the family of related centrality metrics (Section 3.4). Each has a direct security interpretation that goes beyond the abstract graph formulation.

3.2 Shortest-Path Analysis and Exploit Distance

The shortest attack path from source $s$ to target $t$ is the path minimizing total edge weight, where edge weight encodes exploit difficulty as a cost in $[0, \infty)$. When all edge weights are unit-valued (all exploits are equally difficult), breadth-first search returns a minimum-hop path in $O(|V| + |E|)$ time; for weighted graphs reflecting heterogeneous exploit difficulty derived from CVSS complexity scores or empirical time-to-exploit estimates, Dijkstra's algorithm applies in $O(|E| + |V| \log |V|)$ time with the edge weight function substituted into the priority queue comparison. The operationally critical metric is the attack distance $d(s, t) = \min_{\pi \in \Pi(s,t)} \sum_{x \in \pi} w(x)$, where $\Pi(s,t)$ is the set of all simple paths from $s$ to $t$ and the sum ranges over exploit nodes on path $\pi$. Attack distance provides a scalar summary of the minimum adversary effort required to reach $t$ from $s$, and it serves as the natural basis for cross-system risk comparison when the weight function is calibrated consistently.

The all-shortest-paths variant, computing the complete set of paths achieving the minimum weight, is relevant for measuring structural redundancy: if there are many shortest paths between $s$ and $t$ passing through different intermediate nodes, defending any single path is insufficient because the adversary reroutes. Counting shortest paths is computationally tractable ($O(|V| + |E|)$ per source-target pair using a modified breadth-first search) and provides a measure of path diversity that complements the distance metric. High path diversity from the perimeter to a critical target is a significant hardening challenge, because the number of interventions required to eliminate all paths scales with the minimum vertex cut (Section 5), not with the number of paths.
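
Both quantities are a few lines of NetworkX. The sketch below assumes exploit edges carry a numeric 'cost' attribute encoding difficulty (the Section 6.3 implementation stores probabilities instead, which would need to be converted to distance-like costs, for example as negative log-probabilities, before use here):

import networkx as nx

def attack_distance_and_diversity(G, s, t, weight='cost'):
    """Dijkstra attack distance d(s, t), plus a count of distinct minimum-hop
    paths as a crude measure of structural redundancy between s and t."""
    try:
        distance = nx.shortest_path_length(G, s, t, weight=weight)
        diversity = sum(1 for _ in nx.all_shortest_paths(G, s, t))  # unweighted hop count
    except nx.NetworkXNoPath:
        return float('inf'), 0
    return distance, diversity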

3.3 Betweenness Centrality as a Hardening Heuristic

Betweenness centrality measures the degree to which a node lies on shortest paths between other node pairs. For a node $v$ in the attack graph, its betweenness centrality is:

$$C_B(v) = \sum_{\substack{s \neq v \neq t \\ s,t \in V}} \frac{\sigma_{st}(v)}{\sigma_{st}} \tag{2}$$

where $\sigma_{st}$ is the total number of shortest paths from $s$ to $t$, and $\sigma_{st}(v)$ is the number of those paths passing through $v$ [7]. Nodes with high betweenness centrality in an attack graph are structural chokepoints: their compromise or their hardening affects a disproportionate fraction of viable attack paths. Patching or isolating high-betweenness exploit nodes therefore eliminates more attack paths per unit defensive effort than patching low-betweenness nodes of equal CVSS score, which is the core mathematical justification for path-context-aware remediation prioritization over CVSS-score-only prioritization.

Brandes' algorithm computes betweenness centrality for all nodes in $O(|V| \cdot |E|)$ time for unweighted graphs, and $O(|V| \cdot |E| + |V|^2 \log |V|)$ for weighted graphs [8]. For production attack graphs with thousands of nodes and tens of thousands of edges, this is feasible on a single workstation in seconds. The NetworkX Python library exposes this directly as networkx.betweenness_centrality(G, weight='cost'), making the computation accessible without specialized infrastructure or graph databases; note that NetworkX interprets the weight attribute as a distance, so probability-weighted graphs must first be converted to costs (for example, the negative logarithm of each edge probability). The normalized variant of Equation (2) divides by $(|V|-1)(|V|-2)$ for directed graphs, producing a score in $[0,1]$ that is comparable across graphs of different sizes. A node with $C_B = 0.15$ lies, on average, on 15% of all pairwise shortest paths, which in a security context means that hardening it disrupts roughly 15% of all minimal attack routes through the network simultaneously.

3.4 Centrality Variants for Different Security Questions

Betweenness centrality is one member of a family of centrality metrics, each answering a different structural security question. Closeness centrality $C_C(v) = (|V|-1) / \sum_{u \neq v} d(u,v)$, where $d(u,v)$ is the shortest-path distance, measures how efficiently a node can be reached from all starting points in the graph. Nodes with high closeness centrality in an attack graph are universally accessible to adversaries regardless of entry point, making them high-priority hardening targets when entry point diversity is a concern (as in networks with many internet-facing services). Eigenvector centrality assigns each node a score proportional to the sum of its neighbors' scores, iterating to a fixed point; it identifies nodes connected to other highly connected nodes, corresponding in security terms to systems whose compromise cascades into the highest-value subnetworks. Degree centrality, simply the number of incident edges normalized by $|V|-1$, measures the breadth of a node's direct connectivity and is a useful first-pass filter for identifying over-connected credential stores or administrative jump hosts.

No single centrality metric provides complete guidance for defensive prioritization, but the combination of betweenness (chokepoints on critical paths), closeness (universally accessible systems), and out-degree (systems that open many new attack options upon compromise) gives a practically useful three-dimensional view of network risk posture. The NetworkX library computes all three in a single pass over the graph, and the resulting ranked lists of exploit nodes constitute a defensible, mathematically grounded remediation backlog that explicitly captures path context rather than treating each vulnerability in isolation.
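
A sketch of that three-metric ranking, assuming exploit nodes are tagged with ntype='exploit' as in the Section 6.3 implementation; the equal-weight sum is an illustrative choice, not a calibrated scoring model:

import networkx as nx

def rank_exploit_nodes(G):
    """Rank exploit nodes by a combined view of betweenness (chokepoints),
    closeness (reachable from many entry points), and out-degree (options opened)."""
    bc = nx.betweenness_centrality(G, normalized=True)
    cc = nx.closeness_centrality(G)       # directed closeness uses inward distances:
                                          # how easily a node is reached from elsewhere
    dc = nx.out_degree_centrality(G)
    exploits = [n for n, d in G.nodes(data=True) if d.get('ntype') == 'exploit']
    scored = {n: bc[n] + cc[n] + dc[n] for n in exploits}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)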

4. Probabilistic Attack Graphs and Risk Quantification

4.1 From Deterministic to Stochastic Models

The deterministic attack graph answers the binary question of whether an attack path exists. Real defenders need more: they need to know how likely an attack is to succeed given a persistent adversary, how long the adversary will take, and which paths pose the greatest probabilistic risk given actual adversary capabilities and current vulnerability status. The probabilistic attack graph extends the logical graph model by associating each exploit node $x$ with a success probability $p_x \in (0,1]$, derived from empirical data or CVSS scoring, and treating attacker progress as a stochastic process. This extension was formalized by multiple research groups in the mid-2000s and codified into NIST Interagency Report 7788 [9], which remains the federal reference document for probabilistic attack graph methodology. The key insight is that CVSS exploitability sub-scores, computed and published for every CVE in the National Vulnerability Database, provide a readily available proxy for relative exploit likelihood that can be normalized into transition probabilities without requiring empirical red-teaming data at every organization.

4.2 Transition Probabilities from CVSS Scores

Given a set of outbound exploit edges from condition node $c$, where each applicable exploit $e_i$ has CVSS exploitability score $\text{ES}(e_i) \in [0, 3.9]$ under CVSSv3, the transition probability to the post-condition established by exploit $e_i$ is:

$$p_{e_i \mid c} = \frac{\text{ES}(e_i)}{\displaystyle\sum_{j: (c, e_j) \in E} \text{ES}(e_j)} \tag{3}$$

This normalization converts raw CVSS scores into a probability distribution over the available exploits from condition $c$, modeling the adversary as preferentially selecting easier exploits while not ignoring harder alternatives. For condition nodes with a single applicable exploit, the transition probability is 1 by definition of Equation (3). The path probability for a specific attack chain $\pi = (c_0, x_1, c_1, x_2, c_2, \ldots, x_k, c_k)$ alternating between conditions $c_i$ and exploit nodes $x_i$ is the product of individual exploit probabilities:

$$P(\pi) = \prod_{i=1}^{k} p_{x_i \mid c_{i-1}} \tag{4}$$

This product formula reflects the independence assumption that sequential exploit successes are independent Bernoulli trials. The independence assumption is a simplification: if the same poorly configured credential store is required for both $x_2$ and $x_4$ on a given path, their successes are correlated. However, the product formula gives conservative (lower-bound) estimates for path probability because real-world positive correlations between exploits at related hosts would typically increase the joint probability relative to the independent product. Because under-estimating risk is the more dangerous error, Equation (4) should be communicated explicitly as a floor, stating that the chain's success probability is at least the computed value, rather than as a point estimate.
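
As a worked instance of Equations (3) and (4), take the workstation_user condition from the Section 6.3 scenario, which has two applicable exploits with exploitability scores 1.8 (SpoolFool) and 3.9 (the Word RTF exploit):

$$p_{\text{SpoolFool}} = \frac{1.8}{1.8 + 3.9} \approx 0.316, \qquad p_{\text{WordRTF}} = \frac{3.9}{1.8 + 3.9} \approx 0.684$$

A chain that enters via PrintNightmare (one of two equally weighted entry exploits, so $p = 0.5$) and escalates via the RTF exploit therefore carries path probability $P(\pi) \approx 0.5 \times 0.684 \approx 0.34$ before the remaining hops are applied.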

4.3 Probability of Compromise via Path Aggregation

The overall probability that an adversary can compromise target $t$ from source $s$ across all available paths is not simply the sum of individual path probabilities, because paths can share exploit nodes and their successes are not mutually exclusive events. Treating the paths as independent events, the complement-product form over the set $\Pi(s,t)$ of all simple paths from $s$ to $t$ is:

$$P_{\text{comp}}(s, t) = 1 - \prod_{\pi \in \Pi(s,t)} \left(1 - P(\pi)\right) \tag{5}$$

This expression assumes path-level independence, which holds when paths share no exploit nodes. For the general case with shared exploits, the exact computation is #P-hard, as it requires summing over exponentially many path subsets in the inclusion-exclusion expansion. Equation (5) provides an upper bound on $P_{\text{comp}}$ in the dependent case and is exact for series-parallel graphs, which many enterprise network topologies approximate by construction (a perimeter zone feeds an application zone feeds a database zone, with limited cross-connections). For graphs where exact computation is required and the series-parallel approximation is poor, Monte Carlo sampling of attacker trajectories provides an unbiased estimator of $P_{\text{comp}}$ with standard error decreasing as $O(1/\sqrt{n})$ in the sample count $n$.
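
A minimal sketch of that trajectory-sampling estimator, assuming the 'probability' edge attribute and node conventions of the Section 6.3 implementation; revisits are excluded to approximate simple-path semantics, and the choice distribution at each condition follows Equation (3):

import random

def monte_carlo_compromise(G, source, targets, n_trials=50_000, max_steps=40, seed=0):
    """Estimate the probability of reaching any target by simulating attacker
    trajectories: at each node the attacker picks an outgoing edge with probability
    proportional to its 'probability' attribute; a trial succeeds if a target is reached."""
    rng = random.Random(seed)
    targets = set(targets)
    hits = 0
    for _ in range(n_trials):
        node, visited = source, {source}
        for _ in range(max_steps):
            if node in targets:
                break
            choices = [(v, d.get('probability', 1.0))
                       for v, d in G[node].items() if v not in visited]
            if not choices:
                break                       # dead end: applicable exploits exhausted
            succs, weights = zip(*choices)
            node = rng.choices(succs, weights=weights, k=1)[0]
            visited.add(node)
        hits += node in targets
    return hits / n_trials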

4.4 The Markov Chain Formulation of Attacker Progress

A more powerful probabilistic treatment models attacker progress as an absorbing Markov chain [10]. Security conditions are transient states; target conditions are absorbing states (once reached, the attacker has succeeded and the game ends). The transition matrix $P$ has entries $P_{ij} = p_{e}$ if exploit $e$ transitions condition $i$ to condition $j$, and $P_{ij} = 0$ otherwise. Partitioning the state space into transient states (conditions from which the target has not yet been reached) and absorbing states, and writing $Q$ for the submatrix of $P$ restricted to transitions between transient states, the fundamental matrix of the Markov chain is:

$$N = (I - Q)^{-1} \tag{6}$$

The $(i,j)$ entry of $N$ gives the expected number of times the chain visits transient state $j$ before reaching an absorbing state, starting from transient state $i$. The vector $\mathbf{t} = N \mathbf{1}$, where $\mathbf{1}$ is the all-ones column vector, gives the expected total number of exploit attempts before the adversary either succeeds (reaches a target) or exhausts applicable exploits. This quantity is directly operationally actionable: a target reachable in an expected 2 exploit steps from an internet-facing entry point is a fundamentally different risk than one requiring an expected 12 steps through multiple intermediate pivots, because detection probability scales with the number of observable actions the adversary must take.
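
A numerical sketch of Equation (6) and the expected-step vector on a hand-built four-state chain; the transition probabilities below are illustrative, not derived from real CVSS data, and unallocated probability mass in each transient row can be read as the attack stalling or being detected:

import numpy as np

# States: 0 = perimeter foothold, 1 = internal pivot, 2 = credential access,
#         3 = target compromised (absorbing).
P = np.array([
    [0.0, 0.6, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.3],
    [0.0, 0.0, 0.0, 0.9],
    [0.0, 0.0, 0.0, 1.0],
])
Q = P[:3, :3]                       # transitions among transient states
R = P[:3, 3:]                       # transitions from transient to absorbing states
N = np.linalg.inv(np.eye(3) - Q)    # fundamental matrix, Equation (6)
t = N @ np.ones(3)                  # expected exploit attempts before absorption
B = N @ R                           # probability of eventually reaching the target
print(t)   # t[0]: expected attempts starting from the perimeter foothold
print(B)   # B[0, 0]: probability the perimeter foothold ever leads to compromise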

5. Network Hardening as Combinatorial Optimization

5.1 Hardening as Graph Modification

The defensive counterpart to attack graph analysis is network hardening: modifying the graph to make targets unreachable or to maximize the expected attacker effort required to reach them. At the most basic level, hardening corresponds to node or edge removal from the attack graph. Removing an exploit node corresponds to patching the associated vulnerability or disabling the vulnerable service; removing a condition node corresponds to revoking a privilege or closing a network path. The question of which nodes to remove to optimally reduce $P_{\text{comp}}(s, t)$ (Equation (5)) subject to a cost constraint is the minimum-cost hardening problem, and it connects attack graph theory to classical results in combinatorial optimization.

5.2 Minimum Vertex Cut and Its Relationship to Maximum Flow

To guarantee that no path from source $s$ to target $t$ exists after hardening, it is necessary and sufficient to remove any vertex cut separating $s$ from $t$. A vertex cut is a set $C \subseteq V \setminus \{s, t\}$ whose removal from $G$ leaves no directed path from $s$ to $t$ in $G \setminus C$. The minimum vertex cut $\kappa(s, t)$ is the smallest such set. By Menger's theorem [11], the size of the minimum vertex cut equals the maximum number of internally vertex-disjoint paths from $s$ to $t$ (paths sharing no intermediate nodes). Furthermore, by a classical reduction, the minimum vertex cut can be computed in polynomial time via maximum flow on the node-split graph $G'$: replace each node $v \in V \setminus \{s, t\}$ with two nodes $v_{\text{in}}$ and $v_{\text{out}}$ connected by an edge of unit capacity, and replace each original edge $(u, v)$ with edge $(u_{\text{out}}, v_{\text{in}})$ of infinite capacity. Then:

$$\kappa(s, t) = \text{MaxFlow}(G', s_{\text{out}}, t_{\text{in}}) \tag{7}$$

The Ford-Fulkerson algorithm solves this in $O(|V| \cdot |E|)$ time, because the flow value on the unit-capacity node-split graph is bounded by $|V|$ and each augmenting path is found in $O(|E|)$ time. The minimum vertex cut identifies the smallest set of vulnerabilities or privileges whose elimination is necessary and sufficient to prevent all attack paths from $s$ to $t$, giving a theoretically optimal hardening target list in the unit-cost case.
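
NetworkX implements exactly this reduction: minimum_node_cut performs the node splitting internally and returns the cut set. A minimal sketch (in the bipartite attack graph, source and target conditions are never adjacent, so the cut is always defined):

import networkx as nx

def minimum_hardening_set(G, source, target):
    """Smallest set of intermediate nodes whose removal severs every attack path
    from source to target (Menger duality and the max-flow reduction of Equation (7))."""
    if not nx.has_path(G, source, target):
        return set()                       # nothing to cut
    return nx.minimum_node_cut(G, s=source, t=target)

On the Section 6.3 graph, the minimum cut has size one, because every chain from the internet funnels through the workstation_user and workstation_admin conditions.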

5.3 Weighted Hardening as Hitting Set

In practice, node removal costs are heterogeneous. Patching a zero-day in an internet-facing web service carries different operational impact than removing an LLMNR service from a workstation segment, even if both remove the same number of attack paths. The weighted hardening problem assigns cost $c_v$ to removing each node $v$ and asks for the minimum-cost subset $H \subseteq V$ that intersects every simple attack path from $s$ to any target $t \in T$:

$$\text{minimize} \quad \sum_{v \in V} c_v x_v \quad \text{subject to} \quad \forall \pi \in \Pi(s, T): \sum_{v \in \pi} x_v \geq 1, \quad x_v \in \{0,1\} \tag{8}$$

This is the weighted minimum hitting set problem, and it is NP-hard in general [12]. However, it admits a polynomial-time greedy $O(\log |\Pi|)$-approximation: at each step, select the node $v$ maximizing the ratio of newly covered paths to cost $c_v$. In practice, the combinatorial structure of real attack graphs (limited depth, bounded branching factor, clustered topology corresponding to network zones) often allows exact solutions via integer linear programming within acceptable time budgets. Commercial tools including XM Cyber use variants of this formulation to rank remediation recommendations by their expected impact on attack path coverage per unit remediation effort.
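
A sketch of the greedy approximation over explicitly enumerated paths; path enumeration is exponential in depth, so this is only appropriate for small or shallow graphs, and per-node removal costs are supplied by the caller (unit cost assumed where missing):

import networkx as nx

def greedy_hardening(G, source, targets, costs, cutoff=10):
    """Greedy approximation to the weighted hitting set of Equation (8): repeatedly
    pick the intermediate node covering the most still-uncovered attack paths per
    unit removal cost, until every enumerated path is hit."""
    paths = [set(p) for t in targets
             for p in nx.all_simple_paths(G, source, t, cutoff=cutoff)]
    protected = {source, *targets}
    uncovered = list(range(len(paths)))
    chosen = []
    while uncovered:
        candidates = {v for i in uncovered for v in paths[i]} - protected - set(chosen)
        if not candidates:
            break                           # remaining paths cannot be hit
        best = max(candidates,
                   key=lambda v: sum(v in paths[i] for i in uncovered) / costs.get(v, 1.0))
        chosen.append(best)
        uncovered = [i for i in uncovered if best not in paths[i]]
    return chosen

With unit costs this degenerates to repeatedly selecting the highest-coverage chokepoint, which the betweenness-based ranking of Section 3.3 approximates.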

5.4 Markov Decision Process Formulation for Adaptive Defense

The static hardening formulation of Equation (8) assumes a fixed set of defenses deployed before the attack begins. A more realistic model treats defense as an ongoing sequential decision process against an adaptive adversary, formalized as a Markov Decision Process (MDP) [13]. In the security MDP, the state space is a compressed representation of the attack graph, actions correspond to defensive interventions (patch deployment, access revocation, network segmentation, service decommission), transitions reflect attacker responses to observed defenses, and the reward function penalizes attacker progress while assigning costs to defensive actions. The optimal policy, computed via value iteration in $O(|S|^2 |A|)$ per iteration (where $|S|$ is the number of states and $|A|$ the number of actions), specifies which defensive action maximizes expected long-run security from each observable state. The MDP formulation naturally handles dynamic environments where new CVEs are published and new assets are deployed, because the policy adapts to the current state rather than requiring a one-time hardening decision computed from a static snapshot. The computational cost of exact value iteration is prohibitive for large enterprise graphs ($|S|$ scales with network size exponentially in the worst case), but deep reinforcement learning approximations and state space abstraction techniques reduce this to tractable ranges for networks of practical scale.
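
A toy value-iteration sketch over a three-state abstraction of the defender MDP (attacker at the perimeter, inside the network, or at the target) with two defensive actions; the transition tensor and rewards are illustrative placeholders, not quantities derived from a real attack graph:

import numpy as np

n_states, n_actions, gamma = 3, 2, 0.95
# T[a, s, s2]: probability the attacker moves from abstract state s to s2
# when the defender takes action a (0 = monitor only, 1 = patch/segment).
T = np.array([
    [[0.7, 0.3, 0.0], [0.0, 0.6, 0.4], [0.0, 0.0, 1.0]],
    [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1], [0.0, 0.0, 1.0]],
])
# R[a, s]: defender reward, penalizing attacker progress and charging for interventions.
R = np.array([
    [ 0.0, -1.0, -10.0],
    [-0.5, -1.5, -10.0],
])
V = np.zeros(n_states)
for _ in range(1000):                    # value iteration to an approximate fixpoint
    q = R + gamma * (T @ V)              # q[a, s] = R[a, s] + gamma * sum_s2 T[a, s, s2] * V[s2]
    V_new = q.max(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-9:
        V = V_new
        break
    V = V_new
policy = (R + gamma * (T @ V)).argmax(axis=0)   # best defensive action per abstract state
print(V, policy)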

6. Attack Graph Construction: Tools, Data Sources, and Implementation

6.1 The Construction Pipeline

Building an attack graph for a real network requires integrating four distinct data sources: network topology (which hosts are reachable from which hosts on which ports), asset inventory (what software versions are running on each host), vulnerability data (which CVEs affect each software version and their associated CVSS scores), and privilege relationships (which credentials, tokens, group memberships, or IAM policies allow traversal across trust boundaries). Network scanners such as nmap produce topology and service fingerprint data; the National Vulnerability Database (NVD) provides CVE-to-software mappings; identity providers export group membership and delegation structures; and cloud provider APIs export IAM policy documents and security group rules in machine-readable formats. The challenge is integrating these heterogeneous data sources into a unified graph schema before any analysis algorithm can be applied, and the integration challenge is consistently the bottleneck in real-world attack graph deployments.

MulVAL [6] provides the most principled approach to this integration problem: it defines a Datalog schema for network facts and exploit interaction rules, allowing analysts to express new vulnerability types by writing new Datalog rules without modifying the graph generation engine. The Datalog engine computes the attack graph as the minimal fixpoint of the rules, guaranteed to terminate and to include all paths entailed by the asserted facts. The limitation is that MulVAL's rule sets require manual curation to encode each CVE class's specific pre-conditions and post-conditions, making it expensive to keep current with a fast-moving NVD feed. Recent work [14] on LLM-assisted CVE-to-Datalog translation, generating precondition rules from NVD natural language descriptions, addresses this bottleneck and represents the most promising near-term path to automatic rule maintenance.

6.2 BloodHound: Practical Attack Graph Analysis for Active Directory

For Active Directory environments, BloodHound (maintained by SpecterOps) has become the canonical attack graph analysis tool [15]. BloodHound models AD and Azure AD objects (users, groups, computers, GPOs, OUs, domains, service principals) as nodes, and the security-relevant relationships between them (group membership, session presence, ACL-derived edges, delegation configurations, Kerberos trust relationships) as directed edges stored in a Neo4j graph database. Collection runs via SharpHound (for on-premises AD) and AzureHound (for Azure AD tenants); the collected data is ingested into Neo4j, which exposes a Cypher query interface over the resulting graph. The killer capability is shortest-path analysis between arbitrary node pairs: the query MATCH p=shortestPath((u:User {name:'low_priv@domain.com'})-[*]->(g:Group {name:'Domain Admins@domain.com'})) RETURN p surfaces the minimum-step path from a low-privilege account to Domain Admin. Severing that single path is necessary but not sufficient; eliminating every such path requires removing a vertex cut in the sense of Equation (7).

"The design philosophy behind BloodHound was always graph-first: rather than asking 'what permissions does this account have,' we ask 'what can this account reach.' Once you frame security as a reachability problem, the graph model follows naturally." — Andy Robbins, "BloodHound 4.0 and the New Era of Identity Security," SpecterOps Blog, blog.specterops.io, 2022 [15]

The graph query approach scales to forests with hundreds of thousands of objects because Neo4j's native graph storage engine and the Cypher query planner evaluate path queries without materializing the full adjacency matrix. Betweenness centrality queries identifying the most critical intermediate nodes in the AD attack graph can be evaluated directly using the Neo4j Graph Data Science library via CALL gds.betweenness.stream('attack-graph'), returning a ranked node list that implements Equation (2) without requiring a separate Python environment. For cloud environments, the same conceptual model applies: Wiz, Orca Security, and Microsoft Defender for Cloud all model IAM policies, network security groups, and service-to-service permissions as directed graph edges over cloud asset nodes, and expose attack path queries as their primary risk visualization interface.
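
For teams that prefer to script against the BloodHound database rather than use the UI, both queries can be issued through the official neo4j Python driver. The sketch below assumes a local Neo4j instance with the Graph Data Science plugin installed, an already-projected graph named 'attack-graph', and placeholder connection credentials and account names:

from neo4j import GraphDatabase

SHORTEST_PATH = """
MATCH p = shortestPath(
  (u:User {name: $user})-[*]->(g:Group {name: $group})
)
RETURN p
"""
BETWEENNESS = (
    "CALL gds.betweenness.stream('attack-graph') YIELD nodeId, score "
    "RETURN gds.util.asNode(nodeId).name AS name, score ORDER BY score DESC LIMIT 10"
)

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "bloodhound"))
with driver.session() as session:
    path = session.run(SHORTEST_PATH,
                       user="low_priv@domain.com",
                       group="Domain Admins@domain.com").single()
    chokepoints = session.run(BETWEENNESS).data()   # top-10 chokepoints per Equation (2)
driver.close()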

6.3 Implementation: A Probabilistic Attack Graph Analyzer in Python

The following implementation constructs a probabilistic attack graph from a structured description of a small Windows Active Directory environment, computes CVSS-weighted transition probabilities per Equation (3), evaluates path-aggregated compromise probability per Equation (5), and ranks exploit nodes by betweenness centrality per Equation (2). The sample exploit chain reflects a realistic 2020-2023 Windows lateral movement scenario drawing on documented CVEs.

import networkx as nx
import numpy as np
from collections import defaultdict

def build_attack_graph(edge_specs):
    """
    Construct a probabilistic logical attack graph from a list of tuples:
    (source_condition, exploit_id, cvss_exploitability_score, dest_condition)
    """
    G = nx.DiGraph()
    # First pass: accumulate total CVSS weight per source condition for normalization
    source_cvss_totals = defaultdict(float)
    for src, exploit, cvss, dst in edge_specs:
        G.add_node(src, ntype='condition')
        G.add_node(exploit, ntype='exploit')
        G.add_node(dst, ntype='condition')
        source_cvss_totals[src] += cvss

    # Second pass: add edges with normalized transition probabilities (Equation 3)
    for src, exploit, cvss, dst in edge_specs:
        p = cvss / source_cvss_totals[src] if source_cvss_totals[src] > 0 else 0.0
        G.add_edge(src, exploit, probability=p)
        # Post-condition edge is deterministic given exploit success
        G.add_edge(exploit, dst, probability=1.0)

    return G

def path_probability(G, path):
    """Multiply edge probabilities along a path (Equation 4)."""
    prob = 1.0
    for u, v in zip(path[:-1], path[1:]):
        prob *= G[u][v].get('probability', 1.0)
    return prob

def compromise_probability(path_probs):
    """
    Aggregate path probabilities via inclusion-exclusion upper bound (Equation 5).
    Assumes path-level independence as a conservative upper bound.
    """
    return 1.0 - np.prod([1.0 - p for p in path_probs])

def analyze_attack_graph(G, source, target, path_cutoff=10):
    """
    Return: shortest path, hop count, all-paths compromise probability,
    and top-5 exploit nodes ranked by betweenness centrality.
    """
    results = {}

    try:
        sp = nx.shortest_path(G, source, target)
        results['shortest_path'] = ' -> '.join(sp)
        # Count only exploit-type nodes as meaningful "hops"
        results['exploit_hops'] = sum(
            1 for n in sp if G.nodes[n].get('ntype') == 'exploit'
        )
    except nx.NetworkXNoPath:
        results['shortest_path'] = 'No path found'
        results['exploit_hops'] = float('inf')
        return results

    # Enumerate all simple paths up to cutoff; exponential beyond this
    all_paths = list(nx.all_simple_paths(G, source, target, cutoff=path_cutoff))
    path_probs = [path_probability(G, p) for p in all_paths]

    results['path_count'] = len(all_paths)
    results['p_compromise'] = compromise_probability(path_probs)

    # Betweenness centrality over full graph (Equation 2)
    bc = nx.betweenness_centrality(G, normalized=True)
    exploit_bc = {
        n: bc[n] for n in G.nodes if G.nodes[n].get('ntype') == 'exploit'
    }
    results['top_chokepoints'] = sorted(
        exploit_bc.items(), key=lambda x: x[1], reverse=True
    )[:5]

    return results

# Realistic AD lateral movement scenario (2020-2023 CVE chain)
edge_specs = [
    # (source_condition, exploit_id, cvss_exploitability, dest_condition)
    ('internet',              'CVE-2021-34527_PrintNightmare', 3.9, 'workstation_user'),
    ('internet',              'CVE-2021-40444_MSHTML',         3.9, 'workstation_user'),
    ('workstation_user',      'CVE-2022-21999_SpoolFool',      1.8, 'workstation_admin'),
    ('workstation_user',      'CVE-2023-21716_WordRTF',        3.9, 'workstation_admin'),
    ('workstation_admin',     'mimikatz_lsass_dump',           2.0, 'domain_creds'),
    ('workstation_admin',     'CVE-2020-1472_Zerologon',       3.9, 'domain_admin'),
    ('domain_creds',          'CVE-2021-42278_noPac',          3.9, 'domain_admin'),
    ('domain_creds',          'pass_the_hash_smb',             2.5, 'dc_user'),
    ('dc_user',               'CVE-2020-1472_Zerologon',       3.9, 'domain_admin'),
]

G = build_attack_graph(edge_specs)
results = analyze_attack_graph(G, 'internet', 'domain_admin')

print(f"Shortest path  : {results['shortest_path']}")
print(f"Exploit hops   : {results['exploit_hops']}")
print(f"Total paths    : {results['path_count']}")
print(f"P(compromise)  : {results['p_compromise']:.4f}")
print("Top chokepoints (by betweenness centrality):")
for exploit, score in results['top_chokepoints']:
    print(f"  {exploit:<42} BC = {score:.4f}")

The CVE-2020-1472 (Zerologon) entry appears twice in the edge specification: once from workstation_admin directly to domain_admin, and once from dc_user to domain_admin via a different path that requires credential theft as an intermediate step [16]. This encodes the real-world attack graph structure where a single high-severity CVE (CVSS 10.0) appears at multiple positions in the graph and therefore accumulates high betweenness centrality by Equation (2). The CVE-2021-42278 and CVE-2021-42287 combination (the noPac chain) [17] provides an alternative path from domain-level credentials to domain admin that bypasses the Zerologon requirement, representing the structural redundancy that makes this environment difficult to harden with a single patch.

6.4 MITRE ATT&CK Integration and Adversary Emulation

The MITRE ATT&CK framework provides a complementary ontology to the graph model: where the graph encodes structural reachability, ATT&CK encodes behavioral signatures associated with each transition. Integrating ATT&CK technique identifiers (T-codes) as labels on exploit nodes transforms the attack graph into a bidirectional analysis tool: forward analysis traces reachability from a given entry point; backward analysis, given a set of observed T-codes from EDR telemetry, reconstructs the most probable attack graph explaining the observed behavior. The attack2neo tool [18] enables this integration by importing the full ATT&CK STIX dataset into Neo4j, enabling Cypher queries that jointly traverse structural attack paths and behavioral technique annotations. MITRE Caldera [19] takes this further by executing ATT&CK-mapped techniques in an emulation environment, producing empirical traversal data that can calibrate the transition probabilities in Equation (3) for specific network configurations. Notably, Caldera itself was assigned CVE-2025-27364 with a CVSS base score of 10.0 in February 2025, representing a maximum-severity RCE in the platform used to emulate adversaries, a recursive illustration of the attack surface problem [19].

7. The Commercial Landscape and the Unsolved Problems Worth Billions

7.1 Market Validation Through Acquisition and Investment

The attack surface management (ASM) and cloud-native application protection platform (CNAPP) markets have converged, over the past five years, on graph-based attack path analysis as the primary product differentiator. The clearest evidence of market validation was Google's acquisition of Wiz for \$32 billion in 2025 [20], a valuation that would have been implausible for a four-year-old security company under any prior market cycle. Wiz's core technical innovation was agentless cloud asset discovery via provider APIs, assembling the results into a multi-layer graph spanning infrastructure topology, identity and access, data classification, and vulnerability presence, and then running reachability analysis over this graph to produce ranked attack paths. The \$32 billion figure signals that the market has accepted graph-based reachability as the correct framework for cloud risk quantification, displacing the prior paradigm of CVSS score lists and compliance heat maps as the primary outputs of a security platform.

The competitive landscape now spans several distinct product categories, each implementing a different subset of the attack graph framework developed in Sections 2 through 5. The following table maps major vendors to their technical capabilities and data sources.

| Vendor | Graph Model | Probabilistic Scoring | Optimization | Key Data Source |
| --- | --- | --- | --- | --- |
| Wiz | Multi-layer cloud graph | Yes (risk scoring) | No | Cloud APIs, CSPM |
| XM Cyber | Attack graph simulation | Yes (exposure scoring) | Partial (heuristic) | Network scan + IAM |
| CrowdStrike Falcon EM | Endpoint + cloud graph | Yes (AI-assisted) | No | EDR telemetry + cloud |
| BloodHound Enterprise | AD + Azure AD graph | No | No | SharpHound + AzureHound |
| Orca Security | Agentless cloud graph | Yes (risk scoring) | No | Cloud APIs, no agents |
| Microsoft Defender XDR | Cloud + endpoint graph | Partial | No | Azure-native + Defender |

The absence of MDP-based adaptive optimization (Section 5.4) from every entry in this table reflects a gap between the formal research literature and production tooling. Every vendor listed uses static or near-static attack graph snapshots with heuristic scoring, not the dynamically adapting policy that MDP formulations would produce. CrowdStrike's 2025 Global Threat Report documented a 26% year-over-year increase in cloud intrusions and found that valid account abuse (traversal of IAM paths rather than software exploitation) accounted for 35% of cloud incidents in the first half of 2024 [21]. Valid account abuse is precisely the category of risk that identity-layer graph edges capture and that CVSS-based scoring is blind to, because no CVE is assigned to an overly permissive IAM policy.

7.2 The Scalability Problem Remains Unsolved

The scalability challenge is the most significant open problem in attack graph analysis, and it has resisted definitive solution since Sheyner et al. identified the state explosion problem in their 2002 paper [3]. A 2019 survey of attack graph analysis methods by Zeng [22] documented that attack graphs grow exponentially in the number of hosts and vulnerabilities, and that the number of attack paths grows exponentially with graph depth even in the polynomial-node logical attack graph representation. Enumerating all paths for Equation (5) is feasible only for shallow graphs (depth below roughly 8 to 10 hops), and real enterprise networks routinely have attack chains exceeding this depth when lateral movement through multiple network zones is considered. Angelini et al.'s 2025 progressive attack graph method [23] addresses depth-induced path explosion by constructing the graph incrementally, prioritizing portions of the attack surface most relevant to current threat intelligence, but this introduces a completeness trade-off: the progressive graph may miss low-probability, long-chain attacks that an APT actor operating over months would pursue.

The cloud-native environment introduces compounding complexity that classical attack graph models were not designed for: workloads are ephemeral (containers terminate in seconds), IAM policies change with every deployment pipeline execution, and the attack surface evolves continuously between the periodic scans that feed most graph construction pipelines. A continuous, incremental graph maintenance system (analogous to dynamic graph algorithms in theoretical computer science [24]) would allow real-time attack graph updates in response to cloud API events, but no production tool implements this today. The arXiv paper "It Is Time To Steer" [25], published in late 2023, argued that the field needs analysis-driven construction (build only the parts of the graph relevant to current threat intelligence) rather than exhaustive generation (build the full graph, then analyze), and this architectural shift is visible in products like Wiz and CrowdStrike that query cloud APIs on-demand rather than pre-computing comprehensive graphs.

7.3 LLM Integration: Promise and Failure Mode

Large language models are entering the attack graph construction pipeline in two roles that have different failure modes. The first role is automated vulnerability classification: given a CVE description from NVD, an LLM can extract the pre-conditions (what an attacker must already control) and post-conditions (what the exploit grants) required to express the CVE as a logical attack graph edge. This automates the most labor-intensive part of maintaining a current attack graph knowledge base, and preliminary results on LLM-aided attack path analysis using provenance graphs [14] show that LLMs can correctly classify attack technique pre-conditions with accuracy exceeding 85% on ATT&CK-labeled datasets. This accuracy level is sufficient for triage ranking but potentially insufficient for compliance-grade hardening decisions where false negatives (missed paths) are the critical failure mode, because an attack graph that looks complete but omits one class of exploit enables false assurance. The second role is natural language querying: rather than requiring analysts to write Cypher against a Neo4j graph, an LLM interface translates questions like "what is the shortest path from any internet-facing service to our payment database?" into the corresponding graph query and returns results in prose. This role is substantially lower risk because the LLM translates queries but does not perform the mathematical reachability computation; errors in query translation produce empty results or error messages rather than silent false assurances.

7.4 Regulatory and Compliance Implications

The intersection of attack graph analysis with regulatory frameworks is an emerging area whose implications practitioners have not fully internalized. The U.S. SEC cybersecurity disclosure rules (effective December 2023) require public companies to describe their cybersecurity risk management processes in annual 10-K filings and to disclose material incidents within four business days. Organizations that can present quantitative attack surface analysis, specifically compromise probability estimates from Equation (5) and remediation prioritization grounded in Equation (8), are in a substantially stronger position to support these disclosures than those relying on qualitative assessments. NIST CSF 2.0, released in February 2024, explicitly calls for continuous monitoring and risk-based prioritization as foundational practices, which are operationally equivalent to maintaining a current attack graph and prioritizing remediations by their path-disruption impact rather than standalone CVSS scores. The implication for product strategy is that attack graph capabilities are transitioning from a competitive differentiator available only to well-resourced enterprises to a compliance requirement that will pull adoption into mid-market segments previously priced out of enterprise ASM platforms.

7.5 What a Practitioner Should Do Differently

The formal framework developed in Sections 2 through 5 has direct operational implications that differ from the conventional vulnerability management workflow in ways that practitioners consistently underestimate. The conventional workflow prioritizes by CVSS base score, which is a property of individual CVEs considered in isolation, ignoring path context entirely. A CVSS 9.8 critical vulnerability on an air-gapped backup server is less operationally urgent than a CVSS 5.5 medium vulnerability positioned at the midpoint of the only path from an internet-facing web application to the core banking database. The graph-aware alternative computes betweenness centrality over the attack graph and ranks remediations by their expected path-disruption impact rather than by individual score. This approach has been the stated rationale for attack path-based prioritization since the Ingols ACSAC 2006 paper [5], but adoption has been blocked not by conceptual disagreement but by tooling and organizational inertia. With BloodHound Enterprise providing path analysis for AD environments, and Wiz, CrowdStrike, and Orca providing equivalent analysis for cloud environments, the tooling constraint has substantially diminished. The remaining constraint is organizational: security programs must be retrained to ask "does patching this vulnerability break an attack path to a crown-jewel asset?" rather than "is this CVE scored critical?", a cultural shift that product teams, security leaders, and board-level risk communication all need to drive together.

8. Conclusion and Open Problems

The mathematical theory of attack surfaces is mature in the sense that the core formalism (logical attack graphs, Equation (1)), the fundamental algorithms (shortest path, betweenness centrality from Equation (2), minimum vertex cut from Equation (7)), and the probabilistic extensions (path probability aggregation from Equations (4) and (5), Markov fundamental matrix from Equation (6)) are all well-understood and implementable today with standard open-source libraries. The theory is immature in the sense that the hardest practical problems remain unsolved, and three open problems deserve explicit statement.

Dynamic graph maintenance is the most pressing constraint for cloud environments. Static attack graphs built from point-in-time inventory snapshots begin drifting from reality as soon as the first workload starts or stops. A theory of attack graph maintenance under incremental updates (add node, delete edge, modify edge weight) analogous to dynamic graph algorithms in theoretical computer science [24] would enable continuous rather than periodic analysis. The progressive attack graph method of Angelini et al. [23] is a step in this direction, but it sacrifices completeness guarantees in ways that are not yet formally characterized, leaving practitioners unable to bound the false-negative rate for missed paths.

Multi-principal threat modeling is a second open problem with no satisfying solution in the current literature. Classical attack graph models assume a single attacker and a single defender. Real enterprise environments face multiple simultaneous threat actors with different capability levels, dwell-time tolerances, and objectives (nation-state persistence versus ransomware speed versus insider exfiltration). Defenses optimal against one adversary profile may be suboptimal or counterproductive against another. Multi-principal attack graphs, where each attacker has a distinct initial capability set and distinct target set, require running separate reachability analyses per principal and then optimizing defenses against the joint attack distribution under a shared budget constraint. This is a significantly harder problem than the single-attacker hitting set formulation of Equation (8), and it has received limited formal treatment outside of game-theoretic security models that impose unrealistic rationality assumptions on both parties.

Probabilistic calibration is the third open problem, and it may be the most epistemically important. The CVSS-derived transition probabilities in Equation (3) are a convenient proxy, not a validated predictor of actual exploit success rates in production environments. Calibration requires large-scale observational data on which exploits are attempted, which succeed, and under what network and configuration conditions; such data is difficult to collect systematically and sensitive to release. Without calibrated probabilities, the compromise estimates from Equation (5) and the expected path length from Equation (6) are not interpretable as absolute risk figures, only as relative rankings within the assumptions of the model. For the framework to support the quantitative risk communication that SEC disclosure rules and cyber insurance underwriting increasingly demand, empirical calibration studies comparable to those that exist for actuarial risk models are needed.

The commercial investment flowing into this problem space creates strong economic pressure to close these gaps. The mathematics is mostly done. The remaining work is data pipelines, calibration, and the organizational change management required to move security programs from list-based to graph-based reasoning.

References

[1] Trail of Bits, "Threat modeling the TRAIL of Bits way," blog.trailofbits.com/2025/02/28/threat-modeling-the-trail-of-bits-way/, February 2025.

[2] Google Project Zero and Google Threat Intelligence Group, "2025 Zero-Days in Review," Google Cloud Blog, cloud.google.com/blog/topics/threat-intelligence/2025-zero-day-review, 2025.

[3] O. Sheyner, J. Haines, S. Jha, R. Lippmann, and J. M. Wing, "Automated generation and analysis of attack graphs," in Proceedings of the 2002 IEEE Symposium on Security and Privacy (S&P'02), Oakland, CA, 2002, pp. 273-284.

[4] R. Lippmann and K. Ingols, "An annotated review of past papers on attack graphs," MIT Lincoln Laboratory Project Report PR-IA-1, March 2005.

[5] K. Ingols, R. Lippmann, and K. Piwowarski, "Practical attack graph generation for network defense," in Proceedings of the 22nd Annual Computer Security Applications Conference (ACSAC 2006), Miami Beach, FL, 2006, pp. 121-130.

[6] X. Ou, W. F. Boyer, and M. A. McQueen, "A scalable approach to attack graph generation," in Proceedings of the 13th ACM Conference on Computer and Communications Security (CCS 2006), Alexandria, VA, 2006, pp. 336-345.

[7] U. Brandes, "A faster algorithm for betweenness centrality," Journal of Mathematical Sociology, vol. 25, no. 2, pp. 163-177, 2001.

[8] U. Brandes, "On variants of shortest-path betweenness centrality and their generic computation," Social Networks, vol. 30, no. 2, pp. 136-145, 2008.

[9] A. Singhal and X. Ou, "Security risk analysis of enterprise networks using probabilistic attack graphs," NIST Interagency Report 7788, National Institute of Standards and Technology, 2011.

[10] J. G. Kemeny and J. L. Snell, Finite Markov Chains, Springer-Verlag, 1976.

[11] K. Menger, "Zur allgemeinen Kurventheorie," Fundamenta Mathematicae, vol. 10, pp. 96-115, 1927.

[12] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman, 1979.

[13] C. Leitner, F. Schuster, and M. Holz, "Towards enhanced threat modelling and analysis using a Markov Decision Process," Computers and Security, vol. 122, 2022.

[14] Anonymous, "A novel LLM-aided framework for host-based intrusion detection using provenance graphs," arXiv:2507.10873, 2025.

[15] SpecterOps, "BloodHound Community Edition," github.com/SpecterOps/BloodHound, 2024.

[16] Microsoft Security Response Center, "Windows Netlogon Remote Protocol elevation of privilege vulnerability (CVE-2020-1472)," msrc.microsoft.com, 2020.

[17] Microsoft Security Response Center, "Active Directory domain services elevation of privilege vulnerability (CVE-2021-42278 and CVE-2021-42287)," msrc.microsoft.com, 2021.

[18] vmapps, "attack2neo: Import MITRE ATT&CK into Neo4j," github.com/vmapps/attack2neo, 2023.

[19] MITRE Corporation, "MITRE Caldera adversary emulation platform," caldera.mitre.org; CVE-2025-27364, NVD, February 2025.

[20] Wiz, "Wiz to join Google Cloud," press release, wiz.io, 2025.

[21] CrowdStrike, "2025 Global Threat Report," crowdstrike.com, 2025.

[22] Y. Zeng, "Survey of attack graph analysis methods from data and knowledge processing perspective," Security and Communication Networks, 2019.

[23] M. Angelini et al., "Progressive attack graph: a technique for scalable and adaptive attack graph generation," International Journal of Information Security, Springer Nature, 2025.

[24] D. Eppstein, Z. Galil, and G. F. Italiano, "Dynamic graph algorithms," in Algorithms and Theory of Computation Handbook, CRC Press, 1999, ch. 8.

[25] Anonymous, "It is time to steer: a scalable framework for analysis-driven attack graph generation," arXiv:2312.16513, 2023.

