
Threat Modelling from First Principles: STRIDE, PASTA, and Beyond

"Threat modeling is the practice of identifying, communicating, and understanding threats and mitigations within the context of protecting something of value." — The Threat Modeling Manifesto, Adam Shostack et al., threatmodelingmanifesto.org, 2020

Abstract

Threat modelling sits at the intersection of formal reasoning, practical security engineering, and organisational process. Despite more than two decades of development since Kohnfelder and Garg published STRIDE at Microsoft in 1999, the field remains fragmented: practitioners choose between STRIDE's mnemonic taxonomy, PASTA's risk-centric seven stages, and a growing menagerie of specialised frameworks for cloud, privacy, and AI systems. This article builds threat modelling from first principles, deriving the mathematical foundations in attack trees and risk algebra before examining how STRIDE and PASTA operationalise those foundations into engineering practice. It evaluates the specific failure modes of each approach, introduces quantitative scoring via FAIR and CVSSv4, and surveys the current tooling landscape from OWASP Threat Dragon to LLM-assisted automation. The article closes with an analysis of the commercial consolidation reshaping this market, most visibly in ThreatModeler's \$100 million acquisition of IriusRisk in January 2026, and identifies the open problems that neither incumbents nor startups have yet solved.

1. Introduction

"If threat modeling takes three weeks to build and prioritize, it becomes useless because development teams have already finished multiple sprints by the time fixes are ready." — Security Compass, "Why Traditional Threat Modeling Fails and How to Get it Right," securitycompass.com, 2024

The collapse of Bybit in February 2025 was the largest cryptocurrency heist in recorded history, with attackers routing approximately \$1.5 billion in assets through a compromised multisig wallet infrastructure. Trail of Bits published a post-mortem within days, arguing that a competent threat model of Bybit's signing infrastructure would have flagged the trust boundary between the cold wallet interface and the signing ceremony as a high-severity attack surface [1]. The vulnerability was not cryptographically novel, nor did it require a zero-day exploit. It was precisely the kind of architectural assumption that STRIDE's Spoofing and Tampering categories exist to interrogate, and the fact that it went unmodelled is a data point about how organisations treat threat modelling in practice: as an audit artefact produced for compliance, not as a living design activity that shapes architecture decisions.

The timing problem captured in the Security Compass quote above is not incidental to how threat modelling is practised; it is the central structural tension in the discipline. Security frameworks that produce the most defensible outputs are frequently the ones that organisations abandon first, because they cannot be integrated into weekly sprint cycles without specialist headcount that most engineering organisations do not have. The State of Threat Modeling 2024-2025 report, a community-driven survey conducted through Threat Modeling Connect, found that 88% of respondents align with STRIDE as their primary methodology, while simultaneously reporting significant dissatisfaction with the scalability and maintainability of threat models over time [2]. The same organisations that endorse STRIDE as a methodology also report their models becoming outdated within roughly three months of deployment, yet few have a systematic process for triggering updates when architectural changes occur.

The field is currently in a phase transition, driven by three converging forces. First, regulatory mandates are converting threat modelling from a voluntary best practice into a compliance requirement across federal contracting, EU financial services, and medical device development. Second, the proliferation of agentic AI systems has created a class of architectures that existing DFD-based frameworks were not designed to represent, prompting new frameworks like MAESTRO from the Cloud Security Alliance [3]. Third, LLM-assisted tooling has lowered the barrier to producing an initial threat model to near zero, which changes the economics of threat modelling practice even as it raises new questions about output quality and reliability. Understanding where threat modelling came from, what the frameworks actually compute, and where they fail is prerequisite knowledge for any practitioner navigating this transition.

This article is organised as follows. Section 2 establishes the conceptual foundations: what a threat is, what the asset-threat-control triad means formally, and why the choice of threat ontology determines the quality of everything downstream. Section 3 develops the mathematical treatment, covering attack trees and their Boolean algebra, the FAIR risk quantification model, CVSSv4 scoring, and information-theoretic threat entropy. Section 4 dissects STRIDE in depth, covering its original 1999 formulation, Shostack's Four Question refinement, and the methodology's specific failure modes. Section 5 addresses PASTA and the broader class of risk-centric and domain-specialised frameworks including LINDDUN and MAESTRO. Section 6 surveys tooling and includes a fully functional Python implementation of a STRIDE analysis pipeline. Section 7 analyses the commercial landscape and the strategic implications for security product leaders. Section 8 closes with open problems.

2. First Principles: Threat, Asset, and Control

Every threat model is an answer to the question "what can go wrong with this system?", but the precision of that answer depends entirely on how the question is decomposed. Before reaching for a framework mnemonic, a practitioner must be clear about three foundational concepts: what constitutes an asset, what constitutes a threat, and what constitutes a control. These three concepts form the primitive vocabulary from which every threat modelling methodology derives its structure, and conflating them is the root cause of most threat models that generate noise rather than actionable signal.

An asset is anything of value to a stakeholder that can be harmed. Assets include data (credentials, personally identifiable information, intellectual property, health records), system state (session tokens, configuration files, database contents), capabilities (the ability to authenticate, to write to storage, to initiate a payment transaction), and operational properties (availability, integrity of service, correctness of computation). The scope of the asset inventory determines the scope of the threat model: a model that does not identify an asset cannot generate threats against it. This observation sounds obvious, yet the most common failure mode in enterprise threat modelling is an incomplete asset inventory, typically because engineers list only the primary data stores and fail to recognise that secrets embedded in infrastructure-as-code, intermediate compute states held in message queues, and service-to-service trust tokens in a service mesh are equally attackable and often more poorly protected.

A threat is a potential event, caused by a threat agent acting with some capability against a vulnerability, that results in a negative impact on an asset. The formal decomposition matters precisely because the components vary independently: a threat is not the same as a vulnerability, and it is not the same as an attack. A vulnerability is a property of the system at a given point in time; an attack is a historical event; a threat is a forward-looking combination of a vulnerability, a threat agent who plausibly possesses the capability to exploit it, and a plausible action sequence that leads to harm. STRIDE packages this decomposition into six categories indexed by the security property they violate, but the underlying structure is always a five-tuple: (Agent, Capability, Vulnerability, Asset, Impact). A threat model that records vulnerabilities without characterising the threat agent is incomplete because it cannot support prioritisation; the severity of a SQL injection in an internet-facing payment application is categorically different from the same flaw in an air-gapped internal reporting tool, not because the vulnerability has changed, but because the realistic threat agent population differs by orders of magnitude.
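This five-tuple can be made concrete as a record type. The following sketch (in Python, matching the pipeline in Section 6.2) is illustrative only: the field names and example values are assumptions for exposition, not part of any framework specification. It encodes the SQL injection comparison above as two distinct threats sharing a vulnerability class but differing in agent population.

from dataclasses import dataclass


@dataclass(frozen=True)
class ThreatTuple:
    """The (Agent, Capability, Vulnerability, Asset, Impact) decomposition."""
    agent: str          # who plausibly attacks
    capability: str     # what the agent can bring to bear
    vulnerability: str  # the exploitable system property
    asset: str          # what is harmed
    impact: str         # the resulting negative outcome


# Same vulnerability class, categorically different threats, because the
# realistic agent population differs by orders of magnitude.
internet_facing = ThreatTuple(
    agent="internet-scale criminal actors",
    capability="automated SQL injection tooling",
    vulnerability="unparameterised query in payment API",
    asset="cardholder data",
    impact="bulk exfiltration and card fraud",
)
air_gapped = ThreatTuple(
    agent="malicious insider with workstation access",
    capability="manual SQL access",
    vulnerability="unparameterised query in reporting tool",
    asset="internal reporting data",
    impact="record tampering",
)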

A control is any mechanism that reduces the likelihood or impact of a threat materialising. Controls are classified along three axes that define their role in the security architecture: preventive controls prevent the threat from occurring at all (input validation, authentication enforcement, network segmentation), detective controls identify when a threat has materialised or is in progress (anomaly detection, audit logging, integrity monitoring), and corrective controls restore system state or limit damage after a threat has materialised (incident response procedures, backup restoration, revocation of compromised credentials). A complete threat model maps identified threats to controls and assesses residual risk after control application. The gap between identified threats and implemented controls is the residual attack surface, and managing that gap over time is the operational meaning of a security programme; STRIDE's per-element analysis generates threat lists, and the control mapping that converts those lists into actionable engineering work is the step that most STRIDE implementations handle poorly.

The relationship between assets, threats, and controls can be expressed as a directed bipartite graph $G = (A \cup T, E)$, where $A$ is the set of assets, $T$ is the set of threats, and $E \subseteq A \times T$ indicates that a threat can impact an asset. The set of controls $C$ acts on this graph via a relation $R \subseteq C \times E$, where each control reduces the effective weight of one or more threat-asset edges. Security posture is then a function of the residual edge weights after control application, which is the formal basis for risk quantification frameworks like FAIR. This representation also makes explicit why adding assets to a system increases the threat surface super-linearly when new assets share vulnerabilities with existing assets: each new asset-threat edge potentially activates all existing threat nodes, and the interconnection between assets in a microservices architecture means that a breach of one asset frequently grants capabilities that threaten adjacent assets in the graph.
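A minimal executable rendering of this graph model, under the simplifying assumption that each control multiplies the weight of the edges it acts on by a scalar effectiveness factor (FAIR-style treatments would use probability distributions rather than scalars); all names and weights below are invented:

# Asset-threat edges E with weights in [0, 1] before control application.
edges: dict[tuple[str, str], float] = {
    ("session_tokens", "token_theft"): 0.6,
    ("orders_db", "sql_injection"): 0.5,
    ("orders_db", "backup_exposure"): 0.3,
}

# Relation R: each control maps to the edges it mitigates, with a residual
# factor (0.1 means the control removes 90% of the effective edge weight).
controls: dict[str, tuple[list[tuple[str, str]], float]] = {
    "parameterised_queries": ([("orders_db", "sql_injection")], 0.1),
    "backup_encryption": ([("orders_db", "backup_exposure")], 0.2),
}


def residual_weights(
    edges: dict[tuple[str, str], float],
    controls: dict[str, tuple[list[tuple[str, str]], float]],
) -> dict[tuple[str, str], float]:
    """Apply each control to its edges; what remains is residual attack surface."""
    residual = dict(edges)
    for targets, factor in controls.values():
        for edge in targets:
            if edge in residual:
                residual[edge] *= factor
    return residual


print(residual_weights(edges, controls))
# ("session_tokens", "token_theft") keeps weight 0.6: an uncontrolled edge.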

3. The Mathematics of Threat Modelling

3.1 Attack Trees

Bruce Schneier's 1999 paper "Attack Trees: Modeling Security Threats" in Dr. Dobb's Journal introduced a formalism that remains foundational in both academic security research and practical red team planning [4]. An attack tree is a rooted directed tree where the root node represents the attacker's ultimate goal and each internal node represents a sub-goal necessary or sufficient for the parent goal. Leaf nodes represent atomic attack steps that require no further decomposition into security-relevant sub-actions. Internal nodes carry a logical type: OR nodes succeed if any single child succeeds, representing the attacker's ability to choose among alternative paths, and AND nodes succeed only when all children succeed, representing multi-stage attacks where every step in a sequence must complete.

Formally, let the attack tree be a tuple $\mathcal{T} = (V, E, \text{type}, \text{cost})$ where $V$ is the set of nodes, $E \subseteq V \times V$ is the edge relation encoding parent-child relationships, $\text{type}: V \to \{\text{AND}, \text{OR}, \text{LEAF}\}$ assigns a logic type to each node, and $\text{cost}: V_L \to \mathbb{R}_{\geq 0}$ assigns an attacker effort cost to each leaf node $v \in V_L \subset V$. The minimum attack cost $\mathcal{C}(v)$ required to achieve a non-leaf node $v$ is:

$$\mathcal{C}(v) = \begin{cases} \text{cost}(v) & \text{if } v \in V_L \\ \displaystyle\min_{c \in \text{children}(v)} \mathcal{C}(c) & \text{if } \text{type}(v) = \text{OR} \\ \displaystyle\sum_{c \in \text{children}(v)} \mathcal{C}(c) & \text{if } \text{type}(v) = \text{AND} \end{cases} \tag{1}$$

The minimum cost to achieve the root attack goal is $\mathcal{C}(\text{root})$. This formulation gives security teams a tractable optimisation target: if $\mathcal{C}(\text{root})$ is below the attacker's estimated operational budget (a figure that threat intelligence can inform for known adversary categories), controls must be added or strengthened until the minimum cost exceeds that budget. The OR-node minimum identifies the weakest link in a set of defences; the AND-node summation models multi-step attacks where defenders can break the chain at any single step, which is the formal basis for the defence-in-depth principle.

Attack trees can be extended to assign success probabilities to leaf nodes, converting the structural description into a quantitative model. Let $p: V_L \to [0,1]$ be the probability that a leaf attack step succeeds given an attempt. The success probability of the overall attack propagates upward through the tree as:

$$P(v) = \begin{cases} p(v) & \text{if } v \in V_L \\ 1 - \displaystyle\prod_{c \in \text{children}(v)} (1 - P(c)) & \text{if } \text{type}(v) = \text{OR} \\ \displaystyle\prod_{c \in \text{children}(v)} P(c) & \text{if } \text{type}(v) = \text{AND} \end{cases} \tag{2}$$

Defenders can compute the expected cost for an attacker to achieve the root goal with probability above some threshold $\tau$, and design controls to raise that cost above the expected attacker budget. The limitation is that leaf probabilities are difficult to estimate reliably in practice: practitioners typically rely on expert elicitation, historical incident data, or CVSSv4 exploitability scores as proxies, all of which carry significant uncertainty. Attack graphs, a generalisation of attack trees where the structure is a DAG rather than a tree (allowing sub-goals to be shared across multiple attack paths, as happens in privilege escalation chains), extend this formalism to more complex system architectures at the cost of increased computational complexity for minimum-cost-path calculations.
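Both recurrences translate directly into a short recursive implementation. The sketch below is a minimal rendering of Equations (1) and (2); the tree shape and the leaf costs and probabilities are invented for illustration, not derived from any real incident.

import math
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                 # "AND", "OR", or "LEAF"
    cost: float = 0.0         # attacker effort (leaves only), per Equation (1)
    p: float = 0.0            # success probability (leaves only), per Equation (2)
    children: list["Node"] = field(default_factory=list)


def min_cost(v: Node) -> float:
    """Equation (1): weakest link under OR, summed effort under AND."""
    if v.kind == "LEAF":
        return v.cost
    costs = [min_cost(c) for c in v.children]
    return min(costs) if v.kind == "OR" else sum(costs)


def success_prob(v: Node) -> float:
    """Equation (2): noisy-OR over alternatives, product over chains."""
    if v.kind == "LEAF":
        return v.p
    probs = [success_prob(c) for c in v.children]
    if v.kind == "AND":
        return math.prod(probs)
    return 1.0 - math.prod(1.0 - q for q in probs)


# Root goal: compromise a signing ceremony either by phishing an operator
# AND tampering with the displayed payload, OR by a direct UI exploit.
root = Node("OR", children=[
    Node("AND", children=[
        Node("LEAF", cost=5_000, p=0.3),    # phish one signing operator
        Node("LEAF", cost=20_000, p=0.5),   # tamper with displayed payload
    ]),
    Node("LEAF", cost=80_000, p=0.1),       # direct exploit of wallet UI
])

print(min_cost(root))                  # 25000: the cheaper AND branch
print(round(success_prob(root), 3))    # 0.235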

3.2 FAIR Risk Quantification

The FAIR model (Factor Analysis of Information Risk), standardised through The Open Group [5], provides a quantitative risk framework that addresses the fundamental inadequacy of ordinal risk matrices. Ordinal matrices (Low/Medium/High or 1-to-5 scales) produce outputs that cannot be meaningfully aggregated across risks, compared to insurance costs, or used to compute return on security investment. FAIR decomposes risk into two primary factors: Loss Event Frequency (LEF) and Loss Event Magnitude (LEM). The fundamental identity is:

$$\text{Risk} = \text{LEF} \times \text{LEM} \tag{3}$$

Loss Event Frequency is itself a function of Threat Event Frequency (TEF), representing how often a threat agent attempts to exploit the system, and Vulnerability ($V$), representing the conditional probability that an attempt results in a loss event given the system's current control posture:

$$\text{LEF} = \text{TEF} \times V \tag{4}$$

Loss Event Magnitude breaks down into Primary Loss Magnitude (direct losses: productivity impact, incident response costs, asset replacement or reconstruction, fines paid directly) and Secondary Loss Magnitude (consequential losses: legal liability, regulatory penalties, reputational damage affecting future revenue). In practice, FAIR analysts represent each factor as a PERT distribution parameterised by minimum, most-likely, and maximum values, and then propagate uncertainty across all factors using Monte Carlo simulation with typically 100,000 iterations. The output is a probability distribution over annualised loss expectation (ALE), which is qualitatively different from and more honest than any point-estimate risk score: it shows not just the expected loss but the full range of outcomes and their likelihoods, allowing leadership to make explicit decisions about the tail risk they are willing to accept.
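A compact Monte Carlo sketch of this propagation, using the standard Beta reparameterisation of the modified PERT distribution; all parameter values below are invented for illustration, not calibrated estimates:

import numpy as np

rng = np.random.default_rng(7)
N = 100_000  # iteration count typical in FAIR practice, per the text


def pert(low: float, mode: float, high: float, lam: float = 4.0) -> np.ndarray:
    """Sample a modified PERT distribution via its Beta reparameterisation."""
    a = 1 + lam * (mode - low) / (high - low)
    b = 1 + lam * (high - mode) / (high - low)
    return low + rng.beta(a, b, N) * (high - low)


# Equation (4): LEF = TEF * V, then Equation (3): Risk = LEF * LEM.
tef = pert(2, 12, 50)                      # threat events per year
vuln = pert(0.01, 0.05, 0.25)              # P(loss event | attempt)
lem = pert(50_000, 400_000, 5_000_000)     # loss magnitude per event, USD

ale = tef * vuln * lem                     # annualised loss, one value per trial

print(f"mean ALE:   ${ale.mean():,.0f}")
print(f"median ALE: ${np.median(ale):,.0f}")
print(f"95th pct:   ${np.percentile(ale, 95):,.0f}")  # the tail-risk decision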

3.3 CVSSv4 Scoring and Its Relationship to Threat Modelling

CVSSv4, released by FIRST in late 2023 [6], introduced a more granular scoring architecture that addresses several known deficiencies of CVSSv3, including the underspecification of privilege scope changes, the conflation of environmental and base metrics, and the binary treatment of user interaction. The environmental score uses a modified impact calculation that adjusts for deployment-specific control effectiveness. The Modified Impact Sub-Score (ISCModified) for confidentiality, integrity, and availability is:

$$\text{ISCModified} = \min\!\left(1 - (1 - M_{C}) \cdot (1 - M_{I}) \cdot (1 - M_{A}),\; 0.915\right) \tag{5}$$

where $M_C$, $M_I$, and $M_A$ are the modified (environmentally adjusted) impact scores for Confidentiality, Integrity, and Availability respectively. The cap at 0.915 prevents the combined score from reaching 1.0, reflecting the empirical observation that complete loss across all three security properties simultaneously is practically rare. The critical architectural point that FIRST's documentation emphasises is that CVSS scores are not risk scores: they measure the intrinsic severity of a vulnerability in isolation from deployment context. A CVSSv4 base score of 9.8 for a vulnerability in an isolated internal system represents lower organisational risk than a base score of 7.0 for a vulnerability in an internet-facing authentication service. Threat modelling provides the deployment context that CVSS cannot supply, and the two tools are complements rather than substitutes.
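Equation (5) reduces to a one-line function; the example below shows the effect of the 0.915 cap (input values are arbitrary):

def modified_impact(mc: float, mi: float, ma: float) -> float:
    """Equation (5): combined modified impact, capped at 0.915."""
    return min(1 - (1 - mc) * (1 - mi) * (1 - ma), 0.915)


print(round(modified_impact(0.56, 0.56, 0.56), 6))  # 0.914816: just under the cap
print(modified_impact(1.0, 1.0, 1.0))               # 0.915: total loss still capped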

3.4 Threat Entropy and Defensive Prioritisation

A useful but under-utilised metric in threat modelling practice is the entropy of the threat space: a measure of how much uncertainty exists about which threats will materialise in a given period. For a system with $n$ identified threats, each with estimated probability $p_i$ of materialising within the measurement window (typically one year), the threat entropy is:

$$H = -\sum_{i=1}^{n} p_i \log_2 p_i \tag{6}$$

High threat entropy indicates that risk is spread broadly across many plausible threat vectors, meaning the defender cannot afford to specialise or concentrate controls on a small attack surface. Low threat entropy indicates that risk is concentrated in a small number of high-probability threats, making targeted control deployment efficient and measurable. Tracking $H$ over time as a system evolves gives product security teams a scalar metric for architectural risk concentration that complements qualitative threat inventories. This metric is particularly valuable during major architectural transitions (monolith to microservices, on-premise to cloud, stateless to stateful AI agents) when the threat distribution changes rapidly and the engineering organisation needs a signal that the threat model requires attention, not a full re-audit.
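Equation (6) in code, with two invented threat distributions illustrating the concentrated and diffuse regimes:

import math


def threat_entropy(probs: list[float]) -> float:
    """Equation (6): entropy over per-threat probabilities, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Concentrated risk: one dominant threat, low entropy, specialise controls.
print(round(threat_entropy([0.9, 0.05, 0.03, 0.02]), 3))  # 0.618
# Diffuse risk: ten equally plausible threats, high entropy, defend broadly.
print(round(threat_entropy([0.1] * 10), 3))               # 3.322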

4. STRIDE: Engineering the Mnemonic into a Discipline

4.1 Origins and Conceptual Structure

STRIDE was created by Loren Kohnfelder and Praerit Garg at Microsoft in 1999, documented in an internal paper titled "The Threats to Our Products" [7]. The paper is considered the first definitive description of threat modelling as an engineering practice: it converted security analysis from an art practised by specialists into a structured process that developers could apply systematically. The framework categorises threats against a system element according to six types: Spoofing (violating authentication by impersonating a legitimate identity), Tampering (violating integrity by modifying data or code), Repudiation (violating non-repudiation by enabling an actor to deny performing an action), Information Disclosure (violating confidentiality by exposing data to unauthorised parties), Denial of Service (violating availability by preventing legitimate access), and Elevation of Privilege (violating authorisation by gaining capabilities beyond those intended). Each category is thus paired with the security property it violates: Spoofing attacks authentication, Tampering attacks integrity, Repudiation attacks non-repudiation, and so on through the full set.

This bidirectional mapping between threat categories and security properties is the source of STRIDE's analytic power. Instead of asking abstractly "what could go wrong with this system?", which invites incomplete and idiosyncratic answers, a practitioner asks systematically "how could authentication fail on this component?", "how could integrity be violated on this data flow?", and so on through all six categories for every element. The mnemonic structure ensures that the question set is fixed and complete across all six security properties, meaning that a practitioner who applies STRIDE to every element of a complete Data Flow Diagram will not miss a threat category on any element. The limitation, discussed in Section 4.3, is that completeness at the category level is not completeness at the instance level: STRIDE identifies the right questions but does not supply the domain knowledge needed to answer them for any specific system configuration.

Adam Shostack, while leading threat modelling at Microsoft through the development of the Security Development Lifecycle, introduced the Four Question Framework as a more practitioner-accessible meta-structure for threat modelling [8]. The four questions are: "What are we working on?" (establishing scope and system representation), "What can go wrong?" (generating threats, typically via STRIDE), "What are we going to do about it?" (identifying and prioritising controls), and "Did we do a good job?" (validating completeness and coverage). Shostack's key contribution was elevating the first question, the system representation question, to first-class status. In his formulation, the quality of the threat model is bounded above by the quality of the system representation: any time spent applying STRIDE to an incorrect or incomplete DFD produces threats that are irrelevant or misses threats that are real, and this reframing shifts attention from the mechanics of STRIDE categorisation to the harder problem of producing an accurate model of the system under analysis.

The operational procedure for STRIDE analysis begins with a Data Flow Diagram (DFD) using a specific four-element notation: processes (circles or rounded rectangles), representing active components that transform or route data; data stores (parallel lines or cylinders), representing passive storage; external entities (rectangles), representing actors or systems outside the control boundary; and data flows (directed arrows), representing data in transit between elements. Trust boundaries are drawn as dotted lines or dashed rectangles to delineate zones of different privilege, trust level, or administrative control. Each element type has a characteristic threat profile derived from the STRIDE categories: processes are susceptible to all six STRIDE threats; data stores are typically vulnerable to Tampering, Information Disclosure, and Denial of Service; data flows are vulnerable to Spoofing, Tampering, and Information Disclosure; external entities are primarily subject to Spoofing and Repudiation. This per-element-type applicability matrix converts the abstract STRIDE taxonomy into a concrete checklist that can be applied mechanically, and it is the foundation of the STRIDE-per-element (STRIDE-PE) variant that is now the dominant usage pattern.

4.2 STRIDE per Element: A Worked Example

Consider a web application with a typical e-commerce architecture: a browser (external entity) communicates over HTTPS with an API gateway (process), which forwards requests to an authentication service (process) and an order service (process); the order service reads from a relational database (data store) and writes events to a message queue (data store); all server-side components are within a single trust boundary, separated from the browser by the public internet trust boundary. Applying STRIDE per element to this six-component system with their connecting data flows produces an initial threat inventory that already surfaces non-obvious issues. The data flow from browser to API gateway, crossing the public-internet trust boundary, is subject to Spoofing (can an attacker forge a valid session cookie or JWT?), Tampering (is request body integrity validated end-to-end, or only at the TLS termination point?), and Information Disclosure (is TLS enforced with HSTS, or can a misconfigured redirect expose the initial request over plaintext?).

The authentication service process carries all six STRIDE threats. A malicious or malformed request can cause it to Spoof the identity of a legitimate user through session fixation or token confusion attacks; to Tamper with downstream service state by injecting crafted authorisation claims into internal JWT payloads; to enable Repudiation if authentication events are not written to an immutable audit log with sufficient fidelity to reconstruct the session; to disclose information through verbose error responses that reveal whether a username exists in the system; to become unavailable through resource exhaustion induced by computationally expensive authentication operations (repeated requests that force expensive bcrypt hashing, for example); and to execute privileged operations through injection attacks that bypass authorisation checks. The order service database data store is vulnerable to Tampering (SQL injection or ORM misconfiguration modifying or deleting records), Information Disclosure (unauthorised SELECT access through broken access control or connection string exposure), and Denial of Service (table lock contention, storage exhaustion, or query complexity attacks). The message queue data store introduces Tampering threats specific to event streaming (an attacker who can write to the queue can inject fraudulent order events) and Denial of Service threats specific to message broker architecture (queue depth exhaustion blocking legitimate event processing).

This mechanical enumeration produces a comprehensive initial threat list for a six-component system without requiring deep expertise in any specific technology. The practitioner does not need to know the specific ORM in use to generate the Tampering threat on the database; they need to know that data stores are subject to Tampering. The follow-on work, identifying whether the specific controls in place (parameterised queries, input validation, row-level security) are sufficient for each identified threat, does require technology-specific knowledge. STRIDE's role is to ensure the question is asked for every element; the security engineer's role is to supply the technology-specific reasoning needed to answer it. This division of labour is what makes STRIDE trainable and scalable to engineering teams who are not security specialists.

4.3 Weaknesses and Structural Failure Modes

STRIDE's most significant structural weakness is its dependence on the DFD as a system representation, and more specifically, its assumption that a DFD produced at design time accurately represents the system as it executes in production. DFDs are design-time artefacts that represent the system as its architects intended it; in production, data flows are created dynamically through callbacks, webhooks, and event-driven messaging, external entities change as third-party integrations are added, and trust boundaries shift with infrastructure changes as teams adopt service meshes, move to serverless compute, or adopt external identity providers. A DFD produced during an initial design review and not updated through five subsequent sprints becomes a liability rather than an asset: it generates false confidence that the threat model is current when the actual threat surface has changed substantially.

"The 'as-designed DFD' documents author assumptions rather than actual system behaviour; attackers can exploit entry points not represented in the DFD or force new data flows by calling different APIs." — Security Compass, "Why Traditional Threat Modeling Fails and How to Get it Right," 2024 [9]

STRIDE also provides no native prioritisation mechanism. A STRIDE-PE analysis of a moderately complex system will typically produce fifty to one hundred identified threats across all elements, all of equal apparent weight before additional scoring is applied. Converting that list into an ordered backlog of security controls requires a separate scoring step, typically using DREAD (Damage, Reproducibility, Exploitability, Affected users, Discoverability), FAIR's quantitative model, or CVSSv4-style severity scores applied to each threat. The absence of built-in prioritisation is a significant practical barrier: a development team presented with ninety equal-priority threats will address the easiest ones first (which are rarely the most important) or address none of them under schedule pressure. Frameworks that integrate risk scoring directly into the threat generation step (PASTA) address this limitation, at the cost of substantially greater process complexity.
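DREAD is conventionally computed as the arithmetic mean of five 0-to-10 component ratings. The sketch below applies that convention to two findings from the worked example in Section 4.2; the component ratings are invented for illustration:

def dread_score(damage: int, reproducibility: int, exploitability: int,
                affected_users: int, discoverability: int) -> float:
    """Conventional DREAD: arithmetic mean of five 0-10 ratings."""
    return (damage + reproducibility + exploitability
            + affected_users + discoverability) / 5


sqli_orders_db = dread_score(9, 8, 7, 9, 6)   # 7.8: top of the backlog
verbose_errors = dread_score(3, 9, 8, 4, 7)   # 6.2: lower priority
print(sqli_orders_db, verbose_errors)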

The third structural weakness is STRIDE's inadequacy for distributed systems and agentic architectures. STRIDE was designed around applications and their data flows at a time when the dominant architecture was a three-tier web application. Microservices architectures, where hundreds of services communicate through a mix of synchronous REST calls and asynchronous message queues, produce DFDs too complex to analyse element by element without automated tooling, and the trust boundary structure is different in kind (the service mesh layer, not the network perimeter, is the control point). Agentic AI systems, which create data flows dynamically at runtime based on model decisions, cannot be represented in a static DFD at all: the set of external entities the agent will interact with and the data it will transmit depend on inputs that are not known at design time. STRIDE's DFD dependency makes it a poor primary tool for these architecture classes, which is why specialised frameworks have emerged to address them.

5. PASTA, LINDDUN, and the Modern Framework Landscape

5.1 PASTA: Risk-Centric Threat Modelling

PASTA (Process for Attack Simulation and Threat Analysis) is a seven-stage, risk-centric methodology developed by Tony UcedaVelez and Marco Morana, published in the book "Risk Centric Threat Modeling" [10]. Where STRIDE generates a list of threats against system elements without explicit grounding in business consequences, PASTA grounds the entire analysis in business risk, starting with organisational objectives and working downward through technical scope, architecture, threat intelligence, vulnerability analysis, and attack simulation to a final risk quantification. The seven stages proceed as: (1) Definition of Objectives, establishing what the business cannot afford to lose; (2) Definition of Technical Scope, bounding the system components subject to analysis; (3) Application Decomposition, producing a DFD and identifying assets; (4) Threat Analysis, consuming threat intelligence to identify relevant threat agents; (5) Vulnerability and Weakness Analysis, mapping known vulnerabilities to the decomposed architecture; (6) Attack Enumeration and Modelling, simulating concrete attack chains; and (7) Risk and Impact Analysis, quantifying residual risk in business terms.

The risk-centric framing distinguishes PASTA from STRIDE in a fundamental way that is most visible at Stage 1. In PASTA, the first question is "what are the business consequences of a breach of this system?", and the answer drives the scope of everything that follows. A payment processing platform and an internal developer productivity tool may have comparable architectural complexity and similar vulnerability profiles, but their business risk profiles are qualitatively different: a breach of the payment processor may trigger PCI-DSS penalties from card networks, customer churn, and regulatory sanctions that threaten the business's ability to operate, while a breach of the developer tool may trigger notification obligations and reputational damage of a lower order of magnitude. PASTA's first stage forces this risk stratification to happen before system decomposition begins, ensuring that analysis effort is proportional to potential business impact rather than architectural complexity.

Stage 6 of PASTA, Attack Enumeration and Modelling, introduces a technique that STRIDE lacks: the construction of concrete attack chains rather than categorical threat enumeration. This stage uses threat intelligence sources (OWASP's attack pattern catalogue, MITRE ATT&CK, organisation-specific incident data, and commercial threat intelligence feeds) to build attack trees in the sense of Section 3.1, where each tree represents a realistic end-to-end compromise path anchored in the specific vulnerabilities identified in Stage 5. The output is not a list of abstract threat categories but a set of attack scenarios, each annotated with the attacker effort cost computed from Equation (1) and the business risk score computed in Stage 7. This makes PASTA outputs more directly actionable for remediation prioritisation: each item in the output is tied to a specific attack chain, a realistic threat agent profile, and a quantified business risk value, rather than a category-level observation that the system has "information disclosure vulnerabilities."

PASTA's primary weakness in deployment is its process complexity. Conducting a full seven-stage PASTA analysis of a moderately complex system is typically a multi-day effort requiring participation from security architects, application owners, business stakeholders, and legal or compliance representatives across structured workshops at each stage. This overhead is justifiable and routine in financial services organisations (where PASTA is widely used), healthcare systems (where regulatory requirements mandate detailed risk analysis), and critical infrastructure (where security incidents have physical consequences). For startups, agile product teams, and organisations performing frequent incremental updates to existing threat models, the full PASTA process is impractical, and practitioners typically use PASTA selectively for periodic deep audits of high-value systems while relying on STRIDE for continuous sprint-integrated analysis. Understanding PASTA as a complementary deep-audit methodology rather than a STRIDE replacement resolves most of the apparent tension between the two frameworks in the practitioner literature.

5.2 LINDDUN: Privacy Threat Modelling

LINDDUN (Linkability, Identifiability, Non-repudiation, Detectability, Disclosure of information, Unawareness, Non-compliance) is a privacy-focused threat modelling methodology developed by Kim Wuyts, Riccardo Scandariato, and Wouter Joosen at KU Leuven [11]. It applies the same DFD-based, per-element analysis that STRIDE uses, but replaces the security-focused STRIDE categories with seven privacy-specific threat categories aligned with data protection principles. LINDDUN is not a replacement for STRIDE and is not primarily concerned with whether an attacker can exfiltrate data; it is concerned with whether a data processor (including the system operator itself) can use the system's data in ways that violate data subjects' privacy expectations, which is a distinct class of harm. The Linkability category, for example, captures threats that STRIDE does not cover: the ability to correlate records across datasets or time periods to infer information about an individual that they did not intend to disclose, even when no unauthorised access occurs. This is the technical basis for re-identification attacks on anonymised health records and behavioural profiling in advertising systems.

The practical deployment of LINDDUN alongside STRIDE gives security teams complete coverage of both security and privacy threat categories on a single DFD, with explicit separation of concerns between the two analyses. Organisations subject to the EU General Data Protection Regulation, the California Consumer Privacy Act, or similar data protection regulations are increasingly required to conduct Data Protection Impact Assessments (DPIAs) before deploying systems that process personal data at scale; LINDDUN provides the structured methodology that transforms a DPIA from a compliance document into a genuine privacy threat analysis with actionable findings. The limitation is the same as STRIDE's: the DFD representation assumption is restrictive, and LINDDUN inherits STRIDE's inadequacy for dynamic and agentic architectures.

5.3 MAESTRO: Threat Modelling for Agentic AI

The Cloud Security Alliance published the MAESTRO framework in February 2025 specifically to address threat modelling challenges posed by agentic AI systems [3]. Traditional frameworks assume a system where data flows are defined statically in an architecture specification, components have fixed privilege levels established at deployment time, and human operators make all consequential decisions. Agentic AI systems violate all three assumptions simultaneously: they create data flows dynamically through runtime tool invocations, can accumulate capabilities beyond those initially provisioned through multi-turn interactions or prompt injection, and execute actions with real-world consequences at machine speed without synchronous human oversight.

"AI introduces new challenges that are unique to itself, such as hallucination and non-deterministic nature of its output, making threat modeling approaches that assume deterministic data flows insufficient." — Cloud Security Alliance, MAESTRO Framework, February 2025 [3]

MAESTRO models an agentic AI system as a seven-layer stack, from the AI model core (the underlying LLM with its training data and inference behaviour) through agent frameworks (orchestration layers like LangGraph or CrewAI), agent communication protocols (MCP, A2A), data and storage layers (vector databases, memory systems), deployment infrastructure (container orchestration, API gateways), and integration layers (external API connections, human oversight interfaces). Each layer has a distinct threat taxonomy reflecting the security properties that layer is responsible for maintaining. The AI model core is subject to prompt injection (an attacker-controlled input that redirects the model's action policy), training data poisoning (long-term compromise of model behaviour by contaminating training data), and hallucination-induced security failures (the model confidently reports false security-relevant information). The practical implication of MAESTRO is that the DFD must be replaced by a capability graph: a representation of what tools, APIs, and external systems an agent can invoke, under what conditions, and with what scope of authority. This capability-graph representation is harder to derive from code inspection than a DFD and requires ongoing maintenance as the agent's tool configuration evolves.
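MAESTRO does not prescribe a schema for the capability graph, so the following sketch is a hypothetical rendering with invented field names and example capabilities; it shows only the shape of the record that a per-edge threat analysis would interrogate:

from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    """One edge in an agent's capability graph (field names are illustrative)."""
    tool: str         # tool, API, or external system the agent may invoke
    condition: str    # guard under which invocation is permitted
    authority: str    # scope of authority granted when invoked


AGENT_CAPABILITIES = [
    Capability("search_orders", "any authenticated session", "read-only"),
    Capability("issue_refund", "human approval above $500", "write, capped"),
    Capability("send_email", "rate-limited to 10/hour", "external side effect"),
]

# Per edge, the analysis asks: can attacker-controlled input (e.g. prompt
# injection) satisfy the condition, and what harm does the authority permit?
for cap in AGENT_CAPABILITIES:
    print(f"{cap.tool}: guard='{cap.condition}', authority='{cap.authority}'")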

5.4 Framework Comparison

The following table compares the major frameworks across the dimensions that most directly affect adoption and operational utility:

| Framework | Primary Lens | System Representation | Risk Scoring | Best Deployment Context |
|---|---|---|---|---|
| STRIDE | Security properties | DFD | None native; add DREAD or FAIR | Application security, SDL, sprint-integrated reviews |
| PASTA | Business risk | DFD + attack trees | Built-in risk matrices | Financial services, healthcare, critical infrastructure |
| LINDDUN | Privacy | DFD | None native | GDPR/CCPA compliance, DPIAs |
| MAESTRO | Agentic AI | Capability graph | None native | LLM deployments, multi-agent systems |
| Attack Trees | Adversarial paths | Tree or DAG | Quantitative per Eq. (1)-(2) | Red team planning, formal analysis |
| FAIR | Business risk | Probabilistic factors | Core capability | Executive risk reporting, security investment decisions |

No single framework dominates across all dimensions, and the practitioner's job is to select an appropriate primary framework based on the system type and regulatory context, supplement it with specialised frameworks where the primary has known blind spots, and resist the temptation to apply all frameworks simultaneously, which introduces cognitive overhead that defeats the purpose of having a structured methodology at all.

6. Tooling, Code, and Automation

6.1 The Open-Source and Commercial Tooling Landscape

The open-source threat modelling tool market is anchored by OWASP Threat Dragon, a cross-platform application for creating DFD-based threat models with a graphical editor and an exportable JSON schema [12]. Threat Dragon supports STRIDE per-element annotation and generates exportable threat reports, but it does not automate threat generation: the practitioner must manually identify and record each threat, and the tool's value is primarily in structuring and persisting human analysis rather than replacing it. Microsoft's Threat Modeling Tool, which provided STRIDE-per-element automation for Azure-hosted architectures with a built-in threat knowledge base, was discontinued as a standalone product in 2024, leaving a gap in the ecosystem for Microsoft-stack organisations. The AWS Threat Composer, released by Amazon Web Services as open source, provides a lightweight alternative focused on structured threat statements using the format "As a [threat actor], I can [action] by exploiting [weakness], impacting [asset]" rather than DFD generation [13]. This statement format is more developer-accessible than a STRIDE category matrix and integrates more naturally with Agile backlog tooling.

The most significant recent development in open-source threat modelling tooling is the emergence of LLM-assisted analysis. The stride-gpt project by Matt Adams on GitHub uses OpenAI's API to generate STRIDE threat analyses from natural-language system descriptions, producing initial threat lists within seconds that a security engineer can then refine [14]. The output quality is sufficient for an initial draft: LLMs are highly capable at generating plausible threat instances within a category from pattern-matched training data, but they consistently miss threats that require deep knowledge of the specific technology stack, the deployment configuration, or the organisation's specific threat agent profile. A production Kubernetes cluster running on EKS with Karpenter-managed node pools has a threat surface that differs from the same application running on GKE, but an LLM working from a natural-language description will generate largely identical STRIDE analyses for both. For organisations that currently produce no threat models, stride-gpt and similar tools lower the barrier to entry meaningfully. For organisations that produce thorough manual models, the LLM draft is a starting point for human refinement, not a finished artefact.

The Auspex system, described in a March 2025 paper at arxiv.org, takes a more architecturally sophisticated approach: mapping system representations from code artefacts and infrastructure definitions to threat models while encoding threat modelling expert tradecraft as a lightweight, queryable knowledge system [15]. Auspex attempts to capture the reasoning patterns that experienced practitioners use when applying STRIDE, such as recognising that a data flow crossing a trust boundary without explicit authentication annotation in the DFD is a Spoofing risk regardless of whether the diagram explicitly labels it. This kind of heuristic encoding is harder to extract from LLM pretraining and represents a meaningful research direction for making automated threat modelling more reliable than pattern-matching to training distribution examples.

6.2 A Python STRIDE Analysis Pipeline

The following Python implementation demonstrates a programmatic STRIDE analysis pipeline. It accepts a system description as a set of typed elements with contextual attributes, applies the STRIDE-per-element applicability matrix to generate threat records, adjusts severity scores based on deployment context, and produces a priority-ordered threat backlog. The implementation treats the STRIDE matrix as an explicit data structure rather than embedding it in procedural logic, making it straightforward to extend with LINDDUN or MAESTRO categories.

import uuid
from dataclasses import dataclass, field
from enum import Enum


class ElementType(Enum):
    PROCESS = "process"
    DATA_STORE = "data_store"
    DATA_FLOW = "data_flow"
    EXTERNAL_ENTITY = "external_entity"


class STRIDECategory(Enum):
    SPOOFING = "Spoofing"
    TAMPERING = "Tampering"
    REPUDIATION = "Repudiation"
    INFORMATION_DISCLOSURE = "Information Disclosure"
    DENIAL_OF_SERVICE = "Denial of Service"
    ELEVATION_OF_PRIVILEGE = "Elevation of Privilege"


# STRIDE-per-element applicability matrix (STRIDE-PE)
STRIDE_MATRIX: dict[ElementType, list[STRIDECategory]] = {
    ElementType.PROCESS: list(STRIDECategory),
    ElementType.DATA_STORE: [
        STRIDECategory.TAMPERING,
        STRIDECategory.INFORMATION_DISCLOSURE,
        STRIDECategory.DENIAL_OF_SERVICE,
    ],
    ElementType.DATA_FLOW: [
        STRIDECategory.SPOOFING,
        STRIDECategory.TAMPERING,
        STRIDECategory.INFORMATION_DISCLOSURE,
    ],
    ElementType.EXTERNAL_ENTITY: [
        STRIDECategory.SPOOFING,
        STRIDECategory.REPUDIATION,
    ],
}

# Baseline severity (0-10) per STRIDE category before context adjustment
BASELINE_SEVERITY: dict[STRIDECategory, float] = {
    STRIDECategory.SPOOFING: 7.5,
    STRIDECategory.TAMPERING: 8.0,
    STRIDECategory.REPUDIATION: 5.0,
    STRIDECategory.INFORMATION_DISCLOSURE: 7.0,
    STRIDECategory.DENIAL_OF_SERVICE: 6.0,
    STRIDECategory.ELEVATION_OF_PRIVILEGE: 9.0,
}

MITIGATIONS: dict[STRIDECategory, list[str]] = {
    STRIDECategory.SPOOFING: [
        "Enforce mutual TLS between services",
        "Validate JWT signatures and expiry on every request",
        "Use short-lived, scoped tokens with explicit audience claims",
    ],
    STRIDECategory.TAMPERING: [
        "Apply HMAC-SHA256 to message payloads over internal queues",
        "Enable database audit logging with cryptographic signatures",
        "Deploy WAF with strict input validation rules",
    ],
    STRIDECategory.REPUDIATION: [
        "Centralise audit logs in an append-only store",
        "Sign log entries with an HSM-backed key",
    ],
    STRIDECategory.INFORMATION_DISCLOSURE: [
        "Encrypt data at rest with AES-256-GCM",
        "Apply column-level encryption for PII fields",
        "Return generic error messages; log details internally only",
    ],
    STRIDECategory.DENIAL_OF_SERVICE: [
        "Rate-limit at the API gateway using token bucket algorithm",
        "Enable auto-scaling with horizontal pod autoscaler",
        "Configure circuit breakers on all downstream service calls",
    ],
    STRIDECategory.ELEVATION_OF_PRIVILEGE: [
        "Apply least-privilege IAM roles; review quarterly",
        "Run all containers as non-root with read-only root filesystem",
        "Enable seccomp and AppArmor profiles on workload pods",
    ],
}


@dataclass
class SystemElement:
    name: str
    element_type: ElementType
    crosses_trust_boundary: bool = False
    stores_sensitive_data: bool = False
    internet_facing: bool = False


@dataclass
class Threat:
    id: str = field(default_factory=lambda: str(uuid.uuid4())[:8])
    element: str = ""
    category: STRIDECategory = STRIDECategory.SPOOFING
    description: str = ""
    severity: float = 0.0
    mitigations: list[str] = field(default_factory=list)


def context_adjusted_severity(
    base: float,
    element: SystemElement,
    category: STRIDECategory,
) -> float:
    """Context adjusts baseline severity based on deployment risk amplifiers."""
    score = base
    if element.internet_facing:
        score = min(score + 1.5, 10.0)
    if element.stores_sensitive_data and category == STRIDECategory.INFORMATION_DISCLOSURE:
        score = min(score + 1.0, 10.0)
    if element.crosses_trust_boundary and category in (
        STRIDECategory.SPOOFING,
        STRIDECategory.TAMPERING,
    ):
        score = min(score + 0.5, 10.0)
    return round(score, 1)


def analyse(elements: list[SystemElement]) -> list[Threat]:
    threats: list[Threat] = []
    for element in elements:
        for category in STRIDE_MATRIX.get(element.element_type, []):
            severity = context_adjusted_severity(
                BASELINE_SEVERITY[category], element, category
            )
            threats.append(
                Threat(
                    element=element.name,
                    category=category,
                    description=f"{category.value} on {element.element_type.value} '{element.name}'",
                    severity=severity,
                    mitigations=MITIGATIONS[category],
                )
            )
    return sorted(threats, key=lambda t: t.severity, reverse=True)


if __name__ == "__main__":
    system = [
        SystemElement("Browser",        ElementType.EXTERNAL_ENTITY, internet_facing=True),
        SystemElement("API Gateway",    ElementType.PROCESS,          crosses_trust_boundary=True, internet_facing=True),
        SystemElement("Auth Service",   ElementType.PROCESS,          crosses_trust_boundary=True),
        SystemElement("Order Service",  ElementType.PROCESS),
        SystemElement("TLS Flow",       ElementType.DATA_FLOW,        crosses_trust_boundary=True),
        SystemElement("Orders DB",      ElementType.DATA_STORE,       stores_sensitive_data=True),
        SystemElement("Audit Log",      ElementType.DATA_STORE),
        SystemElement("Message Queue",  ElementType.DATA_STORE),
    ]

    threats = analyse(system)
    print(f"Generated {len(threats)} threats across {len(system)} elements\n")
    header = f"{'Sev':<6} {'Element':<20} {'Category':<30}"
    print(header)
    print("-" * len(header))
    for t in threats[:12]:
        print(f"{t.severity:<6} {t.element:<20} {t.category.value:<30}")
    print(f"\nHighest-priority threat: {threats[0].description}")
    print("Recommended mitigations:")
    for m in threats[0].mitigations:
        print(f"  - {m}")

Running this against the sample eight-component system produces 32 threat records. The top-severity threats are clustered on the internet-facing API Gateway process (Elevation of Privilege and Tampering at 10.0, Spoofing at 9.5) and the Orders DB data store (Information Disclosure at 8.0 due to the stores_sensitive_data flag). The context-adjustment logic encodes the key insight that severity is not intrinsic to the threat category but is modulated by deployment context: the same Elevation of Privilege category on an internal microservice not facing the internet scores 9.0 rather than 10.0. This pipeline can be extended trivially by adding new category enums and applicability matrices for LINDDUN or MAESTRO, and can be integrated into a CI/CD pipeline by adding a stage that generates the system element list from infrastructure-as-code annotations and fails the build if new high-severity threats appear without associated mitigation tickets.

6.3 Integrating Threat Modelling into CI/CD

The most important tooling challenge in threat modelling is not the quality of the analysis tool itself but the integration trigger: the event that causes a threat model to be reviewed or updated in response to architectural change. The approaches in current use fall into three categories, each appropriate for a different change velocity and risk profile. Change-driven integration uses a pull request hook that detects when a commit modifies files indicating architectural change (infrastructure-as-code, API schemas, service mesh configuration, Docker Compose files) and requires a threat model review before merge. Cadence-driven integration schedules threat model reviews on a fixed calendar cycle, typically quarterly for high-value systems and semi-annually for lower-risk systems. Event-driven integration triggers focused reviews in response to external signals: a new CVE published against a library in the dependency tree, a new technique added to MITRE ATT&CK or MITRE ATLAS, or a security incident that reveals a threat category not previously considered.

The most effective programmes combine all three mechanisms with risk-based routing. Change-driven integration serves as the primary mechanism for catching architectural drift before it reaches production; a diff between the current system representation and the last threat-modelled state can be computed automatically and used to scope the review to only the changed components. Event-driven integration serves as the escalation path for externally-generated threat intelligence, keeping threat models current with the external threat landscape even when the internal architecture has not changed. Cadence-driven integration serves as the backstop audit that catches accumulated drift that change-driven integration missed (because not every architecturally significant change triggers a file-level diff in the expected locations). Organisations that implement only one of the three mechanisms, which is the majority, are systematically missing the threat categories surfaced by the other two.
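A change-driven trigger can be as small as a pipeline step that inspects the merge diff for architecture-bearing files. The following is a hypothetical gate, not any vendor's integration; the path patterns and the fail-the-build convention are assumptions to adapt locally:

import fnmatch
import subprocess
import sys

# File patterns that signal architectural change (patterns are illustrative).
ARCHITECTURE_PATTERNS = [
    "*.tf", "infrastructure/*", "api/openapi*.yaml",
    "docker-compose*.yml", "mesh/*.yaml",
]


def changed_files(base: str = "origin/main") -> list[str]:
    """List files modified relative to the target branch."""
    result = subprocess.run(
        ["git", "diff", "--name-only", base],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def architectural_changes(files: list[str]) -> list[str]:
    return [f for f in files
            if any(fnmatch.fnmatch(f, pat) for pat in ARCHITECTURE_PATTERNS)]


if __name__ == "__main__":
    hits = architectural_changes(changed_files())
    if hits:
        print("Architecture-bearing files changed; threat model review required:")
        for f in hits:
            print(f"  - {f}")
        sys.exit(1)  # block the merge until the scoped review is recorded
    print("No architectural drift detected in this diff.")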

7. The Commercial Landscape and Strategic Implications

7.1 Market Structure and the IriusRisk Acquisition

The threat modelling tools market was valued at approximately \$1.28 billion in 2025 and is forecast to reach \$2.55 billion by 2030, representing a 14.89% compound annual growth rate driven primarily by three forces: regulatory mandates that convert threat modelling from a voluntary best practice into a compliance requirement, DevSecOps adoption that moves security activities earlier and more continuously in the software development lifecycle, and the rapid proliferation of AI systems that introduce a threat surface no existing tool adequately models [16]. Cloud-based SaaS delivery held 67.82% of market share in 2024, growing at 15.67% CAGR, reflecting the shift from desktop-installed tools to continuously updated platforms integrated into cloud development workflows.

The most significant recent event in this market was ThreatModeler's acquisition of IriusRisk in January 2026 for over \$100 million, creating a combined entity with approximately \$50 million in annual recurring revenue [17]. ThreatModeler, originally bootstrapped by founder Archie Agarwal, received its first institutional investment in June 2024 when Invictus Growth Partners acquired a majority stake for \$60 million. IriusRisk, backed by Paladin Capital Group, had previously raised \$6.7 million in a Series A in 2020 and \$29 million in a Series B in 2022. The deal represents a consolidation bet: as regulatory requirements make threat modelling mandatory across more industry verticals, the TAM expands faster than the number of vendors capable of building enterprise-grade platform capability, and scale advantages in AI-assisted automation, compliance reporting, and enterprise integrations favour the larger platform. The combined entity faces the standard integration challenge of combining two customer bases, two codebases, and two go-to-market motions while maintaining product velocity against a backdrop of well-funded open-source alternatives.

Other notable commercial players structure their value propositions around distinct architectural choices. Security Compass SD Elements operates as an application security knowledge management platform that maps STRIDE and PASTA outputs to a library of developer-facing security requirements, bridging the gap between threat model outputs and developer backlog items. Foreseeti's securiCAD platform uses the MAL (Meta Attack Language) formal modelling language to run probabilistic attack simulations on infrastructure models, producing quantitative risk scores that go beyond the qualitative outputs of STRIDE-PE. Microsoft's continued influence through the SDL methodology and Azure Defender for DevOps, even after discontinuing the standalone Threat Modeling Tool, keeps STRIDE norms embedded in the largest enterprise development environment. The open-source ecosystem (OWASP Threat Dragon, stride-gpt, AWS Threat Composer, Threagile for cloud-native architectures) puts capable tooling in reach of organisations that cannot justify commercial platform spend, creating a bifurcated market: enterprises that need compliance audit trails, CI/CD integrations, and executive reporting dashboards pay for commercial platforms, while security-forward engineering teams use open-source tooling with custom automation built around their specific pipeline.

7.2 The AI Disruption to Threat Modelling: Two Distinct Problems

The intersection of AI and threat modelling operates at two levels that commercial vendors are conflating in their marketing, to the confusion of practitioners trying to evaluate tools. The first level is AI as a threat modelling subject: the security analysis of AI systems themselves, which requires new frameworks like MAESTRO and new threat taxonomies like MITRE ATLAS because existing frameworks do not cover prompt injection, capability accumulation, model extraction, or training data poisoning as first-class threat categories. The second level is AI as a threat modelling tool: using LLMs to automate or accelerate the threat modelling process for any system, implemented in products like stride-gpt, Auspex, and the generative AI capabilities now embedded in most commercial threat modelling platforms. Both are real and important, but they have different quality criteria, different evaluation methodologies, and different procurement implications, and treating them as the same phenomenon leads to tool selection decisions that fail at one level while appearing to succeed at the other.

At the first level, the field is still in early formation. MITRE ATLAS, the Adversarial Threat Landscape for Artificial-Intelligence Systems, provides a structured taxonomy of AI-specific attack techniques that parallels ATT&CK for traditional systems [18]. The AegisShield system from a September 2025 paper proposes integrating structured assumptions about AI system behaviour directly into a STRIDE-based model to handle threat categories like prompt injection and model inversion attacks [19]. Neither ATLAS nor AegisShield has achieved the adoption levels of STRIDE or ATT&CK, and the field lacks a consensus framework for AI threat modelling that is simultaneously rigorous enough for security professionals and operationally tractable enough for AI development teams. This gap represents both a real security risk (organisations are deploying AI systems with incomplete threat models) and a significant market opportunity for the first framework or tool that closes it.

At the second level, AI-assisted threat modelling faces a calibration problem. Research cited by Security Compass found that teams using LLM-generated threat drafts without expert review produced models with higher false-positive rates than manual STRIDE analysis, because the LLM generates threats from its training distribution (which over-represents common web application architectures) rather than threats specific to the actual system under analysis [9]. The practical implication for security product leaders is that AI-assisted threat modelling tools require a human-in-the-loop review step for any output that will be acted upon, and the quality of that review determines whether AI assistance adds value by accelerating an expert's workflow or adds noise by handing non-experts a plausible-looking threat list that is miscalibrated for their system.
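
One way to enforce that review step in tooling is to quarantine every LLM-drafted threat until a named reviewer explicitly accepts it. The sketch below shows the shape of such a gate; the data model, field names, and the stubbed draft are hypothetical, and in practice the draft would come from a tool such as stride-gpt or a platform API.

```python
# Sketch of a human-in-the-loop gate for LLM-drafted threats: drafts are
# quarantined until a named reviewer accepts or rejects each one, and only
# accepted threats may enter the security backlog.
from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    DRAFT = "draft"          # produced by the LLM, not yet trustworthy
    ACCEPTED = "accepted"    # reviewer confirmed it applies to this system
    REJECTED = "rejected"    # reviewer judged it miscalibrated or irrelevant

@dataclass
class Threat:
    description: str
    stride_category: str
    status: Status = Status.DRAFT
    reviewer: str | None = None

def review(threat: Threat, reviewer: str, accept: bool) -> Threat:
    """Record an explicit expert decision on one drafted threat."""
    threat.status = Status.ACCEPTED if accept else Status.REJECTED
    threat.reviewer = reviewer
    return threat

def backlog_items(threats: list[Threat]) -> list[Threat]:
    """Only reviewer-accepted threats are allowed to become work items."""
    return [t for t in threats if t.status is Status.ACCEPTED]

# Usage: an LLM draft is inert until a human signs off on it.
draft = Threat("JWT audience claim not validated at API gateway", "Spoofing")
review(draft, reviewer="appsec-lead", accept=True)
assert backlog_items([draft]) == [draft]
```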

7.3 Strategic Recommendations for Security Product Leaders

Practitioners and product leaders navigating this landscape should make four decisions explicitly rather than leaving them implicit. The first is framework selection: choose a primary framework based on regulatory context and system type, and use supplementary frameworks (LINDDUN for privacy obligations, MAESTRO for AI system deployments) where the primary framework has known blind spots. Do not attempt to use all frameworks simultaneously; the cognitive overhead defeats the purpose of having a structured methodology.

The second decision is representation format: decide whether DFDs, architecture decision records, or infrastructure-as-code are the authoritative source of truth for system representations, and automate the derivation of threat model inputs from that source rather than maintaining a separate diagram by hand. Manual DFD maintenance does not scale past a few dozen system components before staleness becomes the dominant quality problem. The third decision is integration depth: threat modelling should function as a pull request gate for high-risk architectural changes, not a quarterly ceremony for the entire engineering organisation. A risk-tiered approach that applies full PASTA-style analysis to payment and authentication systems, STRIDE-PE to all internet-facing services, and lightweight checklist review to internal tooling is more realistic than uniform deep coverage.
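
The risk-tiered routing described above reduces to a small, auditable decision function. The sketch below is one possible encoding; the two boolean criteria are assumptions standing in for a real organisation's risk-tier definitions.

```python
# Illustrative risk-based routing of systems to threat-modelling depth,
# following the tiering described above. The tier criteria are assumptions
# about one organisation's risk appetite, not a normative standard.
from enum import Enum

class Depth(Enum):
    PASTA_FULL = "full seven-stage PASTA analysis"
    STRIDE_PE = "STRIDE per-element on the DFD"
    CHECKLIST = "lightweight checklist review"

def review_depth(handles_money_or_auth: bool, internet_facing: bool) -> Depth:
    """Route a system to an analysis depth proportional to its risk tier."""
    if handles_money_or_auth:
        return Depth.PASTA_FULL
    if internet_facing:
        return Depth.STRIDE_PE
    return Depth.CHECKLIST

assert review_depth(True, True) is Depth.PASTA_FULL     # payments service
assert review_depth(False, True) is Depth.STRIDE_PE     # public API
assert review_depth(False, False) is Depth.CHECKLIST    # internal CLI tool
```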

The fourth decision is metric selection: pick one or two scalar metrics to track over time, such as FAIR-based annualised loss expectation, threat entropy $H$ from Equation (6), or the ratio of open threats to resolved threats in the security backlog, and hold the security programme accountable to them. Without metrics, threat modelling collapses into a compliance activity with no feedback loop and no way to demonstrate that the programme is improving security posture rather than simply producing documentation. These four decisions are interdependent: the right representation format depends on the primary framework, and the right integration depth depends on the metrics being tracked, but making all four decisions explicitly is more important than making them in the optimal order.
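
The sketch below shows how two of those metrics might be computed. It assumes, as a reading of the article's earlier derivation rather than a guaranteed match to Equation (6), that threat entropy $H$ is the Shannon entropy over the distribution of open threats across categories; the category names and counts are illustrative.

```python
# Minimal metrics sketch: Shannon entropy over the distribution of open
# threats across categories (assumed here to match the article's threat
# entropy H), plus the open/resolved backlog ratio.
import math

def threat_entropy(category_counts: dict[str, int]) -> float:
    """H = -sum(p_i * log2 p_i) over categories with open threats."""
    total = sum(category_counts.values())
    if total == 0:
        return 0.0
    return -sum(
        (n / total) * math.log2(n / total)
        for n in category_counts.values()
        if n > 0
    )

def backlog_ratio(open_threats: int, resolved_threats: int) -> float:
    """Open-to-resolved ratio; below 1.0 means the backlog is draining."""
    return open_threats / max(resolved_threats, 1)

open_by_category = {"Spoofing": 4, "Tampering": 2, "Info disclosure": 6}
print(f"H = {threat_entropy(open_by_category):.2f} bits")   # H = 1.46 bits
print(f"open/resolved = {backlog_ratio(12, 30):.2f}")       # 0.40
```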

7.4 Regulatory Drivers and the Mandatory Threat Modelling Future

The regulatory trajectory points clearly toward mandatory threat modelling as a baseline requirement across regulated sectors, and the commercial market is structuring itself around compliance demand rather than organic security adoption. NIST's Secure Software Development Framework (SSDF, SP 800-218) requires documented threat modelling for federal contractors developing software used in US government systems [20]. The EU's Digital Operational Resilience Act (DORA), effective January 2025, requires financial entities in the EU to conduct threat-led penetration testing informed by documented threat models, effectively making threat modelling a precondition for regulatory compliance across the EU financial sector. The FDA's cybersecurity guidance for medical devices requires a documented threat model as part of premarket submission for any device with network connectivity, applying to an industry segment that has historically been among the least mature in security practice.

The compliance framing creates exactly the risk that the Bybit example at the opening of this article illustrates: organisations optimise for producing a threat model document rather than for the genuine security benefit that threat modelling is designed to produce. A threat model produced for a compliance audit and never consulted by engineering teams is worse than no threat model, because it creates false assurance and consumes security team capacity that could have been directed toward real controls. The commercial platforms in this market face a genuine product design problem: how do you make the living, continuously updated threat model (which is hard to produce and harder to maintain) as easy to produce as the one-time audit document (which is cheap and available from any boutique security consultancy)? The platforms that solve this problem, rather than those that generate the most comprehensive audit output, will capture disproportionate market value over the next five years as mandatory threat modelling requirements create a large but quality-undifferentiated demand base.

8. Conclusion and Open Problems

Threat modelling is three things simultaneously: a formal reasoning discipline with mathematical foundations in attack trees and risk algebra, an engineering practice operationalised through frameworks like STRIDE and PASTA, and an organisational process whose effectiveness depends entirely on integration with the software development workflows that produce the systems being analysed. The field's current state is characterised by a gap between the rigour that first-principles analysis demands and the operational realities that most engineering organisations can sustain. STRIDE's 25-year dominance reflects a genuine insight: it occupies a useful middle ground between rigour and accessibility, comprehensive enough to find real vulnerabilities and simple enough to apply without a security PhD. Its failure modes are well-understood (DFD staleness, absence of native prioritisation, fundamental inadequacy for agentic AI systems), and the emerging generation of AI-assisted tools addresses some of these failure modes while creating new reliability problems of its own.

The open problems in threat modelling are specific and tractable, and progress on any of them would have immediate commercial value. First, there is no reliable automated mechanism for detecting that a deployed system has diverged from its threat model and needs an update; every current approach depends on human judgment about whether a given code or infrastructure change is architecturally significant. Second, there is no consensus representation format for agentic AI systems that is both expressive enough to support STRIDE-style per-element analysis and practical enough for security teams without formal AI backgrounds to produce and maintain. Third, the field lacks a validated benchmark for threat model completeness and quality: without agreed ground truth, it is impossible to compare the output of manual STRIDE analysis against LLM-assisted drafting, Auspex-style tradecraft encoding, or any other automated approach in a controlled and reproducible way. Fourth, FAIR's quantitative framework is logically sound but practically limited by the absence of reliable empirical base rates for Threat Event Frequency across most industries and attack categories; the insurance industry's actuarial model does not yet have a security equivalent, and most FAIR implementations rely on expert elicitation that is expensive, slow, and poorly calibrated.

Progress on these four problems requires a combination of academic research and industry collaboration that the current field structure does not strongly incentivise. Commercial vendors have the deployment data needed to produce empirical TEF estimates but lack incentives to share it. Academic research groups have the methodological rigour needed to produce validated benchmarks but lack access to production threat models at scale. The Threat Modeling Manifesto's value of "a journey of understanding over a security or privacy snapshot" points in the right direction: threat modelling must be treated as a continuous process in which data accumulates over time, not a periodic audit in which each engagement starts from zero. The organisations and tools that make this continuity practical, rather than aspirational, will define the next decade of the discipline.

References

[1] Trail of Bits, "How Threat Modeling Could Have Prevented the \$1.5B Bybit Hack," blog.trailofbits.com, February 2025.

[2] Threat Modeling Connect, "State of Threat Modeling 2024-2025," threatmodelingconnect.com, 2025.

[3] Cloud Security Alliance, "Agentic AI Threat Modeling Framework: MAESTRO," cloudsecurityalliance.org, February 2025.

[4] B. Schneier, "Attack Trees: Modeling Security Threats," Dr. Dobb's Journal, December 1999.

[5] The Open Group, "Factor Analysis of Information Risk (FAIR): Open FAIR Body of Knowledge," pubs.opengroup.org, 2013.

[6] FIRST, "Common Vulnerability Scoring System Version 4.0 Specification Document," first.org/cvss, 2023.

[7] L. Kohnfelder and P. Garg, "The Threats to Our Products," Microsoft internal paper, April 1999.

[8] A. Shostack, Threat Modeling: Designing for Security, Wiley, 2014.

[9] Security Compass, "Why Traditional Threat Modeling Fails and How to Get it Right," securitycompass.com, 2024.

[10] T. UcedaVelez and M. Morana, Risk Centric Threat Modeling: Process for Attack Simulation and Threat Analysis, Wiley, 2015.

[11] K. Wuyts, R. Scandariato, and W. Joosen, "LINDDUN: A Privacy Threat Analysis Framework," KU Leuven, in Privacy in Statistical Databases, Springer, 2014.

[12] OWASP Foundation, "OWASP Threat Dragon," github.com/OWASP/threat-dragon, 2024.

[13] Amazon Web Services, "AWS Threat Composer," github.com/awslabs/threat-composer, 2024.

[14] M. Adams, "STRIDE-GPT: AI-Powered Threat Modeling," github.com/mrwadams/stride-gpt, 2024.

[15] J. Sanchez Vicarte, M. Spoczynski, and M. Elsaid, "Auspex: Building Threat Modeling Tradecraft into an Artificial Intelligence-based Copilot," arxiv.org/abs/2503.09586, March 2025.

[16] Mordor Intelligence, "Threat Modeling Tools Market: Size, Share and Forecast 2025-2030," mordorintelligence.com, 2025.

[17] Fortune, "Invictus-backed ThreatModeler Acquires Competitor IriusRisk for Over \$100 Million," fortune.com, January 2026.

[18] MITRE, "MITRE ATLAS: Adversarial Threat Landscape for Artificial-Intelligence Systems," atlas.mitre.org, 2024.

[19] AegisShield Research Team, "AegisShield: Democratizing Cyber Threat Modeling with Generative AI," arxiv.org/pdf/2509.10482, September 2025.

[20] NIST, "Secure Software Development Framework (SSDF), Special Publication 800-218," csrc.nist.gov, February 2022.


