
The Adversary's Taxonomy: Mapping Threat Actor Capabilities

"Multiple groups within ATT&CK use the same techniques, and for this reason, it is not recommended to attribute activity solely based on the ATT&CK techniques used. Attribution to a group is a complex process involving all parts of the Diamond Model, not solely on techniques."

Blake E. Strom et al., MITRE ATT&CK: Design and Philosophy, March 2020

Abstract

Adversary taxonomy is the practice of classifying threat actors by capability, motivation, sponsorship, and behavioral signature so that defenders can prioritize investment and attribute incidents with defensible confidence. Despite two decades of community effort, the field remains operationally fragmented: vendors maintain incompatible naming schemes, attribution methodologies diverge, and the formal foundations for expressing confidence remain ad hoc. This article treats adversary taxonomy as an engineering problem with tractable, formal solutions rather than as a catalogue exercise delegated to human analysts. We survey the dominant classification frameworks (MITRE ATT&CK, the Diamond Model of Intrusion Analysis, and STIX 2.1), derive formal models for capability representation and attribution confidence, analyze the tooling ecosystem centered on OpenCTI and MISP, implement a working Python actor-profiling system using IDF-weighted cosine similarity over the ATT&CK technique space, and close with a product and market analysis of the commercial threat intelligence platform landscape. The core argument is that the field has adequate conceptual vocabulary but lacks calibrated probability methodology: the difference between "moderate confidence" and "high confidence" in a published attribution report currently means whatever the analyst writing it wants it to mean, and closing that gap is the most consequential open problem in applied threat intelligence.


1. Introduction

"We are announcing a new strategic collaboration to bring clarity to threat actor naming, providing security professionals with faster insights and greater confidence in their decision-making. We recognize that customers benefit most when organizations like ours (who often independently investigate and attribute the same threat actors) can provide consistent, actionable information."

Microsoft Security Blog, June 2, 2025 [1]

The paragraph above, co-signed by Microsoft, CrowdStrike, Palo Alto Networks, and Google Mandiant, tells you almost everything you need to know about the state of adversary taxonomy in 2025. The four largest threat intelligence organizations in the commercial world found it necessary to announce a formal coordination effort just to achieve consistency in what they call the same threat actors, a decade after the inconsistency was first widely acknowledged as a professional problem. The practical consequence of this fragmentation is that the same intrusion cluster has gone by at least four major names: APT29 (the Mandiant-originated designator inherited from their APT numbering scheme), Cozy Bear (CrowdStrike's animal-themed label for Russian actors), UNC2452 (Mandiant's interim label for the SolarWinds intrusion cluster before full attribution), and Midnight Blizzard (Microsoft's weather-themed designation after their 2023 naming reorganization). A security operations center that ingests threat intelligence from multiple vendors and does not maintain an alias resolution table will encounter these names in different playbooks, different detection rules, and different vendor advisories without a machine-readable way to know they all refer to the same adversary.

This naming fragmentation is a symptom, not the disease. The underlying disease is that adversary taxonomy in practice lacks the formal foundations that would enable consistent representation, probabilistic attribution, and capability assessment at scale. Names are just the surface expression of a deeper problem: when two organizations describe the same threat actor, they may be representing different subsets of that actor's observed activities (because their telemetry differs), using different technique classification schemes (even within ATT&CK, which technique is applied to which observed behavior is a judgment call), and operating with different prior probabilities about which nation-state is likely to be the sponsor (because their access to intelligence beyond open-source reporting differs). A shared naming convention makes the outputs look consistent; it does not make the underlying analytical process consistent. The 2025 collaboration is the right first step, but the second step, which the industry has not yet taken, is a shared formal methodology for converting behavioral evidence into calibrated attribution confidence.

This article develops that formal methodology. Section 2 establishes the conceptual foundations of adversary classification and the critical distinctions between naming, clustering, and attribution. Section 3 derives formal models for capability representation, actor similarity, and capability tier assignment. Section 4 examines the STIX 2.1 and TAXII data standards and where they fall short of what a rigorous taxonomy requires. Section 5 describes the attribution kill chain in practice, including the false flag problem and the aliasing problem at scale. Section 6 presents the open-source tooling ecosystem with a working Python implementation. Section 7 analyzes the commercial platform landscape, open problems, and practitioner recommendations. The target reader is a senior security engineer, threat intelligence analyst, or security product leader who wants to reason rigorously about the foundations of adversary taxonomy rather than consume vendor summaries of it.


2. Foundations of Adversary Classification

2.1 What a Taxonomy Must Support

A useful adversary taxonomy is not a list of code names. It is a structured classification system that enables four distinct operations: attribution (mapping observed behaviors to a known actor with associated confidence), prediction (extrapolating likely future behaviors from documented past patterns), prioritization (ranking adversary threats for defensive resource allocation), and collaboration (allowing multiple organizations to exchange intelligence about the same actor without ambiguity). The Mandiant APT1 report of 2013, which brought threat actor classification into mainstream security consciousness, optimized heavily for attribution and had acceptable human readability, but its formal properties for the other three operations were weak [2]. A code name like "APT1" functions as a primary key into a database of associated techniques, infrastructure, and targeting patterns, and the analytical value of the taxonomy scales directly with the quality and completeness of what that key retrieves. Without a rigorous ontology backing the naming scheme, the database becomes internally inconsistent as different analysts make different judgments about what belongs to which actor.

The Diamond Model of Intrusion Analysis, introduced by Caltagirone, Pendergast, and Betz in their 2013 technical report, was the first attempt to place threat actor analysis on a relational footing [3]. The model defines four core features of any intrusion event: the adversary (the operator or director behind the attack), the capability they employ (their tools, exploits, and procedures), the infrastructure through which they act (IP addresses, domains, C2 servers), and the victim against whom the activity is directed (the target organization, specific system, or data). Every intrusion event instantiates a relational tuple $(a, c, i, v)$, and analysts trace relationships between components across multiple events to construct activity threads. The Diamond Model's most important analytical insight is that these four elements are mutually constraining: if you know who the victim is, you have information about which adversaries are likely to target them; if you know what infrastructure is used, you can search for other victims and other capabilities connected to it. This relational structure makes the model significantly more powerful than a simple attribute list.

MITRE ATT&CK, first published in 2015 following Strom and colleagues' work cataloguing post-exploitation behaviors observed in real network monitoring data, addressed the capability dimension of the Diamond Model by building a comprehensive knowledge base organized into matrices of adversary tactics, techniques, and sub-techniques [4]. Version 18.1, released in December 2025, documents more than 600 techniques and sub-techniques across matrices covering Enterprise, Mobile, Cloud, Containers, and ICS environments, alongside more than 100 named threat actor groups with technique-to-group associations derived from public reporting. The framework's practical impact has been enormous: most commercial security products now express detection coverage in ATT&CK terms, making it the de facto vocabulary for adversary behavior description across the industry. The framework was explicitly designed as a knowledge base for defenders, not as a classification system for adversaries, which is why it does not provide a formal algorithm for assigning actors to tiers or categories and why it explicitly warns against using technique overlap as a sole attribution signal.

STIX 2.1 (Structured Threat Information eXpression), the OASIS standard stabilized in 2021, provides the data model that spans both the Diamond Model and ATT&CK-style analysis [5]. STIX defines a rich collection of domain objects including threat-actor, intrusion-set, campaign, attack-pattern, malware, tool, and indicator, connected by relationship objects that capture the semantic links between them. STIX was designed to be comprehensive enough to represent any threat intelligence scenario while being flexible enough to support incremental, partial data. The standard is the closest thing the field has to a shared formal language, and its adoption across both commercial platforms and open-source tools (OpenCTI, MISP) means that data modeled in STIX can in principle be exchanged, merged, and compared across organizational boundaries. In practice, the controlled vocabularies that STIX provides for key fields like actor sophistication and motivation are not anchored to observable criteria, which means that different analysts applying the same vocabulary to the same actor routinely produce inconsistent outputs.

2.2 Motivation and Sponsorship as Classification Axes

The most common primary classification of threat actors in industry practice divides adversaries by motivation: nation-state actors pursuing espionage, sabotage, or influence objectives on behalf of a government; financially motivated criminals pursuing profit through ransomware, fraud, or theft; hacktivists pursuing ideological objectives; and insider threats pursuing personal goals from within the target organization. This four-way split provides a useful first approximation for initial triage but collapses important distinctions within each category. Nation-state actors vary by roughly three orders of magnitude in technical capability: a small Eastern European intelligence service and a Tier 5 capability unit within a major power's signals intelligence apparatus are both "nation-state actors" but operate with entirely different funding, infrastructure, zero-day development capability, and operational security discipline. Similarly, the ransomware-as-a-service ecosystem has produced a layered supply chain in which malware developers, infrastructure brokers, access brokers, and affiliate operators all qualify as "financially motivated criminals" while playing completely different roles in the intrusion kill chain and therefore requiring different defensive countermeasures.

Recorded Future's threat actor taxonomy, documented in a white paper by its Insikt Group, takes a more granular approach by using national flag colors combined with NATO phonetic alphabet code words to create mnemonic identifiers for actors from high-priority countries, while reserving generic cluster identifiers for unattributed activity [6]. The color-code approach encodes nation-state affiliation into the primary identifier in a way that is immediately legible to analysts, while the NATO phonetic alphabet provides a large, memorable namespace. The principal drawback is that the approach requires a nation-state attribution judgment at naming time, which means the identifier must change if the attribution changes. This is the same structural instability that affects ATT&CK's group naming policy: when Mandiant reclassified the SolarWinds cluster from UNC2452 to APT29, every downstream product, playbook, and detection rule that referenced UNC2452 required updating. Encoding uncertain attribution claims into identifiers creates technical debt that grows with every naming decision.

CrowdStrike's adversary taxonomy organizes actors primarily by a combination of nation-state sponsorship and criminal motivation using an animal-themed naming scheme: bears for Russia, pandas for China, kittens and chollimas for Iran and North Korea respectively, spiders for cybercriminal groups, and jackals for hacktivist groups [7]. The animal naming is highly memorable and has become well understood among practitioners, which is a genuine usability achievement. The scheme does not, however, formally represent capability levels within a nation-state category: knowing that an actor is designated a Russian bear tells an analyst something about likely targeting objectives but nothing about expected technical sophistication, available toolset, or operational security practices, all of which are critical for proportionate defensive response planning. By the 2026 Global Threat Report, CrowdStrike was tracking more than 280 named adversaries under this scheme, a number large enough that the animal metaphors no longer carry unique discriminating information and the namespace has necessarily expanded into multi-word compound identifiers.


3. Formal Models for Capability Representation

3.1 Capability Profiles and Actor Similarity

To reason precisely about adversary capabilities, we need a formal representation that admits quantitative comparison across actors and over time. Let $\mathcal{T}$ be the complete set of known techniques in a knowledge base such as ATT&CK version 18.1. A capability profile of a threat actor $a$ is a binary vector $\mathbf{c}_a \in \{0, 1\}^{|\mathcal{T}|}$ where entry $c_{a,i} = 1$ if actor $a$ has been observed using technique $t_i$ in confirmed incident reporting, and $c_{a,i} = 0$ otherwise. The raw Jaccard similarity between two actor capability profiles measures the proportion of techniques shared relative to techniques used by either actor:

$$J(a, b) = \frac{|\mathbf{c}_a \cap \mathbf{c}_b|}{|\mathbf{c}_a \cup \mathbf{c}_b|} = \frac{\sum_{i} c_{a,i} \cdot c_{b,i}}{\sum_{i} \max(c_{a,i}, c_{b,i})} \tag{1}$$

This metric is useful for initial actor clustering and for flagging when two tracked intrusion sets share suspiciously high technique overlap. However, MITRE ATT&CK's own documentation explicitly cautions against relying on technique overlap for attribution because many techniques are used by dozens of distinct actors: both a Russian APT and a North Korean group might use spear-phishing and PowerShell, and the raw Jaccard similarity of their profiles would be misleadingly high [4]. The problem is that Equation (1) treats every technique equally, when the correct analytical weight for any technique is inversely proportional to how commonly it appears across the actor corpus.
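Equation (1) reduces to set operations when profiles are stored as sets of technique IDs. A minimal sketch, using real ATT&CK technique IDs but illustrative (not authoritative) actor-to-technique associations:

```python
def jaccard(profile_a: set[str], profile_b: set[str]) -> float:
    """Raw Jaccard similarity (Equation 1) over sets of technique IDs."""
    if not profile_a and not profile_b:
        return 0.0  # convention: two empty profiles share nothing measurable
    return len(profile_a & profile_b) / len(profile_a | profile_b)

# Hypothetical profiles: spearphishing, PowerShell, and web-protocol C2 are
# shared; obfuscation vs. ingress tool transfer differ.
apt_x = {"T1566.001", "T1059.001", "T1027", "T1071.001"}
apt_y = {"T1566.001", "T1059.001", "T1105", "T1071.001"}
overlap = jaccard(apt_x, apt_y)  # 3 shared / 5 in the union = 0.6
```

The 0.6 score here illustrates the problem described above: the overlap is driven entirely by commodity techniques, so the high raw similarity carries little attribution value.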

A more discriminating approach weights techniques by their rarity in the actor corpus, analogous to inverse document frequency (IDF) weighting in information retrieval. Let $|\mathcal{A}|$ denote the total number of tracked actors and $|\{a \in \mathcal{A} : c_{a,i} = 1\}|$ the number of actors observed using technique $t_i$. With add-one smoothing in the denominator, the IDF weight for technique $i$ is:

$$w_i = \log \frac{|\mathcal{A}|}{1 + |\{a \in \mathcal{A} : c_{a,i} = 1\}|} \tag{2}$$

Using these weights, the IDF-weighted cosine similarity between two actor profiles gives substantially higher attribution value to rare techniques than to common ones:

$$\text{sim}_w(a, b) = \frac{\sum_i w_i \cdot c_{a,i} \cdot c_{b,i}}{\sqrt{\sum_i w_i \cdot c_{a,i}^2} \cdot \sqrt{\sum_i w_i \cdot c_{b,i}^2}} \tag{3}$$

A custom firmware implant used by only two tracked actors in the entire ATT&CK corpus carries far more similarity weight than both actors' use of cmd.exe for command execution, which is observed in hundreds of actor profiles. Practically, this means that when two activity clusters share a rare custom tool or a highly specific exploitation procedure that has been documented for only a handful of actors, the weighted similarity score flags that relationship with high confidence even if the rest of each cluster's profile is composed of commodity techniques that are common across the adversary landscape.
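Equations (2) and (3) can be sketched over a toy corpus. One practical deviation is worth flagging: with a small corpus, the add-one smoothing in Equation (2) can drive the weight of a ubiquitous technique to zero or below, so the sketch clamps weights at zero rather than let common techniques contribute negative similarity. All actor-to-technique associations are illustrative.

```python
import math

def idf_weights(corpus: dict[str, set[str]]) -> dict[str, float]:
    """IDF weight per technique (Equation 2), clamped at zero so that
    ubiquitous techniques cannot contribute negative similarity."""
    n_actors = len(corpus)
    techniques = set().union(*corpus.values())
    return {
        t: max(0.0, math.log(n_actors / (1 + sum(t in p for p in corpus.values()))))
        for t in techniques
    }

def weighted_cosine(a: set[str], b: set[str], w: dict[str, float]) -> float:
    """IDF-weighted cosine similarity (Equation 3) for binary profiles."""
    num = sum(w[t] for t in a & b)
    den = math.sqrt(sum(w[t] for t in a)) * math.sqrt(sum(w[t] for t in b))
    return num / den if den else 0.0

# Toy corpus: T1542.002 (firmware implant) is rare; T1566 (phishing) is universal.
corpus = {
    "actor1": {"T1566", "T1059", "T1542.002"},
    "actor2": {"T1566", "T1059", "T1542.002"},
    "actor3": {"T1566", "T1059"},
    "actor4": {"T1566"},
}
w = idf_weights(corpus)
```

In this corpus the shared firmware technique dominates the actor1/actor2 similarity, while the universal phishing technique contributes nothing, which is exactly the behavior the prose above argues for.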

3.2 Bayesian Attribution

The attribution problem is, at its formal core, a problem of inference under uncertainty: given a body of evidence, what is the probability that a given actor is responsible for the observed intrusion? Let $H_a$ denote the hypothesis that actor $a$ conducted the intrusion, and let $E$ denote the totality of observed evidence including technical indicators, behavioral patterns, and contextual intelligence. Bayesian attribution applies standard conditional probability to update prior beliefs about actor identity in light of observed evidence:

$$P(H_a \mid E) = \frac{P(E \mid H_a) \cdot P(H_a)}{P(E)} = \frac{P(E \mid H_a) \cdot P(H_a)}{\sum_{b \in \mathcal{A}} P(E \mid H_b) \cdot P(H_b)} \tag{4}$$

The prior $P(H_a)$ encodes baseline knowledge about how likely actor $a$ is to target this specific sector, geography, and organization type at this moment in time, informed by historical targeting data and current geopolitical context. The likelihood $P(E \mid H_a)$ encodes how probable the full evidence set is under the assumption that $a$ conducted the attack, which requires probability distributions over the techniques, tools, infrastructure, and timing patterns documented for actor $a$ in the knowledge base. The marginal $P(E)$ in the denominator is computed by summing over all candidate actors, which normalizes the posterior into a proper probability distribution. The fundamental operational weakness of current attribution practice is that most practitioners reason through this equation implicitly and qualitatively, producing outputs like "moderate confidence" or "high confidence likely Russia" without specifying the numerical probability range those hedges correspond to, how the prior was set, or how the likelihood was computed, making independent verification essentially impossible.
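Equation (4) is mechanical once priors and likelihoods are supplied; the hard analytical work is producing those inputs. A sketch with purely illustrative numbers (in practice priors would come from historical targeting data and likelihoods from per-actor technique, infrastructure, and timing distributions):

```python
def attribution_posterior(priors: dict[str, float],
                          likelihoods: dict[str, float]) -> dict[str, float]:
    """Normalised posterior P(H_a | E) over candidate actors (Equation 4)."""
    joint = {a: priors[a] * likelihoods[a] for a in priors}
    marginal = sum(joint.values())  # P(E), summed over all candidate actors
    if marginal == 0:
        raise ValueError("evidence has zero likelihood under every hypothesis")
    return {a: j / marginal for a, j in joint.items()}

# Illustrative inputs only.
priors = {"actor_a": 0.5, "actor_b": 0.3, "actor_c": 0.2}
likelihoods = {"actor_a": 0.02, "actor_b": 0.10, "actor_c": 0.01}
posterior = attribution_posterior(priors, likelihoods)
# A strong likelihood can overturn a strong prior: actor_b leads here despite
# actor_a's higher base rate of targeting this sector.
```

Writing the computation down, even with rough inputs, forces the prior and likelihood assumptions into the open, which is precisely what qualitative "moderate confidence" language avoids.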

The evidence set $E$ decomposes naturally into three types with different epistemological properties. Technical evidence $E_T$ includes indicators such as IP addresses, domain names, file hashes, and YARA-matchable code patterns. Technical evidence is cheap to collect and easy to automate but is also the most susceptible to adversary manipulation: infrastructure can be spoofed, malware can be copied and reused, and legitimate services can be abused such that their use leaves ambiguous indicators. Behavioral evidence $E_B$ includes observed TTPs: which techniques were used, in what sequence, with what tools, and against what target types. Behavioral evidence is harder to manipulate because changing TTPs requires retraining operators and redeveloping tooling, but it suffers from the technique overlap problem discussed in Section 3.1. Contextual evidence $E_C$ includes geopolitical intelligence, confirmed targeting patterns, timing relative to geopolitical events, signals intelligence, and human intelligence reporting. Contextual evidence typically provides the highest likelihood ratio in practice, because the probability that any given technically sophisticated nation-state actor would simultaneously target the victim's specific sector, at this timing, with this operational security posture, is much lower than the probability that they would use any particular tool or technique. Yet contextual evidence is the least formally expressed in public attribution frameworks, because it is often classified or sensitive.

3.3 Capability Tier Modeling

The binary capability profile in Section 3.1 captures what an actor has been observed doing but does not express the class of capability an actor possesses, which is often more strategically relevant than the specific technique inventory. A capability tier model assigns actors to ordered levels based on the sophistication of the highest-capability technique they have demonstrated in practice. Define the sophistication score of a technique as the complement of its prevalence across the actor corpus:

$$s(t_i) = 1 - \frac{|\{a \in \mathcal{A} : c_{a,i} = 1\}|}{|\mathcal{A}|} \tag{5}$$

This assigns high scores to techniques observed in few actors (custom silicon implants, zero-day exploitation chains against hardened targets, supply chain compromise of signing infrastructure) and low scores to commodity techniques (public exploit kit usage, phishing, credential stuffing) that appear across broad swaths of the actor landscape. The capability tier of an actor is then the maximum sophistication score across their observed profile:

$$\tau(a) = \max_{i : c_{a,i} = 1} s(t_i) \tag{6}$$

This formulation captures the asymmetry that characterizes real adversary behavior: a Tier 5 actor (a nation-state unit with zero-day development capability and custom implant development) can, and frequently does, operate with Tier 1 techniques during collection operations, because using commodity tooling is better operational security than exposing its most sophisticated capabilities. Observing only Tier 1 techniques from an intrusion cluster therefore does not place the responsible actor in Tier 1; the correct question for defenders is not "what is the minimum capability I have observed?" but "what is the maximum capability I should defend against given everything else I know about this actor?" The capability tier model operationalizes this asymmetry by treating the tier assignment as a property of the actor, not of any individual campaign.
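Equations (5) and (6) in code, over a toy three-actor corpus with illustrative technique associations:

```python
def sophistication(corpus: dict[str, set[str]]) -> dict[str, float]:
    """s(t_i) = 1 - prevalence across the actor corpus (Equation 5)."""
    n = len(corpus)
    techniques = set().union(*corpus.values())
    return {t: 1 - sum(t in p for p in corpus.values()) / n for t in techniques}

def capability_tier(actor: str, corpus: dict[str, set[str]],
                    s: dict[str, float]) -> float:
    """tau(a): the actor's maximum observed sophistication (Equation 6)."""
    return max(s[t] for t in corpus[actor])

# Illustrative corpus. The apex unit's tier is set by its rarest technique
# (T1542.002, firmware implant) even though it also uses commodity phishing.
corpus = {
    "commodity_crew": {"T1566", "T1059"},
    "midrange_group": {"T1566", "T1059", "T1027"},
    "apex_unit":      {"T1566", "T1542.002"},
}
s = sophistication(corpus)
```

Because the tier is a max, adding more commodity techniques to an actor's profile never lowers it, which matches the "defend against the maximum" posture described above.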


4. Standards and Data Models

4.1 STIX 2.1: The Lingua Franca of Threat Intelligence

The STIX 2.1 specification, published by OASIS in 2021 and now the most widely adopted formal standard in the threat intelligence community, defines 18 domain objects (SDOs) and two relationship object types (SROs) for representing all aspects of a threat intelligence scenario [5]. The threat-actor SDO carries the following analytically relevant properties: name (string), aliases (list of strings), threat_actor_types (open vocabulary including nation-state, crime-syndicate, activist, terrorist, insider-disgruntled, and others), roles (open vocabulary including agent, director, malware-author, and sponsor), goals (free-text list), sophistication (open vocabulary: none, minimal, intermediate, advanced, expert, innovator, strategic), resource_level (open vocabulary from individual to government), and primary_motivation (open vocabulary including coercion, dominance, ideology, notoriety, organizational-gain, personal-gain). The sophistication vocabulary maps loosely to the capability tier model from Section 3.3, but the specification provides no operational definition of what "advanced" versus "expert" means in terms of observable technique categories, leaving assignment to analyst judgment with no formal anchor.

The distinction in STIX 2.1 between threat-actor and intrusion-set is one of the most analytically important and most commonly collapsed distinctions in the field [8]. A threat-actor is a person or organization believed to exist: asserting a threat-actor object is an attribution claim, stating that there is a specific human or organizational entity responsible for the associated activity. An intrusion-set is a grouped collection of adversarial behaviors and resources believed to represent a persistent effort, possibly by an unknown actor: asserting an intrusion-set object is a behavioral clustering claim, which carries no attribution requirement. The standard analytical workflow is to create an intrusion-set from observed behavioral clusters and to assert an attributed-to relationship connecting the intrusion set to a threat-actor only when attribution confidence justifies the claim. Most practitioners collapse this distinction when ingesting external intelligence feeds, treating "APT29" as both the cluster label and the actor label simultaneously, which conflates analytical confidence levels and loses information about what was directly observed versus what was inferred.
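The cluster-first, attribute-later workflow described above can be sketched directly in STIX 2.1 JSON. The sketch builds the objects as plain dictionaries for illustration (production code would use the OASIS-maintained stix2 Python library); the names and confidence value are hypothetical, and the required created/modified timestamp properties are omitted for brevity.

```python
import json
import uuid

def stix_id(obj_type: str) -> str:
    """STIX 2.1 identifier: object type, two dashes, then a UUID."""
    return f"{obj_type}--{uuid.uuid4()}"

# Behavioural clustering claim only: no attribution is implied by this object.
intrusion_set = {
    "type": "intrusion-set",
    "spec_version": "2.1",
    "id": stix_id("intrusion-set"),
    "name": "UNC-EXAMPLE-001",  # hypothetical interim cluster label
}

# Attribution claim: asserts that a specific organisation is believed to exist.
threat_actor = {
    "type": "threat-actor",
    "spec_version": "2.1",
    "id": stix_id("threat-actor"),
    "name": "Example Actor",
    "sophistication": "advanced",  # open vocabulary, analyst judgement
}

# The attribution itself lives in an explicit relationship object, which can
# carry its own confidence score (0-100 in STIX 2.1).
attribution = {
    "type": "relationship",
    "spec_version": "2.1",
    "id": stix_id("relationship"),
    "relationship_type": "attributed-to",
    "source_ref": intrusion_set["id"],
    "target_ref": threat_actor["id"],
    "confidence": 60,
}

bundle = {"type": "bundle", "id": stix_id("bundle"),
          "objects": [intrusion_set, threat_actor, attribution]}
```

Keeping the attribution on a separate relationship object, rather than collapsing the cluster and the actor into one record, preserves the distinction between what was observed and what was inferred.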

The controlled vocabularies in STIX 2.1 represent a genuine contribution to standardization, providing a shared namespace for expressing actor type, role, motivation, and sophistication. Their practical limitation is that they express categorical distinctions without defining the observational criteria that place an actor in one category rather than another. An intelligence analyst at one organization and an analyst at another organization looking at the same documented actor might produce different sophistication values because the STIX specification does not define which combination of observed techniques maps to "expert" as opposed to "advanced." This is not a problem that STIX can solve by specification alone: the specification is a data interchange format, not an analytical methodology. The gap is between the format and the procedure for populating it consistently, and closing that gap requires the kind of explicit procedural standards that the industry has not yet agreed upon.

TAXII 2.1 (Trusted Automated eXchange of Intelligence Information), the companion transport standard, defines an HTTP-based API for discovering, accessing, and publishing collections of STIX objects [9]. The TAXII model centers on API roots (base URLs that group a server's capabilities) and collections (named, addressable sets of STIX objects); the channels concept, reserved in TAXII 2.0 for push-based distribution, was removed in the 2.1 specification. TAXII solves the distribution problem: an organization with a TAXII server can make its threat intelligence available to authorized clients in a machine-readable format that interoperates with any STIX-aware tool. What TAXII does not solve is the provenance and methodology transparency problem: when a client receives a STIX threat-actor object from a TAXII feed, the object contains no machine-readable metadata about the analytical methodology behind the attribution, the confidence calibration scale used, or how to compare the producing organization's "high confidence" assessment against another organization's "moderate confidence" assessment of the same actor. The result is that intelligence consumers who ingest from multiple TAXII feeds face the same analytical incompatibility problem at the data layer that the 2025 naming standardization effort was attempting to address at the naming layer.
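A minimal sketch of how a TAXII 2.1 client addresses a collection, assuming only the URL layout and media type defined in the specification. The server hostname and collection ID below are placeholders, and the actual HTTP fetch (with authentication and pagination) is left to the caller.

```python
def taxii_objects_endpoint(api_root: str,
                           collection_id: str) -> tuple[str, dict[str, str]]:
    """Build the TAXII 2.1 endpoint URL and Accept header for fetching a
    collection's STIX objects."""
    url = f"{api_root.rstrip('/')}/collections/{collection_id}/objects/"
    headers = {"Accept": "application/taxii+json;version=2.1"}
    return url, headers

# Placeholder server and collection values.
url, headers = taxii_objects_endpoint("https://taxii.example.com/api1/", "91a7b528")
```

The versioned media type in the Accept header is how client and server negotiate the TAXII revision, which matters when a server exposes both 2.0 and 2.1 API roots.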

4.2 The ATT&CK Data Model and Its Limits

MITRE ATT&CK's data is publicly accessible through the ATT&CK STIX repositories, which provide the full knowledge base as STIX bundles (STIX 2.0 in the original cti repository, STIX 2.1 in the newer attack-stix-data repository), and through the mitreattack-python library that wraps the STIX data in a Pythonic API. The framework maps groups (threat actors) to techniques through STIX uses SROs, with each association backed by one or more citations to public incident reports. The ATT&CK team models the evidentiary basis for each group-to-technique association carefully, citing the specific public reporting that documents the observed use. However, the framework does not attempt to express the probability that any given association is correctly attributed: a technique association backed by a single unverified vendor blog post appears in the data identically to one backed by five independent incident response reports from different organizations [4].

The ATT&CK Navigator, the browser-based visualization tool that has become the standard way to display and communicate ATT&CK data, allows analysts to construct layer files that highlight which techniques a given actor uses, compare coverage across actors, and compute detection coverage metrics against a given actor's profile. The Navigator layer format has become a de facto interchange format for ATT&CK-based intelligence, with commercial and open-source tools generating Navigator layers as output and accepting them as input. One significant limitation for capability assessment is that the Navigator treats all technique associations as equally evidenced at the visual level: a confirmed technique association and a speculative one appear identically in the heat map, which gives consumers no signal about the reliability of the data they are acting on. Extending the Navigator layer format to include evidence-strength weights, represented as a third dimension in the layer color scheme, would require only minor changes to the JSON schema but would substantially improve the actionability of actor profiles for defenders planning detection coverage investments.
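The evidence-weight idea can be sketched as a Navigator layer that uses the existing per-technique score field to carry an evidence-strength weight rather than a binary seen/not-seen flag. The layer version strings below are illustrative and should be checked against the Navigator build in use; the technique IDs and scores are hypothetical.

```python
import json

def evidence_layer(name: str, technique_scores: dict[str, int]) -> dict:
    """Minimal ATT&CK Navigator layer whose score (0-100) encodes how strongly
    each group-to-technique association is evidenced."""
    return {
        "name": name,
        "versions": {"layer": "4.5", "navigator": "5.0.0"},  # illustrative
        "domain": "enterprise-attack",
        "techniques": [
            {"techniqueID": tid, "score": score,
             "comment": f"evidence strength {score}/100"}
            for tid, score in sorted(technique_scores.items())
        ],
    }

# Hypothetical scores: well-corroborated phishing vs. single-source firmware claim.
layer = evidence_layer("Example actor (evidence-weighted)",
                       {"T1566.001": 90, "T1542.002": 40})
layer_json = json.dumps(layer, indent=2)  # ready to load into the Navigator UI
```

Because the score field already drives the heat-map gradient, this reuse requires no schema change at all for single-dimension evidence weights; only combining evidence strength with a second dimension (such as detection coverage) would need the schema extension discussed above.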


5. Attribution in Practice: From Indicators to Actor Profiles

5.1 The Attribution Kill Chain

In practice, attributing an observed intrusion to a known threat actor follows a sequential analytical process that benefits from being treated as a formal workflow. The first step is technical clustering: aggregating raw indicators (IP addresses, domains, file hashes, code patterns, certificate fingerprints) from the intrusion into candidate intrusion sets using automated correlation tools. At this step, the analyst is not yet making attribution claims; they are grouping observations that appear to share a common origin based on technical similarity and temporal co-occurrence. The second step is behavioral profiling: mapping the clustered activity to ATT&CK techniques and comparing the resulting weighted capability profile against known actor profiles using the similarity metrics from Section 3.1. High weighted similarity to a well-documented actor is a strong analytical signal; high raw Jaccard similarity is a weak one for the reasons described in Section 3.1. The third step is infrastructure pivoting: tracing the adversary's command-and-control infrastructure through passive DNS records, certificate transparency logs, WHOIS history, and hosting provider data to identify registration patterns and infrastructure reuse that connect the current cluster to previously documented activity threads.

The fourth step is contextual fusion: integrating technical and behavioral analysis with geopolitical intelligence, historical targeting patterns, operational timing relative to geopolitical events, and any corroborated signals from other intelligence disciplines. This is the step where Bayesian reasoning as described in Section 3.2 is most relevant and is currently least formally practiced. Contextual fusion requires access to information that is often classified, proprietary, or simply not available to smaller organizations, which creates a systematic gap in attribution quality between organizations with deep intelligence access and those dependent entirely on open-source reporting. The fifth step is confidence calibration: expressing the attribution result as a probability estimate or structured confidence level against a defined scale, documenting the key evidence, and noting the assumptions and alternative hypotheses that were considered and rejected. Most publicly available attribution reports skip or compress this step, which is why public attribution assessments are so difficult to compare and so susceptible to the "take my word for it" criticism that has followed the field since at least the Mandiant APT1 report.
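The confidence calibration step can be made concrete by fixing a published probability-to-language scale. The bands below are loosely modelled on the ICD 203 style of probability expression; the exact cut points are an illustrative policy choice that each organization would need to fix and publish, not a standard.

```python
# Illustrative probability bands (upper bound, verbal label).
BANDS = [
    (0.05, "almost no chance"),
    (0.20, "very unlikely"),
    (0.45, "unlikely"),
    (0.55, "roughly even chance"),
    (0.80, "likely"),
    (0.95, "very likely"),
    (1.00, "almost certain"),
]

def confidence_label(posterior: float) -> str:
    """Map a numeric attribution posterior onto a fixed verbal scale, so that
    'likely' denotes the same probability range in every report."""
    if not 0.0 <= posterior <= 1.0:
        raise ValueError("posterior must be a probability")
    for upper, label in BANDS:
        if posterior <= upper:
            return label
    return BANDS[-1][1]
```

A defined scale like this is what makes two organizations' assessments of the same actor comparable: "likely" backed by a published band is a claim, while "likely" without one is a mood.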

Most attribution failures occur in steps four and five, not in the earlier technical steps. Technical clustering and behavioral profiling are increasingly well-supported by automated tooling and produce reliable results for well-documented actor groups whose technique profiles are distinctive. The difficulty emerges when the same technical cluster and behavioral profile are consistent with multiple actor hypotheses, particularly when adversaries deliberately employ deception through infrastructure sharing or false flag techniques. A published case study that illustrates this clearly is the VPNFilter campaign of 2018, subsequently attributed to Sandworm (a unit of Russian military intelligence), where the initial technical analysis was consistent with multiple nation-state actors and early public reports pointed in conflicting directions. Confident attribution became possible only after contextual intelligence, including law enforcement and intelligence community reporting, was incorporated, demonstrating that the technical analysis was necessary but not sufficient to close the attribution question [11].

5.2 False Flags and the Manipulation of Attribution

The possibility that sophisticated adversaries deliberately manipulate the attribution process introduces a second-order problem that formal taxonomy must be equipped to handle. An actor who understands that defenders use YARA rules, ATT&CK technique profiles, and infrastructure graphs for attribution can seed false indicators into their operations to redirect attribution toward a different actor, gaining the operational benefit of the attack while imposing the geopolitical and reputational costs on the framed actor. The Olympic Destroyer malware, deployed to disrupt the 2018 Winter Olympics opening ceremony in Pyeongchang, contained meticulously constructed false flag elements: code strings and API call sequences that closely resembled tools associated with Lazarus Group (North Korea), the Sofacy group (Russia), and Chinese state actors [11]. Kaspersky researchers who analyzed the malware in detail found that the false flag elements were carefully designed to appear highly significant in automated first-pass analysis but to break down under manual inspection of the malware's actual execution logic and code quality.

The Bayesian framework from Section 3.2 handles false flags naturally once the possibility is made explicit. When evidence $E$ contains deliberately fabricated elements, the likelihood $P(E \mid H_{a'})$ for the intended false-flag actor $a'$ will be artificially inflated by the planted indicators. An analyst who does not account for manipulation probability will accordingly overweight $H_{a'}$, which is exactly the adversary's goal. The correct Bayesian treatment includes an explicit prior probability on the event that evidence has been partially fabricated, and updates all actor likelihoods accordingly:

$$P(E_{\text{manip}}) = \sum_{a' \in \mathcal{A}} P(E_{\text{manip}} \mid H_{a'}) \cdot P(H_{a'}) \tag{7}$$

where $E_{\text{manip}}$ is the event that some portion of the observed evidence was deliberately planted. In practice, this term is almost never explicitly computed in published attribution assessments, because doing so requires estimating how motivated and capable the responsible actor is to conduct deception operations, which requires a prior judgment about actor identity. This circularity is real but not paralyzing: analysts routinely reason about it informally by noting when the evidence is "too clean" or when the apparent attribution points to an actor that seems unlikely given contextual signals. Formalizing this reasoning would at minimum force analysts to state their manipulation priors explicitly, making their published confidence assessments far more interpretable.
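The mixture implied by Equation (7) can be folded directly into the update. The sketch below is a minimal illustration with invented numbers, not part of the article's formal apparatus: each actor's likelihood is a blend of a face-value reading of the evidence and a deception reading, weighted by an explicitly stated manipulation prior.

```python
def posterior_with_manipulation(
    priors: dict[str, float],
    lik_genuine: dict[str, float],
    lik_planted: dict[str, float],
    p_manip: float,
) -> dict[str, float]:
    """
    Posterior over actor hypotheses when evidence may be partly planted.
    For each actor a, P(E | H_a) is a mixture of the likelihood assuming
    the evidence is genuine and the likelihood assuming it was planted.
    """
    unnorm = {
        a: priors[a]
        * ((1.0 - p_manip) * lik_genuine[a] + p_manip * lik_planted[a])
        for a in priors
    }
    z = sum(unnorm.values())
    return {a: v / z for a, v in unnorm.items()}


# Toy Olympic-Destroyer-style scenario: indicators superficially point at
# "Framed", but "Framer" is the actor best positioned to have planted them.
priors = {"Framed": 0.5, "Framer": 0.5}
lik_genuine = {"Framed": 0.9, "Framer": 0.1}    # face-value reading
lik_planted = {"Framed": 0.05, "Framer": 0.95}  # deception reading

naive = posterior_with_manipulation(priors, lik_genuine, lik_planted, 0.0)
wary = posterior_with_manipulation(priors, lik_genuine, lik_planted, 0.4)
```

With a zero manipulation prior the framed actor dominates the posterior; raising the prior to 0.4 shifts substantial mass toward the actor capable of the deception, which is exactly the informal "too clean" reasoning made explicit.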

5.3 The Aliasing Problem at Scale

The vendor aliasing problem, while familiar qualitatively to every threat intelligence practitioner, becomes a tractable engineering problem when expressed at scale. With more than 280 named adversaries tracked by CrowdStrike alone [7] and more than 390 by Mandiant [17], each maintained by different organizations under different naming schemes, the total number of names in circulation for the same actors is large enough that manual alias table maintenance is operationally burdensome. The MITRE ATT&CK groups database partially addresses this by maintaining an aliases field for each group entry, listing the names assigned to the same group by other major vendors. As of version 18.1, ATT&CK tracks more than 100 groups with an average of approximately 3.5 aliases each, implying more than 350 distinct names in active circulation for what the knowledge base treats as approximately 100 distinct entities.

The June 2025 collaboration among Microsoft, CrowdStrike, Palo Alto Networks, and Google Mandiant targets this problem by creating a shared mapping rather than requiring naming scheme convergence [1]. This is the correct architectural approach: requiring every vendor to abandon their existing naming schemes would impose enormous documentation, tooling, and customer communication costs while generating resentment without proportionate operational benefit. A shared alias resolution service, functioning conceptually like DNS for threat actor names, would allow each vendor to continue using their own identifiers while providing machine-readable cross-references to canonical STIX threat-actor UUIDs that all vendors agree represent the same entity. The MISP threat-actor-intelligence-server project on GitHub already implements a lightweight version of this concept as a REST API for threat actor lookup by name, synonym, or UUID [16], demonstrating that the technical implementation is straightforward. The barrier is not technical but organizational: agreeing on canonical identities when vendors have different attribution confidence levels for the same clusters requires them to surface and negotiate analytical disagreements that they currently manage by simply using different names.


6. Tooling, Code, and Operational Practice

6.1 The Open-Source Ecosystem

The open-source threat intelligence platform ecosystem has converged on two primary tools that address the problem from complementary angles: MISP (Malware Information Sharing Platform) and OpenCTI. MISP was originally developed at CIRCL (Computer Incident Response Center Luxembourg) as a tool for rapid sharing of indicators of compromise between trusted organizations [13]. Its data model is optimized for IOC management: events group related indicators, attributes represent individual IOCs (IP addresses, domains, hashes, URLs), and the MISP Galaxy mechanism provides a community-maintained structured vocabulary for threat actor classification. MISP's strengths are its network of established sharing communities, its mature API, and its deep integration with detection tooling through export formats for Snort, Suricata, YARA, and others. Its limitation for actor taxonomy purposes is that the event-centered data model makes it less natural to express long-running actor profiles with evolving capability sets over time.

OpenCTI was designed specifically for intelligence management at the level of relationships and context rather than indicator exchange [14]. Its storage backend combines Elasticsearch for full-text search with a knowledge graph layer for relationship modeling, and its native data model is STIX 2.1, meaning that every entity in OpenCTI maps directly to a STIX SDO or SRO. The graph model is analytically significant: queries like "which threat actors have used this malware family in campaigns targeting the financial sector in the last six months" are expressible as graph traversals and execute efficiently on the knowledge graph backend, whereas the same query in a relational database or an event-centric system like MISP requires complex multi-table joins that become slower as the data volume grows. OpenCTI's connector ecosystem supports automated ingestion from Mandiant, CrowdStrike, Recorded Future, AlienVault, VirusTotal, Shodan, TheHive, and more than fifteen other commercial and open-source sources, with an ATT&CK connector keeping the local technique and group database synchronized with MITRE's upstream releases.

"When we moved our CTI infrastructure to OpenCTI, the single biggest change was that we stopped losing context. In SIEM-based correlation, you see events. In OpenCTI, you see the graph of relationships behind the events. Those are different cognitive experiences, and the second one is much more useful for tracking evolving actor behavior."

Cosive, "MISP vs. OpenCTI: Updated 2025 Guide," Cosive Blog, 2025 [15]

The following Python implementation demonstrates programmatic actor capability profiling and IDF-weighted similarity scoring using the mitreattack-python library. This code instantiates the formal model from Section 3.1 against the live ATT&CK dataset:

import math
from mitreattack.stix20 import MitreAttackData


def load_actor_profiles(attack_data: MitreAttackData) -> dict[str, set[str]]:
    """
    Build a dict mapping each group name to the set of ATT&CK technique
    IDs that group has been observed using, per the ATT&CK STIX bundle.
    """
    profiles: dict[str, set[str]] = {}
    # Skip revoked/deprecated group entries so stale clusters do not
    # pollute the similarity space.
    groups = attack_data.get_groups(remove_revoked_deprecated=True)
    for group in groups:
        name = group["name"]
        tech_objects = attack_data.get_techniques_used_by_group(group["id"])
        profiles[name] = {
            t["object"]["external_references"][0]["external_id"]
            for t in tech_objects
        }
    return profiles


def compute_idf_weights(profiles: dict[str, set[str]]) -> dict[str, float]:
    """
    Compute IDF weight per technique. Rare techniques (few actors use them)
    receive higher weight, making them more discriminating for attribution.
    Implements Equation (2) from the article.
    """
    all_techniques: set[str] = set()
    for tech_set in profiles.values():
        all_techniques.update(tech_set)

    n_actors = len(profiles)
    weights: dict[str, float] = {}
    for tech in all_techniques:
        actor_frequency = sum(1 for s in profiles.values() if tech in s)
        # Smoothed IDF, clamped at zero: without the clamp, techniques used
        # by nearly every actor would receive negative weight and distort
        # the cosine similarity computed below.
        weights[tech] = max(0.0, math.log(n_actors / (1 + actor_frequency)))
    return weights


def weighted_cosine_similarity(
    profile_a: set[str],
    profile_b: set[str],
    weights: dict[str, float],
) -> float:
    """
    IDF-weighted cosine similarity between two actor technique profiles.
    Implements Equation (3) from the article.
    Returns a float in [0.0, 1.0]; 1.0 means identical weighted profiles.
    """
    dot = sum(weights.get(t, 0.0) for t in profile_a & profile_b)
    mag_a = math.sqrt(sum(weights.get(t, 0.0) ** 2 for t in profile_a))
    mag_b = math.sqrt(sum(weights.get(t, 0.0) ** 2 for t in profile_b))
    if mag_a == 0.0 or mag_b == 0.0:
        return 0.0
    return dot / (mag_a * mag_b)


def compute_capability_tier(
    profile: set[str],
    weights: dict[str, float],
    n_actors: int,
) -> float:
    """
    Capability tier of an actor: max sophistication score across their profile.
    Implements Equations (5) and (6) from the article.
    Sophistication = 1 - (actor_frequency / n_actors), so rare technique = high score.
    """
    if not profile:
        return 0.0
    # Recover approximate actor_frequency from IDF weight: w = log(N/(1+df))
    # => df = N/exp(w) - 1
    sophistication_scores = []
    for tech in profile:
        w = weights.get(tech, 0.0)
        df_approx = n_actors / math.exp(w) - 1 if w > 0 else n_actors
        score = 1.0 - (df_approx / n_actors)
        sophistication_scores.append(score)
    return max(sophistication_scores)


def rank_similar_actors(
    target: str,
    profiles: dict[str, set[str]],
    weights: dict[str, float],
    top_k: int = 10,
) -> list[tuple[str, float]]:
    """
    Return the top-k actors most similar to the target by IDF-weighted
    cosine similarity, sorted in descending order.
    """
    if target not in profiles:
        raise ValueError(f"Actor '{target}' not found in ATT&CK dataset.")
    target_profile = profiles[target]
    scores = [
        (actor, weighted_cosine_similarity(target_profile, profile, weights))
        for actor, profile in profiles.items()
        if actor != target
    ]
    scores.sort(key=lambda x: x[1], reverse=True)
    return scores[:top_k]


if __name__ == "__main__":
    # Requires: pip install mitreattack-python
    # Download enterprise-attack.json first from MITRE's attack-stix-data
    # repository; MitreAttackData reads a local STIX bundle file.
    attack = MitreAttackData("enterprise-attack.json")
    profiles = load_actor_profiles(attack)
    weights = compute_idf_weights(profiles)
    n = len(profiles)

    target_actor = "APT29"
    similar = rank_similar_actors(target_actor, profiles, weights, top_k=5)
    tier = compute_capability_tier(profiles[target_actor], weights, n)

    print(f"Capability tier score for {target_actor}: {tier:.4f}")
    print(f"\nActors most similar to {target_actor} (IDF-weighted cosine):")
    for rank, (actor, score) in enumerate(similar, 1):
        actor_tier = compute_capability_tier(profiles[actor], weights, n)
        print(f"  {rank}. {actor:<30} similarity={score:.4f}  tier={actor_tier:.4f}")

Running this code against the ATT&CK 18.1 dataset for APT29 surfaces other sophisticated espionage-focused actors with high weighted similarity on rare techniques (custom implant families, living-off-the-land procedures that are statistically unusual in the corpus) while correctly down-weighting actors that superficially resemble APT29 through shared use of commodity techniques. The capability tier output provides a quantitative complement to the qualitative sophistication label in STIX, anchored to the empirical distribution of technique use across the full actor corpus.

6.2 CVE-to-Actor Mapping and the Exploitation Tier Problem

One of the most operationally valuable but analytically underexplored subproblems in threat actor taxonomy is the mapping of specific CVEs to the threat actors known to exploit them. Qualys launched TruLens in 2026 to address this directly, providing automated attribution of CVEs to threat actors and covering approximately 700 threat actor groups with more than 6,800 actor-to-CVE relationships organized across 39 industry verticals [15]. The practical operational value is significant: a defender who knows that a specific CVE in their environment is associated primarily with financially motivated ransomware affiliates can prioritize patching differently from a defender who knows the same CVE is associated with nation-state espionage actors targeting their specific industry. The combination of actor-to-CVE mapping and capability tier scoring creates a two-dimensional prioritization matrix that is more actionable than either dimension alone.

The analytical gap in current CVE-to-actor mappings is the failure to distinguish between first-order exploitation (the actor developed or purchased a reliable private exploit before public disclosure, indicating genuine capability investment) and nth-order exploitation (the actor used a public exploit kit or Metasploit module after the CVE was widely known, indicating only the capability to download and run publicly available tools). A Tier 1 actor exploiting a critical CVE through a publicly available exploit module and a Tier 5 actor who weaponized the same CVE as a zero-day before disclosure represent fundamentally different threat scenarios for any given defender. The Tier 5 actor's use of the CVE signals that they have ongoing zero-day development or procurement pipelines; the Tier 1 actor's use signals only that they can execute a pre-packaged attack. Current CVE attribution data models treat these as equivalent because they record the fact of exploitation without recording the exploitation type, which conflates operationally distinct threat scenarios and leads to disproportionate defensive responses in both directions.
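One way to close this gap is to make exploitation order a first-class field in the link record. The following is a hypothetical data model, not any vendor's actual schema; the 0.4 discount applied to nth-order exploitation is an illustrative parameter that a real system would calibrate against incident outcomes.

```python
from dataclasses import dataclass
from enum import Enum


class ExploitationType(Enum):
    FIRST_ORDER = "private exploit, pre-disclosure"
    NTH_ORDER = "public exploit, post-disclosure"


@dataclass(frozen=True)
class CveActorLink:
    cve_id: str
    actor: str
    capability_tier: int  # 1 (commodity tooling) .. 5 (zero-day pipeline)
    exploitation: ExploitationType


def priority_score(link: CveActorLink) -> float:
    """
    Two-dimensional prioritization: capability tier scaled by exploitation
    order. First-order use signals an ongoing exploit development or
    procurement pipeline; nth-order use signals only the ability to run
    publicly available tooling.
    """
    base = link.capability_tier / 5.0
    factor = 1.0 if link.exploitation is ExploitationType.FIRST_ORDER else 0.4
    return base * factor
```

Under this scoring, a Tier 5 actor weaponizing a CVE pre-disclosure outranks the same CVE exploited post-disclosure by a copycat, which is exactly the distinction current flat actor-to-CVE mappings lose.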

6.3 LLM-Assisted Technique Identification

A growing body of work applies large language model capabilities to the problem of extracting ATT&CK technique identifiers from unstructured threat intelligence text. The 2025 arXiv paper "On Technique Identification and Threat-Actor Attribution using LLMs" (arXiv:2505.11547) examines multiple LLM architectures on technique extraction from threat reports and on actor attribution from extracted technique sets [23]. The headline finding is that LLMs perform well on technique extraction when the source text describes adversary behaviors in explicit, operational terms but degrade significantly when reports use technical jargon unique to specific vendors, indirect language, or high-level narrative summaries without behavioral specifics. Attribution from LLM-extracted techniques alone performs above random chance but well below what a trained analyst achieves when incorporating contextual intelligence, which is consistent with the Bayesian analysis in Section 3.2: the LLM has access only to the behavioral evidence component $E_B$ and entirely lacks the contextual component $E_C$ that typically provides the highest likelihood ratio in real attribution cases.

The MISP threat-actor-intelligence-server project provides a practical lightweight REST API for programmatic threat actor lookup by name, alias, or UUID, wrapping the MISP Galaxy threat actor dataset in a searchable service [16]. Combined with an LLM-based technique extraction pipeline, this enables a fully automated workflow from raw intelligence report ingest to structured STIX intrusion-set output, complete with ATT&CK technique associations and candidate actor matches ranked by profile similarity. The practical ceiling of this automation is not technique extraction, which LLMs handle adequately, but the contextual fusion step that the Bayesian framework requires for confident attribution. No current LLM has access to classified intelligence, current geopolitical signals, or the institutional knowledge accumulated by experienced analysts over years of tracking specific actor groups, and the training data bias toward well-documented actors means that LLMs will systematically underperform on novel or less-documented actor clusters. Automated attribution should be treated as hypothesis generation (here are the three most likely actors given the observed techniques, with similarity scores) rather than as hypothesis confirmation.
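The tail end of such a pipeline can be sketched in a few lines. The function below hand-rolls a STIX 2.1-shaped intrusion-set dict (a production system would use the stix2 library and validated vocabularies); the x_-prefixed properties follow the STIX custom-property naming convention but are assumptions of this sketch, not standard fields.

```python
import json
import uuid
from datetime import datetime, timezone


def intrusion_set_from_pipeline(
    cluster_name: str,
    technique_ids: set[str],
    candidate_actors: list[tuple[str, float]],
) -> str:
    """
    Serialize a behavioral cluster as a STIX 2.1-style intrusion-set.
    Candidate actor matches travel as ranked hypotheses in a custom
    property; no attributed-to relationship is asserted until an analyst
    confirms attribution at the defined confidence threshold.
    """
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")
    obj = {
        "type": "intrusion-set",
        "spec_version": "2.1",
        "id": f"intrusion-set--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "name": cluster_name,
        "x_observed_techniques": sorted(technique_ids),
        "x_candidate_actors": [
            {"name": actor, "similarity": round(score, 4)}
            for actor, score in candidate_actors
        ],
    }
    return json.dumps(obj, indent=2)
```

Keeping candidates as hypotheses inside the intrusion-set object, rather than emitting attributed-to relationships directly, preserves the intrusion-set versus threat-actor distinction that the automation would otherwise silently collapse.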


7. Market, Product, and Strategic Implications

7.1 The Commercial Threat Intelligence Platform Landscape

The commercial threat intelligence platform market has undergone significant consolidation since 2020, with the largest vendors developing substantial coverage advantages over second-tier competitors. Google completed its acquisition of Mandiant for $5.4 billion in 2022, making Mandiant's intelligence function, comprising more than 500 analysts across 30 countries and generating approximately 450,000 hours of incident response consulting annually, a unit of Google Cloud's security division [17]. CrowdStrike's Falcon Intelligence module draws on real-time telemetry from its endpoint detection platform deployed across a large commercial customer base, giving it a unique signal source for observing adversary behavior in production environments that complements Mandiant's deep-dive investigation focus. Recorded Future, remaining independent, differentiates through its Insikt Group's finished intelligence production and its proprietary technology for automated collection from dark web sources, underground forums, and technical infrastructure feeds.

The market has three distinct product tiers that serve different buyer needs. Finished intelligence, the highest-value tier, consists of human-validated, contextualized reports on specific threat actors, campaigns, and vulnerabilities, sold primarily to enterprise security teams and government agencies that need strategic and operational decision support. Machine-speed feeds constitute the middle tier: indicator feeds, detection rules, and STIX/TAXII data streams for integration into security tools, sold primarily to SOC operators and MSSP platforms that need fresh IOCs and TTP data in automated ingestion pipelines. Raw collection access is the bottom tier for organizations that want to conduct their own analysis, providing access to underlying data rather than analytical products. The commercial tension in the market is that the finished intelligence products at the top tier depend on the analytical differentiation that the middle-tier standardization projects (STIX/TAXII compatibility, ATT&CK alignment) erode over time: as the data interchange format converges, the differentiation shifts entirely to the quality of the analysis, a moat that is harder to defend than a proprietary data format.

The pricing and subscription models in the market reward breadth over depth for most buyers, which creates a systemic incentive problem. A security team subscribing to a threat intelligence platform pays for coverage of hundreds of actor groups, most of which are not relevant to their specific threat environment. The intelligence they most need (deep, current, methodologically rigorous coverage of the three to five actor groups most likely to target their industry) is available from these platforms but competes with hundreds of other reports for analyst attention. Platforms that built personalization and relevance ranking into their products, surfacing the intelligence most relevant to a specific buyer's threat profile based on their industry, geography, technology stack, and regulatory environment, would solve a real pain point that current interfaces address only partially through filtering and search.

7.2 Open Problems Worth Solving

Several open problems in adversary taxonomy represent genuine research and commercial opportunities for organizations willing to invest in formal methodology. The most significant is calibrated attribution at machine speed. The current state of the market offers either fast automated outputs (indicator matching, profile similarity scoring) with no formal confidence expression, or slow human-validated assessments with qualitative confidence labels that are not comparable across organizations. A system that produces Bayesian attribution outputs with explicit posterior probability distributions and documented likelihood calculations at machine speed would fill a real gap. The technical components exist: LLM-based technique extraction, IDF-weighted profile similarity from Section 3.1, Bayesian updating from Section 3.2, and prior probability estimation from telemetry. The missing piece is calibration infrastructure: training probability estimates against a ground-truth corpus of correctly attributed incidents where the attribution is verifiable, and building feedback loops that update the probability estimates as ground truth becomes available over time.
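The calibration infrastructure itself is not exotic. A minimal sketch, assuming a retrospective corpus of (stated probability, eventually confirmed outcome) pairs:

```python
def brier_score(record: list[tuple[float, bool]]) -> float:
    """Mean squared gap between stated probability and the 0/1 outcome."""
    return sum((p - float(hit)) ** 2 for p, hit in record) / len(record)


def reliability_table(
    record: list[tuple[float, bool]], n_bins: int = 5
) -> list[tuple[float, float]]:
    """
    Per-bin (mean stated probability, observed hit rate). For a calibrated
    analyst, the two columns match: attributions stated at 0.8 probability
    should be confirmed roughly 80% of the time once ground truth arrives.
    """
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for p, hit in record:
        bins[min(int(p * n_bins), n_bins - 1)].append((p, hit))
    return [
        (
            sum(p for p, _ in b) / len(b),
            sum(1 for _, hit in b if hit) / len(b),
        )
        for b in bins
        if b
    ]
```

Running these two functions over each quarter's resolved attributions is the feedback loop described above: the Brier score trends the overall quality of stated probabilities, and the reliability table shows where "high confidence" systematically over- or under-claims.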

The second open problem is adversary capability trajectory forecasting. Current frameworks document what actors have done but do not produce systematic predictions about capability development. A Tier 3 actor (a nation-state-sponsored unit with tooling development capability but not yet zero-day research) follows a predictable capability acquisition path: they either develop capability internally over a multi-year horizon, purchase it from underground markets where zero-days and custom implants are increasingly commoditized, or partner with a Tier 5 actor under intelligence-sharing arrangements. The historical record contains enough documented examples of actors transitioning between capability tiers to support a data-driven model of transition probabilities conditioned on actor category, sponsorship, and observed resource investment signals. Intelligence products that characterize not just an actor's current capability profile but their expected profile in 12 to 36 months would be far more useful for strategic defensive planning than the current snapshot-based approach.
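Such a model can start as a simple Markov chain over capability tiers. The sketch below estimates transition probabilities from a corpus of documented tier changes and propagates an actor's current tier forward; the observation pairs here are invented placeholders, and a real model would condition on actor category and sponsorship as the paragraph above suggests.

```python
from collections import Counter


def transition_matrix(
    observations: list[tuple[int, int]], n_tiers: int = 5
) -> list[list[float]]:
    """
    Add-one-smoothed maximum-likelihood estimate of tier-transition
    probabilities from historical (tier_before, tier_after) pairs, so
    transitions never seen in the corpus retain small nonzero mass.
    """
    counts = Counter(observations)
    matrix = []
    for i in range(1, n_tiers + 1):
        total = sum(counts[(i, j)] for j in range(1, n_tiers + 1)) + n_tiers
        matrix.append(
            [(counts[(i, j)] + 1) / total for j in range(1, n_tiers + 1)]
        )
    return matrix


def forecast(
    dist: list[float], matrix: list[list[float]], steps: int
) -> list[float]:
    """Propagate a tier distribution forward a number of review periods."""
    for _ in range(steps):
        dist = [
            sum(dist[i] * matrix[i][j] for i in range(len(matrix)))
            for j in range(len(matrix))
        ]
    return dist


# Invented observations: most actors hold their tier, a few step up.
history = [(3, 3), (3, 3), (3, 4), (2, 3), (4, 4), (4, 5), (1, 1)]
m = transition_matrix(history)
tier3_now = [0.0, 0.0, 1.0, 0.0, 0.0]  # actor assessed at Tier 3 today
in_two_periods = forecast(tier3_now, m, steps=2)
```

The output distribution is precisely the "expected profile in 12 to 36 months" product: instead of a snapshot tier label, the defender receives probability mass over the tiers the actor is likely to occupy after two review periods.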

The third open problem is the attribution accountability gap. There is currently no established mechanism for measuring attribution accuracy over time or creating accountability for incorrect attributions. When a vendor attributes an attack to a specific actor group in a published report, there is no systematic process for revisiting the attribution when new evidence becomes available, scoring accuracy against ground truth when it eventually becomes available through law enforcement or intelligence declassification, or publishing corrections when initial attributions prove incorrect. Academic security research has no peer review process for attribution methodology, and commercial incentives actively discourage public correction of past attributions, because corrections undermine client confidence without generating revenue. Platforms that built attribution accuracy tracking into their internal processes, benchmarked themselves against verifiable ground truth cases, and published accuracy metrics as a trust and transparency signal would be producing something genuinely novel in the market.

7.3 Regulatory Pressure and Legal Exposure

Threat actor attribution is increasingly implicated in regulatory requirements that create legal consequences for analytical quality. The EU's NIS2 Directive, effective October 2024, requires operators of essential services and digital infrastructure to report significant incidents to national computer security incident response teams and to maintain forensic evidence collection capabilities relevant to attribution [19]. In the United States, CIRCIA (the Cyber Incident Reporting for Critical Infrastructure Act) creates 72-hour reporting timelines that require organizations to characterize the nature of attacks, including actor type to the degree known, in mandatory submissions to CISA. OFAC's extension of financial sanctions to specific cybercriminal and ransomware actors creates a harder legal requirement: organizations conducting transactions with sanctioned entities may face penalties, which means that attributing a ransomware group to a sanctioned entity versus an unsanctioned one has direct legal consequences and demands a much higher standard of analytical rigor than the "probable" assessments common in threat intelligence reports.

The potential legal exposure from incorrect published attributions is a commercial risk that the threat intelligence industry has not yet fully addressed in its product design. A vendor that publicly attributes an attack to a specific nation-state, and whose attribution is later found to be materially incorrect, faces reputational damage and potentially legal exposure under contract law or, in some jurisdictions, tortious negligence theories if downstream decisions were made in reliance on the incorrect attribution. The current industry norm of expressing confidence in qualitative terms ("likely," "probable," "moderate confidence") without a published confidence scale with defined probability ranges means that a court or regulator assessing the reasonableness of an attribution conclusion has no objective standard against which to measure the vendor's methodology. Formalizing confidence scales, documenting analytical procedures, and publishing methodological descriptions are not only analytically sound practices; they are increasingly a legal risk management consideration for any organization publishing attribution claims that their clients will act upon.

7.4 What Practitioners Should Do Differently

Four specific recommendations follow from the analysis in this article for a security leader trying to operationalize adversary taxonomy more rigorously within their organization. First, invest in STIX 2.1-native tooling, specifically OpenCTI, as the analytical foundation for your threat intelligence capability: the graph-based data model and native STIX support will become more valuable over time as more intelligence sources publish in STIX and as the formal models for confidence scoring mature. Second, enforce the intrusion-set versus threat-actor distinction in your data model: every behavioral cluster your team identifies should be an intrusion-set object with only those attributed-to relationships asserted that your team's analysis can support at your defined confidence threshold, rather than defaulting to commercially assigned actor names whose attribution confidence is unknown. Third, implement the IDF-weighted technique similarity scoring from Section 3.1 as a complement to ATT&CK Navigator heatmaps: the additional computational complexity is trivial in Python with the mitreattack-python library, and the improvement in attribution signal quality for rare techniques is substantial. Fourth, adopt a formal, numerical confidence scale for all intelligence products your team produces, internal and external, and run periodic retrospectives comparing your past attributions against subsequently confirmed ground truth to build the calibration infrastructure that transforms qualitative confidence labels into actionable probability estimates.


8. Conclusion and Open Problems

The adversary taxonomy problem is a concrete engineering problem disguised as a naming dispute. The naming dispute is real: the same actor having four major names across four major vendors creates operational friction that translates directly into slower detection, misconfigured playbooks, and intelligence products that cannot be compared across sources. But the naming dispute is solvable, as the 2025 industry collaboration demonstrated, and solving it does not require rethinking the analytical foundations. The harder problem, which remains unsolved, is the analytical methodology problem: the field lacks calibrated probability methodology for converting behavioral evidence into attribution confidence, systematic processes for measuring attribution accuracy over time, and formal models for expressing capability levels in ways that are anchored to observable criteria rather than analyst intuition.

The 2025 convergence initiative was a necessary first step but not a sufficient one. It builds the phonebook: you can now look up Midnight Blizzard and find APT29 and Cozy Bear and UNC2452 as aliases. It does not build the calibration layer: you still cannot compare a Mandiant confidence assessment of "highly likely" against a CrowdStrike confidence assessment of "confirmed" and map them to the same numerical probability range, because neither organization publishes a calibrated confidence scale with empirical validation against ground truth. This gap is not merely academic. In a world where OFAC sanctions decisions, CIRCIA compliance reports, and NIS2 incident notifications increasingly depend on the quality of threat actor attribution, the difference between a formally calibrated confidence assessment and a qualitative one is the difference between a legally defensible analytical product and a liability exposure.

Several open problems remain that the community has not adequately addressed. The ground-truth attribution corpus problem: there is no shared, verifiable set of correctly attributed incidents against which attribution methodology can be systematically measured, analogous to the labeled vulnerability datasets used to benchmark static analysis tools. The multi-actor intrusion problem: the ATT&CK and Diamond frameworks assume a single adversary, but complex supply chain attacks like SolarWinds and MOVEit involved multiple distinct actors using the same access pathway in overlapping time windows, and current frameworks represent this scenario poorly. The capability development lag problem: taxonomy systems document the present but do not model adversary capability evolution, leaving defenders permanently attributing last year's attack while building against last year's threat profile. The AI-generated false flag problem: as large language models become capable of generating synthetic malware that closely mimics the code style, API call patterns, and string constants of known actor tool families, the false flag problem from Section 5.2 becomes dramatically cheaper to execute and will systematically degrade the reliability of malware-based attribution for any actor sophisticated enough to use modern AI development tools. Solving any one of these problems rigorously would constitute a significant contribution to the field; solving all four would transform adversary taxonomy from an art practiced by elite intelligence analysts into an engineering discipline accessible to any organization with the data infrastructure and the analytical commitment to do it right.


References

[1] Microsoft Security, "Announcing a new strategic collaboration to bring clarity to threat actor naming," Microsoft Security Blog, June 2, 2025. Available: https://www.microsoft.com/en-us/security/blog/2025/06/02/announcing-a-new-strategic-collaboration-to-bring-clarity-to-threat-actor-naming/

[2] Mandiant, "APT1: Exposing One of China's Cyber Espionage Units," Mandiant Intelligence Center Report, Mandiant Corporation, February 2013.

[3] S. Caltagirone, A. Pendergast, and C. Betz, "The Diamond Model of Intrusion Analysis," Technical Report ADA586960, Center for Cyber Threat Intelligence and Threat Research, July 2013.

[4] B. E. Strom, A. Applebaum, D. P. Miller, K. C. Nickels, A. G. Pennington, and C. B. Thomas, "MITRE ATT&CK: Design and Philosophy," MITRE Technical Report, March 2020. Available: https://attack.mitre.org/docs/ATTACK_Design_and_Philosophy_March_2020.pdf

[5] OASIS, "STIX Version 2.1," OASIS Standard, June 2021. Available: https://docs.oasis-open.org/cti/stix/v2.1/cs01/stix-v2.1-cs01.html

[6] Recorded Future Insikt Group, "Recorded Future's Threat Actor and Malware Taxonomy," White Paper, Recorded Future, 2018. Available: https://go.recordedfuture.com/hubfs/white-papers/threat-actor-malware-taxonomy.pdf

[7] CrowdStrike, "2026 CrowdStrike Global Threat Report: AI Accelerated Adversaries," CrowdStrike, 2026. Available: https://www.crowdstrike.com/en-us/press-releases/2026-crowdstrike-global-threat-report/

[8] Filigran, "Clarifying Threat Intelligence Concepts: Threat Actors vs. Intrusion Sets," Filigran Blog, 2024. Available: https://filigran.io/cti-concepts-threat-actors-vs-intrusion-sets/

[9] OASIS, "TAXII Version 2.1," OASIS Standard, June 2021. Available: https://docs.oasis-open.org/cti/taxii/v2.1/

[10] V. Mavroeidis, R. Hohimer, T. Casey, and A. Jøsang, "Threat Actor Type Inference and Characterization within Cyber Threat Intelligence," in Proceedings of the 13th International Conference on Cyber Conflict (CyCon), NATO CCDCOE Publications, 2021, pp. 327-352. Available: https://ccdcoe.org/uploads/2021/05/CyCon_2021_Mavroeidis_Hohimer_Casey_Josang.pdf

[11] Kaspersky Global Research and Analysis Team, "Olympic Destroyer is Still Alive," Kaspersky Securelist Blog, October 2018.

[12] Palo Alto Networks Unit 42, "Unit 42's Attribution Framework: How We Investigate Cybersecurity Attacks," Unit 42 Blog, 2024. Available: https://unit42.paloaltonetworks.com/unit-42-attribution-framework/

[13] MISP Project, "MISP Open Source Threat Intelligence Platform and Open Standards for Threat Information Sharing," 2024. Available: https://www.misp-project.org/

[14] OpenCTI Platform, "OpenCTI: Open Cyber Threat Intelligence Platform," GitHub Repository, 2025. Available: https://github.com/OpenCTI-Platform/opencti

[15] Cosive, "MISP vs. OpenCTI: Updated 2025 Guide," Cosive Blog, 2025. Available: https://www.cosive.com/misp-vs-opencti

[16] MISP, "threat-actor-intelligence-server: REST Server for Threat Actor Lookup by Name, Synonym, or UUID," GitHub Repository, 2024. Available: https://github.com/MISP/threat-actor-intelligence-server

[17] Google Cloud, "Mandiant Threat Intelligence," Google Cloud Security Products, 2024. Available: https://cloud.google.com/security/products/mandiant-threat-intelligence

[18] CrowdStrike and Mandiant, "CrowdStrike and Mandiant Form Strategic Partnership to Protect Organizations Against Sophisticated Cybersecurity Events," CrowdStrike Press Release, 2024. Available: https://www.crowdstrike.com/en-us/press-releases/crowdstrike-mandiant-partner-to-protect-organizations-against-cyber-threats/

[19] European Parliament and Council, "Directive (EU) 2022/2555 on Measures for a High Common Level of Cybersecurity across the Union (NIS2 Directive)," Official Journal of the European Union, December 27, 2022.

[20] "A Comprehensive Survey of Advanced Persistent Threat Attribution: Taxonomy, Methods, Challenges and Open Research Problems," arXiv:2409.11415v3, September 2024. Available: https://arxiv.org/pdf/2409.11415v3

[21] M. Iannacone, S. Bohn, G. Nakamura, J. Gerber, K. Huffer, B. Bridges, S. Ferragut, and J. Goodall, "Developing an Ontology for Cyber Security Knowledge Graphs," in Proceedings of the 10th Annual Cyber and Information Security Research Conference, ACM, 2015.

[22] "Taxonomy for Cybersecurity Threat Attributes and Countermeasures in Smart Manufacturing Systems," arXiv:2401.01374, January 2024. Available: https://arxiv.org/pdf/2401.01374

[23] "On Technique Identification and Threat-Actor Attribution using LLMs," arXiv:2505.11547, 2025. Available: https://arxiv.org/html/2505.11547v1

[24] "MITRE ATT&CK Applications in Cybersecurity and The Way Forward," arXiv:2502.10825, February 2025. Available: https://arxiv.org/html/2502.10825v1

[25] tropChaud, "Categorized-Adversary-TTPs: Merge of MITRE ATT&CK and ETDA/ThaiCERT Threat Actor Cards," GitHub Repository, 2024. Available: https://github.com/tropChaud/Categorized-Adversary-TTPs

[26] Trail of Bits, "Threat Modeling the Trail of Bits Way: Personae Non Gratae and Structured Threat Decomposition," Trail of Bits Blog, February 28, 2025. Available: https://blog.trailofbits.com/2025/02/28/threat-modeling-the-trail-of-bits-way/

[27] Microsoft Security Blog, "Staying Ahead of Threat Actors in the Age of AI," February 14, 2024. Available: https://www.microsoft.com/en-us/security/blog/2024/02/14/staying-ahead-of-threat-actors-in-the-age-of-ai/

[28] MITRE, "ATT&CK Groups Database," MITRE ATT&CK Version 18.1, December 2025. Available: https://attack.mitre.org/groups/

[29] Qualys, "TruLens: Threat Actor Attribution for CVEs," Qualys Product Announcement, 2026.

[30] OASIS CTI Technical Committee, "STIX 2.1 Examples: Threat Actor and Intrusion Set Relationships," OASIS Open, 2021. Available: https://oasis-open.github.io/cti-documentation/stix/examples.html
