Merged
73 changes: 36 additions & 37 deletions doc/code/memory/3_memory_data_types.md
@@ -78,50 +78,25 @@ flowchart

This architecture is plumbed throughout PyRIT, providing flexibility to interact with various modalities seamlessly. All pieces are stored in the database as individual `MessagePieces` and are reassembled when needed. The `PromptNormalizer` automatically adds these to the database as prompts are sent.

## SeedPrompts
## Seeds

[`SeedPrompt`](../../../pyrit/models/seeds/seed_prompt.py) objects represent the starting points of conversations. They are used to assemble and initiate attacks, and can be translated to and from `MessagePieces`.
All seed types inherit from [`Seed`](../../../pyrit/models/seeds/seed.py), which provides common fields (`value`, `value_sha256`, `dataset_name`, `harm_categories`, `is_general_technique`, `metadata`, etc.) along with Jinja2 templating and YAML loading support.

**Key Fields:**

- **`value`**: The actual prompt text or file path
- **`data_type`**: Type of data (e.g., `text`, `image_path`, `audio_path`)
- **`name`**: Name of the prompt
- **`dataset_name`**: Name of the dataset this prompt belongs to
- **`harm_categories`**: Categories of harm associated with this prompt
- **`description`**: Description of the prompt's purpose or content
- **`parameters`**: Template parameters that can be filled in dynamically
- **`prompt_group_id`**: Groups related prompts together
- **`role`**: Role in conversation (e.g., `user`, `assistant`)
- **`metadata`**: Arbitrary metadata that can be attached

`SeedPrompts` support Jinja2 templating, allowing dynamic prompt generation with parameter substitution. They can be loaded from YAML files and organized into datasets and groups for systematic testing.

## SeedObjectives
### Seed Types

[`SeedObjective`](../../../pyrit/models/seeds/seed_objective.py) objects represent the goal or objective of an attack or test scenario. They describe what the attacker is trying to achieve and are used alongside `SeedPrompts` to define complete attack scenarios.
- [`SeedPrompt`](../../../pyrit/models/seeds/seed_prompt.py) — A prompt to send to a target. Adds `data_type` (text, image_path, audio_path, etc.), `role` (user/assistant), `sequence` (for multi-turn ordering), and template `parameters`. This is the most common seed type and can be translated to and from `MessagePieces`.

**Key Fields:**
- [`SeedObjective`](../../../pyrit/models/seeds/seed_objective.py) — The goal of an attack (e.g., "Generate hate speech content"). Always text. Cannot be a general technique.

- **`value`**: The objective statement describing the goal (e.g., "Generate hate speech content")
- **`data_type`**: Always `text` for objectives
- **`name`**: Name identifying the objective
- **`dataset_name`**: Name of the dataset this objective belongs to
- **`harm_categories`**: Categories of harm the objective relates to
- **`authors`**: Attribution information for the objective
- **`groups`**: Group affiliations (e.g., "AI Red Team")
- **`source`**: Source or reference for the objective
- **`metadata`**: Additional metadata about the objective
- [`SeedSimulatedConversation`](../../../pyrit/models/seeds/seed_simulated_conversation.py) — Configuration for dynamically generating multi-turn conversations. Specifies system prompt paths, number of turns, and sequence offsets. The actual generation happens in the executor layer.

`SeedObjectives` support Jinja2 templating for dynamic objective generation and can be loaded from YAML files alongside prompts, making it easy to organize and reuse test objectives across different scenarios.
### Seed Groups

**Relationship to SeedGroups:**
Seeds are organized into [`SeedGroup`](../../../pyrit/models/seeds/seed_group.py) containers that enforce consistency (shared `prompt_group_id`, valid role sequences, no duplicate sequence numbers). Two specialized subclasses add further constraints:

`SeedObjective` and `SeedPrompt` objects are combined into [`SeedGroup`](../../../pyrit/models/seeds/seed_group.py) objects, which represent a complete test case with optional seed prompts and an objective. A SeedGroup can contain:
- [`SeedAttackGroup`](../../../pyrit/models/seeds/seed_attack_group.py) — Requires exactly one `SeedObjective`. Represents a complete attack specification: an objective plus optional prompts or simulated conversation config.

- Multiple prompts (for multi-turn conversations)
- A single objective (what the attack is trying to achieve)
- Both prompts and an objective (complete attack specification)
- [`SeedAttackTechniqueGroup`](../../../pyrit/models/seeds/seed_attack_technique_group.py) — All seeds must have `is_general_technique=True` and no `SeedObjective` is allowed. Represents reusable attack techniques (jailbreaks, role-plays, etc.) that can be composed with any objective.
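
The two subclass constraints above can be sketched with stand-in classes. This is a minimal illustration, not PyRIT's implementation: `SketchSeed`, `validate_attack_group`, and `validate_technique_group` are hypothetical names, and only the rules stated in the bullets are modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchSeed:
    """Hypothetical stand-in for a seed; not the real pyrit Seed class."""

    value: str
    is_objective: bool = False
    is_general_technique: bool = False


def validate_attack_group(seeds: list[SketchSeed]) -> None:
    """Mirror the SeedAttackGroup rule: exactly one objective."""
    objectives = [s for s in seeds if s.is_objective]
    if len(objectives) != 1:
        raise ValueError(f"expected exactly one objective, found {len(objectives)}")


def validate_technique_group(seeds: list[SketchSeed]) -> None:
    """Mirror the SeedAttackTechniqueGroup rule: techniques only, no objective."""
    if any(s.is_objective for s in seeds):
        raise ValueError("technique groups cannot contain an objective")
    if not all(s.is_general_technique for s in seeds):
        raise ValueError("every seed must have is_general_technique=True")


objective = SketchSeed(value="Generate hate speech content", is_objective=True)
jailbreak = SketchSeed(value="Answer as a fictional character.", is_general_technique=True)

validate_attack_group([objective, jailbreak])  # passes: one objective plus a prompt
validate_technique_group([jailbreak])          # passes: techniques only
```

Keeping the checks in the group rather than in each seed means a technique seed can be reused across many objectives without being revalidated.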


## Scores
@@ -150,8 +125,8 @@ Scores enable automated evaluation of attack success, content harmfulness, and o

- **`conversation_id`**: The conversation that produced this result
- **`objective`**: Natural-language description of the attacker's goal
- **`attack_identifier`**: Information identifying the attack strategy used
- **`atomic_attack_identifier`**: Composite identifier combining the attack strategy with general technique seed identifiers from the dataset
- **`attack_identifier`**: `ComponentIdentifier` identifying the attack strategy used
- **`atomic_attack_identifier`**: Composite `ComponentIdentifier` combining the attack technique with seed identifiers from the dataset (see [ComponentIdentifiers](#componentidentifiers) below)
- **`last_response`**: The final `MessagePiece` generated in the attack
- **`last_score`**: The final score assigned to the last response
- **`executed_turns`**: Number of turns executed in the attack
@@ -162,3 +137,27 @@ Scores enable automated evaluation of attack success, content harmfulness, and o
- **`metadata`**: Arbitrary metadata about the attack execution

`AttackResult` objects provide comprehensive reporting on attack campaigns, enabling analysis of red teaming effectiveness and vulnerability identification.

## ComponentIdentifiers

[`ComponentIdentifier`](../../../pyrit/identifiers/component_identifier.py) is an immutable snapshot of a component's behavioral configuration. A single type is used for all components — targets, scorers, converters, and attacks — enabling uniform storage and composition.

**Key Fields:**

- **`class_name`** / **`class_module`**: The Python class and module of the component
- **`params`**: Behavioral parameters (e.g., `temperature`, `model_name`)
- **`children`**: Named child identifiers for composition (e.g., a scorer's `prompt_target`)
- **`hash`**: Content-addressed SHA256 hash computed from class, params, and children

Identifiers are content-addressed: the same configuration always produces the same hash, and any change to params or children produces a different one. PyRIT relies on this throughout to track exactly which configuration produced a given result.
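
A rough sketch of content addressing follows. `SketchIdentifier` is a hypothetical stand-in whose field names follow the list above; the canonical JSON encoding is an assumption chosen for illustration and may differ from what `ComponentIdentifier` actually hashes.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class SketchIdentifier:
    """Hypothetical stand-in for ComponentIdentifier; not the real class."""

    class_name: str
    class_module: str
    params: dict[str, Any] = field(default_factory=dict)
    children: dict[str, "SketchIdentifier"] = field(default_factory=dict)

    @property
    def hash(self) -> str:
        # Sorted keys give a canonical form, so dict ordering never changes the digest.
        payload = {
            "class_name": self.class_name,
            "class_module": self.class_module,
            "params": self.params,
            # Each child contributes its own hash, so changes propagate upward.
            "children": {name: child.hash for name, child in self.children.items()},
        }
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


target_a = SketchIdentifier("ChatTarget", "example.targets", params={"temperature": 0.7})
target_b = SketchIdentifier("ChatTarget", "example.targets", params={"temperature": 0.7})
target_c = SketchIdentifier("ChatTarget", "example.targets", params={"temperature": 0.9})

print(target_a.hash == target_b.hash)  # same configuration, same hash
print(target_a.hash == target_c.hash)  # changed param, different hash
```

Because a parent hashes its children's hashes, swapping a nested component (for example, a scorer's target) changes the parent's hash as well.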

### Composite Identifiers

For atomic attacks, `build_atomic_attack_identifier` composes a tree of identifiers:

- **`attack_technique`** — the attack strategy and its children (target, converters, scorer, technique seeds)
- **`seed_identifiers`** — all seeds from the seed group, for traceability
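
To illustrate that tree, the sketch below models a serialized identifier as plain dictionaries and reads the attack's class name from its nested location, falling back to the flat layout used by rows written before the nesting was introduced. The helper `attack_class_name` and the sample class name `ExampleAttack` are hypothetical; the key names (`children`, `attack_technique`, `attack`, `seed_identifiers`) are the ones used in this change.

```python
from typing import Any, Optional


def attack_class_name(atomic: dict[str, Any]) -> Optional[str]:
    """Return the attack strategy's class name from a serialized identifier."""
    children = atomic.get("children", {})
    technique = children.get("attack_technique")
    if isinstance(technique, dict):
        # Current layout: the attack is a child of the technique node.
        attack = technique.get("children", {}).get("attack")
    else:
        # Older layout: the attack sat directly under the root's children.
        attack = children.get("attack")
    return attack.get("class_name") if isinstance(attack, dict) else None


nested = {
    "class_name": "AtomicAttack",
    "children": {
        "attack_technique": {
            "class_name": "AttackTechnique",
            "children": {"attack": {"class_name": "ExampleAttack"}},
        },
        "seed_identifiers": [],
    },
}
legacy = {
    "class_name": "AtomicAttack",
    "children": {"attack": {"class_name": "ExampleAttack"}, "seeds": []},
}

print(attack_class_name(nested))
print(attack_class_name(legacy))
```

The nested lookup corresponds to the JSON path `$.children.attack_technique.children.attack.class_name`.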

### Eval Hashing

[`EvaluationIdentifier`](../../../pyrit/identifiers/evaluation_identifier.py) subclasses wrap a `ComponentIdentifier` and compute a separate **eval hash** with operational params (such as endpoint URLs) stripped, so the same logical configuration on different deployments produces the same hash. This enables grouping equivalent runs for evaluation comparison.
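
The idea can be sketched as two hashes over the same params, one of which drops operational keys first. Everything here is illustrative: `full_hash`, `eval_hash`, and the `OPERATIONAL_PARAMS` denylist are hypothetical, and the real rules are expressed per child (for example, allowlists via `included_params`) rather than as a single denylist.

```python
import hashlib
import json
from typing import Any

# Assumed for illustration: which params count as "operational".
OPERATIONAL_PARAMS = frozenset({"endpoint"})


def _digest(payload: dict[str, Any]) -> str:
    """SHA256 over a canonical (sorted-key) JSON encoding."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def full_hash(class_name: str, params: dict[str, Any]) -> str:
    """Hash every param, so any difference changes the result."""
    return _digest({"class_name": class_name, "params": params})


def eval_hash(class_name: str, params: dict[str, Any]) -> str:
    """Hash only behavioral params by dropping operational ones first."""
    behavioral = {k: v for k, v in params.items() if k not in OPERATIONAL_PARAMS}
    return _digest({"class_name": class_name, "params": behavioral})


east = {"model_name": "example-model", "temperature": 0.7, "endpoint": "https://east.example.com"}
west = {"model_name": "example-model", "temperature": 0.7, "endpoint": "https://west.example.com"}

print(full_hash("ChatTarget", east) == full_hash("ChatTarget", west))  # deployments differ
print(eval_hash("ChatTarget", east) == eval_hash("ChatTarget", west))  # logical config matches
```

Keeping both hashes lets a result stay traceable to its exact deployment while still being grouped with equivalent runs.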
15 changes: 14 additions & 1 deletion pyrit/backend/services/attack_service.py
@@ -738,7 +738,20 @@ async def _update_attack_after_message_async(
if ar.atomic_attack_identifier:
atomic = ComponentIdentifier.from_dict(ar.atomic_attack_identifier.to_dict())
atomic_children = dict(atomic.children)
atomic_children["attack"] = new_aid
# Navigate into attack_technique child to update the nested attack child.
technique = atomic_children.get("attack_technique")
if isinstance(technique, ComponentIdentifier):
tech_children = dict(technique.children)
tech_children["attack"] = new_aid
atomic_children["attack_technique"] = ComponentIdentifier(
class_name=technique.class_name,
class_module=technique.class_module,
params=dict(technique.params),
children=tech_children,
)
else:
# Fallback for pre-nesting rows with children["attack"] directly.
atomic_children["attack"] = new_aid
new_atomic = ComponentIdentifier(
class_name=atomic.class_name,
class_module=atomic.class_module,
2 changes: 1 additions & 1 deletions pyrit/identifiers/__init__.py
@@ -24,8 +24,8 @@
__all__ = [
"AtomicAttackEvaluationIdentifier",
"build_atomic_attack_identifier",
"ChildEvalRule",
"build_seed_identifier",
"ChildEvalRule",
"class_name_to_snake_case",
"ComponentIdentifier",
"compute_eval_hash",
75 changes: 49 additions & 26 deletions pyrit/identifiers/atomic_attack_identifier.py
@@ -8,15 +8,17 @@
by combining the attack strategy's identity with the seed identifiers from
the dataset.

The composite identifier always has the same shape:
class_name = "AtomicAttack"
children["attack"] = attack strategy's ComponentIdentifier
children["seeds"] = list of seed ComponentIdentifiers
(may be empty when no seeds are present)
The composite identifier has this shape::

AtomicAttack
├── attack_technique (class_name="AttackTechnique")
│ ├── attack (attack strategy's ComponentIdentifier)
│ └── technique_seeds (optional, list of seed ComponentIdentifiers)
└── seed_identifiers (list of ALL seed ComponentIdentifiers, for traceability)
"""

import logging
from typing import TYPE_CHECKING, Any, Optional
from typing import TYPE_CHECKING, Any

from pyrit.identifiers.component_identifier import ComponentIdentifier

@@ -30,6 +32,9 @@
_ATOMIC_ATTACK_CLASS_NAME = "AtomicAttack"
_ATOMIC_ATTACK_CLASS_MODULE = "pyrit.scenario.core.atomic_attack"

_ATTACK_TECHNIQUE_CLASS_NAME = "AttackTechnique"
_ATTACK_TECHNIQUE_CLASS_MODULE = "pyrit.scenario.core.attack_technique"


def build_seed_identifier(seed: "Seed") -> ComponentIdentifier:
"""
@@ -40,10 +45,10 @@ def build_seed_identifier(seed: "Seed") -> ComponentIdentifier:
always produces the same identifier.

Args:
seed (Seed): The seed to build an identifier for.
seed: The seed to build an identifier for.

Returns:
ComponentIdentifier: An identifier capturing the seed's behavioral properties.
An identifier capturing the seed's behavioral properties.
"""
params: dict[str, Any] = {
"value": seed.value,
@@ -61,39 +66,57 @@ def build_seed_identifier(seed: "Seed") -> ComponentIdentifier:

def build_atomic_attack_identifier(
*,
attack_identifier: ComponentIdentifier,
seed_group: Optional["SeedGroup"] = None,
technique_identifier: ComponentIdentifier | None = None,
attack_identifier: ComponentIdentifier | None = None,
seed_group: "SeedGroup | None" = None,
) -> ComponentIdentifier:
"""
Build a composite ComponentIdentifier for an atomic attack.

Combines the attack strategy's identity with identifiers for all seeds
from the seed group. Every seed in the group is included in the identity;
each seed's ``is_general_technique`` flag is captured as a param so that
downstream consumers (e.g., evaluation identity) can filter as needed.
The identifier places the attack technique in ``children["attack_technique"]``
and all seeds from the seed group in ``children["seed_identifiers"]`` for traceability.

When no seed_group is provided, the resulting identifier has an empty
``seeds`` children list, but still has the standard ``AtomicAttack``
shape for consistent querying.
Callers that have an ``AttackTechnique`` object should pass
``technique_identifier=attack_technique.get_identifier()``.
Callers that only have a raw attack strategy identifier (e.g. legacy
backward-compat paths) can pass ``attack_identifier`` instead, which is
wrapped in a minimal technique node automatically.

Args:
attack_identifier (ComponentIdentifier): The attack strategy's identifier
(from ``attack.get_identifier()``).
seed_group (Optional[SeedGroup]): The seed group to extract seeds from.
If None, the identifier has an empty seeds list.
technique_identifier: Pre-built technique identifier from
``AttackTechnique.get_identifier()``. Mutually exclusive with
``attack_identifier``.
attack_identifier: Raw attack strategy identifier. Used when no
``AttackTechnique`` instance is available. Mutually exclusive
with ``technique_identifier``.
seed_group: The seed group to extract all seeds from.

Returns:
ComponentIdentifier: A composite identifier with class_name="AtomicAttack",
the attack as a child, and seed identifiers as children.
A composite ComponentIdentifier with class_name="AtomicAttack".

Raises:
ValueError: If both or neither of ``technique_identifier`` and
``attack_identifier`` are provided.
"""
seed_identifiers: list[ComponentIdentifier] = []
if technique_identifier is not None and attack_identifier is not None:
raise ValueError("Provide technique_identifier or attack_identifier, not both")

if technique_identifier is None:
if attack_identifier is None:
raise ValueError("Either technique_identifier or attack_identifier must be provided")
technique_identifier = ComponentIdentifier(
class_name=_ATTACK_TECHNIQUE_CLASS_NAME,
class_module=_ATTACK_TECHNIQUE_CLASS_MODULE,
children={"attack": attack_identifier},
)

seed_identifiers: list[ComponentIdentifier] = []
if seed_group is not None:
seed_identifiers.extend(build_seed_identifier(seed) for seed in seed_group.seeds)

children: dict[str, Any] = {
"attack": attack_identifier,
"seeds": seed_identifiers,
"attack_technique": technique_identifier,
"seed_identifiers": seed_identifiers,
}

return ComponentIdentifier(
17 changes: 10 additions & 7 deletions pyrit/identifiers/evaluation_identifier.py
@@ -220,14 +220,17 @@ class AtomicAttackEvaluationIdentifier(EvaluationIdentifier):

Per-child rules:

* ``seed_identifiers`` — excluded entirely (present for traceability only).
* ``attack_technique`` — not listed, so fully included by default.
Its nested children (``objective_target``, ``adversarial_chat``,
``objective_scorer``, ``technique_seeds``) are processed recursively
using the same rules dict, so the rules below apply at any depth.
* ``objective_target`` — include only ``temperature``.
* ``adversarial_chat`` — include ``model_name``, ``temperature``, ``top_p``.
* ``objective_scorer`` — excluded entirely.
* ``seeds`` — include only items where ``is_general_technique=True``.

Non-target children (e.g., ``request_converters``, ``response_converters``)
receive full recursive eval treatment, meaning they fully contribute to
the hash.
Non-target children (e.g., ``request_converters``, ``response_converters``,
``technique_seeds``) receive full recursive eval treatment.
"""

CHILD_EVAL_RULES: ClassVar[dict[str, ChildEvalRule]] = {
@@ -238,7 +241,7 @@ class AtomicAttackEvaluationIdentifier(EvaluationIdentifier):
included_params=frozenset({"model_name", "temperature", "top_p"}),
),
"objective_scorer": ChildEvalRule(exclude=True),
"seeds": ChildEvalRule(
included_item_values={"is_general_technique": True},
),
"seed_identifiers": ChildEvalRule(exclude=True),
# attack_technique: not listed in rules — fully included in eval hash.
# technique_seeds (nested inside attack_technique): also not listed — fully included.
}
8 changes: 4 additions & 4 deletions pyrit/memory/azure_sql_memory.py
@@ -483,19 +483,19 @@ def get_unique_attack_class_names(self) -> list[str]:
rows = session.execute(
text(
"""SELECT DISTINCT JSON_VALUE(atomic_attack_identifier,
'$.children.attack.class_name') AS cls
'$.children.attack_technique.children.attack.class_name') AS cls
FROM "AttackResultEntries"
WHERE ISJSON(atomic_attack_identifier) = 1
AND JSON_VALUE(atomic_attack_identifier,
'$.children.attack.class_name') IS NOT NULL"""
'$.children.attack_technique.children.attack.class_name') IS NOT NULL"""
)
).fetchall()
return sorted(row[0] for row in rows)

def get_unique_converter_class_names(self) -> list[str]:
"""
Azure SQL implementation: extract unique converter class_name values
from the children.attack.children.request_converters array
from the children.attack_technique.children.attack.children.request_converters array
in the atomic_attack_identifier JSON column.

Returns:
@@ -507,7 +507,7 @@ def get_unique_converter_class_names(self) -> list[str]:
"""SELECT DISTINCT JSON_VALUE(c.value, '$.class_name') AS cls
FROM "AttackResultEntries"
CROSS APPLY OPENJSON(JSON_QUERY(atomic_attack_identifier,
'$.children.attack.children.request_converters')) AS c
'$.children.attack_technique.children.attack.children.request_converters')) AS c
WHERE ISJSON(atomic_attack_identifier) = 1
AND JSON_VALUE(c.value, '$.class_name') IS NOT NULL"""
)
4 changes: 2 additions & 2 deletions pyrit/memory/memory_interface.py
@@ -1537,7 +1537,7 @@ def get_attack_results(
conditions.append(
self._get_condition_json_property_match(
json_column=AttackResultEntry.atomic_attack_identifier,
property_path="$.children.attack.class_name",
property_path="$.children.attack_technique.children.attack.class_name",
value=attack_class,
case_sensitive=True,
)
@@ -1549,7 +1549,7 @@ def get_attack_results(
conditions.append(
self._get_condition_json_array_match(
json_column=AttackResultEntry.atomic_attack_identifier,
property_path="$.children.attack.children.request_converters",
property_path="$.children.attack_technique.children.attack.children.request_converters",
array_element_path="$.class_name",
array_to_match=converter_classes,
)