FEAT: Adding SeedAttackTechniqueGroup #1373

Open
rlundeen2 wants to merge 4 commits into Azure:main from rlundeen2:users/rlundeen/atomic_attack_identifier

@rlundeen2
Contributor

One problem we want to tackle is identifying unique attack techniques. As currently architected, a technique consists of two parts:

  1. An AttackIdentifier: This includes the attack, converters, targets, scorers, etc.
  2. A subset of the datasets, but not all of them. Certain datasets are general, such as a system prompt jailbreak or a simulated role-play conversation.

These are the factors we want to include when we calculate how successful an attack is, but the datasets are currently a gap.

This PR adds a way to distinguish general datasets using the is_general_strategy flag. To start, simulated conversations and jailbreaks will have it set by default; others will not.
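
As a rough illustration of how such a flag could be used, here is a minimal sketch that splits seeds into general and non-general groups. `ExampleSeed` and `split_by_strategy` are hypothetical names made up for this example and are not part of the PR; only the `is_general_strategy` field name and its `False` default come from the diff.

```python
from dataclasses import dataclass


@dataclass
class ExampleSeed:
    """Hypothetical stand-in for a seed; not the project's actual class."""

    value: str
    # Mirrors the default in the diff: seeds are not general unless marked.
    is_general_strategy: bool = False


def split_by_strategy(seeds):
    """Return (general, specific) lists based on the is_general_strategy flag."""
    general = [s for s in seeds if s.is_general_strategy]
    specific = [s for s in seeds if not s.is_general_strategy]
    return general, specific


seeds = [
    ExampleSeed("system prompt jailbreak", is_general_strategy=True),
    ExampleSeed("simulated role play conversation", is_general_strategy=True),
    ExampleSeed("an ordinary objective"),
]
general, specific = split_by_strategy(seeds)
```

In this sketch, only the seeds in `general` would be candidates for inclusion when identifying a technique.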

In a future PR, we'll introduce an AtomicAttackIdentifier to uniquely identify these.

assert jailbreak_string.template_source == "<string_template>"


def test_all_jailbreak_yaml_templates_have_is_general_strategy(jailbreak_dir):
Contributor Author

I liked making this a property of the data and then just testing that all jailbreaks have it. I debated other ways (like updating TextJailBreak to set it), but ultimately this seems like a property of the data, so I like this.

class SeedObjective(Seed):
    """Represents a seed objective with various attributes and metadata."""

    is_general_strategy: bool = False
Contributor

I understand the point is to use a boolean to distinguish between general datasets and strategic datasets, but could this be a set instead? I imagine that if users are already going to want to filter by how generic a dataset is, they may want to track other pieces of information as well. Something like:

    is_general_strategy: set = set()

tells the user "this is a generic dataset", while

    is_general_strategy: set = {"jailbreak", "simulation"}

allows the user to check for the presence of any tags (this provides the generic/strategic split desired) while also tracking more specific details.
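
A minimal sketch of the set-based idea, assuming the seed is a dataclass-style object. `TaggedSeed`, `strategy_tags`, and `has_strategy_tags` are hypothetical names used only for illustration. One detail worth noting: a bare `set()` default is rejected by Python dataclasses as a mutable default, so the sketch uses `field(default_factory=set)` instead.

```python
from dataclasses import dataclass, field


@dataclass
class TaggedSeed:
    """Hypothetical seed carrying a set of tags instead of a boolean."""

    value: str
    # default_factory gives every instance its own empty set; writing
    # `= set()` here would raise ValueError when the class is defined.
    strategy_tags: set[str] = field(default_factory=set)

    @property
    def has_strategy_tags(self) -> bool:
        """Recover the two-way split: empty set versus at least one tag."""
        return bool(self.strategy_tags)


untagged = TaggedSeed("an ordinary objective")
tagged = TaggedSeed("role play opener", strategy_tags={"jailbreak", "simulation"})
```

Checking `"jailbreak" in tagged.strategy_tags` gives the finer-grained filter, while `has_strategy_tags` keeps the simple yes/no question available.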

If I've just misunderstood the purpose of this flag, please feel free to say so 😄. This may be overengineering, or only worth revisiting in a follow-up PR.

parameters:
- prompt
data_type: text
is_general_strategy: true
Contributor

On my later point in the review, this also lets us treat the flag as an optional field, so new or legacy datasets can be treated as generic by default, while strategic datasets get a new field that makes the distinction explicit rather than implicit.

missing = []
for yaml_file in yaml_files:
    seed = SeedPrompt.from_yaml_file(yaml_file)
    if not seed.is_general_strategy:
Contributor

Does this raise an error for non-boolean values, and if so, do we want to catch it explicitly or let SeedPrompt handle it? Curious if we see this evolving such that explicitly flagging as false vs mistyping vs omitting the field are meaningfully different failure modes
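
For context on the question, a plain `if not value` check in Python does not raise for non-boolean values; it falls back to truthiness, so a mistyped string such as "false" would count as true. Whether SeedPrompt.from_yaml_file validates the type before this point is not shown in the diff. The sketch below shows one way the three cases could be told apart on a raw mapping; `classify_flag` is a hypothetical helper, not an existing function.

```python
def classify_flag(raw: dict) -> str:
    """Classify the is_general_strategy entry of a raw (hypothetical) YAML mapping."""
    if "is_general_strategy" not in raw:
        return "omitted"
    value = raw["is_general_strategy"]
    # Anything other than a real bool (e.g. the string "false") is mistyped.
    if not isinstance(value, bool):
        return "mistyped"
    return "true" if value else "false"


# Truthiness alone cannot separate these cases: a non-empty string is truthy.
string_false_is_truthy = bool("false")
```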
