FIX Support errors in MultiPromptSendingAttack, add safe completion support to SelfAskRefusalScorer #1366
fdubut wants to merge 6 commits into Azure:main
Conversation
@@ -0,0 +1,185 @@
name: Refusal Scorer, No Safe Completions Allowed
The original should also consider safe responses a refusal, so this shouldn't necessarily be something that distinguishes the two; that said, this prompt might handle that case better.
One important thing: we need to add to our refusal metrics. Can you create a story to update the metrics (update refusal.csv) and include examples where the old prompt was mis-scoring? Then we can run evals and see which is better over the entire dataset.
Are you suggesting renaming the yaml or something else?
validator (Optional[ScorerPromptValidator]): Custom validator. Defaults to None.
score_aggregator (TrueFalseAggregatorFunc): The aggregator function to use.
    Defaults to TrueFalseScoreAggregator.OR.
allow_safe_completions (bool): Whether to allow safe completions.
Rather than this, can we update it so we have an enum of the appropriate SeedPrompts? Again, consider a name that might make more sense, but even something like this is fine:
class RefusalSystemPromptPaths(enum.Enum):
    """Paths to refusal scorer system prompt YAML files."""

    WITH_OBJECTIVE = Path(SCORER_SEED_PROMPT_PATH, "refusal", "refusal_with_objective.yaml").resolve()
    WITHOUT_OBJECTIVE = Path(SCORER_SEED_PROMPT_PATH, "refusal", "refusal_without_objective.yaml").resolve()
    NO_SAFE_COMPLETIONS = Path(SCORER_SEED_PROMPT_PATH, "refusal", "refusal_no_safe_completions.yaml").resolve()
Also, the objective logic likely doesn't matter much anymore since everyone passes an objective now; we can get rid of it and just use the system prompt that is passed in.
def __init__(
    self,
    *,
    chat_target: PromptChatTarget,
    system_prompt: Optional[SeedPrompt] = None,
    validator: Optional[ScorerPromptValidator] = None,
    score_aggregator: TrueFalseAggregatorFunc = TrueFalseScoreAggregator.OR,
) -> None:
    # if None make sure to set the default `system_prompt`
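A minimal sketch of what that default resolution could look like, assuming the RefusalSystemPromptPaths enum proposed above and WITH_OBJECTIVE as the default; this is illustrative, not the final implementation:

```python
# Inside __init__: fall back to a default seed prompt when none is passed.
# RefusalSystemPromptPaths is the enum sketched above (an assumption here).
if system_prompt is None:
    system_prompt = SeedPrompt.from_yaml_file(RefusalSystemPromptPaths.WITH_OBJECTIVE.value)
self._system_prompt = system_prompt.value
```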
One gotcha: we need to make sure all templates work whether or not we pass objectives. If they don't, it should only be a small update to the template, or we can pass an empty string for the objective, or something like that.
Our other self-ask scorers pass a system prompt path rather than a system prompt into init. We should probably follow that pattern here, right?
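If we do follow that pattern, a hedged sketch of the signature might look like the following; the system_prompt_path name is an assumption mirroring the other scorers, not the actual parameter:

```python
def __init__(
    self,
    *,
    chat_target: PromptChatTarget,
    # Assumed name, mirroring the path-based pattern of the other self-ask scorers.
    system_prompt_path: Optional[Path] = None,
    validator: Optional[ScorerPromptValidator] = None,
    score_aggregator: TrueFalseAggregatorFunc = TrueFalseScoreAggregator.OR,
) -> None:
    path = system_prompt_path or RefusalSystemPromptPaths.WITH_OBJECTIVE.value
    self._system_prompt = SeedPrompt.from_yaml_file(path).value
```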
Right now, using the system prompt that disallows safe completions requires an objective. Do we want it to work without an objective?
allow_safe_completions (bool): Whether to allow safe completions.
    Safe completions can be disallowed only if an objective is provided. This is enforced at
    scoring time since the same scorer instance can be used to score with and without objectives.
    Defaults to True.
As an aside, while we're here, it would be good to make prompt_value_template an __init__ arg rather than hard-coded.
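A small sketch of that change; the default template string below is purely illustrative, not the value currently hard-coded in the scorer:

```python
# Sketch only: accept the template as an __init__ arg instead of hard-coding it.
# The default string is a placeholder, not the scorer's real template.
def __init__(
    self,
    *,
    prompt_value_template: str = "objective: {objective}\nresponse: {response}",
    # ... other parameters as above ...
) -> None:
    self._prompt_value_template = prompt_value_template
```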
    SeedPrompt.from_yaml_file(REFUSAL_SCORE_SYSTEM_PROMPT_WITH_OBJECTIVE)
).value
self._system_prompt_without_objective = (
    SeedPrompt.from_yaml_file(REFUSAL_SCORE_SYSTEM_PROMPT_WITH_OBJECTIVE)
With the changes, make sure to update _build_identifier so evals work properly
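A hypothetical sketch of the kind of update meant here; the field names and the exact shape of _build_identifier in the codebase are assumptions:

```python
def _build_identifier(self) -> dict:
    # Include the chosen system prompt so eval runs can distinguish scorer variants.
    # Attribute and key names here are illustrative assumptions.
    return {
        "__type__": self.__class__.__name__,
        "system_prompt": str(self._system_prompt_path),
    }
```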
Description
A couple of fixes:
- MultiPromptSendingAttack: Currently the attack fails with an uncaught exception if one of the intermediate prompts returns a moderation error. With the fix, it fails gracefully.
- SelfAskRefusalScorer: Currently the scorer leans towards "not a refusal" if the model returns a safe completion (which most modern models post-GPT-5 will do). With the added option, safe completions are considered a refusal. The default is unchanged; this is an additional template that users can select when they instantiate the scorer.

Tests and Documentation
- SelfAskRefusalScorer throws an exception when no objective is provided and safe completions are disallowed (the scorer needs to know the objective to assess whether this was a "safe" completion or a "true" completion).
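A minimal sketch of that check, assuming an allow_safe_completions flag and an objective argument at scoring time; the names come from this PR's docstring, the rest is illustrative:

```python
# Scoring-time guard: disallowing safe completions requires an objective.
if not self._allow_safe_completions and not objective:
    raise ValueError(
        "An objective is required when safe completions are disallowed; "
        "without it the scorer cannot tell a safe completion from a true completion."
    )
```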