Restructure system prompt with softer tone by akseljoonas · Pull Request #33 · huggingface/ml-intern

akseljoonas · 2026-04-09T11:17:59Z

Summary

Adopts the research loop structure, SOTA check, scope discipline, and consolidated sections from the imperative prompt rewrite
Restores the original advisory tone — removes "MUST", "No exceptions", "These are not hypothetical" language
Same content and rules, presented as guidance rather than commands

Test plan

Review diff to confirm no content/rules were lost
Run agent on a sample task to verify prompt behavior is unchanged

Adopt the research loop structure, SOTA check, scope discipline, and consolidated sections from the imperative rewrite, but restore the original advisory tone throughout.

akseljoonas · 2026-04-24T14:08:28Z

@claude please review this PR. Focus on correctness, security, LiteLLM/Bedrock routing impact, and whether it's safe to merge against current main. Keep it concise and prefer inline comments where it matters.

claude · 2026-04-24T14:08:47Z

Claude finished @akseljoonas's task in 1m 59s

PR Review: Restructure system prompt with softer tone

Read the changed file (agent/prompts/system_prompt_v3.yaml)
Diff against origin/main
Check for dropped rules and regressions
Assess correctness, security, LiteLLM/Bedrock routing impact, merge safety

Verdict: NEEDS CHANGES before merge. Three behavioral regressions; otherwise a clean, well-structured rewrite.

Security / Routing impact

None. This file is a pure YAML system prompt with no routing logic, credentials, or infrastructure references. LiteLLM/Bedrock routing is unaffected.

Rebase status

This branch is missing 1f3fd90 (ci: add automatic Claude Code review on PRs) which landed on main after this branch was cut. Needs rebase before merging to avoid a stale-history merge commit.

Rule-by-rule analysis

❌ HARDCODED UNAVAILABLE PACKAGES — fully dropped, no equivalent

The old prompt had an explicit mistake bullet:

"You will forget to install necessary packages like 'flash-attn' for flash_attention_2 or other packages that aren't automatically installed in the job environment. Fix: install necessary packages before running the job."

The new prompt has no equivalent anywhere — not in the mistakes list, not in error recovery, not in the pre-flight check. This is the most impactful regression: an agent using flash_attention_2 will submit a job, the job will fail with a missing-package error, and the prompt gives no guidance on why this happens or what to do. Suggest adding it back to # Common mistakes without research or to the pre-flight check items.
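As a rough illustration of what a restored pre-flight item could check, here is a minimal sketch that reports which required modules are not importable. The `FEATURE_REQUIREMENTS` mapping and both function names are hypothetical and not part of the prompt or the repo; the only assumption about a real package is that `flash-attn` is imported as `flash_attn`. The check only reflects the environment it runs in, so it would have to execute inside the job environment (for example at the top of the job script) to be meaningful.

```python
import importlib.util

# Hypothetical mapping from an enabled feature to the top-level modules it needs.
FEATURE_REQUIREMENTS = {"flash_attention_2": ["flash_attn"]}


def missing_packages(import_names):
    """Return the top-level import names that are not installed here."""
    missing = []
    for name in import_names:
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


def preflight(features):
    """Collect the modules required by the given features and report the missing ones."""
    required = [
        module
        for feature in features
        for module in FEATURE_REQUIREMENTS.get(feature, [])
    ]
    return missing_packages(required)
```

A non-empty result from `preflight(["flash_attention_2"])` would be the signal to install the package before submitting the job.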

⚠️ Data audit — meaningfully reduced

Old prompt had a dedicated # Data audit section instructing the agent to inspect: schema/columns, row counts per split, value distributions, sample rows, class imbalance, missing values, outliers, duplicate rows — with the note "Looking at data is the best way to boost performance of any ML model."

New prompt's Validate step (step 2) only says "confirm columns match the training method." The `# Dataset format requirements` section is just a column-name lookup table, not a quality audit. This loses the distribution, imbalance, and outlier checks that catch subtle data bugs before they surface as training failures. At minimum, the pre-flight check item `Dataset format verified: [columns confirmed via hf_inspect_dataset]` should be extended so that it also prompts for schema and sample-row inspection.
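To make the lost checks concrete, the sketch below audits one split given as a list of row dictionaries. It is an illustration only: `audit_split` and its report keys are hypothetical names, and it does not use or stand in for `hf_inspect_dataset`. It covers row count, columns, missing values, exact duplicate rows, label distribution, and a few sample rows; outlier detection is left out.

```python
from collections import Counter


def audit_split(rows, label_column=None):
    """Summarize one dataset split given as a list of dict rows."""
    # Union of keys across rows, so ragged rows are still covered.
    columns = sorted({key for row in rows for key in row})

    # Count rows where a column is absent or None.
    missing = Counter()
    for row in rows:
        for col in columns:
            if row.get(col) is None:
                missing[col] += 1

    # Turn each row into a hashable fingerprint to count exact duplicates.
    fingerprints = Counter(
        tuple((col, repr(row.get(col))) for col in columns) for row in rows
    )
    duplicate_rows = sum(count - 1 for count in fingerprints.values())

    # Label distribution exposes class imbalance when a label column is given.
    label_counts = Counter(row.get(label_column) for row in rows) if label_column else Counter()

    return {
        "num_rows": len(rows),
        "columns": columns,
        "missing": dict(missing),
        "duplicate_rows": duplicate_rows,
        "label_counts": dict(label_counts),
        "sample": rows[:3],
    }
```

Running this per split and reading the report before training is one way to keep the spirit of the old section without restoring its full text.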

⚠️ Task completion guardrail — gone for non-autonomous mode

The old # Task completion section contained:

"Do not stop after describing what you plan to do. Continue calling tools until the task is verifiably done."
"Did you actually DO what the user asked, not just explain what you would do?"

These are not duplicated anywhere in the new prompt for supervised (non-headless) mode. The headless section preserves "never respond with only text" but that doesn't apply when a human is present. Without this, a model may describe steps and then stop, treating the description as the deliverable.

✅ SILENT DATASET SUBSTITUTION — preserved, distributed across two sections

Covered in Error recovery item 5 ("Do not silently substitute datasets, models, or training methods") and Scope discipline ("Dataset load fails → tell the user, not silently switch to a different dataset"). Not a regression.

✅ BATCH FAILURES — preserved in `# Batch jobs` section

Well-covered. The new layout is cleaner.

✅ SCOPE-CHANGING FIXES — improved

The new # Scope discipline section is more organized and easier to scan than the old wall-of-text bullet. Good addition.

✅ Research loop / SOTA check — improved

The 6-step research loop and dedicated SOTA check section are genuine improvements over the old citation-crawl workflow description. These should be kept.

Summary

| Change | Status |
| --- | --- |
| Tone softening | ✅ No behavioral change |
| SOTA check section | ✅ Improvement |
| Scope discipline section | ✅ Improvement |
| HARDCODED UNAVAILABLE PACKAGES | ❌ Dropped, no equivalent |
| Data audit depth | ⚠️ Reduced |
| Task completion (non-autonomous) | ⚠️ Dropped |
| SILENT DATASET SUBSTITUTION | ✅ Preserved elsewhere |
| Rebase against main | ❌ Needed |

The tone rewrite itself is sound. Fix the three items above and rebase, then this is mergeable.
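A check like the rule-by-rule pass above could be partly automated before the next rewrite. The sketch below is a hypothetical helper, not something in the repo: it takes the new prompt text and a hand-written map from rule name to marker phrases, and lists the rules with no marker left in the prompt. The marker phrases in the usage note are examples drawn from the quotes in this review. Keyword matching only catches rules that were removed wholesale; a rule that was kept but weakened still needs a human read.

```python
def dropped_rules(new_prompt, rule_markers):
    """Return the rule names for which no marker phrase appears in new_prompt."""
    text = new_prompt.lower()
    return [
        rule
        for rule, markers in rule_markers.items()
        # A rule counts as present if any one of its markers is found.
        if not any(marker.lower() in text for marker in markers)
    ]
```

For example, with markers such as `{"package install": ["flash-attn"], "dataset substitution": ["silently substitute"]}`, a prompt that contains only the substitution wording would be reported as having dropped the package-install rule.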

Restructure system prompt with softer tone

14252d3

akseljoonas wants to merge 1 commit into main from soften-system-prompt
