feat: add sandbox agent core — reasoning, execution, graph#182
    # Simple HTML tag stripping for readability
    import re

    text = re.sub(r"<script[^>]*>.*?</script>", "", text, flags=re.DOTALL)
Check failure (Code scanning / CodeQL): Bad HTML filtering regexp, severity High
Copilot Autofix (AI), 4 days ago
In general, the best fix is to avoid hand-written regexes for HTML sanitization and use a well-tested HTML parser or sanitizer that correctly handles case-insensitive tag names and malformed tag syntax. Here, because the goal is only to strip scripts and styles (and then all remaining tags) for readability, we could use Python's standard `html.parser` module or a small, well-known library, or, at minimum, harden the regexes so they are case-insensitive and tolerate unusual closing tags.
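For comparison, the parser-based alternative can be sketched with only the standard-library `html.parser` module. This is not what the patch does, and the names `_TextExtractor` and `html_to_text` are hypothetical. It illustrates why a parser sidesteps the case-sensitivity problem: `HTMLParser` lower-cases tag names before calling `handle_starttag` and `handle_endtag`, so `<SCRIPT>` and `<script>` are handled identically.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: collects text, skipping <script> and <style> content."""

    _SKIP = {"script", "style"}

    def __init__(self) -> None:
        # convert_charrefs=True decodes entities such as &amp; in text data.
        super().__init__(convert_charrefs=True)
        self._chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # tag is already lower-cased by HTMLParser.
        if tag in self._SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self) -> str:
        # Collapse runs of whitespace, like the regex pipeline's final step.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.text()


print(html_to_text("<p>Hello <b>world</b></p><SCRIPT>alert(1)</SCRIPT>"))
```

Because the parser decodes entities, `&amp;` becomes `&` in the output, which the regex steps shown in the diff do not do.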
The smallest change that fixes the reported issue without altering overall functionality is:
- Make the `<script>` and `<style>` patterns case-insensitive by using `re.IGNORECASE`.
- Make the closing-tag patterns more forgiving so that `</script foo="bar">` and similar variants are also stripped, not just an exact `</script>` or `</style>`. This can be done by allowing optional trailing attributes on the closing tag.
- Leave the rest of the pipeline (removing all other tags, collapsing whitespace, truncation) unchanged.
Concretely, in `a2a/sandbox_agent/src/sandbox_agent/graph.py`, within the `web_fetch` function's HTML handling block (lines ~533–541), update the two `re.sub` calls that strip script and style blocks. We can keep the local `import re` but change the patterns and flags to:
- `r"(?is)<script\b[^>]*>.*?</script\b[^>]*>"` (or, equivalently, pass `flags=re.DOTALL | re.IGNORECASE` instead of the inline `(?is)`).
- `r"(?is)<style\b[^>]*>.*?</style\b[^>]*>"`.

The `i` flag makes tag matching case-insensitive, the `s` flag lets `.` match newlines, and `[^>]*` on the closing tag accepts extra content before the `>`. The patch still passes explicit flags, since that is the existing style.
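The difference between the original and hardened patterns can be checked with a small standalone script. This sketch copies the patterns from the patch into hypothetical helpers `strip_old` and `strip_new`; it exercises only the script/style removal step, not the rest of the `web_fetch` pipeline.

```python
import re

# Pattern used before the autofix (case-sensitive, exact closing tag only).
OLD_SCRIPT = r"<script[^>]*>.*?</script>"

# Patterns from the autofix: word boundary after the tag name and optional
# trailing content on the closing tag, used with DOTALL | IGNORECASE.
NEW_SCRIPT = r"<script\b[^>]*>.*?</script\b[^>]*>"
NEW_STYLE = r"<style\b[^>]*>.*?</style\b[^>]*>"
NEW_FLAGS = re.DOTALL | re.IGNORECASE


def strip_old(text: str) -> str:
    """Hypothetical helper: the original script-stripping step."""
    return re.sub(OLD_SCRIPT, "", text, flags=re.DOTALL)


def strip_new(text: str) -> str:
    """Hypothetical helper: the hardened script- and style-stripping steps."""
    text = re.sub(NEW_SCRIPT, "", text, flags=NEW_FLAGS)
    return re.sub(NEW_STYLE, "", text, flags=NEW_FLAGS)


samples = [
    "<SCRIPT>alert(1)</SCRIPT>ok",            # upper-case tag name
    '<script>alert(1)</script foo="bar">ok',  # extra attributes on closing tag
    "<style>\nbody {}\n</STYLE >ok",          # mixed case, space before '>'
]

for s in samples:
    print(repr(strip_old(s)), "->", repr(strip_new(s)))
```

Even with these fixes the regexes are a readability filter, not an HTML sanitizer: an unterminated `<script>` block is not removed by the first two substitutions, so the generic `<[^>]+>` pass drops only the tag and leaves the script body as text.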
    @@ -535,8 +535,20 @@
     # Simple HTML tag stripping for readability
     import re
     
    -text = re.sub(r"<script[^>]*>.*?</script>", "", text, flags=re.DOTALL)
    -text = re.sub(r"<style[^>]*>.*?</style>", "", text, flags=re.DOTALL)
    +# Remove script and style blocks in a case-insensitive way,
    +# allowing for malformed closing tags like </script foo="bar">
    +text = re.sub(
    +    r"<script\b[^>]*>.*?</script\b[^>]*>",
    +    "",
    +    text,
    +    flags=re.DOTALL | re.IGNORECASE,
    +)
    +text = re.sub(
    +    r"<style\b[^>]*>.*?</style\b[^>]*>",
    +    "",
    +    text,
    +    flags=re.DOTALL | re.IGNORECASE,
    +)
     text = re.sub(r"<[^>]+>", " ", text)
     text = re.sub(r"\s+", " ", text).strip()
Force-pushed 210049c to 9451997
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
…nd graph card endpoint Signed-off-by: Ladislav Smola <lsmola@redhat.com>
…wall-clock limits Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
… reasoning loop Signed-off-by: Ladislav Smola <lsmola@redhat.com>
…to UI Signed-off-by: Ladislav Smola <lsmola@redhat.com>
… JSON streaming format Signed-off-by: Ladislav Smola <lsmola@redhat.com>
…on in workspace Signed-off-by: Ladislav Smola <lsmola@redhat.com>
…nd sandboxed tools Signed-off-by: Ladislav Smola <lsmola@redhat.com>
…trospection Signed-off-by: Ladislav Smola <lsmola@redhat.com>
…6_64/aarch64) Signed-off-by: Ladislav Smola <lsmola@redhat.com>
…rked child process Signed-off-by: Ladislav Smola <lsmola@redhat.com>
…d LangChain auto-instrumentation Signed-off-by: Ladislav Smola <lsmola@redhat.com>
…es from settings.json Signed-off-by: Ladislav Smola <lsmola@redhat.com>
…rnative subplans Signed-off-by: Ladislav Smola <lsmola@redhat.com>
…or, and reporter nodes Signed-off-by: Ladislav Smola <lsmola@redhat.com>
…er, executor, reflector, and reporter nodes Signed-off-by: Ladislav Smola <lsmola@redhat.com>
…ith no-fallback policy Signed-off-by: Ladislav Smola <lsmola@redhat.com>
…egistries, and runtime limits Signed-off-by: Ladislav Smola <lsmola@redhat.com>
…(multi-mode) strategies Signed-off-by: Ladislav Smola <lsmola@redhat.com>
…on on shared PVC Signed-off-by: Ladislav Smola <lsmola@redhat.com>
…matting Auto-fixed: 16 import ordering (I001), unnecessary f-strings (F541) Manual: prefix 6 unused variables with underscore (F401) Formatted: 16 files with ruff format Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Ruff I001 requires contiguous import blocks. The try/except for DatabaseTaskStore was breaking the a2a imports block. Moved it after all clean imports. Signed-off-by: Ladislav Smola <lsmola@redhat.com>
… 3.11+, fixes I001) Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
- Validate context_id against traversal (workspace.py)
- Use is_relative_to instead of startswith (subagents.py)
- Use shlex.split for interpreter/sources checks (permissions.py, executor.py)
- Remove duplicate _MAX_SUB_AGENT_ITERATIONS (subagents.py)
- Remove dead _BARE_DECISION_RE (reasoning.py)
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
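The traversal fixes in the commit above (validating `context_id` and using `is_relative_to` instead of `startswith`) can be illustrated with a minimal sketch. It assumes a hypothetical `/workspace` root and a hypothetical allow-list regex for context IDs; the actual rules in `workspace.py` and `subagents.py` are not shown on this page.

```python
import posixpath
import re
from pathlib import PurePosixPath

# Hypothetical allow-list: letters, digits, "-" and "_" only, so values such
# as "../etc" or "a/b" are rejected before they are joined onto a path.
_CONTEXT_ID_RE = re.compile(r"[A-Za-z0-9_-]{1,128}")


def validate_context_id(context_id: str) -> str:
    """Return context_id unchanged, or raise ValueError if it is unsafe."""
    if not _CONTEXT_ID_RE.fullmatch(context_id):
        raise ValueError(f"invalid context_id: {context_id!r}")
    return context_id


def _normalize(path: str) -> PurePosixPath:
    # Lexically collapse "." and ".." segments. Code working on a live
    # filesystem would use Path.resolve() instead, which also follows symlinks.
    return PurePosixPath(posixpath.normpath(path))


def is_inside_startswith(root: str, candidate: str) -> bool:
    """Old-style check: a plain string-prefix comparison."""
    return str(_normalize(candidate)).startswith(str(_normalize(root)))


def is_inside(root: str, candidate: str) -> bool:
    """Component-wise check: True only if candidate is root or below it."""
    return _normalize(candidate).is_relative_to(_normalize(root))


# "/workspace-evil" shares a string prefix with "/workspace" but is a sibling.
print(is_inside_startswith("/workspace", "/workspace-evil/x"))  # True (wrong)
print(is_inside("/workspace", "/workspace-evil/x"))             # False
```

Normalizing first matters because `is_relative_to` is purely lexical: without collapsing `..`, a path like `/workspace/../etc` would pass the check.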
Force-pushed 9451997 to bed64f5
Summary
New LangGraph-based sandbox agent with:
21 files | Part of Sandbox Agent streaming PR series