feat(ai-lakera-guard): scan LLM responses (direction output/both, non-streaming + streaming)#13606

Open
janiussyafiq wants to merge 7 commits into
apache:master from
janiussyafiq:feat/ai-lakera-guard-pr2

Conversation

@janiussyafiq

@janiussyafiq janiussyafiq commented Jun 25, 2026

Description

PR-2 of ai-lakera-guard, following the input-scanning MVP (#13570). Adds response (output) scanning for non-streaming and streaming (SSE) traffic. Back-compatible: direction still defaults to input.

  • Schema: direction extended to input/output/both; adds response_failure_message.
  • Response path (lua_body_filter, the same dispatch ai-aliyun-content-moderation uses):
    • Non-streaming: scans ctx.var.llm_response_text; a flagged response is replaced with a provider-compatible deny.
    • Streaming: buffers the SSE response, scans the assembled completion once, then releases it verbatim (clean) or replaces it with a deny SSE terminated by [DONE] (flagged). Because the stream's 200/text/event-stream headers are already committed when buffering begins, a streamed block is delivered as the deny body; deny_code does not apply to streams.
  • Docs (en + zh) and tests added.
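
To illustrate the streamed deny described above, here is a minimal sketch of a deny SSE body terminated by [DONE]. It is not the plugin's code: `json_escape` and `deny_sse` are hypothetical helpers, and the chunk shape assumes an OpenAI-style chat-completion delta, which may differ from what the plugin actually emits per provider.

```lua
-- Hypothetical helper: escape a string for embedding in a JSON string literal.
local function json_escape(s)
    return (s:gsub('[%c"\\]', function(c)
        if c == '"' then return '\\"' end
        if c == '\\' then return '\\\\' end
        if c == '\n' then return '\\n' end
        return string.format('\\u%04x', c:byte())
    end))
end

-- Hypothetical helper: build a single-event deny stream followed by [DONE].
-- Assumes an OpenAI-style chunk; the status line stays 200 because the
-- headers were already sent when buffering began.
local function deny_sse(message)
    local chunk = '{"choices":[{"index":0,"delta":{"content":"'
                  .. json_escape(message)
                  .. '"},"finish_reason":"stop"}]}'
    return "data: " .. chunk .. "\n\n" .. "data: [DONE]\n\n"
end
```

A client that parses the stream sees one content delta carrying the failure message and then the usual terminator, so it does not hang waiting for more events.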

Which issue(s) this PR fixes:

Part of #13291.

Checklist

  • I have explained the need for this PR and the problem it solves
  • I have explained the changes or the new features added to this PR
  • I have added tests corresponding to this change
  • I have updated the documentation to reflect this change
  • I have verified that this change is backward compatible

@dosubot dosubot Bot added size:XL This PR changes 500-999 lines, ignoring generated files. enhancement New feature or request labels Jun 25, 2026
Comment thread apisix/plugins/ai-lakera-guard.lua Outdated
Comment thread apisix/plugins/ai-lakera-guard.lua
Comment thread apisix/plugins/ai-lakera-guard.lua
Comment thread apisix/plugins/ai-lakera-guard.lua
Comment thread apisix/plugins/ai-lakera-guard.lua Outdated
… mode and fail-open support; update documentation and tests
…sts for fail-open behavior; update documentation
nic-6443
nic-6443 previously approved these changes Jun 26, 2026
AlinsRan
AlinsRan previously approved these changes Jun 26, 2026
membphis
membphis previously approved these changes Jun 26, 2026
@AlinsRan AlinsRan self-requested a review June 26, 2026 02:33
Comment on lines +262 to +291
    if not ctx.var.llm_request_done then
        -- Withhold this chunk until end-of-stream, replacing it with an SSE
        -- keep-alive comment. Not "" (nginx treats an empty body as nothing
        -- to flush) and not nil (which would let the original chunk reach
        -- the client) -- the keep-alive holds the content back while keeping
        -- the connection open.
        return nil, ":\n\n"
    end

    local text = ctx.var.llm_response_text
    if not text or text == "" then
        if conf.fail_open then
            core.log.warn("ai-lakera-guard: streamed response ended without ",
                          "an assembled completion (no upstream usage event?); ",
                          "fail_open=true, releasing unscanned")
            return nil, concat(buffer)
        end
        core.log.error("ai-lakera-guard: streamed response ended without ",
                       "an assembled completion (no upstream usage event?); ",
                       "fail_open=false, blocking response")
        return ngx.OK, deny_message(ctx, conf, conf.response_failure_message)
    end

    local code, message = moderate_response(ctx, conf, text)
    if code then
        return ngx.OK, message
    end

    -- Clean: release the buffered stream verbatim, preserving SSE framing.
    return nil, concat(buffer)

@AlinsRan AlinsRan Jun 26, 2026

Block-mode streaming can duplicate or drop the response when a protocol converter is active (e.g. Anthropic client → OpenAI upstream via ai-proxy-multi). The same-protocol path tested here is fine.

Root cause: release is gated only on llm_request_done. In ai-providers/base.lua that flag is set before the dispatch loop, and with a converter the loop calls lua_body_filter once per converted chunk. On [DONE], anthropic-messages-to-openai-chat emits two chunks (message_delta + message_stop) — both dispatched with llm_request_done == true.

Two symptoms:

  • Upstream without include_usage: both terminal chunks hit return nil, concat(buffer) → response sent twice + Lakera scanned twice.
  • Upstream with include_usage: the converter defers the terminal events to the usage chunk (type usage, doesn't set llm_request_done), then [DONE] converts to nothing → buffer never released, clean response dropped (client gets only keep-alives).

Fix: gate scan+release behind a one-shot flag (e.g. ctx.lakera_response_released) so it runs exactly once, and ensure the buffer is flushed at EOF even when the final dispatch produced no chunk (an end-of-stream filter pass in base.lua, like the abort path this PR adds). A block-mode ai-proxy-multi regression test (Anthropic client + OpenAI streaming upstream) would lock it down.
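
A rough sketch of the one-shot gate, runnable outside OpenResty. Everything here is an assumption for illustration: `filter`, `scan`, and the `lakera_response_blocked` flag are hypothetical, `OK` stands in for `ngx.OK`, and the buffer is assumed to accumulate every chunk (including the current one) before the release check. It only covers the duplicate-release symptom, not the EOF flush in base.lua.

```lua
local concat = table.concat
local OK = 0        -- stands in for ngx.OK outside OpenResty
local scans = 0     -- counts scanner calls so the gate can be observed

-- Hypothetical stand-in for the Lakera call: returns a deny body when
-- the text is flagged, nil when it is clean.
local function scan(text)
    scans = scans + 1
    if text:find("forbidden", 1, true) then
        return "data: [DONE]\n\n"
    end
    return nil
end

-- Hypothetical per-chunk filter; returns (code, body) like the excerpt above.
local function filter(ctx, buffer, chunk)
    if ctx.lakera_response_released then
        -- Already scanned once. After a block, keep withholding content;
        -- after a clean release, let later terminal chunks pass untouched.
        if ctx.lakera_response_blocked then
            return nil, ":\n\n"
        end
        return nil, nil
    end

    buffer[#buffer + 1] = chunk
    if not ctx.llm_request_done then
        return nil, ":\n\n"   -- SSE keep-alive while buffering
    end

    ctx.lakera_response_released = true   -- one-shot: set before scanning
    local deny = scan(ctx.llm_response_text or "")
    if deny then
        ctx.lakera_response_blocked = true
        return OK, deny
    end
    return nil, concat(buffer)
end
```

With this shape, a second dispatch that arrives with llm_request_done == true neither re-releases the buffer nor triggers another scan. Whether later terminal chunks should pass through or be folded into the buffer depends on how the converter orders its events, so that branch in particular would need the regression test to confirm it.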

…for blocking and clean responses; add tests for output direction
@AlinsRan AlinsRan self-requested a review June 26, 2026 05:39
…prove fail-closed behavior; add tests for streaming scenarios
@janiussyafiq janiussyafiq dismissed stale reviews from membphis, AlinsRan, and nic-6443 via ab6cc45 June 26, 2026 07:26
@dosubot dosubot Bot added size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels Jun 26, 2026
Labels

enhancement New feature or request size:XXL This PR changes 1000+ lines, ignoring generated files.

4 participants