2 changes: 1 addition & 1 deletion fastdeploy/model_executor/layers/normalization.py
@@ -331,7 +331,7 @@ def forward(
forward_meta,
proxy_rmsnorm=None,
) -> paddle.Tensor:
- if proxy_rmsnorm is None and self.qk_norm_fused and forward_meta.step_use_cudagraph:
+ if proxy_rmsnorm is None and self.qk_norm_fused:
qkv_out = qk_rmsnorm_fused(
qkv_out,
self.q_norm.weight,
2 changes: 1 addition & 1 deletion tests/e2e/test_Qwen3VL_serving.py
@@ -173,7 +173,7 @@ def test_consistency_between_runs(api_url, headers, consistent_payload):
content1 = result1["choices"][0]["message"]["content"]

# base result
- content2 = "视频中手机支架的颜色是黑色的。"
+ content2 = "视频中手机支架的颜色是黑色。"
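🟡 Suggestion: The expected test output was changed from "视频中手机支架的颜色是黑色的。" to "视频中手机支架的颜色是黑色。". The PR description should explain the reason for this change.

Consider adding a note to the PR description: switching from the paddle operator to the triton qk_rmsnorm_fused kernel introduces slight differences in model output in the end-to-end test (floating-point differences within the normal range).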
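
To illustrate the kind of floating-point difference meant here, a minimal sketch follows. It is not the FastDeploy or triton kernel: `to_f32` and `rms_norm` are hypothetical helpers written for this comment, and the blocked accumulation is only an assumption about how a parallel kernel might order its reduction. It shows that the same RMSNorm formula, evaluated with a different summation order in float32, can yield slightly different values. Under greedy decoding, a difference of that size may occasionally be enough to change a generated token.

```python
import math
import random
import struct


def to_f32(value: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def rms_norm(x, weight, eps=1e-6, block=None):
    """RMSNorm over one vector, accumulating the sum of squares in float32.

    block=None accumulates left to right; block=k sums k-sized chunks first
    and then combines the partial sums (hypothetical stand-in for the
    reduction order of a parallel kernel).
    """
    if block is None:
        chunks = [x]
    else:
        chunks = [x[i:i + block] for i in range(0, len(x), block)]

    # Per-chunk partial sums of squares, rounded to float32 after each step.
    partials = []
    for chunk in chunks:
        acc = 0.0
        for v in chunk:
            acc = to_f32(acc + to_f32(v * v))
        partials.append(acc)

    # Combine the partial sums, again in float32.
    total = 0.0
    for p in partials:
        total = to_f32(total + p)

    inv_rms = to_f32(1.0 / math.sqrt(total / len(x) + eps))
    return [to_f32(v * inv_rms * w) for v, w in zip(x, weight)]


rng = random.Random(0)
x = [to_f32(rng.uniform(-1.0, 1.0)) for _ in range(128)]
weight = [1.0] * len(x)

sequential = rms_norm(x, weight)          # left-to-right reduction
blocked = rms_norm(x, weight, block=32)   # chunked reduction
max_diff = max(abs(a - b) for a, b in zip(sequential, blocked))
print(f"max abs difference: {max_diff:.3e}")
```

Both variants compute the same mathematical quantity, so any reported difference comes only from rounding in a different order.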


# Verify that result is same as the base result
assert content1.startswith(content2), content1