2 changes: 1 addition & 1 deletion fastdeploy/model_executor/layers/normalization.py
@@ -341,7 +341,7 @@ def forward(
forward_meta,
proxy_rmsnorm=None,
) -> paddle.Tensor:
- if proxy_rmsnorm is None and self.qk_norm_fused and forward_meta.step_use_cudagraph:
+ if proxy_rmsnorm is None and self.qk_norm_fused:

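🟡 Suggestion: add a comment explaining why the step_use_cudagraph condition can be removed

step_use_cudagraph originally restricted the fused operator to the Decode phase (which uses CUDA Graph). With this condition removed, the fused operator is also used in the Prefill phase.

Consider adding a code comment that explains the reason for this change and its performance impact:

# Use the QKRMSNorm fused operator in both the Prefill and Decode phases
# Drop the step_use_cudagraph condition to enable the Prefill-phase performance optimization
# Expected speedup: 2-7x per kernel for some models; about 2x per forward pass for models with large Prefill bubbles
if proxy_rmsnorm is None and self.qk_norm_fused: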
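
To make the behavioral change concrete, here is a minimal sketch of the two gating conditions side by side. It is not the FastDeploy implementation: `ForwardMeta` is a hypothetical stand-in that carries only the `step_use_cudagraph` flag, the two helper functions are illustrative names, and the sketch assumes, as described above, that Prefill steps run with `step_use_cudagraph=False` and Decode steps with `step_use_cudagraph=True`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ForwardMeta:
    """Hypothetical stand-in for the real forward_meta; only the flag matters here."""
    step_use_cudagraph: bool


def takes_fused_path_before(proxy_rmsnorm: Optional[object],
                            qk_norm_fused: bool,
                            forward_meta: ForwardMeta) -> bool:
    # Condition before this PR: the fused op is gated on CUDA Graph steps.
    return proxy_rmsnorm is None and qk_norm_fused and forward_meta.step_use_cudagraph


def takes_fused_path_after(proxy_rmsnorm: Optional[object],
                           qk_norm_fused: bool,
                           forward_meta: ForwardMeta) -> bool:
    # Condition after this PR: the CUDA Graph flag no longer matters.
    return proxy_rmsnorm is None and qk_norm_fused


# Assumed phase-to-flag mapping (taken from the review comment, not verified).
prefill = ForwardMeta(step_use_cudagraph=False)
decode = ForwardMeta(step_use_cudagraph=True)

for name, meta in (("prefill", prefill), ("decode", decode)):
    before = takes_fused_path_before(None, True, meta)
    after = takes_fused_path_after(None, True, meta)
    print(f"{name}: fused before={before}, fused after={after}")
```

Under these assumptions only the Prefill row changes, which is why the Prefill-phase outputs (and any baselines derived from them) are the ones worth re-checking.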

qkv_out = qk_rmsnorm_fused(
qkv_out,
self.q_norm.weight,
2 changes: 1 addition & 1 deletion tests/e2e/test_Qwen3VL_serving.py
@@ -173,7 +173,7 @@ def test_consistency_between_runs(api_url, headers, consistent_payload):
content1 = result1["choices"][0]["message"]["content"]

# base result
- content2 = "视频中手机支架的颜色是黑色的。"
+ content2 = "视频中手机支架的颜色是黑色。"

# Verify that result is same as the base result
assert content1.startswith(content2), content1