【Hackathon 10th Spring No.50】MiniCPM4.1-8B model reproduction#7332
Open
r-cloudforge wants to merge 13 commits into PaddlePaddle:develop from
Conversation
added 13 commits
March 6, 2026 10:30
- MiniCPM4MLP: gate/up merged, silu activation, no bias
- MiniCPM4Attention: GQA with QKVParallelLinear(bias=False), neox rotary
- MiniCPM4DecoderLayer: μP residual scaling (scale_depth/sqrt(num_layers))
- MiniCPM4Model: μP embedding scaling (scale_emb), LongRoPE support
- MiniCPM4ForCausalLM: μP lm_head scaling (hidden_size/dim_model_base)
- Weight mapping: HF `model.` to FD `minicpm4.` prefix
- Architecture: MiniCPMForCausalLM (GQA, not MLA)
- Follows Qwen2 patterns adapted for MiniCPM4 μP parametrization
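The HF-to-FD weight-name mapping mentioned in the commit summary above can be sketched as a simple prefix rewrite. This is an illustration only, not the PR's actual code; the real mapping lives in `minicpm4.py`, and the checkpoint key names below are illustrative.

```python
def map_hf_to_fd(hf_name: str) -> str:
    """Rename a HuggingFace checkpoint key to the FastDeploy layout.

    Sketch only: the PR maps the HF 'model.' prefix to 'minicpm4.'.
    """
    prefix = "model."
    if hf_name.startswith(prefix):
        return "minicpm4." + hf_name[len(prefix):]
    return hf_name  # e.g. 'lm_head.weight' passes through unchanged

# Example checkpoint keys (illustrative names only):
hf_keys = [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "lm_head.weight",
]
fd_keys = [map_hf_to_fd(k) for k in hf_keys]
# fd_keys[0] → "minicpm4.embed_tokens.weight"
```

Keys that do not carry the `model.` prefix (such as `lm_head.weight`) are left untouched, which matters for the tie_word_embeddings path discussed later in the PR description.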
Thanks for your contribution!
fastdeploy-bot
left a comment
🤖 AI Code Review | 2026-04-11
📋 Review Summary
PR overview: Adds support for the MiniCPM4.1-8B model, implementing the μP (Maximal Update Parametrization) scaling scheme.
Scope of changes: model_executor/models/minicpm4.py (new file), tests/model_executor/test_minicpm4.py (new file), documentation updates.
Impact tag: [Models]
📝 PR Convention Check
Title is missing an official tag: the title 【Hackathon 10th Spring No.50】MiniCPM4.1-8B model reproduction does not contain a valid official tag.
Suggested title (ready to copy):
[Feature] Add MiniCPM4/4.1-8B model support
Description template (ready to copy):
## Motivation
Provide FastDeploy with the ability to deploy the high-performance openbmb/MiniCPM4.1-8B model family.
## Modifications
### Model Code
Add `fastdeploy/model_executor/models/minicpm4.py`, implementing the MiniCPM4 model architecture:
- MiniCPM4MLP: merged gate/up projection + SiLU activation
- MiniCPM4Attention: GQA attention
- MiniCPM4DecoderLayer: μP residual scaling
- MiniCPM4Model: μP embedding scaling
- MiniCPM4ForCausalLM: μP lm_head scaling, HF → FD weight mapping
### Key Features
- μP 3-site scaling: embedding (×12), residual (×scale_depth/√N), lm_head (÷16)
- Weight mapping: HF `model.` → FD `minicpm4.`
- Supports tie_word_embeddings
- Auto-registered as `MiniCPMForCausalLM`
### Tests & Docs
- Added 16 unit tests (all passing)
- Added `docs/best_practices/MiniCPM4-8B.md` deployment documentation
- Updated `docs/supported_models.md`

Issues
| Level | File | Summary |
|---|---|---|
| (none) | — | No blocking issues found |
Overall Assessment
The implementation is correct overall and conforms to the FastDeploy model implementation conventions:
- ✅ μP 3-site scaling is implemented correctly, and the scaling order matches the paper
- ✅ Correctly reuses the standard components under `layers/`
- ✅ Weight mapping and tensor parallel support are complete
- ✅ Thorough test coverage (16 unit tests, all passing)
- ✅ Complete documentation, including hardware requirements and deployment examples

The only suggestion is to fix the PR title format by adding an official tag.
Motivation
This PR adds support for deploying the openbmb/MiniCPM4.1-8B model family in FastDeploy, as required by Hackathon 10th Spring No.50.
MiniCPM4.1-8B is a dense 8B parameter model from OpenBMB with the following key features:
- `num_key_value_heads=2` (GQA)
- HF architecture name: `MiniCPMForCausalLM`

Modifications
Model Code (`fastdeploy/model_executor/models/minicpm4.py`)

New model file (516 lines) implementing:
- MiniCPM4MLP: Gate/up merged projection with SiLU activation, no bias
- MiniCPM4Attention: GQA with `QKVParallelLinear(with_bias=False)`, neox-style RoPE
- MiniCPM4DecoderLayer: μP residual scaling (scale_depth / √num_hidden_layers)
- MiniCPM4Model: μP embedding scaling (scale_emb), graph optimization support
- MiniCPM4ForCausalLM: μP lm_head scaling, weight mapping (HF `model.` → FD `minicpm4.`), registered as `MiniCPMForCausalLM`
- MiniCPM4PretrainedModel: Tensor parallel mappings (no bias splits)

Documentation
- `docs/best_practices/MiniCPM4-8B.md`: Usage guide with hardware requirements, deployment examples, and performance tuning
- `docs/supported_models.md`: Added MiniCPM4 entry to the LLM model table

Engineering Highlights
μP 3-Site Scaling — Correct implementation of Maximal Update Parametrization at three distinct points, each with different mathematical operations:
- `× scale_emb` (amplifies, ×12)
- `× scale_depth / √num_hidden_layers` (applied independently to the attention and MLP outputs of each layer, before the residual add)
- `÷ (hidden_size / dim_model_base)` (normalizes, ÷16, before logit computation)

Ordering is critical: residual scaling must happen after each sub-layer output but before the residual addition.
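The three scaling sites and the vocab masking described in this PR can be sketched with plain numbers. This is a minimal illustration of the math, not the PR's Paddle code; `SCALE_DEPTH`, `NUM_HIDDEN_LAYERS`, `HIDDEN_SIZE`, and `DIM_MODEL_BASE` below are placeholder values chosen so that the ×12 and ÷16 factors from the description hold.

```python
import math

SCALE_EMB = 12.0           # per the PR: embedding scaled ×12
SCALE_DEPTH = 1.4          # hypothetical value for the sketch
NUM_HIDDEN_LAYERS = 32     # hypothetical
HIDDEN_SIZE = 4096         # hypothetical (4096 / 256 = 16)
DIM_MODEL_BASE = 256       # hypothetical

def embed_scale(x: float) -> float:
    # Site 1: amplify token embeddings by scale_emb.
    return x * SCALE_EMB

def residual_add(residual: float, sublayer_out: float) -> float:
    # Site 2: scale each sub-layer output by scale_depth / sqrt(num_layers)
    # BEFORE adding it to the residual stream (the ordering noted above).
    return residual + sublayer_out * (SCALE_DEPTH / math.sqrt(NUM_HIDDEN_LAYERS))

def logit_scale(hidden: float) -> float:
    # Site 3: divide hidden states by hidden_size / dim_model_base (=16 here)
    # before the lm_head projection.
    return hidden / (HIDDEN_SIZE / DIM_MODEL_BASE)

def mask_padded_vocab(logits_row: list, ori_vocab_size: int) -> list:
    # Vocab masking: set logits beyond the original vocab size to -inf so
    # padding tokens can never be sampled.
    pad = len(logits_row) - ori_vocab_size
    return logits_row[:ori_vocab_size] + [float("-inf")] * pad
```

With these placeholders, `embed_scale(1.0)` gives 12.0 and `logit_scale(32.0)` gives 2.0; the key point is that each site uses a different operation (multiply, scaled add, divide) at a different place in the forward pass.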
Vocab Masking

`logits[:, ori_vocab_size:] = -inf` prevents generation of padding tokens at inference time; it preserves the original vocabulary boundary when `vocab_size` was padded during training.

tie_word_embeddings

Transposes the embedding weight into lm_head with dtype consistency, matching the `MiniCPMForCausalLM` HF default.

Design Decisions
- Registered via the `@ModelRegistry.register_model_class` decorator; no manual imports needed
- μP hyperparameters (`scale_emb`, `scale_depth`, `dim_model_base`) read from HF `config.json` via `ModelConfig` auto-setattr

Usage or Command
See docs/best_practices/MiniCPM4-8B.md for full deployment guide.
Accuracy Tests
Unit Tests (16/16 passed across 3 environments)
- `tests/model_executor/test_minicpm4.py` (320 lines, 4 classes, 16 tests)
- TestMuPScaling (6 tests): Validates all 3 μP scaling sites: embedding (×12), residual (×scale_depth/√N), lm_head (÷hidden/base)
- TestWeightMapping (5 tests): Verifies HF→FD weight name mapping (`model.` → `minicpm4.`), column/row parallel splits
- TestRegistration (4 tests): Model registry, config auto-setattr, architecture name `MiniCPMForCausalLM`
- TestComputeLogits (1 test): End-to-end lm_head scaling with real Paddle tensors

AI Studio V100 GPU Validation
Tested on Baidu AI Studio V100 16GB — job logs: 16/16 passed in 0.09s.
Environment: Tesla V100-SXM2 16GB, CUDA 12.0, PaddlePaddle 3.3.0, Python 3.10.
CI Coverage Job (H20 GPU)
All 16 tests passed in CI (`run_tests_with_coverage` job):

Local CPU Test Output
GPU Validation Note
Full model inference validation requires downloading the 16GB model weights, which exceeds CI test scope. The model architecture is structurally validated by the unit tests above. Full deployment validation can be performed using the commands in the Usage section.
Checklist
- `@ModelRegistry.register_model_class` decorator