
【Hackathon 10th Spring No.50】MiniCPM4.1-8B model reproduction #7332

Open
r-cloudforge wants to merge 13 commits into PaddlePaddle:develop from CloudForge-Solutions:task/050-minicpm41-model-v2

Conversation

@r-cloudforge

Motivation

⚡ Engineering Highlight: Correct μP (Maximal Update Parametrization) 3-site scaling — embedding (×12), residual (×scale_depth/√N per sub-layer), lm_head (÷16) — with ordering-critical placement (the residual scaling must happen after each sub-layer output but before the residual add), plus vocab masking and tie_word_embeddings.

This PR adds support for deploying the high-performance openbmb/MiniCPM4.1-8B model family in FastDeploy, as required by Hackathon 10th Spring No.50.

MiniCPM4.1-8B is a dense 8B parameter model from OpenBMB with the following key features:

  • μP (Maximal Update Parametrization): Three scaling sites — embedding (×12), residual (×scale_depth/√num_layers), and lm_head (÷hidden_size/dim_model_base)
  • GQA: Grouped Query Attention with num_key_value_heads=2
  • LongRoPE: Extended position encoding supporting up to 65,536 tokens
  • Architecture registered as MiniCPMForCausalLM

Modifications

Model Code (fastdeploy/model_executor/models/minicpm4.py)

New model file (516 lines) implementing:

  • MiniCPM4MLP: Gate/up merged projection with SiLU activation, no bias
  • MiniCPM4Attention: GQA with QKVParallelLinear(with_bias=False), neox-style RoPE
  • MiniCPM4DecoderLayer: μP residual scaling (scale_depth / √num_hidden_layers)
  • MiniCPM4Model: μP embedding scaling (scale_emb), graph optimization support
  • MiniCPM4ForCausalLM: μP lm_head scaling, weight mapping (HF model. → FD minicpm4.), registered as MiniCPMForCausalLM
  • MiniCPM4PretrainedModel: Tensor parallel mappings (no bias splits)
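
The HF → FD weight-name mapping listed above (prefix rename plus stacking of the per-projection q/k/v and gate/up weights into merged layers) can be sketched as follows. This helper is purely illustrative — FastDeploy's real implementation in minicpm4.py uses its own mapping machinery, and the function name here is hypothetical:

```python
# Illustrative sketch of the checkpoint-key mapping described above.
# Real FastDeploy code does this via its own weight-loading helpers.
_PROJ_MAP = {
    "q_proj": "qkv_proj", "k_proj": "qkv_proj", "v_proj": "qkv_proj",
    "gate_proj": "gate_up_proj", "up_proj": "gate_up_proj",
}

def map_weight_name(hf_name: str) -> str:
    """Rename a HuggingFace checkpoint key to the FastDeploy layout."""
    # Prefix rename: HF "model." -> FD "minicpm4."
    if hf_name.startswith("model."):
        hf_name = "minicpm4." + hf_name[len("model."):]
    # Per-projection weights land in merged (stacked) layers.
    parts = [_PROJ_MAP.get(p, p) for p in hf_name.split(".")]
    return ".".join(parts)

print(map_weight_name("model.layers.0.self_attn.q_proj.weight"))
# -> minicpm4.layers.0.self_attn.qkv_proj.weight
```

Keys outside the `model.` prefix (e.g. `lm_head.weight`) pass through unchanged, which matches the tie_word_embeddings handling described below.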

Documentation

  • docs/best_practices/MiniCPM4-8B.md: Usage guide with hardware requirements, deployment examples, and performance tuning
  • docs/supported_models.md: Added MiniCPM4 entry to LLM model table

Engineering Highlights

  1. μP 3-Site Scaling — Correct implementation of Maximal Update Parametrization at three distinct points, each with different mathematical operations:

    • Embedding: × scale_emb (×12 for this model)
    • Residual: × scale_depth / √num_hidden_layers (applied independently to both attention and MLP outputs per layer, before the residual add)
    • LM head: ÷ (hidden_size / dim_model_base) (÷16 for this model, applied before logit computation)

    Ordering is critical: residual scaling must happen after each sub-layer output but before the residual addition.
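
    The three sites and their ordering can be sketched numerically. The ×12 and ÷16 figures match the description above; scale_depth=1.4 and num_hidden_layers=32 are assumed values for illustration only (the real values are read from the model's config.json):

```python
import math

# Toy numeric sketch of the three μP scaling sites.
scale_emb = 12.0                          # matches the x12 above
scale_depth = 1.4                         # assumption, for illustration
num_hidden_layers = 32                    # assumption, for illustration
hidden_size, dim_model_base = 4096, 256   # ratio gives the /16 above

# Site 1: embedding output is amplified.
def scale_embedding(h: float) -> float:
    return h * scale_emb

# Site 2: each sub-layer output is scaled BEFORE the residual add
# (this is the ordering-critical step).
residual_scale = scale_depth / math.sqrt(num_hidden_layers)
def residual_add(residual: float, sublayer_out: float) -> float:
    return residual + sublayer_out * residual_scale

# Site 3: hidden states are divided down before the lm_head matmul.
lm_head_scale = 1.0 / (hidden_size / dim_model_base)  # 1/16
```

    Swapping the order at site 2 (adding the residual first, then scaling) would also shrink the residual stream itself, which is why the placement matters.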

  2. Vocab Masking: logits[:, ori_vocab_size:] = -inf prevents generation of padding tokens at inference time — preserves original vocabulary boundary when vocab_size was padded during training.
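
  The masking step amounts to the tensor assignment logits[:, ori_vocab_size:] = -inf. A minimal pure-Python stand-in for one row of logits (the function name is hypothetical):

```python
# Sketch of vocab masking: positions at or beyond ori_vocab_size (the
# pre-padding vocabulary boundary) are forced to -inf so that padded
# token ids can never win the argmax/sampling step.
NEG_INF = float("-inf")

def mask_padded_vocab(logits_row, ori_vocab_size):
    return logits_row[:ori_vocab_size] + [NEG_INF] * (len(logits_row) - ori_vocab_size)

masked = mask_padded_vocab([0.1, 0.2, 0.3, 0.4], ori_vocab_size=3)
# masked == [0.1, 0.2, 0.3, -inf]
```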

  3. tie_word_embeddings: Transposes embedding weight → lm_head with dtype consistency, matching MiniCPMForCausalLM HF default.
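
  The tying step can be sketched as a plain transpose. Whether a transpose is actually needed depends on the layout FastDeploy's lm_head expects — assumed here to be [hidden, vocab] versus the embedding's [vocab, hidden]; the helper name is hypothetical:

```python
# Sketch of weight tying under the assumed layouts:
# embedding: vocab_size x hidden_size  ->  lm_head: hidden_size x vocab_size
def tie_lm_head(embedding):
    vocab, hidden = len(embedding), len(embedding[0])
    return [[embedding[v][h] for v in range(vocab)] for h in range(hidden)]

w = tie_lm_head([[1, 2], [3, 4], [5, 6]])  # 3x2 -> 2x3
```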

Design Decisions

  • Followed Qwen2 model pattern (closest architecture in FastDeploy) with μP scaling additions
  • Auto-discovery via @ModelRegistry.register_model_class decorator — no manual imports needed
  • μP config values (scale_emb, scale_depth, dim_model_base) read from HF config.json via ModelConfig auto-setattr
  • Quantization support (WINT8/WINT4/FP8) through standard FastDeploy layers — no custom ops needed
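
The decorator-based auto-discovery mentioned above can be illustrated with a toy registry. The class below mirrors the @ModelRegistry.register_model_class pattern in shape only — FastDeploy's actual registry API may differ:

```python
# Toy sketch of decorator-based model registration (illustrative only).
class ModelRegistry:
    _models: dict = {}

    @classmethod
    def register_model_class(cls, model_cls):
        # Record the class under its own name, then return it unchanged
        # so the decorator is transparent to the class definition.
        cls._models[model_cls.__name__] = model_cls
        return model_cls

@ModelRegistry.register_model_class
class MiniCPMForCausalLM:
    pass
```

Because registration happens at class-definition time, simply importing the model module is enough — no manual registration call or import list edit is needed.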

Usage or Command

# Deploy MiniCPM4.1-8B with WINT4 quantization
python -m fastdeploy.entrypoints.openai.api_server \
       --model openbmb/MiniCPM4.1-8B \
       --tensor-parallel-size 1 \
       --quantization wint4 \
       --max-model-len 32768 \
       --max-num-seqs 128

# Send a request
curl http://localhost:8180/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openbmb/MiniCPM4.1-8B",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "max_tokens": 512
  }'

See docs/best_practices/MiniCPM4-8B.md for full deployment guide.

Accuracy Tests

Unit Tests (16/16 passed across 3 environments)

  • Test file: tests/model_executor/test_minicpm4.py (320 lines, 4 classes, 16 tests)
  • TestMuPScaling (6 tests): Validates all 3 μP scaling sites — embedding (×12), residual (×scale_depth/√N), lm_head (÷hidden/base)
  • TestWeightMapping (5 tests): Verifies HF→FD weight name mapping (model. → minicpm4.), column/row parallel splits
  • TestRegistration (4 tests): Model registry, config auto-setattr, architecture name MiniCPMForCausalLM
  • TestComputeLogits (1 test): End-to-end lm_head scaling with real Paddle tensors

AI Studio V100 GPU Validation

Tested on Baidu AI Studio V100 16GB — job logs: 16/16 passed in 0.09s.

Environment: Tesla V100-SXM2 16GB, CUDA 12.0, PaddlePaddle 3.3.0, Python 3.10.

CI Coverage Job (H20 GPU)

All 16 tests passed in CI (run_tests_with_coverage job):

tests/model_executor/test_minicpm4.py::TestMuPScaling::test_embedding_scaling PASSED
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_residual_scaling_value PASSED
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_residual_scaling_applied PASSED
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_lm_head_scaling PASSED
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_lm_head_scale_fallback PASSED
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_residual_scale_depth_default PASSED
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_hf_prefix_rename PASSED
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_qkv_stacking PASSED
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_gate_up_stacking PASSED
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_embed_and_lm_head_rename PASSED
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_weight_name_replacement PASSED
tests/model_executor/test_minicpm4.py::TestRegistration::test_architecture_string PASSED
tests/model_executor/test_minicpm4.py::TestRegistration::test_module_name_is_minicpm4 PASSED
tests/model_executor/test_minicpm4.py::TestRegistration::test_model_classes_exist PASSED
tests/model_executor/test_minicpm4.py::TestRegistration::test_no_qkv_bias PASSED
tests/model_executor/test_minicpm4.py::TestComputeLogits::test_lm_head_scaling_and_vocab_mask PASSED

Local CPU Test Output

$ pytest tests/model_executor/test_minicpm4.py -v
========================= test session starts ==========================
platform linux -- Python 3.13.9, pytest-9.0.2, pluggy-1.5.0
collected 16 items

tests/model_executor/test_minicpm4.py::TestMuPScaling::test_embedding_scaling PASSED [  6%]
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_residual_scaling_value PASSED [ 12%]
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_residual_scaling_applied PASSED [ 18%]
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_lm_head_scaling PASSED [ 25%]
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_lm_head_scale_fallback PASSED [ 31%]
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_residual_scale_depth_default PASSED [ 37%]
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_hf_prefix_rename PASSED [ 43%]
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_qkv_stacking PASSED [ 50%]
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_gate_up_stacking PASSED [ 56%]
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_embed_and_lm_head_rename PASSED [ 62%]
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_weight_name_replacement PASSED [ 68%]
tests/model_executor/test_minicpm4.py::TestRegistration::test_architecture_string PASSED [ 75%]
tests/model_executor/test_minicpm4.py::TestRegistration::test_module_name_is_minicpm4 PASSED [ 81%]
tests/model_executor/test_minicpm4.py::TestRegistration::test_model_classes_exist PASSED [ 87%]
tests/model_executor/test_minicpm4.py::TestRegistration::test_no_qkv_bias PASSED [ 93%]
tests/model_executor/test_minicpm4.py::TestComputeLogits::test_lm_head_scaling_and_vocab_mask PASSED [100%]
======================== 16 passed, 1 warning in 0.55s =================

GPU Validation Note

Full model inference validation requires downloading the 16GB model weights, which exceeds CI test scope. The model architecture is structurally validated by the unit tests above. Full deployment validation can be performed using the commands in the Usage section.

Checklist

  • Model code follows existing FastDeploy patterns (Qwen2 reference)
  • All pre-commit checks pass (black, isort, flake8, ruff)
  • Model registered via @ModelRegistry.register_model_class decorator
  • Weight mapping supports HuggingFace torch format
  • Usage documentation provided
  • Supported models table updated
  • GPU validation (unit tests passed on V100)
  • Unit tests: 16/16 passed (CPU + GPU)

@paddle-bot

paddle-bot bot commented Apr 10, 2026

Thanks for your contribution!


@fastdeploy-bot fastdeploy-bot left a comment


🤖 AI Code Review | 2026-04-11

📋 Review Summary

PR overview: adds MiniCPM4.1-8B model support, implementing the μP (Maximal Update Parametrization) scaling mechanism.

Scope of changes: model_executor/models/minicpm4.py (new file), tests/model_executor/test_minicpm4.py (new file), documentation updates.

Impact tag: [Models]

📝 PR Convention Check

Title is missing an official tag: the title 【Hackathon 10th Spring No.50】MiniCPM4.1-8B model reproduction does not contain a valid official tag.

Suggested title (ready to copy):

[Feature] Add MiniCPM4/4.1-8B model support

Description template (ready to copy):

## Motivation
Enable FastDeploy to deploy the high-performance openbmb/MiniCPM4.1-8B model family.

## Modifications
### Model Code
Add `fastdeploy/model_executor/models/minicpm4.py`, implementing the MiniCPM4 architecture:
- MiniCPM4MLP: merged gate/up projection + SiLU activation
- MiniCPM4Attention: GQA attention
- MiniCPM4DecoderLayer: μP residual scaling
- MiniCPM4Model: μP embedding scaling
- MiniCPM4ForCausalLM: μP lm_head scaling, HF → FD weight mapping

### Key Features
- μP 3-site scaling: embedding (×12), residual (×scale_depth/√N), lm_head (÷16)
- Weight mapping: HF `model.` → FD `minicpm4.`
- Supports tie_word_embeddings
- Auto-registered as `MiniCPMForCausalLM`

### Tests & Docs
- Add 16 unit tests (all passing)
- Add `docs/best_practices/MiniCPM4-8B.md` deployment guide
- Update `docs/supported_models.md`

Issues

Level File Summary
(none) No blocking issues found

Overall Assessment

The implementation is correct overall and follows FastDeploy's model implementation conventions:

  • ✅ μP 3-site scaling is implemented correctly, with scaling order matching the paper
  • ✅ Standard layers/ components are properly reused
  • ✅ Weight mapping and tensor parallel support are complete
  • ✅ Comprehensive test coverage (all 16 unit tests pass)
  • ✅ Documentation is complete, including hardware requirements and deployment examples

The only suggestion is to fix the PR title format by adding an official tag.


Labels

contributor (External developers)
