
【Hackathon 10th Spring No.50】MiniCPM4.1-8B model reproduction #7332

Open
r-cloudforge wants to merge 13 commits into PaddlePaddle:develop from CloudForge-Solutions:task/050-minicpm41-model-v2

Conversation

@r-cloudforge

Motivation

⚡ Engineering Highlight: Correct μP (Maximal Update Parametrization) 3-site scaling — embedding (×12), residual (×scale_depth/√N per sub-layer), lm_head (÷16) — with ordering-critical placement (the residual scaling must happen after each sub-layer output but before the residual add), plus vocab masking and tie_word_embeddings.

This PR adds support for deploying the high-performance openbmb/MiniCPM4.1-8B model family in FastDeploy, as required by Hackathon 10th Spring No.50.

MiniCPM4.1-8B is a dense 8B parameter model from OpenBMB with the following key features:

  • μP (Maximal Update Parametrization): Three scaling sites — embedding (×12), residual (×scale_depth/√num_layers), and lm_head (÷hidden_size/dim_model_base)
  • GQA: Grouped Query Attention with num_key_value_heads=2
  • LongRoPE: Extended position encoding supporting up to 65,536 tokens
  • Architecture registered as MiniCPMForCausalLM

Modifications

Model Code (fastdeploy/model_executor/models/minicpm4.py)

New model file (516 lines) implementing:

  • MiniCPM4MLP: Gate/up merged projection with SiLU activation, no bias
  • MiniCPM4Attention: GQA with QKVParallelLinear(with_bias=False), neox-style RoPE
  • MiniCPM4DecoderLayer: μP residual scaling (scale_depth / √num_hidden_layers)
  • MiniCPM4Model: μP embedding scaling (scale_emb), graph optimization support
  • MiniCPM4ForCausalLM: μP lm_head scaling, weight mapping (HF model. → FD minicpm4.), registered as MiniCPMForCausalLM
  • MiniCPM4PretrainedModel: Tensor parallel mappings (no bias splits)
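
The HF → FD weight-name mapping listed above (prefix rename plus stacking of the per-projection q/k/v and gate/up weights into merged layers) can be sketched as follows. This helper is purely illustrative — FastDeploy's real implementation in minicpm4.py uses its own mapping machinery, and the function name here is hypothetical:

```python
# Illustrative sketch of the checkpoint-key mapping described above.
# Real FastDeploy code does this via its own weight-loading helpers.
_PROJ_MAP = {
    "q_proj": "qkv_proj", "k_proj": "qkv_proj", "v_proj": "qkv_proj",
    "gate_proj": "gate_up_proj", "up_proj": "gate_up_proj",
}

def map_weight_name(hf_name: str) -> str:
    """Rename a HuggingFace checkpoint key to the FastDeploy layout."""
    # Prefix rename: HF "model." -> FD "minicpm4."
    if hf_name.startswith("model."):
        hf_name = "minicpm4." + hf_name[len("model."):]
    # Per-projection weights land in merged (stacked) layers.
    parts = [_PROJ_MAP.get(p, p) for p in hf_name.split(".")]
    return ".".join(parts)

print(map_weight_name("model.layers.0.self_attn.q_proj.weight"))
# -> minicpm4.layers.0.self_attn.qkv_proj.weight
```

Keys outside the `model.` prefix (e.g. `lm_head.weight`) pass through unchanged, which matches the tie_word_embeddings handling described below.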

Documentation

  • docs/best_practices/MiniCPM4-8B.md: Usage guide with hardware requirements, deployment examples, and performance tuning
  • docs/supported_models.md: Added MiniCPM4 entry to LLM model table

Engineering Highlights

  1. μP 3-Site Scaling — Correct implementation of Maximal Update Parametrization at three distinct points, each with different mathematical operations:

    • Embedding: × scale_emb (×12 for this model)
    • Residual: × scale_depth / √num_hidden_layers (applied independently to both attention and MLP outputs per layer, before the residual add)
    • LM head: ÷ (hidden_size / dim_model_base) (÷16 for this model, applied before logit computation)

    Ordering is critical: residual scaling must happen after each sub-layer output but before the residual addition.
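
    The three sites and their ordering can be sketched numerically. The ×12 and ÷16 figures match the description above; scale_depth=1.4 and num_hidden_layers=32 are assumed values for illustration only (the real values are read from the model's config.json):

```python
import math

# Toy numeric sketch of the three μP scaling sites.
scale_emb = 12.0                          # matches the x12 above
scale_depth = 1.4                         # assumption, for illustration
num_hidden_layers = 32                    # assumption, for illustration
hidden_size, dim_model_base = 4096, 256   # ratio gives the /16 above

# Site 1: embedding output is amplified.
def scale_embedding(h: float) -> float:
    return h * scale_emb

# Site 2: each sub-layer output is scaled BEFORE the residual add
# (this is the ordering-critical step).
residual_scale = scale_depth / math.sqrt(num_hidden_layers)
def residual_add(residual: float, sublayer_out: float) -> float:
    return residual + sublayer_out * residual_scale

# Site 3: hidden states are divided down before the lm_head matmul.
lm_head_scale = 1.0 / (hidden_size / dim_model_base)  # 1/16
```

    Swapping the order at site 2 (adding the residual first, then scaling) would also shrink the residual stream itself, which is why the placement matters.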

  2. Vocab Masking: logits[:, ori_vocab_size:] = -inf prevents generation of padding tokens at inference time — preserves original vocabulary boundary when vocab_size was padded during training.
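
  The masking step amounts to the tensor assignment logits[:, ori_vocab_size:] = -inf. A minimal pure-Python stand-in for one row of logits (the function name is hypothetical):

```python
# Sketch of vocab masking: positions at or beyond ori_vocab_size (the
# pre-padding vocabulary boundary) are forced to -inf so that padded
# token ids can never win the argmax/sampling step.
NEG_INF = float("-inf")

def mask_padded_vocab(logits_row, ori_vocab_size):
    return logits_row[:ori_vocab_size] + [NEG_INF] * (len(logits_row) - ori_vocab_size)

masked = mask_padded_vocab([0.1, 0.2, 0.3, 0.4], ori_vocab_size=3)
# masked == [0.1, 0.2, 0.3, -inf]
```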

  3. tie_word_embeddings: Transposes embedding weight → lm_head with dtype consistency, matching MiniCPMForCausalLM HF default.
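
  The tying step can be sketched as a plain transpose. Whether a transpose is actually needed depends on the layout FastDeploy's lm_head expects — assumed here to be [hidden, vocab] versus the embedding's [vocab, hidden]; the helper name is hypothetical:

```python
# Sketch of weight tying under the assumed layouts:
# embedding: vocab_size x hidden_size  ->  lm_head: hidden_size x vocab_size
def tie_lm_head(embedding):
    vocab, hidden = len(embedding), len(embedding[0])
    return [[embedding[v][h] for v in range(vocab)] for h in range(hidden)]

w = tie_lm_head([[1, 2], [3, 4], [5, 6]])  # 3x2 -> 2x3
```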

Design Decisions

  • Followed Qwen2 model pattern (closest architecture in FastDeploy) with μP scaling additions
  • Auto-discovery via @ModelRegistry.register_model_class decorator — no manual imports needed
  • μP config values (scale_emb, scale_depth, dim_model_base) read from HF config.json via ModelConfig auto-setattr
  • Quantization support (WINT8/WINT4/FP8) through standard FastDeploy layers — no custom ops needed
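
The decorator-based auto-discovery mentioned above can be illustrated with a toy registry. The class below mirrors the @ModelRegistry.register_model_class pattern in shape only — FastDeploy's actual registry API may differ:

```python
# Toy sketch of decorator-based model registration (illustrative only).
class ModelRegistry:
    _models: dict = {}

    @classmethod
    def register_model_class(cls, model_cls):
        # Record the class under its own name, then return it unchanged
        # so the decorator is transparent to the class definition.
        cls._models[model_cls.__name__] = model_cls
        return model_cls

@ModelRegistry.register_model_class
class MiniCPMForCausalLM:
    pass
```

Because registration happens at class-definition time, simply importing the model module is enough — no manual registration call or import list edit is needed.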

Usage or Command

# Deploy MiniCPM4.1-8B with WINT4 quantization
python -m fastdeploy.entrypoints.openai.api_server \
       --model openbmb/MiniCPM4.1-8B \
       --tensor-parallel-size 1 \
       --quantization wint4 \
       --max-model-len 32768 \
       --max-num-seqs 128

# Send a request
curl http://localhost:8180/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openbmb/MiniCPM4.1-8B",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "max_tokens": 512
  }'

See docs/best_practices/MiniCPM4-8B.md for full deployment guide.

Accuracy Tests

Unit Tests (16/16 passed across 3 environments)

  • Test file: tests/model_executor/test_minicpm4.py (320 lines, 4 classes, 16 tests)
  • TestMuPScaling (6 tests): Validates all 3 μP scaling sites — embedding (×12), residual (×scale_depth/√N), lm_head (÷hidden/base)
  • TestWeightMapping (5 tests): Verifies HF→FD weight name mapping (model. → minicpm4.), column/row parallel splits
  • TestRegistration (4 tests): Model registry, config auto-setattr, architecture name MiniCPMForCausalLM
  • TestComputeLogits (1 test): End-to-end lm_head scaling with real Paddle tensors

AI Studio V100 GPU Validation

Tested on Baidu AI Studio V100 16GB — job logs: 16/16 passed in 0.09s.

Environment: Tesla V100-SXM2 16GB, CUDA 12.0, PaddlePaddle 3.3.0, Python 3.10.

CI Coverage Job (H20 GPU)

All 16 tests passed in CI (run_tests_with_coverage job):

tests/model_executor/test_minicpm4.py::TestMuPScaling::test_embedding_scaling PASSED
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_residual_scaling_value PASSED
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_residual_scaling_applied PASSED
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_lm_head_scaling PASSED
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_lm_head_scale_fallback PASSED
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_residual_scale_depth_default PASSED
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_hf_prefix_rename PASSED
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_qkv_stacking PASSED
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_gate_up_stacking PASSED
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_embed_and_lm_head_rename PASSED
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_weight_name_replacement PASSED
tests/model_executor/test_minicpm4.py::TestRegistration::test_architecture_string PASSED
tests/model_executor/test_minicpm4.py::TestRegistration::test_module_name_is_minicpm4 PASSED
tests/model_executor/test_minicpm4.py::TestRegistration::test_model_classes_exist PASSED
tests/model_executor/test_minicpm4.py::TestRegistration::test_no_qkv_bias PASSED
tests/model_executor/test_minicpm4.py::TestComputeLogits::test_lm_head_scaling_and_vocab_mask PASSED

Local CPU Test Output

$ pytest tests/model_executor/test_minicpm4.py -v
========================= test session starts ==========================
platform linux -- Python 3.13.9, pytest-9.0.2, pluggy-1.5.0
collected 16 items

tests/model_executor/test_minicpm4.py::TestMuPScaling::test_embedding_scaling PASSED [  6%]
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_residual_scaling_value PASSED [ 12%]
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_residual_scaling_applied PASSED [ 18%]
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_lm_head_scaling PASSED [ 25%]
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_lm_head_scale_fallback PASSED [ 31%]
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_residual_scale_depth_default PASSED [ 37%]
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_hf_prefix_rename PASSED [ 43%]
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_qkv_stacking PASSED [ 50%]
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_gate_up_stacking PASSED [ 56%]
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_embed_and_lm_head_rename PASSED [ 62%]
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_weight_name_replacement PASSED [ 68%]
tests/model_executor/test_minicpm4.py::TestRegistration::test_architecture_string PASSED [ 75%]
tests/model_executor/test_minicpm4.py::TestRegistration::test_module_name_is_minicpm4 PASSED [ 81%]
tests/model_executor/test_minicpm4.py::TestRegistration::test_model_classes_exist PASSED [ 87%]
tests/model_executor/test_minicpm4.py::TestRegistration::test_no_qkv_bias PASSED [ 93%]
tests/model_executor/test_minicpm4.py::TestComputeLogits::test_lm_head_scaling_and_vocab_mask PASSED [100%]
======================== 16 passed, 1 warning in 0.55s =================

GPU Validation Note

Full model inference validation requires downloading the 16GB model weights, which exceeds CI test scope. The model architecture is structurally validated by the unit tests above. Full deployment validation can be performed using the commands in the Usage section.

Checklist

  • Model code follows existing FastDeploy patterns (Qwen2 reference)
  • All pre-commit checks pass (black, isort, flake8, ruff)
  • Model registered via @ModelRegistry.register_model_class decorator
  • Weight mapping supports HuggingFace torch format
  • Usage documentation provided
  • Supported models table updated
  • GPU validation (unit tests passed on V100)
  • Unit tests: 16/16 passed (CPU + GPU)

@paddle-bot

paddle-bot bot commented Apr 10, 2026

Thanks for your contribution!


@fastdeploy-bot fastdeploy-bot left a comment


🤖 AI Code Review | 2026-04-11

📋 Review Summary

PR overview: adds MiniCPM4.1-8B model support, implementing the μP (Maximal Update Parametrization) scaling mechanism.

Scope of changes: model_executor/models/minicpm4.py (new file), tests/model_executor/test_minicpm4.py (new file), documentation updates.

Impact tag: [Models]

📝 PR Convention Check

Title is missing an official tag: the title 【Hackathon 10th Spring No.50】MiniCPM4.1-8B model reproduction does not contain a valid official tag.

Suggested title (ready to copy):

[Feature] Add MiniCPM4/4.1-8B model support

Description template (ready to copy):

## Motivation
Enable FastDeploy to deploy the high-performance openbmb/MiniCPM4.1-8B model family.

## Modifications
### Model Code
Add `fastdeploy/model_executor/models/minicpm4.py`, implementing the MiniCPM4 architecture:
- MiniCPM4MLP: merged gate/up projection + SiLU activation
- MiniCPM4Attention: GQA attention
- MiniCPM4DecoderLayer: μP residual scaling
- MiniCPM4Model: μP embedding scaling
- MiniCPM4ForCausalLM: μP lm_head scaling, HF → FD weight mapping

### Key Features
- μP 3-site scaling: embedding (×12), residual (×scale_depth/√N), lm_head (÷16)
- Weight mapping: HF `model.` → FD `minicpm4.`
- Supports tie_word_embeddings
- Auto-registered as `MiniCPMForCausalLM`

### Tests & Docs
- Add 16 unit tests (all passing)
- Add `docs/best_practices/MiniCPM4-8B.md` deployment guide
- Update `docs/supported_models.md`

Issues

Level File Summary
(none) No blocking issues found

Overall Assessment

The implementation is correct overall and follows FastDeploy's model implementation conventions:

  • ✅ μP 3-site scaling is implemented correctly, with scaling order matching the paper
  • ✅ Standard layers/ components are properly reused
  • ✅ Weight mapping and tensor parallel support are complete
  • ✅ Comprehensive test coverage (all 16 unit tests pass)
  • ✅ Documentation is complete, including hardware requirements and deployment examples

The only suggestion is to fix the PR title format by adding an official tag.


Labels

contributor (External developers)
