
Transformer specification and auto-generation method for existing models #164

@aftersnow

Description

Feature request:

The Transformer is the dominant architecture for modern LLMs, and its design has largely converged: most state-of-the-art open-source models adopt GQA or MLA for the attention layer and MoE for the MLP layer. A Transformer model can therefore be viewed as a composition of standardized building blocks, which makes it possible to abstract a unified architectural specification across different open-source models and use it as the Transformer specification in ModelPack. Many valuable capabilities become possible on top of such a specification.
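
To make the idea concrete, here is a minimal, hypothetical sketch of what such a building-block specification could look like as Python dataclasses. The class and field names (`AttentionSpec`, `MlpSpec`, `TransformerSpec`, etc.) are illustrative assumptions for this issue, not the schema from the in-progress PR.

```python
from dataclasses import dataclass
from typing import Literal, Optional

# Hypothetical sketch of a unified Transformer spec; names are illustrative,
# not the ones defined in the in-progress ModelPack PR.

@dataclass
class AttentionSpec:
    kind: Literal["mha", "gqa", "mla"]       # attention variant
    num_heads: int
    num_kv_heads: Optional[int] = None       # used by GQA
    head_dim: Optional[int] = None
    rope_theta: Optional[float] = None       # rotary embedding base, if any

@dataclass
class MlpSpec:
    kind: Literal["dense", "moe"]
    hidden_size: int
    intermediate_size: int
    num_experts: Optional[int] = None        # used by MoE
    num_experts_per_token: Optional[int] = None

@dataclass
class TransformerSpec:
    vocab_size: int
    num_layers: int
    norm: Literal["rmsnorm", "layernorm"]
    attention: AttentionSpec
    mlp: MlpSpec
    tie_word_embeddings: bool = False
```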

Expected Outcome:

  • Jointly complete a unified Transformer specification (an in-progress PR already exists)
  • Using vLLM and SGLang, conduct POCs on three or more mainstream open-source Transformer models based on this specification
  • Design a workflow or Claude skills that can automatically generate Transformer specification definitions from models in the Hugging Face transformers repository (a rough sketch of such a mapping follows this list)
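
As a rough illustration of the third item, the sketch below maps a handful of common Hugging Face config.json fields (num_attention_heads, num_key_value_heads, num_local_experts, ...) onto the dataclasses sketched above. `spec_from_hf_config` is a hypothetical helper, not an existing API, and real models would need broader coverage (MLA parameters, sliding windows, norm type, and so on).

```python
import json

# Illustrative only: reuses the TransformerSpec/AttentionSpec/MlpSpec
# dataclasses sketched above and fills them from a Hugging Face config.json.

def spec_from_hf_config(path: str) -> TransformerSpec:
    with open(path) as f:
        cfg = json.load(f)

    n_heads = cfg["num_attention_heads"]
    n_kv = cfg.get("num_key_value_heads", n_heads)
    attention = AttentionSpec(
        kind="gqa" if n_kv < n_heads else "mha",
        num_heads=n_heads,
        num_kv_heads=n_kv,
        head_dim=cfg.get("head_dim", cfg["hidden_size"] // n_heads),
        rope_theta=cfg.get("rope_theta"),
    )

    # Mixtral-style configs use num_local_experts; DeepSeek-style use n_routed_experts.
    n_experts = cfg.get("num_local_experts") or cfg.get("n_routed_experts")
    mlp = MlpSpec(
        kind="moe" if n_experts else "dense",
        hidden_size=cfg["hidden_size"],
        intermediate_size=cfg["intermediate_size"],
        num_experts=n_experts,
        num_experts_per_token=cfg.get("num_experts_per_tok"),
    )

    return TransformerSpec(
        vocab_size=cfg["vocab_size"],
        num_layers=cfg["num_hidden_layers"],
        norm="rmsnorm",  # assumed; most current open models use RMSNorm
        attention=attention,
        mlp=mlp,
        tie_word_embeddings=cfg.get("tie_word_embeddings", False),
    )
```

For instance, a Mixtral-style config would come out as GQA attention plus an MoE MLP, while a Llama-style config would come out as GQA attention with a dense MLP.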

Use case:

For inference engines, the specification enables automatic support for multiple Transformer models, so newly trained models can be supported without per-model adaptation.
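
A hedged sketch of the engine-side view, reusing `TransformerSpec` from above: one generic builder dispatches on spec fields instead of shipping a modeling file per architecture. The kernel names are plain strings standing in for real implementations; this is not vLLM or SGLang code.

```python
# Hypothetical dispatch shape only: the strings below stand in for real
# attention/MLP kernels an engine would select based on the spec.

def build_decoder(spec: TransformerSpec) -> list[dict]:
    attn_impls = {"mha": "fused_mha", "gqa": "fused_gqa", "mla": "fused_mla"}
    mlp_impls = {"dense": "fused_mlp", "moe": "fused_moe"}
    return [
        {
            "attention": attn_impls[spec.attention.kind],
            "mlp": mlp_impls[spec.mlp.kind],
            "norm": spec.norm,
        }
        for _ in range(spec.num_layers)
    ]
```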

Labels: enhancement (New feature or request)
