
Transformer specification and auto-generation method for existing models #164

@aftersnow

Description

Feature request:

The Transformer is the dominant architecture for modern LLMs, and its design has largely converged: most state-of-the-art open-source models adopt GQA or MLA for the attention layer and MoE for the MLP layer. A Transformer model can therefore be viewed as a composition of standardized building blocks, which makes it possible to abstract a unified architectural specification across different open-source models and use it as the Transformer specification in ModelPack. Many valuable capabilities become possible on top of such a specification.
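
To make the idea concrete, here is a minimal, hypothetical sketch of what such a building-block specification could look like as Python dataclasses. The class and field names (`AttentionSpec`, `MlpSpec`, `TransformerSpec`, etc.) are illustrative assumptions for this issue, not the schema from the in-progress PR.

```python
from dataclasses import dataclass
from typing import Literal, Optional

# Hypothetical sketch of a unified Transformer spec; names are illustrative,
# not the ones defined in the in-progress ModelPack PR.

@dataclass
class AttentionSpec:
    kind: Literal["mha", "gqa", "mla"]       # attention variant
    num_heads: int
    num_kv_heads: Optional[int] = None       # used by GQA
    head_dim: Optional[int] = None
    rope_theta: Optional[float] = None       # rotary embedding base, if any

@dataclass
class MlpSpec:
    kind: Literal["dense", "moe"]
    hidden_size: int
    intermediate_size: int
    num_experts: Optional[int] = None        # used by MoE
    num_experts_per_token: Optional[int] = None

@dataclass
class TransformerSpec:
    vocab_size: int
    num_layers: int
    norm: Literal["rmsnorm", "layernorm"]
    attention: AttentionSpec
    mlp: MlpSpec
    tie_word_embeddings: bool = False
```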

Expected Outcome:

  • Jointly complete a unified Transformer specification (an in-progress PR already exists)
  • Using vLLM and SGLang, conduct POCs on three or more mainstream open-source Transformer models based on this specification
  • Design a workflow or Claude skills that can automatically generate Transformer specification definitions from models in the Hugging Face transformers repository (a rough sketch of such a mapping follows this list)
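
As a rough illustration of the third item, the sketch below maps a handful of common Hugging Face config.json fields (num_attention_heads, num_key_value_heads, num_local_experts, ...) onto the dataclasses sketched above. `spec_from_hf_config` is a hypothetical helper, not an existing API, and real models would need broader coverage (MLA parameters, sliding windows, norm type, and so on).

```python
import json

# Illustrative only: reuses the TransformerSpec/AttentionSpec/MlpSpec
# dataclasses sketched above and fills them from a Hugging Face config.json.

def spec_from_hf_config(path: str) -> TransformerSpec:
    with open(path) as f:
        cfg = json.load(f)

    n_heads = cfg["num_attention_heads"]
    n_kv = cfg.get("num_key_value_heads", n_heads)
    attention = AttentionSpec(
        kind="gqa" if n_kv < n_heads else "mha",
        num_heads=n_heads,
        num_kv_heads=n_kv,
        head_dim=cfg.get("head_dim", cfg["hidden_size"] // n_heads),
        rope_theta=cfg.get("rope_theta"),
    )

    # Mixtral-style configs use num_local_experts; DeepSeek-style use n_routed_experts.
    n_experts = cfg.get("num_local_experts") or cfg.get("n_routed_experts")
    mlp = MlpSpec(
        kind="moe" if n_experts else "dense",
        hidden_size=cfg["hidden_size"],
        intermediate_size=cfg["intermediate_size"],
        num_experts=n_experts,
        num_experts_per_token=cfg.get("num_experts_per_tok"),
    )

    return TransformerSpec(
        vocab_size=cfg["vocab_size"],
        num_layers=cfg["num_hidden_layers"],
        norm="rmsnorm",  # assumed; most current open models use RMSNorm
        attention=attention,
        mlp=mlp,
        tie_word_embeddings=cfg.get("tie_word_embeddings", False),
    )
```

For instance, a Mixtral-style config would come out as GQA attention plus an MoE MLP, while a Llama-style config would come out as GQA attention with a dense MLP.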

Use case:

For inference engines, the specification enables automatic support for multiple Transformer models, so newly trained models can be supported without per-model adaptation.
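
A hedged sketch of the engine-side view, reusing `TransformerSpec` from above: one generic builder dispatches on spec fields instead of shipping a modeling file per architecture. The kernel names are plain strings standing in for real implementations; this is not vLLM or SGLang code.

```python
# Hypothetical dispatch shape only: the strings below stand in for real
# attention/MLP kernels an engine would select based on the spec.

def build_decoder(spec: TransformerSpec) -> list[dict]:
    attn_impls = {"mha": "fused_mha", "gqa": "fused_gqa", "mla": "fused_mla"}
    mlp_impls = {"dense": "fused_mlp", "moe": "fused_moe"}
    return [
        {
            "attention": attn_impls[spec.attention.kind],
            "mlp": mlp_impls[spec.mlp.kind],
            "norm": spec.norm,
        }
        for _ in range(spec.num_layers)
    ]
```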

Labels: enhancement (New feature or request)
