Fix EP fusion graph wiring, GQA node support, and partial shape inference #34

Open
KenLagos wants to merge 1 commit into main from bug_fixes
KenLagos (Collaborator) commented Jul 1, 2026

FusedMatMul EP fusion:

  • Reject fusion when the MatMul output has multiple consumers. This fixes the SwiGLU SiLU pattern, where x * sigmoid(x) needs the raw MatMul output.
  • Extend HasSingleConsumer to treat graph outputs as consumers
  • Reject scale scalars whose rank exceeds the other Mul/Div input's rank
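
The two fusion guards above can be sketched on a toy graph. This is a minimal illustration, not the EP's implementation: the `Node` and `Graph` types, `CountConsumers`, and `IsScaleRankAcceptable` are hypothetical, and only the name `HasSingleConsumer` comes from this PR. The sketch assumes that a value listed as a graph output counts as one consumer, and that a scale operand whose rank exceeds the other operand's rank would broadcast the result to a higher rank, so it cannot be folded into the fused node.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy graph types for illustration; not the ORT or EP API.
struct Node {
    std::string op_type;
    std::vector<std::string> inputs;
    std::vector<std::string> outputs;
};

struct Graph {
    std::vector<Node> nodes;
    std::vector<std::string> graph_outputs;
};

// Counts every use of `value`: once per node input slot that reads it,
// plus once for each time it is listed as a graph output.
std::size_t CountConsumers(const Graph& g, const std::string& value) {
    std::size_t count = 0;
    for (const Node& n : g.nodes) {
        for (const std::string& in : n.inputs) {
            if (in == value) ++count;
        }
    }
    for (const std::string& out : g.graph_outputs) {
        if (out == value) ++count;
    }
    return count;
}

// Fusion is only safe when exactly one consumer reads the MatMul output.
bool HasSingleConsumer(const Graph& g, const std::string& value) {
    return CountConsumers(g, value) == 1;
}

// Assumption: a scale operand of higher rank than the other Mul/Div input
// would raise the rank of the result, so it is rejected.
bool IsScaleRankAcceptable(std::size_t scale_rank, std::size_t other_rank) {
    return scale_rank <= other_rank;
}
```

In the SwiGLU case, the MatMul output feeds both Sigmoid and Mul, so `HasSingleConsumer` returns false and the MatMul is left unfused.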

GroupQueryAttention node support:

  • Skip missing optional inputs (those with null type info) in the data type check
  • Move the CPU constant input bypass ahead of the DML unknown-type rejection so that int64 inputs such as GQA's total_sequence_length are accepted
  • Propagate input/output aliases to ORT kernel defs for KV cache sharing
  • Fix OrtStatus leak in kernel lookup
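
The ordering of the data type checks described above can be sketched as follows. This is an illustration under stated assumptions, not the EP's code: `DataType`, `InputInfo`, and `AreInputTypesSupported` are hypothetical names, a missing optional input is modeled as an empty `std::optional`, and the set of device-supported types is passed in by the caller.

```cpp
#include <optional>
#include <set>
#include <vector>

// Hypothetical types for illustration; not the ORT or DML API.
enum class DataType { Float, Float16, Int32, Int64 };

struct InputInfo {
    std::optional<DataType> type;  // empty models a missing optional input
    bool is_cpu_constant = false;  // input is read on the CPU, not by the device
};

// Returns true when every input that the device actually consumes has a
// supported type. The order of the checks matters: the bypasses must run
// before the rejection, or a CPU-side int64 input is refused too early.
bool AreInputTypesSupported(const std::vector<InputInfo>& inputs,
                            const std::set<DataType>& device_supported) {
    for (const InputInfo& in : inputs) {
        if (!in.type.has_value()) continue;  // 1. skip missing optional inputs
        if (in.is_cpu_constant) continue;    // 2. CPU constants bypass the check
        if (device_supported.count(*in.type) == 0) {
            return false;                    // 3. reject unsupported device types
        }
    }
    return true;
}
```

With the bypass placed first, an int64 input marked as a CPU constant is accepted even when the device set contains only floating-point types, while the same int64 input without the flag is still rejected.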

Partial shape inferrer padding:

  • When a shape inferrer returns fewer outputs than the node has, fill the remaining output shapes from the creation context. This fixes DynamicQuantizeLinear, whose scalar outputs were allocated with the input shape, which broke the downstream ConvInteger.
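
The padding step above can be sketched as a small helper. This is a minimal illustration, not the code in this PR: `Shape` and `PadInferredShapes` are hypothetical names, and the creation context is modeled as a plain vector holding one known shape per node output.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical shape type for illustration; an empty vector is a scalar.
using Shape = std::vector<std::int64_t>;

// Keeps every shape the inferrer produced and fills the missing trailing
// outputs from the shapes recorded in the creation context.
std::vector<Shape> PadInferredShapes(std::vector<Shape> inferred,
                                     const std::vector<Shape>& context_shapes) {
    for (std::size_t i = inferred.size(); i < context_shapes.size(); ++i) {
        inferred.push_back(context_shapes[i]);
    }
    return inferred;
}
```

For a DynamicQuantizeLinear node with outputs y, y_scale, and y_zero_point, an inferrer that returns only the shape of y gets the two scalar shapes appended from the context, so the scalar outputs are no longer sized like the input.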

KenLagos requested a review from apwojcik July 1, 2026 03:14
KenLagos self-assigned this Jul 1, 2026