Conversation

@spiridonov

Rationale for this change

apache/arrow-go is used in the new query engine in Grafana Loki, and we are actively working on improving its performance. Here are a few low-hanging fruits that reduce allocations by many gigabytes when used on the hot path.

What changes are included in this PR?

This PR reduces the number of allocated objects.

  • arrow/datatype_binary.go: straightforward; those Layout objects are currently allocated on each call (see the sketch after this list).
  • arrow/schema.go: Schema.Fields() no longer returns a clone, and NewSchemaWithEndian() no longer clones its inputs. All call sites remain signature-compatible, but they must not modify those arguments/results anymore. In our opinion this is a reasonable trade-off for higher performance; I am happy to discuss other ways to implement it.
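
A minimal sketch of the layout-caching idea in the first bullet, using deliberately simplified stand-in types rather than the real arrow-go definitions:

package main

import "fmt"

// Illustrative stand-ins only; these are not the real arrow-go types.
type bufferSpec struct{ kind string }
type dataTypeLayout struct{ buffers []bufferSpec }

type binaryType struct{}

// Before: every call builds a fresh slice, so each Layout() call allocates.
func (binaryType) layoutPerCall() dataTypeLayout {
    return dataTypeLayout{buffers: []bufferSpec{
        {kind: "validity"}, {kind: "offsets"}, {kind: "data"},
    }}
}

// After: the layout value is built once at package init and reused,
// so calls on the hot path allocate nothing.
var binaryLayout = dataTypeLayout{buffers: []bufferSpec{
    {kind: "validity"}, {kind: "offsets"}, {kind: "data"},
}}

func (binaryType) Layout() dataTypeLayout { return binaryLayout }

func main() {
    fmt.Println(binaryType{}.Layout().buffers[1].kind) // offsets
}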

Are these changes tested?

  • Tested with the current apache/arrow-go unit test suite.
  • Tested with grafana/loki test suite for the query engine.

Are there any user-facing changes?

Yes. Explained above.

@spiridonov requested a review from zeroshade as a code owner on January 20, 2026, 21:14
Member

@zeroshade left a comment

Thanks for this, just a couple of questions / nitpicks.

@spiridonov
Author

spiridonov commented Jan 21, 2026

Thank you for taking a look @zeroshade!

I have changed a few things:

  • Added a private newSchema constructor, so that there is a way to initialize a Schema without cloning fields and metadata. It is used internally by WithEndianness() and AddField(); the latter used to effectively allocate the fields twice (first inside itself and then again by calling NewSchema). A sketch of this shape follows the list.
  • Rolled back my Fields() change as a middle ground, so the old behavior remains, but its allocations also stay at the old level.
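
A sketch of the constructor split described in the first bullet, with deliberately simplified Schema/Field types (the real arrow.Schema also carries metadata, an endianness value, an error return on AddField, and a name index):

package main

import "slices"

// Simplified stand-ins; not the real arrow-go definitions.
type Field struct{ Name string }

type Schema struct {
    fields    []Field
    bigEndian bool
}

// NewSchema keeps the defensive copy for external callers.
func NewSchema(fields []Field) *Schema {
    return &Schema{fields: slices.Clone(fields)}
}

// newSchema takes ownership of fields; internal callers must not reuse the slice.
func newSchema(fields []Field, bigEndian bool) *Schema {
    return &Schema{fields: fields, bigEndian: bigEndian}
}

func (s *Schema) WithEndianness(bigEndian bool) *Schema {
    return newSchema(s.fields, bigEndian) // no extra clone of an already-owned slice
}

func (s *Schema) AddField(i int, f Field) *Schema {
    fields := make([]Field, 0, len(s.fields)+1)
    fields = append(fields, s.fields[:i]...)
    fields = append(fields, f)
    fields = append(fields, s.fields[i:]...)
    return newSchema(fields, s.bigEndian) // fields slice allocated once, not twice
}

func main() {
    s := NewSchema([]Field{{Name: "a"}})
    _ = s.AddField(1, Field{Name: "b"})
}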

Each Loki query can touch hundreds of streams that are spread over thousands of data objects, which results in thousands of arrow records being processed. Small functions such as NewSchema or Fields, called thousands of times, quickly turn into gigabytes of allocations.

I find it a bit tricky with iter.Seq[Field]. By itself it does not allocate anything, but calling yield on each iteration effectively allocates a record on the heap, because 1/ it fails the escape test (the record is passed to another function and can be used after the current closure returns) and 2/ the record has pointers inside. Using iter.Seq[Field] will not allocate a slice, but it will allocate the same number of records on the heap anyway.

@spiridonov
Author

spiridonov commented Jan 21, 2026

Most of our usage patterns look like the following:

  • Use a local []arrow.Field variable
  • Build whatever fields we need
  • Pass it into NewSchema
  • Produce an arrow record with the new schema
  • Forget the variable and move on

We already build the fields for one-time use and do not modify them later, so it feels inefficient to clone them right after the NewSchema call (the sketch below shows the shape of this pattern)...
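
Roughly what that pattern looks like against the public API; the module version in the import paths and the column names/types are assumptions, and column population is elided:

package main

import (
    "github.com/apache/arrow-go/v18/arrow"
    "github.com/apache/arrow-go/v18/arrow/array"
    "github.com/apache/arrow-go/v18/arrow/memory"
)

func buildRecord(mem memory.Allocator) arrow.Record {
    // 1. local, single-use fields slice
    fields := []arrow.Field{
        {Name: "ts", Type: arrow.FixedWidthTypes.Timestamp_ns},
        {Name: "line", Type: arrow.BinaryTypes.String},
    }
    // 2. NewSchema clones the slice even though we never touch it again
    schema := arrow.NewSchema(fields, nil)

    // 3. produce a record against the new schema, then forget `fields`
    b := array.NewRecordBuilder(mem, schema)
    defer b.Release()
    return b.NewRecord()
}

func main() {
    rec := buildRecord(memory.DefaultAllocator)
    defer rec.Release()
}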

Access to Fields() is also read-only. I understand that the result can accidentally be modified, but there are a lot of things that can go wrong accidentally. If the code is easy to read and follow, the likelihood is low: "Get an arrow record, look at its schema, create a new record with another schema, return the new record, move on..."

@zeroshade
Member

> I find it a bit tricky with iter.Seq[Field]. By itself it does not allocate anything, but calling yield on each iteration effectively allocates a record on the heap, because 1/ it fails the escape test (the record is passed to another function and can be used after the current closure returns) and 2/ the record has pointers inside. Using iter.Seq[Field] will not allocate a slice, but it will allocate the same number of records on the heap anyway.

I'm confused here: why would yield allocate an entire record on the heap as opposed to just creating a copy of the Field on the stack? For clarity, I'm suggesting using the slices package, so the function would be like:

func (s *Schema) FieldIter() iter.Seq[Field] {
    return slices.Values(s.fields)
}

and would then be used like:

for f := range sc.FieldIter() {
    // do stuff with f
}

So I'm curious what your code path looks like such that this would result in allocating the entire record on the heap. At worst it would allocate the Schema object on the heap? Unless I'm missing something?

@zeroshade
Member

The rest of the changes look good to me!

@spiridonov
Author

That is a great question! I was also confused at first, because I saw those objects being allocated in a memory profile report. I did some research, and I think it happens because the compiler is conservative: if it is not 100% sure about the result of escape analysis (the value is indeed passed into another func that passes it into another func), or a structure is too large, it falls back to allocating on the heap instead of the stack. I built a sandbox example https://godbolt.org/z/h19PnWM7T which clearly shows runtime.newobject. Compare that with a simpler type https://godbolt.org/z/KMznfPxMx which does not allocate on the heap.
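
For anyone who wants to poke at this locally rather than on godbolt, here is a minimal probe along the same lines (requires Go 1.23+ for iter/slices.Values; the actual escape decisions depend on the Go version, so the -gcflags=-m output is the source of truth, not this sketch):

// Build with:  go build -gcflags=-m .
// and look for "moved to heap" / "escapes to heap" lines in the output.
package main

import (
    "fmt"
    "iter"
    "slices"
)

// A pointer-carrying struct, standing in for something like arrow.Field,
// which holds an interface value and metadata.
type field struct {
    Name string
    Meta map[string]string
}

func values(fields []field) iter.Seq[field] {
    return slices.Values(fields)
}

func main() {
    fields := []field{{Name: "a"}, {Name: "b"}}
    for f := range values(fields) {
        fmt.Println(f.Name)
    }
}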

@spiridonov
Author

@zeroshade ^

@zeroshade
Member

Super weird. Though I'd expect records on the heap in general because they are interface types.

That said, could you avoid the heap alloc by making it a value receiver instead of a pointer receiver? The schema object is fairly small, so a value receiver would avoid the escape, right?
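
A sketch of the value-receiver variant being suggested, again with simplified stand-in types; whether it actually keeps the schema value off the heap would need to be confirmed with -gcflags=-m or a memory profile:

package main

import (
    "iter"
    "slices"
)

type Field struct{ Name string }

type Schema struct{ fields []Field }

// Value receiver: the method copies the small Schema header instead of
// taking the schema's address.
func (s Schema) FieldIter() iter.Seq[Field] {
    return slices.Values(s.fields)
}

func main() {
    s := Schema{fields: []Field{{Name: "a"}}}
    for f := range s.FieldIter() {
        _ = f
    }
}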
