perf(arrow): Reduce the amount of allocated objects #645
Conversation
zeroshade left a comment
Thanks for this, just a couple of questions / nitpicks.
Thank you for taking a look @zeroshade! I have changed a few things:
Each Loki query can touch hundreds of streams that are spread over thousands of data objects, which results in thousands of Arrow records being processed. Small functions such as […]

I find it a bit tricky with […]
Most of our patterns are like the following: […] We already build fields for one-time use and do not modify them later, so it feels inefficient to clone them right after. Access to […]
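A minimal sketch of the one-shot pattern described above (the field names and the `github.com/apache/arrow-go/v18` import path are assumptions for illustration, not taken from Loki):

```go
package main

import (
	"fmt"

	"github.com/apache/arrow-go/v18/arrow"
)

func main() {
	// Build the fields once; they are never modified afterwards.
	fields := []arrow.Field{
		{Name: "timestamp", Type: arrow.FixedWidthTypes.Timestamp_ns},
		{Name: "line", Type: arrow.BinaryTypes.String},
	}
	schema := arrow.NewSchema(fields, nil)

	// Read-only access on the hot path: while Fields() returns a clone,
	// every call like this allocates a fresh []Field copy that is
	// immediately discarded.
	for _, f := range schema.Fields() {
		fmt.Println(f.Name, f.Type)
	}
}
```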
I'm confused here, why would yield allocate an entire record on the heap as opposed to just creating a copy of the `Field`? Something like:

```go
func (s *Schema) FieldIter() iter.Seq[Field] {
	return slices.Values(s.fields)
}
```

and would then be used like:

```go
for f := range sc.FieldIter() {
	// do stuff with f
}
```

So I'm curious what your code path is like such that this would result in allocating the entire record on the heap? At worst it would allocate the Schema object on the heap? Unless I'm missing something?
The rest of the changes look good to me!
That is a great question! I was also confused in the beginning, because I saw those objects being allocated in a memory profile report. I did some research, and I think it sometimes happens because the compiler is conservative: if it is not 100% sure about the result of escape analysis (the value is indeed passed into another func that passes it into another func), or if a structure is too large, it falls back to allocating the value on the heap instead of the stack. I built a sandbox example https://godbolt.org/z/h19PnWM7T which shows […]
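This is not the godbolt example linked above, but a standalone sketch of the kind of conservative escape-analysis decision being described: when a value is handed to an arbitrary function value (such as an iterator's yield), the compiler usually cannot prove it stays on the stack. The `big`/`iterate` names are illustrative only.

```go
package main

import "fmt"

// big stands in for a record/schema-sized struct; it is illustrative only.
type big struct {
	data [64]int64
}

// iterate builds a value and hands a pointer to it to yield. Because
// yield is an arbitrary function value, escape analysis cannot see what
// it does with the pointer, so `go build -gcflags=-m` typically reports
// "moved to heap: b" and each call allocates.
//
//go:noinline
func iterate(yield func(*big) bool) {
	b := big{}
	b.data[0] = 42
	yield(&b)
}

func main() {
	var total int64
	iterate(func(b *big) bool {
		total += b.data[0]
		return true
	})
	fmt.Println(total)
}
```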
Super weird. Though I'd expect records on the heap in general because they are interface types. That said, could you avoid the heap alloc by making it a value receiver instead of a pointer receiver? The schema object is fairly small, so a value receiver would avoid the escape, right? |
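A rough sketch of the value-receiver variant being suggested, using stand-in types rather than the real arrow package (requires Go 1.23 for `iter` and `slices.Values`); whether it actually removes the escape in the real code path would still need to be confirmed with `-gcflags=-m` or a memory profile:

```go
package main

import (
	"fmt"
	"iter"
	"slices"
)

type Field struct{ Name string }

type Schema struct{ fields []Field }

// FieldIter uses a value receiver: the method operates on a copy of the
// small Schema header, and the returned iterator only captures the
// fields slice, so the Schema value itself does not have to escape.
func (s Schema) FieldIter() iter.Seq[Field] {
	return slices.Values(s.fields)
}

func main() {
	sc := Schema{fields: []Field{{Name: "timestamp"}, {Name: "line"}}}
	for f := range sc.FieldIter() {
		fmt.Println(f.Name)
	}
}
```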
Rationale for this change
`apache/arrow-go` is used in the new query engine in Grafana Loki, and we are actively working on improving its performance. Here are a few low-hanging fruits that reduce allocations by many gigabytes when used on the hot path.

What changes are included in this PR?
This PR reduces the number of allocated objects:
- `arrow/datatype_binary.go`: straightforward, those `Layout` objects are currently allocated on each call.
- `arrow/schema.go`: `Schema.Fields()` does not return a clone anymore and `NewSchemaWithEndian()` does not clone its inputs anymore (a rough before/after sketch follows below). All call sites remain func-signature compatible, but they must not modify those args/results anymore. In our opinion this is a reasonable trade-off for higher performance; I am happy to discuss other ways to implement this.
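A rough before/after sketch of the `Schema.Fields()` change referred to above, using stand-in types rather than the real `arrow` package; the actual diff in `arrow/schema.go` may differ in details:

```go
package main

import "fmt"

type Field struct{ Name string }

type Schema struct{ fields []Field }

// FieldsCloned mirrors the old behaviour: every call allocates and
// copies a fresh []Field, which is safe to modify but costly on a hot
// path.
func (s *Schema) FieldsCloned() []Field {
	out := make([]Field, len(s.fields))
	copy(out, s.fields)
	return out
}

// Fields mirrors the new behaviour: the internal slice is returned
// directly, so there is no allocation, but callers must treat the
// result as read-only.
func (s *Schema) Fields() []Field {
	return s.fields
}

func main() {
	s := &Schema{fields: []Field{{Name: "timestamp"}, {Name: "line"}}}
	fmt.Println(len(s.FieldsCloned()), len(s.Fields()))
}
```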
Are these changes tested?

- `apache/arrow-go` unit test suite.
- `grafana/loki` test suite for the query engine.

Are there any user-facing changes?
Yes. Explained above.