Add static_multiset::for_each and its OA impl#506
Conversation
c82aad6 to
f14f521
Compare
| * @param callback Function to call on every element found | ||
| */ | ||
| template <class ProbeKey, class Callback> | ||
| __device__ void for_each(ProbeKey const& key, Callback callback) const noexcept |
There was a problem hiding this comment.
| __device__ void for_each(ProbeKey const& key, Callback callback) const noexcept | |
| __device__ void for_each(ProbeKey const& key, Callback&& callback) const noexcept |
Unsure if this needs to be a mutable or even universal reference instead. Let's say we define a count functor as such:
struct count_functor {
std::size_t thread_count = 0;
// counts the number of matching elements for this thread
template <class InputIt>
__device__ void operator()(InputIt) { thread_count++; }
};And then call
//...
auto thread_counter = count_functor{};
set.for_each(key, thread_counter);
auto const key_count = thread_counter.count;
//...Then we want the functor to be taken as a mutable reference, right?
There was a problem hiding this comment.
pass by value is preferred.
The above example is a good example of a bad callback, especially in a parallel context
There was a problem hiding this comment.
Is it best practice to pass a callback by-value? I'd have to skim some stackoverflow/cppreference pages to get familiar with the topic. With pass-by-value we lose the ability of giving the callback an internal state that can hold the result of the operation. How would we solve the above example with a callback passed by-value? Pass a pointer to thread_count to the callback?
There was a problem hiding this comment.
Pass a pointer to thread_count to the callback?
Yes
I see a callable defining the operations to be performed on the output instead of being the output itself.
| struct for_each_tag { | ||
| } inline constexpr for_each; ///< `cuco::for_each` operator |
There was a problem hiding this comment.
I see for_each as an internal utility as opposed to an actual hash table operator. Need to think more on this.
There was a problem hiding this comment.
From my standpoint I would treat it as an extension to the STL API that is more suitable for the GPU. Having a "cooperative iterator" instead, which would be closer to the spirit of modern C++ has its drawbacks. For example, how do we ensure users only increment the iterator with the same CG? for_each solves this problem by making the probing part internal. We should even be able to redefine any lookup function (find, count, retrieve) that relies on probing with for_each, giving us a proper abstraction layer for probing.
There was a problem hiding this comment.
On a side note I personally find this funtional approach, i.e., "for each found key do X" very appealing. Historic evidence that it is indeed useful comes from warpcore, where many downstream applications (mostly genomics stuff) implemented their custom lookup operations through for_each functors.
| * @param callback Function to call on every element found | ||
| */ | ||
| template <class ProbeKey, class Callback> | ||
| __device__ void for_each(cooperative_groups::thread_block_tile<cg_size> const& group, |
There was a problem hiding this comment.
Not sure why the unit test is failing. Seems like the logic in this function is flawed.
There was a problem hiding this comment.
I think I found the problem. #509 should fix the issue.
…indows (required for shmem bounce buffer flushing during retrieve())
|
/ok to test WTH all of my commits are signed... |
|
/ok to test |
|
effing bot |
closes #499