Add static_multiset::for_each and its OA impl by sleeepyjack · Pull Request #506 · NVIDIA/cuCollections

sleeepyjack · 2024-06-15T00:12:57Z

closes #499

sleeepyjack · 2024-06-15T00:59:22Z

include/cuco/detail/open_addressing/open_addressing_ref_impl.cuh

+   * @param callback Function to call on every element found
+   */
+  template <class ProbeKey, class Callback>
+  __device__ void for_each(ProbeKey const& key, Callback callback) const noexcept


Suggested change

__device__ void for_each(ProbeKey const& key, Callback callback) const noexcept

__device__ void for_each(ProbeKey const& key, Callback&& callback) const noexcept

Unsure if this needs to be a mutable or even universal reference instead. Let's say we define a count functor as such:

struct count_functor {
  std::size_t thread_count = 0;  // counts the number of matching elements for this thread

  template <class InputIt>
  __device__ void operator()(InputIt) { thread_count++; }
};

And then call

//...
auto thread_counter = count_functor{};
set.for_each(key, thread_counter);
auto const key_count = thread_counter.thread_count;
//...

Then we want the functor to be taken as a mutable reference, right?
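A minimal host-side sketch of the question above, assuming plain C++ without `__device__` and a flat `std::vector<int>` in place of the hash table. The names `for_each_by_value`, `for_each_by_ref`, `count_with_value`, `count_with_ref`, and `example` are hypothetical and not part of cuco. It only illustrates why a stateful functor passed by value leaves the caller's copy unchanged, while a forwarding reference binds to the caller's object:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side stand-in for the functor in the comment above (hypothetical).
struct count_functor {
  std::size_t thread_count = 0;  // number of matches seen by this object

  template <class T>
  void operator()(T const&) { ++thread_count; }
};

// By value: the function increments a private copy of the callback.
template <class Callback>
void for_each_by_value(std::vector<int> const& slots, int key, Callback callback)
{
  for (int s : slots) {
    if (s == key) { callback(s); }
  }
}

// Forwarding reference: an lvalue argument is bound by reference,
// so the caller's object is the one being incremented.
template <class Callback>
void for_each_by_ref(std::vector<int> const& slots, int key, Callback&& callback)
{
  for (int s : slots) {
    if (s == key) { callback(s); }
  }
}

std::size_t count_with_value(std::vector<int> const& slots, int key)
{
  count_functor counter{};
  for_each_by_value(slots, key, counter);
  return counter.thread_count;  // still 0: only the copy was updated
}

std::size_t count_with_ref(std::vector<int> const& slots, int key)
{
  count_functor counter{};
  for_each_by_ref(slots, key, counter);
  return counter.thread_count;  // reflects the matches
}

// Usage example.
void example()
{
  std::vector<int> const slots{1, 2, 2, 3, 2};
  assert(count_with_value(slots, 2) == 0);
  assert(count_with_ref(slots, 2) == 3);
}
```

Whether the reference form is desirable in device code is a separate question; the sketch only shows the difference in observable state.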

Pass by value is preferred.

The above example is a good example of a bad callback, especially in a parallel context

Is it best practice to pass a callback by value? I'd have to skim some stackoverflow/cppreference pages to get familiar with the topic. With pass-by-value we lose the ability to give the callback an internal state that can hold the result of the operation. How would we solve the above example with a callback passed by value? Pass a pointer to thread_count to the callback?

Pass a pointer to thread_count to the callback?

Yes

I see a callable defining the operations to be performed on the output instead of being the output itself.
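A minimal host-side sketch of the pointer-based variant agreed on above, assuming plain C++ and a flat `std::vector<int>` in place of the hash table. The names `count_into`, `for_each_match`, `count_key`, and `example` are hypothetical and not part of cuco. The callback is taken by value and only describes the operation; the result lives in storage owned by the caller:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical callback: holds a pointer to caller-owned storage
// instead of holding the result itself.
struct count_into {
  std::size_t* out;  // caller-owned counter

  template <class T>
  void operator()(T const&) const { ++(*out); }
};

// Callback taken by value, as preferred in the review.
template <class Callback>
void for_each_match(std::vector<int> const& slots, int key, Callback callback)
{
  for (int s : slots) {
    if (s == key) { callback(s); }
  }
}

std::size_t count_key(std::vector<int> const& slots, int key)
{
  std::size_t thread_count = 0;
  // Copying the callback copies the pointer, so every copy updates thread_count.
  for_each_match(slots, key, count_into{&thread_count});
  return thread_count;
}

// Usage example, including the equivalent capturing lambda.
void example()
{
  std::vector<int> const slots{4, 5, 4, 6};
  assert(count_key(slots, 4) == 2);

  std::size_t n = 0;
  for_each_match(slots, 5, [&n](int) { ++n; });
  assert(n == 1);
}
```

The functor can be copied freely without losing the result, which is the property the by-value signature relies on.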

PointKernel · 2024-06-15T01:37:10Z

include/cuco/operator.hpp

+struct for_each_tag {
+} inline constexpr for_each;  ///< `cuco::for_each` operator


I see for_each as an internal utility as opposed to an actual hash table operator. Need to think more on this.

From my standpoint, I would treat it as an extension to the STL API that is more suitable for the GPU. Having a "cooperative iterator" instead, which would be closer to the spirit of modern C++, has its drawbacks. For example, how do we ensure users only increment the iterator with the same CG? for_each solves this problem by making the probing part internal. We should even be able to redefine any lookup function (find, count, retrieve) that relies on probing with for_each, giving us a proper abstraction layer for probing.

On a side note, I personally find this functional approach, i.e., "for each found key do X", very appealing. Historic evidence that it is indeed useful comes from warpcore, where many downstream applications (mostly genomics stuff) implemented their custom lookup operations through for_each functors.
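A rough host-side sketch of the "for_each as the probing abstraction" idea, assuming a toy single-threaded linear-probing multiset of non-negative `int` keys. `toy_multiset`, `empty_slot`, and `example` are hypothetical names, and this is not how cuco implements its containers. It only shows that lookups such as `count` and `contains` can be written on top of `for_each` without touching the probing loop:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int empty_slot = -1;  // sentinel; keys are assumed to be non-negative

// Toy open-addressing multiset with linear probing (assumes capacity > 0).
class toy_multiset {
 public:
  explicit toy_multiset(std::size_t capacity) : slots_(capacity, empty_slot) {}

  // Inserts a key into the first free slot of its probe sequence.
  // Returns false if the table is full.
  bool insert(int key)
  {
    std::size_t idx = home(key);
    for (std::size_t i = 0; i < slots_.size(); ++i) {
      if (slots_[idx] == empty_slot) {
        slots_[idx] = key;
        return true;
      }
      idx = (idx + 1) % slots_.size();
    }
    return false;
  }

  // The only place that knows how to probe: invokes callback on every
  // slot matching key until an empty slot ends the probe sequence.
  template <class Callback>
  void for_each(int key, Callback callback) const
  {
    std::size_t idx = home(key);
    for (std::size_t i = 0; i < slots_.size(); ++i) {
      if (slots_[idx] == empty_slot) { return; }
      if (slots_[idx] == key) { callback(slots_[idx]); }
      idx = (idx + 1) % slots_.size();
    }
  }

  // Lookups expressed through for_each.
  std::size_t count(int key) const
  {
    std::size_t n = 0;
    for_each(key, [&n](int) { ++n; });
    return n;
  }

  // Simple but not early-exiting: it visits all matches before answering.
  bool contains(int key) const { return count(key) > 0; }

 private:
  std::size_t home(int key) const { return static_cast<std::size_t>(key) % slots_.size(); }

  std::vector<int> slots_;
};

// Usage example.
void example()
{
  toy_multiset set(8);
  set.insert(3);
  set.insert(3);
  set.insert(11);  // collides with 3 (11 % 8 == 3)
  set.insert(3);
  assert(set.count(3) == 3);
  assert(set.count(11) == 1);
  assert(!set.contains(5));
}
```

In a cooperative-group setting the same structure would keep the CG handling inside `for_each`, which is the point made in the comment above.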

sleeepyjack · 2024-06-15T01:42:50Z

include/cuco/detail/open_addressing/open_addressing_ref_impl.cuh

+   * @param callback Function to call on every element found
+   */
+  template <class ProbeKey, class Callback>
+  __device__ void for_each(cooperative_groups::thread_block_tile<cg_size> const& group,


Not sure why the unit test is failing. Seems like the logic in this function is flawed.

I think I found the problem. #509 should fix the issue.

copy-pr-bot · 2024-06-25T02:00:11Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

sleeepyjack · 2024-06-25T02:07:14Z

/ok to test

WTH all of my commits are signed...

sleeepyjack · 2024-06-25T02:08:31Z

/ok to test

sleeepyjack · 2024-06-25T02:08:54Z

effing bot

PointKernel

LGTM

include/cuco/detail/open_addressing/open_addressing_ref_impl.cuh

include/cuco/detail/static_multiset/static_multiset_ref.inl

sleeepyjack added the type: feature request, P1: Should have, and topic: static_multiset labels Jun 15, 2024

sleeepyjack added this to the static_multiset milestone Jun 15, 2024

sleeepyjack self-assigned this Jun 15, 2024

sleeepyjack requested a review from PointKernel as a code owner June 15, 2024 00:12

Add static_multiset::for_each

f14f521

sleeepyjack force-pushed the for-each-new branch from c82aad6 to f14f521 Compare June 15, 2024 00:16

NVIDIA deleted a comment from copy-pr-bot bot Jun 15, 2024

sleeepyjack added 5 commits June 15, 2024 00:18

Add unit test

4108a41

Fix docstring

099503c

Remove newline

ca41a48

Add operator docs

e7a8e03

Fix unit test

7053703

sleeepyjack commented Jun 15, 2024

sleeepyjack added 2 commits June 15, 2024 01:35

Pass callback as universal reference

18c5f60

Remove unused operator members

22dab49

PointKernel reviewed Jun 15, 2024

sleeepyjack commented Jun 15, 2024

sleeepyjack added 5 commits June 24, 2024 14:00

Merge remote-tracking branch 'upstream/dev' into for-each-new

ee3dae7

Fix probing logic

bd309c1

Rename callback to make the usage a bit more clear

6f6e5ff

Add overload that allows for synchronizing the CG inbetween probing w…

7cd072c

…indows (required for shmem bounce buffer flushing during retrieve())

Merge remote-tracking branch 'upstream/dev' into for-each-new

d2ade79

sleeepyjack changed the base branch from feature/static_multiset to dev June 25, 2024 19:39

Merge remote-tracking branch 'upstream/dev' into for-each-new

6343ca1

sleeepyjack requested a review from PointKernel June 25, 2024 23:47

sleeepyjack added the Needs Review label Jun 25, 2024

PointKernel approved these changes Jun 26, 2024

PointKernel reviewed Jun 26, 2024

sleeepyjack merged commit a547e8f into NVIDIA:dev Jun 26, 2024

sleeepyjack deleted the for-each-new branch June 26, 2024 01:32
