### 🚀 The feature, motivation and pitch

Run the sampler (argmax, and softmax for temperature > 0) on CUDA, so that the LLM workflow no longer has to memcpy the logits to the CPU before sampling.

### Alternatives

_No response_

### Additional context

_No response_

### RFC (Optional)

_No response_