Conversation
sched: introduce Work as the unified scheduleable unit

Refactor the scheduler so all scheduleable work is wrapped in `Arc<Work>`, replacing the previous per-CPU wait_q design where sleeping tasks were bound to a specific CPU. Wakers now hold direct `Arc<Work>` references and can re-enqueue tasks on any CPU upon wakeup.

Key changes:
- Add a `Work` struct wrapping `OwnedTask` with an `AtomicTaskState` and scheduler metadata (`SchedulerData`), replacing the old `SchedulableTask`. Remove `Task::state` (`Arc<SpinLock<TaskState>>`); `Work::state` is now the single source of truth for task state.
- Rewrite the run queue using a BinaryHeap-based eligible/ineligible split (EEVDF) with a dedicated `VClock`, replacing the BTreeMap linear scan. Extract vclock into its own module.
- Rewrite wakers to hold `Arc<Work>` directly instead of looking up tasks by `TaskDescriptor` from `TASK_LIST`.
- Replace lock-based sleep transitions in `uspc_ret` with an atomic CAS (`try_sleep_current`) that correctly detects a concurrent `Woken` state.
- Simplify the least-tasked-CPU metric to use only run-queue weight, since sleeping tasks are no longer bound to any CPU.
- Add a `current_work()` accessor.
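The race that `try_sleep_current` must handle can be sketched roughly as below. All names here (the state constants, `AtomicTaskState`, the method signatures) are assumptions based on the commit message, not the actual scheduler code; the point is only the CAS shape that refuses to sleep when a wakeup raced in:

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// Hypothetical state encoding; the real AtomicTaskState may differ.
const RUNNING: u8 = 0;
const SLEEPING: u8 = 1;
const WOKEN: u8 = 2; // a waker fired while the task was still Running

struct AtomicTaskState(AtomicU8);

impl AtomicTaskState {
    fn new_running() -> Self {
        Self(AtomicU8::new(RUNNING))
    }

    /// Waker side: if the task is Sleeping, make it runnable again and
    /// report that the caller should re-enqueue it. If it is still
    /// Running, record the wakeup as Woken so it is not lost.
    fn wake(&self) -> bool {
        if self
            .0
            .compare_exchange(SLEEPING, RUNNING, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
        {
            return true; // caller re-enqueues the task on some CPU
        }
        let _ = self
            .0
            .compare_exchange(RUNNING, WOKEN, Ordering::AcqRel, Ordering::Acquire);
        false
    }

    /// Sleeper side: Running -> Sleeping via CAS. If a concurrent waker
    /// already moved us to Woken, consume the pending wakeup and stay
    /// runnable instead of sleeping through it.
    fn try_sleep(&self) -> bool {
        if self
            .0
            .compare_exchange(RUNNING, SLEEPING, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
        {
            return true;
        }
        // Lost the race: clear Woken back to Running and keep going.
        let _ = self
            .0
            .compare_exchange(WOKEN, RUNNING, Ordering::AcqRel, Ordering::Acquire);
        false
    }
}
```

With a lock-based design the wakeup could arrive between dropping the lock and halting; the CAS makes "a wakeup happened while we were deciding to sleep" an observable state instead.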
sched: runqueue: defer task drops and drop Finished tasks

Fix two issues:
1. When a task is dropped from the runqueue, its destructors run. These may well invoke wakers to wake up parent processes, the other ends of pipes, etc. If that happens while `SCHED_STATE` is still borrowed, it causes a double-borrow panic. Fix this by deferring all drops until after `SCHED_STATE` has been unlocked.
2. Tasks still in the runqueue that finish before ever being scheduled would be returned by `find_next_task`, and their state would be set to `TaskState::Running`, overwriting the fact that the task had `Finish`ed. We'd then requeue that task forever. Filter finished tasks out in `find_next_task` and add them to the deferred drop list.
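Both fixes follow the same deferred-drop pattern, sketched below. `Work`, `SchedState`, and the function signatures are stand-ins for the real scheduler types; the point is only that finished tasks are filtered out and that destructors run after the lock is released:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};

struct Work {
    finished: AtomicBool,
}

struct SchedState {
    runq: Vec<Arc<Work>>,
}

/// Pop the next runnable task. Finished tasks are *not* returned (that
/// would overwrite their state with Running); they are moved onto the
/// deferred list so their destructors run only once the scheduler lock
/// has been released.
fn find_next_task(state: &mut SchedState, deferred: &mut Vec<Arc<Work>>) -> Option<Arc<Work>> {
    while let Some(w) = state.runq.pop() {
        if w.finished.load(Ordering::Acquire) {
            deferred.push(w);
            continue;
        }
        return Some(w);
    }
    None
}

fn schedule(state: &Mutex<SchedState>) -> Option<Arc<Work>> {
    let mut deferred = Vec::new();
    let next = {
        let mut guard = state.lock().unwrap();
        find_next_task(&mut guard, &mut deferred)
        // guard dropped here: the lock is released...
    };
    // ...so dropping finished tasks (whose destructors may invoke wakers
    // that take the scheduler lock again) cannot double-borrow it.
    drop(deferred);
    next
}
```

A `Mutex` stands in for whatever cell `SCHED_STATE` actually is; the same scoping trick works for a `RefCell` or spinlock guard.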
sched: runqueue: fix tick and runqueue insertion bug

Ensure that `tick()` is called on the current task, and let the task's accounting decide whether we should try to switch to another task. Also, ensure accounting is updated for tasks freshly inserted into the runqueue.
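As a rough illustration of "let the accounting decide", per-task accounting might look like the following. The field names and the simple fixed-slice policy are hypothetical, not the scheduler's actual (EEVDF-based) accounting:

```rust
// Hypothetical per-task accounting; fields and policy are illustrative.
struct SchedulerData {
    slice_ns: u64, // time slice granted to this task
    ran_ns: u64,   // time consumed in the current slice
}

impl SchedulerData {
    fn new(slice_ns: u64) -> Self {
        Self { slice_ns, ran_ns: 0 }
    }

    /// Called from the timer interrupt for the *current* task.
    /// Returns true once the slice is used up, i.e. the scheduler
    /// should try to switch to another task.
    fn tick(&mut self, delta_ns: u64) -> bool {
        self.ran_ns += delta_ns;
        self.ran_ns >= self.slice_ns
    }

    /// Reset accounting whenever a task is freshly inserted into the
    /// runqueue, so it starts with a full slice rather than stale state.
    fn reset_slice(&mut self) {
        self.ran_ns = 0;
    }
}
```

The key property is that the timer path only asks `tick()` whether to preempt; it doesn't decide on its own, so the policy lives in one place.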
/// First 16 bits: CPU ID
/// Next 24 bits: Weight
/// Next 24 bits: Number of waiting tasks
/// Remaining 48 bits: Run-queue weight
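The quoted layout sums to 112 bits, so as a sketch it can be packed into a `u128` (the actual storage type and field order in the code may differ; this only mirrors the doc comment):

```rust
// Field widths from the quoted doc comment: 16 + 24 + 24 + 48 = 112 bits.
const RUNQ_BITS: u32 = 48;
const WAIT_BITS: u32 = 24;
const WEIGHT_BITS: u32 = 24;

fn pack(cpu: u16, weight: u32, waiting: u32, runq_weight: u64) -> u128 {
    debug_assert!(weight < (1u32 << WEIGHT_BITS));
    debug_assert!(waiting < (1u32 << WAIT_BITS));
    debug_assert!(runq_weight < (1u64 << RUNQ_BITS));
    ((cpu as u128) << (WEIGHT_BITS + WAIT_BITS + RUNQ_BITS))
        | ((weight as u128) << (WAIT_BITS + RUNQ_BITS))
        | ((waiting as u128) << RUNQ_BITS)
        | (runq_weight as u128)
}

fn unpack(v: u128) -> (u16, u32, u32, u64) {
    let runq = (v & ((1u128 << RUNQ_BITS) - 1)) as u64;
    let waiting = ((v >> RUNQ_BITS) & ((1u128 << WAIT_BITS) - 1)) as u32;
    let weight = ((v >> (WAIT_BITS + RUNQ_BITS)) & ((1u128 << WEIGHT_BITS) - 1)) as u32;
    let cpu = (v >> (WEIGHT_BITS + WAIT_BITS + RUNQ_BITS)) as u16;
    (cpu, weight, waiting, runq)
}
```

Packing the metric fields below the CPU ID means a plain integer comparison of the packed values orders CPUs by weight first, which is a common trick for lock-free "least loaded CPU" lookups.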
Collaborator
The reason I included the number of waiting tasks was to prevent a single CPU from taking on all tasks just because they were bursty or I/O bound. Ideally we'd rather schedule onto a CPU with fewer waiting tasks, all else equal. Maybe we should consider total weight (run queue + waiting queue).
Not something that needs to change in this PR.
Collaborator
Also, it's probably worth removing this atomic. I think Linux does proper searching. However, for that we'd need a concept of "shared scheduler state".
///
/// Only exposes named transition methods to enforce state transition logic.
///
/// ```text