
sched refactor #260

Open

hexagonal-sun wants to merge 3 commits into master from sched-refactor

Conversation

@hexagonal-sun (Owner)

  • sched: introduce Work as the unified schedulable unit
    Refactor the scheduler so all schedulable work is wrapped in Arc<Work>,
    replacing the previous per-CPU wait_q design where sleeping tasks were
    bound to a specific CPU. Wakers now hold direct Arc<Work> references and
    can re-enqueue tasks on any CPU upon wakeup.

    Key changes:

    • Add Work struct wrapping OwnedTask with an AtomicTaskState and
      scheduler metadata (SchedulerData), replacing the old SchedulableTask.
      Remove Task::state (Arc<SpinLock<TaskState>>). Work::state is now the
      single source of truth for task state.

    • Rewrite the run queue using BinaryHeap-based eligible/ineligible split
      (EEVDF) with a dedicated VClock, replacing the BTreeMap linear scan.
      Extract vclock into its own module.

    • Rewrite wakers to hold Arc<Work> directly instead of looking up tasks
      by TaskDescriptor from TASK_LIST.

    • Replace lock-based sleep transitions in uspc_ret with atomic CAS
      (try_sleep_current) that correctly detects concurrent Woken state.

    • Simplify least-tasked-CPU metric to use only run-queue weight, since
      sleeping tasks are no longer bound to any CPU.

    • Add current_work() accessor.

  • sched: runqueue: defer task drops and drop Finished tasks
    Fix two issues:

    1. When a task is dropped from the runqueue, its destructors run. These
      may well call wakers to wake up parent processes, the other ends of
      pipes, etc. If that happens while SCHED_STATE is still borrowed, it
      causes a double-borrow panic. Fix this by deferring all drops until
      after SCHED_STATE has been unlocked.

    2. Tasks in the runqueue which have yet to be scheduled but have become
      finished will be returned by find_next_task, and their state will be
      set to TaskState::Running, overwriting the fact that the task had
      Finished. We'd then queue such a task forever. Filter finished tasks
      in find_next_task and add them to the deferred drop list.

  • sched: runqueue: fix tick and runqueue insertion bug
    Ensure that tick() is called on the current task, and allow the task
    accounting to decide whether we should try to switch to another task.

    Also, ensure accounting is updated for freshly inserted tasks into the
    runqueue.
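
The atomic CAS sleep transition described in the first commit (try_sleep_current detecting a concurrent Woken state) can be sketched as a minimal stand-alone example. The state encoding, method names, and memory orderings below are assumptions for illustration, not the PR's actual code:

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// Hypothetical numeric encoding of the task states; the state names
// follow the commit message, the discriminant values are an assumption.
const RUNNING: u8 = 0;
const SLEEPING: u8 = 1;
const WOKEN: u8 = 2;

/// Minimal stand-in for the PR's `AtomicTaskState`, exposing only
/// named transition methods rather than raw stores.
struct AtomicTaskState(AtomicU8);

impl AtomicTaskState {
    fn new() -> Self {
        AtomicTaskState(AtomicU8::new(RUNNING))
    }

    /// Attempt the Running -> Sleeping transition with a single CAS.
    /// If a waker raced us and already moved the state to Woken, the
    /// CAS fails and the caller stays runnable instead of sleeping
    /// past a wakeup it can never see again.
    fn try_sleep(&self) -> bool {
        self.0
            .compare_exchange(RUNNING, SLEEPING, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }

    /// A waker unconditionally marks the task Woken.
    fn wake(&self) {
        self.0.store(WOKEN, Ordering::Release);
    }

    fn load(&self) -> u8 {
        self.0.load(Ordering::Acquire)
    }
}
```

The point of the CAS over a lock-based check is that the check and the transition are one indivisible step, so a wakeup arriving between "decide to sleep" and "mark as sleeping" cannot be lost.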

@arihant2math (Collaborator) left a comment


LGTM

/// First 16 bits: CPU ID
/// Next 24 bits: Weight
/// Next 24 bits: Number of waiting tasks
/// Remaining 48 bits: Run-queue weight
The reason I included the number of waiting tasks was to prevent a single CPU from taking on all tasks because they were bursty or I/O bound. Ideally we'd rather schedule onto a CPU with fewer waiting tasks, all else being equal. Maybe we should consider total weight (run queue + waiting queue).

Not something that needs to change in the PR.
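
One way to read the bit layout quoted above is as a single wide integer whose high bits carry the load and whose low bits carry the CPU ID, so that a plain numeric minimum picks the least-loaded CPU with the ID acting as a tiebreaker. This is a hedged sketch of that idea, not the PR's actual packing: the field order, the choice of u128, and the helper names are assumptions.

```rust
/// Hypothetical packing of the per-CPU load metric into one u128.
/// Bit layout (an assumption, widths taken from the quoted comment):
///   [0..16)   CPU ID          (16 bits, low so it only breaks ties)
///   [16..40)  Weight          (24 bits)
///   [40..64)  Waiting tasks   (24 bits)
///   [64..112) Run-queue weight (48 bits, most significant load term)
fn pack(cpu_id: u16, weight: u32, waiting: u32, rq_weight: u64) -> u128 {
    debug_assert!(weight < (1 << 24));
    debug_assert!(waiting < (1 << 24));
    debug_assert!(rq_weight < (1 << 48));
    ((rq_weight as u128) << 64)
        | ((waiting as u128) << 40)
        | ((weight as u128) << 16)
        | cpu_id as u128
}

/// Recover the CPU ID from a packed metric.
fn cpu_id(metric: u128) -> u16 {
    (metric & 0xFFFF) as u16
}
```

With this ordering, comparing two packed metrics compares run-queue weight first, then waiting tasks, then weight, which is why a single atomic min/compare suffices for the least-tasked-CPU search.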


Also, it's probably worth removing this atomic. I think Linux does a proper search. However, for this we'd need a concept of "shared scheduler state".

///
/// Only exposes named transition methods to enforce state transition logic.
///
/// ```text

nice diagram :)
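
The deferred-drop fix from the second commit follows a general pattern: collect the values to destroy while the lock is held, release the lock, and only then let their destructors run, since those destructors may re-enter the scheduler. A minimal sketch of the pattern, with illustrative stand-in types (the real code borrows SCHED_STATE rather than using a Mutex):

```rust
use std::sync::Mutex;

// Stand-ins for the scheduler types; names and shapes are illustrative only.
struct Work;
struct SchedState {
    run_queue: Vec<Work>,
}

static SCHED_STATE: Mutex<SchedState> = Mutex::new(SchedState { run_queue: Vec::new() });

/// Remove finished tasks while the lock is held, but defer running their
/// destructors (which may call wakers that re-enter the scheduler) until
/// after the lock has been released, avoiding a double-borrow/deadlock.
fn reap_finished(is_finished: impl Fn(&Work) -> bool) {
    let deferred: Vec<Work> = {
        let mut state = SCHED_STATE.lock().unwrap();
        let mut kept = Vec::new();
        let mut dropped = Vec::new();
        for w in state.run_queue.drain(..) {
            if is_finished(&w) {
                dropped.push(w);
            } else {
                kept.push(w);
            }
        }
        state.run_queue = kept;
        dropped
    }; // lock released here
    drop(deferred); // destructors may now take the scheduler lock safely
}
```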
