Job System¶
Portal Framework provides a task-based cooperative multitasking system built on C++20 coroutines. This architecture allows for efficient utilization of multi-core processors while providing a high-level, asynchronous programming model.
Architecture Overview¶
The job system is designed around a work-stealing scheduler that manages a fixed pool of worker threads.
flowchart TB
subgraph arch[Job System Architecture]
direction TB
Scheduler[Scheduler]
subgraph Workers[Worker Pool]
W1[Worker 1<br/>Local Queue + Cache]
W2[Worker 2<br/>Local Queue + Cache]
WN[Worker N<br/>Local Queue + Cache]
end
GlobalQueue[Global Queue]
Scheduler --> Workers
GlobalQueue -.-> Workers
end
subgraph ops[Worker Operations Priority]
direction TB
A[1. Pop from own Local Queue] --> B[2. Steal from other Workers]
B --> C[3. Dequeue from Global Queue]
end
Key Principles¶
Non-Blocking: Jobs are coroutines. When a job needs to wait for another job, it suspends itself, allowing the worker thread to pick up other work.
Work Stealing: To prevent load imbalance, idle worker threads “steal” jobs from other workers’ queues.
Prioritization: The scheduler supports different priority levels (Low, Normal, High) to ensure critical tasks are handled first.
Locality: Workers prioritize their own local queue and a small job cache to improve cache hits and reduce contention.
Core Components¶
Job¶
A portal::Job is the primary unit of work. It is a coroutine that can return a value
of type T (or void).
Managed Lifecycle: Jobs are owned by the scheduler once dispatched.
Awaitable: You can
co_awaita job to suspend the current coroutine until the awaited job completes. The result ofco_awaitis astd::expected<T, JobResultStatus>, allowing for robust error propagation.Result Access: When using
wait_for_jobinstead, the result is unwrapped and returned asTdirectly.
Scheduler¶
The portal::jobs::Scheduler is the heart of the system. It initializes worker threads and manages the distribution of work.
Worker Threads: By default, the scheduler spawns one worker thread per hardware core (minus one for the main thread). This can be configured via the constructor.
Dispatching: Use
dispatch_jobto submit a single job, ordispatch_jobsto submit a batch. Both are fire-and-forget; they return immediately without waiting for completion.Waiting: Use
wait_for_joborwait_for_counterto wait until work is complete. While waiting, the calling thread continues to process other available jobs rather than blocking idle.Main Thread Participation: The
main_thread_do_work()method allows the main thread to participate in job execution, processing available work from the global queue.
Counter¶
A portal::jobs::Counter is a synchronization primitive used to track the completion of multiple jobs.
Batching: You can dispatch multiple jobs and associate them with a single counter by passing it to
dispatch_jobordispatch_jobs.Synchronization: Use
scheduler.wait_for_counter(counter)to wait until all associated jobs have finished.
Execution Flow¶
Dispatch: A job is created and passed to
Scheduler::dispatch_job.Queueing: The job is placed in a local queue (if dispatched from a worker) or the global queue.
Acquisition: A worker thread pulls the job from its local queue, steals it from another worker, or fetches it from the global queue.
Execution: The worker thread resumes the job coroutine.
Suspension: If the job
co_awaits another job or a counter, it suspends, and the worker moves to the next available task. A job can also voluntarily yield by usingco_await SuspendJob(), which re-queues the job and returns control to the worker thread.Completion: When the coroutine reaches its end, it notifies any waiting continuations or counters.
Thread Safety Guarantees¶
Scheduler API: Most scheduler methods (
dispatch_job,wait_for_job) are thread-safe and can be called from any thread.Worker Queues: The local queues use lock-free or low-contention designs to minimize overhead during work stealing. Each worker maintains separate local and stealable queue sets to reduce contention during normal operation.
Data Access: While the job system manages execution, it does not automatically synchronize access to shared data. Developers must use standard synchronization primitives (mutexes, atomics) or ensure data is uniquely owned by a job.
Best Practices¶
Avoid Long-Running Blocking Calls: Since worker threads are a limited resource, avoid
std::mutex::lock()or other blocking I/O within a job. Use async alternatives or suspend the job.Grain Size: Jobs should be large enough to justify the scheduling overhead but small enough to allow for effective parallelization.
Prefer co_await: Within a job, always use
co_await other_jobinstead ofwait_for_jobto allow the worker thread to remain productive.Use Counters for Batch Work: When dispatching many independent jobs, associate them with a single
Counterand wait on it, rather than waiting on each job individually.