/*
 * This file is part of the StarPU Handbook.
 * Copyright (C) 2009--2011  Université de Bordeaux
 * Copyright (C) 2010, 2011, 2012, 2013, 2014  CNRS
 * Copyright (C) 2011, 2012 INRIA
 * See the file version.doxy for copying conditions.
 */
 
/*! \page Scheduling Scheduling

\section TaskSchedulingPolicy Task Scheduling Policy
 
The basics of the scheduling policy are the following:

<ul>
<li>The scheduler gets to schedule tasks (<c>push</c> operation) when they become
ready to be executed, i.e. when they are no longer waiting for some tags, data dependencies
or task dependencies.</li>
<li>Workers pull tasks (<c>pop</c> operation) one by one from the scheduler.</li>
</ul>

This means scheduling policies usually contain at least one queue of tasks, to
store them between the time when they become available and the time when a
worker gets to grab them.
 
By default, StarPU uses the simple greedy scheduler <c>eager</c>. This is
because it provides correct load balance even if the application codelets do not
have performance models. If your application codelets do have performance models
(\ref PerformanceModelExample), you should change the scheduler through
the environment variable \ref STARPU_SCHED, for instance <c>export
STARPU_SCHED=dmda</c>. Use <c>export STARPU_SCHED=help</c> to get the list of
available schedulers.
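
The scheduler can also be selected programmatically, before calling starpu_init().
A minimal sketch, assuming the <c>sched_policy_name</c> field of starpu_conf
(similar in effect to setting \ref STARPU_SCHED in the environment):

\code{.c}
#include <starpu.h>

int main(void)
{
    struct starpu_conf conf;
    starpu_conf_init(&conf);
    /* Ask for the dmda scheduler instead of the default eager one. */
    conf.sched_policy_name = "dmda";

    if (starpu_init(&conf) != 0)
        return 1;

    /* ... submit tasks ... */

    starpu_shutdown();
    return 0;
}
\endcode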
 
The <b>eager</b> scheduler uses a central task queue, from which all workers draw tasks
to work on concurrently. This however does not permit prefetching data, since the scheduling
decision is taken late. If a task has a non-zero priority, it is put at the front of the queue.
 
The <b>prio</b> scheduler also uses a central task queue, but sorts tasks by
priority (between -5 and 5).
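
For instance, a high priority can be requested for a given task before submission
by setting its <c>priority</c> field; a minimal sketch, assuming a hypothetical
codelet <c>my_codelet</c>:

\code{.c}
struct starpu_task *task = starpu_task_create();
task->cl = &my_codelet; /* hypothetical codelet */

/* Ask the scheduler to favour this task over default-priority ones. */
task->priority = STARPU_MAX_PRIO;

starpu_task_submit(task);
\endcode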
 
The <b>random</b> scheduler uses a queue per worker, and distributes tasks randomly according to assumed worker
overall performance.
 
The <b>ws</b> (work stealing) scheduler uses a queue per worker, and by default schedules
a task on the worker which released it. When a worker becomes idle, it steals a task from the
most loaded worker.
 
The <b>lws</b> (locality work stealing) scheduler uses a queue per worker, and by default
schedules a task on the worker which released it. When a worker becomes idle, it steals a task
from neighbouring workers. It also takes priorities into account.
 
The <b>dm</b> (deque model) scheduler takes task execution performance models into account to
implement a HEFT-like scheduling strategy: it schedules tasks where their
termination time will be minimal. The difference with HEFT is that <b>dm</b>
schedules tasks as soon as they become available, and thus in the order they
become available, without taking priorities into account.
 
The <b>dmda</b> (deque model data aware) scheduler is similar to dm, but it also takes
data transfer time into account.
 
The <b>dmdar</b> (deque model data aware ready) scheduler is similar to dmda,
but it also sorts tasks on per-worker queues by number of already-available data
buffers on the target device.
 
The <b>dmdas</b> (deque model data aware sorted) scheduler is similar to dmdar,
except that it sorts tasks by priority order, which brings it even closer
to HEFT by respecting priorities after having made the scheduling decision (but
it still schedules tasks in the order they become available).
 
The <b>heft</b> (heterogeneous earliest finish time) scheduler is a deprecated
alias for <b>dmda</b>.
 
The <b>pheft</b> (parallel HEFT) scheduler is similar to dmda, but it also supports
parallel tasks (still experimental). It should not be used when several contexts using
it are being executed simultaneously.
 
The <b>peager</b> (parallel eager) scheduler is similar to eager, but it also
supports parallel tasks (still experimental). It should not be used when several
contexts using it are being executed simultaneously.
 
TODO: describe modular schedulers
 
\section TaskDistributionVsDataTransfer Task Distribution Vs Data Transfer
 
Distributing tasks to balance the load induces a data transfer penalty. StarPU
thus needs to find a balance between both. The target function that the
<c>dmda</c> scheduler of StarPU tries to minimize is <c>alpha * T_execution +
beta * T_data_transfer</c>, where <c>T_execution</c> is the estimated execution
time of the codelet (usually accurate), and <c>T_data_transfer</c> is the
estimated data transfer time. The latter is estimated based on bus calibration
before execution start, i.e. with an idle machine, thus without contention. You
can force bus re-calibration by running the tool <c>starpu_calibrate_bus</c>.
The beta parameter defaults to <c>1</c>, but it can be worth trying to tweak it
by using <c>export STARPU_SCHED_BETA=2</c> for instance, since during
real application execution, contention makes transfer times bigger.
This is of course imprecise, but in practice a rough estimation
already gives results close to what a precise estimation would give.
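
These knobs can also be set from within the application, before StarPU is
initialized. A minimal sketch using the standard <c>setenv()</c> call (the values
are purely illustrative):

\code{.c}
#include <stdlib.h>
#include <starpu.h>

int main(void)
{
    /* Select dmda and give more weight to data transfer time, since
     * contention makes real transfers slower than the contention-free
     * calibration suggests. */
    setenv("STARPU_SCHED", "dmda", 1);
    setenv("STARPU_SCHED_BETA", "2", 1);

    if (starpu_init(NULL) != 0)
        return 1;

    /* ... submit tasks ... */

    starpu_shutdown();
    return 0;
}
\endcode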
 
\section Power-basedScheduling Power-based Scheduling
 
If the application can provide some power performance model (through
the field starpu_codelet::power_model), StarPU will
take it into account when distributing tasks. The target function that
the scheduler <c>dmda</c> minimizes then becomes <c>alpha * T_execution +
beta * T_data_transfer + gamma * Consumption</c>, where <c>Consumption</c>
is the estimated task consumption in Joules. To tune this parameter, use
<c>export STARPU_SCHED_GAMMA=3000</c> for instance, to express that each Joule
(i.e. kW during 1000 us) is worth 3000 us of execution time penalty. Setting
<c>alpha</c> and <c>beta</c> to zero permits taking only power consumption into account.
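
For instance, a history-based power model can be attached to a codelet alongside
its execution time model. A minimal sketch, in which the codelet, its CPU kernel
<c>my_kernel_cpu</c> and the model symbols are hypothetical:

\code{.c}
static struct starpu_perfmodel time_model = {
    .type = STARPU_HISTORY_BASED,
    .symbol = "my_kernel_time"
};

static struct starpu_perfmodel power_model = {
    .type = STARPU_HISTORY_BASED,
    .symbol = "my_kernel_power"
};

static struct starpu_codelet my_codelet = {
    .cpu_funcs = { my_kernel_cpu },
    .nbuffers = 1,
    .modes = { STARPU_RW },
    /* Execution time model, as described in \ref PerformanceModelExample. */
    .model = &time_model,
    /* Estimated energy consumption (in Joules) of one task instance. */
    .power_model = &power_model
};
\endcode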
 
This is however not sufficient to correctly optimize power: the scheduler would
simply tend to run all computations on the most energy-conservative processing
unit. To account for the consumption of the whole machine (including idle
processing units), the idle power of the machine should be given by setting
<c>export STARPU_IDLE_POWER=200</c> for 200 W, for instance. This value can often
be obtained from the machine power supplier.
 
The power actually consumed by the total execution can be displayed by setting
<c>export STARPU_PROFILING=1 STARPU_WORKER_STATS=1</c>.
 
On-line task consumption measurement is currently only supported through the
<c>CL_PROFILING_POWER_CONSUMED</c> OpenCL extension, implemented in the MoviSim
simulator. Applications can however provide explicit measurements by
using the function starpu_perfmodel_update_history() (exemplified in \ref PerformanceModelExample
with the <c>power_model</c> performance model). Fine-grain
measurement is often not feasible with the feedback provided by the hardware, so
the user can for instance run a given task a thousand times, measure the global
consumption for that series of tasks, divide it by a thousand, repeat for
varying kinds of tasks and task sizes, and eventually feed StarPU
with these manual measurements through starpu_perfmodel_update_history().
 
For instance, for CUDA devices, <c>nvidia-smi -q -d POWER</c> can be used to get
the current consumption in Watts. Multiplying this value by the average duration
of a single task gives the consumption of the task in Joules, which can be given
to starpu_perfmodel_update_history().
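
A minimal sketch of feeding such a manual measurement back into a power model,
assuming the <c>power_model</c> structure above, an already-executed <c>task</c>, the
worker id it ran on, and the (model, task, workerid, cpuid, nimpl, measurement)
argument order of starpu_perfmodel_update_history():

\code{.c}
/* Illustrative value: average power reported by nvidia-smi (in Watts)
 * multiplied by the average duration of one task (in seconds). */
double consumption_in_joules = 0.32;
unsigned workerid = 0; /* worker on which the measured task ran */

/* Record the measurement in the power performance model (kernel
 * implementation number 0 here). */
starpu_perfmodel_update_history(&power_model, task, workerid, 0, 0,
                                consumption_in_joules);
\endcode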
 
\section StaticScheduling Static Scheduling
 
In some cases, one may want to force some scheduling, for instance force a given
set of tasks to run on GPU0, another set on GPU1, etc., while letting some other tasks
be scheduled on any other device. This can indeed be useful to guide StarPU towards
some work distribution, while still keeping some degree of dynamism. For
instance, to force execution of a task on CUDA0:
 
\code{.c}
task->execute_on_a_specific_worker = 1;
task->workerid = starpu_worker_get_by_type(STARPU_CUDA_WORKER, 0);
\endcode
 
One can also specify the order in which tasks must be executed by setting the
starpu_task::workerorder field. If this field is set to a non-zero value, it
provides the per-worker consecutive order in which tasks will be executed,
starting from 1. For a given such task, the worker will thus not execute
it before all the tasks with a smaller order value have been executed, notably
in case those tasks are not available yet due to some dependencies. This
eventually gives total control over task scheduling, and StarPU will only serve as
a "self-timed" task runtime. Of course, the provided order has to be runnable,
i.e. a task should not depend on another task bound to the same worker
with a bigger order.
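
As an illustration, the following sketch binds two tasks (assumed to have been
created beforehand) to the same CUDA worker and forces <c>task1</c> to be executed
before <c>task2</c> on that worker:

\code{.c}
int cuda0 = starpu_worker_get_by_type(STARPU_CUDA_WORKER, 0);

/* Both tasks are bound to the same worker... */
task1->execute_on_a_specific_worker = 1;
task1->workerid = cuda0;
task2->execute_on_a_specific_worker = 1;
task2->workerid = cuda0;

/* ...and task1 will be the first task executed by that worker, task2 the
 * second, whatever the order in which they become ready. */
task1->workerorder = 1;
task2->workerorder = 2;

starpu_task_submit(task1);
starpu_task_submit(task2);
\endcode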
 
Note however that using scheduling contexts while statically scheduling tasks on workers
could be tricky. Be careful to schedule the tasks exactly on the workers of the corresponding
contexts, otherwise the workers' corresponding scheduling structures may not be allocated, or
the execution of the application may deadlock. Moreover, the hypervisor should not be used when
statically scheduling tasks.
 
\section DefiningANewSchedulingPolicy Defining A New Scheduling Policy
 
A full example showing how to define a new scheduling policy is available in
the StarPU sources in the directory <c>examples/scheduler/</c>.
See \ref API_Scheduling_Policy for the corresponding API.
 
\code{.c}
static struct starpu_sched_policy dummy_sched_policy = {
    .init_sched = init_dummy_sched,
    .deinit_sched = deinit_dummy_sched,
    .add_workers = dummy_sched_add_workers,
    .remove_workers = dummy_sched_remove_workers,
    .push_task = push_task_dummy,
    .push_prio_task = NULL,
    .pop_task = pop_task_dummy,
    .post_exec_hook = NULL,
    .pop_every_task = NULL,
    .policy_name = "dummy",
    .policy_description = "dummy scheduling strategy"
};
\endcode
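
Such a custom policy can then be selected at initialization time through the
<c>sched_policy</c> field of starpu_conf; a minimal sketch:

\code{.c}
int main(void)
{
    struct starpu_conf conf;
    starpu_conf_init(&conf);
    /* Use the custom policy defined above instead of a built-in one. */
    conf.sched_policy = &dummy_sched_policy;

    if (starpu_init(&conf) != 0)
        return 1;

    /* ... submitted tasks now go through push_task_dummy()/pop_task_dummy() ... */

    starpu_shutdown();
    return 0;
}
\endcode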
 
*/
 
 