123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509 |
- /* StarPU --- Runtime system for heterogeneous multicore architectures.
- *
- * Copyright (C) 2013 Inria
- * Copyright (C) 2014,2016-2019 CNRS
- * Copyright (C) 2014,2017,2019 Université de Bordeaux
- * Copyright (C) 2013 Simon Archipoff
- *
- * StarPU is free software; you can redistribute it and/or modify
- * it under the terms of the GNU Lesser General Public License as published by
- * the Free Software Foundation; either version 2.1 of the License, or (at
- * your option) any later version.
- *
- * StarPU is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
- *
- * See the GNU Lesser General Public License in COPYING.LGPL for more details.
- */
- /*! \page HowToDefineANewSchedulingPolicy How To Define A New Scheduling Policy
- \section Introduction Introduction
- presentation of both types of schedulers: the basic and the modular.
- Explain why it is easier to define a new scheduler by using the modular approach
- \section DefiningANewBasicSchedulingPolicy Defining A New Basic Scheduling Policy
- A full example showing how to define a new scheduling policy is available in
- the StarPU sources in <c>examples/scheduler/dummy_sched.c</c>.
- The scheduler has to provide methods:
- \code{.c}
- static struct starpu_sched_policy dummy_sched_policy =
- {
- .init_sched = init_dummy_sched,
- .deinit_sched = deinit_dummy_sched,
- .add_workers = dummy_sched_add_workers,
- .remove_workers = dummy_sched_remove_workers,
- .push_task = push_task_dummy,
- .pop_task = pop_task_dummy,
- .policy_name = "dummy",
- .policy_description = "dummy scheduling strategy"
- };
- \endcode
- The idea is that when a task becomes ready for execution, the
- starpu_sched_policy::push_task method is called to give the ready task to the
- scheduler. When a worker is idle, the starpu_sched_policy::pop_task method is
- called to get a task from the scheduler. It is up to the
- scheduler to implement what is between. A simple eager scheduler is for instance
- to make starpu_sched_policy::push_task push the task to a global list, and make
- starpu_sched_policy::pop_task pop from this list. A scheduler can also use
- starpu_push_local_task() to directly push tasks to a per-worker queue, and then
- starpu_does not even need to implement starpu_sched_policy::pop_task.
- If there are no ready tasks within the scheduler, it can just return \c NULL, and
- the worker will sleep.
- The \ref starpu_sched_policy section provides the exact rules that govern the
- methods of the policy.
- Make sure to have a look at the \ref API_Scheduling_Policy section, which
- provides a complete list of the functions available for writing advanced schedulers.
- This includes getting an estimation for a task computation completion with
- starpu_task_expected_length(), for the required data transfers with
- starpu_task_expected_data_transfer_time_for(), for the required energy with
- starpu_task_expected_energy(), etc. Other
- useful functions include starpu_transfer_bandwidth(), starpu_transfer_latency(),
- starpu_transfer_predict(), ...
- Usual functions can be used on tasks, for instance one can use the following to
- get the data size for a task.
- \code{.c}
- size = 0;
- write = 0;
- if (task->cl)
- for (i = 0; i < STARPU_TASK_GET_NBUFFERS(task); i++)
- {
- starpu_data_handle_t data = STARPU_TASK_GET_HANDLE(task, i)
- size_t datasize = starpu_data_get_size(data);
- size += datasize;
- if (STARPU_TASK_GET_MODE(task, i) & STARPU_W)
- write += datasize;
- }
- \endcode
- One can enumerate the workers with this iterator:
- \code{.c}
- struct starpu_worker_collection *workers = starpu_sched_ctx_get_worker_collection(sched_ctx_id);
- struct starpu_sched_ctx_iterator it;
- workers->init_iterator(workers, &it);
- while(workers->has_next(workers, &it))
- {
- unsigned worker = workers->get_next(workers, &it);
- ...
- }
- \endcode
- Task queues can be implemented with the starpu_task_list functions.
- To provide synchronization between workers, a per-worker lock exists to protect
- the data structures of a given worker. It is acquired around scheduler methods,
- so that the scheduler does not need any additional mutex to protect its per-worker data.
- In case the scheduler wants to access another scheduler's data, it should use
- starpu_worker_lock() and starpu_worker_unlock().
- Calling starpu_worker_lock(B) from a worker A will however thus make
- worker A wait for worker B to complete its scheduling method. That may be
- a problem if that method takes a long time, because it is e.g. computing a
- heuristic or waiting for another mutex. In such a case, worker B can call
- starpu_worker_relax_on() and starpu_worker_relax_off() around the section which
- takes a long time (and does not actually need protection), so that worker A can
- e.g. push tasks for worker B, and B will notice them once it gets back from its
- expensive computation.
- When the starpu_sched_policy::push_task method has pushed a task for another
- worker, one has to call starpu_wake_worker_relax_light() so that worker wakes up
- and picks it. If the task was pushed on a shared queue, one may want to only
- wake one idle worker. An example doing this is available in
- <c>src/sched_policies/eager_central_policy.c</c>.
- A variety of examples of
- advanced schedulers can be read in <c>src/sched_policies</c>, for
- instance <c>random_policy.c</c>, <c>eager_central_policy.c</c>,
- <c>work_stealing_policy.c</c> Code protected by
- <c>if (_starpu_get_nsched_ctxs() > 1)</c> can be ignored, this is for scheduling
- contexts, which is an experimental feature.
- \section DefiningANewModularSchedulingPolicy Defining A New Modular Scheduling Policy
- StarPU's Modularized Schedulers are made of individual Scheduling Components
- Modularizedly assembled as a Scheduling Tree. Each Scheduling Component has an
- unique purpose, such as prioritizing tasks or mapping tasks over resources.
- A typical Scheduling Tree is shown below.
- <pre>
- |
- starpu_push_task |
- |
- v
- Fifo_Component
- | ^
- | |
- v |
- Eager_Component
- | ^
- | |
- v |
- --------><--------------><--------
- | ^ | ^
- | | | |
- v | v |
- Fifo_Component Fifo_Component
- | ^ | ^
- | | | |
- v | v |
- Worker_Component Worker_Component
- </pre>
- When a task is pushed by StarPU in a Modularized Scheduler, the task moves from
- a Scheduling Component to an other, following the hierarchy of the
- Scheduling Tree, and is stored in one of the Scheduling Components of the
- strategy.
- When a worker wants to pop a task from the Modularized Scheduler, the
- corresponding Worker Component of the Scheduling Tree tries to pull a task from
- its parents, following the hierarchy, and gives it to the worker if it succeded
- to get one.
- \subsection ExistingModularizedSchedulers Existing Modularized Schedulers
- StarPU is currently shipped with the following pre-defined Modularized
- Schedulers :
- - Eager-based Schedulers (with/without prefetching) : \n
- Naive scheduler, which tries to map a task on the first available resource
- it finds.
- - Prio-based Schedulers (with/without prefetching) : \n
- Similar to Eager-Based Schedulers. Can handle tasks which have a defined
- priority and schedule them accordingly.
- - Random-based Schedulers (with/without prefetching) : \n
- Selects randomly a resource to be mapped on for each task.
- - HEFT Scheduler : \n
- Heterogeneous Earliest Finish Time Scheduler.
- This scheduler needs that every task submitted to StarPU have a
- defined performance model (\ref PerformanceModelCalibration)
- to work efficiently, but can handle tasks without a performance
- model.
- To use one of these schedulers, one can set the environment variable \ref STARPU_SCHED.
- All modularized schedulers are named following the RE <c>tree-*</c>
- \subsection ExampleTreeEagerPrefetchingStrategy An Example : The Tree-Eager-Prefetching Strategy
- <pre>
- |
- starpu_push_task |
- |
- v
- Fifo_Component
- | ^
- Push | | Can_Push
- v |
- Eager_Component
- | ^
- | |
- v |
- --------><-------------------><---------
- | ^ | ^
- Push | | Can_Push Push | | Can_Push
- v | v |
- Fifo_Component Fifo_Component
- | ^ | ^
- Pull | | Can_Pull Pull | | Can_Pull
- v | v |
- Worker_Component Worker_Component
- </pre>
- \subsection Interface
- Each Scheduling Component must follow the following pre-defined Interface
- to be able to interact with other Scheduling Components.
- - Push (Caller_Component, Child_Component, Task) \n
- The calling Scheduling Component transfers a task to its
- Child Component. When the Push function returns, the task no longer
- belongs to the calling Component. The Modularized Schedulers'
- model relies on this function to perform prefetching.
- See starpu_sched_component::push_task for more details
- - Pull (Caller_Component, Parent_Component) -> Task \n
- The calling Scheduling Component requests a task from
- its Parent Component. When the Pull function ends, the returned
- task belongs to the calling Component.
- See starpu_sched_component::pull_task for more details
- - Can_Push (Caller_Component, Parent_Component) \n
- The calling Scheduling Component notifies its Parent Component that
- it is ready to accept new tasks.
- See starpu_sched_component::can_push for more details
- - Can_Pull (Caller_Component, Child_Component) \n
- The calling Scheduling Component notifies its Child Component
- that it is ready to give new tasks.
- See starpu_sched_component::can_pull for more details
- \subsection BuildAModularizedScheduler Building a Modularized Scheduler
- \subsubsection PreImplementedComponents Pre-implemented Components
- StarPU is currently shipped with the following four Scheduling Components :
- - Flow-control Components : Fifo, Prio \n
- Components which store tasks. They can also prioritize them if
- they have a defined priority. It is possible to define a threshold
- for those Components following two criterias : the number of tasks
- stored in the Component, or the sum of the expected length of all
- tasks stored in the Component.
- - Resource-Mapping Components : Mct, Heft, Eager, Random, Work-Stealing \n
- "Core" of the Scheduling Strategy, those Components are the
- ones who make scheduling choices.
- - Worker Components : Worker \n
- Each Worker Component modelize a concrete worker.
- - Special-Purpose Components : Perfmodel_Select, Best_Implementation \n
- Components dedicated to original purposes. The Perfmodel_Select
- Component decides which Resource-Mapping Component should be used to
- schedule a task. The Best_Implementation Component chooses which
- implementation of a task should be used on the choosen resource.
- \subsubsection ProgressionAndValidationRules Progression And Validation Rules
- Some rules must be followed to ensure the correctness of a Modularized
- Scheduler :
- - At least one Flow-control Component without threshold per Worker Component
- is needed in a Modularized Scheduler, to store incoming tasks from StarPU
- and to give tasks to Worker Components who asks for it. It is possible to
- use one Flow-control Component per Worker Component, or one for all Worker
- Components, depending on how the Scheduling Tree is defined.
- - At least one Resource-Mapping Component is needed in a Modularized
- Scheduler. Resource-Mapping Components are the only ones who can make
- scheduling choices, and so the only ones who can have several child.
- \subsubsection ImplementAModularizedScheduler Implementing a Modularized Scheduler
- The following code shows how the Tree-Eager-Prefetching Scheduler
- shown in Section \ref ExampleTreeEagerPrefetchingStrategy is implemented :
- \code{.c}
- #define _STARPU_SCHED_NTASKS_THRESHOLD_DEFAULT 2
- #define _STARPU_SCHED_EXP_LEN_THRESHOLD_DEFAULT 1000000000.0
- static void initialize_eager_prefetching_center_policy(unsigned sched_ctx_id)
- {
- unsigned ntasks_threshold = _STARPU_SCHED_NTASKS_THRESHOLD_DEFAULT;
- double exp_len_threshold = _STARPU_SCHED_EXP_LEN_THRESHOLD_DEFAULT;
- [...]
- starpu_sched_ctx_create_worker_collection
- (sched_ctx_id, STARPU_WORKER_LIST);
- /* Create the Scheduling Tree */
- struct starpu_sched_tree * t = starpu_sched_tree_create(sched_ctx_id);
- /* The Root Component is a Flow-control Fifo Component */
- t->root = starpu_sched_component_fifo_create(NULL);
- /* The Resource-mapping Component of the strategy is an Eager Component
- */
- struct starpu_sched_component *eager_component = starpu_sched_component_eager_create(NULL);
- /* Create links between Components : the Eager Component is the child
- * of the Root Component */
- t->root->add_child(t->root, eager_component);
- eager_component->add_father(eager_component, t->root);
- /* A task threshold is set for the Flow-control Components which will
- * be connected to Worker Components. By doing so, this Modularized
- * Scheduler will be able to perform some prefetching on the resources
- */
- struct starpu_sched_component_fifo_data fifo_data =
- {
- .ntasks_threshold = ntasks_threshold,
- .exp_len_threshold = exp_len_threshold,
- };
- unsigned i;
- for(i = 0; i < starpu_worker_get_count() + starpu_combined_worker_get_count(); i++)
- {
- /* Each Worker Component has a Flow-control Fifo Component as
- * father */
- struct starpu_sched_component * worker_component = starpu_sched_component_worker_new(i);
- struct starpu_sched_component * fifo_component = starpu_sched_component_fifo_create(&fifo_data);
- fifo_component->add_child(fifo_component, worker_component);
- worker_component->add_father(worker_component, fifo_component);
- /* Each Flow-control Fifo Component associated to a Worker
- * Component is linked to the Eager Component as one of its
- * children */
- eager_component->add_child(eager_component, fifo_component);
- fifo_component->add_father(fifo_component, eager_component);
- }
- starpu_sched_tree_update_workers(t);
- starpu_sched_ctx_set_policy_data(sched_ctx_id, (void*)t);
- }
- /* Properly destroy the Scheduling Tree and all its Components */
- static void deinitialize_eager_prefetching_center_policy(unsigned sched_ctx_id)
- {
- struct starpu_sched_tree * tree = (struct starpu_sched_tree*)starpu_sched_ctx_get_policy_data(sched_ctx_id);
- starpu_sched_tree_destroy(tree);
- starpu_sched_ctx_delete_worker_collection(sched_ctx_id);
- }
- /* Initializing the starpu_sched_policy struct associated to the Modularized
- * Scheduler : only the init_sched and deinit_sched needs to be defined to
- * implement a Modularized Scheduler */
- struct starpu_sched_policy _starpu_sched_tree_eager_prefetching_policy =
- {
- .init_sched = initialize_eager_prefetching_center_policy,
- .deinit_sched = deinitialize_eager_prefetching_center_policy,
- .add_workers = starpu_sched_tree_add_workers,
- .remove_workers = starpu_sched_tree_remove_workers,
- .push_task = starpu_sched_tree_push_task,
- .pop_task = starpu_sched_tree_pop_task,
- .pre_exec_hook = starpu_sched_component_worker_pre_exec_hook,
- .post_exec_hook = starpu_sched_component_worker_post_exec_hook,
- .pop_every_task = NULL,
- .policy_name = "tree-eager-prefetching",
- .policy_description = "eager with prefetching tree policy"
- };
- \endcode
- \subsection WriteASchedulingComponent Writing a Scheduling Component
- \subsubsection GenericSchedulingComponent Generic Scheduling Component
- Each Scheduling Component is instantiated from a Generic Scheduling Component,
- which implements a generic version of the Interface. The generic implementation
- of Pull, Can_Pull and Can_Push functions are recursive calls to their parents
- (respectively to their children). However, as a Generic Scheduling Component do
- not know how much children it will have when it will be instantiated, it does
- not implement the Push function.
- \subsubsection InstantiationRedefineInterface Instantiation : Redefining the Interface
- A Scheduling Component must implement all the functions of the Interface. It is
- so necessary to implement a Push function to instantiate a Scheduling Component.
- The implemented Push function is the "fingerprint" of a Scheduling Component.
- Depending on how functionalities or properties the programmer wants to give
- to the Scheduling Component he is implementing, it is possible to reimplement
- all the functions of the Interface. For example, a Flow-control Component
- reimplements the Pull and the Can_Push functions of the Interface, allowing him
- to catch the generic recursive calls of these functions. The Pull function of
- a Flow-control Component can, for example, pop a task from the local storage
- queue of the Component, and give it to the calling Component which asks for it.
- \subsubsection DetailedProgressionAndValidationRules Detailed Progression and Validation Rules
- - A Reservoir is a Scheduling Component which redefines a Push and a Pull
- function, in order to store tasks into it. A Reservoir delimit Scheduling
- Areas in the Scheduling Tree.
- - A Pump is the engine source of the Scheduler : it pushes/pulls tasks
- to/from a Scheduling Component to an other. Native Pumps of a Scheduling
- Tree are located at the root of the Tree (incoming Push calls from StarPU),
- and at the leafs of the Tree (Pop calls coming from StarPU Workers).
- Pre-implemented Scheduling Components currently shipped with Pumps are
- Flow-Control Components and the Resource-Mapping Component Heft, within
- their defined Can_Push functions.
- - A correct Scheduling Tree requires a Pump per Scheduling Area and per
- Execution Flow.
- The Tree-Eager-Prefetching Scheduler shown in Section
- \ref ExampleTreeEagerPrefetchingStrategy follows the previous assumptions :
- <pre>
- starpu_push_task
- <b>Pump</b>
- |
- Area 1 |
- |
- v
- -----------------------Fifo_Component-----------------------------
- <b>Pump</b>
- | ^
- Push | | Can_Push
- v |
- Area 2 Eager_Component
- | ^
- | |
- v |
- --------><-------------------><---------
- | ^ | ^
- Push | | Can_Push Push | | Can_Push
- v | v |
- -----Fifo_Component-----------------------Fifo_Component----------
- | ^ | ^
- Pull | | Can_Pull Pull | | Can_Pull
- Area 3 v | v |
- <b>Pump</b> <b>Pump</b>
- Worker_Component Worker_Component
- </pre>
- \section GraphScheduling Graph-based Scheduling
- For performance reasons, most of the schedulers shipped with StarPU use simple
- list-scheduling heuristics, assuming that the application has already set
- priorities. This is why they do their scheduling between when tasks become
- available for execution and when a worker becomes idle, without looking at the
- task graph.
- Other heuristics can however look at the task graph. Recording the task graph
- is expensive, so it is not available by default, the scheduling heuristic has
- to set \c _starpu_graph_record to \c 1 from the initialization function, to make it
- available. Then the <c>_starpu_graph*</c> functions can be used.
- <c>src/sched_policies/graph_test_policy.c</c> is an example of simple greedy
- policy which automatically computes priorities by bottom-up rank.
- The idea is that while the application submits tasks, they are only pushed
- to a bag of tasks. When the application is finished with submitting tasks,
- it calls starpu_do_schedule() (or starpu_task_wait_for_all(), which calls
- starpu_do_schedule()), and the starpu_sched_policy::do_schedule method of the
- scheduler is called. This method calls _starpu_graph_compute_depths to compute
- the bottom-up ranks, and then uses these ranks to set priorities over tasks.
- It then has two priority queues, one for CPUs, and one for GPUs, and uses a dumb
- heuristic based on the duration of the task over CPUs and GPUs to decide between
- the two queues. CPU workers can then pop from the CPU priority queue, and GPU
- workers from the GPU priority queue.
- \section DebuggingScheduling Debugging Scheduling
- All the \ref OnlinePerformanceTools and \ref OfflinePerformanceTools can
- be used to get information about how well the execution proceeded, and thus the
- overall quality of the execution.
- Precise debugging can also be performed by using the
- \ref STARPU_TASK_BREAK_ON_PUSH, \ref STARPU_TASK_BREAK_ON_SCHED,
- \ref STARPU_TASK_BREAK_ON_POP, and \ref STARPU_TASK_BREAK_ON_EXEC environment variables.
- By setting the job_id of a task
- in these environment variables, StarPU will raise <c>SIGTRAP</c> when the task is being
- scheduled, pushed, or popped by the scheduler. This means that when one notices
- that a task is being scheduled in a seemingly odd way, one can just reexecute
- the application in a debugger, with some of those variables set, and the
- execution will stop exactly at the scheduling points of this task, thus allowing
- to inspect the scheduler state, etc.
- */
|