| 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570 | /* StarPU --- Runtime system for heterogeneous multicore architectures. * * Copyright (C) 2013                                     Inria * Copyright (C) 2014,2016-2019                           CNRS * Copyright (C) 2014,2017,2019                           Université de Bordeaux * Copyright (C) 2013                                     Simon Archipoff * * StarPU is free software; you can redistribute it and/or modify * it under the terms of the GNU Lesser General Public License as published by * the Free Software Foundation; either version 2.1 of the License, or (at * your option) any later version. * * StarPU is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. * * See the GNU Lesser General Public License in COPYING.LGPL for more details. *//*! \page HowToDefineANewSchedulingPolicy How To Define A New Scheduling Policy\section Introduction IntroductionStarPU provides two ways of defining a scheduling policy, a basic monolithicway, and a modular way.The basic monolithic way is directly connected with the core of StarPU, whichmeans that the policy then has to handle all performance details, such as dataprefetching, task performance model calibration, worker locking, etc.<c>examples/scheduler/dummy_sched.c</c> is a trivial example which does nothandle this, and thus e.g. does not achieve any data prefetching or smartscheduling.The modular way allows to implement just one component, andreuse existing components to cope with all these details.<c>examples/scheduler/dummy_modular_sched.c</c> is a trivial example verysimilar to <c>dummy_sched.c</c>, but implemented as a component, which allows toassemble it with other components, and notably get data prefetching support forfree, and task performance model calibration is properly performed, which allowsto easily extend it into taking task duration into account, etc.\section DefiningANewBasicSchedulingPolicy Defining A New Basic Scheduling PolicyA full example showing how to define a new scheduling policy is available inthe StarPU sources in <c>examples/scheduler/dummy_sched.c</c>.The scheduler has to provide methods:\code{.c}static struct starpu_sched_policy dummy_sched_policy ={    .init_sched = init_dummy_sched,    .deinit_sched = deinit_dummy_sched,    .add_workers = dummy_sched_add_workers,    .remove_workers = dummy_sched_remove_workers,    .push_task = push_task_dummy,    .pop_task = pop_task_dummy,    .policy_name = "dummy",    .policy_description = "dummy scheduling strategy"};\endcodeThe idea is that when a task becomes ready for execution, thestarpu_sched_policy::push_task method is called to give the ready task to thescheduler. When a worker is idle, the starpu_sched_policy::pop_task method iscalled to get a task from the scheduler. It is up to thescheduler to implement what is between. A simple eager scheduler is for instanceto make starpu_sched_policy::push_task push the task to a global list, and makestarpu_sched_policy::pop_task pop from this list. A scheduler can also usestarpu_push_local_task() to directly push tasks to a per-worker queue, and thenstarpu_does not even need to implement starpu_sched_policy::pop_task.If there are no ready tasks within the scheduler, it can just return \c NULL, andthe worker will sleep.The \ref starpu_sched_policy section provides the exact rules that govern themethods of the policy.Make sure to have a look at the \ref API_Scheduling_Policy section, whichprovides a complete list of the functions available for writing advanced schedulers.This includes getting an estimation for a task computation completion withstarpu_task_expected_length(), for the required data transfers withstarpu_task_expected_data_transfer_time_for(), for the required energy withstarpu_task_expected_energy(), etc. Otheruseful functions include starpu_transfer_bandwidth(), starpu_transfer_latency(),starpu_transfer_predict(), ...Usual functions can be used on tasks, for instance one can use the following toget the data size for a task.\code{.c}size = 0;write = 0;if (task->cl)    for (i = 0; i < STARPU_TASK_GET_NBUFFERS(task); i++)    {        starpu_data_handle_t data = STARPU_TASK_GET_HANDLE(task, i)	size_t datasize = starpu_data_get_size(data);        size += datasize;	if (STARPU_TASK_GET_MODE(task, i) & STARPU_W)	    write += datasize;    }\endcodeOne can enumerate the workers with this iterator:\code{.c}struct starpu_worker_collection *workers = starpu_sched_ctx_get_worker_collection(sched_ctx_id);struct starpu_sched_ctx_iterator it;workers->init_iterator(workers, &it);while(workers->has_next(workers, &it)){	unsigned worker = workers->get_next(workers, &it);	...}\endcodeTask queues can be implemented with the starpu_task_list functions.To provide synchronization between workers, a per-worker lock exists to protectthe data structures of a given worker. It is acquired around scheduler methods,so that the scheduler does not need any additional mutex to protect its per-worker data.In case the scheduler wants to access another scheduler's data, it should usestarpu_worker_lock() and starpu_worker_unlock().Calling \code{.c}starpu_worker_lock(B)\endcode from a worker \c A will however thus makeworker \c A wait for worker \c B to complete its scheduling method. That may bea problem if that method takes a long time, because it is e.g. computing aheuristic or waiting for another mutex, or even cause deadlocks if worker \c B iscalling \code{.c}starpu_worker_lock(A)\endcode	 at the same time. In such a case, worker \c B mustcall starpu_worker_relax_on() and starpu_worker_relax_off() around the sectionwhich potentially blocks (and does not actually need protection). While a workeris in relaxed mode, e.g. between a pair of starpu_worker_relax_on() andstarpu_worker_relax_off() calls, its state can be altered by other threads: forinstance, worker \c A can push tasks for worker \c B. In consequence, worker \c Bmust re-assess its state after \code{.c}starpu_worker_relax_off(B)\endcode, such as takingpossible new tasks pushed to its queue into account.When the starpu_sched_policy::push_task method has pushed a task for anotherworker, one has to call starpu_wake_worker_relax_light() so that worker wakes upand picks it. If the task was pushed on a shared queue, one may want to onlywake one idle worker. An example doing this is available in<c>src/sched_policies/eager_central_policy.c</c>.A variety of examples ofadvanced schedulers can be read in <c>src/sched_policies</c>, forinstance <c>random_policy.c</c>, <c>eager_central_policy.c</c>,<c>work_stealing_policy.c</c> Code protected by<c>if (_starpu_get_nsched_ctxs() > 1)</c> can be ignored, this is for schedulingcontexts, which is an experimental feature.\section DefiningANewModularSchedulingPolicy Defining A New Modular Scheduling PolicyStarPU's Modularized Schedulers are made of individual Scheduling ComponentsModularizedly assembled as a Scheduling Tree. Each Scheduling Component has anunique purpose, such as prioritizing tasks or mapping tasks over resources.A typical Scheduling Tree is shown below.<pre>                                  |              starpu_push_task    |                                  |                                  v                            Fifo_Component                                |  ^                                |  |                                v  |                           Eager_Component                                |  ^                                |  |                                v  |                 --------><--------------><--------                 |  ^                          |  ^                 |  |                          |  |                 v  |                          v  |             Fifo_Component                 Fifo_Component                 |  ^                          |  ^                 |  |                          |  |                 v  |                          v  |            Worker_Component               Worker_Component</pre>When a task is pushed by StarPU in a Modularized Scheduler, the task moves froma Scheduling Component to an other, following the hierarchy of theScheduling Tree, and is stored in one of the Scheduling Components of thestrategy.When a worker wants to pop a task from the Modularized Scheduler, thecorresponding Worker Component of the Scheduling Tree tries to pull a task fromits parents, following the hierarchy, and gives it to the worker if it succededto get one.\subsection ExistingModularizedSchedulers Existing Modularized SchedulersStarPU is currently shipped with the following pre-defined ModularizedSchedulers :- Eager-based Schedulers (with/without prefetching) : \nNaive scheduler, which tries to map a task on the first available resourceit finds.- Prio-based Schedulers (with/without prefetching) : \nSimilar to Eager-Based Schedulers. Can handle tasks which have a definedpriority and schedule them accordingly.- Random-based Schedulers (with/without prefetching) : \nSelects randomly a resource to be mapped on for each task.- HEFT Scheduler : \nHeterogeneous Earliest Finish Time Scheduler.This scheduler needs that every task submitted to StarPU have adefined performance model (\ref PerformanceModelCalibration)to work efficiently, but can handle tasks without a performancemodel.To use one of these schedulers, one can set the environment variable \ref STARPU_SCHED.All modularized schedulers are named following the RE <c>tree-*</c>\subsection ExampleTreeEagerPrefetchingStrategy An Example : The Tree-Eager-Prefetching Strategy<pre>                                 |             starpu_push_task    |                                 |                                 v                           Fifo_Component                                |  ^                        Push    |  |    Can_Push                                v  |                          Eager_Component                                |  ^                                |  |                                v  |              --------><-------------------><---------              |  ^                                |  ^      Push    |  |    Can_Push            Push    |  |    Can_Push              v  |                                v  |         Fifo_Component                       Fifo_Component              |  ^                                |  ^      Pull    |  |    Can_Pull            Pull    |  |    Can_Pull              v  |                                v  |        Worker_Component                     Worker_Component</pre>\subsection InterfaceEach Scheduling Component must follow the following pre-defined Interfaceto be able to interact with other Scheduling Components.	- Push (Caller_Component, Child_Component, Task) \n	The calling Scheduling Component transfers a task to its	Child Component. When the Push function returns, the task no longer	belongs to the calling Component. The Modularized Schedulers'	model relies on this function to perform prefetching.	See starpu_sched_component::push_task for more details	- Pull (Caller_Component, Parent_Component)  ->  Task \n	The calling Scheduling Component requests a task from	its Parent Component. When the Pull function ends, the returned	task belongs to the calling Component.	See starpu_sched_component::pull_task for more details	- Can_Push (Caller_Component, Parent_Component) \n	The calling Scheduling Component notifies its Parent Component that	it is ready to accept new tasks.	See starpu_sched_component::can_push for more details	- Can_Pull (Caller_Component, Child_Component) \n	The calling Scheduling Component notifies its Child Component	that it is ready to give new tasks.	See starpu_sched_component::can_pull for more details\subsection BuildAModularizedScheduler Building a Modularized Scheduler\subsubsection PreImplementedComponents Pre-implemented ComponentsStarPU is currently shipped with the following four Scheduling Components :	- Flow-control Components : Fifo, Prio \n	Components which store tasks. They can also prioritize them if	they have a defined priority. It is possible to define a threshold	for those Components following two criterias : the number of tasks	stored in the Component, or the sum of the expected length of all	tasks stored in the Component.	- Resource-Mapping Components : Mct, Heft, Eager, Random, Work-Stealing \n	"Core" of the Scheduling Strategy, those Components are the	ones who make scheduling choices.	- Worker Components : Worker \n	Each Worker Component modelize a concrete worker.	- Special-Purpose Components : Perfmodel_Select, Best_Implementation \n	Components dedicated to original purposes. The Perfmodel_Select	Component decides which Resource-Mapping Component should be used to	schedule a task. The Best_Implementation Component chooses which	implementation of a task should be used on the choosen resource.\subsubsection ProgressionAndValidationRules Progression And Validation RulesSome rules must be followed to ensure the correctness of a ModularizedScheduler :	- At least one Flow-control Component without threshold per Worker Component	is needed in a Modularized Scheduler, to store incoming tasks from StarPU	and to give tasks to Worker Components who asks for it. It is possible to	use one Flow-control Component per Worker Component, or one for all Worker	Components, depending on how the Scheduling Tree is defined.	- At least one Resource-Mapping Component is needed in a Modularized	Scheduler. Resource-Mapping Components are the only ones who can make	scheduling choices, and so the only ones who can have several child.\subsubsection ImplementAModularizedScheduler Implementing a Modularized SchedulerThe following code shows how the Tree-Eager-Prefetching Schedulershown in Section \ref ExampleTreeEagerPrefetchingStrategy can be implemented :\code{.c}static void initialize_eager_prefetching_center_policy(unsigned sched_ctx_id){  /* The eager component will decide for each task which worker will run it,   * and we want fifos both above and below the component */  starpu_sched_component_initialize_simple_scheduler(    starpu_sched_component_eager_create, NULL,    STARPU_SCHED_SIMPLE_DECIDE_WORKERS |    STARPU_SCHED_SIMPLE_FIFO_ABOVE |    STARPU_SCHED_SIMPLE_FIFOS_BELOW,    sched_ctx_id);}/* Properly destroy the Scheduling Tree and all its Components */static void deinitialize_eager_prefetching_center_policy(unsigned sched_ctx_id){  struct starpu_sched_tree * tree = (struct starpu_sched_tree*)starpu_sched_ctx_get_policy_data(sched_ctx_id);  starpu_sched_tree_destroy(tree);}/* Initializing the starpu_sched_policy struct associated to the Modularized * Scheduler : only the init_sched and deinit_sched needs to be defined to * implement a Modularized Scheduler */struct starpu_sched_policy _starpu_sched_tree_eager_prefetching_policy ={  .init_sched = initialize_eager_prefetching_center_policy,  .deinit_sched = deinitialize_eager_prefetching_center_policy,  .add_workers = starpu_sched_tree_add_workers,  .remove_workers = starpu_sched_tree_remove_workers,  .push_task = starpu_sched_tree_push_task,  .pop_task = starpu_sched_tree_pop_task,  .pre_exec_hook = starpu_sched_component_worker_pre_exec_hook,  .post_exec_hook = starpu_sched_component_worker_post_exec_hook,  .pop_every_task = NULL,  .policy_name = "tree-eager-prefetching",  .policy_description = "eager with prefetching tree policy"};\endcodestarpu_sched_component_initialize_simple_scheduler is a helper function whichmakes it very trivial to assemble a modular scheduler. The modular scheduler can also be built by hand in the following way:\code{.c}#define _STARPU_SCHED_NTASKS_THRESHOLD_DEFAULT 2#define _STARPU_SCHED_EXP_LEN_THRESHOLD_DEFAULT 1000000000.0static void initialize_eager_prefetching_center_policy(unsigned sched_ctx_id){  unsigned ntasks_threshold = _STARPU_SCHED_NTASKS_THRESHOLD_DEFAULT;  double exp_len_threshold = _STARPU_SCHED_EXP_LEN_THRESHOLD_DEFAULT;  [...]  starpu_sched_ctx_create_worker_collection    (sched_ctx_id, STARPU_WORKER_LIST);  /* Create the Scheduling Tree */  struct starpu_sched_tree * t = starpu_sched_tree_create(sched_ctx_id);  /* The Root Component is a Flow-control Fifo Component */   t->root = starpu_sched_component_fifo_create(NULL);  /* The Resource-mapping Component of the strategy is an Eager Component   */  struct starpu_sched_component *eager_component = starpu_sched_component_eager_create(NULL);  /* Create links between Components : the Eager Component is the child   * of the Root Component */  t->root->add_child(t->root, eager_component);  eager_component->add_father(eager_component, t->root);  /* A task threshold is set for the Flow-control Components which will   * be connected to Worker Components. By doing so, this Modularized   * Scheduler will be able to perform some prefetching on the resources   */  struct starpu_sched_component_fifo_data fifo_data =  {    .ntasks_threshold = ntasks_threshold,    .exp_len_threshold = exp_len_threshold,  };  unsigned i;  for(i = 0; i < starpu_worker_get_count() + starpu_combined_worker_get_count(); i++)  {    /* Each Worker Component has a Flow-control Fifo Component as     * father */    struct starpu_sched_component * worker_component = starpu_sched_component_worker_new(i);    struct starpu_sched_component * fifo_component = starpu_sched_component_fifo_create(&fifo_data);    fifo_component->add_child(fifo_component, worker_component);    worker_component->add_father(worker_component, fifo_component);    /* Each Flow-control Fifo Component associated to a Worker     * Component is linked to the Eager Component as one of its     * children */    eager_component->add_child(eager_component, fifo_component);    fifo_component->add_father(fifo_component, eager_component);  }  starpu_sched_tree_update_workers(t);  starpu_sched_ctx_set_policy_data(sched_ctx_id, (void*)t);}/* Properly destroy the Scheduling Tree and all its Components */static void deinitialize_eager_prefetching_center_policy(unsigned sched_ctx_id){  struct starpu_sched_tree * tree = (struct starpu_sched_tree*)starpu_sched_ctx_get_policy_data(sched_ctx_id);  starpu_sched_tree_destroy(tree);  starpu_sched_ctx_delete_worker_collection(sched_ctx_id);}/* Initializing the starpu_sched_policy struct associated to the Modularized * Scheduler : only the init_sched and deinit_sched needs to be defined to * implement a Modularized Scheduler */struct starpu_sched_policy _starpu_sched_tree_eager_prefetching_policy ={  .init_sched = initialize_eager_prefetching_center_policy,  .deinit_sched = deinitialize_eager_prefetching_center_policy,  .add_workers = starpu_sched_tree_add_workers,  .remove_workers = starpu_sched_tree_remove_workers,  .push_task = starpu_sched_tree_push_task,  .pop_task = starpu_sched_tree_pop_task,  .pre_exec_hook = starpu_sched_component_worker_pre_exec_hook,  .post_exec_hook = starpu_sched_component_worker_post_exec_hook,  .pop_every_task = NULL,  .policy_name = "tree-eager-prefetching",  .policy_description = "eager with prefetching tree policy"};\endcode\subsection WriteASchedulingComponent Writing a Scheduling Component\subsubsection GenericSchedulingComponent Generic Scheduling ComponentEach Scheduling Component is instantiated from a Generic Scheduling Component,which implements a generic version of the Interface. The generic implementationof Pull, Can_Pull and Can_Push functions are recursive calls to their parents(respectively to their children). However, as a Generic Scheduling Component donot know how much children it will have when it will be instantiated, it doesnot implement the Push function.\subsubsection InstantiationRedefineInterface Instantiation : Redefining the InterfaceA Scheduling Component must implement all the functions of the Interface. It isso necessary to implement a Push function to instantiate a Scheduling Component.The implemented Push function is the "fingerprint" of a Scheduling Component.Depending on how functionalities or properties the programmer wants to giveto the Scheduling Component he is implementing, it is possible to reimplementall the functions of the Interface. For example, a Flow-control Componentreimplements the Pull and the Can_Push functions of the Interface, allowing himto catch the generic recursive calls of these functions. The Pull function ofa Flow-control Component can, for example, pop a task from the local storagequeue of the Component, and give it to the calling Component which asks for it.\subsubsection DetailedProgressionAndValidationRules Detailed Progression and Validation Rules	- A Reservoir is a Scheduling Component which redefines a Push and a Pull	function, in order to store tasks into it. A Reservoir delimit Scheduling	Areas in the Scheduling Tree.	- A Pump is the engine source of the Scheduler : it pushes/pulls tasks	to/from a Scheduling Component to an other. Native Pumps of a Scheduling	Tree are located at the root of the Tree (incoming Push calls from StarPU),	and at the leafs of the Tree (Pop calls coming from StarPU Workers).	Pre-implemented Scheduling Components currently shipped with Pumps are	Flow-Control Components and the Resource-Mapping Component Heft, within	their defined Can_Push functions.	- A correct Scheduling Tree requires a Pump per Scheduling Area and per	Execution Flow.The Tree-Eager-Prefetching Scheduler shown in Section\ref ExampleTreeEagerPrefetchingStrategy follows the previous assumptions :<pre>                                  starpu_push_task                                       <b>Pump</b>                                         | Area 1                                  |                                         |                                         v            -----------------------Fifo_Component-----------------------------                                       <b>Pump</b>                                        |  ^                                Push    |  |    Can_Push                                        v  | Area 2                           Eager_Component                                        |  ^                                        |  |                                        v  |                      --------><-------------------><---------                      |  ^                                |  ^              Push    |  |    Can_Push            Push    |  |    Can_Push                      v  |                                v  |            -----Fifo_Component-----------------------Fifo_Component----------                      |  ^                                |  ^              Pull    |  |    Can_Pull            Pull    |  |    Can_Pull Area 3               v  |                                v  |                     <b>Pump</b>                               <b>Pump</b>                Worker_Component                     Worker_Component</pre>\section GraphScheduling Graph-based SchedulingFor performance reasons, most of the schedulers shipped with StarPU use simplelist-scheduling heuristics, assuming that the application has already setpriorities.  This is why they do their scheduling between when tasks becomeavailable for execution and when a worker becomes idle, without looking at thetask graph.Other heuristics can however look at the task graph. Recording the task graphis expensive, so it is not available by default, the scheduling heuristic hasto set \c _starpu_graph_record to \c 1 from the initialization function, to make itavailable. Then the <c>_starpu_graph*</c> functions can be used.<c>src/sched_policies/graph_test_policy.c</c> is an example of simple greedypolicy which automatically computes priorities by bottom-up rank.The idea is that while the application submits tasks, they are only pushedto a bag of tasks. When the application is finished with submitting tasks,it calls starpu_do_schedule() (or starpu_task_wait_for_all(), which callsstarpu_do_schedule()), and the starpu_sched_policy::do_schedule method of thescheduler is called. This method calls _starpu_graph_compute_depths to computethe bottom-up ranks, and then uses these ranks to set priorities over tasks.It then has two priority queues, one for CPUs, and one for GPUs, and uses a dumbheuristic based on the duration of the task over CPUs and GPUs to decide betweenthe two queues. CPU workers can then pop from the CPU priority queue, and GPUworkers from the GPU priority queue.\section DebuggingScheduling Debugging SchedulingAll the \ref OnlinePerformanceTools and \ref OfflinePerformanceTools canbe used to get information about how well the execution proceeded, and thus theoverall quality of the execution.Precise debugging can also be performed by using the\ref STARPU_TASK_BREAK_ON_PUSH, \ref STARPU_TASK_BREAK_ON_SCHED,\ref STARPU_TASK_BREAK_ON_POP, and \ref STARPU_TASK_BREAK_ON_EXEC environment variables.By setting the job_id of a taskin these environment variables, StarPU will raise <c>SIGTRAP</c> when the task is beingscheduled, pushed, or popped by the scheduler. This means that when one noticesthat a task is being scheduled in a seemingly odd way, one can just reexecutethe application in a debugger, with some of those variables set, and theexecution will stop exactly at the scheduling points of this task, thus allowingto inspect the scheduler state, etc.*/
 |