350_scheduling_policy_definition.doxy 22 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509
  1. /* StarPU --- Runtime system for heterogeneous multicore architectures.
  2. *
  3. * Copyright (C) 2013 Inria
  4. * Copyright (C) 2014,2016-2019 CNRS
  5. * Copyright (C) 2014,2017,2019 Université de Bordeaux
  6. * Copyright (C) 2013 Simon Archipoff
  7. *
  8. * StarPU is free software; you can redistribute it and/or modify
  9. * it under the terms of the GNU Lesser General Public License as published by
  10. * the Free Software Foundation; either version 2.1 of the License, or (at
  11. * your option) any later version.
  12. *
  13. * StarPU is distributed in the hope that it will be useful, but
  14. * WITHOUT ANY WARRANTY; without even the implied warranty of
  15. * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
  16. *
  17. * See the GNU Lesser General Public License in COPYING.LGPL for more details.
  18. */
  19. /*! \page HowToDefineANewSchedulingPolicy How To Define A New Scheduling Policy
  20. \section Introduction Introduction
  21. presentation of both types of schedulers: the basic and the modular.
  22. Explain why it is easier to define a new scheduler by using the modular approach
  23. \section DefiningANewBasicSchedulingPolicy Defining A New Basic Scheduling Policy
  24. A full example showing how to define a new scheduling policy is available in
  25. the StarPU sources in <c>examples/scheduler/dummy_sched.c</c>.
  26. The scheduler has to provide methods:
  27. \code{.c}
  28. static struct starpu_sched_policy dummy_sched_policy =
  29. {
  30. .init_sched = init_dummy_sched,
  31. .deinit_sched = deinit_dummy_sched,
  32. .add_workers = dummy_sched_add_workers,
  33. .remove_workers = dummy_sched_remove_workers,
  34. .push_task = push_task_dummy,
  35. .pop_task = pop_task_dummy,
  36. .policy_name = "dummy",
  37. .policy_description = "dummy scheduling strategy"
  38. };
  39. \endcode
  40. The idea is that when a task becomes ready for execution, the
  41. starpu_sched_policy::push_task method is called to give the ready task to the
  42. scheduler. When a worker is idle, the starpu_sched_policy::pop_task method is
  43. called to get a task from the scheduler. It is up to the
  44. scheduler to implement what is between. A simple eager scheduler is for instance
  45. to make starpu_sched_policy::push_task push the task to a global list, and make
  46. starpu_sched_policy::pop_task pop from this list. A scheduler can also use
  47. starpu_push_local_task() to directly push tasks to a per-worker queue, and then
  48. starpu_does not even need to implement starpu_sched_policy::pop_task.
  49. If there are no ready tasks within the scheduler, it can just return \c NULL, and
  50. the worker will sleep.
  51. The \ref starpu_sched_policy section provides the exact rules that govern the
  52. methods of the policy.
  53. Make sure to have a look at the \ref API_Scheduling_Policy section, which
  54. provides a complete list of the functions available for writing advanced schedulers.
  55. This includes getting an estimation for a task computation completion with
  56. starpu_task_expected_length(), for the required data transfers with
  57. starpu_task_expected_data_transfer_time_for(), for the required energy with
  58. starpu_task_expected_energy(), etc. Other
  59. useful functions include starpu_transfer_bandwidth(), starpu_transfer_latency(),
  60. starpu_transfer_predict(), ...
  61. Usual functions can be used on tasks, for instance one can use the following to
  62. get the data size for a task.
  63. \code{.c}
  64. size = 0;
  65. write = 0;
  66. if (task->cl)
  67. for (i = 0; i < STARPU_TASK_GET_NBUFFERS(task); i++)
  68. {
  69. starpu_data_handle_t data = STARPU_TASK_GET_HANDLE(task, i)
  70. size_t datasize = starpu_data_get_size(data);
  71. size += datasize;
  72. if (STARPU_TASK_GET_MODE(task, i) & STARPU_W)
  73. write += datasize;
  74. }
  75. \endcode
  76. One can enumerate the workers with this iterator:
  77. \code{.c}
  78. struct starpu_worker_collection *workers = starpu_sched_ctx_get_worker_collection(sched_ctx_id);
  79. struct starpu_sched_ctx_iterator it;
  80. workers->init_iterator(workers, &it);
  81. while(workers->has_next(workers, &it))
  82. {
  83. unsigned worker = workers->get_next(workers, &it);
  84. ...
  85. }
  86. \endcode
  87. Task queues can be implemented with the starpu_task_list functions.
  88. To provide synchronization between workers, a per-worker lock exists to protect
  89. the data structures of a given worker. It is acquired around scheduler methods,
  90. so that the scheduler does not need any additional mutex to protect its per-worker data.
  91. In case the scheduler wants to access another scheduler's data, it should use
  92. starpu_worker_lock() and starpu_worker_unlock().
  93. Calling starpu_worker_lock(B) from a worker A will however thus make
  94. worker A wait for worker B to complete its scheduling method. That may be
  95. a problem if that method takes a long time, because it is e.g. computing a
  96. heuristic or waiting for another mutex. In such a case, worker B can call
  97. starpu_worker_relax_on() and starpu_worker_relax_off() around the section which
  98. takes a long time (and does not actually need protection), so that worker A can
  99. e.g. push tasks for worker B, and B will notice them once it gets back from its
  100. expensive computation.
  101. When the starpu_sched_policy::push_task method has pushed a task for another
  102. worker, one has to call starpu_wake_worker_relax_light() so that worker wakes up
  103. and picks it. If the task was pushed on a shared queue, one may want to only
  104. wake one idle worker. An example doing this is available in
  105. <c>src/sched_policies/eager_central_policy.c</c>.
  106. A variety of examples of
  107. advanced schedulers can be read in <c>src/sched_policies</c>, for
  108. instance <c>random_policy.c</c>, <c>eager_central_policy.c</c>,
  109. <c>work_stealing_policy.c</c> Code protected by
  110. <c>if (_starpu_get_nsched_ctxs() > 1)</c> can be ignored, this is for scheduling
  111. contexts, which is an experimental feature.
  112. \section DefiningANewModularSchedulingPolicy Defining A New Modular Scheduling Policy
  113. StarPU's Modularized Schedulers are made of individual Scheduling Components
  114. Modularizedly assembled as a Scheduling Tree. Each Scheduling Component has an
  115. unique purpose, such as prioritizing tasks or mapping tasks over resources.
  116. A typical Scheduling Tree is shown below.
  117. <pre>
  118. |
  119. starpu_push_task |
  120. |
  121. v
  122. Fifo_Component
  123. | ^
  124. | |
  125. v |
  126. Eager_Component
  127. | ^
  128. | |
  129. v |
  130. --------><--------------><--------
  131. | ^ | ^
  132. | | | |
  133. v | v |
  134. Fifo_Component Fifo_Component
  135. | ^ | ^
  136. | | | |
  137. v | v |
  138. Worker_Component Worker_Component
  139. </pre>
  140. When a task is pushed by StarPU in a Modularized Scheduler, the task moves from
  141. a Scheduling Component to an other, following the hierarchy of the
  142. Scheduling Tree, and is stored in one of the Scheduling Components of the
  143. strategy.
  144. When a worker wants to pop a task from the Modularized Scheduler, the
  145. corresponding Worker Component of the Scheduling Tree tries to pull a task from
  146. its parents, following the hierarchy, and gives it to the worker if it succeded
  147. to get one.
  148. \subsection ExistingModularizedSchedulers Existing Modularized Schedulers
  149. StarPU is currently shipped with the following pre-defined Modularized
  150. Schedulers :
  151. - Eager-based Schedulers (with/without prefetching) : \n
  152. Naive scheduler, which tries to map a task on the first available resource
  153. it finds.
  154. - Prio-based Schedulers (with/without prefetching) : \n
  155. Similar to Eager-Based Schedulers. Can handle tasks which have a defined
  156. priority and schedule them accordingly.
  157. - Random-based Schedulers (with/without prefetching) : \n
  158. Selects randomly a resource to be mapped on for each task.
  159. - HEFT Scheduler : \n
  160. Heterogeneous Earliest Finish Time Scheduler.
  161. This scheduler needs that every task submitted to StarPU have a
  162. defined performance model (\ref PerformanceModelCalibration)
  163. to work efficiently, but can handle tasks without a performance
  164. model.
  165. To use one of these schedulers, one can set the environment variable \ref STARPU_SCHED.
  166. All modularized schedulers are named following the RE <c>tree-*</c>
  167. \subsection ExampleTreeEagerPrefetchingStrategy An Example : The Tree-Eager-Prefetching Strategy
  168. <pre>
  169. |
  170. starpu_push_task |
  171. |
  172. v
  173. Fifo_Component
  174. | ^
  175. Push | | Can_Push
  176. v |
  177. Eager_Component
  178. | ^
  179. | |
  180. v |
  181. --------><-------------------><---------
  182. | ^ | ^
  183. Push | | Can_Push Push | | Can_Push
  184. v | v |
  185. Fifo_Component Fifo_Component
  186. | ^ | ^
  187. Pull | | Can_Pull Pull | | Can_Pull
  188. v | v |
  189. Worker_Component Worker_Component
  190. </pre>
  191. \subsection Interface
  192. Each Scheduling Component must follow the following pre-defined Interface
  193. to be able to interact with other Scheduling Components.
  194. - Push (Caller_Component, Child_Component, Task) \n
  195. The calling Scheduling Component transfers a task to its
  196. Child Component. When the Push function returns, the task no longer
  197. belongs to the calling Component. The Modularized Schedulers'
  198. model relies on this function to perform prefetching.
  199. See starpu_sched_component::push_task for more details
  200. - Pull (Caller_Component, Parent_Component) -> Task \n
  201. The calling Scheduling Component requests a task from
  202. its Parent Component. When the Pull function ends, the returned
  203. task belongs to the calling Component.
  204. See starpu_sched_component::pull_task for more details
  205. - Can_Push (Caller_Component, Parent_Component) \n
  206. The calling Scheduling Component notifies its Parent Component that
  207. it is ready to accept new tasks.
  208. See starpu_sched_component::can_push for more details
  209. - Can_Pull (Caller_Component, Child_Component) \n
  210. The calling Scheduling Component notifies its Child Component
  211. that it is ready to give new tasks.
  212. See starpu_sched_component::can_pull for more details
  213. \subsection BuildAModularizedScheduler Building a Modularized Scheduler
  214. \subsubsection PreImplementedComponents Pre-implemented Components
  215. StarPU is currently shipped with the following four Scheduling Components :
  216. - Flow-control Components : Fifo, Prio \n
  217. Components which store tasks. They can also prioritize them if
  218. they have a defined priority. It is possible to define a threshold
  219. for those Components following two criterias : the number of tasks
  220. stored in the Component, or the sum of the expected length of all
  221. tasks stored in the Component.
  222. - Resource-Mapping Components : Mct, Heft, Eager, Random, Work-Stealing \n
  223. "Core" of the Scheduling Strategy, those Components are the
  224. ones who make scheduling choices.
  225. - Worker Components : Worker \n
  226. Each Worker Component modelize a concrete worker.
  227. - Special-Purpose Components : Perfmodel_Select, Best_Implementation \n
  228. Components dedicated to original purposes. The Perfmodel_Select
  229. Component decides which Resource-Mapping Component should be used to
  230. schedule a task. The Best_Implementation Component chooses which
  231. implementation of a task should be used on the choosen resource.
  232. \subsubsection ProgressionAndValidationRules Progression And Validation Rules
  233. Some rules must be followed to ensure the correctness of a Modularized
  234. Scheduler :
  235. - At least one Flow-control Component without threshold per Worker Component
  236. is needed in a Modularized Scheduler, to store incoming tasks from StarPU
  237. and to give tasks to Worker Components who asks for it. It is possible to
  238. use one Flow-control Component per Worker Component, or one for all Worker
  239. Components, depending on how the Scheduling Tree is defined.
  240. - At least one Resource-Mapping Component is needed in a Modularized
  241. Scheduler. Resource-Mapping Components are the only ones who can make
  242. scheduling choices, and so the only ones who can have several child.
  243. \subsubsection ImplementAModularizedScheduler Implementing a Modularized Scheduler
  244. The following code shows how the Tree-Eager-Prefetching Scheduler
  245. shown in Section \ref ExampleTreeEagerPrefetchingStrategy is implemented :
  246. \code{.c}
  247. #define _STARPU_SCHED_NTASKS_THRESHOLD_DEFAULT 2
  248. #define _STARPU_SCHED_EXP_LEN_THRESHOLD_DEFAULT 1000000000.0
  249. static void initialize_eager_prefetching_center_policy(unsigned sched_ctx_id)
  250. {
  251. unsigned ntasks_threshold = _STARPU_SCHED_NTASKS_THRESHOLD_DEFAULT;
  252. double exp_len_threshold = _STARPU_SCHED_EXP_LEN_THRESHOLD_DEFAULT;
  253. [...]
  254. starpu_sched_ctx_create_worker_collection
  255. (sched_ctx_id, STARPU_WORKER_LIST);
  256. /* Create the Scheduling Tree */
  257. struct starpu_sched_tree * t = starpu_sched_tree_create(sched_ctx_id);
  258. /* The Root Component is a Flow-control Fifo Component */
  259. t->root = starpu_sched_component_fifo_create(NULL);
  260. /* The Resource-mapping Component of the strategy is an Eager Component
  261. */
  262. struct starpu_sched_component *eager_component = starpu_sched_component_eager_create(NULL);
  263. /* Create links between Components : the Eager Component is the child
  264. * of the Root Component */
  265. t->root->add_child(t->root, eager_component);
  266. eager_component->add_father(eager_component, t->root);
  267. /* A task threshold is set for the Flow-control Components which will
  268. * be connected to Worker Components. By doing so, this Modularized
  269. * Scheduler will be able to perform some prefetching on the resources
  270. */
  271. struct starpu_sched_component_fifo_data fifo_data =
  272. {
  273. .ntasks_threshold = ntasks_threshold,
  274. .exp_len_threshold = exp_len_threshold,
  275. };
  276. unsigned i;
  277. for(i = 0; i < starpu_worker_get_count() + starpu_combined_worker_get_count(); i++)
  278. {
  279. /* Each Worker Component has a Flow-control Fifo Component as
  280. * father */
  281. struct starpu_sched_component * worker_component = starpu_sched_component_worker_new(i);
  282. struct starpu_sched_component * fifo_component = starpu_sched_component_fifo_create(&fifo_data);
  283. fifo_component->add_child(fifo_component, worker_component);
  284. worker_component->add_father(worker_component, fifo_component);
  285. /* Each Flow-control Fifo Component associated to a Worker
  286. * Component is linked to the Eager Component as one of its
  287. * children */
  288. eager_component->add_child(eager_component, fifo_component);
  289. fifo_component->add_father(fifo_component, eager_component);
  290. }
  291. starpu_sched_tree_update_workers(t);
  292. starpu_sched_ctx_set_policy_data(sched_ctx_id, (void*)t);
  293. }
  294. /* Properly destroy the Scheduling Tree and all its Components */
  295. static void deinitialize_eager_prefetching_center_policy(unsigned sched_ctx_id)
  296. {
  297. struct starpu_sched_tree * tree = (struct starpu_sched_tree*)starpu_sched_ctx_get_policy_data(sched_ctx_id);
  298. starpu_sched_tree_destroy(tree);
  299. starpu_sched_ctx_delete_worker_collection(sched_ctx_id);
  300. }
  301. /* Initializing the starpu_sched_policy struct associated to the Modularized
  302. * Scheduler : only the init_sched and deinit_sched needs to be defined to
  303. * implement a Modularized Scheduler */
  304. struct starpu_sched_policy _starpu_sched_tree_eager_prefetching_policy =
  305. {
  306. .init_sched = initialize_eager_prefetching_center_policy,
  307. .deinit_sched = deinitialize_eager_prefetching_center_policy,
  308. .add_workers = starpu_sched_tree_add_workers,
  309. .remove_workers = starpu_sched_tree_remove_workers,
  310. .push_task = starpu_sched_tree_push_task,
  311. .pop_task = starpu_sched_tree_pop_task,
  312. .pre_exec_hook = starpu_sched_component_worker_pre_exec_hook,
  313. .post_exec_hook = starpu_sched_component_worker_post_exec_hook,
  314. .pop_every_task = NULL,
  315. .policy_name = "tree-eager-prefetching",
  316. .policy_description = "eager with prefetching tree policy"
  317. };
  318. \endcode
  319. \subsection WriteASchedulingComponent Writing a Scheduling Component
  320. \subsubsection GenericSchedulingComponent Generic Scheduling Component
  321. Each Scheduling Component is instantiated from a Generic Scheduling Component,
  322. which implements a generic version of the Interface. The generic implementation
  323. of Pull, Can_Pull and Can_Push functions are recursive calls to their parents
  324. (respectively to their children). However, as a Generic Scheduling Component do
  325. not know how much children it will have when it will be instantiated, it does
  326. not implement the Push function.
  327. \subsubsection InstantiationRedefineInterface Instantiation : Redefining the Interface
  328. A Scheduling Component must implement all the functions of the Interface. It is
  329. so necessary to implement a Push function to instantiate a Scheduling Component.
  330. The implemented Push function is the "fingerprint" of a Scheduling Component.
  331. Depending on how functionalities or properties the programmer wants to give
  332. to the Scheduling Component he is implementing, it is possible to reimplement
  333. all the functions of the Interface. For example, a Flow-control Component
  334. reimplements the Pull and the Can_Push functions of the Interface, allowing him
  335. to catch the generic recursive calls of these functions. The Pull function of
  336. a Flow-control Component can, for example, pop a task from the local storage
  337. queue of the Component, and give it to the calling Component which asks for it.
  338. \subsubsection DetailedProgressionAndValidationRules Detailed Progression and Validation Rules
  339. - A Reservoir is a Scheduling Component which redefines a Push and a Pull
  340. function, in order to store tasks into it. A Reservoir delimit Scheduling
  341. Areas in the Scheduling Tree.
  342. - A Pump is the engine source of the Scheduler : it pushes/pulls tasks
  343. to/from a Scheduling Component to an other. Native Pumps of a Scheduling
  344. Tree are located at the root of the Tree (incoming Push calls from StarPU),
  345. and at the leafs of the Tree (Pop calls coming from StarPU Workers).
  346. Pre-implemented Scheduling Components currently shipped with Pumps are
  347. Flow-Control Components and the Resource-Mapping Component Heft, within
  348. their defined Can_Push functions.
  349. - A correct Scheduling Tree requires a Pump per Scheduling Area and per
  350. Execution Flow.
  351. The Tree-Eager-Prefetching Scheduler shown in Section
  352. \ref ExampleTreeEagerPrefetchingStrategy follows the previous assumptions :
  353. <pre>
  354. starpu_push_task
  355. <b>Pump</b>
  356. |
  357. Area 1 |
  358. |
  359. v
  360. -----------------------Fifo_Component-----------------------------
  361. <b>Pump</b>
  362. | ^
  363. Push | | Can_Push
  364. v |
  365. Area 2 Eager_Component
  366. | ^
  367. | |
  368. v |
  369. --------><-------------------><---------
  370. | ^ | ^
  371. Push | | Can_Push Push | | Can_Push
  372. v | v |
  373. -----Fifo_Component-----------------------Fifo_Component----------
  374. | ^ | ^
  375. Pull | | Can_Pull Pull | | Can_Pull
  376. Area 3 v | v |
  377. <b>Pump</b> <b>Pump</b>
  378. Worker_Component Worker_Component
  379. </pre>
  380. \section GraphScheduling Graph-based Scheduling
  381. For performance reasons, most of the schedulers shipped with StarPU use simple
  382. list-scheduling heuristics, assuming that the application has already set
  383. priorities. This is why they do their scheduling between when tasks become
  384. available for execution and when a worker becomes idle, without looking at the
  385. task graph.
  386. Other heuristics can however look at the task graph. Recording the task graph
  387. is expensive, so it is not available by default, the scheduling heuristic has
  388. to set \c _starpu_graph_record to \c 1 from the initialization function, to make it
  389. available. Then the <c>_starpu_graph*</c> functions can be used.
  390. <c>src/sched_policies/graph_test_policy.c</c> is an example of simple greedy
  391. policy which automatically computes priorities by bottom-up rank.
  392. The idea is that while the application submits tasks, they are only pushed
  393. to a bag of tasks. When the application is finished with submitting tasks,
  394. it calls starpu_do_schedule() (or starpu_task_wait_for_all(), which calls
  395. starpu_do_schedule()), and the starpu_sched_policy::do_schedule method of the
  396. scheduler is called. This method calls _starpu_graph_compute_depths to compute
  397. the bottom-up ranks, and then uses these ranks to set priorities over tasks.
  398. It then has two priority queues, one for CPUs, and one for GPUs, and uses a dumb
  399. heuristic based on the duration of the task over CPUs and GPUs to decide between
  400. the two queues. CPU workers can then pop from the CPU priority queue, and GPU
  401. workers from the GPU priority queue.
  402. \section DebuggingScheduling Debugging Scheduling
  403. All the \ref OnlinePerformanceTools and \ref OfflinePerformanceTools can
  404. be used to get information about how well the execution proceeded, and thus the
  405. overall quality of the execution.
  406. Precise debugging can also be performed by using the
  407. \ref STARPU_TASK_BREAK_ON_PUSH, \ref STARPU_TASK_BREAK_ON_SCHED,
  408. \ref STARPU_TASK_BREAK_ON_POP, and \ref STARPU_TASK_BREAK_ON_EXEC environment variables.
  409. By setting the job_id of a task
  410. in these environment variables, StarPU will raise <c>SIGTRAP</c> when the task is being
  411. scheduled, pushed, or popped by the scheduler. This means that when one notices
  412. that a task is being scheduled in a seemingly odd way, one can just reexecute
  413. the application in a debugger, with some of those variables set, and the
  414. execution will stop exactly at the scheduling points of this task, thus allowing
  415. to inspect the scheduler state, etc.
  416. */