/* StarPU --- Runtime system for heterogeneous multicore architectures.
 *
 * Copyright (C) 2018 Inria
 *
 * StarPU is free software; you can redistribute it and/or modify
 * it under the terms of the GNU Lesser General Public License as published by
 * the Free Software Foundation; either version 2.1 of the License, or (at
 * your option) any later version.
 *
 * StarPU is distributed in the hope that it will be useful, but
 * WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 *
 * See the GNU Lesser General Public License in COPYING.LGPL for more details.
 */
/*! \page StarPUCore StarPU Core

\section CoreEntities StarPU Core Entities

TODO

\subsection CoreEntitiesOverview Overview

Execution entities:
- worker: A worker entity (see \ref CoreEntitiesWorkers, \ref
CoreEntitiesWorkersAndContexts) is a thread created by StarPU to manage
one computing unit. The computing unit can be a local CPU core, an accelerator
or GPU device, or, on the master side when running in master-slave
distributed mode, a remote slave computing node. A worker is responsible for
querying the scheduling policies for tasks to execute.
- sched_context: A scheduling context (see \ref CoreEntitiesContexts, \ref
CoreEntitiesWorkersAndContexts) is a logical set of workers governed by an
instance of a scheduling policy. It defines the computing units to which the
scheduling policy instance may assign work entities.
- driver: A driver is the set of hardware-dependent routines used by a
worker to initialize its associated computing unit, execute work entities on
it, and finalize the computing unit usage at the end of the session.

Work entities:
- task: TODO
- job: TODO

Data entities:
- data handle
- data replicate: TODO
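
To make the split between workers and drivers concrete, the minimal sketch below models a driver as a table of function pointers that a hardware-independent worker routine calls into, matching the three roles listed above (initialize, execute, finalize). All names (struct sketch_driver_ops, struct sketch_worker, sketch_worker_run(), the cpu_* routines) are hypothetical and do not reproduce StarPU's internal definitions.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical names throughout; this is not StarPU's internal driver API.
struct sketch_worker;

// A driver: the hardware-dependent routines used by a worker.
struct sketch_driver_ops
{
	int (*init)(struct sketch_worker *worker);    // prepare the computing unit
	int (*execute)(struct sketch_worker *worker); // run one work entity on it
	int (*deinit)(struct sketch_worker *worker);  // release it at session end
};

// A worker manages one computing unit through its driver.
struct sketch_worker
{
	const struct sketch_driver_ops *driver;
	int executed; // work entities executed so far
};

// A trivial "CPU" driver used for illustration only.
static int cpu_init(struct sketch_worker *worker)
{
	worker->executed = 0;
	return 0;
}

static int cpu_execute(struct sketch_worker *worker)
{
	worker->executed++;
	return 0;
}

static int cpu_deinit(struct sketch_worker *worker)
{
	(void)worker;
	return 0;
}

static const struct sketch_driver_ops cpu_driver_ops =
{
	cpu_init, cpu_execute, cpu_deinit
};

// Hardware-independent worker life cycle: initialize, execute, finalize,
// always going through the driver table.
static int sketch_worker_run(struct sketch_worker *worker, int nentities)
{
	int i;

	assert(worker->driver != NULL);
	if (worker->driver->init(worker) != 0)
		return -1;
	for (i = 0; i < nentities; i++)
		if (worker->driver->execute(worker) != 0)
			break;
	return worker->driver->deinit(worker);
}
```

For example, a worker whose driver field points to cpu_driver_ops and that is asked to run three work entities ends with executed equal to 3; supporting another kind of computing unit only means providing another routine table.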

\subsection CoreEntitiesWorkers Workers

TODO

\subsubsection CoreEntitiesWorkersStates States

Scheduling operations related state

While a worker is conducting a scheduling operation, e.g. when it is in the
process of selecting a new task to execute, its state_sched_op_pending flag is
set to !0; otherwise it is set to 0.

While state_sched_op_pending is !0, operations on that worker are restricted
as stated in the following exhaustive list:
- adding the worker to a context is not allowed;
- removing the worker from a context is not allowed;
- adding the worker to a parallel task team is not allowed;
- removing the worker from a parallel task team is not allowed;
- querying state information about the worker is only allowed while
state_relax_refcnt > 0;
- in particular, querying whether the worker is blocked on a parallel team
entry is only allowed while state_relax_refcnt > 0.
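
The exhaustive list above can be read as a predicate over the worker's state. The sketch below encodes it in C; the enum, the struct and sketch_op_allowed() are hypothetical, only the two field names come from the text, and it assumes the caller holds the worker's sched_mutex while reading them.

```c
#include <assert.h>
#include <stdbool.h>

// Hypothetical encoding; only the two field names come from the text.
enum sketch_worker_op
{
	SKETCH_OP_ADD_TO_CTX,
	SKETCH_OP_REMOVE_FROM_CTX,
	SKETCH_OP_ADD_TO_TEAM,
	SKETCH_OP_REMOVE_FROM_TEAM,
	SKETCH_OP_QUERY_STATE // includes the "blocked in parallel" query
};

struct sketch_worker_state
{
	int state_sched_op_pending;
	int state_relax_refcnt;
};

// Returns false if the list above forbids op, true otherwise. Other rules,
// such as the relaxed locking scheme, are not modelled here. Assumes the
// caller holds the worker's sched_mutex.
static bool sketch_op_allowed(const struct sketch_worker_state *w,
			      enum sketch_worker_op op)
{
	// The listed restrictions only apply while a scheduling operation is
	// pending.
	if (!w->state_sched_op_pending)
		return true;

	switch (op)
	{
	case SKETCH_OP_QUERY_STATE:
		return w->state_relax_refcnt > 0;
	case SKETCH_OP_ADD_TO_CTX:
	case SKETCH_OP_REMOVE_FROM_CTX:
	case SKETCH_OP_ADD_TO_TEAM:
	case SKETCH_OP_REMOVE_FROM_TEAM:
	default:
		return false; // membership changes must wait for the operation end
	}
}
```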

Entering and leaving the state_sched_op_pending state is done through calls to
_starpu_worker_enter_sched_op() and _starpu_worker_leave_sched_op()
respectively (see how these functions are used in _starpu_get_worker_task()
and _starpu_get_multi_worker_task()). These calls ensure that any conflicting
operation deferred while the worker was in the state_sched_op_pending state is
then performed in an orderly manner.
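
A minimal sketch of such an enter/leave pair, assuming one pthread mutex and condition variable per worker: entering marks the operation as pending, leaving clears the mark and wakes up any thread whose conflicting operation was deferred. The sketch_* functions only illustrate the idea; they are not the bodies of _starpu_worker_enter_sched_op() or _starpu_worker_leave_sched_op().

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

// Hypothetical sketch; only state_sched_op_pending, sched_mutex and
// sched_cond are names taken from the text.
struct sketch_worker
{
	pthread_mutex_t sched_mutex;
	pthread_cond_t sched_cond;
	int state_sched_op_pending;
};

static void sketch_worker_init(struct sketch_worker *w)
{
	pthread_mutex_init(&w->sched_mutex, NULL);
	pthread_cond_init(&w->sched_cond, NULL);
	w->state_sched_op_pending = 0;
}

// Worker: start a scheduling operation (e.g. picking the next task).
static void sketch_enter_sched_op(struct sketch_worker *w)
{
	pthread_mutex_lock(&w->sched_mutex);
	assert(!w->state_sched_op_pending);
	w->state_sched_op_pending = 1;
	pthread_mutex_unlock(&w->sched_mutex);
}

// Worker: end the scheduling operation and wake up every thread whose
// conflicting operation was deferred meanwhile.
static void sketch_leave_sched_op(struct sketch_worker *w)
{
	pthread_mutex_lock(&w->sched_mutex);
	w->state_sched_op_pending = 0;
	pthread_cond_broadcast(&w->sched_cond);
	pthread_mutex_unlock(&w->sched_mutex);
}

// Other thread: wait for a safe window. Returns with sched_mutex held and no
// scheduling operation pending; the caller performs its operation (e.g.
// adding the worker to a context), then unlocks sched_mutex.
static void sketch_lock_outside_sched_op(struct sketch_worker *w)
{
	pthread_mutex_lock(&w->sched_mutex);
	while (w->state_sched_op_pending)
		pthread_cond_wait(&w->sched_cond, &w->sched_mutex);
}
```

This basic version can starve the waiting thread if the worker keeps starting new scheduling operations; the state_changing_ctx_notice flag described below is meant to address that.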

Scheduling contexts related states

Flag state_changing_ctx_notice is set to !0 when a thread is about to add the
worker to, or remove it from, a scheduling context, and is waiting for a safe
window to do so, that is, until the targeted worker is no longer in a
scheduling operation or parallel task operation. While set to !0, this flag
also prevents the targeted worker from attempting a fresh scheduling operation
or parallel task operation, to avoid starvation. However, a scheduling
operation that was already in progress before the notice is allowed to
complete.

Flag state_changing_ctx_waiting is set to !0 when a scheduling context worker
addition or removal involving the targeted worker is about to occur while that
worker is performing a scheduling operation. It tells the targeted worker that
the initiator thread is waiting for the scheduling operation to complete and
should be woken up upon completion.
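
The two flags combine into a handshake between the initiator thread and the targeted worker. The following sketch assumes a single initiator at a time, ignores parallel task operations, and reduces the context change itself to storing a hypothetical ctx field; the three flag names come from the text, while the struct layout and the sketch_* functions are illustrative, not StarPU's code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

// Hypothetical sketch; assumes one initiator at a time.
struct sketch_worker
{
	pthread_mutex_t sched_mutex;
	pthread_cond_t sched_cond;
	int state_sched_op_pending;
	int state_changing_ctx_notice;
	int state_changing_ctx_waiting;
	int ctx; // hypothetical stand-in for the worker's context membership
};

static void sketch_worker_init(struct sketch_worker *w)
{
	pthread_mutex_init(&w->sched_mutex, NULL);
	pthread_cond_init(&w->sched_cond, NULL);
	w->state_sched_op_pending = 0;
	w->state_changing_ctx_notice = 0;
	w->state_changing_ctx_waiting = 0;
	w->ctx = 0;
}

// Worker: do not start a fresh scheduling operation while a context change
// has been announced.
static void sketch_enter_sched_op(struct sketch_worker *w)
{
	pthread_mutex_lock(&w->sched_mutex);
	while (w->state_changing_ctx_notice)
		pthread_cond_wait(&w->sched_cond, &w->sched_mutex);
	w->state_sched_op_pending = 1;
	pthread_mutex_unlock(&w->sched_mutex);
}

// Worker: end the operation; wake the initiator only if it said it waits.
static void sketch_leave_sched_op(struct sketch_worker *w)
{
	pthread_mutex_lock(&w->sched_mutex);
	w->state_sched_op_pending = 0;
	if (w->state_changing_ctx_waiting)
	{
		w->state_changing_ctx_waiting = 0;
		pthread_cond_broadcast(&w->sched_cond);
	}
	pthread_mutex_unlock(&w->sched_mutex);
}

// Initiator: announce the change, let an in-progress scheduling operation
// complete, apply the change, then let the worker resume.
static void sketch_change_ctx(struct sketch_worker *w, int new_ctx)
{
	pthread_mutex_lock(&w->sched_mutex);
	w->state_changing_ctx_notice = 1;
	while (w->state_sched_op_pending)
	{
		w->state_changing_ctx_waiting = 1;
		pthread_cond_wait(&w->sched_cond, &w->sched_mutex);
	}
	w->ctx = new_ctx;
	w->state_changing_ctx_notice = 0;
	pthread_cond_broadcast(&w->sched_cond);
	pthread_mutex_unlock(&w->sched_mutex);
}
```

Setting state_changing_ctx_notice before waiting is what avoids starvation in this sketch: the worker may finish its current scheduling operation, but sketch_enter_sched_op() will not start a new one until the change has been applied.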

Relaxed synchronization related states

Any StarPU worker may participate in scheduling operations, and in doing so
may have to observe state information from other workers. A StarPU worker
thread may therefore be observed by any thread, including other StarPU
workers. Since workers may observe each other in any order, it is not possible
to rely exclusively on the sched_mutex of each worker to protect the
observation of worker state flags by other workers: worker A observing worker
B would involve locking workers in (A B) sequence, while worker B observing
worker A would involve locking workers in (B A) sequence, leading to lock
inversion deadlocks.

Consequently, no thread may hold more than one worker's sched_mutex at any
time. Instead, workers implement a relaxed locking scheme based on the
state_relax_refcnt counter, itself protected by the worker's sched_mutex. When
state_relax_refcnt > 0, the targeted worker's state flags may be observed;
otherwise, the thread attempting the observation must repeatedly wait on the
targeted worker's sched_cond condition until state_relax_refcnt > 0.

While on, the relaxed mode can be seen as a transactional consistency model,
in which concurrent accesses are authorized and potential conflicts are
resolved after the fact. When the relaxed mode is off, the consistency model
becomes a mutual exclusion model, in which the worker's sched_mutex must be
held to access or change the worker state.
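
A minimal sketch of the relaxed scheme, assuming pthread primitives: the observed worker opens and closes a relaxed window by incrementing and decrementing state_relax_refcnt under its own sched_mutex, and an observer takes only the target's sched_mutex, waiting on the target's sched_cond until the window is open. The field names come from the text; the struct layout and the sketch_* functions are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

// Hypothetical sketch of the relaxed locking scheme.
struct sketch_worker
{
	pthread_mutex_t sched_mutex;
	pthread_cond_t sched_cond;
	int state_relax_refcnt;
	int state_blocked_in_parallel; // one example of an observable flag
};

static void sketch_worker_init(struct sketch_worker *w)
{
	pthread_mutex_init(&w->sched_mutex, NULL);
	pthread_cond_init(&w->sched_cond, NULL);
	w->state_relax_refcnt = 0;
	w->state_blocked_in_parallel = 0;
}

// Observed worker: enter relaxed mode, allowing observation of its flags.
static void sketch_relax_on(struct sketch_worker *w)
{
	pthread_mutex_lock(&w->sched_mutex);
	w->state_relax_refcnt++;
	pthread_cond_broadcast(&w->sched_cond); // wake up waiting observers
	pthread_mutex_unlock(&w->sched_mutex);
}

// Observed worker: leave relaxed mode, back to mutual exclusion.
static void sketch_relax_off(struct sketch_worker *w)
{
	pthread_mutex_lock(&w->sched_mutex);
	assert(w->state_relax_refcnt > 0);
	w->state_relax_refcnt--;
	pthread_mutex_unlock(&w->sched_mutex);
}

// Observer: takes only the target's sched_mutex, never nested inside another
// worker's, and waits on the target's sched_cond until relaxed mode is on.
static int sketch_observe_blocked_in_parallel(struct sketch_worker *target)
{
	int value;

	pthread_mutex_lock(&target->sched_mutex);
	while (target->state_relax_refcnt == 0)
		pthread_cond_wait(&target->sched_cond, &target->sched_mutex);
	value = target->state_blocked_in_parallel;
	pthread_mutex_unlock(&target->sched_mutex);
	return value; // may be stale once the mutex is released
}
```

Because the observer never holds two sched_mutex locks at once, the (A B) versus (B A) ordering problem cannot arise; the price is that the returned value may already be stale, which the tainting mechanism described for parallel tasks below compensates for.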

Parallel tasks related states

When a worker is scheduled to participate in the execution of a parallel task,
it must wait for the whole team of workers participating in the execution of
this task to be ready. While the worker waits for its teammates, it is not
available to run other tasks or perform other operations. Such a waiting
operation therefore cannot start while conflicting operations involving the
worker, such as scheduling operations and scheduling context resizing, are
ongoing. Conversely, these operations and others may query whether the worker
is blocked on a parallel task entry with starpu_worker_is_blocked_in_parallel().

The starpu_worker_is_blocked_in_parallel() function may proceed while, and
only while, state_relax_refcnt > 0. Due to the relaxed worker locking scheme,
the state_blocked_in_parallel flag of the targeted worker may change after it
has been observed by an observer thread. Consequently, the observer sets the
state_blocked_in_parallel_observed flag of the targeted worker to 1
immediately after the observation, to "taint" the targeted worker. The
targeted worker then clears this taint and defers the processing of parallel
task related requests until a full scheduling operation shot completes without
the state_blocked_in_parallel_observed flag being tainted again. The purpose
of this tainting flag is to prevent parallel task operations from being
started immediately after the observation of a transient scheduling state.
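
The tainting rule can be sketched as follows, assuming the observer has already waited for state_relax_refcnt > 0 (that loop is omitted here) and reducing "processing a request" to setting state_blocked_in_parallel, without the acknowledgement protocol. The three state_* names come from the text; the sketch_* functions are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

// Hypothetical sketch of the tainting rule.
struct sketch_worker
{
	pthread_mutex_t sched_mutex;
	int state_blocked_in_parallel;
	int state_blocked_in_parallel_observed;
	int state_block_in_parallel_req;
};

static void sketch_worker_init(struct sketch_worker *w)
{
	pthread_mutex_init(&w->sched_mutex, NULL);
	w->state_blocked_in_parallel = 0;
	w->state_blocked_in_parallel_observed = 0;
	w->state_block_in_parallel_req = 0;
}

// Observer, assumed to run while the target's state_relax_refcnt > 0:
// read the flag, then taint the worker immediately.
static int sketch_is_blocked_in_parallel(struct sketch_worker *w)
{
	int ret;

	pthread_mutex_lock(&w->sched_mutex);
	ret = w->state_blocked_in_parallel;
	w->state_blocked_in_parallel_observed = 1;
	pthread_mutex_unlock(&w->sched_mutex);
	return ret;
}

// Worker, at the end of each scheduling operation shot. Returns 1 if a
// pending team-entry request was processed, 0 otherwise.
static int sketch_end_of_sched_op_shot(struct sketch_worker *w)
{
	int processed = 0;

	pthread_mutex_lock(&w->sched_mutex);
	if (w->state_blocked_in_parallel_observed)
	{
		// Tainted since the last check: clear the taint, defer requests.
		w->state_blocked_in_parallel_observed = 0;
	}
	else if (w->state_block_in_parallel_req)
	{
		// A whole shot completed untainted: process the request.
		w->state_block_in_parallel_req = 0;
		w->state_blocked_in_parallel = 1;
		processed = 1;
	}
	pthread_mutex_unlock(&w->sched_mutex);
	return processed;
}
```

With one observation made before the first shot, the pending request is deferred once and processed at the end of the next, untainted, shot.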

A worker's management of parallel tasks is governed by the following state
flags and counters:
- state_blocked_in_parallel: set to !0 while the worker is blocked on a
parallel task.
- state_blocked_in_parallel_observed: set to !0 to taint the worker when a
thread has observed the state_blocked_in_parallel flag of this worker while
its state_relax_refcnt counter was > 0. Any pending request to add the worker
to, or remove it from, a parallel task team is deferred until a whole
scheduling operation shot completes without the worker being tainted again.
- state_block_in_parallel_req: set to !0 when a thread is waiting on a request
for the worker to be added to a parallel task team. Must be protected by the
worker's sched_mutex.
- state_block_in_parallel_ack: set to !0 by the worker when acknowledging a
request to be added to a parallel task team. Must be protected by the
worker's sched_mutex.
- state_unblock_in_parallel_req: set to !0 when a thread is waiting on a
request for the worker to be removed from a parallel task team. Must be
protected by the worker's sched_mutex.
- state_unblock_in_parallel_ack: set to !0 by the worker when acknowledging a
request to be removed from a parallel task team. Must be protected by the
worker's sched_mutex.
- block_in_parallel_ref_count: counts the number of consecutive pending
requests to enter parallel task teams. Only the first of a train of requests
to enter parallel task teams triggers the transition of flag
state_block_in_parallel_req from 0 to 1; only the last of a train of requests
to leave a parallel task team triggers the transition of flag
state_unblock_in_parallel_req from 0 to 1. Must be protected by the worker's
sched_mutex.
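
The request/acknowledge flags and the train rule of block_in_parallel_ref_count can be sketched as follows. The field names come from the list above; the sketch_* functions are hypothetical, and the sketch assumes that only the requester whose call triggered a flag transition waits for the corresponding acknowledgement, and that the worker clears a request flag when acknowledging it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

// Hypothetical sketch; all fields are protected by sched_mutex.
struct sketch_worker
{
	pthread_mutex_t sched_mutex;
	pthread_cond_t sched_cond;
	int state_block_in_parallel_req;
	int state_block_in_parallel_ack;
	int state_unblock_in_parallel_req;
	int state_unblock_in_parallel_ack;
	int block_in_parallel_ref_count;
};

static void sketch_worker_init(struct sketch_worker *w)
{
	pthread_mutex_init(&w->sched_mutex, NULL);
	pthread_cond_init(&w->sched_cond, NULL);
	w->state_block_in_parallel_req = 0;
	w->state_block_in_parallel_ack = 0;
	w->state_unblock_in_parallel_req = 0;
	w->state_unblock_in_parallel_ack = 0;
	w->block_in_parallel_ref_count = 0;
}

// Requester: enter a team. Only the first request of a train raises
// state_block_in_parallel_req; returns 1 in that case.
static int sketch_request_block(struct sketch_worker *w)
{
	int first;

	pthread_mutex_lock(&w->sched_mutex);
	first = (w->block_in_parallel_ref_count++ == 0);
	if (first)
	{
		w->state_block_in_parallel_req = 1;
		pthread_cond_broadcast(&w->sched_cond);
	}
	pthread_mutex_unlock(&w->sched_mutex);
	return first;
}

// Requester: leave the team. Only the last request of the train raises
// state_unblock_in_parallel_req; returns 1 in that case.
static int sketch_request_unblock(struct sketch_worker *w)
{
	int last;

	pthread_mutex_lock(&w->sched_mutex);
	assert(w->block_in_parallel_ref_count > 0);
	last = (--w->block_in_parallel_ref_count == 0);
	if (last)
	{
		w->state_unblock_in_parallel_req = 1;
		pthread_cond_broadcast(&w->sched_cond);
	}
	pthread_mutex_unlock(&w->sched_mutex);
	return last;
}

// Worker: acknowledge pending requests, e.g. between scheduling operations.
static void sketch_worker_handle_requests(struct sketch_worker *w)
{
	pthread_mutex_lock(&w->sched_mutex);
	if (w->state_block_in_parallel_req)
	{
		w->state_block_in_parallel_req = 0;
		w->state_block_in_parallel_ack = 1;
	}
	if (w->state_unblock_in_parallel_req)
	{
		w->state_unblock_in_parallel_req = 0;
		w->state_unblock_in_parallel_ack = 1;
	}
	pthread_cond_broadcast(&w->sched_cond);
	pthread_mutex_unlock(&w->sched_mutex);
}

// Requester: wait until ack (one of the two ack fields of w) is set, then
// consume it.
static void sketch_wait_ack(struct sketch_worker *w, int *ack)
{
	pthread_mutex_lock(&w->sched_mutex);
	while (!*ack)
		pthread_cond_wait(&w->sched_cond, &w->sched_mutex);
	*ack = 0;
	pthread_mutex_unlock(&w->sched_mutex);
}
```

For instance, two consecutive enter requests raise state_block_in_parallel_req only once, and the flag for leaving is raised only when the second of two leave requests brings block_in_parallel_ref_count back to 0.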

\subsection CoreEntitiesContexts Scheduling Contexts

TODO

\subsection CoreEntitiesWorkersAndContexts Workers and Scheduling Contexts

TODO

*/