| 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395 | /* StarPU --- Runtime system for heterogeneous multicore architectures. * * Copyright (C) 2018                                     Inria * * StarPU is free software; you can redistribute it and/or modify * it under the terms of the GNU Lesser General Public License as published by * the Free Software Foundation; either version 2.1 of the License, or (at * your option) any later version. * * StarPU is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. * * See the GNU Lesser General Public License in COPYING.LGPL for more details. *//*! \page StarPUCore StarPU Core\section CoreEntities StarPU Core EntitiesTODO\subsection CoreEntitiesOverview OverviewExecution entities:- <b>worker</b>: A worker (see \ref CoreEntitiesWorkers, \ref  CoreEntitiesWorkersAndContexts) entity is a CPU thread created by StarPU to manage  one computing unit. The computing unit can be a local CPU core, an accelerator  or GPU device, or --- on the master side when running in master-slave  distributed mode --- a remote slave computing node. It is responsible for  querying scheduling policies for tasks to execute.- <b>sched_context</b>: A scheduling context (see \ref CoreEntitiesContexts, \ref  CoreEntitiesWorkersAndContexts) is a logical set of workers governed by an  instance of a scheduling policy. It defines the computing units to which the  scheduling policy instance may assign work entities.- <b>driver</b>: A driver is the set of hardware-dependent routines used by a  worker to initialize its associated computing unit, execute work entities on  it, and finalize the computing unit usage at the end of the session.Work entities:- <b>task</b>: A task is a high level work request submitted to StarPU by the  application, or internally by StarPU itself.- <b>job</b>: A job is a low level view of a work request. It is not exposed to  the application. A job structure may be shared among several task structures  in the case of a parallel task.Data entities:- <b>data handle</b>: A data handle is a high-level, application opaque object designating a  piece of data currently registered to the StarPU data management layer.  Internally, it is a \ref _starpu_data_state structure.- <b>data replicate</b>: A data replicate is a low-level object designating one  copy of a piece of data registered to StarPU as a data handle, residing in one  memory node managed by StarPU. It is not exposed to the application.\subsection CoreEntitiesWorkers WorkersA <b>worker</b> is a CPU thread created by StarPU. Its role is to manage one computingunit. This computing unit can be a local CPU core, in which case, the workerthread manages the actual CPU core to which it is assigned; or it can be acomputing device such as a GPU or an accelerator (or even a remote computingnode when StarPU is running in distributed master-slave mode.) When a workermanages a computing device, the CPU core to which the worker's thread isby default exclusively assigned to the device management work and does notparticipate to computation.\subsubsection CoreEntitiesWorkersStates States<b>Scheduling operations related state</b>While a worker is conducting a scheduling operations, e.g. the worker is in theprocess of selecting a new task to execute, flag state_sched_op_pending is setto \c !0, otherwise it is set to \c 0.While state_sched_op_pending is !0, the following exhaustive list of operations on thatworkers are restricted in the stated way:- adding the worker to a context is not allowed;- removing the worker from a context is not allowed;- adding the worker to a parallel task team is not allowed;- removing the worker from a parallel task team is not allowed;- querying state information about the worker is only allowed while  <code>state_relax_refcnt > 0</code>;  - in particular, querying whether the worker is blocked on a parallel team entry is only  allowed while <code>state_relax_refcnt > 0</code>.Entering and leaving the state_sched_op_pending state is done through calls to\ref _starpu_worker_enter_sched_op() and \ref _starpu_worker_leave_sched_op()respectively (see these functions in use in functions \ref _starpu_get_worker_task() and\ref _starpu_get_multi_worker_task()). These calls ensure that any pendingconflicting operation deferred while the worker was in thestate_sched_op_pending state is performed in an orderly manner.<br><b>Scheduling contexts related states</b>Flag \c state_changing_ctx_notice is set to \c !0 when a thread is about toadd the worker to a scheduling context or remove it from a scheduling context, and iscurrently waiting for a safe window to do so, until the targeted worker is not in ascheduling operation or parallel task operation anymore. This flag set to \c !0 will alsoprevent the targeted worker to attempt a fresh scheduling operation or paralleltask operation to avoid starving conditions. However, a scheduling operationthat was already in progress before the notice is allowed to complete.Flag \c state_changing_ctx_waiting is set to \c !0 when a scheduling context workeraddition or removal involving the targeted worker is about to occur and theworker is currently performing a scheduling operation to tell the targetedworker that the initiator thread is waiting for the scheduling operation tocomplete and should be woken up upon completion.<br><b>Relaxed synchronization related states</b>Any StarPU worker may participate to scheduling operations, and in this process,may be forced to observe state information from other workers. A StarPU worker thread may therefore be observed by any thread, evenother StarPU workers. Since workers may observe each other in any order, it isnot possible to rely exclusively on the \c sched_mutex of each worker to protect theobservation of worker state flags by other workers, becauseworker A observing worker B would involve locking workers in (A B) sequence,while worker B observing worker A would involve locking workers in (B A)sequence, leading to lock inversion deadlocks.In consequence, no thread must hold more than one worker's sched_mutex at any time.Instead, workers implement a relaxed locking scheme based on the \c state_relax_refcntcounter, itself protected by the worker's sched_mutex. When <code>state_relax_refcnt> 0</code>, the targeted worker state flags may be observed, otherwise the thread attemptingthe observation must repeatedly wait on the targeted worker's \c sched_condcondition until <code>state_relax_refcnt > 0</code>.The relaxed mode, while on, can actually be seen as a transactional consistencymodel, where concurrent accesses are authorized and potential conflicts areresolved after the fact. When the relaxed mode is off, the consistency modelbecomes a mutual exclusion model, where the sched_mutex of the worker must beheld in order to access or change the worker state.<br><b>Parallel tasks related states</b>When a worker is scheduled to participate to the execution of a parallel task,it must wait for the whole team of workers participating to the execution ofthis task to be ready. While the worker waits for its teammates, it is notavailable to run other tasks or perform other operations. Such a waitingoperation can therefore not start while conflicting operations such asscheduling operations and scheduling context resizing involving the worker areon-going. Conversely these operations and other may query weather the worker isblocked on a parallel task entry with \ref starpu_worker_is_blocked_in_parallel().The \ref starpu_worker_is_blocked_in_parallel() function is allowed to proceed whileand only while <code>state_relax_refcnt > 0</code>. Due to the relaxed worker locking scheme,the \c state_blocked_in_parallel flag of the targeted worker may change after ithas been observed by an observer thread. In consequence, flag\c state_blocked_in_parallel_observed of the targeted worker is set to \c 1 by theobserver immediately after the observation to "taint" the targeted worker. Thetargeted worker will clear the \c state_blocked_in_parallel_observed flag taintingand defer the processing of parallel task related requests until a fullscheduling operation shot completes without the\c state_blocked_in_parallel_observed flag being tainted again. The purpose of thistainting flag is to prevent parallel task operations to be started immediatelyafter the observation of a transient scheduling state.Worker's management of parallel tasks isgoverned by the following set of state flags and counters:- \c state_blocked_in_parallel: set to \c !0 while the worker is currently blocked on a parallel  task;- \c state_blocked_in_parallel_observed: set to \c !0 to taint the worker when a  thread has observed the state_blocked_in_parallel flag of this worker while  its \c state_relax_refcnt state counter was \c >0. Any pending request to add or  remove the worker from a parallel task team will be deferred until a whole  scheduling operation shot completes without being tainted again.- \c state_block_in_parallel_req: set to \c !0 when a thread is waiting on a request  for the worker to be added to a parallel task team. Must be protected by the  worker's \c sched_mutex.- \c state_block_in_parallel_ack: set to \c !0 by the worker when acknowledging a  request for being added to a parallel task team. Must be protected by the  worker's \c sched_mutex.- \c state_unblock_in_parallel_req: set to \c !0 when a thread is waiting on a request  for the worker to be removed from a parallel task team. Must be protected by the  worker's \c sched_mutex.- \c state_unblock_in_parallel_ack: set to \c !0 by the worker when acknowledging a  request for being removed from a parallel task team. Must be protected by the  worker's \c sched_mutex.- \c block_in_parallel_ref_count: counts the number of consecutive pending requests  to enter parallel task teams. Only the first of a train of requests for  entering parallel task teams triggers the transition of the  \c state_block_in_parallel_req flag from \c 0 to \c 1. Only the last of a train of  requests to leave a parallel task team triggers the transition of flag  \c state_unblock_in_parallel_req from \c 0 to \c 1. Must be protected by the  worker's \c sched_mutex.\subsubsection CoreEntitiesWorkersOperations Operations<b>Entry point</b>All the operations of a worker are handled in an iterative fashion, either bythe application code on a thread launched by the application, or automaticallyby StarPU on a device-dependent CPU thread launched by StarPU. Whether aworker's operation cycle is managed automatically ornot is controlled per session by the field \c not_launched_drivers of the \cstarpu_conf struct, and is decided in \ref _starpu_launch_drivers() function.When managed automatically, cycles of operations for a worker are handled by the correspondingdriver specific <code>_starpu_<DRV>_worker()</code> function, where \c DRV is a driver name such ascpu (\c _starpu_cpu_worker) or cuda (\c _starpu_cuda_worker), for instance. Otherwise, the application must supply a thread which will repeatedly call \refstarpu_driver_run_once() for the corresponding worker.In both cases, control is then transferred to \ref _starpu_cpu_driver_run_once() (or the corresponding driver specific func).The cycle of operations typically includes, at least, the following operations:- <b>task scheduling</b>- <b>parallel task team build-up</b>- <b>task input processing</b>- <b>data transfer processing</b>- <b>task execution</b>When the worker cycles are handled by StarPU automatically, the iterativeoperation processing ends when the \c running field of \c _starpu_configbecomes false. This field should not be read directly, instead it should be readthrough the \ref _starpu_machine_is_running() function.<br><b>Task scheduling</b>If the worker does not yet have a queued task, it calls_starpu_get_worker_task() to try and obtain a task. This may involve schedulingoperations such as stealing a queued but not yet executed task from anotherworker. The operation may not necessarily succeed if no tasks are ready and/orsuitable to run on the worker's computing unit.<br><b>Parallel task team build-up</b>If the worker has a task ready to run and the corresponding job has a size\c >1, then the task is a parallel job and the worker must synchronize with theother workers participating to the parallel execution of the job to assign aunique rank for each worker. The synchronization is done through the job's \csync_mutex mutex.<br><b>Task input processing</b>Before the task can be executed, its input data must be made available on amemory node reachable by the worker's computing unit. To do so, the worker calls\ref _starpu_fetch_task_input()<br><b>Data transfer processing</b>The worker makes pending data transfers (involving memory node(s) that it isdriving) progress, with a call to \ref __starpu_datawizard_progress(),<br><b>Task execution</b>Once the worker has a pending task assigned and the input data for that task areavailable in the memory node reachable by the worker's computing unit, theworker calls \ref _starpu_cpu_driver_execute_task() (or the corresponding driverspecific function) to proceed to the execution of the task.\subsection CoreEntitiesContexts Scheduling ContextsA scheduling context is a logical set of workers governed by an instance of ascheduling policy. Tasks submitted to a given scheduling context are confined tothe computing units governed by the workers belonging to this scheduling contextat the time they get scheduled.A scheduling context is identified by an unsigned integer identifier between \c0 and <code>STARPU_NMAX_SCHED_CTXS - 1</code>. The \c STARPU_NMAX_SCHED_CTXSidentifier value is reserved to indicated an unallocated, invalid or deletedscheduling context.Accesses to the scheduling context structure are governed by amultiple-readers/single-writer lock (\c rwlock field). Changes to the structurecontents, additions or removals of workers, statistics updates, all must be donewith proper exclusive write access. \subsection CoreEntitiesWorkersAndContexts Workers and Scheduling ContextsA worker can be assigned to one or more <b>scheduling contexts</b>. Itexclusively receives tasks submitted to the scheduling context(s) it iscurrently assigned at the time such tasks are scheduled. A worker may add itselfto or remove itself from a scheduling context.<br><b>Locking and synchronization rules between workers and scheduling contexts</b>A thread currently holding a worker sched_mutex must not attempt to acquire ascheduling context rwlock, neither for writing nor for reading. Such an attemptconstitutes a lock inversion and may result in a deadlock. A worker currently in a scheduling operation must enter the relaxed state beforeattempting to acquire a scheduling context rwlock, either for reading or forwriting.When the set of workers assigned to a scheduling context is about to bemodified, all the workers in the union between the workers belonging to thescheduling context before the change and the workers expected to belong to thescheduling context after the change must be notified using the \refnotify_workers_about_changing_ctx_pending() function prior to the update. Afterthe update, all the workers in that same union must be notified for the updatecompletion with a call to \ref notify_workers_about_changing_ctx_done().The function \ref notify_workers_about_changing_ctx_pending() places everyworker passed in argument in a state compatible with changing the schedulingcontext assignment of that worker, possibly blocking until that worker leavesincompatible states such as a pending scheduling operation. If the caller of\c notify_workers_about_changing_ctx_pending() is itself a worker included in the setof workers passed in argument, it does not notify itself, with the assumptionthat the worker is already calling \c notify_workers_about_changing_ctx_pending()from a state compatible with a scheduling context assignment update.Once a worker has been notified about a scheduling context change pending, itcannot proceed with incompatible operations such as a scheduling operation untilit receives a notification that the context update operation is complete.\subsection CoreEntitiesDrivers DriversEach driver defines a set of routines depending on some specific hardware. Theseroutines include hardware discovery/initialization, task execution, devicememory management and data transfers.While most hardware dependent routines are in source files located in the \c/src/drivers subdirectory of the StarPU tree, some can be found elsewhere in thetree such as \c src/datawizard/malloc.c for memory allocation routines or thesubdirectories of \c src/datawizard/interfaces/ for data transfer routines.The driver ABI defined in the \ref _starpu_driver_ops structure includes thefollowing operations:- \c .init: initialize a driver instance for the calling worker   managing a hardware computing unit compatible with  this driver.- \c .run_once: perform a single driver progress cycle for the calling worker  (see \ref CoreEntitiesWorkersOperations).- \c .deinit: deinitialize the driver instance for the calling worker- \c .run: executes the following sequence automatically: call \c .init,  repeatedly call \c .run_once until the function \ref  _starpu_machine_is_running() returns false, call \c .deinit.The source code common to all drivers is shared in<code>src/drivers/driver_common/driver_common.[ch]</code>. This file includesservices such as grabbing a new task to execute on a worker, managing statisticsaccounting on job startup and completion and updating the worker status\subsubsection CoreEntitiesDriversMP Master/Slave DriversA subset of the drivers corresponds to drivers managing computing units inmaster/slave mode, that is, drivers involving a local master instance managingone or more remote slave instances on the targeted device(s). This includesdevices such as discrete manycore accelerators (e.g. Intel's Knight Cornersboard, for instance), or pseudo devices such as a cluster of cpu nodes driverthrough StarPU's MPI master/slave mode. A driver instance on the master sideis named the \b source, while a driver instances on the slave side is namedthe \b sink.A significant part of the work realized on the source and sink sides ofmaster/slave drivers is identical among all master/slave drivers, due to thesimilarities in the software pattern. Therefore, many routines are shared amongall these drivers in the \c src/drivers/mp_common subdirectory. In particular, aset of default commands to be used between sources and sinks is defined,assuming the availability of some communication channel between them (see enum\ref _starpu_mp_command)TODO\subsection CoreEntitiesTasksJobs Tasks and JobsTODO\subsection CoreEntitiesData DataTODO*/
 |