/* StarPU --- Runtime system for heterogeneous multicore architectures.
 *
 * Copyright (C) 2014-2017, 2019                          CNRS
 * Copyright (C) 2014                                     Inria
 *
 * StarPU is free software; you can redistribute it and/or modify
 * it under the terms of the GNU Lesser General Public License as published by
 * the Free Software Foundation; either version 2.1 of the License, or (at
 * your option) any later version.
 *
 * StarPU is distributed in the hope that it will be useful, but
 * WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 *
 * See the GNU Lesser General Public License in COPYING.LGPL for more details.
 */

/*! \page OpenMPRuntimeSupport The StarPU OpenMP Runtime Support (SORS)

StarPU provides the necessary routines and support to implement an OpenMP
(http://www.openmp.org/) runtime compliant with revision 3.1 of the language
specification, and with the task-related data dependency functionalities
introduced in revision 4.0 of the language. This StarPU OpenMP Runtime
Support (SORS) has been designed to be targeted by OpenMP compilers such as
the Klang-OMP compiler. Most supported OpenMP directives can be implemented
either inline or as outlined functions.

All functions are defined in \ref API_OpenMP_Runtime_Support.

\section Implementation Implementation Details and Specificities

\subsection MainThread Main Thread

When using the SORS, the main thread gets involved in executing OpenMP tasks
just like every other thread, in order to comply with the execution model of
the specification. This contrasts with StarPU's usual execution model, where
the main thread submits tasks but does not take part in executing them.

\subsection TaskSemantics Extended Task Semantics

The semantics of tasks generated by the SORS are extended with respect to
regular StarPU tasks in that SORS tasks may block and be preempted by SORS
calls, whereas regular StarPU tasks cannot. SORS tasks may coexist with
regular StarPU tasks. However, only the tasks created through the SORS API
functions inherit the extended semantics.

\section Configuration Configuration

The SORS can be compiled into <c>libstarpu</c> through the \c configure
option \ref enable-openmp "--enable-openmp". Conditionally compiled source
code may check for the availability of the OpenMP Runtime Support by testing
whether the C preprocessor macro <c>STARPU_OPENMP</c> is defined.
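For instance, an application may guard its SORS-specific code paths with this
macro. The sketch below is merely illustrative and assumes nothing beyond the
macro itself:

\code{.c}
#include <starpu.h>

#ifdef STARPU_OPENMP
/* SORS is available: the OpenMP runtime entry points may be used. */
#else
/* SORS is not compiled in: fall back to plain StarPU task submission. */
#endif
\endcode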
\section InitExit Initialization and Shutdown

The SORS needs to be initialized and terminated with starpu_omp_init() and
starpu_omp_shutdown() instead of starpu_init() and starpu_shutdown(). This
requirement is necessary to make sure that the main thread gets the proper
execution environment to run OpenMP tasks. These calls are usually performed
by a compiler runtime; thus, they can be executed from a constructor and a
destructor such as this:

\code{.c}
__attribute__((constructor))
static void omp_constructor(void)
{
	int ret = starpu_omp_init();
	STARPU_CHECK_RETURN_VALUE(ret, "starpu_omp_init");
}

__attribute__((destructor))
static void omp_destructor(void)
{
	starpu_omp_shutdown();
}
\endcode

\sa starpu_omp_init()
\sa starpu_omp_shutdown()

\section Parallel Parallel Regions and Worksharing

The SORS provides functions to create OpenMP parallel regions as well as to
map work onto participating workers. The current implementation does not
provide nested active parallel regions: parallel regions may be created
recursively, however only the first-level parallel region may have more than
one worker. From an internal point of view, the SORS' parallel regions are
implemented as a set of implicit, extended-semantics StarPU tasks, following
the execution model of the OpenMP specification. Thus the SORS' parallel
region tasks may block and be preempted by SORS calls, enabling constructs
such as barriers.

\subsection OMPParallel Parallel Regions

Parallel regions can be created with the function
starpu_omp_parallel_region(), which accepts a set of attributes as parameter.
The execution of the calling task is suspended until the parallel region
completes. The field starpu_omp_parallel_region_attr::cl is a regular StarPU
codelet. However, only CPU codelets are supported for parallel regions.

Here is an example of use:

\code{.c}
void parallel_region_f(void *buffers[], void *args)
{
	(void) buffers;
	(void) args;
	pthread_t tid = pthread_self();
	int worker_id = starpu_worker_get_id();
	printf("[tid %p] task thread = %d\n", (void *)tid, worker_id);
}

void f(void)
{
	struct starpu_omp_parallel_region_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.cl.cpu_funcs[0] = parallel_region_f;
	attr.cl.where        = STARPU_CPU;
	attr.if_clause       = 1;
	starpu_omp_parallel_region(&attr);
}
\endcode

\sa struct starpu_omp_parallel_region_attr
\sa starpu_omp_parallel_region()

\subsection OMPFor Parallel For

OpenMP <c>for</c> loops are provided by the starpu_omp_for() group of
functions. Variants are available for inline or outlined implementations.
The SORS supports the <c>static</c>, <c>dynamic</c>, and <c>guided</c> loop
scheduling clauses. The <c>auto</c> scheduling clause is implemented as
<c>static</c>. The <c>runtime</c> scheduling clause honors the scheduling
mode selected through the environment variable \c OMP_SCHEDULE or the
starpu_omp_set_schedule() function. For loops with the <c>ordered</c> clause
are also supported. An implicit barrier can be enforced or skipped at the end
of the worksharing construct, according to the value of the <c>nowait</c>
parameter.

The canonical family of starpu_omp_for() functions provides each instance
with the first iteration number and the number of iterations (possibly zero)
to perform. The alternate family of starpu_omp_for_alt() functions provides
each instance with the (possibly empty) range of iterations to perform,
including the first and excluding the last; a sketch of the alternate
callback style is shown at the end of this subsection.

The family of starpu_omp_ordered() functions makes it possible to implement
OpenMP's ordered construct, a region within a parallel for loop that is
guaranteed to be executed in the sequential order of the loop iterations.

\code{.c}
void for_g(unsigned long long i, unsigned long long nb_i, void *arg)
{
	(void) arg;
	for (; nb_i > 0; i++, nb_i--)
	{
		array[i] = 1;
	}
}

void parallel_region_f(void *buffers[], void *args)
{
	(void) buffers;
	(void) args;
	starpu_omp_for(for_g, NULL, NB_ITERS, CHUNK, starpu_omp_sched_static, 0, 0);
}
\endcode

\sa starpu_omp_for()
\sa starpu_omp_for_inline_first()
\sa starpu_omp_for_inline_next()
\sa starpu_omp_for_alt()
\sa starpu_omp_for_inline_first_alt()
\sa starpu_omp_for_inline_next_alt()
\sa starpu_omp_ordered()
\sa starpu_omp_ordered_inline_begin()
\sa starpu_omp_ordered_inline_end()
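As an illustration of the alternate family, the sketch below rewrites the
previous loop body with a callback receiving an iteration range instead of a
first iteration and a count. It is illustrative only and assumes that
starpu_omp_for_alt() takes the same trailing parameters as starpu_omp_for();
refer to \ref API_OpenMP_Runtime_Support for the exact prototype.

\code{.c}
/* Alternate-style callback: receives the half-open range
 * [begin_i, end_i) of iterations to perform. */
void for_alt_g(unsigned long long begin_i, unsigned long long end_i, void *arg)
{
	(void) arg;
	unsigned long long i;
	for (i = begin_i; i < end_i; i++)
	{
		array[i] = 1;
	}
}

void parallel_region_alt_f(void *buffers[], void *args)
{
	(void) buffers;
	(void) args;
	/* Assumed to mirror the starpu_omp_for() call shown above. */
	starpu_omp_for_alt(for_alt_g, NULL, NB_ITERS, CHUNK, starpu_omp_sched_static, 0, 0);
}
\endcode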
\subsection OMPSections Sections

OpenMP <c>sections</c> worksharing constructs are supported using the set of
starpu_omp_sections() variants. The general principle is either to provide an
array of per-section functions or a single function that will redirect
execution to the suitable per-section function. An implicit barrier can be
enforced or skipped at the end of the worksharing construct, according to the
value of the <c>nowait</c> parameter.

\code{.c}
void parallel_region_f(void *buffers[], void *args)
{
	(void) buffers;
	(void) args;

	section_funcs[0] = f;
	section_funcs[1] = g;
	section_funcs[2] = h;
	section_funcs[3] = i;

	section_args[0] = arg_f;
	section_args[1] = arg_g;
	section_args[2] = arg_h;
	section_args[3] = arg_i;

	starpu_omp_sections(4, section_funcs, section_args, 0);
}
\endcode

\sa starpu_omp_sections()
\sa starpu_omp_sections_combined()

\subsection OMPSingle Single

OpenMP <c>single</c> worksharing constructs are supported using the set of
starpu_omp_single() variants. An implicit barrier can be enforced or skipped
at the end of the worksharing construct, according to the value of the
<c>nowait</c> parameter.

\code{.c}
void single_f(void *arg)
{
	(void) arg;
	pthread_t tid = pthread_self();
	int worker_id = starpu_worker_get_id();
	printf("[tid %p] task thread = %d -- single\n", (void *)tid, worker_id);
}

void parallel_region_f(void *buffers[], void *args)
{
	(void) buffers;
	(void) args;
	starpu_omp_single(single_f, NULL, 0);
}
\endcode

The SORS also provides dedicated support for <c>single</c> sections with
<c>copyprivate</c> clauses through the starpu_omp_single_copyprivate()
function variants. The OpenMP <c>master</c> directive is supported as well
using the starpu_omp_master() function variants, as sketched at the end of
this subsection.

\sa starpu_omp_master()
\sa starpu_omp_master_inline()
\sa starpu_omp_single()
\sa starpu_omp_single_inline()
\sa starpu_omp_single_copyprivate()
\sa starpu_omp_single_copyprivate_inline_begin()
\sa starpu_omp_single_copyprivate_inline_end()
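The OpenMP <c>master</c> directive restricts the execution of a code block to
the master thread of the team. The sketch below is illustrative only: it
assumes that starpu_omp_master() takes a function and an argument pointer,
analogously to starpu_omp_single() but without the <c>nowait</c> parameter;
refer to \ref API_OpenMP_Runtime_Support for the exact prototype.

\code{.c}
void master_f(void *arg)
{
	(void) arg;
	/* Executed only by the master thread of the current team. */
	printf("master thread = %d\n", starpu_worker_get_id());
}

void parallel_region_f(void *buffers[], void *args)
{
	(void) buffers;
	(void) args;
	starpu_omp_master(master_f, NULL);
}
\endcode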
\section Task Tasks

The SORS implements the necessary support for OpenMP 3.1 and OpenMP 4.0's
so-called explicit tasks, together with OpenMP 4.0's data dependency
management.

\subsection OMPTask Explicit Tasks

Explicit OpenMP tasks are created with the SORS using the
starpu_omp_task_region() function. The implementation supports the
<c>if</c>, <c>final</c>, <c>untied</c> and <c>mergeable</c> clauses as
defined in the OpenMP specification. Unless specified otherwise by the
appropriate clause(s), the created task may be executed by any participating
worker of the current parallel region.

The current SORS implementation requires explicit tasks to be created within
the context of an active parallel region. In particular, an explicit task
cannot be created by the main thread outside of a parallel region. Explicit
OpenMP tasks created using starpu_omp_task_region() are implemented as StarPU
tasks with extended semantics, and may as such be blocked and preempted by
SORS routines.

The current SORS implementation supports recursive explicit task creation, to
ensure compliance with the OpenMP specification. However, it should be noted
that StarPU is neither designed nor optimized for efficiently scheduling
recursive task applications.

The code below shows how to create 4 explicit tasks within a parallel region.

\code{.c}
void task_region_g(void *buffers[], void *args)
{
	(void) buffers;
	(void) args;
	pthread_t tid = pthread_self();
	int worker_id = starpu_worker_get_id();
	printf("[tid %p] task thread = %d: explicit task \"g\"\n", (void *)tid, worker_id);
}

void parallel_region_f(void *buffers[], void *args)
{
	(void) buffers;
	(void) args;
	struct starpu_omp_task_region_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.cl.cpu_funcs[0]  = task_region_g;
	attr.cl.where         = STARPU_CPU;
	attr.if_clause        = 1;
	attr.final_clause     = 0;
	attr.untied_clause    = 1;
	attr.mergeable_clause = 0;

	starpu_omp_task_region(&attr);
	starpu_omp_task_region(&attr);
	starpu_omp_task_region(&attr);
	starpu_omp_task_region(&attr);
}
\endcode

\sa struct starpu_omp_task_region_attr
\sa starpu_omp_task_region()

\subsection DataDependencies Data Dependencies

The SORS implements inter-task data dependencies as specified in OpenMP 4.0.
Data dependencies are expressed using regular StarPU data handles
(\ref starpu_data_handle_t) plugged into the task's <c>attr.cl</c> codelet.
The family of starpu_vector_data_register() -like functions and the
starpu_data_lookup() function may be used to register a memory area and to
retrieve the current data handle associated with a pointer, respectively. The
testcase <c>./tests/openmp/task_02.c</c> gives a detailed example of using
OpenMP 4.0 task dependencies with the SORS implementation.

Note: the OpenMP 4.0 specification only supports data dependencies between
sibling tasks, that is, tasks created by the same implicit or explicit parent
task. The current SORS implementation also only supports data dependencies
between sibling tasks. Consequently, the behaviour is unspecified if
dependencies are expressed between tasks that have not been created by the
same parent task.
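The sketch below outlines how two dependent sibling tasks might be expressed.
It is illustrative only: the registration and lookup calls are regular StarPU
API, but the <c>handles</c> field used to attach the data handle to the task
attributes is an assumption here; the testcase <c>./tests/openmp/task_02.c</c>
remains the reference for the exact attribute fields.

\code{.c}
int vector[16];

void task_region_g(void *buffers[], void *args)
{
	(void) args;
	/* The vector registered below is passed as the first buffer. */
	int *v = (int *) STARPU_VECTOR_GET_PTR(buffers[0]);
	v[0]++;
}

void parallel_region_f(void *buffers[], void *args)
{
	(void) buffers;
	(void) args;
	/* Retrieve the handle registered before the parallel region. */
	starpu_data_handle_t handle = starpu_data_lookup(vector);
	struct starpu_omp_task_region_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.cl.cpu_funcs[0] = task_region_g;
	attr.cl.where        = STARPU_CPU;
	attr.cl.nbuffers     = 1;
	attr.cl.modes[0]     = STARPU_RW;  /* inout-like dependence on the handle */
	attr.handles         = &handle;    /* assumed field name, see task_02.c */
	attr.if_clause       = 1;

	/* Both tasks access the same handle in RW mode, so the second task
	 * depends on the completion of the first one. */
	starpu_omp_task_region(&attr);
	starpu_omp_task_region(&attr);
}

void f(void)
{
	starpu_data_handle_t handle;
	struct starpu_omp_parallel_region_attr attr;

	/* Register the memory area once, before entering the parallel region. */
	starpu_vector_data_register(&handle, STARPU_MAIN_RAM,
				    (uintptr_t) vector, 16, sizeof(vector[0]));

	memset(&attr, 0, sizeof(attr));
	attr.cl.cpu_funcs[0] = parallel_region_f;
	attr.cl.where        = STARPU_CPU;
	attr.if_clause       = 1;
	starpu_omp_parallel_region(&attr);

	starpu_data_unregister(handle);
}
\endcode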
\subsection TaskSyncs TaskWait and TaskGroup

The SORS implements both the <c>taskwait</c> and <c>taskgroup</c> OpenMP task
synchronization constructs specified in OpenMP 4.0, with the
starpu_omp_taskwait() and starpu_omp_taskgroup() functions respectively.

An example of starpu_omp_taskwait() use, creating two explicit tasks and
waiting for their completion:

\code{.c}
void task_region_g(void *buffers[], void *args)
{
	(void) buffers;
	(void) args;
	printf("Hello, World!\n");
}

void parallel_region_f(void *buffers[], void *args)
{
	(void) buffers;
	(void) args;
	struct starpu_omp_task_region_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.cl.cpu_funcs[0]  = task_region_g;
	attr.cl.where         = STARPU_CPU;
	attr.if_clause        = 1;
	attr.final_clause     = 0;
	attr.untied_clause    = 1;
	attr.mergeable_clause = 0;

	starpu_omp_task_region(&attr);
	starpu_omp_task_region(&attr);

	starpu_omp_taskwait();
}
\endcode

An example of starpu_omp_taskgroup() use, creating a task group of two explicit tasks:

\code{.c}
void task_region_g(void *buffers[], void *args)
{
	(void) buffers;
	(void) args;
	printf("Hello, World!\n");
}

void taskgroup_f(void *arg)
{
	(void) arg;
	struct starpu_omp_task_region_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.cl.cpu_funcs[0]  = task_region_g;
	attr.cl.where         = STARPU_CPU;
	attr.if_clause        = 1;
	attr.final_clause     = 0;
	attr.untied_clause    = 1;
	attr.mergeable_clause = 0;

	starpu_omp_task_region(&attr);
	starpu_omp_task_region(&attr);
}

void parallel_region_f(void *buffers[], void *args)
{
	(void) buffers;
	(void) args;
	starpu_omp_taskgroup(taskgroup_f, (void *)NULL);
}
\endcode

\sa starpu_omp_task_region()
\sa starpu_omp_taskwait()
\sa starpu_omp_taskgroup()
\sa starpu_omp_taskgroup_inline_begin()
\sa starpu_omp_taskgroup_inline_end()

\section Synchronization Synchronization Support

The SORS implements the objects and methods needed to build common OpenMP
synchronization constructs.

\subsection SimpleLock Simple Locks

The SORS Simple Locks are opaque starpu_omp_lock_t objects enabling multiple
tasks to synchronize with each other, following the Simple Lock constructs
defined by the OpenMP specification. In accordance with that specification, a
simple lock may not be acquired multiple times by the same task without being
released in between; otherwise, deadlocks may result. Codes requiring the
ability to lock multiple times recursively should use Nestable Locks
(\ref NestableLock). Codes NOT requiring the ability to lock multiple times
recursively should use Simple Locks, as they incur less processing overhead
than Nestable Locks.

\sa starpu_omp_lock_t
\sa starpu_omp_init_lock()
\sa starpu_omp_destroy_lock()
\sa starpu_omp_set_lock()
\sa starpu_omp_unset_lock()
\sa starpu_omp_test_lock()
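The sketch below illustrates a typical lock lifecycle. It assumes that the
lock functions take a pointer to the starpu_omp_lock_t object, mirroring the
standard OpenMP lock routines; refer to \ref API_OpenMP_Runtime_Support for
the exact prototypes.

\code{.c}
starpu_omp_lock_t lock;
int counter = 0;

void parallel_region_f(void *buffers[], void *args)
{
	(void) buffers;
	(void) args;
	/* Serialize updates of the shared counter. */
	starpu_omp_set_lock(&lock);
	counter++;
	starpu_omp_unset_lock(&lock);
}

void f(void)
{
	struct starpu_omp_parallel_region_attr attr;

	/* The lock must be initialized before use and destroyed afterwards. */
	starpu_omp_init_lock(&lock);

	memset(&attr, 0, sizeof(attr));
	attr.cl.cpu_funcs[0] = parallel_region_f;
	attr.cl.where        = STARPU_CPU;
	attr.if_clause       = 1;
	starpu_omp_parallel_region(&attr);

	starpu_omp_destroy_lock(&lock);
}
\endcode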
\subsection NestableLock Nestable Locks

The SORS Nestable Locks are opaque starpu_omp_nest_lock_t objects enabling
multiple tasks to synchronize with each other, following the Nestable Lock
constructs defined by the OpenMP specification. In accordance with that
specification, a nestable lock may be acquired multiple times recursively by
the same task without deadlocking. Nested locking and unlocking operations
must be well parenthesized at all times, otherwise deadlock and/or undefined
behaviour may occur. Codes requiring the ability to lock multiple times
recursively should use Nestable Locks. Codes NOT requiring the ability to
lock multiple times recursively should use Simple Locks (\ref SimpleLock)
instead, as they incur less processing overhead than Nestable Locks.

\sa starpu_omp_nest_lock_t
\sa starpu_omp_init_nest_lock()
\sa starpu_omp_destroy_nest_lock()
\sa starpu_omp_set_nest_lock()
\sa starpu_omp_unset_nest_lock()
\sa starpu_omp_test_nest_lock()

\subsection Critical Critical Sections

The SORS implements support for OpenMP critical sections through the family
of \ref starpu_omp_critical functions. Critical sections may optionally be
named. There is a single, common anonymous critical section. Mutual exclusion
only occurs within the scope of a single critical section, either a named one
or the anonymous one.

\sa starpu_omp_critical()
\sa starpu_omp_critical_inline_begin()
\sa starpu_omp_critical_inline_end()

\subsection Barrier Barriers

The SORS provides the starpu_omp_barrier() function to implement barriers
over parallel region teams. In accordance with the OpenMP specification, the
starpu_omp_barrier() function waits for every implicit task of the parallel
region to reach the barrier and for every explicit task launched by the
parallel region to complete, before returning.

\sa starpu_omp_barrier()
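A minimal usage sketch, assuming starpu_omp_barrier() takes no argument:

\code{.c}
void parallel_region_f(void *buffers[], void *args)
{
	(void) buffers;
	(void) args;
	int worker_id = starpu_worker_get_id();

	printf("phase 1: worker %d\n", worker_id);

	/* No implicit task of the team proceeds to phase 2 before all of
	 * them have reached the barrier and all explicit tasks launched by
	 * the region have completed. */
	starpu_omp_barrier();

	printf("phase 2: worker %d\n", worker_id);
}
\endcode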
*/