/*
 * This file is part of the StarPU Handbook.
 * Copyright (C) 2009--2011  Université de Bordeaux 1
 * Copyright (C) 2010, 2011, 2012, 2013  Centre National de la Recherche Scientifique
 * Copyright (C) 2011, 2012 Institut National de Recherche en Informatique et Automatique
 * See the file version.doxy for copying conditions.
 */

/*! \page cExtensions C Extensions

When GCC plug-in support is available, StarPU builds a plug-in for the
GNU Compiler Collection (GCC), which defines extensions to languages of
the C family (C, C++, Objective-C) that make it easier to write StarPU
code.  This feature is only available for GCC 4.5 and later; it
is known to work with GCC 4.5, 4.6, and 4.7.  You
may need to install a specific <c>-dev</c> package of your distro, such
as <c>gcc-4.6-plugin-dev</c> on Debian and derivatives.  In addition,
the plug-in's test suite is only run when
<a href="http://www.gnu.org/software/guile/">GNU Guile</a> is found at
<c>configure</c>-time.
Building the GCC plug-in
can be disabled by configuring with \ref disable-gcc-extensions "--disable-gcc-extensions".

Those extensions include syntactic sugar for defining
tasks and their implementations, invoking a task, and manipulating data
buffers.  Use of these extensions can be made conditional on the
availability of the plug-in, leading to valid C sequential code when the
plug-in is not used (\ref UsingCExtensionsConditionally).

When StarPU has been installed with its GCC plug-in, programs that use
these extensions can be compiled this way:

\verbatim
$ gcc -c -fplugin=`pkg-config starpu-1.2 --variable=gccplugin` foo.c
\endverbatim

When the plug-in is not available, the above <c>pkg-config</c>
command returns the empty string.

In addition, the <c>-fplugin-arg-starpu-verbose</c> flag can be used to
obtain feedback from the compiler as it analyzes the C extensions used
in source files.

This section describes the C extensions implemented by StarPU's GCC
plug-in.  It does not require detailed knowledge of the StarPU library.

Note: this is still an area under development and subject to change.

\section DefiningTasks Defining Tasks

The StarPU GCC plug-in views tasks as "extended" C functions:

<ul>
<li>
tasks may have several implementations---e.g., one for CPUs, one written
in OpenCL, one written in CUDA;
</li>
<li>
tasks may have several implementations of the same target---e.g.,
several CPU implementations;
</li>
<li>
when a task is invoked, it may run in parallel, and StarPU is free to
choose any of its implementations.
</li>
</ul>

Tasks and their implementations must be <em>declared</em>.
These
declarations are annotated with attributes
(http://gcc.gnu.org/onlinedocs/gcc/Attribute-Syntax.html#Attribute-Syntax):
the declaration of a task is a regular C function declaration with an
additional <c>task</c> attribute, and task implementations are
declared with a <c>task_implementation</c> attribute.

The following function attributes are provided:

<dl>

<dt><c>task</c></dt>
<dd>
Declare the given function as a StarPU task.  Its return type must be
<c>void</c>.  When a function declared as <c>task</c> has a user-defined
body, that body is interpreted as the implicit definition of the
task's CPU implementation (see example below).  In all cases, the
actual definition of a task's body is automatically generated by the
compiler.

Under the hood, declaring a task leads to the declaration of the
corresponding <c>codelet</c> (\ref CodeletAndTasks).  If one or
more task implementations are declared in the same compilation unit,
then the codelet and the function itself are also defined; they inherit
the scope of the task.

Scalar arguments to the task are passed by value and copied to the
target device if need be---technically, they are passed as the buffer
starpu_task::cl_arg (\ref CodeletAndTasks).

Pointer arguments are assumed to be registered data buffers---the
handles argument of a task (starpu_task::handles); <c>const</c>-qualified
pointer arguments are viewed as read-only buffers (::STARPU_R), and
non-<c>const</c>-qualified buffers are assumed to be used read-write
(::STARPU_RW).  In addition, the <c>output</c> type attribute can be used
as a type qualifier for output pointer or array parameters
(::STARPU_W).
</dd>

<dt><c>task_implementation (target, task)</c></dt>
<dd>
Declare the given function as an implementation of <c>task</c> to run on
<c>target</c>.
<c>target</c> must be a string, currently one of
<c>"cpu"</c>, <c>"opencl"</c>, or <c>"cuda"</c>.
\internal
FIXME: Update when OpenCL support is ready.
\endinternal
</dd>

</dl>

Here is an example:

\code{.c}
#define __output  __attribute__ ((output))

static void matmul (const float *A, const float *B,
                    __output float *C,
                    unsigned nx, unsigned ny, unsigned nz)
  __attribute__ ((task));

static void matmul_cpu (const float *A, const float *B,
                        __output float *C,
                        unsigned nx, unsigned ny, unsigned nz)
  __attribute__ ((task_implementation ("cpu", matmul)));


static void
matmul_cpu (const float *A, const float *B, __output float *C,
            unsigned nx, unsigned ny, unsigned nz)
{
  unsigned i, j, k;

  for (j = 0; j < ny; j++)
    for (i = 0; i < nx; i++)
      {
        for (k = 0; k < nz; k++)
          C[j * nx + i] += A[j * nz + k] * B[k * nx + i];
      }
}
\endcode

A <c>matmul</c> task is defined; it has only one implementation,
<c>matmul_cpu</c>, which runs on the CPU.  Variables <c>A</c> and
<c>B</c> are input buffers, whereas <c>C</c> is considered an input/output
buffer.

For convenience, when a function declared with the <c>task</c> attribute
has a user-defined body, that body is assumed to be that of the CPU
implementation of a task, which we call an implicit task CPU
implementation.  Thus, the above snippet can be simplified like this:

\code{.c}
#define __output  __attribute__ ((output))

static void matmul (const float *A, const float *B,
                    __output float *C,
                    unsigned nx, unsigned ny, unsigned nz)
  __attribute__ ((task));

/* Implicit definition of the CPU implementation of the
   `matmul' task.
*/
static void
matmul (const float *A, const float *B, __output float *C,
        unsigned nx, unsigned ny, unsigned nz)
{
  unsigned i, j, k;

  for (j = 0; j < ny; j++)
    for (i = 0; i < nx; i++)
      {
        for (k = 0; k < nz; k++)
          C[j * nx + i] += A[j * nz + k] * B[k * nx + i];
      }
}
\endcode

Use of implicit CPU task implementations as above has the advantage that
the code is valid sequential code when StarPU's GCC plug-in is not used
(\ref UsingCExtensionsConditionally).

CUDA and OpenCL implementations can be declared in a similar way:

\code{.c}
static void matmul_cuda (const float *A, const float *B, float *C,
                         unsigned nx, unsigned ny, unsigned nz)
  __attribute__ ((task_implementation ("cuda", matmul)));

static void matmul_opencl (const float *A, const float *B, float *C,
                           unsigned nx, unsigned ny, unsigned nz)
  __attribute__ ((task_implementation ("opencl", matmul)));
\endcode

The CUDA and OpenCL implementations typically either invoke a kernel
written in CUDA or OpenCL (for similar code, see \ref CUDAKernel and
\ref OpenCLKernel), or call a library function that uses CUDA or
OpenCL under the hood, such as CUBLAS functions:

\code{.c}
static void
matmul_cuda (const float *A, const float *B, float *C,
             unsigned nx, unsigned ny, unsigned nz)
{
  cublasSgemm ('n', 'n', nx, ny, nz,
               1.0f, A, 0, B, 0,
               0.0f, C, 0);
  cudaStreamSynchronize (starpu_cuda_get_local_stream ());
}
\endcode

A task can be invoked like a regular C function:

\code{.c}
matmul (&A[i * zdim * bydim + k * bzdim * bydim],
        &B[k * xdim * bzdim + j * bxdim * bzdim],
        &C[i * xdim * bydim + j * bxdim * bydim],
        bxdim, bydim, bzdim);
\endcode

This leads to an asynchronous invocation, whereby <c>matmul</c>'s
implementation may run in parallel with the continuation of the caller.

A later section (\ref RegisteredDataBuffers) describes how memory buffers
must be handled in StarPU-GCC code.
For a complete example, see the
<c>gcc-plugin/examples</c> directory of the source distribution, and
\ref VectorScalingUsingTheCExtension.

\section InitializationTerminationAndSynchronization Initialization, Termination, and Synchronization

The following pragmas allow user code to control StarPU's lifetime and
to synchronize with tasks.

<dl>

<dt><c>\#pragma starpu initialize</c></dt>
<dd>
Initialize StarPU.  This call is compulsory and is <em>never</em> added
implicitly.  One of the reasons this has to be done explicitly is that
it provides greater control to user code over its resource usage.
</dd>

<dt><c>\#pragma starpu shutdown</c></dt>
<dd>
Shut down StarPU, giving it an opportunity to write profiling info to a
file on disk, for instance (\ref Off-linePerformanceFeedback).
</dd>

<dt><c>\#pragma starpu wait</c></dt>
<dd>
Wait for all task invocations to complete, as with
starpu_task_wait_for_all().
</dd>

</dl>

\section RegisteredDataBuffers Registered Data Buffers

Data buffers such as matrices and vectors that are to be passed to tasks
must be registered.  Registration allows StarPU to handle data
transfers among devices---e.g., transferring an input buffer from the
CPU's main memory to a task scheduled to run on a GPU
(\ref StarPUDataManagementLibrary).

The following pragmas are provided:

<dl>

<dt><c>\#pragma starpu register ptr [size]</c></dt>
<dd>
Register <c>ptr</c> as a <c>size</c>-element buffer.  When <c>ptr</c> has
an array type whose size is known, <c>size</c> may be omitted.
Alternatively, the <c>registered</c> attribute can be used (see below).
</dd>

<dt><c>\#pragma starpu unregister ptr</c></dt>
<dd>
Unregister the previously-registered memory area pointed to by
<c>ptr</c>.
As a side-effect, <c>ptr</c> points to a valid copy in main
memory.
</dd>

<dt><c>\#pragma starpu acquire ptr</c></dt>
<dd>
Acquire in main memory an up-to-date copy of the previously-registered
memory area pointed to by <c>ptr</c>, for read-write access.
</dd>

<dt><c>\#pragma starpu release ptr</c></dt>
<dd>
Release the previously-registered memory area pointed to by <c>ptr</c>,
making it available to the tasks.
</dd>

</dl>

Additionally, the following attributes offer a simple way to allocate
and register storage for arrays:

<dl>

<dt><c>registered</c></dt>
<dd>
This attribute applies to local variables with an array type.  Its
effect is to automatically register the array's storage, as per
<c>\#pragma starpu register</c>.  The array is automatically unregistered
when the variable's scope is left.  This attribute is typically used in
conjunction with the <c>heap_allocated</c> attribute, described below.
</dd>

<dt><c>heap_allocated</c></dt>
<dd>
This attribute applies to local variables with an array type.  Its
effect is to automatically allocate the array's storage on
the heap, using starpu_malloc() under the hood.  The heap-allocated array is automatically
freed when the variable's scope is left, as with
automatic variables.
</dd>

</dl>

The following example illustrates use of the <c>heap_allocated</c>
attribute:

\snippet cholesky_pragma.c To be included

\section UsingCExtensionsConditionally Using C Extensions Conditionally

The C extensions described in this chapter are only available when GCC
and its StarPU plug-in are in use.  Yet, it is possible to make use of
these extensions when they are available---leading to hybrid CPU/GPU
code---and discard them when they are not available---leading to valid
sequential code.

To that end, the GCC plug-in defines the C preprocessor macro
<c>STARPU_GCC_PLUGIN</c> when it is being used.
When defined, this
macro expands to an integer denoting the version of the supported C
extensions.

The code below illustrates how to define a task and its implementations
in a way that allows it to be compiled without the GCC plug-in:

\snippet matmul_pragma.c To be included

The above program is a valid StarPU program when StarPU's GCC plug-in is
used; it is also a valid sequential program when the plug-in is not
used.

Note that attributes such as <c>task</c> as well as <c>starpu</c>
pragmas are simply ignored by GCC when the StarPU plug-in is not loaded.
However, <c>gcc -Wall</c> emits a warning for unknown attributes and
pragmas, which can be inconvenient.  In addition, other compilers may be
unable to parse the attribute syntax (in practice, Clang and
several proprietary compilers implement attributes), so you may want to
wrap attributes in macros like this:

\snippet matmul_pragma2.c To be included

*/