@c -*-texinfo-*-

@c This file is part of the StarPU Handbook.
@c Copyright (C) 2011, 2012 Institut National de Recherche en Informatique et Automatique
@c See the file starpu.texi for copying conditions.

@cindex C extensions
@cindex GCC plug-in

When GCC plug-in support is available, StarPU builds a plug-in for the
GNU Compiler Collection (GCC), which defines extensions to languages of
the C family (C, C++, Objective-C) that make it easier to write StarPU
code@footnote{This feature is only available for GCC 4.5 and later.  It
can be disabled by configuring with @code{--disable-gcc-extensions}.}.
Those extensions include syntactic sugar for defining tasks and their
implementations, invoking a task, and manipulating data buffers.
Use of these extensions can be made conditional on the
availability of the plug-in, leading to valid sequential C code when the
plug-in is not used (@pxref{Conditional Extensions}).

When StarPU has been installed with its GCC plug-in, programs that use
these extensions can be compiled this way:

@example
$ gcc -c -fplugin=`pkg-config starpu-1.0 --variable=gccplugin` foo.c
@end example

@noindent
When the plug-in is not available, the above @command{pkg-config}
command returns the empty string.

This chapter describes the C extensions implemented by StarPU's GCC
plug-in.  It does not require detailed knowledge of the StarPU library.

Note: as of StarPU @value{VERSION}, this is still an area under
development and subject to change.

@menu
* Defining Tasks::              Defining StarPU tasks
* Registered Data Buffers::     Manipulating data buffers
* Conditional Extensions::      Using C extensions only when available
@end menu

@node Defining Tasks
@section Defining Tasks

@cindex task
@cindex task implementation

The StarPU GCC plug-in views @dfn{tasks} as ``extended'' C functions:

@enumerate
@item
tasks may have several implementations---e.g., one for CPUs, one written
in OpenCL, one written in CUDA;

@item
tasks may have several implementations for the same target---e.g.,
several CPU implementations;

@item
when a task is invoked, it may run in parallel with the caller, and
StarPU is free to choose any of its implementations.
@end enumerate

Tasks and their implementations must be @emph{declared}.  These
declarations are annotated with @dfn{attributes}
(@pxref{Attribute Syntax, attributes in GNU C,, gcc, Using the GNU
Compiler Collection (GCC)}): the declaration of a task is a regular C
function declaration with an additional @code{task} attribute, and task
implementations are declared with a @code{task_implementation}
attribute.

The following function attributes are provided:

@table @code
@item task
@cindex @code{task} attribute
Declare the given function as a StarPU task.
Its return type must be
@code{void}, and it must not be defined---instead, a definition will
automatically be provided by the compiler.

Under the hood, declaring a task leads to the declaration of the
corresponding @code{codelet} (@pxref{Codelets and Tasks}).  If one or
more task implementations are declared in the same compilation unit,
then the codelet and the function itself are also defined; they inherit
the scope of the task.

Scalar arguments to the task are passed by value and copied to the
target device if need be---technically, they are passed as the
@code{cl_arg} buffer (@pxref{Codelets and Tasks, @code{cl_arg}}).

@cindex @code{output} type attribute
Pointer arguments are assumed to be registered data buffers---the
@code{buffers} argument of a task (@pxref{Codelets and Tasks,
@code{buffers}}); @code{const}-qualified pointer arguments are viewed as
read-only buffers (@code{STARPU_R}), and non-@code{const}-qualified
pointer arguments are assumed to be used read-write (@code{STARPU_RW}).
In addition, the @code{output} type attribute can be used as a type
qualifier for output pointer or array parameters (@code{STARPU_W}).

@item task_implementation (@var{target}, @var{task})
@cindex @code{task_implementation} attribute
Declare the given function as an implementation of @var{task} to run on
@var{target}.
@var{target} must be a string, currently one of
@code{"cpu"}, @code{"opencl"}, or @code{"cuda"}.
@c FIXME: Update when OpenCL support is ready.
@end table

Here is an example:

@cartouche
@smallexample
#include <stddef.h>   /* size_t */

#define __output  __attribute__ ((output))

static void matmul (const float *A, const float *B,
                    __output float *C,
                    size_t nx, size_t ny, size_t nz)
  __attribute__ ((task));

static void matmul_cpu (const float *A, const float *B,
                        __output float *C,
                        size_t nx, size_t ny, size_t nz)
  __attribute__ ((task_implementation ("cpu", matmul)));

/* Row-major layout: A is NY x NZ, B is NZ x NX, and the product
   C = A B is NY x NX.  */
static void
matmul_cpu (const float *A, const float *B, __output float *C,
            size_t nx, size_t ny, size_t nz)
@{
  size_t i, j, k;

  for (j = 0; j < ny; j++)
    for (i = 0; i < nx; i++)
      @{
        /* C is an output buffer, so its initial contents must not
           be read: accumulate into a local variable instead.  */
        float sum = 0.0f;

        for (k = 0; k < nz; k++)
          sum += A[j * nz + k] * B[k * nx + i];

        C[j * nx + i] = sum;
      @}
@}
@end smallexample
@end cartouche

@noindent
A @code{matmul} task is defined; it has only one implementation,
@code{matmul_cpu}, which runs on the CPU.
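
The index expressions in @code{matmul_cpu} assume row-major storage:
@var{A} holds @var{ny} rows of @var{nz} elements, @var{B} holds
@var{nz} rows of @var{nx} elements, and @var{C} holds @var{ny} rows of
@var{nx} elements.  To make this layout concrete, here is a minimal
sketch that needs neither StarPU nor its plug-in: @code{matmul_ref} is
a hypothetical plain C function, not part of StarPU, that computes the
same product as the loop nest of @code{matmul_cpu} and writes each
element of @var{C} exactly once.

@cartouche
@smallexample
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-only reference kernel (not a StarPU API).
   Row-major layout: A is NY x NZ, B is NZ x NX, C is NY x NX.  */
static void
matmul_ref (const float *A, const float *B, float *C,
            size_t nx, size_t ny, size_t nz)
@{
  size_t i, j, k;

  assert (A != NULL && B != NULL && C != NULL);

  for (j = 0; j < ny; j++)        /* row of A and C */
    for (i = 0; i < nx; i++)      /* column of B and C */
      @{
        float sum = 0.0f;

        for (k = 0; k < nz; k++)
          sum += A[j * nz + k] * B[k * nx + i];

        C[j * nx + i] = sum;
      @}
@}
@end smallexample
@end cartouche

@noindent
For instance, with @code{A = @{1, 2, 3, 4@}} and
@code{B = @{5, 6, 7, 8@}}, two 2-by-2 matrices, the call
@code{matmul_ref (A, B, C, 2, 2, 2)} stores @code{@{19, 22, 43, 50@}}
in @var{C}.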

In this example, @var{A} and @var{B} are input buffers, whereas
@var{C}, whose type carries the @code{output} attribute, is an output
buffer (@code{STARPU_W}).

CUDA and OpenCL implementations can be declared in a similar way:

@cartouche
@smallexample
static void matmul_cuda (const float *A, const float *B,
                         __output float *C,
                         size_t nx, size_t ny, size_t nz)
  __attribute__ ((task_implementation ("cuda", matmul)));

static void matmul_opencl (const float *A, const float *B,
                           __output float *C,
                           size_t nx, size_t ny, size_t nz)
  __attribute__ ((task_implementation ("opencl", matmul)));
@end smallexample
@end cartouche

@noindent
The CUDA and OpenCL implementations typically either invoke a kernel
written in CUDA or OpenCL (for similar code, @pxref{CUDA Kernel}, and
@pxref{OpenCL Kernel}), or call a library function that uses CUDA or
OpenCL under the hood, such as CUBLAS functions:

@cartouche
@smallexample
#include <cublas.h>   /* legacy CUBLAS API */

static void
matmul_cuda (const float *A, const float *B, __output float *C,
             size_t nx, size_t ny, size_t nz)
@{
  /* CUBLAS assumes column-major storage.  Viewed in column-major
     order, the row-major product C = A B is C^T = B^T A^T, an
     NX x NY matrix; each leading dimension is a row length.  */
  cublasSgemm ('n', 'n', nx, ny, nz,
               1.0f, B, nx, A, nz,
               0.0f, C, nx);
  cudaStreamSynchronize (starpu_cuda_get_local_stream ());
@}
@end smallexample
@end cartouche

A task can be invoked like a regular C function:

@cartouche
@smallexample
matmul (&A[i * zdim * bydim + k * bzdim * bydim],
        &B[k * xdim * bzdim + j * bxdim * bzdim],
        &C[i * xdim * bydim + j * bxdim * bydim],
        bxdim, bydim, bzdim);
@end smallexample
@end cartouche

@noindent
This leads to an @dfn{asynchronous invocation}, whereby an
implementation of @code{matmul} may run in parallel with the
continuation of the caller.

The next section describes how memory buffers must be handled in
StarPU-GCC code.

@node Registered Data Buffers
@section Registered Data Buffers

Data buffers such as matrices and vectors that are to be passed to tasks
must be @dfn{registered}.
Registration allows StarPU to handle data
transfers among devices---e.g., transferring an input buffer from the
CPU's main memory to a task scheduled to run on a GPU
(@pxref{StarPU Data Management Library}).

The following pragmas are provided:

@table @code
@item #pragma starpu register @var{ptr} [@var{size}]
Register @var{ptr} as a @var{size}-element buffer.  When @var{ptr} has
an array type whose size is known, @var{size} may be omitted.

@item #pragma starpu unregister @var{ptr}
Unregister the previously-registered memory area pointed to by
@var{ptr}.  As a side effect, @var{ptr} points to a valid copy in main
memory.

@item #pragma starpu acquire @var{ptr}
Acquire in main memory an up-to-date copy of the previously-registered
memory area pointed to by @var{ptr}, for read-write access.

@item #pragma starpu release @var{ptr}
Release the previously-registered memory area pointed to by @var{ptr},
making it available to the tasks.
@end table

As a substitute for the @code{register} and @code{unregister} pragmas,
the @code{heap_allocated} variable attribute offers a higher-level
mechanism:

@table @code
@item heap_allocated
@cindex @code{heap_allocated} attribute
This attribute applies to local variables with an array type.  Its
effect is to automatically allocate and register the array's storage on
the heap, using @code{starpu_malloc} under the hood
(@pxref{Basic Data Library API, starpu_malloc}).  The heap-allocated
array is automatically freed and unregistered when the variable's scope
is left, as with automatic variables@footnote{This is achieved by using
the @code{cleanup} attribute (@pxref{Variable Attributes,,, gcc, Using
the GNU Compiler Collection (GCC)})}.
@end table

@noindent
The following example illustrates the use of the @code{heap_allocated}
attribute:

@example
extern void cholesky (unsigned nblocks, unsigned size,
                      float mat[nblocks][nblocks][size])
  __attribute__ ((task));

int
main (int argc, char *argv[])
@{
#pragma starpu initialize

  /* ... */

  int nblocks, size;
  parse_args (&nblocks, &size);

  /* Allocate an array of the required size on the heap,
     and register it.  */
  float matrix[nblocks][nblocks][size]
    __attribute__ ((heap_allocated));

  cholesky (nblocks, size, matrix);

#pragma starpu shutdown

  /* MATRIX is automatically freed upon return.  */
  return EXIT_SUCCESS;
@}
@end example

@node Conditional Extensions
@section Using C Extensions Conditionally

The C extensions described in this chapter are only available when GCC
and its StarPU plug-in are in use.  Yet, it is possible to make use of
these extensions when they are available---leading to hybrid CPU/GPU
code---and discard them when they are not available---leading to valid
sequential code.

To that end, the GCC plug-in defines a C preprocessor macro when it is
being used:

@defmac STARPU_GCC_PLUGIN
Defined for code being compiled with the StarPU GCC plug-in.  When
defined, this macro expands to an integer denoting the version of the
supported C extensions.
@end defmac

The code below illustrates how to define a task and its implementations
in a way that allows it to be compiled without the GCC plug-in:

@cartouche
@smallexample
/* The macros below abstract over the attributes specific to
   StarPU-GCC and the name of the CPU implementation.  */
#ifdef STARPU_GCC_PLUGIN
# define __task  __attribute__ ((task))
# define CPU_TASK_IMPL(task)  task ## _cpu
#else
# define __task
# define CPU_TASK_IMPL(task)  task
#endif

#include <stdlib.h>

static void matmul (const float *A, const float *B, float *C,
                    size_t nx, size_t ny, size_t nz) __task;

#ifdef STARPU_GCC_PLUGIN

static void matmul_cpu (const float *A, const float *B, float *C,
                        size_t nx, size_t ny, size_t nz)
  __attribute__ ((task_implementation ("cpu", matmul)));

#endif


static void
CPU_TASK_IMPL (matmul) (const float *A, const float *B, float *C,
                        size_t nx, size_t ny, size_t nz)
@{
  /* Code of the CPU kernel here...  */
@}

int
main (int argc, char *argv[])
@{
  /* The pragmas below are simply ignored when StarPU-GCC
     is not used.  */
#pragma starpu initialize

  /* Row-major NY x NZ, NZ x NX, and NY x NX matrices, with
     NX = 123, NY = 42, and NZ = 7.  */
  float A[42 * 7], B[7 * 123], C[42 * 123];

#pragma starpu register A
#pragma starpu register B
#pragma starpu register C

  /* When StarPU-GCC is used, the call below is asynchronous;
     otherwise, it is synchronous.  */
  matmul (A, B, C, 123, 42, 7);

#pragma starpu wait

  /* Unregistering leaves valid copies in main memory.  */
#pragma starpu unregister A
#pragma starpu unregister B
#pragma starpu unregister C

#pragma starpu shutdown

  return EXIT_SUCCESS;
@}
@end smallexample
@end cartouche

Note that attributes such as @code{task} are simply ignored by GCC when
the StarPU plug-in is not loaded, so the @code{__task} macro could be
omitted altogether.  However, GCC warns about unknown attributes by
default, which can be inconvenient, and other compilers may be unable to
parse the attribute syntax.  Thus, using macros such as @code{__task}
above is recommended.

@c Local Variables:
@c TeX-master: "../starpu.texi"
@c ispell-local-dictionary: "american"
@c End: