- @c -*-texinfo-*-
 
- @c This file is part of the StarPU Handbook.
 
- @c Copyright (C) 2011, 2012 Institut National de Recherche en Informatique et Automatique
 
- @c See the file starpu.texi for copying conditions.
 
- @cindex C extensions
 
- @cindex GCC plug-in
 
- When GCC plug-in support is available, StarPU builds a plug-in for the
 
- GNU Compiler Collection (GCC), which defines extensions to languages of
 
- the C family (C, C++, Objective-C) that make it easier to write StarPU
 
- code@footnote{This feature is only available for GCC 4.5 and later; it
 
- is known to work with GCC 4.5, 4.6, and 4.7.  You
 
- may need to install a specific @code{-dev} package of your distro, such
 
- as @code{gcc-4.6-plugin-dev} on Debian and derivatives.  In addition,
 
- the plug-in's test suite is only run when
 
- @url{http://www.gnu.org/software/guile/, GNU@tie{}Guile} is found at
 
- @code{configure}-time.  Building the GCC plug-in
 
- can be disabled by configuring with @code{--disable-gcc-extensions}.}.
 
- Those extensions include syntactic sugar for defining
 
- tasks and their implementations, invoking a task, and manipulating data
 
- buffers.  Use of these extensions can be made conditional on the
 
- availability of the plug-in, leading to valid C sequential code when the
 
- plug-in is not used (@pxref{Conditional Extensions}).
 
- When StarPU has been installed with its GCC plug-in, programs that use
 
- these extensions can be compiled this way:
 
- @example
 
- $ gcc -c -fplugin=`pkg-config starpu-1.1 --variable=gccplugin` foo.c
 
- @end example
 
- @noindent
 
- When the plug-in is not available, the above @command{pkg-config}
 
- command returns the empty string.
 
- In addition, the @code{-fplugin-arg-starpu-verbose} flag can be used to
 
- obtain feedback from the compiler as it analyzes the C extensions used
 
- in source files.
 
- This section describes the C extensions implemented by StarPU's GCC
 
- plug-in.  It does not require detailed knowledge of the StarPU library.
 
- Note: as of StarPU @value{VERSION}, this is still an area under
 
- development and subject to change.
 
- @menu
 
- * Defining Tasks::              Defining StarPU tasks
 
- * Synchronization and Other Pragmas:: Synchronization, and more.
 
- * Registered Data Buffers::     Manipulating data buffers
 
- * Conditional Extensions::      Using C extensions only when available
 
- @end menu
 
- @node Defining Tasks
 
- @section Defining Tasks
 
- @cindex task
 
- @cindex task implementation
 
- The StarPU GCC plug-in views @dfn{tasks} as ``extended'' C functions:
 
- @enumerate
 
- @item
 
- tasks may have several implementations---e.g., one for CPUs, one written
 
- in OpenCL, one written in CUDA;
 
- @item
 
- tasks may have several implementations for the same target, e.g.,

- several CPU implementations;
 
- @item
 
- when a task is invoked, it may run in parallel, and StarPU is free to
 
- choose any of its implementations.
 
- @end enumerate
 
- Tasks and their implementations must be @emph{declared}.  These
 
- declarations are annotated with @dfn{attributes} (@pxref{Attribute
 
- Syntax, attributes in GNU C,, gcc, Using the GNU Compiler Collection
 
- (GCC)}): the declaration of a task is a regular C function declaration
 
- with an additional @code{task} attribute, and task implementations are
 
- declared with a @code{task_implementation} attribute.
 
- The following function attributes are provided:
 
- @table @code
 
- @item task
 
- @cindex @code{task} attribute
 
- Declare the given function as a StarPU task.  Its return type must be
 
- @code{void}.  When a function declared as @code{task} has a user-defined
 
- body, that body is interpreted as the @dfn{implicit definition of the
 
- task's CPU implementation} (see example below).  In all cases, the
 
- actual definition of a task's body is automatically generated by the
 
- compiler.
 
- Under the hood, declaring a task leads to the declaration of the
 
- corresponding @code{codelet} (@pxref{Codelets and Tasks}).  If one or
 
- more task implementations are declared in the same compilation unit,
 
- then the codelet and the function itself are also defined; they inherit
 
- the scope of the task.
 
- Scalar arguments to the task are passed by value and copied to the
 
- target device if need be---technically, they are passed as the
 
- @code{cl_arg} buffer (@pxref{Codelets and Tasks, @code{cl_arg}}).
 
- @cindex @code{output} type attribute
 
- Pointer arguments are assumed to be registered data buffers---the
 
- @code{buffers} argument of a task (@pxref{Codelets and Tasks,
 
- @code{buffers}}); @code{const}-qualified pointer arguments are viewed as
 
- read-only buffers (@code{STARPU_R}), and non-@code{const}-qualified
 
- buffers are assumed to be used read-write (@code{STARPU_RW}).  In
 
- addition, the @code{output} type attribute can be used as a type qualifier
 
- for output pointer or array parameters (@code{STARPU_W}).
 
- @item task_implementation (@var{target}, @var{task})
 
- @cindex @code{task_implementation} attribute
 
- Declare the given function as an implementation of @var{task} to run on
 
- @var{target}.  @var{target} must be a string, currently one of
 
- @code{"cpu"}, @code{"opencl"}, or @code{"cuda"}.
 
- @c FIXME: Update when OpenCL support is ready.
 
- @end table
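
- As a rough illustration of these rules, here is a sketch of a hypothetical @code{save_and_axpy} task (its name and parameters are ours, not StarPU's).  The attributes are wrapped in macros (@pxref{Conditional Extensions}) so that the same file also builds as plain sequential C.  Under the plug-in, @var{x} would be a read-only buffer, @var{y} a read-write buffer, @var{z} a write-only buffer, and the scalars @var{alpha} and @var{n} would be copied into @code{cl_arg}.  As with any task, the pointer arguments would have to be registered before the call.

```c
/* Use the plug-in's attributes only when it is loaded.  */
#ifdef STARPU_GCC_PLUGIN
# define __task    __attribute__ ((task))
# define __output  __attribute__ ((output))
#else
# define __task
# define __output
#endif

/* Hypothetical task: save Y into Z, then compute Y += ALPHA * X.
   Under the plug-in, parameter types determine access modes:
     X          const-qualified pointer    -> read-only  (STARPU_R)
     Y          non-const pointer          -> read-write (STARPU_RW)
     Z          `output'-qualified pointer -> write-only (STARPU_W)
     ALPHA, N   scalars                    -> passed by value (cl_arg)  */
void save_and_axpy (const float *x, float *y, __output float *z,
                    float alpha, unsigned n) __task;

/* Implicit CPU implementation: ordinary C.  */
void
save_and_axpy (const float *x, float *y, __output float *z,
               float alpha, unsigned n)
{
  unsigned i;

  for (i = 0; i < n; i++)
    {
      z[i] = y[i];            /* Z is only written.  */
      y[i] += alpha * x[i];   /* Y is read, then written.  */
    }
}
```

- Without the plug-in, the call is an ordinary synchronous function call.  With it, the call becomes an asynchronous task invocation.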
 
- Here is an example:
 
- @cartouche
 
- @smallexample
 
- #define __output  __attribute__ ((output))
 
- static void matmul (const float *A, const float *B,
 
-                     __output float *C,
 
-                     unsigned nx, unsigned ny, unsigned nz)
 
-   __attribute__ ((task));
 
- static void matmul_cpu (const float *A, const float *B,
 
-                         __output float *C,
 
-                         unsigned nx, unsigned ny, unsigned nz)
 
-   __attribute__ ((task_implementation ("cpu", matmul)));
 
- static void
 
- matmul_cpu (const float *A, const float *B, __output float *C,
 
-             unsigned nx, unsigned ny, unsigned nz)
 
- @{
 
-   unsigned i, j, k;
 
-   for (j = 0; j < ny; j++)
 
-     for (i = 0; i < nx; i++)
 
-       @{
 
-         /* C is write-only, so accumulate in a local variable.  */

-         float sum = 0.0f;

-         for (k = 0; k < nz; k++)

-           sum += A[j * nz + k] * B[k * nx + i];

-         C[j * nx + i] = sum;

-       @}

- @}

- @end smallexample

- @end cartouche

- @noindent

- A @code{matmul} task is defined; it has only one implementation,

- @code{matmul_cpu}, which runs on the CPU.  Variables @var{A} and

- @var{B} are input buffers, whereas @var{C}, qualified with

- @code{output}, is a write-only output buffer (@code{STARPU_W}): the

- implementation must not read its initial contents.

- @cindex implicit task CPU implementation

- For convenience, when a function declared with the @code{task} attribute

- has a user-defined body, that body is assumed to be that of the CPU

- implementation of a task, which we call an @dfn{implicit task CPU

- implementation}.  Thus, the above snippet can be simplified like this:

- @cartouche

- @smallexample

- #define __output  __attribute__ ((output))

- static void matmul (const float *A, const float *B,

-                     __output float *C,

-                     unsigned nx, unsigned ny, unsigned nz)

-   __attribute__ ((task));

- /* Implicit definition of the CPU implementation of the

-    `matmul' task.  */

- static void

- matmul (const float *A, const float *B, __output float *C,

-         unsigned nx, unsigned ny, unsigned nz)

- @{

-   unsigned i, j, k;

-   for (j = 0; j < ny; j++)

-     for (i = 0; i < nx; i++)

-       @{

-         float sum = 0.0f;

-         for (k = 0; k < nz; k++)

-           sum += A[j * nz + k] * B[k * nx + i];

-         C[j * nx + i] = sum;
 
-       @}
 
- @}
 
- @end smallexample
 
- @end cartouche
 
- @noindent
 
- Use of implicit CPU task implementations as above has the advantage that
 
- the code is valid sequential code when StarPU's GCC plug-in is not used
 
- (@pxref{Conditional Extensions}).
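
- For instance, an implicit CPU implementation can be unit-tested in a sequential build.  The sketch below assumes the row-major layout implied by the @code{matmul} indexing above: @var{A} has @var{ny} rows of @var{nz} elements, @var{B} has @var{nz} rows of @var{nx} elements, and @var{C} has @var{ny} rows of @var{nx} elements.  It wraps the attributes in macros so that it compiles with or without the plug-in, and it writes each element of @var{C} without reading it first, as befits an @code{output} buffer.

```c
/* Use the plug-in's attributes only when it is loaded.  */
#ifdef STARPU_GCC_PLUGIN
# define __task    __attribute__ ((task))
# define __output  __attribute__ ((output))
#else
# define __task
# define __output
#endif

void matmul (const float *A, const float *B, __output float *C,
             unsigned nx, unsigned ny, unsigned nz) __task;

/* Implicit CPU implementation: C (ny x nx) = A (ny x nz) * B (nz x nx),
   all stored row-major.  C is only written, never read.  */
void
matmul (const float *A, const float *B, __output float *C,
        unsigned nx, unsigned ny, unsigned nz)
{
  unsigned i, j, k;

  for (j = 0; j < ny; j++)
    for (i = 0; i < nx; i++)
      {
        float sum = 0.0f;
        for (k = 0; k < nz; k++)
          sum += A[j * nz + k] * B[k * nx + i];
        C[j * nx + i] = sum;
      }
}
```

- With the plug-in loaded, the same call would instead be an asynchronous task invocation, and @var{A}, @var{B}, and @var{C} would first have to be registered.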
 
- CUDA and OpenCL implementations can be declared in a similar way:
 
- @cartouche
 
- @smallexample
 
- static void matmul_cuda (const float *A, const float *B, float *C,
 
-                          unsigned nx, unsigned ny, unsigned nz)
 
-   __attribute__ ((task_implementation ("cuda", matmul)));
 
- static void matmul_opencl (const float *A, const float *B, float *C,
 
-                            unsigned nx, unsigned ny, unsigned nz)
 
-   __attribute__ ((task_implementation ("opencl", matmul)));
 
- @end smallexample
 
- @end cartouche
 
- @noindent
 
- The CUDA and OpenCL implementations typically either invoke a kernel
 
- written in CUDA or OpenCL (for similar code, @pxref{CUDA Kernel} and

- @ref{OpenCL Kernel}), or call a library function that uses CUDA or
 
- OpenCL under the hood, such as CUBLAS functions:
 
- @cartouche
 
- @smallexample
 
- static void
 
- matmul_cuda (const float *A, const float *B, float *C,
 
-              unsigned nx, unsigned ny, unsigned nz)
 
- @{
 
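-   /* CUBLAS assumes column-major storage: computing C^T = B^T A^T

-      yields the row-major product C = A B.  */

-   cublasSgemm ('n', 'n', nx, ny, nz,

-                1.0f, B, nx, A, nz,

-                0.0f, C, nx);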
 
-   cudaStreamSynchronize (starpu_cuda_get_local_stream ());
 
- @}
 
- @end smallexample
 
- @end cartouche
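
- CUBLAS follows the BLAS convention of column-major storage, whereas the CPU implementation above indexes row-major buffers.  A row-major matrix occupies memory exactly like its transpose stored column-major.  A row-major product C = A B can therefore be obtained from a column-major GEMM by computing C^T = B^T A^T: pass @var{B} first (leading dimension @var{nx}), then @var{A} (leading dimension @var{nz}), with @var{C} as the result (leading dimension @var{nx}).  The plain-C sketch below checks this reasoning on the CPU.  @code{sgemm_nn_colmajor} is a hypothetical reference routine whose argument order mimics the legacy @code{cublasSgemm} in the no-transpose case; it is not a CUBLAS or StarPU function.

```c
/* Hypothetical column-major reference GEMM, no transposition:
   C (m x n) = alpha * A (m x k) * B (k x n) + beta * C,
   where element (r, c) of X is stored at X[r + c * ldX].  */
void
sgemm_nn_colmajor (int m, int n, int k, float alpha,
                   const float *A, int lda, const float *B, int ldb,
                   float beta, float *C, int ldc)
{
  int r, c, p;

  for (c = 0; c < n; c++)
    for (r = 0; r < m; r++)
      {
        float sum = 0.0f;
        for (p = 0; p < k; p++)
          sum += A[r + p * lda] * B[p + c * ldb];
        if (beta == 0.0f)
          C[r + c * ldc] = alpha * sum;   /* C need not be initialized.  */
        else
          C[r + c * ldc] = alpha * sum + beta * C[r + c * ldc];
      }
}

/* Row-major C (ny x nx) = A (ny x nz) * B (nz x nx), obtained from the
   column-major routine by computing C^T = B^T * A^T.  */
void
matmul_rowmajor (const float *A, const float *B, float *C,
                 unsigned nx, unsigned ny, unsigned nz)
{
  sgemm_nn_colmajor ((int) nx, (int) ny, (int) nz, 1.0f,
                     B, (int) nx, A, (int) nz,
                     0.0f, C, (int) nx);
}
```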
 
- A task can be invoked like a regular C function:
 
- @cartouche
 
- @smallexample
 
- matmul (&A[i * zdim * bydim + k * bzdim * bydim],
 
-         &B[k * xdim * bzdim + j * bxdim * bzdim],
 
-         &C[i * xdim * bydim + j * bxdim * bydim],
 
-         bxdim, bydim, bzdim);
 
- @end smallexample
 
- @end cartouche
 
- @noindent
 
- This leads to an @dfn{asynchronous invocation}, whereby @code{matmul}'s
 
- implementation may run in parallel with the continuation of the caller.
 
- Memory buffers passed to tasks must be registered beforehand

- (@pxref{Registered Data Buffers}).  For a complete example, see the
 
- @code{gcc-plugin/examples} directory of the source distribution, and
 
- @ref{Vector Scaling Using the C Extension, the vector-scaling
 
- example}.
 
- @node Synchronization and Other Pragmas
 
- @section Initialization, Termination, and Synchronization
 
- The following pragmas allow user code to control StarPU's lifetime and
 
- to synchronize with tasks.
 
- @table @code
 
- @item #pragma starpu initialize
 
- Initialize StarPU.  This pragma is compulsory and is @emph{never} added

- implicitly.  One reason it must be explicit is that it gives user

- code greater control over its resource usage.
 
- @item #pragma starpu shutdown
 
- Shut down StarPU, giving it an opportunity to write profiling info to a
 
- file on disk, for instance (@pxref{Off-line, off-line performance
 
- feedback}).
 
- @item #pragma starpu wait
 
- Wait for all task invocations to complete, as with
 
- @code{starpu_wait_for_all} (@pxref{Codelets and Tasks,
 
- starpu_wait_for_all}).
 
- @end table
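
- Put together, these pragmas typically bracket a program as in the following sketch.  The @code{add_delta} task and the @code{lifecycle_demo} function are hypothetical.  The @code{register} and @code{unregister} pragmas are described in the next section (@pxref{Registered Data Buffers}).  Without the plug-in, the pragmas are ignored (@command{gcc -Wall} warns about them) and the two calls simply run one after the other.

```c
#ifdef STARPU_GCC_PLUGIN
# define __task  __attribute__ ((task))
#else
# define __task
#endif

/* Hypothetical task: add DELTA to each of the N elements of V.  */
void add_delta (float *v, unsigned n, float delta) __task;

/* Implicit CPU implementation.  */
void
add_delta (float *v, unsigned n, float delta)
{
  unsigned i;

  for (i = 0; i < n; i++)
    v[i] += delta;
}

/* Hypothetical driver showing the life cycle of a StarPU-GCC program;
   returns the sum of the elements once all tasks are done.  */
float
lifecycle_demo (void)
{
  float v[4] = { 0.0f, 1.0f, 2.0f, 3.0f };
  float sum;

  /* Compulsory; never added implicitly.  */
#pragma starpu initialize

  /* V has an array type of known size, so SIZE can be omitted.  */
#pragma starpu register v

  /* Under the plug-in, both invocations are asynchronous.  */
  add_delta (v, 4, 1.0f);
  add_delta (v, 4, 1.0f);

  /* Block until all task invocations have completed.  */
#pragma starpu wait

  /* After unregistration, V is a valid copy in main memory.  */
#pragma starpu unregister v
  sum = v[0] + v[1] + v[2] + v[3];

#pragma starpu shutdown
  return sum;
}
```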
 
- @node Registered Data Buffers
 
- @section Registered Data Buffers
 
- Data buffers such as matrices and vectors that are to be passed to tasks
 
- must be @dfn{registered}.  Registration allows StarPU to handle data
 
- transfers among devices---e.g., transferring an input buffer from the
 
- CPU's main memory to a task scheduled to run on a GPU (@pxref{StarPU Data
 
- Management Library}).
 
- The following pragmas are provided:
 
- @table @code
 
- @item #pragma starpu register @var{ptr} [@var{size}]
 
- Register @var{ptr} as a @var{size}-element buffer.  When @var{ptr} has
 
- an array type whose size is known, @var{size} may be omitted.
 
- Alternatively, the @code{registered} attribute can be used (see below).
 
- @item #pragma starpu unregister @var{ptr}
 
- Unregister the previously-registered memory area pointed to by
 
- @var{ptr}.  As a side effect, @var{ptr} points to a valid copy in main
 
- memory.
 
- @item #pragma starpu acquire @var{ptr}
 
- Acquire in main memory an up-to-date copy of the previously-registered
 
- memory area pointed to by @var{ptr}, for read-write access.
 
- @item #pragma starpu release @var{ptr}
 
- Release the previously-registered memory area pointed to by @var{ptr},
 
- making it available to the tasks.
 
- @end table
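
- @code{acquire} and @code{release} are useful when the CPU must inspect a buffer between task invocations without unregistering it.  In the hypothetical sketch below, @code{halve_until_below} repeatedly invokes a @code{halve} task.  After each invocation it acquires the vector to test a stopping criterion, then releases it before the next invocation.  StarPU is assumed to have been initialized by the caller.  Without the plug-in, the pragmas are ignored and the loop is plain sequential code.

```c
#ifdef STARPU_GCC_PLUGIN
# define __task  __attribute__ ((task))
#else
# define __task
#endif

/* Hypothetical task: halve each of the N elements of V.  */
void halve (float *v, unsigned n) __task;

void
halve (float *v, unsigned n)
{
  unsigned i;

  for (i = 0; i < n; i++)
    v[i] /= 2.0f;
}

/* Hypothetical helper: invoke `halve' until V[0] drops below THRESHOLD,
   returning the number of invocations.  Assumes `#pragma starpu
   initialize' has already been executed by the caller.  */
unsigned
halve_until_below (float *v, unsigned n, float threshold)
{
  unsigned iterations = 0;
  int done = 0;

  /* V is a pointer, so its element count must be given.  */
#pragma starpu register v n

  while (!done)
    {
      halve (v, n);
      iterations++;

      /* Get an up-to-date copy of V in main memory...  */
#pragma starpu acquire v
      done = v[0] < threshold;
      /* ...and hand it back to the tasks before the next invocation.  */
#pragma starpu release v
    }

#pragma starpu unregister v
  return iterations;
}
```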
 
- Additionally, the following attributes offer a simple way to allocate
 
- and register storage for arrays:
 
- @table @code
 
- @item registered
 
- @cindex @code{registered} attribute
 
- This attribute applies to local variables with an array type.  Its
 
- effect is to automatically register the array's storage, as per
 
- @code{#pragma starpu register}.  The array is automatically unregistered
 
- when the variable's scope is left.  This attribute is typically used in
 
- conjunction with the @code{heap_allocated} attribute, described below.
 
- @item heap_allocated
 
- @cindex @code{heap_allocated} attribute
 
- This attribute applies to local variables with an array type.  Its
 
- effect is to automatically allocate the array's storage on
 
- the heap, using @code{starpu_malloc} under the hood (@pxref{Basic Data
 
- Management API, starpu_malloc}).  The heap-allocated array is automatically
 
- freed when the variable's scope is left, as with
 
- automatic variables.
 
- @end table
 
- @noindent
 
- The following example illustrates use of the @code{heap_allocated}
 
- attribute:
 
- @example
 
- #include <stdlib.h>

- extern void cholesky (unsigned nblocks, unsigned size,

-                       float mat[nblocks][nblocks][size])
 
-   __attribute__ ((task));
 
- int
 
- main (int argc, char *argv[])
 
- @{
 
- #pragma starpu initialize
 
-   /* ... */
 
-   int nblocks, size;
 
-   parse_args (&nblocks, &size);
 
-   /* Allocate an array of the required size on the heap,
 
-      and register it.  */
 
-   @{
 
-     float matrix[nblocks][nblocks][size]
 
-       __attribute__ ((heap_allocated, registered));
 
-     cholesky (nblocks, size, matrix);
 
- #pragma starpu wait
 
-   @}   /* MATRIX is automatically unregistered & freed here.  */
 
- #pragma starpu shutdown
 
-   return EXIT_SUCCESS;
 
- @}
 
- @end example
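
- To make the semantics of @code{heap_allocated} concrete, the sketch below spells out by hand roughly what the attribute amounts to for one variable: storage is allocated on the heap when the variable is declared and freed when its scope is left.  It uses plain @code{malloc} and @code{free} so that it runs without StarPU, whereas the plug-in uses @code{starpu_malloc}.  Combined with @code{registered}, the array would additionally be registered after allocation and unregistered when the scope is left.  @code{sum_blocks} is a hypothetical function.

```c
#include <stdlib.h>

/* Hypothetical: allocate an NBLOCKS x NBLOCKS x SIZE array the way
   `heap_allocated' would, fill element [i][j][k] with K, and return
   the sum of all elements (or -1.0 if allocation fails).  */
double
sum_blocks (unsigned nblocks, unsigned size)
{
  double total = 0.0;

  {
    /* Roughly what
         float matrix[nblocks][nblocks][size]
           __attribute__ ((heap_allocated));
       amounts to: a pointer to a variably modified type, pointing to
       heap storage (starpu_malloc under the plug-in; malloc here).  */
    float (*matrix)[nblocks][size]
      = malloc (sizeof (float[nblocks][nblocks][size]));
    unsigned i, j, k;

    if (matrix == NULL)
      return -1.0;

    for (i = 0; i < nblocks; i++)
      for (j = 0; j < nblocks; j++)
        for (k = 0; k < size; k++)
          matrix[i][j][k] = (float) k;

    for (i = 0; i < nblocks; i++)
      for (j = 0; j < nblocks; j++)
        for (k = 0; k < size; k++)
          total += matrix[i][j][k];

    /* Leaving the scope: this is where the attribute frees storage.  */
    free (matrix);
  }

  return total;
}
```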
 
- @node Conditional Extensions
 
- @section Using C Extensions Conditionally
 
- The C extensions described in this chapter are only available when GCC
 
- and its StarPU plug-in are in use.  Yet, it is possible to make use of
 
- these extensions when they are available---leading to hybrid CPU/GPU
 
- code---and discard them when they are not available---leading to valid
 
- sequential code.
 
- To that end, the GCC plug-in defines a C preprocessor macro when it is
 
- being used:
 
- @defmac STARPU_GCC_PLUGIN
 
- Defined for code being compiled with the StarPU GCC plug-in.  When
 
- defined, this macro expands to an integer denoting the version of the
 
- supported C extensions.
 
- @end defmac
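
- A translation unit can thus find out at compile time whether the plug-in was loaded.  This can be used to report it at run time, or to catch a build where the @command{pkg-config} command shown earlier returned the empty string.  In the sketch below, @code{c_extensions_version} and @code{describe_build} are hypothetical helpers, and returning 0 when the plug-in is absent is our own convention.

```c
#include <stdio.h>

/* Hypothetical helper: version of the C extensions supported by the
   StarPU GCC plug-in, or 0 when the file was compiled without it.  */
int
c_extensions_version (void)
{
#ifdef STARPU_GCC_PLUGIN
  return STARPU_GCC_PLUGIN;
#else
  return 0;
#endif
}

/* Hypothetical helper: write a one-line description of the build.  */
void
describe_build (char *buf, size_t len)
{
  int version = c_extensions_version ();

  if (version != 0)
    snprintf (buf, len, "StarPU C extensions, version %d", version);
  else
    snprintf (buf, len, "sequential build (no StarPU GCC plug-in)");
}
```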
 
- The code below illustrates how to define a task and its implementations
 
- in a way that allows it to be compiled without the GCC plug-in:
 
- @smallexample
 
- /* This program is valid, whether or not StarPU's GCC plug-in
 
-    is being used.  */
 
- #include <stdlib.h>
 
- /* The attribute below is ignored by GCC when StarPU's GCC

-    plug-in is not used.  */
 
- static void matmul (const float *A, const float *B, float * C,
 
-                     unsigned nx, unsigned ny, unsigned nz)
 
-   __attribute__ ((task));
 
- static void
 
- matmul (const float *A, const float *B, float * C,
 
-         unsigned nx, unsigned ny, unsigned nz)
 
- @{
 
-   /* Code of the CPU kernel here...  */
 
- @}
 
- #ifdef STARPU_GCC_PLUGIN
 
- /* Optional OpenCL task implementation.  */
 
- static void matmul_opencl (const float *A, const float *B, float * C,
 
-                            unsigned nx, unsigned ny, unsigned nz)
 
-   __attribute__ ((task_implementation ("opencl", matmul)));
 
- static void
 
- matmul_opencl (const float *A, const float *B, float * C,
 
-                unsigned nx, unsigned ny, unsigned nz)
 
- @{
 
-   /* Code that invokes the OpenCL kernel here...  */
 
- @}
 
- #endif
 
- int
 
- main (int argc, char *argv[])
 
- @{
 
-   /* The pragmas below are simply ignored when StarPU-GCC
 
-      is not used.  */
 
- #pragma starpu initialize
 
-   float A[123][42][7], B[123][42][7], C[123][42][7];
 
- #pragma starpu register A
 
- #pragma starpu register B
 
- #pragma starpu register C
 
-   /* When StarPU-GCC is used, the call below is asynchronous;
 
-      otherwise, it is synchronous.  */
 
-   matmul ((float *) A, (float *) B, (float *) C, 123, 42, 7);
 
- #pragma starpu wait
 
- #pragma starpu shutdown
 
-   return EXIT_SUCCESS;
 
- @}
 
- @end smallexample
 
- @noindent
 
- The above program is a valid StarPU program when StarPU's GCC plug-in is
 
- used; it is also a valid sequential program when the plug-in is not
 
- used.
 
- Note that attributes such as @code{task} as well as @code{starpu}
 
- pragmas are simply ignored by GCC when the StarPU plug-in is not loaded.
 
- However, @command{gcc -Wall} emits a warning for unknown attributes and
 
- pragmas, which can be inconvenient.  In addition, other compilers may be
 
- unable to parse the attribute syntax@footnote{In practice, Clang and
 
- several proprietary compilers implement attributes.}, so you may want to
 
- wrap attributes in macros like this:
 
- @smallexample
 
- /* Use the `task' attribute only when StarPU's GCC plug-in
 
-    is available.   */
 
- #ifdef STARPU_GCC_PLUGIN
 
- # define __task  __attribute__ ((task))
 
- #else
 
- # define __task
 
- #endif
 
- static void matmul (const float *A, const float *B, float *C,
 
-                     unsigned nx, unsigned ny, unsigned nz) __task;
 
- @end smallexample
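
- The @code{starpu} pragmas can be hidden the same way.  Since a @code{#pragma} directive cannot appear in a macro body, the sketch below uses the C99 @code{_Pragma} operator.  @code{OPTIONAL_PRAGMA} and @code{scaled_sum} are hypothetical names.  We assume that the plug-in also recognizes its pragmas when they are written with @code{_Pragma}; this is worth checking with your GCC version.

```c
/* Use the `task' attribute and the `starpu' pragmas only when StarPU's
   GCC plug-in is available.  OPTIONAL_PRAGMA is a hypothetical helper
   macro, not part of StarPU.  */
#ifdef STARPU_GCC_PLUGIN
# define __task  __attribute__ ((task))
# define OPTIONAL_PRAGMA(directive)  _Pragma (#directive)
#else
# define __task
# define OPTIONAL_PRAGMA(directive)
#endif

/* Hypothetical task: multiply each of the N elements of V by FACTOR.  */
void scale (float *v, unsigned n, float factor) __task;

void
scale (float *v, unsigned n, float factor)
{
  unsigned i;

  for (i = 0; i < n; i++)
    v[i] *= factor;
}

/* Hypothetical driver; each OPTIONAL_PRAGMA (...) expands to
   _Pragma ("starpu ...") under the plug-in, and to nothing otherwise.  */
float
scaled_sum (void)
{
  float v[3] = { 1.0f, 2.0f, 3.0f };

  OPTIONAL_PRAGMA (starpu initialize)
  OPTIONAL_PRAGMA (starpu register v)
  scale (v, 3, 2.0f);
  /* After unregistration, V is a valid copy in main memory.  */
  OPTIONAL_PRAGMA (starpu unregister v)
  OPTIONAL_PRAGMA (starpu shutdown)

  return v[0] + v[1] + v[2];
}
```

- Without the plug-in, @code{OPTIONAL_PRAGMA} expands to nothing, so @command{gcc -Wall} has no unknown pragma to warn about.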
 
- @c Local Variables:
 
- @c TeX-master: "../starpu.texi"
 
- @c ispell-local-dictionary: "american"
 
- @c End:
 
 