@c -*-texinfo-*-

@c This file is part of the StarPU Handbook.
@c Copyright (C) 2011, 2012 Institut National de Recherche en Informatique et Automatique
@c See the file starpu.texi for copying conditions.

@cindex C extensions
@cindex GCC plug-in

When GCC plug-in support is available, StarPU builds a plug-in for the
GNU Compiler Collection (GCC), which defines extensions to languages of
the C family (C, C++, Objective-C) that make it easier to write StarPU
code@footnote{This feature is only available for GCC 4.5 and later; it
is known to work with GCC 4.5, 4.6, and 4.7.  You may need to install a
specific @code{-dev} package of your distro, such as
@code{gcc-4.6-plugin-dev} on Debian and derivatives.  In addition, the
plug-in's test suite is only run when
@url{http://www.gnu.org/software/guile/, GNU@tie{}Guile} is found at
@code{configure}-time.  Building the GCC plug-in can be disabled by
configuring with @code{--disable-gcc-extensions}.}.  Those extensions
include syntactic sugar for defining tasks and their implementations,
invoking a task, and manipulating data buffers.  Use of these extensions
can be made conditional on the availability of the plug-in, leading to
valid sequential C code when the plug-in is not used
(@pxref{Conditional Extensions}).

When StarPU has been installed with its GCC plug-in, programs that use
these extensions can be compiled this way:

@example
$ gcc -c -fplugin=`pkg-config starpu-1.1 --variable=gccplugin` foo.c
@end example

@noindent
When the plug-in is not available, the above @command{pkg-config}
command returns the empty string.

In addition, the @code{-fplugin-arg-starpu-verbose} flag can be used to
obtain feedback from the compiler as it analyzes the C extensions used
in source files.

This chapter describes the C extensions implemented by StarPU's GCC
plug-in.  It does not require detailed knowledge of the StarPU library.

Note: as of StarPU @value{VERSION}, this is still an area under
development and subject to change.
@menu
* Defining Tasks::              Defining StarPU tasks
* Synchronization and Other Pragmas:: Synchronization, and more.
* Registered Data Buffers::     Manipulating data buffers
* Conditional Extensions::      Using C extensions only when available
@end menu

@node Defining Tasks
@section Defining Tasks

@cindex task
@cindex task implementation

The StarPU GCC plug-in views @dfn{tasks} as ``extended'' C functions:

@enumerate
@item
tasks may have several implementations---e.g., one for CPUs, one written
in OpenCL, one written in CUDA;
@item
tasks may have several implementations of the same target---e.g.,
several CPU implementations;
@item
when a task is invoked, it may run in parallel, and StarPU is free to
choose any of its implementations.
@end enumerate

Tasks and their implementations must be @emph{declared}.  These
declarations are annotated with @dfn{attributes}
(@pxref{Attribute Syntax, attributes in GNU C,, gcc, Using the GNU Compiler Collection (GCC)}):
the declaration of a task is a regular C function declaration with an
additional @code{task} attribute, and task implementations are declared
with a @code{task_implementation} attribute.

The following function attributes are provided:

@table @code

@item task
@cindex @code{task} attribute
Declare the given function as a StarPU task.  Its return type must be
@code{void}.  When a function declared as @code{task} has a user-defined
body, that body is interpreted as the @dfn{implicit definition of the
task's CPU implementation} (see example below).  In all cases, the
actual definition of a task's body is automatically generated by the
compiler.

Under the hood, declaring a task leads to the declaration of the
corresponding @code{codelet} (@pxref{Codelets and Tasks}).  If one or
more task implementations are declared in the same compilation unit,
then the codelet and the function itself are also defined; they inherit
the scope of the task.
Scalar arguments to the task are passed by value and copied to the
target device if need be---technically, they are passed as the
@code{cl_arg} buffer (@pxref{Codelets and Tasks, @code{cl_arg}}).

@cindex @code{output} type attribute
Pointer arguments are assumed to be registered data buffers---the
@code{buffers} argument of a task
(@pxref{Codelets and Tasks, @code{buffers}}); @code{const}-qualified
pointer arguments are viewed as read-only buffers (@code{STARPU_R}), and
non-@code{const}-qualified buffers are assumed to be used read-write
(@code{STARPU_RW}).  In addition, the @code{output} type attribute can
be used as a type qualifier for output pointer or array parameters
(@code{STARPU_W}).

@item task_implementation (@var{target}, @var{task})
@cindex @code{task_implementation} attribute
Declare the given function as an implementation of @var{task} to run on
@var{target}.  @var{target} must be a string, currently one of
@code{"cpu"}, @code{"opencl"}, or @code{"cuda"}.
@c FIXME: Update when OpenCL support is ready.

@end table

Here is an example:

@cartouche
@smallexample
#define __output  __attribute__ ((output))

static void matmul (const float *A, const float *B,
                    __output float *C,
                    unsigned nx, unsigned ny, unsigned nz)
  __attribute__ ((task));

static void matmul_cpu (const float *A, const float *B,
                        __output float *C,
                        unsigned nx, unsigned ny, unsigned nz)
  __attribute__ ((task_implementation ("cpu", matmul)));


static void
matmul_cpu (const float *A, const float *B, __output float *C,
            unsigned nx, unsigned ny, unsigned nz)
@{
  unsigned i, j, k;

  for (j = 0; j < ny; j++)
    for (i = 0; i < nx; i++)
      @{
        /* C is write-only: start each element from zero.  */
        C[j * nx + i] = 0.0f;
        for (k = 0; k < nz; k++)
          C[j * nx + i] += A[j * nz + k] * B[k * nx + i];
      @}
@}
@end smallexample
@end cartouche

@noindent
A @code{matmul} task is defined; it has only one implementation,
@code{matmul_cpu}, which runs on the CPU.  Variables @var{A} and
@var{B} are input buffers, whereas @var{C}, declared with the
@code{output} attribute, is an output-only buffer.
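
The argument-passing rules listed under the @code{task} attribute can be
gathered in a single declaration.  The sketch below is illustrative: the
task name @code{axpy_out}, its parameters, and the @code{__task} and
@code{__output} wrapper macros are hypothetical.  The wrappers expand to
nothing when the plug-in is not loaded, so the file also builds as plain
sequential C; the comments give the access mode each parameter should
receive under the plug-in, assuming the rules above.

@cartouche
@smallexample
/* Hypothetical wrappers: plug-in attributes only when available.  */
#ifdef STARPU_GCC_PLUGIN
# define __task    __attribute__ ((task))
# define __output  __attribute__ ((output))
#else
# define __task
# define __output
#endif

/* z[i] = a * x[i] + y[i]; then y[i] is doubled.
   Expected treatment under the plug-in:
     x     const-qualified pointer -> read-only buffer (STARPU_R)
     y     plain pointer           -> read-write buffer (STARPU_RW)
     z     `output' pointer        -> write-only buffer (STARPU_W)
     n, a  scalars                 -> passed by value in cl_arg  */
static void axpy_out (const float *x, float *y, __output float *z,
                      unsigned n, float a) __task;

/* Implicit CPU implementation of the task.  */
static void
axpy_out (const float *x, float *y, __output float *z,
          unsigned n, float a)
@{
  unsigned i;

  for (i = 0; i < n; i++)
    @{
      z[i] = a * x[i] + y[i];   /* z is only written */
      y[i] = 2.0f * y[i];       /* y is read, then written */
    @}
@}
@end smallexample
@end cartouche
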
@cindex implicit task CPU implementation
For convenience, when a function declared with the @code{task} attribute
has a user-defined body, that body is assumed to be that of the CPU
implementation of a task, which we call an @dfn{implicit task CPU
implementation}.  Thus, the above snippet can be simplified like this:

@cartouche
@smallexample
#define __output  __attribute__ ((output))

static void matmul (const float *A, const float *B,
                    __output float *C,
                    unsigned nx, unsigned ny, unsigned nz)
  __attribute__ ((task));

/* Implicit definition of the CPU implementation of the
   `matmul' task.  */
static void
matmul (const float *A, const float *B, __output float *C,
        unsigned nx, unsigned ny, unsigned nz)
@{
  unsigned i, j, k;

  for (j = 0; j < ny; j++)
    for (i = 0; i < nx; i++)
      @{
        /* C is write-only: start each element from zero.  */
        C[j * nx + i] = 0.0f;
        for (k = 0; k < nz; k++)
          C[j * nx + i] += A[j * nz + k] * B[k * nx + i];
      @}
@}
@end smallexample
@end cartouche

@noindent
Use of implicit CPU task implementations as above has the advantage that
the code is valid sequential code when StarPU's GCC plug-in is not used
(@pxref{Conditional Extensions}).
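
Because an implicit CPU implementation is an ordinary function body, its
sequential behavior can be checked directly.  The sketch below is meant
for builds @emph{without} the plug-in: the @code{__task} and
@code{__output} wrapper macros and the @code{check_matmul} function are
hypothetical, and the 2-by-2 input values are arbitrary.  Its kernel
zeroes each element of @var{C} before accumulating into it, since an
@code{output} buffer is write-only.

@cartouche
@smallexample
/* Hypothetical wrappers: plug-in attributes only when available.  */
#ifdef STARPU_GCC_PLUGIN
# define __task    __attribute__ ((task))
# define __output  __attribute__ ((output))
#else
# define __task
# define __output
#endif

static void matmul (const float *A, const float *B, __output float *C,
                    unsigned nx, unsigned ny, unsigned nz) __task;

/* Implicit CPU implementation of `matmul'.  */
static void
matmul (const float *A, const float *B, __output float *C,
        unsigned nx, unsigned ny, unsigned nz)
@{
  unsigned i, j, k;

  for (j = 0; j < ny; j++)
    for (i = 0; i < nx; i++)
      @{
        C[j * nx + i] = 0.0f;
        for (k = 0; k < nz; k++)
          C[j * nx + i] += A[j * nz + k] * B[k * nx + i];
      @}
@}

/* Sequential builds only: multiply two 2x2 matrices laid out as in
   the kernel (nx = ny = nz = 2) and return 1 if the result matches
   the hand-computed product [[19, 22], [43, 50]].  */
static int
check_matmul (void)
@{
  const float A[4] = @{ 1, 2, 3, 4 @};
  const float B[4] = @{ 5, 6, 7, 8 @};
  const float expected[4] = @{ 19, 22, 43, 50 @};
  float C[4];
  int i;

  matmul (A, B, C, 2, 2, 2);   /* a plain synchronous call here */
  for (i = 0; i < 4; i++)
    if (C[i] != expected[i])
      return 0;
  return 1;
@}
@end smallexample
@end cartouche

@noindent
With the plug-in loaded, the same call would be asynchronous and its
buffers would first have to be registered
(@pxref{Registered Data Buffers}), so @code{check_matmul} as written
only applies to sequential builds.
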
CUDA and OpenCL implementations can be declared in a similar way:

@cartouche
@smallexample
static void matmul_cuda (const float *A, const float *B, float *C,
                         unsigned nx, unsigned ny, unsigned nz)
  __attribute__ ((task_implementation ("cuda", matmul)));

static void matmul_opencl (const float *A, const float *B, float *C,
                           unsigned nx, unsigned ny, unsigned nz)
  __attribute__ ((task_implementation ("opencl", matmul)));
@end smallexample
@end cartouche

@noindent
The CUDA and OpenCL implementations typically either invoke a kernel
written in CUDA or OpenCL (for similar code, @pxref{CUDA Kernel}, and
@pxref{OpenCL Kernel}), or call a library function that uses CUDA or
OpenCL under the hood, such as CUBLAS functions:

@cartouche
@smallexample
static void
matmul_cuda (const float *A, const float *B, float *C,
             unsigned nx, unsigned ny, unsigned nz)
@{
  /* CUBLAS assumes column-major storage, so the row-major product
     C = A B computed by the CPU kernel is obtained as the
     column-major product C = B A, with leading dimensions
     nx (B), nz (A), and nx (C).  */
  cublasSgemm ('n', 'n', nx, ny, nz,
               1.0f, B, nx, A, nz,
               0.0f, C, nx);
  cudaStreamSynchronize (starpu_cuda_get_local_stream ());
@}
@end smallexample
@end cartouche

A task can be invoked like a regular C function:

@cartouche
@smallexample
matmul (&A[i * zdim * bydim + k * bzdim * bydim],
        &B[k * xdim * bzdim + j * bxdim * bzdim],
        &C[i * xdim * bydim + j * bxdim * bydim],
        bxdim, bydim, bzdim);
@end smallexample
@end cartouche

@noindent
This leads to an @dfn{asynchronous invocation}, whereby @code{matmul}'s
implementation may run in parallel with the continuation of the caller.

@xref{Registered Data Buffers}, for how memory buffers must be handled
in StarPU-GCC code.  For a complete example, see the
@code{gcc-plugin/examples} directory of the source distribution, and
@ref{Vector Scaling Using the C Extension, the vector-scaling example}.

@node Synchronization and Other Pragmas
@section Initialization, Termination, and Synchronization

The following pragmas allow user code to control StarPU's lifetime and
to synchronize with tasks.

@table @code

@item #pragma starpu initialize
Initialize StarPU.
This call is compulsory and is @emph{never} added implicitly.  One of
the reasons this has to be done explicitly is that it provides greater
control to user code over its resource usage.

@item #pragma starpu shutdown
Shut down StarPU, giving it an opportunity to write profiling info to a
file on disk, for instance
(@pxref{Off-line, off-line performance feedback}).

@item #pragma starpu wait
Wait for all task invocations to complete, as with
@code{starpu_wait_for_all}
(@pxref{Codelets and Tasks, starpu_wait_for_all}).

@end table

@node Registered Data Buffers
@section Registered Data Buffers

Data buffers such as matrices and vectors that are to be passed to tasks
must be @dfn{registered}.  Registration allows StarPU to handle data
transfers among devices---e.g., transferring an input buffer from the
CPU's main memory to a task scheduled to run on a GPU
(@pxref{StarPU Data Management Library}).

The following pragmas are provided:

@table @code

@item #pragma starpu register @var{ptr} [@var{size}]
Register @var{ptr} as a @var{size}-element buffer.  When @var{ptr} has
an array type whose size is known, @var{size} may be omitted.
Alternatively, the @code{registered} attribute can be used (see below).

@item #pragma starpu unregister @var{ptr}
Unregister the previously-registered memory area pointed to by
@var{ptr}.  As a side effect, the memory area pointed to by @var{ptr}
holds a valid copy of the data in main memory.

@item #pragma starpu acquire @var{ptr}
Acquire in main memory an up-to-date copy of the previously-registered
memory area pointed to by @var{ptr}, for read-write access.

@item #pragma starpu release @var{ptr}
Release the previously-registered memory area pointed to by @var{ptr},
making it available to the tasks.

@end table

Additionally, the following attributes offer a simple way to allocate
and register storage for arrays:

@table @code

@item registered
@cindex @code{registered} attribute
This attribute applies to local variables with an array type.
Its effect is to automatically register the array's storage, as per
@code{#pragma starpu register}.  The array is automatically unregistered
when the variable's scope is left.  This attribute is typically used in
conjunction with the @code{heap_allocated} attribute, described below.

@item heap_allocated
@cindex @code{heap_allocated} attribute
This attribute applies to local variables with an array type.  Its
effect is to automatically allocate the array's storage on the heap,
using @code{starpu_malloc} under the hood
(@pxref{Basic Data Management API, starpu_malloc}).  The heap-allocated
array is automatically freed when the variable's scope is left, as with
automatic variables.

@end table

@noindent
The following example illustrates use of the @code{heap_allocated} and
@code{registered} attributes:

@example
extern void cholesky (unsigned nblocks, unsigned size,
                      float mat[nblocks][nblocks][size])
  __attribute__ ((task));

int
main (int argc, char *argv[])
@{
#pragma starpu initialize

  /* ... */

  int nblocks, size;
  parse_args (&nblocks, &size);

  /* Allocate an array of the required size on the heap,
     and register it.  */

  @{
    float matrix[nblocks][nblocks][size]
      __attribute__ ((heap_allocated, registered));

    cholesky (nblocks, size, matrix);

#pragma starpu wait

  @}   /* MATRIX is automatically unregistered & freed here.  */

#pragma starpu shutdown

  return EXIT_SUCCESS;
@}
@end example

@node Conditional Extensions
@section Using C Extensions Conditionally

The C extensions described in this chapter are only available when GCC
and its StarPU plug-in are in use.  Yet, it is possible to make use of
these extensions when they are available---leading to hybrid CPU/GPU
code---and discard them when they are not available---leading to valid
sequential code.

To that end, the GCC plug-in defines a C preprocessor macro when it is
being used:

@defmac STARPU_GCC_PLUGIN
Defined for code being compiled with the StarPU GCC plug-in.
When defined, this macro expands to an integer denoting the version of
the supported C extensions.
@end defmac

The code below illustrates how to define a task and its implementations
in a way that allows it to be compiled without the GCC plug-in:

@smallexample
/* This program is valid, whether or not StarPU's GCC plug-in
   is being used.  */

#include <stdlib.h>

/* The attribute below is ignored when StarPU's GCC plug-in
   is not used.  */
static void matmul (const float *A, const float *B, float * C,
                    unsigned nx, unsigned ny, unsigned nz)
  __attribute__ ((task));

static void
matmul (const float *A, const float *B, float * C,
        unsigned nx, unsigned ny, unsigned nz)
@{
  /* Code of the CPU kernel here...  */
@}

#ifdef STARPU_GCC_PLUGIN
/* Optional OpenCL task implementation.  */

static void matmul_opencl (const float *A, const float *B, float * C,
                           unsigned nx, unsigned ny, unsigned nz)
  __attribute__ ((task_implementation ("opencl", matmul)));

static void
matmul_opencl (const float *A, const float *B, float * C,
               unsigned nx, unsigned ny, unsigned nz)
@{
  /* Code that invokes the OpenCL kernel here...  */
@}
#endif

int
main (int argc, char *argv[])
@{
  /* The pragmas below are simply ignored when StarPU-GCC
     is not used.  */
#pragma starpu initialize

  float A[123][42][7], B[123][42][7], C[123][42][7];

#pragma starpu register A
#pragma starpu register B
#pragma starpu register C

  /* When StarPU-GCC is used, the call below is asynchronous;
     otherwise, it is synchronous.  */
  matmul ((float *) A, (float *) B, (float *) C, 123, 42, 7);

#pragma starpu wait

#pragma starpu unregister A
#pragma starpu unregister B
#pragma starpu unregister C

#pragma starpu shutdown

  return EXIT_SUCCESS;
@}
@end smallexample

@noindent
The above program is a valid StarPU program when StarPU's GCC plug-in is
used; it is also a valid sequential program when the plug-in is not
used.

Note that attributes such as @code{task} as well as @code{starpu}
pragmas are simply ignored by GCC when the StarPU plug-in is not loaded.
However, @command{gcc -Wall} emits a warning for unknown attributes and
pragmas, which can be inconvenient.
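
Because @code{STARPU_GCC_PLUGIN} is only defined under the plug-in, it
can also keep @code{starpu} pragmas out of sequential builds, which
avoids those warnings.  The sketch below relies on the standard C99
@code{_Pragma} operator; the @code{STARPU_PRAGMA} and @code{__task}
macros and the @code{c_extensions_version}, @code{vector_scale}, and
@code{scaled_sum} functions are hypothetical, and it assumes that the
plug-in handles a pragma produced by @code{_Pragma} like its
@code{#pragma} spelling.  It also reports the value of
@code{STARPU_GCC_PLUGIN}, or 0 in a sequential build.

@cartouche
@smallexample
/* Hypothetical helpers: emit StarPU attributes and pragmas only when
   the plug-in is in use (assumes the plug-in accepts _Pragma).  */
#ifdef STARPU_GCC_PLUGIN
# define __task  __attribute__ ((task))
# define STARPU_PRAGMA(text)  _Pragma (#text)
#else
# define __task
# define STARPU_PRAGMA(text)
#endif

/* Version of the C extensions, or 0 in a sequential build.  */
static int
c_extensions_version (void)
@{
#ifdef STARPU_GCC_PLUGIN
  return STARPU_GCC_PLUGIN;
#else
  return 0;
#endif
@}

/* Multiply each element of V by FACTOR; V is a read-write buffer.  */
static void vector_scale (float *v, unsigned n, float factor) __task;

static void
vector_scale (float *v, unsigned n, float factor)
@{
  unsigned i;

  for (i = 0; i < n; i++)
    v[i] *= factor;
@}

/* Register V, scale it with a task, read it back in main memory,
   then unregister it.  Returns the sum of the scaled elements.  */
static float
scaled_sum (float *v, unsigned n, float factor)
@{
  float sum = 0.0f;
  unsigned i;

  STARPU_PRAGMA (starpu register v n)
  vector_scale (v, n, factor);      /* asynchronous under the plug-in */

  STARPU_PRAGMA (starpu acquire v)  /* up-to-date copy in main memory */
  for (i = 0; i < n; i++)
    sum += v[i];
  STARPU_PRAGMA (starpu release v)

  STARPU_PRAGMA (starpu unregister v)
  return sum;
@}
@end smallexample
@end cartouche

@noindent
A caller would still bracket @code{scaled_sum} with
@code{STARPU_PRAGMA (starpu initialize)} and
@code{STARPU_PRAGMA (starpu shutdown)}; in a sequential build every
@code{STARPU_PRAGMA} expands to nothing and the function simply returns
the sum of the scaled elements.
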
In addition, other compilers may be unable to parse the attribute
syntax@footnote{In practice, Clang and several proprietary compilers
implement attributes.}, so you may want to wrap attributes in macros
like this:

@smallexample
/* Use the `task' attribute only when StarPU's GCC plug-in
   is available.  */
#ifdef STARPU_GCC_PLUGIN
# define __task  __attribute__ ((task))
#else
# define __task
#endif

static void matmul (const float *A, const float *B, float *C,
                    unsigned nx, unsigned ny, unsigned nz) __task;
@end smallexample


@c Local Variables:
@c TeX-master: "../starpu.texi"
@c ispell-local-dictionary: "american"
@c End: