- /*
- * This file is part of the StarPU Handbook.
- * Copyright (C) 2009--2011 Universit@'e de Bordeaux
- * Copyright (C) 2010, 2011, 2012, 2013 Centre National de la Recherche Scientifique
- * Copyright (C) 2011, 2012 Institut National de Recherche en Informatique et Automatique
- * See the file version.doxy for copying conditions.
- */
- /*! \page cExtensions C Extensions
- When GCC plug-in support is available, StarPU builds a plug-in for the
- GNU Compiler Collection (GCC), which defines extensions to languages of
- the C family (C, C++, Objective-C) that make it easier to write StarPU
- code. This feature is only available for GCC 4.5 and later; it
- is known to work with GCC 4.5, 4.6, and 4.7. You
- may need to install a specific <c>-dev</c> package of your distro, such
- as <c>gcc-4.6-plugin-dev</c> on Debian and derivatives. In addition,
- the plug-in's test suite is only run when <a href="http://www.gnu.org/software/guile/">GNU Guile</a> is found at
- <c>configure</c>-time. Building the GCC plug-in
- can be disabled by configuring with \ref disable-gcc-extensions "--disable-gcc-extensions".
- Those extensions include syntactic sugar for defining
- tasks and their implementations, invoking a task, and manipulating data
- buffers. Use of these extensions can be made conditional on the
- availability of the plug-in, leading to valid C sequential code when the
- plug-in is not used (\ref UsingCExtensionsConditionally).
- When StarPU has been installed with its GCC plug-in, programs that use
- these extensions can be compiled this way:
- \verbatim
- $ gcc -c -fplugin=`pkg-config starpu-1.2 --variable=gccplugin` foo.c
- \endverbatim
- When the plug-in is not available, the above <c>pkg-config</c>
- command returns the empty string.
- In addition, the <c>-fplugin-arg-starpu-verbose</c> flag can be used to
- obtain feedback from the compiler as it analyzes the C extensions used
- in source files.
- This section describes the C extensions implemented by StarPU's GCC
- plug-in. It does not require detailed knowledge of the StarPU library.
- Note: this is still an area under development and subject to change.
- \section DefiningTasks Defining Tasks
- The StarPU GCC plug-in views tasks as "extended" C functions:
- <ul>
- <li>
- tasks may have several implementations---e.g., one for CPUs, one written
- in OpenCL, one written in CUDA;
- </li>
- <li>
- tasks may have several implementations for the same target---e.g.,
- several CPU implementations;
- </li>
- <li>
- when a task is invoked, it may run in parallel, and StarPU is free to
- choose any of its implementations.
- </li>
- </ul>
- Tasks and their implementations must be <em>declared</em>. These
- declarations are annotated with attributes
- (http://gcc.gnu.org/onlinedocs/gcc/Attribute-Syntax.html#Attribute-Syntax):
- the declaration of a task is a regular C function declaration with an
- additional <c>task</c> attribute, and task implementations are
- declared with a <c>task_implementation</c> attribute.
- The following function attributes are provided:
- <dl>
- <dt><c>task</c></dt>
- <dd>
- Declare the given function as a StarPU task. Its return type must be
- <c>void</c>. When a function declared as <c>task</c> has a user-defined
- body, that body is interpreted as the implicit definition of the
- task's CPU implementation (see example below). In all cases, the
- actual definition of a task's body is automatically generated by the
- compiler.
- Under the hood, declaring a task leads to the declaration of the
- corresponding <c>codelet</c> (\ref CodeletAndTasks). If one or
- more task implementations are declared in the same compilation unit,
- then the codelet and the function itself are also defined; they inherit
- the scope of the task.
- Scalar arguments to the task are passed by value and copied to the
- target device if need be---technically, they are passed as the buffer
- starpu_task::cl_arg (\ref CodeletAndTasks).
- Pointer arguments are assumed to be registered data buffers---the
- handles argument of a task (starpu_task::handles); <c>const</c>-qualified
- pointer arguments are viewed as read-only buffers (::STARPU_R), and
- non-<c>const</c>-qualified buffers are assumed to be used read-write
- (::STARPU_RW). In addition, the <c>output</c> type attribute can be
- used as a type qualifier for output pointer or array parameters
- (::STARPU_W).
- </dd>
- <dt><c>task_implementation (target, task)</c></dt>
- <dd>
- Declare the given function as an implementation of <c>task</c> to run on
- <c>target</c>. <c>target</c> must be a string, currently one of
- <c>"cpu"</c>, <c>"opencl"</c>, or <c>"cuda"</c>.
- // FIXME: Update when OpenCL support is ready.
- </dd>
- </dl>
- Here is an example:
- \code{.c}
- #define __output __attribute__ ((output))
- static void matmul (const float *A, const float *B,
- __output float *C,
- unsigned nx, unsigned ny, unsigned nz)
- __attribute__ ((task));
- static void matmul_cpu (const float *A, const float *B,
- __output float *C,
- unsigned nx, unsigned ny, unsigned nz)
- __attribute__ ((task_implementation ("cpu", matmul)));
- static void
- matmul_cpu (const float *A, const float *B, __output float *C,
- unsigned nx, unsigned ny, unsigned nz)
- {
- unsigned i, j, k;
- for (j = 0; j < ny; j++)
- for (i = 0; i < nx; i++)
- {
- for (k = 0; k < nz; k++)
- C[j * nx + i] += A[j * nz + k] * B[k * nx + i];
- }
- }
- \endcode
- A <c>matmul</c> task is defined; it has only one implementation,
- <c>matmul_cpu</c>, which runs on the CPU. Variables <c>A</c> and
- <c>B</c> are input buffers, whereas <c>C</c>, which is qualified with
- the <c>output</c> attribute, is considered an output buffer.
- For convenience, when a function declared with the <c>task</c> attribute
- has a user-defined body, that body is assumed to be that of the CPU
- implementation of a task, which we call an implicit task CPU
- implementation. Thus, the above snippet can be simplified like this:
- \code{.c}
- #define __output __attribute__ ((output))
- static void matmul (const float *A, const float *B,
- __output float *C,
- unsigned nx, unsigned ny, unsigned nz)
- __attribute__ ((task));
- /* Implicit definition of the CPU implementation of the
- `matmul' task. */
- static void
- matmul (const float *A, const float *B, __output float *C,
- unsigned nx, unsigned ny, unsigned nz)
- {
- unsigned i, j, k;
- for (j = 0; j < ny; j++)
- for (i = 0; i < nx; i++)
- {
- for (k = 0; k < nz; k++)
- C[j * nx + i] += A[j * nz + k] * B[k * nx + i];
- }
- }
- \endcode
- Use of implicit CPU task implementations as above has the advantage that
- the code is valid sequential code when StarPU's GCC plug-in is not used
- (\ref UsingCExtensionsConditionally).
- CUDA and OpenCL implementations can be declared in a similar way:
- \code{.c}
- static void matmul_cuda (const float *A, const float *B, float *C,
- unsigned nx, unsigned ny, unsigned nz)
- __attribute__ ((task_implementation ("cuda", matmul)));
- static void matmul_opencl (const float *A, const float *B, float *C,
- unsigned nx, unsigned ny, unsigned nz)
- __attribute__ ((task_implementation ("opencl", matmul)));
- \endcode
- The CUDA and OpenCL implementations typically either invoke a kernel
- written in CUDA or OpenCL (for similar code, see \ref CUDAKernel and
- \ref OpenCLKernel), or call a library function that uses CUDA or
- OpenCL under the hood, such as CUBLAS functions:
- \code{.c}
- static void
- matmul_cuda (const float *A, const float *B, float *C,
- unsigned nx, unsigned ny, unsigned nz)
- {
- cublasSgemm ('n', 'n', nx, ny, nz,
- 1.0f, A, 0, B, 0,
- 0.0f, C, 0);
- cudaStreamSynchronize (starpu_cuda_get_local_stream ());
- }
- \endcode
- A task can be invoked like a regular C function:
- \code{.c}
- matmul (&A[i * zdim * bydim + k * bzdim * bydim],
- &B[k * xdim * bzdim + j * bxdim * bzdim],
- &C[i * xdim * bydim + j * bxdim * bydim],
- bxdim, bydim, bzdim);
- \endcode
- This leads to an asynchronous invocation, whereby <c>matmul</c>'s
- implementation may run in parallel with the continuation of the caller.
- The next section describes how memory buffers must be handled in
- StarPU-GCC code. For a complete example, see the
- <c>gcc-plugin/examples</c> directory of the source distribution, and
- \ref VectorScalingUsingTheCExtension.
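The argument-passing rules of this section can be summarized in a small sketch. It is only an illustration: the <c>scale_add</c> task and the <c>TASK</c> and <c>OUTPUT</c> macros are hypothetical names that are not part of StarPU, and the macros expand to nothing when the plug-in is not loaded so that the file also builds as plain sequential C. When the plug-in is in use, callers are assumed to have registered the buffers beforehand.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper macros (not part of StarPU): they expand to the
   plug-in's attributes only when the plug-in is loaded.  */
#ifdef STARPU_GCC_PLUGIN
# define TASK   __attribute__ ((task))
# define OUTPUT __attribute__ ((output))
#else
# define TASK
# define OUTPUT
#endif

/* alpha, n: scalars, passed by value.
   x:        const-qualified pointer, read-only buffer (STARPU_R).
   y:        non-const pointer, read-write buffer (STARPU_RW).
   sum:      output-qualified pointer, write-only buffer (STARPU_W).  */
void scale_add (float alpha, const float *x, float *y,
                OUTPUT float *sum, unsigned n) TASK;

/* Implicit CPU implementation of the hypothetical `scale_add' task.  */
void
scale_add (float alpha, const float *x, float *y,
           OUTPUT float *sum, unsigned n)
{
  unsigned i;
  float total = 0.0f;

  assert (x != NULL && y != NULL && sum != NULL);

  for (i = 0; i < n; i++)
    {
      y[i] += alpha * x[i];     /* y is both read and written */
      total += y[i];
    }

  sum[0] = total;               /* sum is only written */
}
```

Because <c>y</c> is read before it is written, it is left without the <c>output</c> qualifier; only <c>sum</c>, which is never read, is marked as an output.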
- \section InitializationTerminationAndSynchronization Initialization, Termination, and Synchronization
- The following pragmas allow user code to control StarPU's lifetime and
- to synchronize with tasks.
- <dl>
- <dt><c>\#pragma starpu initialize</c></dt>
- <dd>
- Initialize StarPU. This call is compulsory and is <em>never</em> added
- implicitly. One of the reasons this has to be done explicitly is that
- it provides greater control to user code over its resource usage.
- </dd>
- <dt><c>\#pragma starpu shutdown</c></dt>
- <dd>
- Shut down StarPU, giving it an opportunity to write profiling info to a
- file on disk, for instance (\ref Off-linePerformanceFeedback).
- </dd>
- <dt><c>\#pragma starpu wait</c></dt>
- <dd>
- Wait for all task invocations to complete, as with
- starpu_task_wait_for_all().
- </dd>
- </dl>
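As a rough sketch of how these pragmas fit around a task invocation, consider the code below. The <c>increment</c> task, the <c>run_increment</c> driver, and the <c>TASK</c> macro are hypothetical names introduced for illustration. The pragmas are guarded by <c>STARPU_GCC_PLUGIN</c> so that the same file compiles as sequential C without the plug-in. The sketch also registers its buffer, since pointer arguments to tasks are assumed to be registered.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper macro (not part of StarPU).  */
#ifdef STARPU_GCC_PLUGIN
# define TASK __attribute__ ((task))
#else
# define TASK
#endif

/* Hypothetical task: add one to each of the N elements of V.  */
void increment (float *v, unsigned n) TASK;

void
increment (float *v, unsigned n)
{
  unsigned i;

  assert (v != NULL);
  for (i = 0; i < n; i++)
    v[i] += 1.0f;
}

/* Hypothetical driver: initialize, invoke, wait, shut down.  */
float
run_increment (void)
{
  float v[4] = { 0.0f, 1.0f, 2.0f, 3.0f };
  float result;

#ifdef STARPU_GCC_PLUGIN
#pragma starpu initialize
#pragma starpu register v
#endif

  /* Asynchronous when the plug-in is in use, a plain call otherwise.  */
  increment (v, 4);

#ifdef STARPU_GCC_PLUGIN
  /* Block until all task invocations have completed, then give the
     buffer back to the main program.  */
#pragma starpu wait
#pragma starpu unregister v
#endif

  result = v[0] + v[1] + v[2] + v[3];

#ifdef STARPU_GCC_PLUGIN
#pragma starpu shutdown
#endif

  return result;
}
```

The result is read only after the wait and unregister steps, because the task may still be running, or the data may still live on a device, before that point.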
- \section RegisteredDataBuffers Registered Data Buffers
- Data buffers such as matrices and vectors that are to be passed to tasks
- must be registered. Registration allows StarPU to handle data
- transfers among devices---e.g., transferring an input buffer from the
- CPU's main memory to a task scheduled to run on a GPU (\ref StarPUDataManagementLibrary).
- The following pragmas are provided:
- <dl>
- <dt><c>\#pragma starpu register ptr [size]</c></dt>
- <dd>
- Register <c>ptr</c> as a <c>size</c>-element buffer. When <c>ptr</c> has
- an array type whose size is known, <c>size</c> may be omitted.
- Alternatively, the <c>registered</c> attribute can be used (see below).
- </dd>
- <dt><c>\#pragma starpu unregister ptr</c></dt>
- <dd>
- Unregister the previously-registered memory area pointed to by
- <c>ptr</c>. As a side-effect, <c>ptr</c> points to a valid copy in main
- memory.
- </dd>
- <dt><c>\#pragma starpu acquire ptr</c></dt>
- <dd>
- Acquire in main memory an up-to-date copy of the previously-registered
- memory area pointed to by <c>ptr</c>, for read-write access.
- </dd>
- <dt><c>\#pragma starpu release ptr</c></dt>
- <dd>
- Release the previously-registered memory area pointed to by <c>ptr</c>,
- making it available to the tasks.
- </dd>
- </dl>
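The life cycle of a registered buffer might look like the sketch below. All names (<c>double_all</c>, <c>double_twice</c>, <c>TASK</c>) are hypothetical, and the sketch assumes that the caller has already issued <c>\#pragma starpu initialize</c>. Since <c>v</c> is a pointer rather than an array of known size, the element count is passed to the <c>register</c> pragma explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper macro (not part of StarPU).  */
#ifdef STARPU_GCC_PLUGIN
# define TASK __attribute__ ((task))
#else
# define TASK
#endif

/* Hypothetical task: double each of the N elements of V.  */
void double_all (float *v, unsigned n) TASK;

void
double_all (float *v, unsigned n)
{
  unsigned i;

  for (i = 0; i < n; i++)
    v[i] *= 2.0f;
}

/* Hypothetical driver: double V twice, and return the value that v[0]
   had between the two task invocations.  Assumes StarPU has already
   been initialized by the caller when the plug-in is in use.  */
float
double_twice (float *v, unsigned n)
{
  float first;

  assert (v != NULL && n > 0);

#ifdef STARPU_GCC_PLUGIN
#pragma starpu register v n
#endif

  double_all (v, n);

#ifdef STARPU_GCC_PLUGIN
  /* Obtain an up-to-date copy in main memory before touching V.  */
#pragma starpu acquire v
#endif
  first = v[0];
#ifdef STARPU_GCC_PLUGIN
  /* Hand the buffer back to the tasks.  */
#pragma starpu release v
#endif

  double_all (v, n);

#ifdef STARPU_GCC_PLUGIN
  /* After this point V holds a valid copy in main memory again.  */
#pragma starpu unregister v
#endif

  return first;
}
```

Between <c>acquire</c> and <c>release</c> the main program may access the buffer directly; outside that window, and until the buffer is unregistered, it is best treated as owned by the tasks.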
- Additionally, the following attributes offer a simple way to allocate
- and register storage for arrays:
- <dl>
- <dt><c>registered</c></dt>
- <dd>
- This attribute applies to local variables with an array type. Its
- effect is to automatically register the array's storage, as per
- <c>\#pragma starpu register</c>. The array is automatically unregistered
- when the variable's scope is left. This attribute is typically used in
- conjunction with the <c>heap_allocated</c> attribute, described below.
- </dd>
- <dt><c>heap_allocated</c></dt>
- <dd>
- This attribute applies to local variables with an array type. Its
- effect is to automatically allocate the array's storage on
- the heap, using starpu_malloc() under the hood. The heap-allocated array is automatically
- freed when the variable's scope is left, as with
- automatic variables.
- </dd>
- </dl>
- The following example illustrates use of the <c>heap_allocated</c>
- attribute:
- \snippet cholesky_pragma.c To be included. You should update doxygen if you see this text.
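As a smaller, simplified sketch, both attributes could be combined on a single local array as shown below. The names <c>fill</c>, <c>sum_of_filled</c>, <c>N</c>, and the three macros are hypothetical. Placing both attributes in a single attribute list is an assumption of this sketch, as is the fact that the caller has already initialized StarPU. Without the plug-in, <c>v</c> is simply an automatic array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper macros (not part of StarPU).  */
#ifdef STARPU_GCC_PLUGIN
# define TASK            __attribute__ ((task))
# define OUTPUT          __attribute__ ((output))
# define HEAP_REGISTERED __attribute__ ((heap_allocated, registered))
#else
# define TASK
# define OUTPUT
# define HEAP_REGISTERED
#endif

#define N 256                   /* hypothetical buffer size */

/* Hypothetical task: store VALUE in each of the N elements of V.  */
void fill (OUTPUT float *v, unsigned n, float value) TASK;

void
fill (OUTPUT float *v, unsigned n, float value)
{
  unsigned i;

  assert (v != NULL);
  for (i = 0; i < n; i++)
    v[i] = value;
}

/* Hypothetical driver.  With the plug-in, V is allocated on the heap and
   registered here, then unregistered and freed when the scope is left.  */
float
sum_of_filled (void)
{
  float v[N] HEAP_REGISTERED;
  float total = 0.0f;
  unsigned i;

  fill (v, N, 0.5f);

#ifdef STARPU_GCC_PLUGIN
#pragma starpu acquire v
#endif
  for (i = 0; i < N; i++)
    total += v[i];
#ifdef STARPU_GCC_PLUGIN
#pragma starpu release v
#endif

  return total;
}
```

No explicit <c>register</c> or <c>unregister</c> pragma appears here, since the <c>registered</c> attribute is described as taking care of both.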
- \section UsingCExtensionsConditionally Using C Extensions Conditionally
- The C extensions described in this chapter are only available when GCC
- and its StarPU plug-in are in use. Yet, it is possible to make use of
- these extensions when they are available---leading to hybrid CPU/GPU
- code---and discard them when they are not available---leading to valid
- sequential code.
- To that end, the GCC plug-in defines the C preprocessor macro
- <c>STARPU_GCC_PLUGIN</c> when it is being used. When defined, this
- macro expands to an integer denoting the version of the supported C
- extensions.
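A program can test this macro directly. The sketch below is a minimal illustration: the function names are hypothetical, and the assumption that version numbers are non-negative is not stated by this chapter.

```c
#include <assert.h>

/* Hypothetical helper: return the version of the C extensions, or -1 when
   the plug-in is not in use.  */
int
c_extensions_version (void)
{
#ifdef STARPU_GCC_PLUGIN
  int version = STARPU_GCC_PLUGIN;

  /* Assumption of this sketch: versions are non-negative integers.  */
  assert (version >= 0);
  return version;
#else
  return -1;
#endif
}

/* Hypothetical helper: describe how task invocations will behave.  */
const char *
execution_mode (void)
{
  return c_extensions_version () >= 0 ? "task-based" : "sequential";
}
```

Compiled with a plain C compiler, <c>execution_mode</c> reports sequential execution; compiled with the plug-in loaded, it reports task-based execution.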
- The code below illustrates how to define a task and its implementations
- in a way that allows it to be compiled without the GCC plug-in:
- \snippet matmul_pragma.c To be included. You should update doxygen if you see this text.
- The above program is a valid StarPU program when StarPU's GCC plug-in is
- used; it is also a valid sequential program when the plug-in is not
- used.
- Note that attributes such as <c>task</c> as well as <c>starpu</c>
- pragmas are simply ignored by GCC when the StarPU plug-in is not loaded.
- However, <c>gcc -Wall</c> emits a warning for unknown attributes and
- pragmas, which can be inconvenient. In addition, other compilers may be
- unable to parse the attribute syntax (in practice, Clang and
- several proprietary compilers do implement attributes), so you may want to
- wrap attributes in macros like this:
- \snippet matmul_pragma2.c To be included. You should update doxygen if you see this text.
- */