@c -*-texinfo-*-
@c This file is part of the StarPU Handbook.
@c Copyright (C) 2011, 2012 Institut National de Recherche en Informatique et Automatique
@c See the file starpu.texi for copying conditions.

@cindex C extensions
@cindex GCC plug-in

When GCC plug-in support is available, StarPU builds a plug-in for the
GNU Compiler Collection (GCC), which defines extensions to languages of
the C family (C, C++, Objective-C) that make it easier to write StarPU
code@footnote{This feature is only available for GCC 4.5 and later; it
is known to work with GCC 4.5, 4.6, and 4.7.  You
may need to install a specific @code{-dev} package of your distro, such
as @code{gcc-4.6-plugin-dev} on Debian and derivatives.  In addition,
the plug-in's test suite is only run when
@url{http://www.gnu.org/software/guile/, GNU@tie{}Guile} is found at
@code{configure}-time.  Building the GCC plug-in
can be disabled by configuring with @code{--disable-gcc-extensions}.}.
Those extensions include syntactic sugar for defining
tasks and their implementations, invoking a task, and manipulating data
buffers.  Use of these extensions can be made conditional on the
availability of the plug-in, leading to valid C sequential code when the
plug-in is not used (@pxref{Conditional Extensions}).

When StarPU has been installed with its GCC plug-in, programs that use
these extensions can be compiled this way:

@example
$ gcc -c -fplugin=`pkg-config starpu-1.0 --variable=gccplugin` foo.c
@end example

@noindent
When the plug-in is not available, the above @command{pkg-config}
command returns the empty string.

In addition, the @code{-fplugin-arg-starpu-verbose} flag can be used to
obtain feedback from the compiler as it analyzes the C extensions used
in source files.
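For example, assuming the plug-in is installed, verbose feedback can be
requested by combining the two flags:

@example
$ gcc -c -fplugin=`pkg-config starpu-1.0 --variable=gccplugin` \
      -fplugin-arg-starpu-verbose foo.c
@end example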
This section describes the C extensions implemented by StarPU's GCC
plug-in.  It does not require detailed knowledge of the StarPU library.

Note: as of StarPU @value{VERSION}, this is still an area under
development and subject to change.

@menu
* Defining Tasks::                    Defining StarPU tasks
* Synchronization and Other Pragmas:: Synchronization, and more.
* Registered Data Buffers::           Manipulating data buffers
* Conditional Extensions::            Using C extensions only when available
@end menu
@node Defining Tasks
@section Defining Tasks

@cindex task
@cindex task implementation

The StarPU GCC plug-in views @dfn{tasks} as ``extended'' C functions:

@enumerate
@item
tasks may have several implementations---e.g., one for CPUs, one written
in OpenCL, one written in CUDA;
@item
tasks may have several implementations of the same target---e.g.,
several CPU implementations;
@item
when a task is invoked, it may run in parallel, and StarPU is free to
choose any of its implementations.
@end enumerate

Tasks and their implementations must be @emph{declared}.  These
declarations are annotated with @dfn{attributes} (@pxref{Attribute
Syntax, attributes in GNU C,, gcc, Using the GNU Compiler Collection
(GCC)}): the declaration of a task is a regular C function declaration
with an additional @code{task} attribute, and task implementations are
declared with a @code{task_implementation} attribute.

The following function attributes are provided:

@table @code

@item task
@cindex @code{task} attribute
Declare the given function as a StarPU task.  Its return type must be
@code{void}, and it must not be defined---instead, a definition will
automatically be provided by the compiler.

Under the hood, declaring a task leads to the declaration of the
corresponding @code{codelet} (@pxref{Codelets and Tasks}).  If one or
more task implementations are declared in the same compilation unit,
then the codelet and the function itself are also defined; they inherit
the scope of the task.

Scalar arguments to the task are passed by value and copied to the
target device if need be---technically, they are passed as the
@code{cl_arg} buffer (@pxref{Codelets and Tasks, @code{cl_arg}}).

@cindex @code{output} type attribute
Pointer arguments are assumed to be registered data buffers---the
@code{buffers} argument of a task (@pxref{Codelets and Tasks,
@code{buffers}}); @code{const}-qualified pointer arguments are viewed as
read-only buffers (@code{STARPU_R}), and non-@code{const}-qualified
buffers are assumed to be used read-write (@code{STARPU_RW}).  In
addition, the @code{output} type attribute can be used as a type
qualifier for output pointer or array parameters (@code{STARPU_W}).

@item task_implementation (@var{target}, @var{task})
@cindex @code{task_implementation} attribute
Declare the given function as an implementation of @var{task} to run on
@var{target}.  @var{target} must be a string, currently one of
@code{"cpu"}, @code{"opencl"}, or @code{"cuda"}.
@c FIXME: Update when OpenCL support is ready.

@end table
Here is an example:

@cartouche
@smallexample
#define __output  __attribute__ ((output))

static void matmul (const float *A, const float *B,
                    __output float *C,
                    unsigned nx, unsigned ny, unsigned nz)
  __attribute__ ((task));

static void matmul_cpu (const float *A, const float *B,
                        __output float *C,
                        unsigned nx, unsigned ny, unsigned nz)
  __attribute__ ((task_implementation ("cpu", matmul)));


static void
matmul_cpu (const float *A, const float *B, __output float *C,
            unsigned nx, unsigned ny, unsigned nz)
@{
  unsigned i, j, k;

  for (j = 0; j < ny; j++)
    for (i = 0; i < nx; i++)
      @{
        for (k = 0; k < nz; k++)
          C[j * nx + i] += A[j * nz + k] * B[k * nx + i];
      @}
@}
@end smallexample
@end cartouche

@noindent
A @code{matmul} task is defined; it has only one implementation,
@code{matmul_cpu}, which runs on the CPU.  Variables @var{A} and
@var{B} are input buffers, whereas @var{C} is considered an input/output
buffer.
CUDA and OpenCL implementations can be declared in a similar way:

@cartouche
@smallexample
static void matmul_cuda (const float *A, const float *B, float *C,
                         unsigned nx, unsigned ny, unsigned nz)
  __attribute__ ((task_implementation ("cuda", matmul)));

static void matmul_opencl (const float *A, const float *B, float *C,
                           unsigned nx, unsigned ny, unsigned nz)
  __attribute__ ((task_implementation ("opencl", matmul)));
@end smallexample
@end cartouche

@noindent
The CUDA and OpenCL implementations typically either invoke a kernel
written in CUDA or OpenCL (for similar code, @pxref{CUDA Kernel}, and
@pxref{OpenCL Kernel}), or call a library function that uses CUDA or
OpenCL under the hood, such as CUBLAS functions:
@cartouche
@smallexample
static void
matmul_cuda (const float *A, const float *B, float *C,
             unsigned nx, unsigned ny, unsigned nz)
@{
  /* The buffers are row-major, so compute C += A * B as the
     column-major product B^T * A^T; the leading dimensions
     follow from the CPU kernel above.  */
  cublasSgemm ('n', 'n', nx, ny, nz,
               1.0f, B, nx, A, nz,
               1.0f, C, nx);
  cudaStreamSynchronize (starpu_cuda_get_local_stream ());
@}
@end smallexample
@end cartouche
A task can be invoked like a regular C function:

@cartouche
@smallexample
matmul (&A[i * zdim * bydim + k * bzdim * bydim],
        &B[k * xdim * bzdim + j * bxdim * bzdim],
        &C[i * xdim * bydim + j * bxdim * bydim],
        bxdim, bydim, bzdim);
@end smallexample
@end cartouche

@noindent
This leads to an @dfn{asynchronous invocation}, whereby @code{matmul}'s
implementation may run in parallel with the continuation of the caller.

The next section describes how memory buffers must be handled in
StarPU-GCC code.  For a complete example, see the
@code{gcc-plugin/examples} directory of the source distribution, and
@ref{Vector Scaling Using the C Extension, the vector-scaling
example}.
@node Synchronization and Other Pragmas
@section Initialization, Termination, and Synchronization

The following pragmas allow user code to control StarPU's life time and
to synchronize with tasks.

@table @code

@item #pragma starpu initialize
Initialize StarPU.  This call is compulsory and is @emph{never} added
implicitly.  One of the reasons this has to be done explicitly is that
it provides greater control to user code over its resource usage.

@item #pragma starpu shutdown
Shut down StarPU, giving it an opportunity to write profiling info to a
file on disk, for instance (@pxref{Off-line, off-line performance
feedback}).

@item #pragma starpu wait
Wait for all task invocations to complete, as with
@code{starpu_wait_for_all} (@pxref{Codelets and Tasks,
starpu_wait_for_all}).

@end table
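@noindent
As a sketch, these pragmas typically bracket the whole program, with
@code{wait} placed before any code that consumes task results:

@cartouche
@smallexample
int
main (int argc, char *argv[])
@{
#pragma starpu initialize       /* compulsory, never implicit */

  /* ... register buffers and invoke tasks ... */

#pragma starpu wait             /* wait for all task invocations */
#pragma starpu shutdown

  return EXIT_SUCCESS;
@}
@end smallexample
@end cartouche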
@node Registered Data Buffers
@section Registered Data Buffers

Data buffers such as matrices and vectors that are to be passed to tasks
must be @dfn{registered}.  Registration allows StarPU to handle data
transfers among devices---e.g., transferring an input buffer from the
CPU's main memory to a task scheduled to run on a GPU (@pxref{StarPU
Data Management Library}).

The following pragmas are provided:

@table @code

@item #pragma starpu register @var{ptr} [@var{size}]
Register @var{ptr} as a @var{size}-element buffer.  When @var{ptr} has
an array type whose size is known, @var{size} may be omitted.

@item #pragma starpu unregister @var{ptr}
Unregister the previously-registered memory area pointed to by
@var{ptr}.  As a side-effect, @var{ptr} points to a valid copy in main
memory.

@item #pragma starpu acquire @var{ptr}
Acquire in main memory an up-to-date copy of the previously-registered
memory area pointed to by @var{ptr}, for read-write access.

@item #pragma starpu release @var{ptr}
Release the previously-registered memory area pointed to by @var{ptr},
making it available to the tasks.

@end table
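@noindent
For instance, @code{acquire} and @code{release} can bracket direct
accesses to a registered buffer from the main program.  In this sketch,
@code{fill_vector} stands for a hypothetical task and
@code{print_vector} for a hypothetical plain C function:

@cartouche
@smallexample
float vector[123];
#pragma starpu register vector

fill_vector (vector, 123);       /* asynchronous task invocation */

/* Fetch an up-to-date copy of VECTOR into main memory.  */
#pragma starpu acquire vector
print_vector (vector, 123);      /* direct access from main program */
#pragma starpu release vector

#pragma starpu unregister vector
@end smallexample
@end cartouche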
Additionally, the @code{heap_allocated} variable attribute offers a
simple way to allocate storage for arrays on the heap:

@table @code

@item heap_allocated
@cindex @code{heap_allocated} attribute
This attribute applies to local variables with an array type.  Its
effect is to automatically allocate the array's storage on
the heap, using @code{starpu_malloc} under the hood (@pxref{Basic Data
Library API, starpu_malloc}).  The heap-allocated array is automatically
freed when the variable's scope is left, as with
automatic variables.

@end table
@noindent
The following example illustrates use of the @code{heap_allocated}
attribute:

@example
extern void cholesky (unsigned nblocks, unsigned size,
                      float mat[nblocks][nblocks][size])
  __attribute__ ((task));

int
main (int argc, char *argv[])
@{
#pragma starpu initialize

  /* ... */

  int nblocks, size;
  parse_args (&nblocks, &size);

  /* Allocate an array of the required size on the heap,
     and register it.  */
  @{
    float matrix[nblocks][nblocks][size]
      __attribute__ ((heap_allocated));

#pragma starpu register matrix

    cholesky (nblocks, size, matrix);

#pragma starpu wait
#pragma starpu unregister matrix

  @} /* MATRIX is automatically freed here.  */

#pragma starpu shutdown

  return EXIT_SUCCESS;
@}
@end example
@node Conditional Extensions
@section Using C Extensions Conditionally

The C extensions described in this chapter are only available when GCC
and its StarPU plug-in are in use.  Yet, it is possible to make use of
these extensions when they are available---leading to hybrid CPU/GPU
code---and discard them when they are not available---leading to valid
sequential code.

To that end, the GCC plug-in defines a C preprocessor macro when it is
being used:

@defmac STARPU_GCC_PLUGIN
Defined for code being compiled with the StarPU GCC plug-in.  When
defined, this macro expands to an integer denoting the version of the
supported C extensions.
@end defmac
The code below illustrates how to define a task and its implementations
in a way that allows it to be compiled without the GCC plug-in:

@smallexample
/* The macros below abstract over the attributes specific to
   StarPU-GCC and the name of the CPU implementation.  */
#ifdef STARPU_GCC_PLUGIN
# define __task  __attribute__ ((task))
# define CPU_TASK_IMPL(task)  task ## _cpu
#else
# define __task
# define CPU_TASK_IMPL(task)  task
#endif

#include <stdlib.h>

static void matmul (const float *A, const float *B, float *C,
                    unsigned nx, unsigned ny, unsigned nz) __task;

#ifdef STARPU_GCC_PLUGIN
static void matmul_cpu (const float *A, const float *B, float *C,
                        unsigned nx, unsigned ny, unsigned nz)
  __attribute__ ((task_implementation ("cpu", matmul)));
#endif


static void
CPU_TASK_IMPL (matmul) (const float *A, const float *B, float *C,
                        unsigned nx, unsigned ny, unsigned nz)
@{
  /* Code of the CPU kernel here... */
@}

int
main (int argc, char *argv[])
@{
  /* The pragmas below are simply ignored when StarPU-GCC
     is not used.  */
#pragma starpu initialize

  float A[123][42][7], B[123][42][7], C[123][42][7];

#pragma starpu register A
#pragma starpu register B
#pragma starpu register C

  /* When StarPU-GCC is used, the call below is asynchronous;
     otherwise, it is synchronous.  */
  matmul ((float *) A, (float *) B, (float *) C, 123, 42, 7);

#pragma starpu wait
#pragma starpu shutdown

  return EXIT_SUCCESS;
@}
@end smallexample
Note that attributes such as @code{task} are simply ignored by GCC when
the StarPU plug-in is not loaded, so the @code{__task} macro could be
omitted altogether.  However, @command{gcc -Wall} emits a warning for
unknown attributes, which can be inconvenient, and other compilers may
be unable to parse the attribute syntax.  Thus, using macros such as
@code{__task} above is recommended.

@c Local Variables:
@c TeX-master: "../starpu.texi"
@c ispell-local-dictionary: "american"
@c End: