@c -*-texinfo-*-

@c This file is part of the StarPU Handbook.
@c Copyright (C) 2011, 2012 Institut National de Recherche en Informatique et Automatique
@c See the file starpu.texi for copying conditions.

@cindex C extensions
@cindex GCC plug-in
When GCC plug-in support is available, StarPU builds a plug-in for the
GNU Compiler Collection (GCC), which defines extensions to languages of
the C family (C, C++, Objective-C) that make it easier to write StarPU
code@footnote{This feature is only available for GCC 4.5 and later; it
is known to work with GCC 4.5, 4.6, and 4.7. You
may need to install a specific @code{-dev} package of your distro, such
as @code{gcc-4.6-plugin-dev} on Debian and derivatives. In addition,
the plug-in's test suite is only run when
@url{http://www.gnu.org/software/guile/, GNU@tie{}Guile} is found at
@code{configure}-time. Building the GCC plug-in
can be disabled by configuring with @code{--disable-gcc-extensions}.}.
Those extensions include syntactic sugar for defining
tasks and their implementations, invoking a task, and manipulating data
buffers. Use of these extensions can be made conditional on the
availability of the plug-in, leading to valid sequential C code when the
plug-in is not used (@pxref{Conditional Extensions}).

When StarPU has been installed with its GCC plug-in, programs that use
these extensions can be compiled this way:

@example
$ gcc -c -fplugin=`pkg-config starpu-1.1 --variable=gccplugin` foo.c
@end example

@noindent
When the plug-in is not available, the above @command{pkg-config}
command returns the empty string.

In addition, the @code{-fplugin-arg-starpu-verbose} flag can be used to
obtain feedback from the compiler as it analyzes the C extensions used
in source files.

This section describes the C extensions implemented by StarPU's GCC
plug-in. It does not require detailed knowledge of the StarPU library.

Note: as of StarPU @value{VERSION}, this is still an area under
development and subject to change.

@menu
* Defining Tasks::                     Defining StarPU tasks
* Synchronization and Other Pragmas::  Synchronization, and more.
* Registered Data Buffers::            Manipulating data buffers
* Conditional Extensions::             Using C extensions only when available
@end menu

@node Defining Tasks
@section Defining Tasks

@cindex task
@cindex task implementation

The StarPU GCC plug-in views @dfn{tasks} as ``extended'' C functions:

@enumerate
@item
tasks may have several implementations---e.g., one for CPUs, one written
in OpenCL, one written in CUDA;
@item
tasks may have several implementations of the same target---e.g.,
several CPU implementations;
@item
when a task is invoked, it may run in parallel, and StarPU is free to
choose any of its implementations.
@end enumerate

Tasks and their implementations must be @emph{declared}. These
declarations are annotated with @dfn{attributes} (@pxref{Attribute
Syntax, attributes in GNU C,, gcc, Using the GNU Compiler Collection
(GCC)}): the declaration of a task is a regular C function declaration
with an additional @code{task} attribute, and task implementations are
declared with a @code{task_implementation} attribute.

The following function attributes are provided:

@table @code

@item task
@cindex @code{task} attribute
Declare the given function as a StarPU task. Its return type must be
@code{void}. When a function declared as @code{task} has a user-defined
body, that body is interpreted as the @dfn{implicit definition of the
task's CPU implementation} (see example below). In all cases, the
actual definition of a task's body is automatically generated by the
compiler.

Under the hood, declaring a task leads to the declaration of the
corresponding @code{codelet} (@pxref{Codelets and Tasks}). If one or
more task implementations are declared in the same compilation unit,
then the codelet and the function itself are also defined; they inherit
the scope of the task.

Scalar arguments to the task are passed by value and copied to the
target device if need be---technically, they are passed as the
@code{cl_arg} buffer (@pxref{Codelets and Tasks, @code{cl_arg}}).

@cindex @code{output} type attribute
Pointer arguments are assumed to be registered data buffers---the
@code{buffers} argument of a task (@pxref{Codelets and Tasks,
@code{buffers}}); @code{const}-qualified pointer arguments are viewed as
read-only buffers (@code{STARPU_R}), and non-@code{const}-qualified
buffers are assumed to be used read-write (@code{STARPU_RW}). In
addition, the @code{output} type attribute can be used as a type
qualifier for output pointer or array parameters (@code{STARPU_W}).

@item task_implementation (@var{target}, @var{task})
@cindex @code{task_implementation} attribute
Declare the given function as an implementation of @var{task} to run on
@var{target}. @var{target} must be a string, currently one of
@code{"cpu"}, @code{"opencl"}, or @code{"cuda"}.
@c FIXME: Update when OpenCL support is ready.

@end table

Here is an example:

@cartouche
@smallexample
#define __output  __attribute__ ((output))

static void matmul (const float *A, const float *B,
                    __output float *C,
                    unsigned nx, unsigned ny, unsigned nz)
  __attribute__ ((task));

static void matmul_cpu (const float *A, const float *B,
                        __output float *C,
                        unsigned nx, unsigned ny, unsigned nz)
  __attribute__ ((task_implementation ("cpu", matmul)));


static void
matmul_cpu (const float *A, const float *B, __output float *C,
            unsigned nx, unsigned ny, unsigned nz)
@{
  unsigned i, j, k;

  for (j = 0; j < ny; j++)
    for (i = 0; i < nx; i++)
      @{
        /* C is write-only, so accumulate in a local variable
           instead of reading C back.  */
        float sum = 0.0f;

        for (k = 0; k < nz; k++)
          sum += A[j * nz + k] * B[k * nx + i];
        C[j * nx + i] = sum;
      @}
@}
@end smallexample
@end cartouche

@noindent
A @code{matmul} task is defined; it has only one implementation,
@code{matmul_cpu}, which runs on the CPU. Variables @var{A} and
@var{B} are input buffers, whereas @var{C}, which carries the
@code{output} attribute, is an output buffer.

@cindex implicit task CPU implementation
For convenience, when a function declared with the @code{task} attribute
has a user-defined body, that body is assumed to be that of the CPU
implementation of a task, which we call an @dfn{implicit task CPU
implementation}. Thus, the above snippet can be simplified like this:

@cartouche
@smallexample
#define __output  __attribute__ ((output))

static void matmul (const float *A, const float *B,
                    __output float *C,
                    unsigned nx, unsigned ny, unsigned nz)
  __attribute__ ((task));

/* Implicit definition of the CPU implementation of the
   `matmul' task.  */
static void
matmul (const float *A, const float *B, __output float *C,
        unsigned nx, unsigned ny, unsigned nz)
@{
  unsigned i, j, k;

  for (j = 0; j < ny; j++)
    for (i = 0; i < nx; i++)
      @{
        /* C is write-only, so accumulate in a local variable
           instead of reading C back.  */
        float sum = 0.0f;

        for (k = 0; k < nz; k++)
          sum += A[j * nz + k] * B[k * nx + i];
        C[j * nx + i] = sum;
      @}
@}
@end smallexample
@end cartouche

@noindent
Use of implicit CPU task implementations as above has the advantage that
the code is valid sequential code when StarPU's GCC plug-in is not used
(@pxref{Conditional Extensions}).
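
The sketch below shows an implicit CPU implementation on a smaller case.
It assumes a hypothetical @code{vector_scale} task (neither the name nor
the signature comes from StarPU) and wraps the attributes in macros, as
suggested in @ref{Conditional Extensions}, so that it also builds as
plain sequential C. With the plug-in, @var{in} would be treated as a
read-only buffer, @var{out} as a write-only buffer, and @var{n} and
@var{factor} as scalars passed by value:

@cartouche
@smallexample
#include <stddef.h>

/* Expand to the StarPU attributes only when the plug-in is loaded;
   otherwise expand to nothing, yielding plain sequential C.  */
#ifdef STARPU_GCC_PLUGIN
# define __task    __attribute__ ((task))
# define __output  __attribute__ ((output))
#else
# define __task
# define __output
#endif

/* Hypothetical task: OUT[i] = FACTOR * IN[i] for 0 <= i < N.  */
static void vector_scale (size_t n, const float *in,
                          __output float *out, float factor) __task;

/* Implicit CPU implementation of `vector_scale'.  */
static void
vector_scale (size_t n, const float *in, __output float *out,
              float factor)
@{
  size_t i;

  for (i = 0; i < n; i++)
    out[i] = factor * in[i];
@}
@end smallexample
@end cartouche

@noindent
When the plug-in is in use, both pointer arguments must refer to
registered buffers (@pxref{Registered Data Buffers}).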

CUDA and OpenCL implementations can be declared in a similar way:

@cartouche
@smallexample
static void matmul_cuda (const float *A, const float *B, float *C,
                         unsigned nx, unsigned ny, unsigned nz)
  __attribute__ ((task_implementation ("cuda", matmul)));

static void matmul_opencl (const float *A, const float *B, float *C,
                           unsigned nx, unsigned ny, unsigned nz)
  __attribute__ ((task_implementation ("opencl", matmul)));
@end smallexample
@end cartouche

@noindent
The CUDA and OpenCL implementations typically either invoke a kernel
written in CUDA or OpenCL (for similar code, @pxref{CUDA Kernel}, and
@pxref{OpenCL Kernel}), or call a library function that uses CUDA or
OpenCL under the hood, such as CUBLAS functions:

@cartouche
@smallexample
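static void
matmul_cuda (const float *A, const float *B, float *C,
             unsigned nx, unsigned ny, unsigned nz)
@{
  /* CUBLAS expects column-major storage, whereas the matrices here
     are row-major.  A row-major product C = A * B is the column-major
     product C^T = B^T * A^T, hence B before A; each leading dimension
     is the row length of the corresponding row-major matrix.  */
  cublasSgemm ('n', 'n', nx, ny, nz,
               1.0f, B, nx, A, nz,
               0.0f, C, nx);
  cudaStreamSynchronize (starpu_cuda_get_local_stream ());
@}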
@end smallexample
@end cartouche

A task can be invoked like a regular C function:

@cartouche
@smallexample
matmul (&A[i * zdim * bydim + k * bzdim * bydim],
        &B[k * xdim * bzdim + j * bxdim * bzdim],
        &C[i * xdim * bydim + j * bxdim * bydim],
        bxdim, bydim, bzdim);
@end smallexample
@end cartouche

@noindent
This leads to an @dfn{asynchronous invocation}, whereby @code{matmul}'s
implementation may run in parallel with the continuation of the caller.

The next section describes how memory buffers must be handled in
StarPU-GCC code. For a complete example, see the
@code{gcc-plugin/examples} directory of the source distribution, and
@ref{Vector Scaling Using the C Extension, the vector-scaling
example}.

@node Synchronization and Other Pragmas
@section Initialization, Termination, and Synchronization

The following pragmas allow user code to control StarPU's lifetime and
to synchronize with tasks.

@table @code

@item #pragma starpu initialize
Initialize StarPU. This call is compulsory and is @emph{never} added
implicitly. One of the reasons this has to be done explicitly is that
it provides greater control to user code over its resource usage.

@item #pragma starpu shutdown
Shut down StarPU, giving it an opportunity to write profiling info to a
file on disk, for instance (@pxref{Off-line, off-line performance
feedback}).

@item #pragma starpu wait
Wait for all task invocations to complete, as with
@code{starpu_wait_for_all} (@pxref{Codelets and Tasks,
starpu_wait_for_all}).

@end table

@node Registered Data Buffers
@section Registered Data Buffers

Data buffers such as matrices and vectors that are to be passed to tasks
must be @dfn{registered}. Registration allows StarPU to handle data
transfers among devices---e.g., transferring an input buffer from the
CPU's main memory to a task scheduled to run on a GPU (@pxref{StarPU Data
Management Library}).

The following pragmas are provided:

@table @code

@item #pragma starpu register @var{ptr} [@var{size}]
Register @var{ptr} as a @var{size}-element buffer. When @var{ptr} has
an array type whose size is known, @var{size} may be omitted.
Alternatively, the @code{registered} attribute can be used (see below).

@item #pragma starpu unregister @var{ptr}
Unregister the previously-registered memory area pointed to by
@var{ptr}. As a side-effect, @var{ptr} points to a valid copy in main
memory.

@item #pragma starpu acquire @var{ptr}
Acquire in main memory an up-to-date copy of the previously-registered
memory area pointed to by @var{ptr}, for read-write access.

@item #pragma starpu release @var{ptr}
Release the previously-registered memory area pointed to by @var{ptr},
making it available to the tasks.

@end table
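
The pragmas above are typically combined as in the following sketch. It
is an illustration under stated assumptions rather than code taken from
StarPU: the @code{increment} task and the @code{increment_and_sum}
helper are hypothetical names, and the caller is assumed to have already
run @code{#pragma starpu initialize}. The @code{task} attribute is
wrapped in a macro (@pxref{Conditional Extensions}) so that the code
also builds as sequential C, in which case the pragmas are ignored:

@cartouche
@smallexample
#ifdef STARPU_GCC_PLUGIN
# define __task  __attribute__ ((task))
#else
# define __task
#endif

#define N 4

/* Hypothetical task with an implicit CPU implementation.  V is a
   non-const pointer, hence a read-write buffer.  */
static void increment (float *v) __task;

static void
increment (float *v)
@{
  int i;

  for (i = 0; i < N; i++)
    v[i] += 1.0f;
@}

/* Run one `increment' task on a registered array and return the
   sum of its elements.  Assumes StarPU is already initialized.  */
static float
increment_and_sum (void)
@{
  float v[N] = @{ 1.0f, 2.0f, 3.0f, 4.0f @};
  float sum = 0.0f;
  int i;

  /* V has an array type of known size, so SIZE is omitted.  */
#pragma starpu register v

  increment (v);   /* asynchronous when the plug-in is used */

  /* Make sure the task is done, then get an up-to-date copy of V
     in main memory before reading it.  */
#pragma starpu wait
#pragma starpu acquire v
  for (i = 0; i < N; i++)
    sum += v[i];
#pragma starpu release v

#pragma starpu unregister v

  return sum;
@}
@end smallexample
@end cartouche

@noindent
The explicit @code{wait} before @code{acquire} is a conservative choice
made for this sketch; whether @code{acquire} alone suffices depends on
how it orders itself with respect to pending tasks.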

Additionally, the following attributes offer a simple way to allocate
and register storage for arrays:

@table @code

@item registered
@cindex @code{registered} attribute
This attribute applies to local variables with an array type. Its
effect is to automatically register the array's storage, as per
@code{#pragma starpu register}. The array is automatically unregistered
when the variable's scope is left. This attribute is typically used in
conjunction with the @code{heap_allocated} attribute, described below.

@item heap_allocated
@cindex @code{heap_allocated} attribute
This attribute applies to local variables with an array type. Its
effect is to automatically allocate the array's storage on
the heap, using @code{starpu_malloc} under the hood (@pxref{Basic Data
Management API, starpu_malloc}). The heap-allocated array is automatically
freed when the variable's scope is left, as with
automatic variables.

@end table

@noindent
The following example illustrates use of the @code{heap_allocated}
attribute:

@example
extern void cholesky(unsigned nblocks, unsigned size,
                    float mat[nblocks][nblocks][size])
  __attribute__ ((task));

int
main (int argc, char *argv[])
@{
#pragma starpu initialize

  /* ... */

  int nblocks, size;
  parse_args (&nblocks, &size);

  /* Allocate an array of the required size on the heap,
     and register it.  */
  @{
    float matrix[nblocks][nblocks][size]
      __attribute__ ((heap_allocated, registered));

    cholesky (nblocks, size, matrix);

#pragma starpu wait

  @}   /* MATRIX is automatically unregistered & freed here.  */

#pragma starpu shutdown

  return EXIT_SUCCESS;
@}
@end example

@node Conditional Extensions
@section Using C Extensions Conditionally

The C extensions described in this chapter are only available when GCC
and its StarPU plug-in are in use. Yet, it is possible to make use of
these extensions when they are available---leading to hybrid CPU/GPU
code---and discard them when they are not available---leading to valid
sequential code.

To that end, the GCC plug-in defines a C preprocessor macro when it is
being used:

@defmac STARPU_GCC_PLUGIN
Defined for code being compiled with the StarPU GCC plug-in. When
defined, this macro expands to an integer denoting the version of the
supported C extensions.
@end defmac

The code below illustrates how to define a task and its implementations
in a way that allows it to be compiled without the GCC plug-in:

@smallexample
/* This program is valid, whether or not StarPU's GCC plug-in
   is being used.  */

#include <stdlib.h>

/* The attribute below is ignored when GCC is not used.  */
static void matmul (const float *A, const float *B, float *C,
                    unsigned nx, unsigned ny, unsigned nz)
  __attribute__ ((task));

static void
matmul (const float *A, const float *B, float *C,
        unsigned nx, unsigned ny, unsigned nz)
@{
  /* Code of the CPU kernel here...  */
@}

#ifdef STARPU_GCC_PLUGIN
/* Optional OpenCL task implementation.  */

static void matmul_opencl (const float *A, const float *B, float *C,
                           unsigned nx, unsigned ny, unsigned nz)
  __attribute__ ((task_implementation ("opencl", matmul)));

static void
matmul_opencl (const float *A, const float *B, float *C,
               unsigned nx, unsigned ny, unsigned nz)
@{
  /* Code that invokes the OpenCL kernel here...  */
@}
#endif

int
main (int argc, char *argv[])
@{
  /* The pragmas below are simply ignored when StarPU-GCC
     is not used.  */
#pragma starpu initialize

  float A[123][42][7], B[123][42][7], C[123][42][7];

#pragma starpu register A
#pragma starpu register B
#pragma starpu register C

  /* When StarPU-GCC is used, the call below is asynchronous;
     otherwise, it is synchronous.  */
  matmul ((float *) A, (float *) B, (float *) C, 123, 42, 7);

#pragma starpu wait
#pragma starpu shutdown

  return EXIT_SUCCESS;
@}
@end smallexample

@noindent
The above program is a valid StarPU program when StarPU's GCC plug-in is
used; it is also a valid sequential program when the plug-in is not
used.

Note that attributes such as @code{task} as well as @code{starpu}
pragmas are simply ignored by GCC when the StarPU plug-in is not loaded.
However, @command{gcc -Wall} emits a warning for unknown attributes and
pragmas, which can be inconvenient. In addition, other compilers may be
unable to parse the attribute syntax@footnote{In practice, Clang and
several proprietary compilers implement attributes.}, so you may want to
wrap attributes in macros like this:

@smallexample
/* Use the `task' attribute only when StarPU's GCC plug-in
   is available.   */
#ifdef STARPU_GCC_PLUGIN
# define __task  __attribute__ ((task))
#else
# define __task
#endif

static void matmul (const float *A, const float *B, float *C,
                    unsigned nx, unsigned ny, unsigned nz) __task;
@end smallexample
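
One way to avoid the unknown-pragma warnings as well is to enclose the
@code{starpu} pragmas in @code{#ifdef STARPU_GCC_PLUGIN} blocks. The
sketch below is only one possible arrangement; @code{square} and
@code{squared} are hypothetical names introduced for this example, and
the caller is assumed to have initialized StarPU when the plug-in is in
use:

@smallexample
#ifdef STARPU_GCC_PLUGIN
# define __task  __attribute__ ((task))
#else
# define __task
#endif

/* Hypothetical task with an implicit CPU implementation:
   replace *X with its square.  */
static void square (float *x) __task;

static void
square (float *x)
@{
  *x = *x * *x;
@}

/* Return VALUE squared, computed by the `square' task.  */
static float
squared (float value)
@{
  float buf[1];

  buf[0] = value;

#ifdef STARPU_GCC_PLUGIN
# pragma starpu register buf
#endif

  square (buf);

#ifdef STARPU_GCC_PLUGIN
  /* Wait for the task, then get a valid copy back in BUF.  */
# pragma starpu wait
# pragma starpu unregister buf
#endif

  return buf[0];
@}
@end smallexample

@noindent
Without the plug-in, the preprocessor drops the pragmas, so the compiler
never sees them and the call to @code{square} is an ordinary
synchronous call.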

@c Local Variables:
@c TeX-master: "../starpu.texi"
@c ispell-local-dictionary: "american"
@c End: