
@c -*-texinfo-*-
@c This file is part of the StarPU Handbook.
@c Copyright (C) 2009--2011 Universit@'e de Bordeaux 1
@c Copyright (C) 2010, 2011 Centre National de la Recherche Scientifique
@c Copyright (C) 2011 Institut National de Recherche en Informatique et Automatique
@c See the file starpu.texi for copying conditions.

@node Basic Examples
@chapter Basic Examples

@menu
* Compiling and linking options::
* Hello World::                 Submitting Tasks
* Scaling a Vector::            Manipulating Data
* Vector Scaling on an Hybrid CPU/GPU Machine::  Handling Heterogeneous Architectures
@end menu

@node Compiling and linking options
@section Compiling and linking options
Let's suppose StarPU has been installed in the directory
@code{$STARPU_DIR}. As explained in @ref{Setting flags for compiling and
linking applications}, the variable @code{PKG_CONFIG_PATH} needs to be set.
It is also necessary to set the variable @code{LD_LIBRARY_PATH} to locate
dynamic libraries at runtime.

@example
% PKG_CONFIG_PATH=$STARPU_DIR/lib/pkgconfig:$PKG_CONFIG_PATH
% LD_LIBRARY_PATH=$STARPU_DIR/lib:$LD_LIBRARY_PATH
@end example

The Makefile could for instance contain the following lines to define which
options must be given to the compiler and to the linker:

@cartouche
@example
CFLAGS  += $$(pkg-config --cflags libstarpu)
LDFLAGS += $$(pkg-config --libs libstarpu)
@end example
@end cartouche
@node Hello World
@section Hello World

@menu
* Required Headers::
* Defining a Codelet::
* Submitting a Task::
* Execution of Hello World::
@end menu

In this section, we show how to implement a simple program that submits a task
to StarPU.

@node Required Headers
@subsection Required Headers

The @code{starpu.h} header should be included in any code using StarPU.

@cartouche
@smallexample
#include <starpu.h>
@end smallexample
@end cartouche
@node Defining a Codelet
@subsection Defining a Codelet

@cartouche
@smallexample
struct params @{
    int i;
    float f;
@};

void cpu_func(void *buffers[], void *cl_arg)
@{
    struct params *params = cl_arg;

    printf("Hello world (params = @{%i, %f@} )\n", params->i, params->f);
@}

struct starpu_codelet cl =
@{
    .where = STARPU_CPU,
    .cpu_func = cpu_func,
    .nbuffers = 0
@};
@end smallexample
@end cartouche
A codelet is a structure that represents a computational kernel. Such a codelet
may contain an implementation of the same kernel on different architectures
(e.g. CUDA, Cell's SPU, x86, ...).

The @code{nbuffers} field specifies the number of data buffers that are
manipulated by the codelet: here the codelet does not access or modify any data
that is controlled by our data management library. Note that the argument
passed to the codelet (the @code{cl_arg} field of the @code{starpu_task}
structure) does not count as a buffer since it is not managed by our data
management library, but just contains trivial parameters.
@c TODO need a crossref to the proper description of "where" see bla for more ...
We create a codelet which may only be executed on the CPUs. The @code{where}
field is a bitmask that defines where the codelet may be executed. Here, the
@code{STARPU_CPU} value means that only CPUs can execute this codelet
(@pxref{Codelets and Tasks} for more details on this field).

When a CPU core executes a codelet, it calls the @code{cpu_func} function,
which @emph{must} have the following prototype:

@code{void (*cpu_func)(void *buffers[], void *cl_arg);}

In this example, we can ignore the first argument of this function, which gives
a description of the input and output buffers (e.g. the size and the location
of the matrices), since there is none. The second argument is a pointer to a
buffer passed as an argument to the codelet by means of the @code{cl_arg} field
of the @code{starpu_task} structure.
@c TODO rewrite so that it is a little clearer ?
Be aware that this may be a pointer to a @emph{copy} of the actual buffer, and
not the pointer given by the programmer: if the codelet modifies this buffer,
there is no guarantee that the initial buffer will be modified as well. This
for instance implies that the buffer cannot be used as a synchronization
medium. If synchronization is needed, data has to be registered to StarPU, see
@ref{Scaling a Vector}.
@node Submitting a Task
@subsection Submitting a Task

@cartouche
@smallexample
void callback_func(void *callback_arg)
@{
    printf("Callback function (arg %x)\n", callback_arg);
@}

int main(int argc, char **argv)
@{
    /* @b{initialize StarPU} */
    starpu_init(NULL);

    struct starpu_task *task = starpu_task_create();

    task->cl = &cl; /* @b{Pointer to the codelet defined above} */

    struct params params = @{ 1, 2.0f @};
    task->cl_arg = &params;
    task->cl_arg_size = sizeof(params);

    task->callback_func = callback_func;
    task->callback_arg = (void *)0x42;

    /* @b{starpu_task_submit will be a blocking call} */
    task->synchronous = 1;

    /* @b{submit the task to StarPU} */
    starpu_task_submit(task);

    /* @b{terminate StarPU} */
    starpu_shutdown();

    return 0;
@}
@end smallexample
@end cartouche
Before submitting any tasks to StarPU, @code{starpu_init} must be called. The
@code{NULL} argument specifies that we use the default configuration. Tasks
cannot be submitted after the termination of StarPU by a call to
@code{starpu_shutdown}.

In the example above, a task structure is allocated by a call to
@code{starpu_task_create}. This function only allocates and fills the
corresponding structure with the default settings (@pxref{Codelets and
Tasks, starpu_task_create}), but it does not submit the task to StarPU.

@c not really clear ;)
The @code{cl} field is a pointer to the codelet which the task will
execute: in other words, the codelet structure describes which computational
kernel should be offloaded on the different architectures, and the task
structure is a wrapper containing a codelet and the piece of data on which the
codelet should operate.
The optional @code{cl_arg} field is a pointer to a buffer (of size
@code{cl_arg_size}) with some parameters for the kernel described by the
codelet. For instance, if a codelet implements a computational kernel that
multiplies its input vector by a constant, the constant could be specified by
means of this buffer, instead of registering it as a StarPU data. It must
however be noted that StarPU avoids making copies whenever possible and rather
passes the pointer as such, so the buffer which is pointed at must be kept
allocated until the task terminates, and if several tasks are submitted with
various parameters, each of them must be given a pointer to its own buffer.
Once a task has been executed, an optional callback function is called.
While the computational kernel could be offloaded on various architectures, the
callback function is always executed on a CPU. The @code{callback_arg}
pointer is passed as an argument to the callback. The prototype of a callback
function must be:

@code{void (*callback_function)(void *);}

If the @code{synchronous} field is non-zero, task submission will be
synchronous: the @code{starpu_task_submit} function will not return until the
task has been executed. Note that the @code{starpu_shutdown} method does not
guarantee that asynchronous tasks have been executed before it returns;
@code{starpu_task_wait_for_all} can be used to that effect, or data can be
unregistered (@code{starpu_data_unregister(vector_handle);}), which will
implicitly wait for all the tasks scheduled to work on it, unless explicitly
disabled thanks to @code{starpu_data_set_default_sequential_consistency_flag} or
@code{starpu_data_set_sequential_consistency_flag}.
@node Execution of Hello World
@subsection Execution of Hello World

@smallexample
% make hello_world
cc $(pkg-config --cflags libstarpu) $(pkg-config --libs libstarpu) hello_world.c -o hello_world
% ./hello_world
Hello world (params = @{1, 2.000000@} )
Callback function (arg 42)
@end smallexample

@node Scaling a Vector
@section Manipulating Data: Scaling a Vector

The previous example has shown how to submit tasks. In this section,
we show how StarPU tasks can manipulate data. The full source code for
this example is given in @ref{Full source code for the 'Scaling a Vector' example}.

@menu
* Source code of Vector Scaling::
* Execution of Vector Scaling::
@end menu
@node Source code of Vector Scaling
@subsection Source code of Vector Scaling

Programmers can describe the data layout of their application so that StarPU is
responsible for enforcing data coherency and availability across the machine.
Instead of handling complex (and non-portable) mechanisms to perform data
movements, programmers only declare which piece of data is accessed and/or
modified by a task, and StarPU makes sure that when a computational kernel
starts somewhere (e.g. on a GPU), its data are available locally.

Before submitting those tasks, the programmer first needs to declare the
different pieces of data to StarPU using the @code{starpu_*_data_register}
functions. To ease the development of applications for StarPU, it is possible
to describe multiple types of data layout. A type of data layout is called an
@b{interface}. There are different predefined interfaces available in StarPU:
here we will consider the @b{vector interface}.

The following lines show how to declare an array of @code{NX} elements of type
@code{float} using the vector interface:

@cartouche
@smallexample
float vector[NX];

starpu_data_handle_t vector_handle;
starpu_vector_data_register(&vector_handle, 0, (uintptr_t)vector, NX,
                            sizeof(vector[0]));
@end smallexample
@end cartouche
The first argument, called the @b{data handle}, is an opaque pointer which
designates the array in StarPU. This is also the structure which is used to
describe which data is used by a task. The second argument is the node number
where the data originally resides. Here it is 0 since the @code{vector} array
is in the main memory. Then comes the pointer @code{vector} where the data can
be found in main memory, the number of elements in the vector and the size of
each element.
The following shows how to construct a StarPU task that will manipulate the
vector and a constant factor.

@cartouche
@smallexample
float factor = 3.14;
struct starpu_task *task = starpu_task_create();

task->cl = &cl;                          /* @b{Pointer to the codelet defined below} */
task->buffers[0].handle = vector_handle; /* @b{First parameter of the codelet} */
task->buffers[0].mode = STARPU_RW;
task->cl_arg = &factor;
task->cl_arg_size = sizeof(factor);
task->synchronous = 1;

starpu_task_submit(task);
@end smallexample
@end cartouche
Since the factor is a mere constant float value parameter,
it does not need a preliminary registration, and
can just be passed through the @code{cl_arg} pointer like in the previous
example. The vector parameter is described by its handle.
There are two fields in each element of the @code{buffers} array.
@code{handle} is the handle of the data, and @code{mode} specifies how the
kernel will access the data (@code{STARPU_R} for read-only, @code{STARPU_W} for
write-only and @code{STARPU_RW} for read and write access).
The definition of the codelet can be written as follows:

@cartouche
@smallexample
void scal_cpu_func(void *buffers[], void *cl_arg)
@{
    unsigned i;
    float *factor = cl_arg;

    /* length of the vector */
    unsigned n = STARPU_VECTOR_GET_NX(buffers[0]);
    /* CPU copy of the vector pointer */
    float *val = (float *)STARPU_VECTOR_GET_PTR(buffers[0]);

    for (i = 0; i < n; i++)
        val[i] *= *factor;
@}

struct starpu_codelet cl = @{
    .where = STARPU_CPU,
    .cpu_func = scal_cpu_func,
    .nbuffers = 1
@};
@end smallexample
@end cartouche
The first argument is an array that gives a description of all the buffers
passed in the @code{task->buffers} array. The size of this array is given by
the @code{nbuffers} field of the codelet structure. For the sake of
genericity, this array contains pointers to the different interfaces
describing each buffer. In the case of the @b{vector interface}, the location
of the vector (resp. its length) is accessible through the @code{ptr} (resp.
@code{nx}) field of this interface. Since the vector is accessed in a
read-write fashion, any modification will automatically affect future accesses
to this vector made by other tasks.

The second argument of the @code{scal_cpu_func} function contains a pointer to
the parameters of the codelet (given in @code{task->cl_arg}), so that we read
the constant factor from this pointer.
@node Execution of Vector Scaling
@subsection Execution of Vector Scaling

@smallexample
% make vector_scal
cc $(pkg-config --cflags libstarpu) $(pkg-config --libs libstarpu) vector_scal.c -o vector_scal
% ./vector_scal
0.000000 3.000000 6.000000 9.000000 12.000000
@end smallexample

@node Vector Scaling on an Hybrid CPU/GPU Machine
@section Vector Scaling on an Hybrid CPU/GPU Machine

Contrary to the previous examples, the task submitted in this example may not
only be executed by the CPUs, but also by a CUDA device.

@menu
* Definition of the CUDA Kernel::
* Definition of the OpenCL Kernel::
* Definition of the Main Code::
* Execution of Hybrid Vector Scaling::
@end menu
@node Definition of the CUDA Kernel
@subsection Definition of the CUDA Kernel

The CUDA implementation can be written as follows. It needs to be compiled with
a CUDA compiler such as nvcc, the NVIDIA CUDA compiler driver. It must be noted
that the vector pointer returned by @code{STARPU_VECTOR_GET_PTR} is here a
pointer in GPU memory, so that it can be passed as such to the
@code{vector_mult_cuda} kernel call.

@cartouche
@smallexample
#include <starpu.h>
#include <starpu_cuda.h>

static __global__ void vector_mult_cuda(float *val, unsigned n,
                                        float factor)
@{
    unsigned i = blockIdx.x*blockDim.x + threadIdx.x;
    if (i < n)
        val[i] *= factor;
@}

extern "C" void scal_cuda_func(void *buffers[], void *_args)
@{
    float *factor = (float *)_args;

    /* length of the vector */
    unsigned n = STARPU_VECTOR_GET_NX(buffers[0]);
    /* CUDA copy of the vector pointer */
    float *val = (float *)STARPU_VECTOR_GET_PTR(buffers[0]);
    unsigned threads_per_block = 64;
    unsigned nblocks = (n + threads_per_block-1) / threads_per_block;

@i{    vector_mult_cuda<<<nblocks, threads_per_block, 0, starpu_cuda_get_local_stream()>>>(val, n, *factor);}

@i{    cudaStreamSynchronize(starpu_cuda_get_local_stream());}
@}
@end smallexample
@end cartouche
@node Definition of the OpenCL Kernel
@subsection Definition of the OpenCL Kernel

The OpenCL implementation can be written as follows. StarPU provides
tools to compile an OpenCL kernel stored in a file.

@cartouche
@smallexample
__kernel void vector_mult_opencl(__global float* val, int nx, float factor)
@{
    const int i = get_global_id(0);
    if (i < nx) @{
        val[i] *= factor;
    @}
@}
@end smallexample
@end cartouche

Similarly to CUDA, the pointer returned by @code{STARPU_VECTOR_GET_PTR} is here
a device pointer, so that it is passed as such to the OpenCL kernel.
@cartouche
@smallexample
#include <starpu.h>
@i{#include <starpu_opencl.h>}

@i{extern struct starpu_opencl_program programs;}

void scal_opencl_func(void *buffers[], void *_args)
@{
    float *factor = _args;
@i{    int id, devid, err;}
@i{    cl_kernel kernel;}
@i{    cl_command_queue queue;}
@i{    cl_event event;}

    /* length of the vector */
    unsigned n = STARPU_VECTOR_GET_NX(buffers[0]);
    /* OpenCL copy of the vector pointer */
    cl_mem val = (cl_mem) STARPU_VECTOR_GET_PTR(buffers[0]);

@i{    id = starpu_worker_get_id();}
@i{    devid = starpu_worker_get_devid(id);}

@i{    err = starpu_opencl_load_kernel(&kernel, &queue, &programs,}
@i{            "vector_mult_opencl", devid);   /* @b{Name of the codelet defined above} */}
@i{    if (err != CL_SUCCESS) STARPU_OPENCL_REPORT_ERROR(err);}

@i{    err = clSetKernelArg(kernel, 0, sizeof(val), &val);}
@i{    err |= clSetKernelArg(kernel, 1, sizeof(n), &n);}
@i{    err |= clSetKernelArg(kernel, 2, sizeof(*factor), factor);}
@i{    if (err) STARPU_OPENCL_REPORT_ERROR(err);}

@i{    @{}
@i{        size_t global=1;}
@i{        size_t local=1;}
@i{        err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, &local, 0, NULL, &event);}
@i{        if (err != CL_SUCCESS) STARPU_OPENCL_REPORT_ERROR(err);}
@i{    @}}

@i{    clFinish(queue);}
@i{    starpu_opencl_collect_stats(event);}
@i{    clReleaseEvent(event);}

@i{    starpu_opencl_release_kernel(kernel);}
@}
@end smallexample
@end cartouche
@node Definition of the Main Code
@subsection Definition of the Main Code

The CPU implementation is the same as in the previous section.

Here is the source of the main application. You can notice the value of the
field @code{where} for the codelet. We specify
@code{STARPU_CPU|STARPU_CUDA|STARPU_OPENCL} to indicate to StarPU that the
codelet can be executed either on a CPU or on a CUDA or an OpenCL device.

@cartouche
@smallexample
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <starpu.h>

#define NX 2048

extern void scal_cuda_func(void *buffers[], void *_args);
extern void scal_cpu_func(void *buffers[], void *_args);
extern void scal_opencl_func(void *buffers[], void *_args);

/* @b{Definition of the codelet} */
static struct starpu_codelet cl = @{
    .where = STARPU_CPU|STARPU_CUDA|STARPU_OPENCL, /* @b{It can be executed on a CPU,} */
                                                   /* @b{on a CUDA device, or on an OpenCL device} */
    .cuda_func = scal_cuda_func,
    .cpu_func = scal_cpu_func,
    .opencl_func = scal_opencl_func,
    .nbuffers = 1
@};

#ifdef STARPU_USE_OPENCL
/* @b{The compiled version of the OpenCL program} */
struct starpu_opencl_program programs;
#endif
int main(int argc, char **argv)
@{
    float *vector;
    int i, ret;
    float factor=3.0;
    struct starpu_task *task;
    starpu_data_handle_t vector_handle;

    starpu_init(NULL);                            /* @b{Initialising StarPU} */

#ifdef STARPU_USE_OPENCL
    starpu_opencl_load_opencl_from_file(
            "examples/basic_examples/vector_scal_opencl_codelet.cl",
            &programs, NULL);
#endif

    vector = malloc(NX*sizeof(vector[0]));
    assert(vector);
    for(i=0 ; i<NX ; i++) vector[i] = i;
@end smallexample
@end cartouche
@cartouche
@smallexample
    /* @b{Registering data within StarPU} */
    starpu_vector_data_register(&vector_handle, 0, (uintptr_t)vector,
                                NX, sizeof(vector[0]));

    /* @b{Definition of the task} */
    task = starpu_task_create();
    task->cl = &cl;
    task->buffers[0].handle = vector_handle;
    task->buffers[0].mode = STARPU_RW;
    task->cl_arg = &factor;
    task->cl_arg_size = sizeof(factor);
@end smallexample
@end cartouche
@cartouche
@smallexample
    /* @b{Submitting the task} */
    ret = starpu_task_submit(task);
    if (ret == -ENODEV) @{
        fprintf(stderr, "No worker may execute this task\n");
        return 1;
    @}

@c TODO: Mmm, should rather be an unregistration with an implicit dependency, no?
    /* @b{Waiting for its termination} */
    starpu_task_wait_for_all();

    /* @b{Update the vector in RAM} */
    starpu_data_acquire(vector_handle, STARPU_R);
@end smallexample
@end cartouche
@cartouche
@smallexample
    /* @b{Access the data} */
    for(i=0 ; i<NX; i++) @{
        fprintf(stderr, "%f ", vector[i]);
    @}
    fprintf(stderr, "\n");

    /* @b{Release the RAM view of the data before unregistering it and shutting down StarPU} */
    starpu_data_release(vector_handle);
    starpu_data_unregister(vector_handle);
    starpu_shutdown();

    return 0;
@}
@end smallexample
@end cartouche
@node Execution of Hybrid Vector Scaling
@subsection Execution of Hybrid Vector Scaling

The Makefile given at the beginning of the section must be extended to
give the rules to compile the CUDA source code. Note that the source
file of the OpenCL kernel does not need to be compiled now, it will
be compiled at run-time when calling the function
@code{starpu_opencl_load_opencl_from_file()} (@pxref{starpu_opencl_load_opencl_from_file}).

@cartouche
@smallexample
CFLAGS  += $(shell pkg-config --cflags libstarpu)
LDFLAGS += $(shell pkg-config --libs libstarpu)
CC       = gcc

vector_scal: vector_scal.o vector_scal_cpu.o vector_scal_cuda.o vector_scal_opencl.o

%.o: %.cu
	nvcc $(CFLAGS) -c $< -o $@

clean:
	rm -f vector_scal *.o
@end smallexample
@end cartouche
@smallexample
% make
@end smallexample

and to execute it, with the default configuration:

@smallexample
% ./vector_scal
0.000000 3.000000 6.000000 9.000000 12.000000
@end smallexample

or for example, by disabling CPU devices:

@smallexample
% STARPU_NCPUS=0 ./vector_scal
0.000000 3.000000 6.000000 9.000000 12.000000
@end smallexample

or by disabling CUDA devices (which may enable the use of OpenCL, see
@ref{Enabling OpenCL}):

@smallexample
% STARPU_NCUDA=0 ./vector_scal
0.000000 3.000000 6.000000 9.000000 12.000000
@end smallexample