
@c -*-texinfo-*-
@c This file is part of the StarPU Handbook.
@c Copyright (C) 2009--2011 Universit@'e de Bordeaux 1
@c Copyright (C) 2010, 2011, 2012 Centre National de la Recherche Scientifique
@c Copyright (C) 2011 Institut National de Recherche en Informatique et Automatique
@c See the file starpu.texi for copying conditions.

@node Basic Examples
@chapter Basic Examples

@menu
* Compiling and linking options::
* Hello World::                  Submitting Tasks
* Scaling a Vector::             Manipulating Data
* Vector Scaling on an Hybrid CPU/GPU Machine::  Handling Heterogeneous Architectures
@end menu
@node Compiling and linking options
@section Compiling and linking options

Let's suppose StarPU has been installed in the directory
@code{$STARPU_DIR}. As explained in @ref{Setting flags for compiling and linking applications},
the variable @code{PKG_CONFIG_PATH} needs to be set. It is also
necessary to set the variable @code{LD_LIBRARY_PATH} to locate dynamic
libraries at runtime.

@example
% PKG_CONFIG_PATH=$STARPU_DIR/lib/pkgconfig:$PKG_CONFIG_PATH
% LD_LIBRARY_PATH=$STARPU_DIR/lib:$LD_LIBRARY_PATH
@end example

The Makefile could for instance contain the following lines to define which
options must be given to the compiler and to the linker:

@cartouche
@example
CFLAGS  += $$(pkg-config --cflags libstarpu)
LDFLAGS += $$(pkg-config --libs libstarpu)
@end example
@end cartouche

Also pass the @code{--static} option if the application is to be linked statically.
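As a minimal sketch (assuming the StarPU static libraries were installed), a
statically linked build could then be obtained with:

@cartouche
@example
# A sketch: --static makes pkg-config also list the private
# dependencies that are needed when linking statically.
cc hello_world.c -o hello_world \
   $(pkg-config --cflags libstarpu) \
   $(pkg-config --libs --static libstarpu) -static
@end example
@end cartouche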
@node Hello World
@section Hello World

@menu
* Required Headers::
* Defining a Codelet::
* Submitting a Task::
* Execution of Hello World::
@end menu

In this section, we show how to implement a simple program that submits a task to StarPU.

@node Required Headers
@subsection Required Headers

The @code{starpu.h} header should be included in any code using StarPU.

@cartouche
@smallexample
#include <starpu.h>
@end smallexample
@end cartouche
@node Defining a Codelet
@subsection Defining a Codelet

@cartouche
@smallexample
struct params @{
    int i;
    float f;
@};

void cpu_func(void *buffers[], void *cl_arg)
@{
    struct params *params = cl_arg;

    printf("Hello world (params = @{%i, %f@} )\n", params->i, params->f);
@}

struct starpu_codelet cl =
@{
    .where = STARPU_CPU,
    .cpu_funcs = @{ cpu_func, NULL @},
    .nbuffers = 0
@};
@end smallexample
@end cartouche
A codelet is a structure that represents a computational kernel. Such a codelet
may contain an implementation of the same kernel on different architectures
(e.g. CUDA, Cell's SPU, x86, ...).

The @code{nbuffers} field specifies the number of data buffers that are
manipulated by the codelet: here the codelet does not access or modify any data
that is controlled by our data management library. Note that the argument
passed to the codelet (the @code{cl_arg} field of the @code{starpu_task}
structure) does not count as a buffer since it is not managed by our data
management library, but just contains trivial parameters.

@c TODO need a crossref to the proper description of "where" see bla for more ...
We create a codelet which may only be executed on the CPUs. The @code{where}
field is a bitmask that defines where the codelet may be executed. Here, the
@code{STARPU_CPU} value means that only CPUs can execute this codelet
(@pxref{Codelets and Tasks} for more details on this field). Note that
the @code{where} field is optional: when it is unset, its value is
automatically set based on which of the @code{XXX_funcs} fields are filled in.
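For instance, the codelet above could equivalently be written without the
@code{where} field; a sketch:

@cartouche
@smallexample
/* A sketch: since only cpu_funcs is filled in, StarPU infers
 * that this codelet may only run on CPUs. */
struct starpu_codelet cl =
@{
    .cpu_funcs = @{ cpu_func, NULL @},
    .nbuffers = 0
@};
@end smallexample
@end cartouche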
When a CPU core executes a codelet, it calls the @code{cpu_func} function,
which @emph{must} have the following prototype:

@code{void (*cpu_func)(void *buffers[], void *cl_arg);}

In this example, we can ignore the first argument of this function, which gives a
description of the input and output buffers (e.g. the size and the location of
the matrices), since there is none.
The second argument is a pointer to a buffer passed as an
argument to the codelet by means of the @code{cl_arg} field of the
@code{starpu_task} structure.

@c TODO rewrite so that it is a little clearer ?
Be aware that this may be a pointer to a
@emph{copy} of the actual buffer, and not the pointer given by the programmer:
if the codelet modifies this buffer, there is no guarantee that the initial
buffer will be modified as well; this for instance implies that the buffer
cannot be used as a synchronization medium. If synchronization is needed, data
has to be registered to StarPU, see @ref{Scaling a Vector}.
@node Submitting a Task
@subsection Submitting a Task

@cartouche
@smallexample
void callback_func(void *callback_arg)
@{
    printf("Callback function (arg %x)\n", callback_arg);
@}

int main(int argc, char **argv)
@{
    /* @b{initialize StarPU} */
    starpu_init(NULL);

    struct starpu_task *task = starpu_task_create();

    task->cl = &cl; /* @b{Pointer to the codelet defined above} */

    struct params params = @{ 1, 2.0f @};
    task->cl_arg = &params;
    task->cl_arg_size = sizeof(params);

    task->callback_func = callback_func;
    task->callback_arg = (void *)0x42;

    /* @b{starpu_task_submit will be a blocking call} */
    task->synchronous = 1;

    /* @b{submit the task to StarPU} */
    starpu_task_submit(task);

    /* @b{terminate StarPU} */
    starpu_shutdown();

    return 0;
@}
@end smallexample
@end cartouche
Before submitting any tasks to StarPU, @code{starpu_init} must be called. The
@code{NULL} argument specifies that we use the default configuration. Tasks cannot
be submitted after the termination of StarPU by a call to
@code{starpu_shutdown}.

In the example above, a task structure is allocated by a call to
@code{starpu_task_create}. This function only allocates and fills the
corresponding structure with the default settings (@pxref{Codelets and
Tasks, starpu_task_create}), but it does not submit the task to StarPU.

@c not really clear ;)
The @code{cl} field is a pointer to the codelet which the task will
execute: in other words, the codelet structure describes which computational
kernel should be offloaded on the different architectures, and the task
structure is a wrapper containing a codelet and the piece of data on which the
codelet should operate.

The optional @code{cl_arg} field is a pointer to a buffer (of size
@code{cl_arg_size}) with some parameters for the kernel
described by the codelet. For instance, if a codelet implements a computational
kernel that multiplies its input vector by a constant, the constant could be
specified by means of this buffer, instead of registering it as a StarPU
data. It must however be noted that StarPU avoids making copies whenever possible
and rather passes the pointer as such, so the buffer which is pointed to must be
kept allocated until the task terminates, and if several tasks are submitted
with various parameters, each of them must be given a pointer to its own
buffer.
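As a minimal sketch of this last point (the number of tasks, @code{NTASKS}, is
made up for the illustration, and the @code{params} array must outlive the
tasks):

@cartouche
@smallexample
/* A sketch: one parameter buffer per task, kept alive
 * until every task has terminated. */
#define NTASKS 4
struct params params[NTASKS];
unsigned t;

for (t = 0; t < NTASKS; t++)
@{
    struct starpu_task *task = starpu_task_create();
    task->cl = &cl;
    params[t].i = t;
    params[t].f = 2.0f * t;
    task->cl_arg = &params[t];   /* each task gets its own buffer */
    task->cl_arg_size = sizeof(params[t]);
    starpu_task_submit(task);
@}
starpu_task_wait_for_all();      /* params[] must outlive the tasks */
@end smallexample
@end cartouche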
Once a task has been executed, an optional callback function is called.
While the computational kernel could be offloaded on various architectures, the
callback function is always executed on a CPU. The @code{callback_arg}
pointer is passed as an argument to the callback. The prototype of a callback
function must be:

@code{void (*callback_function)(void *);}

If the @code{synchronous} field is non-zero, task submission will be
synchronous: the @code{starpu_task_submit} function will not return until the
task has been executed. Note that the @code{starpu_shutdown} method does not
guarantee that asynchronous tasks have been executed before it returns;
@code{starpu_task_wait_for_all} can be used to that effect, or data can be
unregistered (@code{starpu_data_unregister(vector_handle);}), which will
implicitly wait for all the tasks scheduled to work on it, unless this is explicitly
disabled thanks to @code{starpu_data_set_default_sequential_consistency_flag} or
@code{starpu_data_set_sequential_consistency_flag}.
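For instance, a sketch of asynchronous submission (the default when
@code{synchronous} is left at 0), followed by an explicit barrier before
shutdown:

@cartouche
@smallexample
/* A sketch: submit asynchronously, then wait for completion
 * before shutting StarPU down. */
task->synchronous = 0;       /* starpu_task_submit returns immediately */
starpu_task_submit(task);
/* ... possibly submit more tasks here ... */
starpu_task_wait_for_all();  /* barrier: wait for every submitted task */
starpu_shutdown();
@end smallexample
@end cartouche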
@node Execution of Hello World
@subsection Execution of Hello World

@smallexample
% make hello_world
cc $(pkg-config --cflags libstarpu) $(pkg-config --libs libstarpu) hello_world.c -o hello_world
% ./hello_world
Hello world (params = @{1, 2.000000@} )
Callback function (arg 42)
@end smallexample
@node Scaling a Vector
@section Manipulating Data: Scaling a Vector

The previous example has shown how to submit tasks. In this section,
we show how StarPU tasks can manipulate data. The full source code for
this example is given in @ref{Full source code for the 'Scaling a Vector' example}.

@menu
* Source code of Vector Scaling::
* Execution of Vector Scaling::
@end menu
@node Source code of Vector Scaling
@subsection Source code of Vector Scaling

Programmers can describe the data layout of their application so that StarPU is
responsible for enforcing data coherency and availability across the machine.
Instead of handling complex (and non-portable) mechanisms to perform data
movements, programmers only declare which piece of data is accessed and/or
modified by a task, and StarPU makes sure that when a computational kernel
starts somewhere (e.g. on a GPU), its data are available locally.

Before submitting those tasks, the programmer first needs to declare the
different pieces of data to StarPU using the @code{starpu_*_data_register}
functions. To ease the development of applications for StarPU, it is possible
to describe multiple types of data layout. A type of data layout is called an
@b{interface}. There are different predefined interfaces available in StarPU:
here we will consider the @b{vector interface}.

The following lines show how to declare an array of @code{NX} elements of type
@code{float} using the vector interface:

@cartouche
@smallexample
float vector[NX];

starpu_data_handle_t vector_handle;
starpu_vector_data_register(&vector_handle, 0, (uintptr_t)vector, NX,
                            sizeof(vector[0]));
@end smallexample
@end cartouche
The first argument, called the @b{data handle}, is an opaque pointer which
designates the array in StarPU. This is also the structure which is used to
describe which data is used by a task. The second argument is the node number
where the data originally resides. Here it is 0 since the @code{vector} array is in
the main memory. Then come the pointer @code{vector} where the data can be found in main memory,
the number of elements in the vector and the size of each element.

The following shows how to construct a StarPU task that will manipulate the
vector and a constant factor.

@cartouche
@smallexample
float factor = 3.14;
struct starpu_task *task = starpu_task_create();

task->cl = &cl;                    /* @b{Pointer to the codelet defined below} */
task->handles[0] = vector_handle;  /* @b{First parameter of the codelet} */
task->cl_arg = &factor;
task->cl_arg_size = sizeof(factor);
task->synchronous = 1;

starpu_task_submit(task);
@end smallexample
@end cartouche
Since the factor is a mere constant float value parameter,
it does not need a preliminary registration, and
can just be passed through the @code{cl_arg} pointer like in the previous
example. The vector parameter is described by its handle, stored in the
@code{task->handles} array, while the access mode is declared in the
@code{modes} field of the codelet: @code{STARPU_R} for read-only,
@code{STARPU_W} for write-only and @code{STARPU_RW} for read and write access.
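For illustration, a hypothetical codelet that reads one vector and writes
another (with a made-up kernel name @code{copy_cpu_func}) would declare one
mode per buffer:

@cartouche
@smallexample
/* A hypothetical sketch: two buffers, the first one read-only,
 * the second one write-only. */
static struct starpu_codelet copy_cl = @{
    .where = STARPU_CPU,
    .cpu_funcs = @{ copy_cpu_func, NULL @},
    .nbuffers = 2,
    .modes = @{ STARPU_R, STARPU_W @}
@};
@end smallexample
@end cartouche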
The definition of the codelet can be written as follows:

@cartouche
@smallexample
void scal_cpu_func(void *buffers[], void *cl_arg)
@{
    unsigned i;
    float *factor = cl_arg;

    /* length of the vector */
    unsigned n = STARPU_VECTOR_GET_NX(buffers[0]);
    /* CPU copy of the vector pointer */
    float *val = (float *)STARPU_VECTOR_GET_PTR(buffers[0]);

    for (i = 0; i < n; i++)
        val[i] *= *factor;
@}

struct starpu_codelet cl = @{
    .where = STARPU_CPU,
    .cpu_funcs = @{ scal_cpu_func, NULL @},
    .nbuffers = 1,
    .modes = @{ STARPU_RW @}
@};
@end smallexample
@end cartouche
The first argument of @code{scal_cpu_func} is an array that gives
a description of all the buffers passed in the @code{task->handles} array. The
size of this array is given by the @code{nbuffers} field of the codelet
structure. For the sake of genericity, this array contains pointers to the
different interfaces describing each buffer. In the case of the @b{vector
interface}, the location of the vector (resp. its length) is accessible through the
@code{ptr} (resp. @code{nx}) field of this interface. Since the vector is accessed in a
read-write fashion, any modification will automatically affect future accesses
to this vector made by other tasks.

The second argument of the @code{scal_cpu_func} function contains a pointer to the
parameters of the codelet (given in @code{task->cl_arg}), so that we read the
constant factor from this pointer.
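Since, by default, accesses in @code{STARPU_RW} mode on the same handle are
serialized in submission order, a sketch of chaining two scaling tasks (reusing
the names defined above) could look as follows:

@cartouche
@smallexample
/* A sketch: two tasks on the same handle; StarPU's default
 * sequential consistency makes the second wait for the first. */
struct starpu_task *task1 = starpu_task_create();
task1->cl = &cl;
task1->handles[0] = vector_handle;
task1->cl_arg = &factor;
task1->cl_arg_size = sizeof(factor);
starpu_task_submit(task1);

struct starpu_task *task2 = starpu_task_create();
task2->cl = &cl;
task2->handles[0] = vector_handle;
task2->cl_arg = &factor;
task2->cl_arg_size = sizeof(factor);
starpu_task_submit(task2);   /* runs after task1 completes */

starpu_task_wait_for_all();
@end smallexample
@end cartouche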
@node Execution of Vector Scaling
@subsection Execution of Vector Scaling

@smallexample
% make vector_scal
cc $(pkg-config --cflags libstarpu) $(pkg-config --libs libstarpu) vector_scal.c -o vector_scal
% ./vector_scal
0.000000 3.000000 6.000000 9.000000 12.000000
@end smallexample
@node Vector Scaling on an Hybrid CPU/GPU Machine
@section Vector Scaling on an Hybrid CPU/GPU Machine

Contrary to the previous examples, the task submitted in this example may not
only be executed by the CPUs, but also by a CUDA device.

@menu
* Definition of the CUDA Kernel::
* Definition of the OpenCL Kernel::
* Definition of the Main Code::
* Execution of Hybrid Vector Scaling::
@end menu
@node Definition of the CUDA Kernel
@subsection Definition of the CUDA Kernel

The CUDA implementation can be written as follows. It needs to be compiled with
a CUDA compiler such as nvcc, the NVIDIA CUDA compiler driver. It must be noted
that the vector pointer returned by @code{STARPU_VECTOR_GET_PTR} is here a pointer in GPU
memory, so that it can be passed as such to the @code{vector_mult_cuda} kernel
call.

@cartouche
@smallexample
#include <starpu.h>
#include <starpu_cuda.h>

static __global__ void vector_mult_cuda(float *val, unsigned n,
                                        float factor)
@{
    unsigned i = blockIdx.x*blockDim.x + threadIdx.x;

    if (i < n)
        val[i] *= factor;
@}

extern "C" void scal_cuda_func(void *buffers[], void *_args)
@{
    float *factor = (float *)_args;

    /* length of the vector */
    unsigned n = STARPU_VECTOR_GET_NX(buffers[0]);
    /* CUDA copy of the vector pointer */
    float *val = (float *)STARPU_VECTOR_GET_PTR(buffers[0]);
    unsigned threads_per_block = 64;
    unsigned nblocks = (n + threads_per_block - 1) / threads_per_block;

@i{    vector_mult_cuda<<<nblocks, threads_per_block, 0, starpu_cuda_get_local_stream()>>>(val, n, *factor);}

@i{    cudaStreamSynchronize(starpu_cuda_get_local_stream());}
@}
@end smallexample
@end cartouche
@node Definition of the OpenCL Kernel
@subsection Definition of the OpenCL Kernel

The OpenCL implementation can be written as follows. StarPU provides
tools to compile an OpenCL kernel stored in a file.

@cartouche
@smallexample
__kernel void vector_mult_opencl(__global float* val, int nx, float factor)
@{
    const int i = get_global_id(0);
    if (i < nx) @{
        val[i] *= factor;
    @}
@}
@end smallexample
@end cartouche

Contrary to CUDA and CPU, @code{STARPU_VECTOR_GET_DEV_HANDLE} has to be used:
it returns a @code{cl_mem} (an OpenCL handle rather than a device pointer),
which can be passed as such to the OpenCL kernel. The difference is
important when using partitioning, see @ref{Partitioning Data}.
@cartouche
@smallexample
#include <starpu.h>
@i{#include <starpu_opencl.h>}

@i{extern struct starpu_opencl_program programs;}

void scal_opencl_func(void *buffers[], void *_args)
@{
    float *factor = _args;
@i{    int id, devid, err;}
@i{    cl_kernel kernel;}
@i{    cl_command_queue queue;}
@i{    cl_event event;}

    /* length of the vector */
    unsigned n = STARPU_VECTOR_GET_NX(buffers[0]);
    /* OpenCL copy of the vector pointer */
    cl_mem val = (cl_mem)STARPU_VECTOR_GET_DEV_HANDLE(buffers[0]);

@i{    id = starpu_worker_get_id();}
@i{    devid = starpu_worker_get_devid(id);}

@i{    err = starpu_opencl_load_kernel(&kernel, &queue, &programs,}
@i{                          "vector_mult_opencl", devid);   /* @b{Name of the kernel defined above} */}
@i{    if (err != CL_SUCCESS) STARPU_OPENCL_REPORT_ERROR(err);}

@i{    err = clSetKernelArg(kernel, 0, sizeof(val), &val);}
@i{    err |= clSetKernelArg(kernel, 1, sizeof(n), &n);}
@i{    err |= clSetKernelArg(kernel, 2, sizeof(*factor), factor);}
@i{    if (err) STARPU_OPENCL_REPORT_ERROR(err);}

@i{    @{}
@i{        size_t global=1;}
@i{        size_t local=1;}
@i{        err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, &local, 0, NULL, &event);}
@i{        if (err != CL_SUCCESS) STARPU_OPENCL_REPORT_ERROR(err);}
@i{    @}}

@i{    clFinish(queue);}
@i{    starpu_opencl_collect_stats(event);}
@i{    clReleaseEvent(event);}

@i{    starpu_opencl_release_kernel(kernel);}
@}
@end smallexample
@end cartouche
@node Definition of the Main Code
@subsection Definition of the Main Code

The CPU implementation is the same as in the previous section.

Here is the source of the main application. You can notice the value of the
field @code{where} for the codelet. We specify
@code{STARPU_CPU|STARPU_CUDA|STARPU_OPENCL} to indicate to StarPU that the codelet
can be executed either on a CPU or on a CUDA or an OpenCL device.

@cartouche
@smallexample
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <starpu.h>

#define NX 2048

extern void scal_cuda_func(void *buffers[], void *_args);
extern void scal_cpu_func(void *buffers[], void *_args);
extern void scal_opencl_func(void *buffers[], void *_args);

/* @b{Definition of the codelet} */
static struct starpu_codelet cl = @{
    .where = STARPU_CPU|STARPU_CUDA|STARPU_OPENCL, /* @b{It can be executed on a CPU,} */
                                                   /* @b{on a CUDA device, or on an OpenCL device} */
    .cuda_funcs = @{ scal_cuda_func, NULL @},
    .cpu_funcs = @{ scal_cpu_func, NULL @},
    .opencl_funcs = @{ scal_opencl_func, NULL @},
    .nbuffers = 1,
    .modes = @{ STARPU_RW @}
@};

#ifdef STARPU_USE_OPENCL
/* @b{The compiled version of the OpenCL program} */
struct starpu_opencl_program programs;
#endif

int main(int argc, char **argv)
@{
    float *vector;
    int i, ret;
    float factor = 3.0;
    struct starpu_task *task;
    starpu_data_handle_t vector_handle;

    starpu_init(NULL);   /* @b{Initialising StarPU} */

#ifdef STARPU_USE_OPENCL
    starpu_opencl_load_opencl_from_file(
        "examples/basic_examples/vector_scal_opencl_codelet.cl",
        &programs, NULL);
#endif

    vector = malloc(NX*sizeof(vector[0]));
    assert(vector);
    for (i = 0; i < NX; i++) vector[i] = i;
@end smallexample
@end cartouche
@cartouche
@smallexample
    /* @b{Registering data within StarPU} */
    starpu_vector_data_register(&vector_handle, 0, (uintptr_t)vector,
                                NX, sizeof(vector[0]));

    /* @b{Definition of the task} */
    task = starpu_task_create();
    task->cl = &cl;
    task->handles[0] = vector_handle;
    task->cl_arg = &factor;
    task->cl_arg_size = sizeof(factor);
@end smallexample
@end cartouche
@cartouche
@smallexample
    /* @b{Submitting the task} */
    ret = starpu_task_submit(task);
    if (ret == -ENODEV) @{
        fprintf(stderr, "No worker may execute this task\n");
        return 1;
    @}

@c TODO: Mmm, should rather be an unregistration with an implicit dependency, no?
    /* @b{Waiting for its termination} */
    starpu_task_wait_for_all();

    /* @b{Update the vector in RAM} */
    starpu_data_acquire(vector_handle, STARPU_R);
@end smallexample
@end cartouche
@cartouche
@smallexample
    /* @b{Access the data} */
    for (i = 0; i < NX; i++) @{
        fprintf(stderr, "%f ", vector[i]);
    @}
    fprintf(stderr, "\n");

    /* @b{Release the RAM view of the data before unregistering it and shutting down StarPU} */
    starpu_data_release(vector_handle);
    starpu_data_unregister(vector_handle);
    starpu_shutdown();

    return 0;
@}
@end smallexample
@end cartouche
@node Execution of Hybrid Vector Scaling
@subsection Execution of Hybrid Vector Scaling

The Makefile given at the beginning of the section must be extended to
give the rules to compile the CUDA source code. Note that the source
file of the OpenCL kernel does not need to be compiled now; it will
be compiled at run-time when calling the function
@code{starpu_opencl_load_opencl_from_file()} (@pxref{starpu_opencl_load_opencl_from_file}).

@cartouche
@smallexample
CFLAGS  += $(shell pkg-config --cflags libstarpu)
LDFLAGS += $(shell pkg-config --libs libstarpu)
CC       = gcc

vector_scal: vector_scal.o vector_scal_cpu.o vector_scal_cuda.o vector_scal_opencl.o

%.o: %.cu
	nvcc $(CFLAGS) $< -c -o $@@

clean:
	rm -f vector_scal *.o
@end smallexample
@end cartouche
@smallexample
% make
@end smallexample

and to execute it, with the default configuration:

@smallexample
% ./vector_scal
0.000000 3.000000 6.000000 9.000000 12.000000
@end smallexample

or for example, by disabling CPU devices:

@smallexample
% STARPU_NCPUS=0 ./vector_scal
0.000000 3.000000 6.000000 9.000000 12.000000
@end smallexample

or by disabling CUDA devices (which may then allow the OpenCL implementation to
be used, see @ref{Enabling OpenCL}):

@smallexample
% STARPU_NCUDA=0 ./vector_scal
0.000000 3.000000 6.000000 9.000000 12.000000
@end smallexample