\input texinfo @c -*-texinfo-*-
@c %**start of header
@setfilename starpu.info
@settitle StarPU
@c %**end of header
@setchapternewpage odd
@titlepage
@title StarPU
@page
@vskip 0pt plus 1filll
@comment For the @value{version-GCC} Version*
@end titlepage
@summarycontents
@contents
@page
@node Top
@top Preface
@cindex Preface
This manual documents the usage of StarPU.
@comment
@comment When you add a new menu item, please keep the right hand
@comment aligned to the same column. Do not use tabs. This provides
@comment better formatting.
@comment
@menu
* Introduction::          A basic introduction to using StarPU.
* Installing StarPU::     How to configure, build and install StarPU.
* Configuration options:: Configuration options.
* Environment variables:: Environment variables used by StarPU.
* StarPU API::            The API to use StarPU.
* Basic Examples::        Basic examples of the use of StarPU.
* Advanced Topics::       Advanced use of StarPU.
@end menu
@c ---------------------------------------------------------------------
@c Introduction to StarPU
@c ---------------------------------------------------------------------
@node Introduction
@chapter Introduction to StarPU
@menu
* Motivation::           Why StarPU?
* StarPU in a Nutshell:: The Fundamentals of StarPU
@end menu
@node Motivation
@section Motivation
@c complex machines with heterogeneous cores/devices
The use of specialized hardware such as accelerators or coprocessors offers an
interesting approach to overcome the physical limits encountered by processor
architects. As a result, many machines are now equipped with one or several
accelerators (e.g. a GPU), in addition to the usual processor(s). While much
effort has been devoted to offloading computation onto such accelerators, very
little attention has been paid to portability concerns on the one hand, and to
the possibility of having heterogeneous accelerators and processors interact
on the other hand.
StarPU is a runtime system that offers support for heterogeneous multicore
architectures. It not only offers a unified view of the computational resources
(i.e. CPUs and accelerators at the same time), but it also takes care to
efficiently map and execute tasks onto a heterogeneous machine while
transparently handling low-level issues in a portable fashion.
@c this leads to a complicated distributed memory design
@c which is not (easily) manageable by hand
@c added value/benefits of StarPU
@c - portability
@c - scheduling, perf. portability
@node StarPU in a Nutshell
@section StarPU in a Nutshell
From a programming point of view, StarPU is not a new language but a library
that executes tasks explicitly submitted by the application. The data that a
task manipulates are automatically transferred onto the accelerator so that the
programmer does not have to take care of complex data movements. StarPU also
takes particular care of scheduling those tasks efficiently and allows
scheduling experts to implement custom scheduling policies in a portable
fashion.
@c explain the notion of codelet and task (ie. g(A, B)
@subsection Codelets and Tasks
One of StarPU's primary data structures is the @b{codelet}. A codelet describes
a computational kernel that can possibly be implemented on multiple
architectures such as a CPU, a CUDA device or a Cell's SPU.
@c TODO insert illustration f : f_spu, f_cpu, ...
Another important data structure is the @b{task}. Executing a StarPU task
consists in applying a codelet on a data set, on one of the architectures on
which the codelet is implemented. In addition to the codelet that a task
implements, it also describes which data are accessed, and how they are
accessed during the computation (read and/or write).
StarPU tasks are asynchronous: submitting a task to StarPU is a non-blocking
operation. The task structure can also specify a @b{callback} function that is
called once StarPU has properly executed the task. It also contains optional
fields that the application may use to give hints to the scheduler (such as
priority levels).
A task may be identified by a unique 64-bit number to which we refer as a
@b{tag}. Task dependencies can be enforced either by means of callback
functions, or by expressing dependencies between tags.
@c TODO insert illustration f(Ar, Brw, Cr) + ..
@c DSM
@subsection StarPU Data Management Library
Because StarPU schedules tasks at runtime, data transfers have to be
done automatically and ``just-in-time'' between processing units,
relieving the application programmer from explicit data transfers.
Moreover, to avoid unnecessary transfers, StarPU keeps data
where it was last needed, even if it was modified there, and it
allows multiple copies of the same data to reside at the same time on
several processing units as long as it is not modified.
@c ---------------------------------------------------------------------
@c Installing StarPU
@c ---------------------------------------------------------------------
@node Installing StarPU
@chapter Installing StarPU
StarPU can be built and installed by the standard means of the GNU
autotools. This chapter briefly explains how these tools can be used to
install StarPU.
@section Configuring StarPU
@subsection Generating Makefiles and configuration scripts
This step is not necessary when using the tarball releases of StarPU. If you
are using the source code from the svn repository, you first need to generate
the configure scripts and the Makefiles.
@example
$ autoreconf -vfi
@end example
@subsection Configuring StarPU
@example
$ ./configure
@end example
@c TODO enumerate the list of interesting options: refer to a specific section
@section Building and Installing StarPU
@subsection Building
@example
$ make
@end example
@subsection Sanity Checks
In order to make sure that StarPU is working properly on the system, it is
also possible to run a test suite.
@example
$ make check
@end example
@subsection Installing
In order to install StarPU at the location that was specified during
configuration:
@example
$ make install
@end example
@subsection pkg-config configuration
Compiling and linking an application against StarPU may require specific
flags or libraries (for instance @code{CUDA} or @code{libspe2}). It is
therefore possible to use the @code{pkg-config} tool.
If StarPU was not installed at a standard location, the path of StarPU's
library must be specified in the @code{PKG_CONFIG_PATH} environment variable so
that @code{pkg-config} can find it. So if StarPU was installed in
@code{$(prefix_dir)}:
@example
$ export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:$(prefix_dir)/lib/pkgconfig
@end example
The flags required to compile or link against StarPU are then
accessible with the following commands:
@example
$ pkg-config --cflags libstarpu  # options for the compiler
$ pkg-config --libs libstarpu    # options for the linker
@end example
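For instance, a program can be built against StarPU by letting
@code{pkg-config} supply the flags. This is a sketch; the source file and
output names are illustrative:

```shell
# Compile and link a hypothetical main.c against StarPU, expanding
# the compiler and linker flags reported by pkg-config.
gcc main.c -o main $(pkg-config --cflags libstarpu) $(pkg-config --libs libstarpu)
```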
@c ---------------------------------------------------------------------
@c Configuration options
@c ---------------------------------------------------------------------
@node Configuration options
@chapter Configuration options
TODO
@c ---------------------------------------------------------------------
@c Environment variables
@c ---------------------------------------------------------------------
@node Environment variables
@chapter Environment variables
@menu
* Workers::    Configuring workers
* Scheduling:: Configuring the Scheduling engine
* Misc::       Miscellaneous and debug
@end menu
TODO: explicit configuration (passed to @code{starpu_init}) overrides
environment variables.
@node Workers
@section Configuring workers
@menu
* NCPUS::         Number of CPU workers
* NCUDA::         Number of CUDA workers
* NGORDON::       Number of SPU workers (Cell)
* WORKERS_CPUID:: Bind workers to specific CPUs
* WORKERS_GPUID:: Select specific CUDA devices
@end menu
@node NCPUS
@subsection @code{NCPUS} -- Number of CPU workers
@table @asis
@item @emph{Description}:
TODO
@end table
@node NCUDA
@subsection @code{NCUDA} -- Number of CUDA workers
@table @asis
@item @emph{Description}:
TODO
@end table
@node NGORDON
@subsection @code{NGORDON} -- Number of SPU workers (Cell)
@table @asis
@item @emph{Description}:
TODO
@end table
@node WORKERS_CPUID
@subsection @code{WORKERS_CPUID} -- Bind workers to specific CPUs
@table @asis
@item @emph{Description}:
TODO
@end table
@node WORKERS_GPUID
@subsection @code{WORKERS_GPUID} -- Select specific CUDA devices
@table @asis
@item @emph{Description}:
TODO
@end table
@node Scheduling
@section Configuring the Scheduling engine
@menu
* SCHED::              Scheduling policy
* CALIBRATE::          Calibrate performance models
* PREFETCH::           Use data prefetch
* STARPU_SCHED_ALPHA:: Computation factor
* STARPU_SCHED_BETA::  Communication factor
@end menu
@node SCHED
@subsection @code{SCHED} -- Scheduling policy
@table @asis
@item @emph{Description}:
TODO
Use @code{SCHED=help} to get the list of available schedulers.
@end table
@node CALIBRATE
@subsection @code{CALIBRATE} -- Calibrate performance models
@table @asis
@item @emph{Description}:
TODO
Note: this only applies to the @code{dm} and @code{dmda} scheduling policies.
@end table
@node PREFETCH
@subsection @code{PREFETCH} -- Use data prefetch
@table @asis
@item @emph{Description}:
TODO
@end table
@node STARPU_SCHED_ALPHA
@subsection @code{STARPU_SCHED_ALPHA} -- Computation factor
@table @asis
@item @emph{Description}:
TODO
@end table
@node STARPU_SCHED_BETA
@subsection @code{STARPU_SCHED_BETA} -- Communication factor
@table @asis
@item @emph{Description}:
TODO
@end table
@node Misc
@section Miscellaneous and debug
@menu
* LOGFILENAME:: Select debug file name
@end menu
@node LOGFILENAME
@subsection @code{LOGFILENAME} -- Select debug file name
@table @asis
@item @emph{Description}:
TODO
@end table
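As a sketch of how these variables combine on the command line (the
application name @code{my_app} is hypothetical; @code{dm} is one of the
policies mentioned under @code{CALIBRATE} above):

```shell
# Ask StarPU to print the list of available scheduling policies.
SCHED=help ./my_app
# Run with the dm policy and performance-model calibration enabled.
CALIBRATE=1 SCHED=dm ./my_app
```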
@c ---------------------------------------------------------------------
@c StarPU API
@c ---------------------------------------------------------------------
@node StarPU API
@chapter StarPU API
@menu
* Initialization and Termination:: Initialization and Termination methods
* Workers' Properties::            Methods to enumerate workers' properties
* Data Library::                   Methods to manipulate data
* Codelets and Tasks::             Methods to construct tasks
* Tags::                           Task dependencies
@end menu
@node Initialization and Termination
@section Initialization and Termination
@menu
* starpu_init::        Initialize StarPU
* struct starpu_conf:: StarPU runtime configuration
* starpu_shutdown::    Terminate StarPU
@end menu
@node starpu_init
@subsection @code{starpu_init} -- Initialize StarPU
@table @asis
@item @emph{Description}:
This is the StarPU initialization method, which must be called prior to any
other StarPU call. It is possible to specify StarPU's configuration (e.g.
scheduling policy, number of cores, ...) by passing a non-null argument. The
default configuration is used if the passed argument is @code{NULL}.
@item @emph{Return value}:
Upon successful completion, this function returns 0. Otherwise, @code{-ENODEV}
indicates that no worker was available (so that StarPU could not be
initialized).
@item @emph{Prototype}:
@code{int starpu_init(struct starpu_conf *conf);}
@end table
@node struct starpu_conf
@subsection @code{struct starpu_conf} -- StarPU runtime configuration
@table @asis
@item @emph{Description}:
This structure is passed to the @code{starpu_init} function in order to
configure StarPU. When a field is left to its default value, StarPU
automatically selects the number of processing units and uses the default
scheduling policy. These parameters overwrite the equivalent environment
variables.
@item @emph{Fields}:
@table @asis
@item @code{sched_policy} (default = NULL):
This is the name of the scheduling policy. This can also be specified with the
@code{SCHED} environment variable.
@item @code{ncpus} (default = -1):
This is the maximum number of CPU cores that StarPU can use. This can also be
specified with the @code{NCPUS} environment variable.
@item @code{ncuda} (default = -1):
This is the maximum number of CUDA devices that StarPU can use. This can also
be specified with the @code{NCUDA} environment variable.
@item @code{nspus} (default = -1):
This is the maximum number of Cell SPUs that StarPU can use. This can also be
specified with the @code{NGORDON} environment variable.
@item @code{calibrate} (default = 0):
If this flag is set, StarPU will calibrate the performance models when
executing tasks. This can also be specified with the @code{CALIBRATE}
environment variable.
@end table
@end table
@node starpu_shutdown
@subsection @code{starpu_shutdown} -- Terminate StarPU
@table @asis
@item @emph{Description}:
This is the StarPU termination method. It must be called at the end of the
application: statistics and other post-mortem debugging information are not
guaranteed to be available until this method has been called.
@item @emph{Prototype}:
@code{void starpu_shutdown(void);}
@end table
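As an illustration, a minimal program combining @code{starpu_init},
@code{struct starpu_conf} and @code{starpu_shutdown} could look as follows.
This is a sketch: the @code{<starpu.h>} header name is an assumption, and
only the fields documented above are used.

```c
#include <errno.h>
#include <stdio.h>
#include <starpu.h> /* assumed header name */

int main(void)
{
    /* Request at most 2 CPU workers; the other fields keep the
     * documented default values. */
    struct starpu_conf conf = {
        .sched_policy = NULL, /* default scheduling policy */
        .ncpus = 2,
        .ncuda = -1,
        .nspus = -1,
        .calibrate = 0,
    };

    if (starpu_init(&conf) == -ENODEV) {
        fprintf(stderr, "no worker available\n");
        return 1;
    }

    /* ... submit tasks here ... */

    starpu_shutdown();
    return 0;
}
```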
@node Workers' Properties
@section Workers' Properties
@menu
* starpu_get_worker_count:: Get the number of processing units
* starpu_get_worker_id::    Get the identifier of the current worker
* starpu_get_worker_type::  Get the type of processing unit associated with a worker
* starpu_get_worker_name::  Get the name of a worker
@end menu
@node starpu_get_worker_count
@subsection @code{starpu_get_worker_count} -- Get the number of processing units
@table @asis
@item @emph{Description}:
This function returns the number of workers (i.e. processing units executing
StarPU tasks). The returned value should be at most @code{STARPU_NMAXWORKERS}.
@item @emph{Prototype}:
@code{unsigned starpu_get_worker_count(void);}
@end table
@node starpu_get_worker_id
@subsection @code{starpu_get_worker_id} -- Get the identifier of the current worker
@table @asis
@item @emph{Description}:
This function returns the identifier of the worker associated with the calling
thread. The returned value is either -1 if the current context is not a StarPU
worker (i.e. when called from the application outside a task or a callback), or
an integer between 0 and @code{starpu_get_worker_count() - 1}.
@item @emph{Prototype}:
@code{int starpu_get_worker_id(void);}
@end table
@node starpu_get_worker_type
@subsection @code{starpu_get_worker_type} -- Get the type of processing unit associated with a worker
@table @asis
@item @emph{Description}:
This function returns the type of worker associated with an identifier (as
returned by the @code{starpu_get_worker_id} function). The returned value
indicates the architecture of the worker: @code{STARPU_CPU_WORKER} for a CPU
core, @code{STARPU_CUDA_WORKER} for a CUDA device, and
@code{STARPU_GORDON_WORKER} for a Cell SPU. The value returned for an invalid
identifier is unspecified.
@item @emph{Prototype}:
@code{enum starpu_archtype starpu_get_worker_type(int id);}
@end table
@node starpu_get_worker_name
@subsection @code{starpu_get_worker_name} -- Get the name of a worker
@table @asis
@item @emph{Description}:
StarPU associates a unique human-readable string with each processing unit.
This function copies at most the first @code{maxlen} bytes of the unique
string associated with the worker identified by @code{id} into the
@code{dst} buffer. The caller is responsible for ensuring that @code{dst}
is a valid pointer to a buffer of at least @code{maxlen} bytes. Calling this
function with an invalid identifier results in unspecified behaviour.
@item @emph{Prototype}:
@code{void starpu_get_worker_name(int id, char *dst, size_t maxlen);}
@end table
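These four functions can be combined to enumerate the processing units. The
sketch below assumes StarPU has already been initialized and that the
@code{<starpu.h>} header name is correct:

```c
#include <stdio.h>
#include <starpu.h> /* assumed header name */

/* Print the name and type of every worker managed by StarPU. */
static void list_workers(void)
{
    unsigned nworkers = starpu_get_worker_count();
    for (int id = 0; id < (int)nworkers; id++) {
        char name[64];
        starpu_get_worker_name(id, name, sizeof(name));

        const char *type = "unknown";
        switch (starpu_get_worker_type(id)) {
        case STARPU_CPU_WORKER:    type = "CPU";  break;
        case STARPU_CUDA_WORKER:   type = "CUDA"; break;
        case STARPU_GORDON_WORKER: type = "SPU";  break;
        default: break;
        }
        printf("worker %d: %s (%s)\n", id, name, type);
    }
}
```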
@node Data Library
@section Data Library
@c data_handle_t
@c void starpu_delete_data(struct starpu_data_state_t *state);
@c user interaction with the DSM
@c void starpu_sync_data_with_mem(struct starpu_data_state_t *state);
@c void starpu_notify_data_modification(struct starpu_data_state_t *state, uint32_t modifying_node);
@node Codelets and Tasks
@section Codelets and Tasks
@menu
* struct starpu_codelet:: StarPU codelet structure
* struct starpu_task::    StarPU task structure
* starpu_task_init::      Initialize a Task
* starpu_task_create::    Allocate and Initialize a Task
* starpu_task_destroy::   Destroy a dynamically allocated Task
* starpu_submit_task::    Submit a Task
* starpu_wait_task::      Wait for the termination of a Task
* starpu_wait_all_tasks:: Wait for the termination of all Tasks
@end menu
@c struct starpu_task
@c struct starpu_codelet
@node struct starpu_codelet
@subsection @code{struct starpu_codelet} -- StarPU codelet structure
@table @asis
@item @emph{Description}:
The codelet structure describes a kernel that is possibly implemented on
various targets.
@item @emph{Fields}:
@table @asis
@item @code{where}:
Indicates which types of processing units are able to execute the codelet.
@code{CPU|CUDA} for instance indicates that the codelet is implemented for
both CPU cores and CUDA devices, while @code{GORDON} indicates that it is only
available on Cell SPUs.
@item @code{cpu_func} (optional):
This is a function pointer to the CPU implementation of the codelet. Its
prototype must be: @code{void cpu_func(starpu_data_interface_t *descr, void
*arg)}. The first argument is the array of data managed by the data management
library, and the second is a pointer to the argument (possibly a copy of it)
passed from the @code{.cl_arg} field of the @code{starpu_task} structure. This
pointer is ignored if @code{CPU} does not appear in the @code{.where} field;
it must be non-null otherwise.
@item @code{cuda_func} (optional):
This is a function pointer to the CUDA implementation of the codelet.
@emph{This must be a host function written in the CUDA runtime API}. Its
prototype must be: @code{void cuda_func(starpu_data_interface_t *descr, void
*arg);}. This pointer is ignored if @code{CUDA} does not appear in the
@code{.where} field; it must be non-null otherwise.
@item @code{gordon_func} (optional):
This is the index of the Cell SPU implementation within the Gordon library.
TODO
@item @code{nbuffers}:
Specifies the number of arguments taken by the codelet. These arguments are
managed by the DSM and are accessed from the @code{starpu_data_interface_t *}
array. The constant argument passed with the @code{.cl_arg} field of the
@code{starpu_task} structure is not counted in this number. This value should
not be above @code{STARPU_NMAXBUFS}.
@item @code{model} (optional):
This is a pointer to the performance model associated with this codelet. This
optional field is ignored when null. TODO
@end table
@end table
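To make the fields above concrete, here is a sketch of a codelet restricted
to CPU cores. The kernel body is a placeholder, since accessing the buffers
depends on the data interface that was registered, and the header name is an
assumption:

```c
#include <starpu.h> /* assumed header name */

/* CPU implementation of the kernel, with the prototype documented
 * above. The body is illustrative: real code would access the
 * buffer described by descr[0]. */
static void scal_cpu_func(starpu_data_interface_t *descr, void *arg)
{
    float factor = *(float *)arg; /* constant passed via .cl_arg */
    (void)descr;
    (void)factor;
}

/* A codelet implemented on CPU cores only, taking one DSM-managed
 * buffer and no performance model. */
static struct starpu_codelet scal_cl = {
    .where = CPU,
    .cpu_func = scal_cpu_func,
    .nbuffers = 1,
    .model = NULL,
};
```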
@node struct starpu_task
@subsection @code{struct starpu_task} -- StarPU task structure
@table @asis
@item @emph{Description}:
The @code{starpu_task} structure describes a task that can be offloaded onto
the various processing units managed by StarPU. It instantiates a codelet. It
can either be allocated dynamically with the @code{starpu_task_create} method,
or declared statically. In the latter case, the programmer has to zero the
@code{starpu_task} structure and to fill the different fields properly. The
indicated default values correspond to the configuration of a task allocated
with @code{starpu_task_create}.
@item @emph{Fields}:
@table @asis
@item @code{cl}:
This is a pointer to the corresponding @code{starpu_codelet} data structure.
It describes where the kernel should be executed, and supplies the appropriate
implementations. When set to @code{NULL}, no code is executed during the task;
such empty tasks can be useful for synchronization purposes.
@item @code{buffers}:
TODO
@item @code{cl_arg} (optional) (default = NULL):
TODO
@item @code{cl_arg_size} (optional):
TODO
@c ignored if only executable on CPUs or CUDA ...
@item @code{callback_func} (optional) (default = @code{NULL}):
This is a function pointer of prototype @code{void (*f)(void *)} which
specifies a possible callback. If that pointer is non-null, the callback
function is executed @emph{on the host} after the execution of the task. The
callback is passed the value contained in the @code{callback_arg} field. No
callback is executed if that field is null.
@item @code{callback_arg} (optional) (default = @code{NULL}):
This is the pointer passed to the callback function. This field is ignored if
@code{callback_func} is null.
@item @code{use_tag} (optional) (default = 0):
If set, this flag indicates that the task should be associated with the tag
contained in the @code{tag_id} field. Tags allow the application to synchronize
with the task and to express task dependencies easily.
@item @code{tag_id}:
This field contains the tag associated with the task if the @code{use_tag}
field was set; it is ignored otherwise.
@item @code{synchronous}:
If this flag is set, the @code{starpu_submit_task} function is blocking and
returns only when the task has been executed (or if no worker is able to
process the task). Otherwise, @code{starpu_submit_task} returns immediately.
@item @code{priority} (optional) (default = @code{DEFAULT_PRIO}):
This field indicates a level of priority for the task. This is an integer value
that must be selected between @code{MIN_PRIO} (for the least important tasks)
and @code{MAX_PRIO} (for the most important tasks), inclusive. The default
priority is @code{DEFAULT_PRIO}. Scheduling strategies that take priorities
into account can use this parameter to make better scheduling decisions, but
the scheduling policy may also ignore it.
@item @code{execute_on_a_specific_worker} (default = 0):
If this flag is set, StarPU will bypass the scheduler and directly assign this
task to the worker specified by the @code{workerid} field.
@item @code{workerid} (optional):
If the @code{execute_on_a_specific_worker} field is set, this field indicates
the identifier of the worker that should process this task (as returned by
@code{starpu_get_worker_id}). This field is ignored if the
@code{execute_on_a_specific_worker} field is set to 0.
@item @code{detach} (optional) (default = 1):
If this flag is set, it is not possible to synchronize with the task
by means of @code{starpu_wait_task} later on. Internal data structures
are only guaranteed to be freed once @code{starpu_wait_task} is called
if that flag is not set.
@item @code{destroy} (optional) (default = 1):
If that flag is set, the task structure will automatically be freed, either
after the execution of the callback if the task is detached, or during
@code{starpu_wait_task} otherwise. If this flag is not set, dynamically
allocated data structures will not be freed until @code{starpu_task_destroy}
is called explicitly. Setting this flag for a statically allocated task
structure will result in undefined behaviour.
@end table
@end table
@node starpu_task_init
@subsection @code{starpu_task_init} -- Initialize a Task
@table @asis
@item @emph{Description}:
TODO
@item @emph{Prototype}:
@code{void starpu_task_init(struct starpu_task *task);}
@end table
@node starpu_task_create
@subsection @code{starpu_task_create} -- Allocate and Initialize a Task
@table @asis
@item @emph{Description}:
TODO
(Describe the different default fields ...)
@item @emph{Prototype}:
@code{struct starpu_task *starpu_task_create(void);}
@end table
@node starpu_task_destroy
@subsection @code{starpu_task_destroy} -- Destroy a dynamically allocated Task
@table @asis
@item @emph{Description}:
Free the resources allocated during @code{starpu_task_create}. This function
is called automatically after the execution of a task when the
@code{.destroy} flag of the @code{starpu_task} structure is set (the default
behaviour). Calling this function on a statically allocated task results in
undefined behaviour.
@item @emph{Prototype}:
@code{void starpu_task_destroy(struct starpu_task *task);}
@end table
  555. @node starpu_wait_task
  556. @subsection @code{starpu_wait_task} -- Wait for the termination of a Task
  557. @table @asis
  558. @item @emph{Description}:
  559. This function blocks until the task was executed. It is not possible to
  560. synchronize with a task more than once. It is not possible to wait
  561. synchronous or detached tasks.
  562. @item @emph{Return value}:
  563. Upon successful completion, this function returns 0. Otherwise, @code{-EINVAL}
  564. indicates that the waited task was either synchronous or detached.
  565. @item @emph{Prototype}:
  566. @code{int starpu_wait_task(struct starpu_task *task);}
  567. @end table
@node starpu_submit_task
@subsection @code{starpu_submit_task} -- Submit a Task
@table @asis
@item @emph{Description}:
This function submits task @code{task} to StarPU. Calling this function does
not mean that the task will be executed immediately, as there can be data or
task (tag) dependencies that are not fulfilled yet: StarPU will take care of
scheduling this task with respect to such dependencies.
This function returns immediately if the @code{synchronous} field of the
@code{starpu_task} structure is set to 0, and blocks until the termination of
the task otherwise. It is also possible to synchronize the application with
asynchronous tasks by means of tags, using the @code{starpu_tag_wait}
function for instance.
In case of success, this function returns 0. A return value of @code{-ENODEV}
means that there is no worker able to process this task (e.g. there is no GPU
available and this task is only implemented on top of CUDA).
@item @emph{Prototype}:
@code{int starpu_submit_task(struct starpu_task *task);}
@end table
@node starpu_wait_all_tasks
@subsection @code{starpu_wait_all_tasks} -- Wait for the termination of all Tasks
@table @asis
@item @emph{Description}:
This function blocks until all the tasks that were submitted have terminated.
@item @emph{Prototype}:
@code{void starpu_wait_all_tasks(void);}
@end table
@c Callbacks : what can we put in callbacks ?

@node Tags
@section Tags
@menu
* starpu_tag_t:: Task identifier
* starpu_tag_declare_deps:: Declare the Dependencies of a Tag
* starpu_tag_declare_deps_array:: Declare the Dependencies of a Tag
* starpu_tag_wait:: Block until a Tag is terminated
* starpu_tag_wait_array:: Block until a set of Tags is terminated
* starpu_tag_remove:: Destroy a Tag
* starpu_tag_notify_from_apps:: Feed a tag explicitly
@end menu

@node starpu_tag_t
@subsection @code{starpu_tag_t} -- Task identifier
@table @asis
@item @emph{Description}:
It is possible to associate a task with a unique ``tag'' and to express
dependencies between tasks by means of those tags. To do so, fill the
@code{tag_id} field of the @code{starpu_task} structure with a tag number
(which can be arbitrary) and set the @code{use_tag} field to 1.
If @code{starpu_tag_declare_deps} is called with that tag number, the task will
not be started until the tasks which wear the declared dependency tags are
complete.
@end table
@node starpu_tag_declare_deps
@subsection @code{starpu_tag_declare_deps} -- Declare the Dependencies of a Tag
@table @asis
@item @emph{Description}:
Specify the dependencies of the task identified by tag @code{id}. The first
argument specifies the tag which is configured, the second argument gives the
number of tag(s) on which @code{id} depends. The following arguments are the
tags which have to be terminated to unlock the task.
This function must be called before the associated task is submitted to StarPU
with @code{starpu_submit_task}.
@item @emph{Remark}
Because of the variable arity of @code{starpu_tag_declare_deps}, note that the
last arguments @emph{must} be of type @code{starpu_tag_t}: constant values
typically need to be explicitly cast. Using the
@code{starpu_tag_declare_deps_array} function avoids this hazard.
@item @emph{Prototype}:
@code{void starpu_tag_declare_deps(starpu_tag_t id, unsigned ndeps, ...);}
@item @emph{Example}:
@example
@c @cartouche
/* Tag 0x1 depends on tags 0x32 and 0x52 */
starpu_tag_declare_deps((starpu_tag_t)0x1,
        2, (starpu_tag_t)0x32, (starpu_tag_t)0x52);
@c @end cartouche
@end example
@end table
@node starpu_tag_declare_deps_array
@subsection @code{starpu_tag_declare_deps_array} -- Declare the Dependencies of a Tag
@table @asis
@item @emph{Description}:
This function is similar to @code{starpu_tag_declare_deps}, except that it
does not take a variable number of arguments but an array of tags of size
@code{ndeps}.
@item @emph{Prototype}:
@code{void starpu_tag_declare_deps_array(starpu_tag_t id, unsigned ndeps, starpu_tag_t *array);}
@item @emph{Example}:
@example
@c @cartouche
/* Tag 0x1 depends on tags 0x32 and 0x52 */
starpu_tag_t tag_array[2] = @{0x32, 0x52@};
starpu_tag_declare_deps_array((starpu_tag_t)0x1, 2, tag_array);
@c @end cartouche
@end example
@end table
@node starpu_tag_wait
@subsection @code{starpu_tag_wait} -- Block until a Tag is terminated
@table @asis
@item @emph{Description}:
This function blocks until the task associated to tag @code{id} has been
executed. This is a blocking call which must therefore not be called within
tasks or callbacks, but only from the application directly. It is possible to
synchronize with the same tag multiple times, as long as the
@code{starpu_tag_remove} function is not called. Note that it is still
possible to synchronize with a tag associated to a task whose
@code{starpu_task} data structure was freed (e.g. if the @code{destroy} flag
of the @code{starpu_task} was enabled).
@item @emph{Prototype}:
@code{void starpu_tag_wait(starpu_tag_t id);}
@end table
@node starpu_tag_wait_array
@subsection @code{starpu_tag_wait_array} -- Block until a set of Tags is terminated
@table @asis
@item @emph{Description}:
This function is similar to @code{starpu_tag_wait} except that it blocks until
@emph{all} the @code{ntags} tags contained in the @code{id} array are
terminated.
@item @emph{Prototype}:
@code{void starpu_tag_wait_array(unsigned ntags, starpu_tag_t *id);}
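@item @emph{Example}:
A short sketch mirroring the @code{starpu_tag_declare_deps_array} example:
@example
/* block until the tasks wearing tags 0x32 and 0x52
 * are both terminated */
starpu_tag_t tag_array[2] = @{0x32, 0x52@};
starpu_tag_wait_array(2, tag_array);
@end example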
@end table
@node starpu_tag_remove
@subsection @code{starpu_tag_remove} -- Destroy a Tag
@table @asis
@item @emph{Description}:
This function releases the resources associated to tag @code{id}. It can be
called once the corresponding task has been executed and when no tag
depends on that one anymore.
@item @emph{Prototype}:
@code{void starpu_tag_remove(starpu_tag_t id);}
@end table
@node starpu_tag_notify_from_apps
@subsection @code{starpu_tag_notify_from_apps} -- Feed a Tag explicitly
@table @asis
@item @emph{Description}:
This function explicitly unlocks tag @code{id}. It may be useful in the
case of applications which execute part of their computation outside StarPU
tasks (e.g. third-party libraries). It is also provided as a
convenient tool for the programmer, for instance to entirely construct the task
DAG before actually giving StarPU the opportunity to execute the tasks.
@item @emph{Prototype}:
@code{void starpu_tag_notify_from_apps(starpu_tag_t id);}
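@item @emph{Example}:
A minimal sketch: tag @code{0x42} is not attached to any StarPU task, so it
has to be fed explicitly once the external computation it stands for is
complete (the tag values are arbitrary illustrations):
@example
/* the task wearing tag 0x1 depends on tag 0x42,
 * which is not attached to any StarPU task */
starpu_tag_declare_deps((starpu_tag_t)0x1, 1, (starpu_tag_t)0x42);

/* ... some computation performed outside StarPU ... */

/* unlock tag 0x42 so that the dependent task may start */
starpu_tag_notify_from_apps((starpu_tag_t)0x42);
@end example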
@end table
@section Extensions

@subsection CUDA extensions
@c void starpu_malloc_pinned_if_possible(float **A, size_t dim);

@subsection Cell extensions

@c ---------------------------------------------------------------------
@c Basic Examples
@c ---------------------------------------------------------------------

@node Basic Examples
@chapter Basic Examples

@menu
* Compiling and linking:: Compiling and Linking Options
* Hello World:: Submitting Tasks
* Scaling a Vector:: Manipulating Data
* Scaling a Vector (hybrid):: Handling Heterogeneous Architectures
@end menu
@node Compiling and linking
@section Compiling and linking options

The Makefile could for instance contain the following lines to define which
options must be given to the compiler and to the linker:

@example
@c @cartouche
CFLAGS+=$$(pkg-config --cflags libstarpu)
LIBS+=$$(pkg-config --libs libstarpu)
@c @end cartouche
@end example
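Alternatively, a single source file can also be compiled directly on the
command line; the file name @code{hello.c} here is only an illustration:

@example
$ gcc hello.c $(pkg-config --cflags libstarpu) \
      $(pkg-config --libs libstarpu) -o hello
@end example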
@node Hello World
@section Hello World

In this section, we show how to implement a simple program that submits a task
to StarPU.

@subsection Required Headers

The @code{starpu.h} header should be included in any code using StarPU.

@example
@c @cartouche
#include <starpu.h>
@c @end cartouche
@end example
@subsection Defining a Codelet

@example
@c @cartouche
void cpu_func(starpu_data_interface_t *buffers, void *func_arg)
@{
        float *array = func_arg;

        printf("Hello world (array = @{%f, %f@} )\n", array[0], array[1]);
@}

starpu_codelet cl =
@{
        .where = CPU,
        .cpu_func = cpu_func,
        .nbuffers = 0
@};
@c @end cartouche
@end example
A codelet is a structure that represents a computational kernel. Such a codelet
may contain an implementation of the same kernel on different architectures
(e.g. CUDA, Cell's SPU, x86, ...).

The @code{.nbuffers} field specifies the number of data buffers that are
manipulated by the codelet: here the codelet does not access or modify any data
that is controlled by our data management library. Note that the argument
passed to the codelet (the @code{.cl_arg} field of the @code{starpu_task}
structure) does not count as a buffer since it is not managed by our data
management library.

@c TODO need a crossref to the proper description of "where" see bla for more ...
We create a codelet which may only be executed on the CPUs. The @code{.where}
field is a bitmask that defines where the codelet may be executed. Here, the
@code{CPU} value means that only CPUs can execute this codelet
(@pxref{Codelets and Tasks} for more details on that field).
When a CPU core executes a codelet, it calls the @code{.cpu_func} function,
which @emph{must} have the following prototype:

@code{void (*cpu_func)(starpu_data_interface_t *, void *)}

In this example, we can ignore the first argument of this function which gives
a description of the input and output buffers (e.g. the size and the location
of the matrices). The second argument is a pointer to a buffer passed as an
argument to the codelet by means of the @code{.cl_arg} field of the
@code{starpu_task} structure. Be aware that this may be a pointer to a
@emph{copy} of the actual buffer, and not the pointer given by the programmer:
if the codelet modifies this buffer, there is no guarantee that the initial
buffer will be modified as well: this for instance implies that the buffer
cannot be used as a synchronization medium.
@subsection Submitting a Task

@example
@c @cartouche
void callback_func(void *callback_arg)
@{
        printf("Callback function (arg %p)\n", callback_arg);
@}

int main(int argc, char **argv)
@{
        /* initialize StarPU */
        starpu_init(NULL);

        struct starpu_task *task = starpu_task_create();

        task->cl = &cl;

        float array[2] = @{1.0f, -1.0f@};
        task->cl_arg = &array;
        task->cl_arg_size = 2*sizeof(float);

        task->callback_func = callback_func;
        task->callback_arg = (void *)0x42;

        /* starpu_submit_task will be a blocking call */
        task->synchronous = 1;

        /* submit the task to StarPU */
        starpu_submit_task(task);

        /* terminate StarPU */
        starpu_shutdown();

        return 0;
@}
@c @end cartouche
@end example
Before submitting any tasks to StarPU, @code{starpu_init} must be called. The
@code{NULL} argument specifies that we use the default configuration. Tasks
cannot be submitted after the termination of StarPU by a call to
@code{starpu_shutdown}.

In the example above, a task structure is allocated by a call to
@code{starpu_task_create}. This function only allocates and fills the
corresponding structure with the default settings (@pxref{starpu_task_create}),
but it does not submit the task to StarPU.

@c not really clear ;)
The @code{.cl} field is a pointer to the codelet which the task will
execute: in other words, the codelet structure describes which computational
kernel should be offloaded on the different architectures, and the task
structure is a wrapper containing a codelet and the piece of data on which the
codelet should operate.

The optional @code{.cl_arg} field is a pointer to a buffer (of size
@code{.cl_arg_size}) with some parameters for the kernel
described by the codelet. For instance, if a codelet implements a computational
kernel that multiplies its input vector by a constant, the constant could be
specified by means of this buffer.

Once a task has been executed, an optional callback function can be called.
While the computational kernel could be offloaded on various architectures, the
callback function is always executed on a CPU. The @code{.callback_arg}
pointer is passed as an argument to the callback. The prototype of a callback
function must be:

@example
void (*callback_function)(void *);
@end example

If the @code{.synchronous} field is non-zero, task submission will be
synchronous: the @code{starpu_submit_task} function will not return until the
task has been executed. Note that the @code{starpu_shutdown} method does not
guarantee that asynchronous tasks have been executed before it returns.
@node Scaling a Vector
@section Manipulating Data: Scaling a Vector

The previous example has shown how to submit tasks. In this section we show how
StarPU tasks can manipulate data.

Programmers can describe the data layout of their application so that StarPU is
responsible for enforcing data coherency and availability across the machine.
Instead of handling complex (and non-portable) mechanisms to perform data
movements, programmers only declare which piece of data is accessed and/or
modified by a task, and StarPU makes sure that when a computational kernel
starts somewhere (e.g. on a GPU), its data are available locally.

Before submitting those tasks, the programmer first needs to declare the
different pieces of data to StarPU using the @code{starpu_register_*_data}
functions. To ease the development of applications for StarPU, it is possible
to describe multiple types of data layout. A type of data layout is called an
@b{interface}. Several interfaces are available in StarPU by default:
here we will consider the @b{vector interface}.

The following lines show how to declare an array of @code{n} elements of type
@code{float} using the vector interface:

@example
float tab[n];

starpu_data_handle tab_handle;
starpu_register_vector_data(&tab_handle, 0, tab, n, sizeof(float));
@end example
The first argument, called the @b{data handle}, is an opaque pointer which
designates the array in StarPU. This is also the structure which is used to
describe which data is used by a task.

@c TODO: what is 0 ?
It is possible to construct a StarPU
task that multiplies this vector by a constant factor:

@example
float factor;

struct starpu_task *task = starpu_task_create();

task->cl = &cl;
task->buffers[0].handle = tab_handle;
task->buffers[0].mode = STARPU_RW;
task->cl_arg = &factor;
task->cl_arg_size = sizeof(float);
@end example

Since the factor is constant, it does not need a preliminary declaration, and
can just be passed through the @code{cl_arg} pointer as in the previous
example. The vector parameter is described by its handle.
There are two fields in each element of the @code{buffers} array.
@code{.handle} is the handle of the data, and @code{.mode} specifies how the
kernel will access the data (@code{STARPU_R} for read-only, @code{STARPU_W} for
write-only and @code{STARPU_RW} for read and write access).
The definition of the codelet can be written as follows:

@example
void scal_func(starpu_data_interface_t *buffers, void *arg)
@{
        unsigned i;
        float *factor = arg;

        /* length of the vector */
        unsigned n = buffers[0].vector.nx;

        /* local copy of the vector pointer */
        float *val = (float *)buffers[0].vector.ptr;

        for (i = 0; i < n; i++)
                val[i] *= *factor;
@}

starpu_codelet cl = @{
        .where = CPU,
        .cpu_func = scal_func,
        .nbuffers = 1
@};
@end example

The second argument of the @code{scal_func} function contains a pointer to the
parameters of the codelet (given in @code{task->cl_arg}), so that we read the
constant factor from this pointer. The first argument is an array that gives
a description of every buffer passed in the @code{task->buffers} array, the
number of which is given by the @code{.nbuffers} field of the codelet
structure. In the @b{vector interface}, the location of the vector (resp. its
length) is accessible in the @code{.vector.ptr} (resp. @code{.vector.nx})
field of this array. Since the vector is accessed in a read-write fashion, any
modification will automatically affect future accesses to that vector made by
other tasks.
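Putting the pieces of this section together, a complete vector scaling program
could look as follows. This is only a sketch: error checking is omitted, and
the vector size @code{NX} and the factor value are arbitrary illustrations.

@example
#include <starpu.h>

#define NX    2048

int main(int argc, char **argv)
@{
        starpu_init(NULL);

        float tab[NX];
        /* ... fill tab ... */

        starpu_data_handle tab_handle;
        starpu_register_vector_data(&tab_handle, 0, tab, NX, sizeof(float));

        float factor = 3.14f;

        struct starpu_task *task = starpu_task_create();
        task->cl = &cl;
        task->buffers[0].handle = tab_handle;
        task->buffers[0].mode = STARPU_RW;
        task->cl_arg = &factor;
        task->cl_arg_size = sizeof(float);

        /* block until the task is done */
        task->synchronous = 1;
        starpu_submit_task(task);

        starpu_shutdown();

        return 0;
@}
@end example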
@node Scaling a Vector (hybrid)
@section Vector Scaling on a Hybrid CPU/GPU Machine

Contrary to the previous examples, the task submitted in this example may not
only be executed by the CPUs, but also by a CUDA device.

TODO

@c ---------------------------------------------------------------------
@c Advanced Topics
@c ---------------------------------------------------------------------

@node Advanced Topics
@chapter Advanced Topics

@bye