\input texinfo @c -*-texinfo-*-
@c %**start of header
@setfilename starpu.info
@settitle StarPU
@c %**end of header
@setchapternewpage odd
@titlepage
@title StarPU
@page
@vskip 0pt plus 1filll
@comment For the @value{version-GCC} Version*
@end titlepage
@summarycontents
@contents
@page
@node Top
@top Preface
@cindex Preface
This manual documents the usage of StarPU.
@comment
@comment When you add a new menu item, please keep the right hand
@comment aligned to the same column. Do not use tabs. This provides
@comment better formatting.
@comment
@menu
* Introduction::        A basic introduction to using StarPU.
* Installing StarPU::   How to configure, build and install StarPU.
* StarPU API::          The API to use StarPU.
* Basic Examples::      Basic examples of the use of StarPU.
* Advanced Topics::     Advanced use of StarPU.
@end menu
@c ---------------------------------------------------------------------
@c Introduction to StarPU
@c ---------------------------------------------------------------------
@node Introduction
@chapter Introduction to StarPU
@menu
* Motivation::            Why StarPU?
* StarPU in a Nutshell::  The fundamentals of StarPU.
@end menu
@node Motivation
@section Motivation
@c complex machines with heterogeneous cores/devices
The use of specialized hardware such as accelerators or coprocessors offers an
interesting approach to overcome the physical limits encountered by processor
architects. As a result, many machines are now equipped with one or several
accelerators (e.g. a GPU), in addition to the usual processor(s). While much
effort has been devoted to offloading computation onto such accelerators, very
little attention has been paid to portability concerns on the one hand, and to
the possibility of letting heterogeneous accelerators and processors interact
on the other hand.
StarPU is a runtime system that offers support for heterogeneous multicore
architectures. It not only offers a unified view of the computational resources
(i.e. CPUs and accelerators at the same time), but it also takes care of
efficiently mapping and executing tasks onto a heterogeneous machine while
transparently handling low-level issues in a portable fashion.
@c this leads to a complicated distributed memory design
@c which is not (easily) manageable by hand
@c added value/benefits of StarPU
@c - portability
@c - scheduling, perf. portability
@node StarPU in a Nutshell
@section StarPU in a Nutshell
From a programming point of view, StarPU is not a new language but a library
that executes tasks explicitly submitted by the application. The data that a
task manipulates are automatically transferred onto the accelerators so that
the programmer does not have to take care of complex data movements. StarPU
also takes particular care of scheduling those tasks efficiently and allows
scheduling experts to implement custom scheduling policies in a portable
fashion.
@c explain the notion of codelet and task (ie. g(A, B)
@subsection Codelet and Tasks
One of StarPU's primary data structures is the @b{codelet}. A codelet describes
a computational kernel that can possibly be implemented on multiple
architectures such as a CPU, a CUDA device or a Cell's SPU.
@c TODO insert illustration f : f_spu, f_cpu, ...
Another important data structure is the @b{task}. Executing a StarPU task
consists in applying a codelet on a data set, on one of the architectures on
which the codelet is implemented. In addition to the codelet that it runs, a
task also describes which data are accessed, and how they are accessed during
the computation (read and/or write).
StarPU tasks are asynchronous: submitting a task to StarPU is a non-blocking
operation. The task structure can also specify a @b{callback} function that is
called once StarPU has properly executed the task. It also contains optional
fields that the application may use to give hints to the scheduler (such as
priority levels).
A task may be identified by a unique 64-bit number which we refer to as a
@b{tag}. Task dependencies can be enforced either by the means of callback
functions, or by expressing dependencies between tags.
@c TODO insert illustration f(Ar, Brw, Cr) + ..
@c DSM
@subsection StarPU Data Management Library
@c ---------------------------------------------------------------------
@c Installing StarPU
@c ---------------------------------------------------------------------
@node Installing StarPU
@chapter Installing StarPU
StarPU can be built and installed by the standard means of the GNU
autotools. The following chapter briefly describes how these tools can be used
to install StarPU.
@section Configuring StarPU
@subsection Generating Makefiles and configuration scripts
This step is not necessary when using the tarball releases of StarPU. If you
are using the source code from the svn repository, you first need to generate
the configure scripts and the Makefiles:
@example
$ autoreconf -i
@end example
@subsection Configuring StarPU
@example
$ ./configure
@end example
@c TODO enumerate the list of interesting options
@section Building and Installing StarPU
@subsection Building
@example
$ make
@end example
@subsection Sanity Checks
In order to make sure that StarPU is working properly on the system, it is also
possible to run a test suite:
@example
$ make check
@end example
@subsection Installing
In order to install StarPU at the location that was specified during
configuration:
@example
# make install
@end example
@subsection pkg-config configuration
Compiling and linking an application against StarPU may require specific flags
or libraries (for instance @code{CUDA} or @code{libspe2}). The
@code{pkg-config} tool can be used to retrieve them.
If StarPU was not installed at a standard location, the directory containing
StarPU's @code{.pc} file must be listed in the @code{PKG_CONFIG_PATH}
environment variable so that @code{pkg-config} can find it. So if StarPU was
installed in @code{$prefix_dir}:
@example
$ export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:$prefix_dir/lib/pkgconfig
@end example
The flags required to compile or link against StarPU are then
accessible with the following commands:
@example
$ pkg-config --cflags libstarpu  # options for the compiler
$ pkg-config --libs libstarpu    # options for the linker
@end example
@c ---------------------------------------------------------------------
@c StarPU API
@c ---------------------------------------------------------------------
@node StarPU API
@chapter StarPU API
@menu
* Initialization and Termination::  Initialization and Termination methods.
* Data Library::                    Methods to manipulate data.
* Codelets and Tasks::              Methods to construct tasks.
* Tags::                            Task dependencies.
@end menu
@node Initialization and Termination
@section Initialization and Termination
@menu
* starpu_init::         Initialize StarPU.
* struct starpu_conf::  StarPU runtime configuration.
* starpu_shutdown::     Terminate StarPU.
@end menu
@node starpu_init
@subsection @code{starpu_init} -- Initialize StarPU
@table @asis
@item @emph{Description}:
This is StarPU's initialization method, which must be called prior to any other
StarPU call. It is possible to specify StarPU's configuration (e.g. scheduling
policy, number of cores, ...) by passing a non-null argument. The default
configuration is used if the argument is @code{NULL}.
@item @emph{Prototype}:
@code{void starpu_init(struct starpu_conf *conf);}
@end table
@node struct starpu_conf
@subsection @code{struct starpu_conf} -- StarPU runtime configuration
@table @asis
@item @emph{Description}:
TODO
@item @emph{Definition}:
TODO
@end table
@node starpu_shutdown
@subsection @code{starpu_shutdown} -- Terminate StarPU
@table @asis
@item @emph{Description}:
This is StarPU's termination method. It must be called at the end of the
application: statistics and other post-mortem debugging information are not
guaranteed to be available until this method has been called.
@item @emph{Prototype}:
@code{void starpu_shutdown(void);}
@end table
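Taken together, initialization and termination bracket every StarPU program. A minimal sketch (it performs no useful work, and of course requires StarPU to be installed to compile):

```c
#include <starpu.h>

int main(void)
{
    /* NULL: use the default configuration (default scheduling
     * policy, number of cores, ...) */
    starpu_init(NULL);

    /* ... tasks would be submitted here ... */

    /* terminate StarPU; statistics become available from here on */
    starpu_shutdown();

    return 0;
}
```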
@node Data Library
@section Data Library
@c data_handle_t
@c void starpu_delete_data(struct starpu_data_state_t *state);
@c user interaction with the DSM
@c void starpu_sync_data_with_mem(struct starpu_data_state_t *state);
@c void starpu_notify_data_modification(struct starpu_data_state_t *state, uint32_t modifying_node);
@node Codelets and Tasks
@section Codelets and Tasks
@menu
* starpu_task_create::  Allocate and Initialize a Task.
@end menu
@c struct starpu_task
@c struct starpu_codelet
@node starpu_task_create
@subsection @code{starpu_task_create} -- Allocate and Initialize a Task
@table @asis
@item @emph{Description}:
TODO
@item @emph{Prototype}:
@code{struct starpu_task *starpu_task_create(void);}
@end table
@c Callbacks : what can we put in callbacks ?
@node Tags
@section Tags
@menu
* starpu_tag_t::                   Task identifier.
* starpu_tag_declare_deps::        Declare the Dependencies of a Tag.
* starpu_tag_declare_deps_array::  Declare the Dependencies of a Tag.
* starpu_tag_wait::                Block until a Tag is terminated.
* starpu_tag_wait_array::          Block until a set of Tags is terminated.
* starpu_tag_remove::              Destroy a Tag.
@end menu
@node starpu_tag_t
@subsection @code{starpu_tag_t} -- Task identifier
@c mention the tag_id field of the task structure
@table @asis
@item @emph{Definition}:
TODO
@end table
@node starpu_tag_declare_deps
@subsection @code{starpu_tag_declare_deps} -- Declare the Dependencies of a Tag
@table @asis
@item @emph{Description}:
TODO
@item @emph{Prototype}:
@code{void starpu_tag_declare_deps(starpu_tag_t id, unsigned ndeps, ...);}
@end table
@node starpu_tag_declare_deps_array
@subsection @code{starpu_tag_declare_deps_array} -- Declare the Dependencies of a Tag
@table @asis
@item @emph{Description}:
TODO
@item @emph{Prototype}:
@code{void starpu_tag_declare_deps_array(starpu_tag_t id, unsigned ndeps, starpu_tag_t *array);}
@end table
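As an illustration of the two variants above, the following fragment (a sketch; the tag values are arbitrary, and tags 0x10 and 0x11 are assumed to have been attached to previously submitted tasks) makes tag 0x20 depend on both of them:

```c
/* variadic form: tag 0x20 depends on tags 0x10 and 0x11 */
starpu_tag_declare_deps((starpu_tag_t)0x20,
                        2, (starpu_tag_t)0x10, (starpu_tag_t)0x11);

/* equivalent form, with the dependencies in an explicit array */
starpu_tag_t deps[2] = { (starpu_tag_t)0x10, (starpu_tag_t)0x11 };
starpu_tag_declare_deps_array((starpu_tag_t)0x20, 2, deps);
```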
@node starpu_tag_wait
@subsection @code{starpu_tag_wait} -- Block until a Tag is terminated
@table @asis
@item @emph{Description}:
TODO
@item @emph{Prototype}:
@code{void starpu_tag_wait(starpu_tag_t id);}
@end table
@node starpu_tag_wait_array
@subsection @code{starpu_tag_wait_array} -- Block until a set of Tags is terminated
@table @asis
@item @emph{Description}:
TODO
@item @emph{Prototype}:
@code{void starpu_tag_wait_array(unsigned ntags, starpu_tag_t *id);}
@end table
@node starpu_tag_remove
@subsection @code{starpu_tag_remove} -- Destroy a Tag
@table @asis
@item @emph{Description}:
TODO
@item @emph{Prototype}:
@code{void starpu_tag_remove(starpu_tag_t id);}
@end table
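For instance, a fragment (a sketch, reusing the arbitrary tag value 0x20 from the examples above) that blocks until the corresponding task has terminated and then releases the tag's resources:

```c
/* block until the task associated with tag 0x20 has terminated */
starpu_tag_wait((starpu_tag_t)0x20);

/* the tag is no longer needed: release its resources */
starpu_tag_remove((starpu_tag_t)0x20);
```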
@section Extensions
@subsection CUDA extensions
@c void starpu_malloc_pinned_if_possible(float **A, size_t dim);
@c subsubsection driver API specific calls
@subsection Cell extensions
@c ---------------------------------------------------------------------
@c Basic Examples
@c ---------------------------------------------------------------------
@node Basic Examples
@chapter Basic Examples
@menu
* Compiling and linking::      Compiling and Linking Options.
* Hello World::                Submitting Tasks.
* Scaling a Vector::           Manipulating Data.
* Scaling a Vector (hybrid)::  Handling Heterogeneous Architectures.
@end menu
@node Compiling and linking
@section Compiling and linking options
The Makefile could for instance contain the following lines to define which
options must be given to the compiler and to the linker:
@example
@c @cartouche
CFLAGS += $$(pkg-config --cflags libstarpu)
LIBS   += $$(pkg-config --libs libstarpu)
@c @end cartouche
@end example
@node Hello World
@section Hello World
In this section, we show how to implement a simple program that submits a task
to StarPU.
@subsection Required Headers
The @code{starpu.h} header should be included in any code using StarPU.
@example
@c @cartouche
#include <starpu.h>
@c @end cartouche
@end example
@subsection Defining a Codelet
@example
@c @cartouche
void cpu_func(starpu_data_interface_t *buffers, void *func_arg)
@{
    float *array = func_arg;

    printf("Hello world (array = @{%f, %f@})\n", array[0], array[1]);
@}

starpu_codelet cl =
@{
    .where = CORE,
    .core_func = cpu_func,
    .nbuffers = 0
@};
@c @end cartouche
@end example
A codelet is a structure that represents a computational kernel. Such a codelet
may contain an implementation of the same kernel on different architectures
(e.g. CUDA, Cell's SPU, x86, ...).
The @code{nbuffers} field specifies the number of data buffers that are
manipulated by the codelet: here the codelet does not access or modify any data
that is controlled by our data management library. Note that the argument
passed to the codelet (the @code{cl_arg} field of the @code{starpu_task}
structure) does not count as a buffer since it is not managed by our data
management library.
@c TODO need a crossref to the proper description of "where" see bla for more ...
We create a codelet which may only be executed on the CPUs. The @code{where}
field is a bitmask that defines where the codelet may be executed. Here, the
@code{CORE} value means that only CPUs can execute this codelet
(@pxref{Codelets and Tasks} for more details on that field).
When a CPU core executes a codelet, it calls the @code{core_func} function,
which @emph{must} have the following prototype:
@code{void (*core_func)(starpu_data_interface_t *, void *)}
In this example, we can ignore the first argument of this function which gives
a description of the input and output buffers (e.g. the size and the location
of the matrices). The second argument is a pointer to a buffer passed as an
argument to the codelet by the means of the @code{cl_arg} field of the
@code{starpu_task} structure. Be aware that this may be a pointer to a
@emph{copy} of the actual buffer, and not the pointer given by the programmer:
if the codelet modifies this buffer, there is no guarantee that the initial
buffer will be modified as well: this for instance implies that the buffer
cannot be used as a synchronization medium.
@subsection Submitting a Task
@example
@c @cartouche
void callback_func(void *callback_arg)
@{
    printf("Callback function (arg %p)\n", callback_arg);
@}

int main(int argc, char **argv)
@{
    /* initialize StarPU */
    starpu_init(NULL);

    struct starpu_task *task = starpu_task_create();

    task->cl = &cl;

    float array[2] = @{1.0f, -1.0f@};
    task->cl_arg = &array;
    task->cl_arg_size = 2*sizeof(float);

    task->callback_func = callback_func;
    task->callback_arg = (void *)0x42;

    /* starpu_submit_task will be a blocking call */
    task->synchronous = 1;

    /* submit the task to StarPU */
    starpu_submit_task(task);

    /* terminate StarPU */
    starpu_shutdown();

    return 0;
@}
@c @end cartouche
@end example
Before submitting any tasks to StarPU, @code{starpu_init} must be called. The
@code{NULL} argument specifies that we use the default configuration. Tasks
cannot be submitted after the termination of StarPU by a call to
@code{starpu_shutdown}.
In the example above, a task structure is allocated by a call to
@code{starpu_task_create}. This function only allocates and fills the
corresponding structure with the default settings (@pxref{starpu_task_create}),
but it does not submit the task to StarPU.
@c not really clear ;)
The @code{cl} field is a pointer to the codelet which the task will
execute: in other words, the codelet structure describes which computational
kernel should be offloaded on the different architectures, and the task
structure is a wrapper containing a codelet and the piece of data on which the
codelet should operate.
The optional @code{cl_arg} field is a pointer to a buffer (of size
@code{cl_arg_size}) with some parameters for the kernel
described by the codelet. For instance, if a codelet implements a computational
kernel that multiplies its input vector by a constant, the constant could be
specified by the means of this buffer.
Once a task has been executed, an optional callback function can be called.
While the computational kernel could be offloaded on various architectures, the
callback function is always executed on a CPU. The @code{callback_arg}
pointer is passed as an argument of the callback. The prototype of a callback
function must be:
@example
void (*callback_function)(void *);
@end example
If the @code{synchronous} field is non-null, task submission will be
synchronous: the @code{starpu_submit_task} function will not return until the
task has been executed. Note that the @code{starpu_shutdown} method does not
guarantee that asynchronous tasks have been executed before it returns.
@node Scaling a Vector
@section Manipulating Data: Scaling a Vector
The previous example has shown how to submit tasks. In this section, we show
how StarPU tasks can manipulate data.
Programmers can describe the data layout of their application so that StarPU is
responsible for enforcing data coherency and availability across the machine.
Instead of handling complex (and non-portable) mechanisms to perform data
movements, programmers only declare which piece of data is accessed and/or
modified by a task, and StarPU makes sure that when a computational kernel
starts somewhere (e.g. on a GPU), its data are available locally.
Before submitting those tasks, the programmer first needs to declare the
different pieces of data to StarPU using the @code{starpu_register_*_data}
functions. To ease the development of applications for StarPU, it is possible
to describe multiple types of data layout. A type of data layout is called an
@b{interface}. By default, there are different interfaces available in StarPU:
here we will consider the @b{vector interface}.
The following lines show how to declare an array of @code{n} elements of type
@code{float} using the vector interface:
@example
float tab[n];

starpu_data_handle tab_handle;
starpu_register_vector_data(&tab_handle, 0, tab, n, sizeof(float));
@end example
The first argument, called the @b{data handle}, is an opaque pointer which
designates the array in StarPU. This is also the structure which is used to
describe which data is used by a task.
@c TODO: what is 0 ?
It is possible to construct a StarPU
task that multiplies this vector by a constant factor:
@example
float factor;

struct starpu_task *task = starpu_task_create();

task->cl = &cl;
task->buffers[0].handle = &tab_handle;
task->buffers[0].mode = STARPU_RW;
task->cl_arg = &factor;
task->cl_arg_size = sizeof(float);
@end example
Since the factor is constant, it does not need a preliminary declaration, and
can just be passed through the @code{cl_arg} pointer like in the previous
example. The vector parameter is described by its handle.
There are two fields in each element of the @code{buffers} array.
@code{handle} is the handle of the data, and @code{mode} specifies how the
kernel will access the data (@code{STARPU_R} for read-only, @code{STARPU_W} for
write-only and @code{STARPU_RW} for read and write access).
The definition of the codelet can be written as follows:
@example
void scal_func(starpu_data_interface_t *buffers, void *arg)
@{
    unsigned i;
    float *factor = arg;

    /* length of the vector */
    unsigned n = buffers[0].vector.nx;

    /* local copy of the vector pointer */
    float *val = (float *)buffers[0].vector.ptr;

    for (i = 0; i < n; i++)
        val[i] *= *factor;
@}

starpu_codelet cl = @{
    .where = CORE,
    .core_func = scal_func,
    .nbuffers = 1
@};
@end example
The second argument of the @code{scal_func} function contains a pointer to the
parameters of the codelet (given in @code{task->cl_arg}), so we read the
constant factor from this pointer. The first argument is an array that gives
a description of every buffer passed in the @code{task->buffers} array, the
number of which is given by the @code{nbuffers} field of the codelet structure.
In the @b{vector interface}, the location of the vector (resp. its length)
is accessible in the @code{vector.ptr} (resp. @code{vector.nx}) field of this
array. Since the vector is accessed in a read-write fashion, any modification
will automatically affect future accesses to that vector made by other tasks.
@node Scaling a Vector (hybrid)
@section Vector Scaling on a Hybrid CPU/GPU Machine
Contrary to the previous examples, the task submitted in this example may not
only be executed by the CPUs, but also by a CUDA device.
TODO
@c ---------------------------------------------------------------------
@c Advanced Topics
@c ---------------------------------------------------------------------
@node Advanced Topics
@chapter Advanced Topics
@bye