
\input texinfo @c -*-texinfo-*-
@c %**start of header
@setfilename starpu.info
@settitle StarPU
@c %**end of header
@setchapternewpage odd
@titlepage
@title StarPU
@page
@vskip 0pt plus 1filll
@comment For the @value{version-GCC} Version*
@end titlepage
@summarycontents
@contents
@page
@node Top
@top Preface
@cindex Preface
This manual documents the usage of StarPU.
@comment
@comment When you add a new menu item, please keep the right hand
@comment aligned to the same column. Do not use tabs. This provides
@comment better formatting.
@comment
@menu
* Introduction::        A basic introduction to using StarPU.
* Installing StarPU::   How to configure, build and install StarPU.
* StarPU API::          The API to use StarPU.
* Basic Examples::      Basic examples of the use of StarPU.
@end menu
@c ---------------------------------------------------------------------
@c Introduction to StarPU
@c ---------------------------------------------------------------------
@node Introduction
@chapter Introduction to StarPU
@menu
* Motivation::            Why StarPU?
* StarPU in a Nutshell::  The Fundamentals of StarPU
@end menu
@node Motivation
@section Motivation
@c complex machines with heterogeneous cores/devices
The use of specialized hardware such as accelerators or coprocessors offers an
interesting approach to overcome the physical limits encountered by processor
architects. As a result, many machines are now equipped with one or several
accelerators (eg. a GPU), in addition to the usual processor(s). While a lot of
effort has been devoted to offloading computation onto such accelerators, very
little attention has been paid to portability concerns on the one hand, and to
the possibility of having heterogeneous accelerators and processors interact
on the other hand.
StarPU is a runtime system that offers support for heterogeneous multicore
architectures. It not only offers a unified view of the computational resources
(ie. CPUs and accelerators at the same time), but also takes care of
efficiently mapping and executing tasks onto a heterogeneous machine while
transparently handling low-level issues in a portable fashion.
@c this leads to a complicated distributed memory design
@c which is not (easily) manageable by hand
@c added value/benefits of StarPU
@c - portability
@c - scheduling, perf. portability
@node StarPU in a Nutshell
@section StarPU in a Nutshell
@c DSM
@c explain the notion of codelet and task (ie. g(A, B)
@c ---------------------------------------------------------------------
@c Installing StarPU
@c ---------------------------------------------------------------------
@node Installing StarPU
@chapter Installing StarPU
StarPU can be built and installed by the standard means of the GNU
autotools. The following chapter briefly describes how these tools can be used
to install StarPU.
@section Configuring StarPU
@subsection Generating Makefiles and configuration scripts
This step is not necessary when using the tarball releases of StarPU. If you
are using the source code from the svn repository, you first need to generate
the configure scripts and the Makefiles.
@example
$ autoreconf -i
@end example
@subsection Configuring StarPU
@example
$ ./configure
@end example
@c TODO enumerate the list of interesting options
@section Building and Installing StarPU
@subsection Building
@example
$ make
@end example
@subsection Sanity Checks
In order to make sure that StarPU is working properly on the system, it is also
possible to run a test suite.
@example
$ make check
@end example
@subsection Installing
In order to install StarPU at the location that was specified during
configuration:
@example
# make install
@end example
@subsection pkg-config configuration
Compiling and linking an application against StarPU may require specific flags
or libraries (for instance @code{CUDA} or @code{libspe2}), so it is convenient
to use the @code{pkg-config} tool.
If StarPU was not installed at some standard location, the path of StarPU's
library must be specified in the @code{PKG_CONFIG_PATH} environment variable so
that @code{pkg-config} can find it. So if StarPU was installed in
@code{$(prefix_dir)}:
@example
$ export PKG_CONFIG_PATH=$@{PKG_CONFIG_PATH@}:$(prefix_dir)/lib/
@end example
The flags required to compile or link against StarPU are then
accessible with the following commands:
@example
$ pkg-config --cflags libstarpu  # options for the compiler
$ pkg-config --libs libstarpu    # options for the linker
@end example
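The effect of @code{PKG_CONFIG_PATH} can be tried out even without an actual StarPU installation. The snippet below is a self-contained demonstration: the prefix directory and the @code{.pc} file it writes are mocks made up for the example, not the @code{libstarpu.pc} that StarPU itself installs.

```shell
# Mock demonstration of PKG_CONFIG_PATH lookup. The .pc file written
# below is a stand-in, NOT the real libstarpu.pc shipped by StarPU.
prefix_dir=$(mktemp -d)
mkdir -p "$prefix_dir/lib"

cat > "$prefix_dir/lib/libstarpu.pc" <<EOF
prefix=$prefix_dir

Name: libstarpu
Description: mock pkg-config entry for the demonstration
Version: 0.0
Cflags: -I\${prefix}/include
Libs: -L\${prefix}/lib -lstarpu
EOF

# Once the directory containing the .pc file is on PKG_CONFIG_PATH,
# pkg-config resolves the flags from it:
export PKG_CONFIG_PATH=${PKG_CONFIG_PATH}:$prefix_dir/lib/
pkg-config --cflags libstarpu
```

The same mechanism applies to a real installation: point @code{PKG_CONFIG_PATH} at the directory holding StarPU's @code{.pc} file.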
@c ---------------------------------------------------------------------
@c StarPU API
@c ---------------------------------------------------------------------
@node StarPU API
@chapter StarPU API
@menu
* Initialization and Termination::  Initialization and Termination methods
* Data Library::                    Methods to manipulate data
* Codelets and Tasks::              Methods to construct tasks
* Tags::                            Task dependencies
@end menu
@node Initialization and Termination
@section Initialization and Termination
@menu
* starpu_init::         Initialize StarPU
* struct starpu_conf::  StarPU runtime configuration
* starpu_shutdown::     Terminate StarPU
@end menu
@node starpu_init
@subsection @code{starpu_init} -- Initialize StarPU
@table @asis
@item @emph{Description}:
This is StarPU's initialization method: it must be called prior to any other
StarPU call. It is possible to specify StarPU's configuration (eg. scheduling
policy, number of cores, ...) by passing a non-null argument. The default
configuration is used if the passed argument is @code{NULL}.
@item @emph{Prototype}:
@code{void starpu_init(struct starpu_conf *conf);}
@end table
@node struct starpu_conf
@subsection @code{struct starpu_conf} -- StarPU runtime configuration
@table @asis
@item @emph{Description}:
TODO
@item @emph{Definition}:
TODO
@end table
@node starpu_shutdown
@subsection @code{starpu_shutdown} -- Terminate StarPU
@table @asis
@item @emph{Description}:
This is StarPU's termination method: it must be called at the end of the
application. Statistics and other post-mortem debugging information are not
guaranteed to be available until this method has been called.
@item @emph{Prototype}:
@code{void starpu_shutdown(void);}
@end table
@node Data Library
@section Data Library
@c data_handle_t
@c void starpu_delete_data(struct starpu_data_state_t *state);
@c user interaction with the DSM
@c void starpu_sync_data_with_mem(struct starpu_data_state_t *state);
@c void starpu_notify_data_modification(struct starpu_data_state_t *state, uint32_t modifying_node);
@node Codelets and Tasks
@section Codelets and Tasks
@menu
* starpu_task_create::  Allocate and Initialize a Task
@end menu
@c struct starpu_task
@c struct starpu_codelet
@node starpu_task_create
@subsection @code{starpu_task_create} -- Allocate and Initialize a Task
@table @asis
@item @emph{Description}:
TODO
@item @emph{Prototype}:
@code{struct starpu_task *starpu_task_create(void);}
@end table
@c Callbacks : what can we put in callbacks ?
@node Tags
@section Tags
@menu
* starpu_tag_t::                   Task identifier
* starpu_tag_declare_deps::        Declare the Dependencies of a Tag
* starpu_tag_declare_deps_array::  Declare the Dependencies of a Tag
* starpu_tag_wait::                Block until a Tag is terminated
* starpu_tag_wait_array::          Block until a set of Tags is terminated
* starpu_tag_remove::              Destroy a Tag
@end menu
@node starpu_tag_t
@subsection @code{starpu_tag_t} -- Task identifier
@c mention the tag_id field of the task structure
@table @asis
@item @emph{Definition}:
TODO
@end table
@node starpu_tag_declare_deps
@subsection @code{starpu_tag_declare_deps} -- Declare the Dependencies of a Tag
@table @asis
@item @emph{Description}:
TODO
@item @emph{Prototype}:
@code{void starpu_tag_declare_deps(starpu_tag_t id, unsigned ndeps, ...);}
@end table
@node starpu_tag_declare_deps_array
@subsection @code{starpu_tag_declare_deps_array} -- Declare the Dependencies of a Tag
@table @asis
@item @emph{Description}:
TODO
@item @emph{Prototype}:
@code{void starpu_tag_declare_deps_array(starpu_tag_t id, unsigned ndeps, starpu_tag_t *array);}
@end table
@node starpu_tag_wait
@subsection @code{starpu_tag_wait} -- Block until a Tag is terminated
@table @asis
@item @emph{Description}:
TODO
@item @emph{Prototype}:
@code{void starpu_tag_wait(starpu_tag_t id);}
@end table
@node starpu_tag_wait_array
@subsection @code{starpu_tag_wait_array} -- Block until a set of Tags is terminated
@table @asis
@item @emph{Description}:
TODO
@item @emph{Prototype}:
@code{void starpu_tag_wait_array(unsigned ntags, starpu_tag_t *id);}
@end table
@node starpu_tag_remove
@subsection @code{starpu_tag_remove} -- Destroy a Tag
@table @asis
@item @emph{Description}:
TODO
@item @emph{Prototype}:
@code{void starpu_tag_remove(starpu_tag_t id);}
@end table
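The variadic calling convention of @code{starpu_tag_declare_deps} can be illustrated with a stub. Everything below except the prototype is a mock written for this manual, not StarPU code, and the definition of @code{starpu_tag_t} as a 64-bit integer is an assumption of the sketch. The point to notice is the explicit casts at the call site: a bare integer literal would travel through the varargs as an @code{int}, not as a @code{starpu_tag_t}.

```c
#include <stdarg.h>
#include <stdint.h>

/* Assumption for this sketch: starpu_tag_t is a 64-bit integer
 * identifier chosen by the application. */
typedef uint64_t starpu_tag_t;

#define MAX_DEPS 16
static starpu_tag_t recorded_deps[MAX_DEPS];
static unsigned recorded_ndeps;

/* Mock with the documented prototype: it records the ndeps dependency
 * tags instead of handing them to the runtime. */
void starpu_tag_declare_deps(starpu_tag_t id, unsigned ndeps, ...)
{
        va_list args;
        (void)id;

        va_start(args, ndeps);
        recorded_ndeps = (ndeps < MAX_DEPS) ? ndeps : MAX_DEPS;
        for (unsigned i = 0; i < recorded_ndeps; i++)
                /* Each variadic argument is read back as a
                 * starpu_tag_t, so it must have been passed as one. */
                recorded_deps[i] = va_arg(args, starpu_tag_t);
        va_end(args);
}
```

Tag @code{0x42} would then be declared to depend on tags @code{0x32} and @code{0x52} with @code{starpu_tag_declare_deps((starpu_tag_t)0x42, 2, (starpu_tag_t)0x32, (starpu_tag_t)0x52);}.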
@section Extensions
@subsection CUDA extensions
@c void starpu_malloc_pinned_if_possible(float **A, size_t dim);
@c subsubsection driver API specific calls
@subsection Cell extensions
@c ---------------------------------------------------------------------
@c Basic Examples
@c ---------------------------------------------------------------------
@node Basic Examples
@chapter Basic Examples
@menu
* Compiling and linking::      Compiling and Linking Options
* Hello World::                Submitting Tasks
* Scaling a Vector::           Manipulating Data
* Scaling a Vector (hybrid)::  Handling Heterogeneous Architectures
@end menu
@node Compiling and linking
@section Compiling and linking options
The Makefile could for instance contain the following lines to define which
options must be given to the compiler and to the linker:
@example
@c @cartouche
CFLAGS+=$$(pkg-config --cflags libstarpu)
LIBS+=$$(pkg-config --libs libstarpu)
@c @end cartouche
@end example
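Building on these two lines, a complete minimal Makefile might look as follows. The source and target names (@code{hello.c}, @code{hello}) are assumptions made for the sketch; the @code{$$} escapes defer the @code{pkg-config} invocation to the shell when the recipe runs.

```make
# Hypothetical Makefile for a single-file StarPU program hello.c;
# `make hello` compiles and links it against StarPU in one step.
CFLAGS += $$(pkg-config --cflags libstarpu)
LIBS += $$(pkg-config --libs libstarpu)

hello: hello.c
	$(CC) $(CFLAGS) -o $@ hello.c $(LIBS)
```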
@node Hello World
@section Hello World
In this section, we show how to implement a simple program that submits a task to StarPU.
@subsection Required Headers
The @code{starpu.h} header should be included in any code using StarPU.
@example
@c @cartouche
#include <starpu.h>
@c @end cartouche
@end example
@subsection Defining a Codelet
@example
@c @cartouche
void cpu_func(starpu_data_interface_t *buffers, void *func_arg)
@{
        float *array = func_arg;

        printf("Hello world (array = @{%f, %f@} )\n", array[0], array[1]);
@}

starpu_codelet cl =
@{
        .where = CORE,
        .core_func = cpu_func,
        .nbuffers = 0
@};
@c @end cartouche
@end example
A codelet is a structure that represents a computational kernel. Such a codelet
may contain an implementation of the same kernel on different architectures
(eg. CUDA, Cell's SPU, x86, ...).
The ``@code{.nbuffers}'' field specifies the number of data buffers that are
manipulated by the codelet: here the codelet does not access or modify any data
that is controlled by our data management library. Note that the argument
passed to the codelet (the ``@code{.cl_arg}'' field of the @code{starpu_task}
structure) does not count as a buffer since it is not managed by our data
management library.
@c TODO need a crossref to the proper description of "where" see bla for more ...
We create a codelet which may only be executed on the CPUs. The ``@code{.where}''
field is a bitmask that defines where the codelet may be executed. Here, the
@code{CORE} value means that only CPUs can execute this codelet
(@pxref{Codelets and Tasks} for more details on that field).
When a CPU core executes a codelet, it calls the @code{.core_func} function,
which @emph{must} have the following prototype:
@code{void (*core_func)(starpu_data_interface_t *, void *)}
In this example, we can ignore the first argument of this function, which gives a
description of the input and output buffers (eg. the size and the location of
the matrices). The second argument is a pointer to a buffer passed as an
argument to the codelet by means of the ``@code{.cl_arg}'' field of the
@code{starpu_task} structure. Be aware that this may be a pointer to a
@emph{copy} of the actual buffer, and not the pointer given by the programmer:
if the codelet modifies this buffer, there is no guarantee that the initial
buffer will be modified as well. This for instance implies that the buffer
cannot be used as a synchronization medium.
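This copy semantics can be demonstrated in a few lines of plain C. The mock below is deliberately simplified and is not StarPU's implementation: here the argument buffer is always copied, so a write performed by the codelet is never seen by the submitter.

```c
#include <string.h>
#include <stddef.h>

/* Illustrative sketch (not StarPU code): the runtime may hand the
 * codelet a *copy* of the cl_arg buffer. */
static void codelet_body(void *arg)
{
        float *a = arg;
        a[0] = 42.0f;   /* writes to the copy, not the original */
}

/* Simulates a synchronous submission; returns the caller's array[0]
 * once the "task" has run. */
float submit_with_copied_arg(float *array, size_t size)
{
        float copy[32];
        memcpy(copy, array, size);   /* runtime-side copy of cl_arg */
        codelet_body(copy);
        return array[0];             /* unchanged by the codelet */
}
```

Since the original array is untouched, polling it for a value written by the codelet would wait forever; this is why @code{cl_arg} cannot serve as a synchronization medium.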
@subsection Submitting a Task
@example
@c @cartouche
void callback_func(void *callback_arg)
@{
        printf("Callback function (arg %p)\n", callback_arg);
@}

int main(int argc, char **argv)
@{
        /* initialize StarPU */
        starpu_init(NULL);

        struct starpu_task *task = starpu_task_create();

        task->cl = &cl;

        float array[2] = @{1.0f, -1.0f@};
        task->cl_arg = &array;
        task->cl_arg_size = 2*sizeof(float);

        task->callback_func = callback_func;
        task->callback_arg = (void *)0x42;

        /* starpu_submit_task will be a blocking call */
        task->synchronous = 1;

        /* submit the task to StarPU */
        starpu_submit_task(task);

        /* terminate StarPU */
        starpu_shutdown();

        return 0;
@}
@c @end cartouche
@end example
Before submitting any tasks to StarPU, @code{starpu_init} must be called. The
@code{NULL} argument specifies that we use the default configuration. Tasks cannot
be submitted after the termination of StarPU by a call to
@code{starpu_shutdown}.
In the example above, a task structure is allocated by a call to
@code{starpu_task_create}. This function only allocates and fills the
corresponding structure with the default settings (@pxref{starpu_task_create}),
but it does not submit the task to StarPU.
@c not really clear ;)
The ``@code{.cl}'' field is a pointer to the codelet which the task will
execute: in other words, the codelet structure describes which computational
kernel should be offloaded on the different architectures, and the task
structure is a wrapper containing a codelet and the pieces of data on which the
codelet should operate.
The optional ``@code{.cl_arg}'' field is a pointer to a buffer (of size
@code{.cl_arg_size}) with some parameters for the kernel
described by the codelet. For instance, if a codelet implements a computational
kernel that multiplies its input vector by a constant, the constant could be
specified by means of this buffer.
Once a task has been executed, an optional callback function can be called.
While the computational kernel could be offloaded on various architectures, the
callback function is always executed on a CPU. The ``@code{.callback_arg}''
pointer is passed as an argument to the callback. The prototype of a callback
function must be:
@example
void (*callback_function)(void *);
@end example
If the @code{.synchronous} field is non-null, task submission will be
synchronous: the @code{starpu_submit_task} function will not return until the
task has been executed. Note that the @code{starpu_shutdown} method does not
guarantee that asynchronous tasks have been executed before it returns.
@node Scaling a Vector
@section Manipulating Data: Scaling a Vector
The previous example has shown how to submit tasks. In this section, we show how
StarPU tasks can manipulate data.
Programmers can describe the data layout of their application so that StarPU is
responsible for enforcing data coherency and availability across the machine.
Instead of handling complex (and non-portable) mechanisms to perform data
movements, programmers only declare which pieces of data are accessed and/or
modified by a task, and StarPU makes sure that when a computational kernel starts
somewhere (eg. on a GPU), its data are available locally.
Before submitting those tasks, the programmer first needs to declare the
different pieces of data to StarPU using the @code{starpu_register_*_data}
functions. To ease the development of applications for StarPU, it is possible
to describe multiple types of data layout. A type of data layout is called an
@b{interface}. By default, there are different interfaces available in StarPU:
here we will consider the @b{vector interface}.
The following lines show how to declare an array of @code{n} elements of type
@code{float} using the vector interface:
@example
float tab[n];

starpu_data_handle tab_handle;
starpu_register_vector_data(&tab_handle, 0, tab, n, sizeof(float));
@end example
The first argument, called the @b{data handle}, is an opaque pointer which
designates the array in StarPU. This is also the structure which is used to
describe which data is used by a task.
@c TODO: what is 0 ?
It is possible to construct a StarPU
task that multiplies this vector by a constant factor:
@example
float factor;

struct starpu_task *task = starpu_task_create();

task->cl = &cl;
task->buffers[0].handle = &tab_handle;
task->buffers[0].mode = STARPU_RW;
task->cl_arg = &factor;
task->cl_arg_size = sizeof(float);
@end example
Since the factor is constant, it does not need a preliminary declaration, and
can just be passed through the @code{cl_arg} pointer like in the previous
example. The vector parameter is described by its handle.
There are two fields in each element of the @code{buffers} array.
@code{.handle} is the handle of the data, and @code{.mode} specifies how the
kernel will access the data (@code{STARPU_R} for read-only, @code{STARPU_W} for
write-only and @code{STARPU_RW} for read and write access).
The definition of the codelet can be written as follows:
@example
void scal_func(starpu_data_interface_t *buffers, void *arg)
@{
        unsigned i;
        float *factor = arg;

        /* length of the vector */
        unsigned n = buffers[0].vector.nx;

        /* local copy of the vector pointer */
        float *val = (float *)buffers[0].vector.ptr;

        for (i = 0; i < n; i++)
                val[i] *= *factor;
@}

starpu_codelet cl = @{
        .where = CORE,
        .core_func = scal_func,
        .nbuffers = 1
@};
@end example
The second argument of the @code{scal_func} function contains a pointer to the
parameters of the codelet (given in @code{task->cl_arg}), so we read the
constant factor from this pointer. The first argument is an array that gives
a description of each buffer passed in the @code{task->buffers} array, the
number of which is given by the @code{.nbuffers} field of the codelet structure.
In the @b{vector interface}, the location of the vector (resp. its length)
is accessible in the @code{.vector.ptr} (resp. @code{.vector.nx}) field of this
array. Since the vector is accessed in a read-write fashion, any modification
will automatically affect future accesses to that vector made by other tasks.
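Stripped of the StarPU-specific buffer description, the arithmetic performed by @code{scal_func} boils down to the following plain C function, which can be checked in isolation:

```c
/* In-place scaling: multiply each of the n elements of val by factor.
 * This is the computation of scal_func, without the StarPU-specific
 * buffer description around it. */
void scal(float *val, unsigned n, float factor)
{
        unsigned i;
        for (i = 0; i < n; i++)
                val[i] *= factor;
}
```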
@node Scaling a Vector (hybrid)
@section Vector Scaling on a Hybrid CPU/GPU Machine
Contrary to the previous examples, the task submitted in this section may not
only be executed by the CPUs, but also by a CUDA device.
TODO
@bye