/* StarPU --- Runtime system for heterogeneous multicore architectures.
 *
 * Copyright (C) 2010-2015,2017,2018,2019 CNRS
 * Copyright (C) 2009-2011,2014,2015,2017,2018-2019 Université de Bordeaux
 * Copyright (C) 2011-2013 Inria
 *
 * StarPU is free software; you can redistribute it and/or modify
 * it under the terms of the GNU Lesser General Public License as published by
 * the Free Software Foundation; either version 2.1 of the License, or (at
 * your option) any later version.
 *
 * StarPU is distributed in the hope that it will be useful, but
 * WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 *
 * See the GNU Lesser General Public License in COPYING.LGPL for more details.
 */
/*! \defgroup API_Data_Partition Data Partition

\struct starpu_data_filter
The filter structure describes a data partitioning operation, to be
given to the starpu_data_partition() function.
\ingroup API_Data_Partition
\var void (*starpu_data_filter::filter_func)(void *father_interface, void *child_interface, struct starpu_data_filter *filter, unsigned i, unsigned nparts)
Fill the \p child_interface structure with interface information
for the \p i -th child of the parent \p father_interface (among
\p nparts). The \p filter structure is provided so that the function can
inspect the starpu_data_filter::filter_arg and
starpu_data_filter::filter_arg_ptr parameters.
The details of what needs to be filled in \p child_interface vary according
to the data interface, but generally speaking:
<ul>
<li> <c>id</c> is usually just copied over from the father when the sub data has the same structure as the father, e.g. a subvector is a vector, a submatrix is a matrix, etc. This is however not the case when, for instance, dividing a BCSR matrix into its dense blocks, which are then matrices. </li>
<li> <c>nx</c>, <c>ny</c> and the like are usually divided by the number of subdata, depending on how the subdivision is done (e.g. nx division vs ny division for vertical matrix division vs horizontal matrix division). </li>
<li> <c>ld</c> for matrix interfaces is usually just copied over: the leading dimension (ld) usually does not change. </li>
<li> <c>elemsize</c> is usually just copied over. </li>
<li> <c>ptr</c>, the pointer to the data, has to be computed according to \p i and the father's <c>ptr</c>, so as to point to the start of the sub data. This should however be done only if the father's <c>ptr</c> is not NULL: in the OpenCL case notably, the <c>dev_handle</c> and <c>offset</c> fields are used instead. </li>
<li> <c>dev_handle</c> should be just copied over from the parent. </li>
<li> <c>offset</c> has to be computed according to \p i and the father's <c>offset</c>, so as to provide the offset of the start of the sub data. This is notably used for the OpenCL case. </li>
</ul>
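The rules listed above can be illustrated with a minimal sketch of a filter-like function for a vector-shaped interface. Everything named <c>toy_*</c> is hypothetical and only mirrors the field names of the list; it is not a StarPU type or function, and it assumes that \p nparts divides <c>nx</c> exactly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a vector-like data interface; the field names
 * follow the list above, but this is not a StarPU type. */
struct toy_vector
{
    int id;               /* interface identifier */
    uintptr_t ptr;        /* address of the data, 0 if not directly addressable */
    uintptr_t dev_handle; /* device handle, e.g. for OpenCL */
    size_t offset;        /* byte offset from dev_handle */
    uint32_t nx;          /* number of elements */
    size_t elemsize;      /* size of one element in bytes */
};

/* Fill child with the i-th of nparts equal chunks of father.
 * Assumes nparts > 0 and that nparts divides father->nx. */
void toy_vector_filter_block(const struct toy_vector *father,
                             struct toy_vector *child,
                             unsigned i, unsigned nparts)
{
    assert(nparts > 0 && i < nparts && father->nx % nparts == 0);

    uint32_t chunk = father->nx / nparts;
    size_t shift = (size_t) i * chunk * father->elemsize;

    child->id = father->id;                 /* a subvector is still a vector */
    child->nx = chunk;                      /* nx is divided among the children */
    child->elemsize = father->elemsize;     /* copied over */
    child->dev_handle = father->dev_handle; /* copied over */
    child->offset = father->offset + shift; /* start of the sub data */
    /* Only compute ptr when the father has one. */
    child->ptr = father->ptr ? father->ptr + shift : 0;
}
```

With a father of 12 four-byte elements split into 3 parts, the child of index 2 gets 4 elements and a byte shift of 32.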
\var unsigned starpu_data_filter::nchildren
Number of parts to partition the data into.

\var unsigned (*starpu_data_filter::get_nchildren)(struct starpu_data_filter *, starpu_data_handle_t initial_handle)
Return the number of children. This can be used instead of
starpu_data_filter::nchildren when the number of children depends
on the actual data (e.g. the number of blocks in a sparse matrix).

\var struct starpu_data_interface_ops *(*starpu_data_filter::get_child_ops)(struct starpu_data_filter *, unsigned id)
In case the resulting children use a different data interface,
this function returns which interface is used by child number \p id.

\var unsigned starpu_data_filter::filter_arg
Additional parameter for the filter function.

\var void *starpu_data_filter::filter_arg_ptr
Additional pointer parameter for the filter
function, such as the sizes of the different parts.
@name Basic API
\ingroup API_Data_Partition

\fn void starpu_data_partition(starpu_data_handle_t initial_handle, struct starpu_data_filter *f)
\ingroup API_Data_Partition
Request the partitioning of \p initial_handle into several subdata
according to the filter \p f.

Here is an example of how to use the function.
\code{.c}
struct starpu_data_filter f =
{
    .filter_func = starpu_matrix_filter_block,
    .nchildren = nslicesx
};
starpu_data_partition(A_handle, &f);
\endcode
\fn void starpu_data_unpartition(starpu_data_handle_t root_data, unsigned gathering_node)
\ingroup API_Data_Partition
Unapply the filter which has been applied to \p root_data, thus
unpartitioning the data. The pieces of data are collected back into
one big piece on the \p gathering_node (usually ::STARPU_MAIN_RAM).
starpu_data_unpartition() waits for the tasks working on the
partitioned data to complete.

Here is an example of how to use the function.
\code{.c}
starpu_data_unpartition(A_handle, STARPU_MAIN_RAM);
\endcode
\fn int starpu_data_get_nb_children(starpu_data_handle_t handle)
\ingroup API_Data_Partition
Return the number of children \p handle has been partitioned into.

\fn starpu_data_handle_t starpu_data_get_child(starpu_data_handle_t handle, unsigned i)
\ingroup API_Data_Partition
Return the \p i -th child of the given \p handle, which must have been
partitioned beforehand.
\fn starpu_data_handle_t starpu_data_get_sub_data(starpu_data_handle_t root_data, unsigned depth, ... )
\ingroup API_Data_Partition
After a StarPU data has been partitioned by applying a filter,
starpu_data_get_sub_data() can be used to get handles for each of the
data portions. \p root_data is the parent data that was partitioned.
\p depth is the number of filters to traverse (in case several filters
have been applied, e.g. to partition in row blocks, and then in column
blocks), and the subsequent parameters are the indexes. The function
returns a handle to the subdata.

Here is an example of how to use the function.
\code{.c}
h = starpu_data_get_sub_data(A_handle, 1, taskx);
\endcode
\fn starpu_data_handle_t starpu_data_vget_sub_data(starpu_data_handle_t root_data, unsigned depth, va_list pa)
\ingroup API_Data_Partition
This function is similar to starpu_data_get_sub_data() but uses a
va_list for the parameter list.
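The relation between a variadic accessor and its <c>va_list</c> counterpart, and the meaning of \p depth, can be sketched on a toy tree. The <c>toy_*</c> names below are hypothetical and only model the traversal (one index consumed per level); they say nothing about how StarPU stores its handles.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical stand-in for a partitioned handle: a node with children. */
struct toy_node
{
    struct toy_node *children;
    unsigned nchildren;
};

/* va_list variant: consume depth child indexes, descending one level
 * per index. */
struct toy_node *toy_vget_sub_data(struct toy_node *root, unsigned depth, va_list pa)
{
    struct toy_node *current = root;
    for (unsigned level = 0; level < depth; level++)
    {
        unsigned index = va_arg(pa, unsigned);
        assert(index < current->nchildren);
        current = &current->children[index];
    }
    return current;
}

/* Variadic wrapper: forward the index list to the va_list variant. */
struct toy_node *toy_get_sub_data(struct toy_node *root, unsigned depth, ...)
{
    va_list pa;
    va_start(pa, depth);
    struct toy_node *node = toy_vget_sub_data(root, depth, pa);
    va_end(pa);
    return node;
}
```

In this model, <c>toy_get_sub_data(&root, 2, 1u, 2u)</c> selects child 1 of the root and then child 2 of that child, which mirrors a two-filter partitioning accessed with a depth of 2.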
\fn void starpu_data_map_filters(starpu_data_handle_t root_data, unsigned nfilters, ...)
\ingroup API_Data_Partition
Apply \p nfilters filters to the handle designated by
\p root_data recursively. \p nfilters pointers to variables of the type
starpu_data_filter should be given.

\fn void starpu_data_vmap_filters(starpu_data_handle_t root_data, unsigned nfilters, va_list pa)
\ingroup API_Data_Partition
Apply \p nfilters filters to the handle designated by
\p root_data recursively. It uses a va_list of pointers to variables of
the type starpu_data_filter.
@name Asynchronous API
\ingroup API_Data_Partition

\fn void starpu_data_partition_plan(starpu_data_handle_t initial_handle, struct starpu_data_filter *f, starpu_data_handle_t *children)
\ingroup API_Data_Partition
Plan to partition \p initial_handle into several subdata according to
the filter \p f.
The handles are returned into the \p children array, which has to be
the same size as the number of parts described in \p f. These handles
are not immediately usable: starpu_data_partition_submit() has to be
called to submit the actual partitioning.

Here is an example of how to use the function:
\code{.c}
starpu_data_handle_t children[nslicesx];
struct starpu_data_filter f =
{
    .filter_func = starpu_matrix_filter_block,
    .nchildren = nslicesx
};
starpu_data_partition_plan(A_handle, &f, children);
\endcode
\fn void starpu_data_partition_submit(starpu_data_handle_t initial_handle, unsigned nparts, starpu_data_handle_t *children)
\ingroup API_Data_Partition
Submit the actual partitioning of \p initial_handle into the \p nparts
\p children handles. This call is asynchronous: it only submits the request
for the partitioning. The \p children handles can then be used to
submit tasks, whereas \p initial_handle cannot be used to submit tasks any more
(to guarantee coherency).

For instance,
\code{.c}
starpu_data_partition_submit(A_handle, nslicesx, children);
\endcode
\fn void starpu_data_partition_readonly_submit(starpu_data_handle_t initial_handle, unsigned nparts, starpu_data_handle_t *children)
\ingroup API_Data_Partition
This is the same as starpu_data_partition_submit(), but it does not invalidate
\p initial_handle. The application can thus keep using it, but it has to be
careful to only read from \p initial_handle and the \p children handles, never
write to them, since coherency is otherwise not guaranteed. This makes it
possible to submit various tasks which concurrently read from various partitions
of the data.

When the application wants to write to \p initial_handle again, it should call
starpu_data_unpartition_submit(), which will properly add dependencies between the
reads on the \p children and the writes to be submitted.

If instead the application wants to write to \p children handles, it should
call starpu_data_partition_readwrite_upgrade_submit(), which will correctly add
dependencies between the reads on the \p initial_handle and the writes to be
submitted.
\fn void starpu_data_partition_readwrite_upgrade_submit(starpu_data_handle_t initial_handle, unsigned nparts, starpu_data_handle_t *children)
\ingroup API_Data_Partition
This assumes that a partitioning of \p initial_handle has already been submitted
in readonly mode through starpu_data_partition_readonly_submit(), and will upgrade
that partitioning into read-write mode for the \p children, by invalidating
\p initial_handle, and adding the necessary dependencies.
\fn void starpu_data_partition_submit_sequential_consistency(starpu_data_handle_t initial_handle, unsigned nparts, starpu_data_handle_t *children, int sequential_consistency)
\ingroup API_Data_Partition
Similar to starpu_data_partition_submit() but also allows specifying
the coherency to be used for the main data \p initial_handle
through the parameter \p sequential_consistency.

\fn void starpu_data_unpartition_submit_sequential_consistency_cb(starpu_data_handle_t initial_handle, unsigned nparts, starpu_data_handle_t *children, int gather_node, int sequential_consistency, void (*callback_func)(void *), void *callback_arg)
\ingroup API_Data_Partition
Similar to starpu_data_unpartition_submit_sequential_consistency() but
also allows specifying a callback function for the unpartitioning task.
\fn void starpu_data_partition_not_automatic(starpu_data_handle_t handle)
\ingroup API_Data_Partition
Disable the automatic partitioning of the data \p handle for which an
asynchronous plan has previously been submitted.
\fn void starpu_data_unpartition_submit(starpu_data_handle_t initial_handle, unsigned nparts, starpu_data_handle_t *children, int gathering_node)
\ingroup API_Data_Partition
This assumes that \p initial_handle is partitioned into \p children, and submits
an unpartitioning of it, i.e. submitting a gathering of the pieces on the
requested \p gathering_node memory node, and submitting an invalidation of the
children.

\p gathering_node can be set to -1 to let the runtime decide which memory node
should be used to gather the pieces.

This call is asynchronous: it only submits the request for the unpartitioning.
The \p children handles should then not be used to submit tasks any
more, whereas \p initial_handle can be used again to submit tasks.
\fn void starpu_data_unpartition_readonly_submit(starpu_data_handle_t initial_handle, unsigned nparts, starpu_data_handle_t *children, int gathering_node)
\ingroup API_Data_Partition
This assumes that \p initial_handle is partitioned into \p children, and submits
just a readonly unpartitioning of it, i.e. submitting a gathering of the pieces
on the requested \p gathering_node memory node. It does not invalidate the
children. This brings \p initial_handle and \p children handles to the same
state as obtained with starpu_data_partition_readonly_submit().

\p gathering_node can be set to -1 to let the runtime decide which memory node
should be used to gather the pieces.
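The asynchronous calls documented in this section move \p initial_handle and its \p children between a small number of access states. The following is only an illustrative model of those transitions as described in the text; the <c>toy_*</c> names are hypothetical and none of this is StarPU code.

```c
#include <assert.h>

/* Which kind of tasks may be submitted on a handle; hypothetical model. */
enum toy_access { TOY_NONE, TOY_READ, TOY_READWRITE };

struct toy_partition_state
{
    enum toy_access parent;   /* models initial_handle */
    enum toy_access children; /* models the children handles */
};

/* After starpu_data_partition_plan(): children exist but are not usable yet. */
struct toy_partition_state toy_plan(void)
{
    struct toy_partition_state s = { TOY_READWRITE, TOY_NONE };
    return s;
}

/* Models starpu_data_partition_submit(): the children take over. */
void toy_partition(struct toy_partition_state *s)
{
    s->parent = TOY_NONE;
    s->children = TOY_READWRITE;
}

/* Models starpu_data_partition_readonly_submit() and
 * starpu_data_unpartition_readonly_submit(): both sides are read-only. */
void toy_readonly(struct toy_partition_state *s)
{
    s->parent = TOY_READ;
    s->children = TOY_READ;
}

/* Models starpu_data_partition_readwrite_upgrade_submit(), which assumes
 * a readonly partitioning has already been submitted. */
void toy_readwrite_upgrade(struct toy_partition_state *s)
{
    assert(s->parent == TOY_READ && s->children == TOY_READ);
    s->parent = TOY_NONE;
    s->children = TOY_READWRITE;
}

/* Models starpu_data_unpartition_submit(): the parent takes over again. */
void toy_unpartition(struct toy_partition_state *s)
{
    s->parent = TOY_READWRITE;
    s->children = TOY_NONE;
}
```

The model keeps a single invariant: at any time either the parent is writable, the children are writable, or both sides are read-only.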
\fn void starpu_data_unpartition_submit_sequential_consistency(starpu_data_handle_t initial_handle, unsigned nparts, starpu_data_handle_t *children, int gathering_node, int sequential_consistency)
\ingroup API_Data_Partition
Similar to starpu_data_unpartition_submit() but also allows specifying
the coherency to be used for the main data \p initial_handle
through the parameter \p sequential_consistency.
\fn void starpu_data_partition_clean(starpu_data_handle_t root_data, unsigned nparts, starpu_data_handle_t *children)
\ingroup API_Data_Partition
This should be used to clear the partition planning established between
\p root_data and \p children with starpu_data_partition_plan(). This will notably
submit the unregistration of all the \p children, which thus cannot be used any
more afterwards.
@name Predefined Vector Filter Functions
\ingroup API_Data_Partition
This section gives a partial list of the predefined partitioning
functions for vector data. Examples of how to use them are shown in
\ref PartitioningData. The complete list can be found in the file
<c>starpu_data_filters.h</c>.

\fn void starpu_vector_filter_block(void *father_interface, void *child_interface, struct starpu_data_filter *f, unsigned id, unsigned nparts)
\ingroup API_Data_Partition
Return in \p child_interface the \p id -th chunk of the vector
represented by \p father_interface once partitioned into \p nparts chunks of
equal size.
\fn void starpu_vector_filter_block_shadow(void *father_interface, void *child_interface, struct starpu_data_filter *f, unsigned id, unsigned nparts)
\ingroup API_Data_Partition
Return in \p child_interface the \p id -th chunk of the vector
represented by \p father_interface once partitioned into \p nparts chunks of
equal size with a shadow border <c>filter_arg_ptr</c>, thus getting a vector
of size <c>(n-2*shadow)/nparts+2*shadow</c>. The <c>filter_arg_ptr</c> field
of \p f must be the shadow size cast to \c void*.

<b>IMPORTANT</b>: This can only be used for read-only access, as no coherency is
enforced for the shadowed parts. A usage example is available in
examples/filters/shadow.c.
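The size formula above can be checked with a short worked sketch. The helpers below are hypothetical (they are not part of StarPU); they assume that \p nparts divides <c>n-2*shadow</c> exactly, and that child \p id starts at <c>id*(n-2*shadow)/nparts</c> elements from the start of the parent, so that neighbouring children overlap by <c>2*shadow</c> elements.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: length of each child, in elements, for a vector of
 * n elements split into nparts chunks with a border of shadow elements.
 * Assumes n >= 2*shadow and that nparts divides n - 2*shadow. */
uint32_t toy_shadow_child_length(uint32_t n, uint32_t shadow, unsigned nparts)
{
    assert(nparts > 0 && n >= 2 * shadow && (n - 2 * shadow) % nparts == 0);
    return (n - 2 * shadow) / nparts + 2 * shadow;
}

/* Hypothetical helper: index in the parent of the first element of child
 * id, under the overlap assumption stated in the lead-in. */
uint32_t toy_shadow_child_start(uint32_t n, uint32_t shadow, unsigned id, unsigned nparts)
{
    assert(nparts > 0 && id < nparts && n >= 2 * shadow);
    return id * ((n - 2 * shadow) / nparts);
}
```

For <c>n = 14</c>, <c>shadow = 1</c> and <c>nparts = 3</c>, each child holds 6 elements and the children start at indexes 0, 4 and 8, so the last child ends exactly at element 14.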
\fn void starpu_vector_filter_list_long(void *father_interface, void *child_interface, struct starpu_data_filter *f, unsigned id, unsigned nparts)
\ingroup API_Data_Partition
Return in \p child_interface the \p id -th chunk of the vector
represented by \p father_interface once partitioned into \p nparts chunks
according to the <c>filter_arg_ptr</c> field of \p f. The
<c>filter_arg_ptr</c> field must point to an array of \p nparts long
elements, each of which specifies the number of elements in the
corresponding chunk of the partition.

\fn void starpu_vector_filter_list(void *father_interface, void *child_interface, struct starpu_data_filter *f, unsigned id, unsigned nparts)
\ingroup API_Data_Partition
Return in \p child_interface the \p id -th chunk of the vector
represented by \p father_interface once partitioned into \p nparts chunks
according to the <c>filter_arg_ptr</c> field of \p f. The
<c>filter_arg_ptr</c> field must point to an array of \p nparts uint32_t
elements, each of which specifies the number of elements in the
corresponding chunk of the partition.
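How a list of chunk lengths maps to chunk positions can be sketched as a prefix sum. The helper below is hypothetical and assumes that the chunks are laid out back to back in the order of the array; it only illustrates the arithmetic, not the StarPU implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: given the per-chunk lengths that filter_arg_ptr
 * would point to, compute where chunk id starts (in elements) and how
 * long it is. */
void toy_list_chunk(const uint32_t *lengths, unsigned nparts, unsigned id,
                    uint32_t *start, uint32_t *length)
{
    assert(id < nparts);
    uint32_t pos = 0;
    for (unsigned i = 0; i < id; i++)
        pos += lengths[i]; /* skip the chunks that come before id */
    *start = pos;
    *length = lengths[id];
}
```

With lengths <c>{2, 5, 3}</c>, chunk 2 starts at element 7 and holds 3 elements.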
\fn void starpu_vector_filter_divide_in_2(void *father_interface, void *child_interface, struct starpu_data_filter *f, unsigned id, unsigned nparts)
\ingroup API_Data_Partition
Return in \p child_interface the \p id -th chunk of the vector
represented by \p father_interface once partitioned into <c>2</c> chunks of
equal size, ignoring \p nparts. Thus, \p id must be <c>0</c> or <c>1</c>.
@name Predefined Matrix Filter Functions
\ingroup API_Data_Partition
This section gives a partial list of the predefined partitioning
functions for matrix data. Examples of how to use them are shown in
\ref PartitioningData. The complete list can be found in the file
<c>starpu_data_filters.h</c>.

\fn void starpu_matrix_filter_block(void *father_interface, void *child_interface, struct starpu_data_filter *f, unsigned id, unsigned nparts)
\ingroup API_Data_Partition
Partition a dense matrix along the x dimension, thus
getting (x/\p nparts ,y) matrices. If \p nparts does not divide x, the
last submatrix contains the remainder.
\fn void starpu_matrix_filter_block_shadow(void *father_interface, void *child_interface, struct starpu_data_filter *f, unsigned id, unsigned nparts)
\ingroup API_Data_Partition
Partition a dense matrix along the x dimension, with a
shadow border <c>filter_arg_ptr</c>, thus getting
((x-2*shadow)/\p nparts +2*shadow,y) matrices. If \p nparts does not divide
x-2*shadow, the last submatrix contains the remainder.

<b>IMPORTANT</b>: This can
only be used for read-only access, as no coherency is enforced for the
shadowed parts. A usage example is available in
examples/filters/shadow2d.c.
\fn void starpu_matrix_filter_vertical_block(void *father_interface, void *child_interface, struct starpu_data_filter *f, unsigned id, unsigned nparts)
\ingroup API_Data_Partition
Partition a dense matrix along the y dimension, thus
getting (x,y/\p nparts) matrices. If \p nparts does not divide y, the
last submatrix contains the remainder.
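The difference between splitting along x and along y shows up in how the start of each submatrix is computed. The sketch below uses hypothetical helpers and assumes that x is the contiguous dimension, that <c>ld</c> is the distance, in elements, between two consecutive y indexes, and that \p nparts divides the split dimension exactly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte offset of child id when splitting along x.
 * Consecutive x indexes are assumed adjacent in memory. */
size_t toy_matrix_block_offset(uint32_t nx, unsigned id, unsigned nparts,
                               size_t elemsize)
{
    assert(nparts > 0 && id < nparts && nx % nparts == 0);
    return (size_t) id * (nx / nparts) * elemsize;
}

/* Hypothetical helper: byte offset of child id when splitting along y.
 * Each step in y is assumed to skip ld elements. */
size_t toy_matrix_vertical_block_offset(uint32_t ny, uint32_t ld, unsigned id,
                                        unsigned nparts, size_t elemsize)
{
    assert(nparts > 0 && id < nparts && ny % nparts == 0);
    return (size_t) id * (ny / nparts) * ld * elemsize;
}
```

In both cases the child keeps the parent's <c>ld</c>, since the submatrix still lives inside the parent's storage. For an 8x6 matrix of four-byte elements with <c>ld = 8</c> split in two, the second child starts 16 bytes in when splitting along x and 96 bytes in when splitting along y.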
\fn void starpu_matrix_filter_vertical_block_shadow(void *father_interface, void *child_interface, struct starpu_data_filter *f, unsigned id, unsigned nparts)
\ingroup API_Data_Partition
Partition a dense matrix along the y dimension, with a
shadow border <c>filter_arg_ptr</c>, thus getting
(x,(y-2*shadow)/\p nparts +2*shadow) matrices. If \p nparts does not
divide y-2*shadow, the last submatrix contains the remainder.

<b>IMPORTANT</b>: This can only be used for read-only access, as no
coherency is enforced for the shadowed parts. A usage example is
available in examples/filters/shadow2d.c.
@name Predefined Block Filter Functions
\ingroup API_Data_Partition
This section gives a partial list of the predefined partitioning
functions for block data. Examples of how to use them are shown in
\ref PartitioningData. The complete list can be found in the file
<c>starpu_data_filters.h</c>. A usage example is available in
examples/filters/shadow3d.c.

\fn void starpu_block_filter_block(void *father_interface, void *child_interface, struct starpu_data_filter *f, unsigned id, unsigned nparts)
\ingroup API_Data_Partition
Partition a block along the X dimension, thus getting
(x/\p nparts ,y,z) blocks. If \p nparts does not divide x, the last
block contains the remainder.
\fn void starpu_block_filter_block_shadow(void *father_interface, void *child_interface, struct starpu_data_filter *f, unsigned id, unsigned nparts)
\ingroup API_Data_Partition
Partition a block along the X dimension, with a
shadow border <c>filter_arg_ptr</c>, thus getting
((x-2*shadow)/\p nparts +2*shadow,y,z) blocks. If \p nparts does not
divide x-2*shadow, the last block contains the remainder.

<b>IMPORTANT</b>:
This can only be used for read-only access, as no coherency is
enforced for the shadowed parts.
\fn void starpu_block_filter_vertical_block(void *father_interface, void *child_interface, struct starpu_data_filter *f, unsigned id, unsigned nparts)
\ingroup API_Data_Partition
Partition a block along the Y dimension, thus getting
(x,y/\p nparts ,z) blocks. If \p nparts does not divide y, the last
block contains the remainder.

\fn void starpu_block_filter_vertical_block_shadow(void *father_interface, void *child_interface, struct starpu_data_filter *f, unsigned id, unsigned nparts)
\ingroup API_Data_Partition
Partition a block along the Y dimension, with a
shadow border <c>filter_arg_ptr</c>, thus getting
(x,(y-2*shadow)/\p nparts +2*shadow,z) blocks. If \p nparts does not
divide y-2*shadow, the last block contains the remainder.

<b>IMPORTANT</b>:
This can only be used for read-only access, as no coherency is
enforced for the shadowed parts.
\fn void starpu_block_filter_depth_block(void *father_interface, void *child_interface, struct starpu_data_filter *f, unsigned id, unsigned nparts)
\ingroup API_Data_Partition
Partition a block along the Z dimension, thus getting
(x,y,z/\p nparts) blocks. If \p nparts does not divide z, the last
block contains the remainder.

\fn void starpu_block_filter_depth_block_shadow(void *father_interface, void *child_interface, struct starpu_data_filter *f, unsigned id, unsigned nparts)
\ingroup API_Data_Partition
Partition a block along the Z dimension, with a
shadow border <c>filter_arg_ptr</c>, thus getting
(x,y,(z-2*shadow)/\p nparts +2*shadow) blocks. If \p nparts does not
divide z-2*shadow, the last block contains the remainder.

<b>IMPORTANT</b>:
This can only be used for read-only access, as no coherency is
enforced for the shadowed parts.
@name Predefined BCSR Filter Functions
\ingroup API_Data_Partition
This section gives a partial list of the predefined partitioning
functions for BCSR data. Examples of how to use them are shown in
\ref PartitioningData. The complete list can be found in the file
<c>starpu_data_filters.h</c>.

\fn void starpu_bcsr_filter_canonical_block(void *father_interface, void *child_interface, struct starpu_data_filter *f, unsigned id, unsigned nparts)
\ingroup API_Data_Partition
Partition a block-sparse matrix into dense matrices.

\fn void starpu_csr_filter_vertical_block(void *father_interface, void *child_interface, struct starpu_data_filter *f, unsigned id, unsigned nparts)
\ingroup API_Data_Partition
Partition a block-sparse matrix into vertical block-sparse matrices.
*/