  1. /* StarPU --- Runtime system for heterogeneous multicore architectures.
  2. *
  3. * Copyright (C) 2013-2020 Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
  4. * Copyright (C) 2013 Corentin Salingue
  5. *
  6. * StarPU is free software; you can redistribute it and/or modify
  7. * it under the terms of the GNU Lesser General Public License as published by
  8. * the Free Software Foundation; either version 2.1 of the License, or (at
  9. * your option) any later version.
  10. *
  11. * StarPU is distributed in the hope that it will be useful, but
  12. * WITHOUT ANY WARRANTY; without even the implied warranty of
  13. * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
  14. *
  15. * See the GNU Lesser General Public License in COPYING.LGPL for more details.
  16. */
  17. /*! \page OutOfCore Out Of Core
  18. \section OutOfCore_Introduction Introduction
  19. When using StarPU, one may need to store more data than the main memory (RAM)
  20. can hold. This chapter describes how to add a new memory node on a disk and how
  21. to use it.
  22. Similarly to what happens with GPUs (it's actually exactly the same code), when
  23. available main memory becomes scarce, StarPU will evict unused data to the disk,
  24. thus leaving room for new allocations. Whenever some evicted data is needed
  25. again for a task, StarPU will automatically fetch it back from the disk.
  26. The principle is that one first registers a disk location, seen by StarPU as a
  27. <c>void*</c>, which can be for instance a Unix path for the \c stdio, \c unistd or
  28. \c unistd_o_direct backends, or a leveldb database for the \c leveldb backend, an HDF5
  29. file path for the \c HDF5 backend, etc. The \c disk backend opens this place with the
  30. plug() method.
  31. StarPU can then start using it to allocate room and store data there with the
  32. disk write method, without user intervention.
  33. The user can also use starpu_disk_open() to explicitly open an object within the
  34. disk, e.g. a file name in the \c stdio or \c unistd cases, or a database key in the
  35. \c leveldb case, and then use <c>starpu_*_register</c> functions to turn it into a StarPU
  36. data handle. StarPU will then use this file as an external source of data, and
  37. automatically read and write data as appropriate.
  38. In any case, the user also needs to set \ref STARPU_LIMIT_CPU_MEM to the amount of
  39. main memory that StarPU is allowed to use. By default StarPU will use the
  40. machine memory size, but part of it is taken by the kernel, the system,
  41. daemons, and the application's own allocated data, whose size cannot be
  42. predicted. That is why the user needs to specify what StarPU can afford.
  43. Some out-of-core tests are worth reading, see <c>tests/disk/*.c</c>.
  44. \section UseANewDiskMemory Use a new disk memory
  45. To use a disk memory node, you have to register it with this function:
  46. \code{.c}
  47. int new_dd = starpu_disk_register(&starpu_disk_unistd_ops, (void *) "/tmp/", 1024*1024*200);
  48. \endcode
  49. Here, we use the \c unistd backend to perform the read/write operations, i.e.
  50. \c read/\c write. The call also takes the path under which files are stored, as well as
  51. the maximum size (in bytes) that StarPU may store on the disk.
  52. Don't forget to check the returned value: a negative value indicates an error!
  53. This can also be achieved by just setting environment variables \ref STARPU_DISK_SWAP, \ref STARPU_DISK_SWAP_BACKEND and \ref STARPU_DISK_SWAP_SIZE :
  54. \verbatim
  55. export STARPU_DISK_SWAP=/tmp
  56. export STARPU_DISK_SWAP_BACKEND=unistd
  57. export STARPU_DISK_SWAP_SIZE=200
  58. \endverbatim
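The same variables can also be set from within the application, provided this happens before StarPU reads them. The sketch below assumes that they are read during starpu_init() and that \ref STARPU_DISK_SWAP_SIZE is expressed in MiB (which the value 200, next to the <c>1024*1024*200</c> bytes of the call above, suggests). \c configure_disk_swap() is a hypothetical helper written for this illustration, not a StarPU function.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper (not part of StarPU): exports the three swap
// variables so that a later starpu_init() call can pick them up.
// size_mib is assumed to be in MiB, matching the value 200 above.
// Returns 0 on success, -1 if any setenv() call fails.
int configure_disk_swap(const char *path, const char *backend, unsigned size_mib)
{
	char size_str[32];

	assert(path != NULL && backend != NULL);
	snprintf(size_str, sizeof(size_str), "%u", size_mib);

	// The last argument (1) overwrites any value already set.
	if (setenv("STARPU_DISK_SWAP", path, 1) != 0)
		return -1;
	if (setenv("STARPU_DISK_SWAP_BACKEND", backend, 1) != 0)
		return -1;
	if (setenv("STARPU_DISK_SWAP_SIZE", size_str, 1) != 0)
		return -1;
	return 0;
}
```

A call such as <c>configure_disk_swap("/tmp", "unistd", 200)</c> placed before starpu_init() would then mirror the shell settings shown above.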
  59. The backend can be set to \c stdio (some caching is done by \c libc and the kernel), \c unistd (only
  60. caching in the kernel), \c unistd_o_direct (no caching), \c leveldb, or \c hdf5.
  61. It is important to understand that when the backend is not set to \c
  62. unistd_o_direct, some caching will occur at the kernel level (the page cache),
  63. which will also consume memory. \ref STARPU_LIMIT_CPU_MEM might need to be set
  64. to less than half of the machine memory just to leave room for the kernel's
  65. page cache, otherwise the kernel will struggle to get memory. Using \c
  66. unistd_o_direct avoids this caching, which allows setting \ref STARPU_LIMIT_CPU_MEM
  67. to the machine memory size (minus some memory for normal kernel operations,
  68. system daemons, and application data).
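The rule of thumb above can be turned into a small computation. The following is only a sketch of one possible heuristic: \c suggest_cpu_mem_limit_mb() and its parameters are hypothetical, the result is assumed to be in megabytes (the unit assumed here for \ref STARPU_LIMIT_CPU_MEM), and the right amount to reserve depends on the machine and on the application.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical heuristic, not a StarPU function: derive a value (in MB)
// for STARPU_LIMIT_CPU_MEM from the machine memory size.
//   total_mb       : physical memory of the machine
//   reserved_mb    : estimate for kernel, daemons and application data
//   kernel_caching : non-zero for backends that go through the page cache
//                    (e.g. stdio, unistd), zero for unistd_o_direct
size_t suggest_cpu_mem_limit_mb(size_t total_mb, size_t reserved_mb, int kernel_caching)
{
	assert(reserved_mb <= total_mb);
	size_t usable_mb = total_mb - reserved_mb;

	// With the page cache in play, keep half of the usable memory free
	// so that the kernel can cache disk blocks without starving.
	return kernel_caching ? usable_mb / 2 : usable_mb;
}
```

For a machine with 16384 MB of memory and 2048 MB set aside, this suggests 7168 MB with the \c unistd backend and 14336 MB with \c unistd_o_direct.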
  69. When the register call is made, StarPU will benchmark the disk. This can
  70. take some time.
  71. <strong>Warning: the size must therefore be at least \ref STARPU_DISK_SIZE_MIN bytes!</strong>
  72. StarPU will then automatically try to evict unused data to this new disk. One
  73. can also use the standard StarPU memory node API to prefetch data etc., see the
  74. \ref API_Standard_Memory_Library and the \ref API_Data_Interfaces.
  75. The disk is unregistered during starpu_shutdown().
  76. \section OOCDataRegistration Data Registration
  77. StarPU will only be able to achieve Out-Of-Core eviction if it controls memory
  78. allocation. For instance, if the application does the following:
  79. \code{.c}
  80. p = malloc(1024*1024*sizeof(float));
  81. fill_with_data(p);
  82. starpu_matrix_data_register(&h, STARPU_MAIN_RAM, (uintptr_t) p, 1024, 1024, 1024, sizeof(float));
  83. \endcode
  84. StarPU will not be able to release the corresponding memory since it is the
  85. application which allocated it, and StarPU cannot know how it was allocated, and thus how to
  86. release it. One thus has to use the following instead:
  87. \code{.c}
  88. starpu_matrix_data_register(&h, -1, NULL, 1024, 1024, 1024, sizeof(float));
  89. starpu_task_insert(cl_fill_with_data, STARPU_W, h, 0);
  90. \endcode
  91. This makes StarPU automatically do the allocation when the task running
  92. cl_fill_with_data gets executed. Then, if it needs to, StarPU will be able to
  93. release it after having pushed the data to the disk. Since no initial buffer is
  94. provided to starpu_matrix_data_register(), the handle does not have any initial
  95. value right after this call, and thus the very first task using the handle needs
  96. to use the ::STARPU_W mode like above; ::STARPU_R or ::STARPU_RW would not make
  97. sense.
  98. By default, StarPU will try to push any data handle to the disk.
  99. To specify whether a given handle should be pushed to the disk,
  100. starpu_data_set_ooc_flag() should be used.
  101. \section OOCWontUse Using Wont Use
  102. By default, StarPU uses a Least-Recently-Used (LRU) algorithm to determine
  103. which data should be evicted to the disk. This algorithm can be hinted
  104. by telling which data will not be used in the near future thanks to
  105. starpu_data_wont_use(), for instance:
  106. \code{.c}
  107. starpu_task_insert(&cl_work, STARPU_RW, h, 0);
  108. starpu_data_wont_use(h);
  109. \endcode
  110. StarPU will mark the data as "inactive" and tend to evict that data to the disk
  111. rather than other data.
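To make the effect of such a hint concrete, here is a toy model of victim selection. It is not StarPU's implementation: \c toy_entry and \c toy_pick_victim() are hypothetical names, and the policy shown (inactive entries first, then plain LRU) is only an assumption about what "tend to evict" could look like in its simplest form.

```c
#include <assert.h>
#include <stddef.h>

// Toy model of one cached piece of data; not a StarPU structure.
struct toy_entry {
	unsigned long last_use; // logical timestamp of the last access
	int inactive;           // set by the "won't use" hint
};

// Pick the index of the entry to evict: any entry marked inactive is
// preferred, and ties are broken by the oldest last_use (plain LRU).
size_t toy_pick_victim(const struct toy_entry *entries, size_t n)
{
	assert(entries != NULL && n > 0);
	size_t victim = 0;

	for (size_t i = 1; i < n; i++) {
		const struct toy_entry *e = &entries[i];
		const struct toy_entry *v = &entries[victim];

		if (e->inactive != v->inactive) {
			// Exactly one of the two is inactive: evict that one.
			if (e->inactive)
				victim = i;
		} else if (e->last_use < v->last_use) {
			// Same status: fall back to least recently used.
			victim = i;
		}
	}
	return victim;
}
```

In this model, data that has just been used would normally be the last candidate for eviction, while marking it inactive moves it to the front of the queue. This matches the situation of the snippet above, where the hint directly follows the task submission.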
  112. \section ExampleDiskCopy Examples: disk_copy
  113. \snippet disk_copy.c To be included. You should update doxygen if you see this text.
  114. \section ExampleDiskCompute Examples: disk_compute
  115. \snippet disk_compute.c To be included. You should update doxygen if you see this text.
  116. \section Performances Performances
  117. Scheduling heuristics for out-of-core are still relatively experimental. The
  118. tricky part is that you usually have to find a compromise between favoring
  119. locality (which avoids back and forth with the disk) and favoring the
  120. critical path, i.e. taking into account priorities to avoid lack of parallelism
  121. at the end of the task graph.
  122. It is notably better to avoid assigning different priorities to tasks with low
  123. priority, since that will make the scheduler want to schedule them by levels of
  124. priority, at the expense of locality.
  125. The scheduling algorithms worth trying are thus <code>dmdar</code> and
  126. <code>lws</code>, which favor data locality over priorities. Further work in
  127. this area is planned.
  128. \section FeedBackFigures Feedback Figures
  129. Beyond pure performance feedback, some figures are worth a look.
  130. Setting <c>export STARPU_BUS_STATS=1</c> (see \ref STARPU_BUS_STATS; \ref STARPU_BUS_STATS_FILE
  131. defines a file in which to write the statistics, the standard error stream
  132. being used by default) gives an overview of the data
  133. transfers which were needed. The values can also be obtained at runtime
  134. by using starpu_bus_get_profiling_info(). An example can be read in
  135. <c>src/profiling/profiling_helpers.c</c>.
  136. \verbatim
  137. #---------------------
  138. Data transfer speed for /tmp/sthibault-disk-DJzhAj (node 1):
  139. 0 -> 1: 99 MB/s
  140. 1 -> 0: 99 MB/s
  141. 0 -> 1: 23858 µs
  142. 1 -> 0: 23858 µs
  143. #---------------------
  144. TEST DISK MEMORY
  145. #---------------------
  146. Data transfer stats:
  147. Disk 0 -> NUMA 0 0.0000 GB 0.0000 MB/s (transfers : 0 - avg -nan MB)
  148. NUMA 0 -> Disk 0 0.0625 GB 63.6816 MB/s (transfers : 2 - avg 32.0000 MB)
  149. Total transfers: 0.0625 GB
  150. #---------------------
  151. \endverbatim
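The "avg" column of the output above is consistent with dividing the transferred volume by the number of transfers: 0.0625 GB over 2 transfers gives 32 MB. The sketch below reproduces that arithmetic, under the assumption that 1 GB is counted as 1024 MB. \c avg_transfer_mb() is an illustrative helper, not StarPU code, and how StarPU computes the figure internally is not shown here.

```c
#include <assert.h>

// Illustrative helper, not StarPU code: average size of a transfer in MB,
// given the total volume in GB (1 GB taken as 1024 MB) and the count.
double avg_transfer_mb(double total_gb, unsigned transfers)
{
	assert(total_gb >= 0.0);
	// With zero volume and zero transfers this is 0.0 / 0.0, which is
	// NaN under IEEE 754 arithmetic.
	return total_gb * 1024.0 / (double) transfers;
}
```

A direction in which no transfer occurred thus yields a NaN average, which would explain the <c>-nan</c> printed for the "Disk 0 -> NUMA 0" line: it indicates the absence of transfers, not an error.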
  152. Using <c>export STARPU_ENABLE_STATS=1</c> gives information for each memory node
  153. on data miss/hit and allocation miss/hit.
  154. \verbatim
  155. #---------------------
  156. MSI cache stats :
  157. memory node NUMA 0
  158. hit : 32 (66.67 %)
  159. miss : 16 (33.33 %)
  160. memory node Disk 0
  161. hit : 0 (0.00 %)
  162. miss : 0 (0.00 %)
  163. #---------------------
  164. #---------------------
  165. Allocation cache stats:
  166. memory node NUMA 0
  167. total alloc : 16
  168. cached alloc: 0 (0.00 %)
  169. memory node Disk 0
  170. total alloc : 8
  171. cached alloc: 0 (0.00 %)
  172. #---------------------
  173. \endverbatim
  174. \section DiskFunctions Disk functions
  175. There are various ways to operate a disk memory node, described by the structure
  176. starpu_disk_ops. For instance, the variable #starpu_disk_unistd_ops
  177. uses read/write functions.
  178. All structures are in \ref API_Out_Of_Core.
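As an illustration of the kind of system calls meant by "read/write functions", the sketch below stores a buffer at a given offset of a temporary file and reads it back, using the POSIX \c pwrite() and \c pread() variants. It is independent of StarPU: \c unistd_round_trip() and the file name pattern are hypothetical, and the actual backend implementation may differ.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

// Illustration only, not StarPU code: store len bytes at a given offset of
// a temporary file with pwrite(), read them back with pread(), then remove
// the file. Returns 0 on success, -1 on any failure.
int unistd_round_trip(const void *src, void *dst, size_t len, off_t offset)
{
	char path[] = "/tmp/ooc-sketch-XXXXXX"; // hypothetical name pattern
	int ret = -1;

	assert(src != NULL && dst != NULL);

	int fd = mkstemp(path); // creates and opens a unique file
	if (fd < 0)
		return -1;

	// Unbuffered at the libc level: each call goes to the kernel, which
	// may still keep the data in its page cache.
	if (pwrite(fd, src, len, offset) == (ssize_t) len &&
	    pread(fd, dst, len, offset) == (ssize_t) len)
		ret = 0;

	close(fd);
	unlink(path);
	return ret;
}
```

Addressing the file by explicit offsets is what lets several pieces of data share one file without seeking.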
  179. */