/* StarPU --- Runtime system for heterogeneous multicore architectures.
 *
 * Copyright (C) 2009-2021 Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
 * Copyright (C) 2020 Federal University of Rio Grande do Sul (UFRGS)
 *
 * StarPU is free software; you can redistribute it and/or modify
 * it under the terms of the GNU Lesser General Public License as published by
 * the Free Software Foundation; either version 2.1 of the License, or (at
 * your option) any later version.
 *
 * StarPU is distributed in the hope that it will be useful, but
 * WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 *
 * See the GNU Lesser General Public License in COPYING.LGPL for more details.
 */

/*
 * NOTE: XXX: also update simgrid versions in 101_building.doxy !!
 */

/*! \page SimGridSupport SimGrid Support

StarPU can use SimGrid in order to simulate execution on an arbitrary
platform. This was tested with SimGrid versions 3.11 to 3.16, and 3.18 to
3.27. SimGrid version 3.25 needs to be configured with <c>-Denable_msg=ON</c>.
Other versions may have compatibility issues: notably, 3.17 does not build at
all, and MPI simulation does not work with version 3.22.

\section Preparing Preparing Your Application For Simulation

There are a few technical details which need to be handled for an application to
be simulated through SimGrid.

If the application uses <c>gettimeofday()</c> to make its
performance measurements, the real time will be used, which will be bogus. To
get the simulated time, it has to use starpu_timing_now(), which returns the
virtual timestamp in microseconds.

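Since starpu_timing_now() returns microseconds, elapsed times and rates can be derived from two timestamps as in the sketch below. The helper names are hypothetical and not part of StarPU. The same code then reports simulated time under SimGrid and real time otherwise.

```c
#include <stdio.h>

// Hypothetical helpers: timestamps are in microseconds, as returned by
// starpu_timing_now(), e.g.
//   double start = starpu_timing_now();
//   ... submit tasks, then starpu_task_wait_for_all() ...
//   double end = starpu_timing_now();
static double elapsed_ms(double start_us, double end_us)
{
    return (end_us - start_us) / 1000.0;
}

// flop per microsecond is MFlop/s; divide by 1000 to get GFlop/s.
static double gflops(double flop, double start_us, double end_us)
{
    return flop / (end_us - start_us) / 1000.0;
}

static void report_timing(double flop, double start_us, double end_us)
{
    printf("%.2f ms, %.2f GFlop/s\n", elapsed_ms(start_us, end_us),
           gflops(flop, start_us, end_us));
}
```
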
For technical reasons, the application's <c>.c</c> file which contains \c main() has
to be recompiled with \c starpu_simgrid_wrap.h, which in the SimGrid case uses a
<c>#define</c> to rename \c main() into \c starpu_main(). \c libstarpu then
provides the real \c main(), which calls the application's \c main().

To be able to test with very large data sizes, one may want to allocate
application data only if the macro \c STARPU_SIMGRID is not defined. Passing a <c>NULL</c> pointer to
the \c starpu_data_register functions is fine: StarPU never reads or writes the
data in SimGrid mode anyway.

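For instance, a hypothetical allocation helper can skip the allocation when \c STARPU_SIMGRID is defined. This sketch assumes that <c>starpu.h</c> has been included beforehand, since that is where \c STARPU_SIMGRID is defined. The resulting pointer is then passed to a registration function such as starpu_vector_data_register().

```c
#include <stdlib.h>

// Hypothetical helper. STARPU_SIMGRID is assumed to come from <starpu.h>
// (included before this code) when StarPU was configured with --enable-simgrid.
static double *alloc_vector(size_t n)
{
#ifdef STARPU_SIMGRID
    (void)n;
    // StarPU never reads or writes registered data in SimGrid mode,
    // so huge sizes can be simulated without allocating anything.
    return NULL;
#else
    return malloc(n * sizeof(double));
#endif
}

// Usage, in a StarPU application:
//   double *v = alloc_vector(n);
//   starpu_data_handle_t h;
//   starpu_vector_data_register(&h, STARPU_MAIN_RAM, (uintptr_t)v, n, sizeof(double));
```
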
To be able to run the application with e.g. CUDA simulation on a system which
does not have CUDA installed, one can fill starpu_codelet::cuda_funcs with <c>(void*)1</c>, to
express that there is a CUDA implementation, even if one does not actually
provide it. By default, StarPU will not actually run it in SimGrid mode anyway
(unless the ::STARPU_CODELET_SIMGRID_EXECUTE or ::STARPU_CODELET_SIMGRID_EXECUTE_AND_INJECT
flags are set in the codelet).

\snippet simgrid.c To be included. You should update doxygen if you see this text.

\section Calibration Calibration

The idea is to first compile StarPU normally, and run the application,
so as to automatically benchmark the bus and the codelets.

\verbatim
$ ./configure && make
$ STARPU_SCHED=dmda ./examples/matvecmult/matvecmult
[starpu][_starpu_load_history_based_model] Warning: model matvecmult
is not calibrated, forcing calibration for this run. Use the
STARPU_CALIBRATE environment variable to control this.
$ ...
$ STARPU_SCHED=dmda ./examples/matvecmult/matvecmult
TEST PASSED
\endverbatim

Note that we force the use of the <c>dmda</c> scheduler to generate
performance models for the application. The application may need to be
run several times before the model is calibrated.

\section Simulation Simulation

Then, recompile StarPU, passing \ref enable-simgrid "--enable-simgrid"
to <c>configure</c>. Make sure to keep all other <c>configure</c> options
the same, notably options such as <c>--enable-maxcudadev</c>.

\verbatim
$ ./configure --enable-simgrid
\endverbatim

To specify the location of SimGrid, you can either set the environment
variables \c SIMGRID_CFLAGS and \c SIMGRID_LIBS, or use the \c configure
options \ref with-simgrid-dir "--with-simgrid-dir",
\ref with-simgrid-include-dir "--with-simgrid-include-dir" and
\ref with-simgrid-lib-dir "--with-simgrid-lib-dir", for example:

\verbatim
$ ./configure --with-simgrid-dir=/opt/local/simgrid
\endverbatim

You can then re-run the application.

\verbatim
$ make
$ STARPU_SCHED=dmda ./examples/matvecmult/matvecmult
TEST FAILED !!!
\endverbatim

It is normal that the test fails: since the computations are not actually done
(that is the whole point of SimGrid), the result is of course wrong.

If the performance model is not calibrated enough, the following error
message will be displayed:

\verbatim
$ STARPU_SCHED=dmda ./examples/matvecmult/matvecmult
[starpu][_starpu_load_history_based_model] Warning: model matvecmult
is not calibrated, forcing calibration for this run. Use the
STARPU_CALIBRATE environment variable to control this.
[starpu][_starpu_simgrid_execute_job][assert failure] Codelet
matvecmult does not have a perfmodel, or is not calibrated enough
\endverbatim

The number of devices can be chosen as usual with \ref STARPU_NCPU,
\ref STARPU_NCUDA, and \ref STARPU_NOPENCL, and the amount of GPU memory
with \ref STARPU_LIMIT_CUDA_MEM, \ref STARPU_LIMIT_CUDA_devid_MEM,
\ref STARPU_LIMIT_OPENCL_MEM, and \ref STARPU_LIMIT_OPENCL_devid_MEM.

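The per-device variants carry the device identifier in the variable name. The minimal sketch below builds that name programmatically. It assumes that the value is expressed in megabytes and that the variable is set before starpu_init() reads the environment. The helper name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper: sets STARPU_LIMIT_CUDA_<devid>_MEM to the given
// amount of memory (assumed in MB) for one simulated CUDA device.
// Must be called before starpu_init(). Returns 0 on success, -1 on error.
static int limit_cuda_mem(int devid, unsigned long mb)
{
    char name[64], value[32];
    if (snprintf(name, sizeof(name), "STARPU_LIMIT_CUDA_%d_MEM", devid) >= (int)sizeof(name))
        return -1;
    snprintf(value, sizeof(value), "%lu", mb);
    return setenv(name, value, 1) == 0 ? 0 : -1;
}
```

Setting <c>STARPU_LIMIT_CUDA_1_MEM=4096</c> in front of the command line has the same effect without any code.
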
\section SimulationOnAnotherMachine Simulation On Another Machine

The SimGrid support even permits performing simulations on another machine,
typically your desktop. To achieve this, one still needs to perform the Calibration
step on the actual machine to be simulated, then copy the resulting performance
models (the <c>$STARPU_HOME/.starpu</c> directory) to the desktop machine. One can
then perform the Simulation step on the desktop machine, by setting the environment
variable \ref STARPU_HOSTNAME to the name of the actual machine, to
make StarPU use the performance models of the simulated machine even
on the desktop machine. To use multiple performance models in different ranks,
in the case of SMPI executions on a heterogeneous platform, it is possible to use the
option <c>-hostfile-platform</c> of <c>starpu_smpirun</c>, which will define
\ref STARPU_MPI_HOSTNAMES with the hostnames of your hostfile.

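To find which directory to copy, a small sketch such as the following can compute it. It assumes that \c STARPU_HOME falls back to \c HOME when unset, which is StarPU's documented default on Unix systems. The helper name is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: writes "$STARPU_HOME/.starpu" into buf, assuming
// StarPU falls back to $HOME when STARPU_HOME is unset.
// Returns 0 on success, -1 if no home directory is known or buf is too small.
static int starpu_state_dir(char *buf, size_t size)
{
    const char *home = getenv("STARPU_HOME");
    if (!home || !*home)
        home = getenv("HOME");
    if (!home || !*home)
        return -1;
    // strlen(home) + "/.starpu" + terminating NUL must fit in buf.
    if (strlen(home) + sizeof("/.starpu") > size)
        return -1;
    snprintf(buf, size, "%s/.starpu", home);
    return 0;
}
```
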
If the desktop machine does not have CUDA or OpenCL, StarPU is still able to
use SimGrid to simulate execution with CUDA/OpenCL devices, but the application
source code will probably disable the CUDA and OpenCL codelets in that
case. Since the codelet functions are not actually called during SimGrid
execution by default, one can use dummy function pointers, such as the
<c>(void*)1</c> placeholder described in \ref Preparing, to still permit simulated
CUDA or OpenCL execution.

\section SimulationExamples Simulation Examples

StarPU ships a few performance models for a couple of systems: \c attila,
\c mirage, \c idgraf, and \c sirocco. See Section \ref SimulatedBenchmarks for the details.

\section FakeSimulations Simulations On Fake Machines

It is possible to build fake machines which do not exist, by modifying the
platform file <c>$STARPU_HOME/.starpu/sampling/bus/machine.platform.xml</c>
by hand: one can add more CPUs, add GPUs (but the performance model file has to
be extended as well), change the available GPU memory size, the PCI memory bandwidth, etc.

\section TweakingSimulation Tweaking Simulation

The simulation can be tweaked, so as to tune it between a very accurate
simulation and a very simple simulation (which is thus close to scheduling
theory results); see the \ref STARPU_SIMGRID_TRANSFER_COST, \ref STARPU_SIMGRID_CUDA_MALLOC_COST,
\ref STARPU_SIMGRID_CUDA_QUEUE_COST, \ref STARPU_SIMGRID_TASK_SUBMIT_COST,
\ref STARPU_SIMGRID_FETCHING_INPUT_COST and \ref STARPU_SIMGRID_SCHED_COST environment variables.

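For instance, to move towards the very simple end of that range, one could switch all of these costs off at once. This sketch assumes that these variables take boolean values (0 or 1) and are read when starpu_init() is called. The helper name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>

// The SimGrid cost knobs listed in this section.
static const char *const simgrid_cost_vars[] = {
    "STARPU_SIMGRID_TRANSFER_COST",
    "STARPU_SIMGRID_CUDA_MALLOC_COST",
    "STARPU_SIMGRID_CUDA_QUEUE_COST",
    "STARPU_SIMGRID_TASK_SUBMIT_COST",
    "STARPU_SIMGRID_FETCHING_INPUT_COST",
    "STARPU_SIMGRID_SCHED_COST",
};

// Hypothetical helper: sets every knob to "1" (model the cost) or "0"
// (ignore it). Call before starpu_init(). Returns 0 on success, -1 on error.
static int set_simgrid_costs(int enabled)
{
    const char *value = enabled ? "1" : "0";
    for (size_t i = 0; i < sizeof(simgrid_cost_vars) / sizeof(simgrid_cost_vars[0]); i++)
        if (setenv(simgrid_cost_vars[i], value, 1) != 0)
            return -1;
    return 0;
}
```
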
\section SimulationMPIApplications MPI Applications

StarPU-MPI applications can also be run in SimGrid mode. SMPI currently requires
StarPU to be built statically only, so <c>--disable-shared</c> needs to be
passed to <c>./configure</c>.

The application needs to be compiled with \c smpicc, and run using the
<c>starpu_smpirun</c> script, for instance:

\verbatim
$ STARPU_SCHED=dmda starpu_smpirun -platform cluster.xml -hostfile hostfile ./mpi/tests/pingpong
\endverbatim

Here, \c cluster.xml is a SimGrid-MPI platform description, and \c hostfile the
list of MPI nodes to be used. In homogeneous MPI clusters, StarPU will simply
replicate, for each MPI node, the architecture referred to by
\ref STARPU_HOSTNAME. To use multiple performance models in different ranks,
in the case of a heterogeneous platform, it is possible to use the
option <c>-hostfile-platform</c> of <c>starpu_smpirun</c>, which will define
\ref STARPU_MPI_HOSTNAMES with the hostnames of your hostfile.

To use FxT traces, libfxt also needs to be built statically, <b>and</b>
with dynamic linking flags, i.e. with

\verbatim
CFLAGS=-fPIC ./configure --enable-static
\endverbatim

\section SimulationDebuggingApplications Debugging Applications

By default, SimGrid uses its own implementation of threads, which prevents \c gdb
from inspecting the stacks of all threads. To be able to fully debug an
application running with SimGrid, pass the <c>--cfg=contexts/factory:thread</c>
option to the application, to make SimGrid use system threads, which \c gdb will be
able to manipulate as usual.

It is also worth noting SimGrid 3.21's new parameter
<c>--cfg=simix/breakpoint</c>, which allows putting a breakpoint at a precise
(deterministic!) timing of the execution. If, for instance, we see in an execution
trace that something odd is happening at time 19000ms, we can use
<c>--cfg=simix/breakpoint:19.000</c>, and \c SIGTRAP will be raised at that point,
which will interrupt execution within \c gdb, allowing us to inspect e.g. the
scheduler state.

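Trace timestamps are often read in milliseconds, while the option takes simulated seconds, as in the 19000ms and 19.000 example above. A hypothetical helper can therefore do the conversion when generating the command line. It is a sketch, not a StarPU or SimGrid API.

```c
#include <stdio.h>

// Hypothetical helper: formats the SimGrid breakpoint option for a trace
// timestamp given in milliseconds (the option takes seconds).
// Returns 0 on success, -1 if buf is too small.
static int breakpoint_option(char *buf, size_t size, double trace_ms)
{
    int n = snprintf(buf, size, "--cfg=simix/breakpoint:%.3f", trace_ms / 1000.0);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

The resulting string is then passed to the application like any other <c>--cfg</c> option, for instance when starting it under \c gdb.
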
\section SimulationMemoryUsage Memory Usage

Since kernels are not actually run and data transfers are not actually
performed, the data memory does not actually need to be allocated. This allows,
for instance, simulating the execution of applications processing very large data
on a small laptop.

The application can for instance pass <c>1</c> (or any other bogus pointer)
to StarPU data registration functions, instead of allocating data. This
however requires the application to take care not to access the data,
and will not work in MPI mode, which performs transfers.

Another way is to pass the \ref STARPU_MALLOC_SIMULATION_FOLDED flag to the
starpu_malloc_flags() function. This will make it allocate a memory area which
one can read and write, but which is optimized so that it does not actually consume
memory. Of course, the values read from such an area will be bogus, but this allows
the application to keep e.g. data load, store, and initialization as they are, and also
to work in MPI mode.

Note however that Linux kernels notably refuse obvious memory overcommitting by
default, so a single allocation can typically not be bigger than the amount of
physical memory; see https://www.kernel.org/doc/Documentation/vm/overcommit-accounting
for details. This prevents, for instance, allocating a single huge matrix. Allocating a
huge matrix in several tiles is not a problem, however. <c>sysctl
vm.overcommit_memory=1</c> can also be used to allow such overcommit.

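As a sketch of the tiled approach, the following allocates a matrix as many independent tiles, so that no single allocation is larger than one tile. It uses plain malloc() so that it stands alone. In a SimGrid build, each tile would instead be allocated with starpu_malloc_flags() and the \ref STARPU_MALLOC_SIMULATION_FOLDED flag, and released with starpu_free_flags(). The function names are hypothetical.

```c
#include <stdlib.h>

// Hypothetical helpers: an (nt*nb) x (nt*nb) matrix of doubles stored as
// nt*nt separate nb x nb tiles; tile (i, j) is tiles[i * nt + j].
static void free_tiled_matrix(double **tiles, size_t nt)
{
    if (!tiles)
        return;
    for (size_t t = 0; t < nt * nt; t++)
        free(tiles[t]); // free(NULL) is a no-op for tiles never allocated
    free(tiles);
}

static double **alloc_tiled_matrix(size_t nt, size_t nb)
{
    // calloc zeroes the table, so a partial failure can be cleaned up
    // with free_tiled_matrix().
    double **tiles = calloc(nt * nt, sizeof(*tiles));
    if (!tiles)
        return NULL;
    for (size_t t = 0; t < nt * nt; t++) {
        // Each allocation is only nb*nb*sizeof(double) bytes.
        tiles[t] = malloc(nb * nb * sizeof(double));
        if (!tiles[t]) {
            free_tiled_matrix(tiles, nt);
            return NULL;
        }
    }
    return tiles;
}
```
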
Note also that this folding is done by remapping the same file several times,
and Linux kernels will also refuse to create too many memory areas. <c>sysctl
vm.max_map_count</c> can be used to check and change the default (65535). By
default, StarPU uses a 1MiB file, so that it hopefully fits in the CPU cache. This
however limits the amount of such folded memory to a bit below 64GiB. The
\ref STARPU_MALLOC_SIMULATION_FOLD environment variable can be used to increase the
size of the file.

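The arithmetic behind the 64GiB figure can be sketched as follows. Each folded mapping of the file uses one memory area, so folded memory is bounded by <c>vm.max_map_count</c> times the file size: 65535 times 1MiB, just below 64GiB. In practice the limit is a bit lower, since the process also needs other memory areas. The helpers are hypothetical, and they assume that \ref STARPU_MALLOC_SIMULATION_FOLD is expressed in MiB.

```c
#include <stdint.h>
#include <stdio.h>

// Reads the current vm.max_map_count on Linux, falling back to the usual
// default of 65535 if /proc is not available.
static uint64_t read_max_map_count(void)
{
    unsigned long long value = 65535;
    FILE *f = fopen("/proc/sys/vm/max_map_count", "r");
    if (f) {
        if (fscanf(f, "%llu", &value) != 1)
            value = 65535;
        fclose(f);
    }
    return value;
}

// Upper bound on folded memory, in MiB, for a fold file of fold_mib MiB:
// every fold_mib MiB of folded memory costs one memory mapping.
static uint64_t folded_capacity_mib(uint64_t max_map_count, uint64_t fold_mib)
{
    return max_map_count * fold_mib;
}

// Smallest fold file size, in MiB, whose capacity reaches target_mib.
static uint64_t fold_mib_for(uint64_t target_mib, uint64_t max_map_count)
{
    return (target_mib + max_map_count - 1) / max_map_count;
}
```

For instance, aiming at 256GiB (262144MiB) of folded memory with the default map count gives a 5MiB file. Alternatively, <c>vm.max_map_count</c> can be raised instead.
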
*/