/*
 * This file is part of the StarPU Handbook.
 * Copyright (C) 2009--2011 Université de Bordeaux
 * Copyright (C) 2010, 2011, 2012, 2013, 2014 CNRS
 * Copyright (C) 2011, 2012 INRIA
 * See the file version.doxy for copying conditions.
 */
/*! \page SimGridSupport SimGrid Support

StarPU can use SimGrid to simulate execution on an arbitrary
platform. This was tested with SimGrid 3.11, 3.12 and 3.13; other versions may
have compatibility issues.

\section Preparing Preparing your application for simulation

A few technical details need to be handled for an application to be simulated
through SimGrid.

If the application uses <c>gettimeofday</c> for its performance measurements,
it will measure real time, which is meaningless in a simulation. To get the
simulated time, it has to use starpu_timing_now(), which returns the virtual
timestamp in microseconds.

For technical reasons, the application's .c file which contains main() has to
be recompiled with starpu_simgrid_wrap.h, which in the SimGrid case renames
main() to starpu_main() with a preprocessor define. libstarpu then provides the
real main() and calls the application's main().

To be able to test with very large data sizes, one may want to allocate
application data only if STARPU_SIMGRID is not defined. Passing a NULL pointer to
starpu_data_register functions is fine: data will never be read or written by
StarPU in SimGrid mode anyway.
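
The conditional allocation described above could be sketched as follows. This is
only an illustration: <c>app_vector_alloc</c> is a hypothetical helper name, and
the sketch assumes that STARPU_SIMGRID is made visible by the StarPU headers
included by the application when StarPU was configured for simulation. Without
that macro, the helper is a plain malloc().

```c
#include <stddef.h>
#include <stdlib.h>

// Hypothetical helper: returns the buffer to hand to a StarPU data
// registration function for a vector of nx floats.
static float *app_vector_alloc(size_t nx)
{
#ifdef STARPU_SIMGRID
	// Simulation build: StarPU never reads or writes the data, so skip
	// the allocation and let the caller register a NULL pointer.
	(void) nx;
	return NULL;
#else
	// Native build: the kernels really run, so the data must exist.
	return malloc(nx * sizeof(float));
#endif
}
```

The returned pointer would then be passed to the registration function as
usual. In a simulation build the application must take care never to
dereference it.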

To be able to run the application with e.g. CUDA simulation on a system which
does not have CUDA installed, one can fill the cuda_funcs with (void*)1, to
express that there is a CUDA implementation, even if one does not actually
provide it. StarPU will not run it in SimGrid mode by default anyway
(unless the ::STARPU_CODELET_SIMGRID_EXECUTE flag is set in the codelet).

\section Calibration Calibration

The idea is to first compile StarPU normally, and run the application,
so as to automatically benchmark the bus and the codelets.

\verbatim
$ ./configure && make
$ STARPU_SCHED=dmda ./examples/matvecmult/matvecmult
[starpu][_starpu_load_history_based_model] Warning: model matvecmult
is not calibrated, forcing calibration for this run. Use the
STARPU_CALIBRATE environment variable to control this.
$ ...
$ STARPU_SCHED=dmda ./examples/matvecmult/matvecmult
TEST PASSED
\endverbatim

Note that we force the use of the <c>dmda</c> scheduler so that performance
models are generated for the application. The application may need to be
run several times before the model is calibrated.

\section Simulation Simulation

Then, recompile StarPU, passing \ref enable-simgrid "--enable-simgrid"
to <c>./configure</c>.

\verbatim
$ ./configure --enable-simgrid
\endverbatim

To specify the location of SimGrid, you can either set the environment
variables SIMGRID_CFLAGS and SIMGRID_LIBS, or use the configure
options \ref with-simgrid-dir "--with-simgrid-dir",
\ref with-simgrid-include-dir "--with-simgrid-include-dir" and
\ref with-simgrid-lib-dir "--with-simgrid-lib-dir", for example:

\verbatim
$ ./configure --with-simgrid-dir=/opt/local/simgrid
\endverbatim

You can then re-run the application.

\verbatim
$ make
$ STARPU_SCHED=dmda ./examples/matvecmult/matvecmult
TEST FAILED !!!
\endverbatim

It is normal for the test to fail: since the computations are not actually
performed (that is the whole point of SimGrid), the result is wrong.

If the performance model is not calibrated enough, the following error
message will be displayed:

\verbatim
$ STARPU_SCHED=dmda ./examples/matvecmult/matvecmult
[starpu][_starpu_load_history_based_model] Warning: model matvecmult
is not calibrated, forcing calibration for this run. Use the
STARPU_CALIBRATE environment variable to control this.
[starpu][_starpu_simgrid_execute_job][assert failure] Codelet
matvecmult does not have a perfmodel, or is not calibrated enough
\endverbatim

The number of devices can be chosen as usual with \ref STARPU_NCPU, \ref
STARPU_NCUDA, and \ref STARPU_NOPENCL, and the amount of GPU memory
with \ref STARPU_LIMIT_CUDA_MEM, \ref STARPU_LIMIT_CUDA_devid_MEM, \ref
STARPU_LIMIT_OPENCL_MEM, and \ref STARPU_LIMIT_OPENCL_devid_MEM.

\section SimulationOnAnotherMachine Simulation On Another Machine

The SimGrid support even makes it possible to perform simulations on another
machine, typically your desktop. To achieve this, one still needs to perform
the Calibration step on the actual machine to be simulated, then copy the
resulting performance models (the <c>$STARPU_HOME/.starpu</c> directory) to the
desktop machine. One can then perform the
Simulation step on the desktop machine, by setting the environment
variable \ref STARPU_HOSTNAME to the name of the actual machine, to
make StarPU use the performance models of the simulated machine even
on the desktop machine.

If the desktop machine does not have CUDA or OpenCL, StarPU is still able to
use SimGrid to simulate execution with CUDA/OpenCL devices, but the application
source code will probably disable the CUDA and OpenCL codelets in that
case. Since the functions of the codelet are not actually called by default
during SimGrid execution, one can use dummy functions such as the following to
still permit CUDA or OpenCL execution.
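
A minimal sketch of such a dummy function is given here. The name
<c>dummy_gpu_func</c> is hypothetical, and the sketch assumes the usual shape of
a StarPU codelet function (an array of buffers and a codelet argument). It would
be listed in the cuda_funcs or opencl_funcs field of the codelet in place of the
real kernel when the GPU toolkit is not available on the desktop machine.

```c
#include <stddef.h>

// Hypothetical placeholder for a CUDA or OpenCL codelet function.
// In SimGrid mode StarPU does not call the codelet functions by default,
// so the body is intentionally empty: the function only needs to exist so
// that the codelet advertises a GPU implementation.
static void dummy_gpu_func(void *buffers[], void *cl_arg)
{
	(void) buffers;
	(void) cl_arg;
}
```

Keeping the placeholder behind the same preprocessor test that disables the
real GPU kernels avoids using it by accident in a native build, where it would
silently compute nothing.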

\section SimulationExamples Simulation examples

StarPU ships a few performance models for a couple of systems: attila,
mirage, idgraf, and sirocco. See section \ref SimulatedBenchmarks for the details.

\section FakeSimulations Simulations on fake machines

It is possible to simulate machines which do not exist, by modifying the
platform file in <c>$STARPU_HOME/.starpu/sampling/bus/machine.platform.xml</c>
by hand: one can add more CPUs, add GPUs (but the performance model file has to
be extended as well), change the available GPU memory size, PCI memory bandwidth, etc.

\section Tweaking Tweaking simulation

The simulation can be tuned anywhere between a very accurate
simulation and a very simple one (which is thus close to scheduling
theory results); see the \ref STARPU_SIMGRID_CUDA_MALLOC_COST and \ref
STARPU_SIMGRID_CUDA_QUEUE_COST environment variables.

\section SimulationMPIApplications MPI applications

StarPU-MPI applications can also be run in SimGrid mode. The application needs
to be compiled with smpicc, and run using the starpu_smpirun script, for instance:

\verbatim
$ STARPU_SCHED=dmda starpu_smpirun -platform cluster.xml -hostfile hostfile ./mpi/tests/pingpong
\endverbatim

Here, cluster.xml is a SimGrid-MPI platform description, and hostfile is the
list of MPI nodes to be used. StarPU currently only supports homogeneous MPI
clusters: for each MPI node it will just replicate the architecture referred to by
\ref STARPU_HOSTNAME.

\section SimulationDebuggingApplications Debugging applications

By default, SimGrid uses its own implementation of threads, which prevents gdb
from inspecting the stacks of all threads. To be able to fully debug an
application running with SimGrid, pass the <c>--cfg=contexts/factory:thread</c>
option to the application, to make SimGrid use system threads, which gdb will be
able to manipulate as usual.

\snippet simgrid.c To be included. You should update doxygen if you see this text.

\section SimulationMemoryUsage Memory usage

Since kernels are not actually run and data transfers are not actually
performed, the data memory does not actually need to be allocated. This makes it
possible, for instance, to simulate the execution of applications processing
very big data on a small laptop.

The application can for instance pass <c>1</c> (or whatever bogus pointer)
to StarPU data registration functions, instead of allocating data. This will
however require the application to take care not to access the data,
and will not work in MPI mode, which performs transfers.

Another way is to pass the STARPU_MALLOC_SIMULATION_FOLDED flag to the
starpu_malloc_flags() function. This will make it allocate a memory area which
one can read and write, but optimized so that it does not actually consume
memory. Of course, the values read from such an area will be bogus, but this
allows the application to keep e.g. data load, store, and initialization as they
are, and it also works in MPI mode.

Note however that Linux kernels, notably, refuse obvious memory overcommit by
default, so a single allocation typically cannot be bigger than the amount of
physical memory; see https://www.kernel.org/doc/Documentation/vm/overcommit-accounting.
This prevents, for instance, allocating a single huge matrix. Allocating a
huge matrix in several tiles is not a problem, however. <c>sysctl
vm.overcommit_memory=1</c> can also be used to allow such overcommit.

Note also that this folding is done by remapping the same file several times,
and Linux kernels will refuse to create too many memory areas. <c>sysctl
vm.max_map_count</c> can be used to check and change the default (65530). By
default, StarPU uses a 1MiB file, so it hopefully fits in the CPU cache. This
however limits the amount of such folded memory to a bit below 64GiB. The \ref
STARPU_MALLOC_SIMULATION_FOLD environment variable can be used to increase the
size of the file.
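
The limit quoted above follows from a simple product, which the sketch below
spells out. <c>folded_memory_upper_bound</c> is a hypothetical helper written for
this illustration, not part of StarPU. It assumes that each mapping of the fold
file counts as one memory area, so the result is an upper bound: the process
also uses mappings for its code, stacks and libraries.

```c
#include <stdint.h>

// Upper bound, in bytes, on the amount of folded memory: at most
// max_map_count mappings can exist, each covering fold_bytes bytes.
// max_map_count is the value reported by "sysctl vm.max_map_count".
static uint64_t folded_memory_upper_bound(uint64_t max_map_count,
					  uint64_t fold_bytes)
{
	return max_map_count * fold_bytes;
}
```

With a limit of 65530 mappings and a 1MiB file, this gives 65530MiB, that is a
bit below 64GiB. Under the same assumption, a 4MiB file raises the bound to
262120MiB, at the price of a fold file that is less likely to fit in the CPU cache.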

*/