123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227 |
- /* StarPU --- Runtime system for heterogeneous multicore architectures.
- *
- * Copyright (C) 2011,2012,2014,2016,2017 Inria
- * Copyright (C) 2010-2019 CNRS
- * Copyright (C) 2009-2011,2014-2018 Université de Bordeaux
- * Copyright (C) 2009-2011,2014-2018 Université de Bordeaux,
- *
- * StarPU is free software; you can redistribute it and/or modify
- * it under the terms of the GNU Lesser General Public License as published by
- * the Free Software Foundation; either version 2.1 of the License, or (at
- * your option) any later version.
- *
- * StarPU is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
- *
- * See the GNU Lesser General Public License in COPYING.LGPL for more details.
- */
- /*! \page SimGridSupport SimGrid Support
- StarPU can use Simgrid in order to simulate execution on an arbitrary
- platform. This was tested with simgrid from 3.11 to 3.16, and 3.18 to 3.20.
- Other versions may have compatibility issues. 3.17 notably does not build at
- all.
- \section Preparing Preparing Your Application For Simulation
- There are a few technical details which need to be handled for an application to
- be simulated through Simgrid.
- If the application uses <c>gettimeofday</c> to make its
- performance measurements, the real time will be used, which will be bogus. To
- get the simulated time, it has to use starpu_timing_now() which returns the
- virtual timestamp in us.
- For some technical reason, the application's .c file which contains \c main() has
- to be recompiled with \c starpu_simgrid_wrap.h, which in the simgrid case will <c># define main()</c>
- into <c>starpu_main()</c>, and it is \c libstarpu which will provide the real \c main() and
- will call the application's \c main().
- To be able to test with crazy data sizes, one may want to only allocate
- application data if the macro \c STARPU_SIMGRID is not defined. Passing a <c>NULL</c> pointer to
- \c starpu_data_register functions is fine, data will never be read/written to by
- StarPU in Simgrid mode anyway.
- To be able to run the application with e.g. CUDA simulation on a system which
- does not have CUDA installed, one can fill the starpu_codelet::cuda_funcs with \c (void*)1, to
- express that there is a CUDA implementation, even if one does not actually
- provide it. StarPU will not actually run it in Simgrid mode anyway by default
- (unless the ::STARPU_CODELET_SIMGRID_EXECUTE or ::STARPU_CODELET_SIMGRID_EXECUTE_AND_INJECT
- flags are set in the codelet)
- \snippet simgrid.c To be included. You should update doxygen if you see this text.
- \section Calibration Calibration
- The idea is to first compile StarPU normally, and run the application,
- so as to automatically benchmark the bus and the codelets.
- \verbatim
- $ ./configure && make
- $ STARPU_SCHED=dmda ./examples/matvecmult/matvecmult
- [starpu][_starpu_load_history_based_model] Warning: model matvecmult
- is not calibrated, forcing calibration for this run. Use the
- STARPU_CALIBRATE environment variable to control this.
- $ ...
- $ STARPU_SCHED=dmda ./examples/matvecmult/matvecmult
- TEST PASSED
- \endverbatim
- Note that we force to use the scheduler <c>dmda</c> to generate
- performance models for the application. The application may need to be
- run several times before the model is calibrated.
- \section Simulation Simulation
- Then, recompile StarPU, passing \ref enable-simgrid "--enable-simgrid"
- to <c>configure</c>. Make sure to keep all other <c>configure</c> options
- the same, and notably options such as <c>--enable-maxcudadev</c>.
- \verbatim
- $ ./configure --enable-simgrid
- \endverbatim
- To specify the location of SimGrid, you can either set the environment
- variables \c SIMGRID_CFLAGS and \c SIMGRID_LIBS, or use the \c configure
- options \ref with-simgrid-dir "--with-simgrid-dir",
- \ref with-simgrid-include-dir "--with-simgrid-include-dir" and
- \ref with-simgrid-lib-dir "--with-simgrid-lib-dir", for example
- \verbatim
- $ ./configure --with-simgrid-dir=/opt/local/simgrid
- \endverbatim
- You can then re-run the application.
- \verbatim
- $ make
- $ STARPU_SCHED=dmda ./examples/matvecmult/matvecmult
- TEST FAILED !!!
- \endverbatim
- It is normal that the test fails: since the computation are not actually done
- (that is the whole point of SimGrid), the result is wrong, of course.
- If the performance model is not calibrated enough, the following error
- message will be displayed
- \verbatim
- $ STARPU_SCHED=dmda ./examples/matvecmult/matvecmult
- [starpu][_starpu_load_history_based_model] Warning: model matvecmult
- is not calibrated, forcing calibration for this run. Use the
- STARPU_CALIBRATE environment variable to control this.
- [starpu][_starpu_simgrid_execute_job][assert failure] Codelet
- matvecmult does not have a perfmodel, or is not calibrated enough
- \endverbatim
- The number of devices can be chosen as usual with \ref STARPU_NCPU,
- \ref STARPU_NCUDA, and \ref STARPU_NOPENCL, and the amount of GPU memory
- with \ref STARPU_LIMIT_CUDA_MEM, \ref STARPU_LIMIT_CUDA_devid_MEM,
- \ref STARPU_LIMIT_OPENCL_MEM, and \ref STARPU_LIMIT_OPENCL_devid_MEM.
- \section SimulationOnAnotherMachine Simulation On Another Machine
- The SimGrid support even permits to perform simulations on another machine, your
- desktop, typically. To achieve this, one still needs to perform the Calibration
- step on the actual machine to be simulated, then copy them to your desktop
- machine (the <c>$STARPU_HOME/.starpu</c> directory). One can then perform the
- Simulation step on the desktop machine, by setting the environment
- variable \ref STARPU_HOSTNAME to the name of the actual machine, to
- make StarPU use the performance models of the simulated machine even
- on the desktop machine.
- If the desktop machine does not have CUDA or OpenCL, StarPU is still able to
- use SimGrid to simulate execution with CUDA/OpenCL devices, but the application
- source code will probably disable the CUDA and OpenCL codelets in that
- case. Since during SimGrid execution, the functions of the codelet are actually
- not called by default, one can use dummy functions such as the following to
- still permit CUDA or OpenCL execution.
- \section SimulationExamples Simulation Examples
- StarPU ships a few performance models for a couple of systems: \c attila,
- \c mirage, \c idgraf, and \c sirocco. See Section \ref SimulatedBenchmarks for the details.
- \section FakeSimulations Simulations On Fake Machines
- It is possible to build fake machines which do not exist, by modifying the
- platform file in <c>$STARPU_HOME/.starpu/sampling/bus/machine.platform.xml</c>
- by hand: one can add more CPUs, add GPUs (but the performance model file has to
- be extended as well), change the available GPU memory size, PCI memory bandwidth, etc.
- \section TweakingSimulation Tweaking Simulation
- The simulation can be tweaked, to be able to tune it between a very accurate
- simulation and a very simple simulation (which is thus close to scheduling
- theory results), see the \ref STARPU_SIMGRID_CUDA_MALLOC_COST,
- \ref STARPU_SIMGRID_CUDA_QUEUE_COST, \ref STARPU_SIMGRID_TASK_SUBMIT_COST,
- \ref STARPU_SIMGRID_FETCHING_INPUT_COST and \ref STARPU_SIMGRID_SCHED_COST environment variables.
- \section SimulationMPIApplications MPI Applications
- StarPU-MPI applications can also be run in SimGrid mode. It needs to be compiled
- with \c smpicc, and run using the <c>starpu_smpirun</c> script, for instance:
- \verbatim
- $ STARPU_SCHED=dmda starpu_smpirun -platform cluster.xml -hostfile hostfile ./mpi/tests/pingpong
- \endverbatim
- Where \c cluster.xml is a SimGrid-MPI platform description, and \c hostfile the
- list of MPI nodes to be used. StarPU currently only supports homogeneous MPI
- clusters: for each MPI node it will just replicate the architecture referred by
- \ref STARPU_HOSTNAME.
- \section SimulationDebuggingApplications Debugging Applications
- By default, SimGrid uses its own implementation of threads, which prevents \c gdb
- from being able to inspect stacks of all threads. To be able to fully debug an
- application running with SimGrid, pass the <c>--cfg=contexts/factory:thread</c>
- option to the application, to make SimGrid use system threads, which \c gdb will be
- able to manipulate as usual.
- It is also worth noting SimGrid 3.21's new parameter
- <c>--cfg=simix/breakpoint</c> which allows to put a breakpoint at a precise
- (deterministic!) timing of the execution. If for instance in an execution
- trace we see that something odd is happening at time 19000ms, we can use
- <c>--cfg=simix/breakpoint:19.000</c> and \c SIGTRAP will be raised at that point,
- which will thus interrupt execution within \c gdb, allowing to inspect e.g.
- scheduler state, etc.
- \section SimulationMemoryUsage Memory Usage
- Since kernels are not actually run and data transfers are not actually
- performed, the data memory does not actually need to be allocated. This allows
- for instance to simulate the execution of applications processing very big data
- on a small laptop.
- The application can for instance pass <c>1</c> (or whatever bogus pointer)
- to starpu data registration functions, instead of allocating data. This will
- however require the application to take care of not trying to access the data,
- and will not work in MPI mode, which performs transfers.
- Another way is to pass the \ref STARPU_MALLOC_SIMULATION_FOLDED flag to the
- starpu_malloc_flags() function. This will make it allocate a memory area which
- one can read/write, but optimized so that this does not actually consume
- memory. Of course, the values read from such area will be bogus, but this allows
- the application to keep e.g. data load, store, initialization as it is, and also
- work in MPI mode.
- Note however that notably Linux kernels refuse obvious memory overcommitting by
- default, so a single allocation can typically not be bigger than the amount of
- physical memory, see https://www.kernel.org/doc/Documentation/vm/overcommit-accounting
- This prevents for instance from allocating a single huge matrix. Allocating a
- huge matrix in several tiles is not a problem, however. <c>sysctl
- vm.overcommit_memory=1</c> can also be used to allow such overcommit.
- Note however that this folding is done by remapping the same file several times,
- and Linux kernels will also refuse to create too many memory areas. <c>sysctl
- vm.max_map_count</c> can be used to check and change the default (65535). By
- default, StarPU uses a 1MiB file, so it hopefully fits in the CPU cache. This
- however limits the amount of such folded memory to a bit below 64GiB. The
- \ref STARPU_MALLOC_SIMULATION_FOLD environment variable can be used to increase the
- size of the file.
- */
|