/*
 * This file is part of the StarPU Handbook.
 * Copyright (C) 2009--2011  Université de Bordeaux
 * Copyright (C) 2010, 2011, 2012, 2013, 2014, 2016  CNRS
 * Copyright (C) 2011, 2012, 2016 INRIA
 * See the file version.doxy for copying conditions.
 */

/*! \page SimGridSupport SimGrid Support

StarPU can use SimGrid to simulate execution on an arbitrary
platform. This was tested with SimGrid versions 3.11 to 3.15;
other versions may have compatibility issues.

\section Preparing Preparing Your Application For Simulation

A few technical details need to be handled for an application to
be simulated through SimGrid.

If the application uses <c>gettimeofday</c> for its
performance measurements, it will read the real time, which is meaningless
in simulation. To get the simulated time, it has to use starpu_timing_now(),
which returns the virtual timestamp in microseconds.

For technical reasons, the application's .c file which contains main() has
to be recompiled with starpu_simgrid_wrap.h, which in the SimGrid case will # define main()
into starpu_main(); libstarpu then provides the real main() and
calls the application's main().

To be able to test with very large data sizes, one may want to only allocate
application data if STARPU_SIMGRID is not defined. Passing a <c>NULL</c> pointer to
starpu_data_register functions is fine, since the data will never be read or written by
StarPU in SimGrid mode anyway.

To be able to run the application with e.g.
CUDA simulation on a system which
does not have CUDA installed, one can fill the cuda_funcs with (void*)1, to
express that there is a CUDA implementation, even if one does not actually
provide it. StarPU will not run it in SimGrid mode by default anyway
(unless the ::STARPU_CODELET_SIMGRID_EXECUTE or ::STARPU_CODELET_SIMGRID_EXECUTE_AND_INJECT
flags are set in the codelet).

\snippet simgrid.c To be included. You should update doxygen if you see this text.

\section Calibration Calibration

The idea is to first compile StarPU normally, and run the application,
so as to automatically benchmark the bus and the codelets.

\verbatim
$ ./configure && make
$ STARPU_SCHED=dmda ./examples/matvecmult/matvecmult
[starpu][_starpu_load_history_based_model] Warning: model matvecmult
   is not calibrated, forcing calibration for this run. Use the
   STARPU_CALIBRATE environment variable to control this.
$ ...
$ STARPU_SCHED=dmda ./examples/matvecmult/matvecmult
TEST PASSED
\endverbatim

Note that we force the use of the <c>dmda</c> scheduler to generate
performance models for the application. The application may need to be
run several times before the model is calibrated.

\section Simulation Simulation

Then, recompile StarPU, passing \ref enable-simgrid "--enable-simgrid"
to <c>./configure</c>.
Make sure to keep all other <c>./configure</c> options
the same, notably options such as <c>--enable-maxcudadev</c>.

\verbatim
$ ./configure --enable-simgrid
\endverbatim

To specify the location of SimGrid, you can either set the environment
variables SIMGRID_CFLAGS and SIMGRID_LIBS, or use the configure
options \ref with-simgrid-dir "--with-simgrid-dir",
\ref with-simgrid-include-dir "--with-simgrid-include-dir" and
\ref with-simgrid-lib-dir "--with-simgrid-lib-dir", for example

\verbatim
$ ./configure --with-simgrid-dir=/opt/local/simgrid
\endverbatim

You can then re-run the application.

\verbatim
$ make
$ STARPU_SCHED=dmda ./examples/matvecmult/matvecmult
TEST FAILED !!!
\endverbatim

It is normal that the test fails: since the computations are not actually performed
(that is the whole point of SimGrid), the result is necessarily wrong.

If the performance model is not calibrated enough, the following error
message will be displayed

\verbatim
$ STARPU_SCHED=dmda ./examples/matvecmult/matvecmult
[starpu][_starpu_load_history_based_model] Warning: model matvecmult
    is not calibrated, forcing calibration for this run. Use the
    STARPU_CALIBRATE environment variable to control this.
[starpu][_starpu_simgrid_execute_job][assert failure] Codelet
    matvecmult does not have a perfmodel, or is not calibrated enough
\endverbatim

The number of devices can be chosen as usual with \ref STARPU_NCPU,
\ref STARPU_NCUDA, and \ref STARPU_NOPENCL, and the amount of GPU memory
with \ref STARPU_LIMIT_CUDA_MEM, \ref STARPU_LIMIT_CUDA_devid_MEM,
\ref STARPU_LIMIT_OPENCL_MEM, and \ref STARPU_LIMIT_OPENCL_devid_MEM.

\section SimulationOnAnotherMachine Simulation On Another Machine

SimGrid support even makes it possible to perform simulations on another
machine, typically your desktop. To achieve this, one still needs to perform
the Calibration step on the actual machine to be simulated, then copy the
resulting calibration data to the desktop machine
(the <c>$STARPU_HOME/.starpu</c> directory).
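
The copy step can be sketched as a small shell helper. This is only an
illustration under stated assumptions: the function name
<c>fetch_starpu_models</c> and the host name <c>calibrated-host</c> are
hypothetical, the calibrated machine is assumed to be reachable with
<c>scp</c>, and StarPU is assumed to have stored its files under the remote
home directory (that is, STARPU_HOME was left unset during calibration).

```shell
# Hypothetical helper: copy the calibration data of a remote machine
# into the local StarPU directory.
# Assumption: the remote data lives in ~/.starpu (STARPU_HOME unset there).
fetch_starpu_models() {
    remote_host=$1
    # StarPU falls back to $HOME when STARPU_HOME is not set.
    dest=${STARPU_HOME:-$HOME}
    mkdir -p "$dest"
    # Copies the remote .starpu directory to $dest/.starpu,
    # overwriting files that already exist there.
    scp -r "$remote_host:.starpu" "$dest/"
}
```

On the desktop machine, one would call it as
<c>fetch_starpu_models calibrated-host</c>, replacing the hypothetical host
name with that of the calibrated machine.
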
One can then perform the
Simulation step on the desktop machine, by setting the environment
variable \ref STARPU_HOSTNAME to the name of the actual machine, to
make StarPU use the performance models of the simulated machine even
on the desktop machine.

If the desktop machine does not have CUDA or OpenCL, StarPU is still able to
use SimGrid to simulate execution with CUDA/OpenCL devices, but the application
source code will probably disable the CUDA and OpenCL codelets in that
case. Since the functions of the codelet are not actually called by default
during SimGrid execution, one can use dummy functions, such as the
<c>(void*)1</c> placeholder described in \ref Preparing, to
still permit CUDA or OpenCL execution.

\section SimulationExamples Simulation Examples

StarPU ships performance models for a few systems: attila,
mirage, idgraf, and sirocco. See section \ref SimulatedBenchmarks for the details.

\section FakeSimulations Simulations On Fake Machines

It is possible to simulate machines which do not exist, by modifying the
platform file in <c>$STARPU_HOME/.starpu/sampling/bus/machine.platform.xml</c>
by hand: one can add more CPUs, add GPUs (but the performance model file has to
be extended as well), change the available GPU memory size, PCI memory bandwidth, etc.

\section TweakingSimulation Tweaking Simulation

The simulation can be tuned anywhere between a very accurate
simulation and a very simple simulation (which is thus close to scheduling
theory results), see the \ref STARPU_SIMGRID_CUDA_MALLOC_COST,
\ref STARPU_SIMGRID_CUDA_QUEUE_COST, \ref STARPU_SIMGRID_TASK_SUBMIT_COST,
\ref STARPU_SIMGRID_FETCHING_INPUT_COST and STARPU_SIMGRID_SCHED_COST environment variables.

\section SimulationMPIApplications MPI Applications

StarPU-MPI applications can also be run in SimGrid mode.
Such an application needs to be compiled
with smpicc, and run using the <c>starpu_smpirun</c> script, for instance:

\verbatim
$ STARPU_SCHED=dmda starpu_smpirun -platform cluster.xml -hostfile hostfile ./mpi/tests/pingpong
\endverbatim

Here cluster.xml is a SimGrid-MPI platform description, and hostfile is the
list of MPI nodes to be used. StarPU currently only supports homogeneous MPI
clusters: for each MPI node it will just replicate the architecture referred to by
\ref STARPU_HOSTNAME.

\section SimulationDebuggingApplications Debugging Applications

By default, SimGrid uses its own implementation of threads, which prevents gdb
from being able to inspect the stacks of all threads. To be able to fully debug an
application running with SimGrid, pass the <c>--cfg=contexts/factory:thread</c>
option to the application, to make SimGrid use system threads, which gdb will be
able to manipulate as usual.

\section SimulationMemoryUsage Memory Usage

Since kernels are not actually run and data transfers are not actually
performed, the data memory does not actually need to be allocated. This makes
it possible, for instance, to simulate the execution of applications processing
very big data on a small laptop.

The application can for instance pass <c>1</c> (or whatever bogus pointer)
to StarPU data registration functions, instead of allocating data. This will
however require the application to take care not to access the data,
and will not work in MPI mode, which performs transfers.

Another way is to pass the \ref STARPU_MALLOC_SIMULATION_FOLDED flag to the
starpu_malloc_flags() function. This will make it allocate a memory area which
one can read/write, but optimized so that this does not actually consume
memory. Of course, the values read from such an area will be bogus, but this allows
the application to keep e.g.
data load, store, initialization as it is, and also
work in MPI mode.

Note however that Linux kernels, notably, refuse obvious memory overcommitting by
default, so a single allocation can typically not be bigger than the amount of
physical memory, see https://www.kernel.org/doc/Documentation/vm/overcommit-accounting
This prevents, for instance, allocating a single huge matrix. Allocating a
huge matrix in several tiles is not a problem, however. <c>sysctl
vm.overcommit_memory=1</c> can also be used to allow such overcommit.

Note also that this folding is done by remapping the same file several times,
and Linux kernels will also refuse to create too many memory areas. <c>sysctl
vm.max_map_count</c> can be used to check and change the default (65530). By
default, StarPU uses a 1MiB file, so it hopefully fits in the CPU cache. This
however limits the amount of such folded memory to a bit below 64GiB. The
\ref STARPU_MALLOC_SIMULATION_FOLD environment variable can be used to increase the
size of the file.

*/