@@ -341,4 +341,29 @@ multiplication using BLAS and cuBLAS. They output the obtained GFlops.
<c>lu_*</c> perform an LU factorization. They use different dependency primitives.
+\subsection SimulatedBenchmarks Simulated benchmarks
+
+Simulated benchmarks are convenient if you want to experiment with CPU-GPU
+scheduling without having a GPU at hand. To do so, use the SimGrid version of
+StarPU: first install the SimGrid simulator from
+http://simgrid.gforge.inria.fr/ , then configure StarPU with \ref enable-simgrid "--enable-simgrid",
+rebuild and install it. You can then simulate the performance of a
+couple of virtualized systems shipped with StarPU: attila and mirage.
+
+For instance:
+
+\verbatim
+$ export STARPU_PERF_MODEL_DIR=$STARPU_PATH/share/starpu/perfmodels/sampling
+$ export STARPU_HOSTNAME=attila
+$ $STARPU_PATH/lib/starpu/examples/cholesky_implicit
+\endverbatim
+
+This shows the simulated performance of the Cholesky factorization on the
+attila system. It is interesting to try different matrix sizes and
+schedulers.
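+
+As a sketch of such an experiment, the run above can be repeated with several
+schedulers and matrix sizes. This assumes that the <c>STARPU_SCHED</c>
+environment variable selects the scheduling policy and that
+<c>cholesky_implicit</c> accepts the <c>-size</c> and <c>-nblocks</c> options;
+the policy names and sizes below are only illustrative, chosen so that the
+block size (size / nblocks) is 960, one of the sizes with a performance model.
+
+\verbatim
+$ for sched in eager dmda ; do
+>   for nblocks in 10 20 ; do
+>     STARPU_SCHED=$sched $STARPU_PATH/lib/starpu/examples/cholesky_implicit -size $((960 * nblocks)) -nblocks $nblocks
+>   done
+> done
+\endverbatim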
+
+Performance models are available for <c>cholesky_*</c>, <c>lu_*</c>, and
+<c>*gemm</c> with block sizes 320, 640, or 960, and for <c>stencil</c> with
+block sizes 128x128x128, 192x192x192, and 256x256x256.
+
*/