@c -*-texinfo-*-
@c This file is part of the StarPU Handbook.
@c Copyright (C) 2012  University of Bordeaux
@c See the file starpu.texi for copying conditions.

@menu
* Task size overhead::           Overhead of tasks depending on their size
* Data transfer latency::        Latency of data transfers
* Gemm::                         Matrix-matrix multiplication
* Cholesky::                     Cholesky factorization
* LU::                           LU factorization
@end menu

Some interesting benchmarks are installed among the examples in
@code{/usr/lib/starpu/examples}. Make sure to try various schedulers, for
instance @code{STARPU_SCHED=heft}.

@node Task size overhead
@section Task size overhead

This benchmark gives a glimpse into how big a task size should be for the
StarPU overhead to be low enough. Run @code{tasks_size_overhead.sh}; it
generates a plot of the speedup of tasks of various sizes, depending on the
number of CPUs being used.

@node Data transfer latency
@section Data transfer latency

@code{local_pingpong} performs a ping-pong between the first two CUDA nodes and
prints the measured latency.

@node Gemm
@section Matrix-matrix multiplication

@code{sgemm} and @code{dgemm} perform a blocked matrix-matrix multiplication
using BLAS and cuBLAS. They output the obtained GFlops.

@node Cholesky
@section Cholesky factorization

@code{cholesky*} perform a Cholesky factorization (single precision). They use
different dependency primitives.

@node LU
@section LU factorization

@code{lu*} perform an LU factorization. They use different dependency
primitives.
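Each of the benchmarks above can be run under a chosen scheduler by setting
the @code{STARPU_SCHED} environment variable on the command line. As a sketch
(the exact binary names and installation path may differ on your system):

@example
$ STARPU_SCHED=heft /usr/lib/starpu/examples/tasks_size_overhead.sh
@end example

Repeating a run with different @code{STARPU_SCHED} values makes it easy to
compare how each scheduling policy affects the measured performance.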