@c -*-texinfo-*-
@c This file is part of the StarPU Handbook.
@c Copyright (C) 2012 University of Bordeaux
@c See the file starpu.texi for copying conditions.

@menu
* Task size overhead:: Overhead of tasks depending on their size
* Data transfer latency:: Latency of data transfers
* Gemm:: Matrix-matrix multiplication
* Cholesky:: Cholesky factorization
* LU:: LU factorization
@end menu

Some interesting benchmarks are installed among the examples in
@file{/usr/lib/starpu/examples}. Make sure to try various schedulers, for
instance by setting @code{STARPU_SCHED=dmda}.
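
As a minimal sketch, the loop below runs one benchmark under two schedulers in
turn so that their outputs can be compared. It assumes that the @code{sgemm}
binary sits directly in @file{/usr/lib/starpu/examples}; adjust the
illustrative variables @code{EXAMPLES_DIR} and @code{BENCH} to match the actual
installation. @code{eager} is used here only as a second scheduler to compare
against @code{dmda}.

@example
# Assumed location and benchmark name; adjust to the local installation.
EXAMPLES_DIR=/usr/lib/starpu/examples
BENCH=sgemm

for sched in eager dmda
do
    echo "== STARPU_SCHED=$sched =="
    if [ -x "$EXAMPLES_DIR/$BENCH" ]
    then
        # The variable is set only for this one invocation.
        STARPU_SCHED=$sched "$EXAMPLES_DIR/$BENCH"
    else
        echo "$BENCH not found in $EXAMPLES_DIR" >&2
    fi
done
@end example

Setting the variable on the command line, rather than exporting it, keeps each
run independent of the shell's environment.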

@node Task size overhead
@section Task size overhead

This benchmark gives a glimpse into how large a task should be for the StarPU
overhead to be low enough. Run @code{tasks_size_overhead.sh}; it generates a
plot of the speedup of tasks of various sizes, depending on the number of CPUs
being used.

@node Data transfer latency
@section Data transfer latency

@code{local_pingpong} performs a ping-pong between the first two CUDA nodes and
prints the measured latency.

@node Gemm
@section Matrix-matrix multiplication

@code{sgemm} and @code{dgemm} perform a blocked matrix-matrix multiplication
(in single and double precision, respectively) using BLAS and cuBLAS. They
print the performance obtained, in GFlop/s.
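
For reference, a GFlop/s figure for a matrix product is conventionally derived
from the @code{2*m*n*k} floating-point operations needed to multiply an m-by-k
matrix by a k-by-n matrix. The sketch below applies that convention. The matrix
size and elapsed time are made-up placeholder values, not measurements, and the
benchmarks themselves may count operations differently.

@example
# Placeholder inputs for illustration only, not measured values.
m=4096
n=4096
k=4096
elapsed_ms=500

# 2*m*n*k operations; dividing by (milliseconds * 10^6) yields GFlop/s.
flops=$((2 * m * n * k))
gflops=$((flops / (elapsed_ms * 1000000)))
echo "about $gflops GFlop/s (integer arithmetic, rounded down)"
@end example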

@node Cholesky
@section Cholesky factorization

The @code{cholesky*} programs perform a Cholesky factorization (single
precision). They differ in the dependency primitives they use.

@node LU
@section LU factorization

The @code{lu*} programs perform an LU factorization. They differ in the
dependency primitives they use.