@@ -2,7 +2,7 @@
*
* Copyright (C) 2011-2013,2015,2017 Inria
* Copyright (C) 2010-2018 CNRS
- * Copyright (C) 2009-2011,2013-2017 Université de Bordeaux
+ * Copyright (C) 2009-2011,2013-2018 Université de Bordeaux
*
* StarPU is free software; you can redistribute it and/or modify
* it under the terms of the GNU Lesser General Public License as published by
@@ -26,6 +26,26 @@ performance, we give below a list of features which should be checked.
For a start, you can use \ref OfflinePerformanceTools to get a Gantt chart which
will show roughly where time is spent, and focus correspondingly.
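+As a rough sketch of that workflow (see \ref OfflinePerformanceTools for the
+actual details), assuming StarPU was configured with FxT support
+(<c>--with-fxt</c>), running the application (a hypothetical
+<c>my_application</c> here) leaves a trace file under <c>/tmp</c>, which can
+then be converted and visualized, for instance with ViTE:
+
+\verbatim
+$ ./my_application
+$ starpu_fxt_tool -i /tmp/prof_file_<user>_<id>
+$ vite paje.trace
+\endverbatim
+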
+\section CheckTaskSize Check Task Size
+
+Make sure that your tasks are not too small, because the StarPU runtime overhead
+is not completely zero. You can run the <c>tasks_size_overhead.sh</c> script to
+get an idea, on your own system, of how well tasks scale depending on their
+duration (in µs).
+
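+For instance (the script location within the source tree is given here as an
+assumption and may vary between versions):
+
+\verbatim
+$ ./tests/microbenchs/tasks_size_overhead.sh
+\endverbatim
+
+This runs a synthetic benchmark over a range of task durations and numbers of
+cores, which gives a rough idea of the task duration needed for decent scaling
+on your machine.
+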
+Typically, 10µs-ish tasks are definitely too small: the CUDA overhead alone is
+much bigger than that.
+
+1ms-ish tasks may be a good start, but will not necessarily scale to many dozens
+of cores, so it's better to try to get 10ms-ish tasks.
+
+Task durations can easily be observed when performance models are defined (see
+\ref PerformanceModelExample) by using the <c>starpu_perfmodel_plot</c> or
+<c>starpu_perfmodel_display</c> tools (see \ref PerformanceOfCodelets).
+
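+As a minimal sketch of what this assumes on the codelet side (the kernel
+function <c>scal_cpu_func</c> and the symbol name used below are placeholders;
+see \ref PerformanceModelExample for a complete example), a history-based
+performance model can be attached to a codelet like this:
+
+\code{.c}
+#include <starpu.h>
+
+/* Placeholder CPU kernel; the actual computation is irrelevant here. */
+static void scal_cpu_func(void *buffers[], void *cl_arg)
+{
+	(void)buffers;
+	(void)cl_arg;
+	/* ... kernel code ... */
+}
+
+/* History-based model: StarPU records measured execution times under
+ * the given symbol name, indexed by the data size. */
+static struct starpu_perfmodel scal_model =
+{
+	.type = STARPU_HISTORY_BASED,
+	.symbol = "vector_scal"
+};
+
+static struct starpu_codelet scal_cl =
+{
+	.cpu_funcs = { scal_cpu_func },
+	.nbuffers = 1,
+	.modes = { STARPU_RW },
+	.model = &scal_model
+};
+\endcode
+
+Once the application has run and thus calibrated the model, the recorded
+durations can be displayed with e.g. <c>starpu_perfmodel_display -s vector_scal</c>.
+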
+When using parallel tasks, the problem is even worse since StarPU has to
+synchronize the execution of tasks.
+
\section ConfigurationImprovePerformance Configuration Which May Improve Performance
The \ref enable-fast "--enable-fast" configuration option disables all