@@ -1195,10 +1195,13 @@ type). This still assumes performance regularity, but can work with various data
input sizes, by applying a*n^b+c regression over observed execution times.
@end itemize

+How to use schedulers which can benefit from such performance models is
+explained in section @ref{Task scheduling policy}.
+
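+For instance, a codelet can be associated with such a regression-based model
+along the following lines (see section @ref{Performance model example} for a
+complete example):
+
+@cartouche
+@smallexample
+static struct starpu_perfmodel vector_scal_model = @{
+    /* a*n^b+c regression over observed execution times */
+    .type = STARPU_NL_REGRESSION_BASED,
+    .symbol = "vector_scal"
+@};
+
+static struct starpu_codelet cl = @{
+    /* ... kernel functions and buffer description ... */
+    .model = &vector_scal_model
+@};
+@end smallexample
+@end cartouche
+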
The same can be done for task power consumption estimation, by setting the
@code{power_model} field the same way as the @code{model} field. Note: for
now, the application has to give to the power consumption performance model
-a different name.
+a name which is different from the one used for the execution time
+performance model.
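+
+For instance, the codelet above can additionally be given a power consumption
+model as follows, @code{vector_scal_power} being an arbitrary symbol name
+distinct from the execution time model's:
+
+@cartouche
+@smallexample
+static struct starpu_perfmodel vector_scal_power_model = @{
+    .type = STARPU_NL_REGRESSION_BASED,
+    /* must not be the symbol used by the execution time model */
+    .symbol = "vector_scal_power"
+@};
+
+static struct starpu_codelet cl = @{
+    /* ... */
+    .model = &vector_scal_model,
+    .power_model = &vector_scal_power_model
+@};
+@end smallexample
+@end cartouche
+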
@node Theoretical lower bound on execution time
@section Theoretical lower bound on execution time
@@ -1441,11 +1444,14 @@ priority information to StarPU.

By default, StarPU uses the @code{eager} simple greedy scheduler. This is
because it provides correct load balance even if the application codelets do not
-have performance models. If your application codelets have performance models,
+have performance models. If your application codelets have performance models
+(see section @ref{Performance model example} for examples of how to define one),
you should change the scheduler thanks to the @code{STARPU_SCHED} environment
variable. For instance @code{export STARPU_SCHED=dmda} . Use @code{help} to get
the list of available schedulers.

+@c TODO: give some details about each scheduler.
+
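+Note that the scheduling policy can also be selected from the application
+itself, by passing its name in the @code{struct starpu_conf} given to
+@code{starpu_init}, along these lines:
+
+@cartouche
+@smallexample
+struct starpu_conf conf;
+starpu_conf_init(&conf);
+/* same effect as setting STARPU_SCHED=dmda in the environment */
+conf.sched_policy_name = "dmda";
+starpu_init(&conf);
+@end smallexample
+@end cartouche
+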
Most schedulers are based on an estimation of codelet duration on each kind
of processing unit. For this to be possible, the application programmer needs
to configure a performance model for the codelets of the application (see
@@ -1502,15 +1508,16 @@ The power actually consumed by the total execution can be displayed by setting
@node Profiling
@section Profiling

-Profiling can be enabled by using @code{export STARPU_PROFILING=1} or by
+A quick view of how many tasks each worker has executed can be obtained by
+setting @code{export STARPU_WORKER_STATS=1}. This is a convenient way to check
+that execution did happen on accelerators without penalizing performance with
+the profiling overhead.
+
+More detailed profiling information can be enabled by using @code{export STARPU_PROFILING=1} or by
calling @code{starpu_profiling_status_set} from the source code.
Statistics on the execution can then be obtained by using @code{export
-STARPU_BUS_STATS=1} and @code{export STARPU_WORKER_STATS=1} . Workers
-stats will include an approximation of the number of executed tasks even if
-@code{STARPU_PROFILING} is not set. This is a convenient way to check that
-execution did happen on accelerators without penalizing performance with
-the profiling overhead. More details on performance feedback are provided by the
-next chapter.
+STARPU_BUS_STATS=1} and @code{export STARPU_WORKER_STATS=1}.
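+
+Profiling can for instance be enabled from the application itself, before
+tasks get submitted:
+
+@cartouche
+@smallexample
+/* same effect as setting STARPU_PROFILING=1 in the environment */
+starpu_profiling_status_set(STARPU_PROFILING_ENABLE);
+@end smallexample
+@end cartouche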
+
+More details on performance feedback are provided by the next chapter.

@node CUDA-specific optimizations
@section CUDA-specific optimizations