@@ -1159,8 +1159,11 @@ same. This is very true for regular kernels on GPUs for instance (<0.1% error),
and just a bit less true on CPUs (~=1% error). This also assumes that there are
few different sets of data input/output sizes. StarPU will then keep record of
the average time of previous executions on the various processing units, and use
-it as an estimation. It will also save it in @code{~/.starpu/sampling/codelets}
-for further executions. The following is a small code example.
+it as an estimation. History is kept per task size, using a hash of the input
+and output sizes as an index.
+It will also save it in @code{~/.starpu/sampling/codelets}
+for further executions, and can be observed by using the
+@code{starpu_perfmodel_display} command. The following is a small code example.
@cartouche
@smallexample