Mention DriverCopyAsync in the trace.

Samuel Thibault 14 years ago
parent
commit
e9e66b6c78
2 changed files with 4 additions and 1 deletion
  1. doc/starpu.texi (+3 -1)
  2. src/sched_policies/heft.c (+1 -0)

+ 3 - 1
doc/starpu.texi

@@ -1570,7 +1570,9 @@ When the application allocates data, whenever possible it should use the
 @code{starpu_malloc} function, which will ask CUDA or
 OpenCL to make the allocation itself and pin the corresponding allocated
 memory. This is needed to permit asynchronous data transfer, i.e. permit data
-transfer to overlap with computations.
+transfer to overlap with computations. Otherwise, the trace will show that the
+@code{DriverCopyAsync} state takes a lot of time, this is because CUDA or OpenCL
+then reverts to synchronous transfers.
 
 By default, StarPU leaves replicates of data wherever they were used, in case they
 will be re-used by other tasks, thus saving the data transfer time. When some
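
The allocation advice in the doc/starpu.texi hunk above can be illustrated with a minimal sketch. It assumes `starpu_init()` has already succeeded and that `starpu_malloc` and `starpu_free` have the signatures `int starpu_malloc(void **, size_t)` and `int starpu_free(void *)`; check these against the StarPU version in use. The `USE_STARPU` macro and the helper names `alloc_vector` and `free_vector` are hypothetical and exist only so that the sketch also builds without StarPU, in which case it falls back to plain `malloc`, i.e. the unpinned case where the trace is expected to show long `DriverCopyAsync` states.

```c
#include <stdlib.h>
#include <string.h>

#ifdef USE_STARPU /* hypothetical build flag, not defined by StarPU */
#include <starpu.h>
#endif

/* Allocate a zeroed vector of n floats that will later be handed to StarPU.
 * With USE_STARPU, the buffer comes from starpu_malloc, which asks CUDA or
 * OpenCL to allocate and pin it so that transfers can be asynchronous.
 * Without it, plain malloc is used and the memory is not pinned. */
static float *alloc_vector(size_t n)
{
	float *v = NULL;
#ifdef USE_STARPU
	/* Assumption: starpu_init() was called earlier and returned 0. */
	if (starpu_malloc((void **)&v, n * sizeof(*v)) != 0)
		return NULL;
#else
	v = malloc(n * sizeof(*v));
#endif
	if (v != NULL)
		memset(v, 0, n * sizeof(*v));
	return v;
}

/* Release a vector with the deallocator that matches alloc_vector. */
static void free_vector(float *v)
{
#ifdef USE_STARPU
	starpu_free(v);
#else
	free(v);
#endif
}
```

Keeping allocation and release behind one pair of helpers makes it harder to free a pinned buffer with `free` by mistake.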

+ 1 - 0
src/sched_policies/heft.c

@@ -315,6 +315,7 @@ static int _heft_push_task(struct starpu_task *task, unsigned prio)
 
 	for (worker = 0; worker < nworkers; worker++)
 	{
+		/* FIXME: multiimpl! */
 		if (!starpu_worker_may_execute_task(worker, task, 0))
 		{
 			/* no one on that queue may execute this task */