
Mention DriverCopyAsync in the trace.

Samuel Thibault, 14 years ago
Parent
Commit
e9e66b6c78
2 files changed, 4 insertions and 1 deletion
  1. doc/starpu.texi (+3 -1)
  2. src/sched_policies/heft.c (+1 -0)

+ 3 - 1
doc/starpu.texi

@@ -1570,7 +1570,9 @@ When the application allocates data, whenever possible it should use the
 @code{starpu_malloc} function, which will ask CUDA or
 OpenCL to make the allocation itself and pin the corresponding allocated
 memory. This is needed to permit asynchronous data transfer, i.e. permit data
-transfer to overlap with computations.
+transfer to overlap with computations. Otherwise, the trace will show that the
+@code{DriverCopyAsync} state takes a lot of time, this is because CUDA or OpenCL
+then reverts to synchronous transfers.
 
 By default, StarPU leaves replicates of data wherever they were used, in case they
 will be re-used by other tasks, thus saving the data transfer time. When some

+ 1 - 0
src/sched_policies/heft.c

@@ -315,6 +315,7 @@ static int _heft_push_task(struct starpu_task *task, unsigned prio)
 
 	for (worker = 0; worker < nworkers; worker++)
 	{
+		/* FIXME: multiimpl! */
 		if (!starpu_worker_may_execute_task(worker, task, 0))
 		{
 			/* no one on that queue may execute this task */