mention cudaMemcpyAsync for asynchronous operations


Samuel Thibault, 6 years ago

Commit 7944ad1433
1 changed file with 5 additions and 2 deletions
  1. doc/doxygen/chapters/210_check_list_performance.doxy (+5 -2)
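
The rule this change documents (every CUDA operation must use the variant that takes a stream parameter) can be illustrated with a minimal standalone sketch in plain CUDA, so that it runs outside StarPU. The names `scale_kernel` and `scale_on_stream` are hypothetical, and the `stream` argument stands in for what `starpu_cuda_get_local_stream()` would return inside a StarPU CUDA codelet. Both copies and the kernel launch are enqueued on that one stream, and a single `cudaStreamSynchronize()` waits for them, mirroring the synchronization in the documented snippet.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

/* Hypothetical kernel: multiply every element of data by factor. */
__global__ void scale_kernel(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

/* Hypothetical helper: copy h_data to the GPU, scale it, copy it back.
 * Every CUDA operation is issued on `stream` (in StarPU, this would be
 * starpu_cuda_get_local_stream()). h_data should be pinned memory
 * (cudaMallocHost) for the copies to be truly asynchronous.
 * Returns 0 on success, -1 on a CUDA error. */
int scale_on_stream(float *h_data, int n, float factor, cudaStream_t stream)
{
    float *d_data = NULL;
    size_t bytes = (size_t)n * sizeof(float);
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    cudaError_t err;

    if (n <= 0)
        return 0; /* nothing to do; a 0-block launch would be invalid */
    if (cudaMalloc((void **)&d_data, bytes) != cudaSuccess)
        return -1;

    /* Stream variant of cudaMemcpy(): queued on `stream`, not the default stream. */
    err = cudaMemcpyAsync(d_data, h_data, bytes, cudaMemcpyHostToDevice, stream);

    if (err == cudaSuccess) {
        /* Kernel launch on the same stream (fourth launch parameter). */
        scale_kernel<<<blocks, threads, 0, stream>>>(d_data, n, factor);
        err = cudaGetLastError(); /* catches launch configuration errors */
    }
    if (err == cudaSuccess)
        err = cudaMemcpyAsync(h_data, d_data, bytes, cudaMemcpyDeviceToHost, stream);
    if (err == cudaSuccess)
        /* Wait only for the work queued on this stream. */
        err = cudaStreamSynchronize(stream);

    cudaFree(d_data);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return -1;
    }
    return 0;
}
```

Inside an actual StarPU codelet, StarPU already transfers the registered data itself, so explicit `cudaMemcpyAsync()` calls like these would only concern extra transfers that the kernel code performs on its own; the point is that none of them should fall back to the synchronous, default-stream variants.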

@@ -2,7 +2,7 @@
  *
  * Copyright (C) 2011-2013,2015,2017                      Inria
  * Copyright (C) 2010-2019                                CNRS
- * Copyright (C) 2009-2011,2013-2018                      Université de Bordeaux
+ * Copyright (C) 2009-2011,2013-2019                      Université de Bordeaux
  *
  * StarPU is free software; you can redistribute it and/or modify
  * it under the terms of the GNU Lesser General Public License as published by
@@ -96,8 +96,11 @@ func <<<grid,block,0,starpu_cuda_get_local_stream()>>> (foo, bar);
 cudaStreamSynchronize(starpu_cuda_get_local_stream());
 \endcode
 
+as well as the use of cudaMemcpyAsync() and similar calls: for each CUDA operation,
+one needs to use the variant that takes a stream parameter.
+
 Unfortunately, some CUDA libraries do not have stream variants of
-kernels. This will lower the potential for overlapping.
+kernels. This will seriously lower the potential for overlapping.
 
 Calling starpu_cublas_init() makes StarPU already do appropriate calls for the
 CUBLAS library. Some libraries like Magma may however change the current stream of CUBLAS v1,