
Note about cudaMemset/cudaMemcpy being actually async

Samuel Thibault committed 6 years ago
Commit 243bdf9cea
1 file changed, with 6 additions and 0 deletions

doc/doxygen/chapters/210_check_list_performance.doxy (+6 -0)

@@ -101,6 +101,12 @@ to use a version that takes a stream parameter.
 
 Unfortunately, some CUDA libraries do not have stream variants of
 kernels. This will seriously lower the potential for overlapping.
+If some CUDA calls are made without specifying this local stream, they must be
+explicitly synchronized with \c cudaThreadSynchronize() (deprecated in favor of
+\c cudaDeviceSynchronize() in recent CUDA versions) before and after them, so that
+they are properly ordered with the calls using the local stream. Notably, \c cudaMemcpy()
+and \c cudaMemset() are actually asynchronous and need such explicit synchronization!
+Use \c cudaMemcpyAsync() and \c cudaMemsetAsync() on the local stream instead.
 
 Calling starpu_cublas_init() already makes StarPU perform the appropriate calls for the
 CUBLAS library. Some libraries like Magma may however change the current stream of CUBLAS v1,
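The note added by this commit can be illustrated with a minimal sketch, assuming StarPU 1.2 or later built with CUDA support and a .cu file compiled with nvcc. The first codelet keeps every CUDA call on the stream returned by starpu_cuda_get_local_stream(), using cudaMemcpyAsync() instead of cudaMemcpy(), so that the STARPU_CUDA_ASYNC flag can let StarPU synchronize that stream itself. The second codelet shows the fallback for a call that takes no stream parameter: the hypothetical legacy_scale() stands in for such a library routine by launching on the default stream, and it is bracketed with cudaDeviceSynchronize(). The names copy_scale_cuda, legacy_scale, scale_legacy_cuda and init_codelets are illustrative only, not StarPU API.

```cuda
/* Sketch only: assumes StarPU >= 1.2 built with CUDA; compile as a .cu file.
 * copy_scale_cuda, legacy_scale, scale_legacy_cuda and init_codelets are
 * hypothetical names used for illustration, not part of StarPU. */
#include <starpu.h>
#include <cuda_runtime.h>

/* y[i] *= a for every element */
__global__ void scale_kernel(float *y, unsigned n, float a)
{
    unsigned i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] *= a;
}

/* Preferred form: y = a * x, with every CUDA call issued on the local stream. */
static void copy_scale_cuda(void *buffers[], void *cl_arg)
{
    float a;
    starpu_codelet_unpack_args(cl_arg, &a);
    const float *x = (const float *)STARPU_VECTOR_GET_PTR(buffers[0]);
    float *y = (float *)STARPU_VECTOR_GET_PTR(buffers[1]);
    unsigned n = STARPU_VECTOR_GET_NX(buffers[0]);
    if (n == 0)
        return; /* a launch with zero blocks would be an invalid configuration */
    cudaStream_t stream = starpu_cuda_get_local_stream();

    /* cudaMemcpyAsync() on the local stream rather than cudaMemcpy(), which
     * would be issued outside that stream and may return before completion. */
    cudaError_t err = cudaMemcpyAsync(y, x, n * sizeof(float),
                                      cudaMemcpyDeviceToDevice, stream);
    if (err != cudaSuccess)
        STARPU_CUDA_REPORT_ERROR(err);

    const unsigned threads = 256;
    scale_kernel<<<(n + threads - 1) / threads, threads, 0, stream>>>(y, n, a);
    err = cudaGetLastError();
    if (err != cudaSuccess)
        STARPU_CUDA_REPORT_ERROR(err);
    /* No cudaStreamSynchronize() here: the STARPU_CUDA_ASYNC flag set in
     * init_codelets() lets StarPU synchronize the local stream itself. */
}

/* Hypothetical stand-in for a library routine without a stream parameter:
 * it launches on the default stream. */
static void legacy_scale(float *y, unsigned n, float a)
{
    const unsigned threads = 256;
    scale_kernel<<<(n + threads - 1) / threads, threads>>>(y, n, a);
}

/* Fallback form: y *= a through a call that cannot use the local stream. */
static void scale_legacy_cuda(void *buffers[], void *cl_arg)
{
    float a;
    starpu_codelet_unpack_args(cl_arg, &a);
    float *y = (float *)STARPU_VECTOR_GET_PTR(buffers[0]);
    unsigned n = STARPU_VECTOR_GET_NX(buffers[0]);
    if (n == 0)
        return;

    cudaDeviceSynchronize(); /* wait for earlier work, local stream included */
    legacy_scale(y, n, a);
    cudaError_t err = cudaDeviceSynchronize(); /* wait for the default-stream work */
    if (err != cudaSuccess)
        STARPU_CUDA_REPORT_ERROR(err);
}

static struct starpu_codelet copy_scale_cl;
static struct starpu_codelet scale_legacy_cl;

/* Call once before submitting tasks. */
static void init_codelets(void)
{
    starpu_codelet_init(&copy_scale_cl);
    copy_scale_cl.cuda_funcs[0] = copy_scale_cuda;
    copy_scale_cl.cuda_flags[0] = STARPU_CUDA_ASYNC;
    copy_scale_cl.nbuffers = 2;
    copy_scale_cl.modes[0] = STARPU_R;
    copy_scale_cl.modes[1] = STARPU_W;
    copy_scale_cl.name = "copy_scale";

    starpu_codelet_init(&scale_legacy_cl);
    scale_legacy_cl.cuda_funcs[0] = scale_legacy_cuda;
    /* no STARPU_CUDA_ASYNC: this function returns with its work finished */
    scale_legacy_cl.nbuffers = 1;
    scale_legacy_cl.modes[0] = STARPU_RW;
    scale_legacy_cl.name = "scale_legacy";
}
```

Tasks would then be submitted with, for instance, starpu_task_insert(&copy_scale_cl, STARPU_R, hx, STARPU_W, hy, STARPU_VALUE, &a, sizeof(a), 0). The fallback codelet also produces correct results, but it stalls the calling worker until all work on the device has finished, twice per task, which is exactly the loss of overlap the note warns about.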