
Note about cudaMemset/cudaMemcpy being actually async

Samuel Thibault 6 years ago
commit 243bdf9cea
1 changed file with 6 additions and 0 deletions
      doc/doxygen/chapters/210_check_list_performance.doxy


@@ -101,6 +101,12 @@ to use a version that takes a stream parameter.
 
 Unfortunately, some CUDA libraries do not have stream variants of
 kernels. This will seriously lower the potential for overlapping.
+If some CUDA calls are made without specifying this local stream,
+synchronization must be made explicit with cudaDeviceSynchronize() (formerly
+cudaThreadSynchronize()) around these calls, so that they are properly ordered
+with the calls using the local stream. Notably, \c cudaMemcpy() and \c cudaMemset()
+may return before the operation has completed on the device, and thus need such
+explicit synchronization! Use \c cudaMemcpyAsync() and \c cudaMemsetAsync() on the local stream instead.
 
 Calling starpu_cublas_init() makes StarPU itself perform the appropriate calls for the
 CUBLAS library. Some libraries like Magma may however change the current stream of CUBLAS v1,