
Note about cudaMemset/cudaMemcpy being actually async

Samuel Thibault, 7 years ago
Parent
Commit
243bdf9cea
1 changed file with 6 additions and 0 deletions
  1. 6 0
      doc/doxygen/chapters/210_check_list_performance.doxy

+ 6 - 0
doc/doxygen/chapters/210_check_list_performance.doxy

@@ -101,6 +101,12 @@ to use a version that takes the a stream parameter.
 
 
 Unfortunately, some CUDA libraries do not have stream variants of
 kernels. This will seriously lower the potential for overlapping.
+If some CUDA calls are made without specifying this local stream,
+they must be explicitly synchronized by calling cudaThreadSynchronize() around
+them, to make sure that they get properly synchronized with the calls using
+the local stream. Notably, \c cudaMemcpy() and \c cudaMemset() are actually
+asynchronous and need such explicit synchronization! Use cudaMemcpyAsync() and
+cudaMemsetAsync() with the local stream instead.
 
 
 Calling starpu_cublas_init() makes StarPU already do appropriate calls for the
 CUBLAS library. Some libraries like Magma may however change the current stream of CUBLAS v1,