7 years ago · 243bdf9cea
--- a/doc/doxygen/chapters/210_check_list_performance.doxy
+++ b/doc/doxygen/chapters/210_check_list_performance.doxy
@@ -101,6 +101,12 @@ to use a version that takes the a stream parameter.
 
 Unfortunately, some CUDA libraries do not have stream variants of
 kernels. This will seriously lower the potential for overlapping.
+If some CUDA calls are made without specifying this local stream,
+synchronization must be made explicit with cudaThreadSynchronize() around these
+calls, to make sure that they get properly synchronized with the calls using
+the local stream. Notably, \c cudaMemcpy() and \c cudaMemset() are actually
+asynchronous and need such explicit synchronization! Use cudaMemcpyAsync() and
+cudaMemsetAsync() instead.
 Calling starpu_cublas_init() makes StarPU already do appropriate calls for the
 CUBLAS library. Some libraries like Magma may however change the current stream of CUBLAS v1,