7 years ago · 243bdf9cea
--- a/doc/doxygen/chapters/210_check_list_performance.doxy
+++ b/doc/doxygen/chapters/210_check_list_performance.doxy
@@ -101,6 +101,12 @@ to use a version that takes the a stream parameter.
 
 Unfortunately, some CUDA libraries do not have stream variants of
 kernels. This will seriously lower the potential for overlapping.
+If some CUDA calls are made without specifying this local stream,
+synchronization needs to be made explicit with \c cudaThreadSynchronize() around
+these calls, to make sure that they are properly ordered with respect to the calls
+using the local stream. Notably, \c cudaMemcpy() and \c cudaMemset() can actually
+be asynchronous and need such explicit synchronization! Use \c cudaMemcpyAsync()
+and \c cudaMemsetAsync() on the local stream instead.
 
 Calling starpu_cublas_init() makes StarPU already do appropriate calls for the
 CUBLAS library. Some libraries like Magma may however change the current stream of CUBLAS v1,