
small fixes

Samuel Thibault 14 years ago
commit 87fa1ac5c3
1 changed file with 11 additions and 7 deletions

+ 11 - 7
doc/starpu.texi

@@ -1447,7 +1447,8 @@ completely and re-calibrate from start, use @code{export STARPU_CALIBRATE=2}.
 Note: due to CUDA limitations, to be able to measure kernel duration,
 calibration mode needs to disable asynchronous data transfers. Calibration thus
 disables data transfer / computation overlapping, and should thus not be used
-for eventual benchmarks.
+for eventual benchmarks. Note 2: history-based performance models get calibrated
+only if a performance-model-based scheduler is chosen.
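For instance, a calibration run that picks a performance-model-based policy
could look as follows (this is only a sketch: @code{dmda} is one such scheduler,
and @code{./my_application} stands for the actual program):

@example
# force full re-calibration and select a performance-model-based scheduler
$ export STARPU_CALIBRATE=2
$ export STARPU_SCHED=dmda
$ ./my_application
@end example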
 
 @node Task distribution vs Data transfer
 @section Task distribution vs Data transfer
@@ -1504,17 +1505,19 @@ next chapter.
 @node CUDA-specific optimizations
 @section CUDA-specific optimizations
 
-Due to CUDA limitations, StarPU will have a hard time overlapping
-communications and computations if the application does not use a dedicated
-CUDA stream for its computations. StarPU provides one by the use of
-@code{starpu_cuda_get_local_stream()}. For instance:
+Due to CUDA limitations, StarPU will have a hard time overlapping its own
+communications and the codelet computations if the application does not use a
+dedicated CUDA stream for its computations. StarPU provides one through
+@code{starpu_cuda_get_local_stream()}, which should be used by all CUDA codelet
+operations. For instance:
 
 @example
 func <<<grid,block,0,starpu_cuda_get_local_stream()>>> (foo, bar);
 cudaStreamSynchronize(starpu_cuda_get_local_stream());
 @end example
 
-Unfortunately, a lot of cuda libraries do not have stream variants of kernels.
+Unfortunately, a lot of CUDA libraries do not have stream variants of
+kernels. That will lower the potential for overlapping.
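When a CUDA library does offer a way to select the stream, StarPU's local
stream should be handed to it so that overlapping is preserved. As a sketch,
the legacy cuBLAS interface exposes @code{cublasSetKernelStream} for this
purpose (the matrices @code{A}, @code{B}, @code{C} and the size @code{n} are
hypothetical):

@example
/* route subsequent cuBLAS kernels onto StarPU's per-worker stream */
cublasSetKernelStream(starpu_cuda_get_local_stream());
cublasSgemm('n', 'n', n, n, n, 1.0f, A, n, B, n, 0.0f, C, n);
cudaStreamSynchronize(starpu_cuda_get_local_stream());
@end example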
 
 @c ---------------------------------------------------------------------
 @c Performance feedback
@@ -2736,7 +2739,8 @@ design their own data interfaces if required.
 @item @emph{Description}:
 This function allocates data of the given size. It will also try to pin it in
 CUDA or OpenCL, so that data transfers from this buffer can be asynchronous, and
-thus permit data transfer and computation overlapping.
+thus permit data transfer and computation overlapping. The allocated buffer must
+be freed with the @code{starpu_data_free_pinned_if_possible} function.
 @item @emph{Prototype}:
 @code{int starpu_data_malloc_pinned_if_possible(void **A, size_t dim);}
 @end table
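As an illustrative sketch (assuming @code{starpu_data_free_pinned_if_possible}
simply takes the pointer that was returned through @code{A}):

@example
float *buffer;
/* allocate 1024 floats, pinned if the driver permits it */
starpu_data_malloc_pinned_if_possible((void **)&buffer, 1024 * sizeof(float));
/* ... register the buffer with StarPU and use it in tasks ... */
starpu_data_free_pinned_if_possible(buffer);
@end example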