
small fixes

Samuel Thibault 14 years ago
parent
commit
87fa1ac5c3
1 changed file with 11 additions and 7 deletions
+ 11 - 7
doc/starpu.texi

@@ -1447,7 +1447,8 @@ completely and re-calibrate from start, use @code{export STARPU_CALIBRATE=2}.
 Note: due to CUDA limitations, to be able to measure kernel duration,
 calibration mode needs to disable asynchronous data transfers. Calibration thus
 disables data transfer / computation overlapping, and should thus not be used
-for eventual benchmarks.
+for eventual benchmarks. Note 2: history-based performance models get calibrated
+only if a performance-model-based scheduler is chosen.

 @node Task distribution vs Data transfer
 @section Task distribution vs Data transfer
@@ -1504,17 +1505,19 @@ next chapter.
 @node CUDA-specific optimizations
 @section CUDA-specific optimizations

-Due to CUDA limitations, StarPU will have a hard time overlapping
-communications and computations if the application does not use a dedicated
-CUDA stream for its computations. StarPU provides one by the use of
-@code{starpu_cuda_get_local_stream()}. For instance:
+Due to CUDA limitations, StarPU will have a hard time overlapping its own
+communications and the codelet computations if the application does not use a
+dedicated CUDA stream for its computations. StarPU provides one by the use of
+@code{starpu_cuda_get_local_stream()} which should be used by all CUDA codelet
+operations. For instance:

 @example
 func <<<grid,block,0,starpu_cuda_get_local_stream()>>> (foo, bar);
 cudaStreamSynchronize(starpu_cuda_get_local_stream());
 @end example

-Unfortunately, a lot of cuda libraries do not have stream variants of kernels.
+Unfortunately, a lot of CUDA libraries do not have stream variants of
+kernels. That will lower the potential for overlapping.

 @c ---------------------------------------------------------------------
 @c Performance feedback
@@ -2736,7 +2739,8 @@ design their own data interfaces if required.
 @item @emph{Description}:
 This function allocates data of the given size. It will also try to pin it in
 CUDA or OpenCL, so that data transfers from this buffer can be asynchronous, and
-thus permit data transfer and computation overlapping.
+thus permit data transfer and computation overlapping. The allocated buffer must
+be freed with the @code{starpu_data_free_pinned_if_possible} function.
 @item @emph{Prototype}:
 @code{int starpu_data_malloc_pinned_if_possible(void **A, size_t dim);}
 @end table
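
For context, the dedicated-stream pattern that this diff documents can be sketched as a complete CUDA codelet implementation. This is a hedged sketch, not code from the commit: it assumes a StarPU + CUDA build, and the kernel `scal_kernel`, the function `vector_scal_cuda`, and the scaling operation are hypothetical illustrations. Only `starpu_cuda_get_local_stream()`, `cudaStreamSynchronize()`, and the `STARPU_VECTOR_GET_*` accessors come from StarPU itself.

```cuda
#include <starpu.h>
#include <starpu_cuda.h>

/* Hypothetical kernel: scales a vector in place. */
static __global__ void scal_kernel(float *v, unsigned n, float factor)
{
    unsigned i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        v[i] *= factor;
}

/* CUDA codelet implementation. Every CUDA operation is submitted on the
 * per-worker stream returned by starpu_cuda_get_local_stream(), so that
 * StarPU can overlap its own data transfers with this computation. */
static void vector_scal_cuda(void *buffers[], void *cl_arg)
{
    float factor = *(float *) cl_arg;
    unsigned n = STARPU_VECTOR_GET_NX(buffers[0]);
    float *v = (float *) STARPU_VECTOR_GET_PTR(buffers[0]);

    unsigned threads = 64;
    unsigned blocks = (n + threads - 1) / threads;

    /* Launch on the worker's dedicated stream, not the default stream. */
    scal_kernel<<<blocks, threads, 0, starpu_cuda_get_local_stream()>>>(v, n, factor);

    /* Wait only on that stream before returning from the codelet. */
    cudaStreamSynchronize(starpu_cuda_get_local_stream());
}
```

As the diff notes, libraries that only offer default-stream kernel entry points cannot benefit from this pattern, which lowers the potential for transfer/computation overlap.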