10 years ago · 1a6e4407c7
--- a/doc/doxygen/chapters/05check_list_performance.doxy
+++ b/doc/doxygen/chapters/05check_list_performance.doxy
@@ -60,16 +60,16 @@ StarPU already does appropriate calls for the CUBLAS library.
 If the kernel can be made to only use this local stream or other self-allocated
 streams, i.e. the whole kernel submission can be made asynchronous, then
 one should enable asynchronous execution of the kernel.  That means setting
-the STARPU_CUDA_ASYNC flag in cuda_flags[] in the codelet, and dropping the
+the flag ::STARPU_CUDA_ASYNC in the corresponding field starpu_codelet::cuda_flags, and dropping the
 cudaStreamSynchronize() call at the end of the cuda_func function, so that it
 returns immediately after having queued the kernel to the local stream. That way, StarPU will be
 able to submit and complete data transfers while kernels are executing, instead of only at each
 kernel submission. The kernel just has to make sure that StarPU can use the
 local stream to synchronize with the kernel startup and completion.
 
-Using the STARPU_CUDA_ASYNC flag also permits to enable concurrent kernel
+Using the flag ::STARPU_CUDA_ASYNC also permits to enable concurrent kernel
 execution, on cards which support it (Kepler and later, notably). This is
-enabled by setting the STARPU_NWORKER_PER_CUDA environment variable to the
+enabled by setting the environment variable \ref STARPU_NWORKER_PER_CUDA to the
 number of kernels to execute concurrently.  This is useful when kernels are
 small and do not feed the whole GPU with threads to run.
 
@@ -78,7 +78,7 @@ small and do not feed the whole GPU with threads to run.
 If the kernel can be made to only use the StarPU-provided command queue or other self-allocated
 queues, i.e. the whole kernel submission can be made asynchronous, then
 one should enable asynchronous execution of the kernel. This means setting
-the corresponding opencl_flags[] flag in the codelet and dropping the
+the flag ::STARPU_OPENCL_ASYNC in the corresponding field starpu_codelet::opencl_flags and dropping the
 clFinish() and starpu_opencl_collect_stats() calls at the end of the kernel, so
 that it returns immediately after having queued the kernel to the provided queue.
 That way, StarPU will be able to submit and complete data transfers while kernels are executing, instead of
@@ -93,12 +93,12 @@ period of time.  Reason are sometimes due to contention inside StarPU, but
 sometimes this is due to external reasons, such as stuck MPI driver, or CUDA
 driver, etc.
 
-<c>export STARPU_WATCHDOG_TIMEOUT=10000</c>
+<c>export STARPU_WATCHDOG_TIMEOUT=10000</c> (\ref STARPU_WATCHDOG_TIMEOUT)
 
 allows to make StarPU print an error message whenever StarPU does not terminate
 any task for 10ms. In addition to that,
 
-<c>export STARPU_WATCHDOG_CRASH=1</c>
+<c>export STARPU_WATCHDOG_CRASH=1</c> (\ref STARPU_WATCHDOG_CRASH)
 
 raises SIGABRT in that condition, thus allowing to catch the situation in gdb.
 It can also be useful to type "handle SIGABRT nopass" in gdb to be able to let
@@ -128,8 +128,8 @@ which have never been calibrated yet, and save the result in
 <c>$STARPU_HOME/.starpu/sampling/codelets</c>.
 The models are indexed by machine name. To share the models between
 machines (e.g. for a homogeneous cluster), use <c>export
-STARPU_HOSTNAME=some_global_name</c>. To force continuing calibration,
-use <c>export STARPU_CALIBRATE=1</c> . This may be necessary if your application
+STARPU_HOSTNAME=some_global_name</c> (\ref STARPU_HOSTNAME). To force continuing calibration,
+use <c>export STARPU_CALIBRATE=1</c> (\ref STARPU_CALIBRATE). This may be necessary if your application
 has not-so-stable performance. StarPU will force calibration (and thus ignore
 the current result) until 10 (<c>_STARPU_CALIBRATION_MINIMUM</c>) measurements have been
 made on each architecture, to avoid badly scheduling tasks just because the
@@ -167,7 +167,7 @@ $ gv starpu_starpu_slu_lu_model_11.eps
 
 If a kernel source code was modified (e.g. performance improvement), the
 calibration information is stale and should be dropped, to re-calibrate from
-start. This can be done by using <c>export STARPU_CALIBRATE=2</c>.
+start. This can be done by using <c>export STARPU_CALIBRATE=2</c> (\ref STARPU_CALIBRATE).
 
 Note: history-based performance models get calibrated
 only if a performance-model-based scheduler is chosen.
@@ -216,17 +216,18 @@ and in Joules for the energy consumption models.
 \section Profiling Profiling
 
 A quick view of how many tasks each worker has executed can be obtained by setting
-<c>export STARPU_WORKER_STATS=1</c> This is a convenient way to check that
+<c>export STARPU_WORKER_STATS=1</c> (\ref STARPU_WORKER_STATS). This is a convenient way to check that
 execution did happen on accelerators, without penalizing performance with
 the profiling overhead.
 
 A quick view of how much data transfers have been issued can be obtained by setting
-<c>export STARPU_BUS_STATS=1</c> .
+<c>export STARPU_BUS_STATS=1</c> (\ref STARPU_BUS_STATS).
 
-More detailed profiling information can be enabled by using <c>export STARPU_PROFILING=1</c> or by
+More detailed profiling information can be enabled by using <c>export STARPU_PROFILING=1</c> (\ref STARPU_PROFILING)
+or by
 calling starpu_profiling_status_set() from the source code.
 Statistics on the execution can then be obtained by using <c>export
 STARPU_BUS_STATS=1</c> and <c>export STARPU_WORKER_STATS=1</c> .
- More details on performance feedback are provided by the next chapter.
+ More details on performance feedback are provided in the next chapter.
 
 */
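As a rough illustration of the environment variables cross-referenced in 05check_list_performance.doxy, the shell sketch below sets the watchdog and statistics variables before launching a program. The variable names and values are the ones quoted in the hunks above; `APP_UNDER_TEST` is a hypothetical placeholder (not a StarPU variable) that defaults to `true` so the snippet runs on its own.

```shell
#!/bin/sh
# Sketch only: names and values are taken from the documentation hunks above.

# Print an error when StarPU terminates no task for the given time
# (10000 corresponds to the "10ms" mentioned in the text).
export STARPU_WATCHDOG_TIMEOUT=10000

# Raise SIGABRT in that condition, so the situation can be caught in gdb.
export STARPU_WATCHDOG_CRASH=1

# Quick views of per-worker task counts and of issued data transfers.
export STARPU_WORKER_STATS=1
export STARPU_BUS_STATS=1

# Hypothetical placeholder: point APP_UNDER_TEST at a real StarPU application.
app="${APP_UNDER_TEST:-true}"
"$app"
```

Exporting the variables in the launching shell keeps the application source unchanged, which is the reason the text suggests them for quick checks.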
			
--- a/doc/doxygen/chapters/11debugging_tools.doxy
+++ b/doc/doxygen/chapters/11debugging_tools.doxy
@@ -24,7 +24,7 @@ time, to tell valgrind about some known false positives and disable host memory
 pinning. Other known false positives can be suppressed by giving the suppression
 files in tools/valgrind/ *.suppr to valgrind's --suppressions option.
 
-The STARPU_DISABLE_KERNELS environment variable can also be set to 1 to make
+The environment variable \ref STARPU_DISABLE_KERNELS can also be set to 1 to make
 StarPU do everything (schedule tasks, transfer memory, etc.) except actually
 calling the application-provided kernel functions, i.e. the computation will not
 happen. This permits to quickly check that the task scheme is working properly.
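A minimal sketch of the dry-run check described in this hunk, assuming a POSIX shell. `STARPU_DISABLE_KERNELS` and the value 1 come from the text; `APP_UNDER_TEST` is a hypothetical placeholder (not a StarPU variable) that defaults to `true` so the snippet runs standalone.

```shell
#!/bin/sh
# Set the variable for this single command only, so that later runs
# in the same shell still execute the real kernels.
STARPU_DISABLE_KERNELS=1 "${APP_UNDER_TEST:-true}"
status=$?

# A zero status suggests task submission and scheduling completed.
echo "dry run exit status: $status"
```

Scoping the variable to one command is a design choice for this sketch: it avoids accidentally leaving kernels disabled for subsequent measurements.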
			
--- a/doc/doxygen/chapters/22openmp_runtime_support.doxy
+++ b/doc/doxygen/chapters/22openmp_runtime_support.doxy
@@ -37,17 +37,17 @@ SORS API functions inherit from extended semantics.
 
 \section Configuration Configuration
 
-The SORS can be compiled into <c>libstarpu</c>
-by providing the <c>--enable-openmp</c> flag to StarPU's
-<c>configure</c>. Conditional compiled source codes may check for the
+The SORS can be compiled into <c>libstarpu</c> through
+the configure option \ref enable-openmp "--enable-openmp".
+Conditional compiled source codes may check for the
 availability of the OpenMP Runtime Support by testing whether the C
 preprocessor macro <c>STARPU_OPENMP</c> is defined or not.
 
 \section InitExit Initialization and Shutdown
 
 The SORS needs to be executed/terminated by the
-starpu_omp_init()/starpu_omp_shutdown() instead of
-starpu_init()/starpu_shutdown(). This requirement is necessary to make
+starpu_omp_init() / starpu_omp_shutdown() instead of
+starpu_init() / starpu_shutdown(). This requirement is necessary to make
 sure that the main thread gets the proper execution environment to run
 OpenMP tasks. These calls will usually be performed by a compiler
 runtime. Thus, they can be executed from a constructor/destructor such
@@ -88,8 +88,9 @@ SORS calls, enabling constructs such as barriers.
 Parallel regions can be created with the function
 starpu_omp_parallel_region() which accepts a set of attributes as
 parameter. The execution of the calling task is suspended until the
-parallel region completes. The <c>attr.cl</c> field is a regular StarPU
-codelet. However only CPU codelets are supported for parallel regions.
+parallel region completes. The field starpu_omp_parallel_region_attr::cl
+is a regular StarPU codelet. However only CPU codelets are
+supported for parallel regions.
 Here is an example of use:
 
 \code{.c}
@@ -305,7 +306,7 @@ void parallel_region_f(void *buffers[], void *args)
 \subsection DataDependencies Data Dependencies
 The SORS implements inter-tasks data dependencies as specified in OpenMP
 4.0. Data dependencies are expressed using regular StarPU data handles
-(<c>starpu_data_handle_t</c>) plugged into the task's <c>attr.cl</c>
+(starpu_data_handle_t) plugged into the task's <c>attr.cl</c>
 codelet. The family of starpu_vector_data_register() -like functions and the
 starpu_data_lookup() function may be used to register a memory area and
 to retrieve the current data handle associated with a pointer
--- a/doc/doxygen/chapters/41configure_options.doxy
+++ b/doc/doxygen/chapters/41configure_options.doxy
@@ -352,6 +352,13 @@ Specify the precise MIC architecture host identifier.
 The default value is <c>x86_64-k1om-linux</c>
 </dd>
 
+<dt>--enable-openmp</dt>
+<dd>
+\anchor enable-openmp
+\addindex __configure__--enable-openmp
+Enable OpenMP Support (\ref OpenMPRuntimeSupport)
+</dd>
+
 </dl>
 
 \section AdvancedConfiguration Advanced Configuration
--- a/doc/doxygen/chapters/api/codelet_and_tasks.doxy
+++ b/doc/doxygen/chapters/api/codelet_and_tasks.doxy
@@ -120,6 +120,12 @@ configure option \ref enable-maxbuffers "--enable-maxbuffers".
 Value to set in starpu_codelet::nbuffers to specify that the codelet can accept
 a variable number of buffers, specified in starpu_task::nbuffers.
 
+\def STARPU_CUDA_ASYNC
+Value to be set in starpu_codelet::cuda_flags to allow asynchronous CUDA kernel execution.
+
+\def STARPU_OPENCL_ASYNC
+Value to be set in starpu_codelet::opencl_flags to allow asynchronous OpenCL kernel execution.
+
 \typedef starpu_cpu_func_t
 \ingroup API_Codelet_And_Tasks
 CPU implementation of a codelet.