
doc: minor fixes

Nathalie Furmento, 6 years ago
Parent commit: b545befdc2

+ 1 - 1
doc/doxygen/Makefile.am

@@ -98,7 +98,7 @@ chapters =	\
 	chapters/code/simgrid.c \
 	chapters/code/vector_scal_c.c \
 	chapters/code/vector_scal_cpu.c \
-	chapters/code/vector_scal_cuda.cu \
+	chapters/code/vector_scal_cuda.c \
 	chapters/code/vector_scal_opencl.c \
 	chapters/code/vector_scal_opencl_codelet.cl \
 	chapters/code/disk_copy.c \

+ 5 - 5
doc/doxygen/chapters/101_building.doxy

@@ -97,7 +97,7 @@ $ ./autogen.sh
 \endverbatim
 
 You then need to configure StarPU. Details about options that are
-useful to give to <c>./configure</c> are given in \ref CompilationConfiguration.
+useful to give to <c>configure</c> are given in \ref CompilationConfiguration.
 
 \verbatim
 $ ./configure
@@ -242,9 +242,9 @@ autodetect the StarPU installation and library dependences (such as
 <c>libhwloc</c>) provided that the <c>PKG_CONFIG_PATH</c> variable is set, and
 is sufficient to build a statically-linked executable. This example has been
 successfully tested with CMake 3.2, though it may work with earlier CMake 3.x
-versions. 
+versions.
 
-\code{CMakeLists.txt}
+\code{File CMakeLists.txt}
 cmake_minimum_required (VERSION 3.2)
 project (hello_starpu)
 
@@ -262,7 +262,7 @@ add_executable(hello_starpu hello_starpu.c)
 \endcode
 
 The following \c CMakeLists.txt implements an alternative, more complex
-strategy, still relying on PkgConfig, but also taking into account additional
+strategy, still relying on Pkg-Config, but also taking into account additional
 flags. While more complete, this approach makes CMake's build types (Debug,
 Release, ...) unavailable because of the direct assignment to the variable
 <c>CMAKE_C_FLAGS</c>. If both the full flags support and the build types
@@ -271,7 +271,7 @@ support are needed, the \c CMakeLists.txt below may be altered to work with
 This example has been successfully tested with CMake 3.2, though it may work
 with earlier CMake 3.x versions. 
 
-\code{CMakeLists.txt}
+\code{File CMakeLists.txt}
 cmake_minimum_required (VERSION 3.2)
 project (hello_starpu)
 

+ 2 - 2
doc/doxygen/chapters/110_basic_examples.doxy

@@ -1,6 +1,6 @@
 /* StarPU --- Runtime system for heterogeneous multicore architectures.
  *
- * Copyright (C) 2010-2013,2015-2018                      CNRS
+ * Copyright (C) 2010-2013,2015-2019                      CNRS
  * Copyright (C) 2011-2013                                Inria
  * Copyright (C) 2009-2011,2014,2015                      Université de Bordeaux
  *
@@ -618,7 +618,7 @@ that the vector pointer returned by ::STARPU_VECTOR_GET_PTR is here a
 pointer in GPU memory, so that it can be passed as such to the
 kernel call <c>vector_mult_cuda</c>.
 
-\snippet vector_scal_cuda.cu To be included. You should update doxygen if you see this text.
+\snippet vector_scal_cuda.c To be included. You should update doxygen if you see this text.
 
 \subsection DefinitionOfTheOpenCLKernel Definition of the OpenCL Kernel
 

+ 8 - 8
doc/doxygen/chapters/210_check_list_performance.doxy

@@ -48,7 +48,7 @@ synchronize the execution of tasks.
 
 \section ConfigurationImprovePerformance Configuration Which May Improve Performance
 
-The \ref enable-fast "--enable-fast" configuration option disables all
+The \ref enable-fast "--enable-fast" \c configure option disables all
 assertions. This makes StarPU more performant for really small tasks by
 disabling all sanity checks. Only use this for measurements and production, not for development, since this will drop all basic checks.
 
@@ -141,11 +141,11 @@ enabled by setting the environment variable \ref STARPU_NWORKER_PER_CUDA to the
 number of kernels to execute concurrently.  This is useful when kernels are
 small and do not feed the whole GPU with threads to run.
 
-Concerning memory allocation, you should really not use cudaMalloc/cudaFree
-within the kernel, since cudaFree introduces a awfully lot of synchronizations
+Concerning memory allocation, you should really not use \c cudaMalloc / \c cudaFree
+within the kernel, since \c cudaFree introduces an awful lot of synchronizations
 within CUDA itself. You should instead add a parameter to the codelet with the
-STARPU_SCRATCH mode access. You can then pass to the task a handle registered
-with the desired size but with the NULL pointer, that handle can even be the
+::STARPU_SCRATCH mode access. You can then pass to the task a handle registered
+with the desired size but with a \c NULL pointer; that handle can even be
 shared between tasks: StarPU will allocate per-task data on the fly before task
 execution, and reuse the allocated data between tasks.
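
A hedged sketch of this scratch scheme (the handle, size and codelet names are hypothetical, not part of the commit):

\code{.c}
/* sketch: scratch buffer registered with a size but a NULL pointer */
starpu_data_handle_t scratch;
starpu_vector_data_register(&scratch, -1, (uintptr_t) NULL, n, sizeof(float));

struct starpu_codelet cl =
{
	.cuda_funcs = {work_cuda},	/* hypothetical kernel launcher */
	.nbuffers = 2,
	.modes = {STARPU_RW, STARPU_SCRATCH},
};

starpu_task_insert(&cl, STARPU_RW, h, STARPU_SCRATCH, scratch, 0);
\endcode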
 
@@ -242,7 +242,7 @@ to reserve this amount immediately.
 
 It is possible to reduce the memory footprint of the task and data internal
 structures of StarPU by describing the shape of your machine and/or your
-application at the configure step.
+application at the \c configure step.
 
 To reduce the memory footprint of the data internal structures of StarPU, one
 can set the
@@ -251,12 +251,12 @@ can set the
 \ref enable-maxcudadev "--enable-maxcudadev",
 \ref enable-maxopencldev "--enable-maxopencldev" and
 \ref enable-maxnodes "--enable-maxnodes"
-configure parameters to give StarPU
+\c configure parameters to give StarPU
 the architecture of the machine it will run on, thus tuning the size of the
 structures to the machine.
 
 To reduce the memory footprint of the task internal structures of StarPU, one
-can set the \ref enable-maxbuffers "--enable-maxbuffers" configure parameter to
+can set the \ref enable-maxbuffers "--enable-maxbuffers" \c configure parameter to
 give StarPU the maximum number of buffers that a task can use during an
 execution. For example, in the Cholesky factorization (dense linear algebra
 application), the GEMM task uses up to 3 buffers, so it is possible to set the

+ 4 - 4
doc/doxygen/chapters/301_tasks.doxy

@@ -1,6 +1,6 @@
 /* StarPU --- Runtime system for heterogeneous multicore architectures.
  *
- * Copyright (C) 2010-2018                                CNRS
+ * Copyright (C) 2010-2019                                CNRS
  * Copyright (C) 2011,2012,2018                           Inria
  * Copyright (C) 2009-2011,2014-2016,2018                 Université de Bordeaux
  *
@@ -69,14 +69,14 @@ specific data or starpu_data_set_default_sequential_consistency_flag()
 for all data.
 
 Setting (or unsetting) sequential consistency can also be done at task
-level by setting the field starpu_task::sequential_consistency to 0.
+level by setting the field starpu_task::sequential_consistency to \c 0.
 
 Sequential consistency can also be set (or unset) for each handle of a
 specific task, this is done by using the field
 starpu_task::handles_sequential_consistency. When set, its value
 should be an array with the number of elements being the number of
 handles for the task, each element of the array being the sequential
-consistency for the i-th handle of the task. The field can easily be
+consistency for the \c i-th handle of the task. The field can easily be
 set when calling starpu_task_insert() with the flag
 ::STARPU_HANDLES_SEQUENTIAL_CONSISTENCY
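
A hedged sketch of the per-handle flavour (handles and codelet are hypothetical; the array is assumed to hold one unsigned char per handle):

\code{.c}
/* sketch: keep sequential consistency for h0, disable it for h1 */
unsigned char seq[2] = {1, 0};
starpu_task_insert(&cl,
		   STARPU_RW, h0,
		   STARPU_RW, h1,
		   STARPU_HANDLES_SEQUENTIAL_CONSISTENCY, seq,
		   0);
\endcode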
 
@@ -120,7 +120,7 @@ to delay the termination of a task until the termination of other tasks.
 
 The maximum number of data a task can manage is fixed by the environment variable
 \ref STARPU_NMAXBUFS, whose default value can be changed
-through the configure option \ref enable-maxbuffers "--enable-maxbuffers".
+through the \c configure option \ref enable-maxbuffers "--enable-maxbuffers".
 
 However, it is possible to define tasks managing more data by using
 the field starpu_task::dyn_handles when defining a task and the field
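
A hedged sketch of the starpu_task::dyn_handles mechanism just mentioned (the handle array is hypothetical, and cl.nbuffers is assumed to have been set to n):

\code{.c}
/* sketch: submit a task over an array of n data handles */
struct starpu_task *task = starpu_task_create();
unsigned i;
task->cl = &cl;
task->dyn_handles = malloc(n * sizeof(starpu_data_handle_t));
for (i = 0; i < n; i++)
	task->dyn_handles[i] = handles[i];
starpu_task_submit(task);
\endcode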

+ 2 - 2
doc/doxygen/chapters/310_data_management.doxy

@@ -18,7 +18,7 @@
 
 /*! \page DataManagement Data Management
 
-TODO: intro qui parle de coherency entre autres
+TODO: intro which mentions consistency among other things
 
 \section DataInterface Data Interface
 
@@ -299,7 +299,7 @@ Partitioning can be applied several times, see
 Whenever the whole piece of data is already available, the partitioning will
 be done in-place, i.e. without allocating new buffers but just using pointers
 inside the existing copy. This is particularly important to be aware of when
-using OpenCL, where the kernel parameters are not pointers, but cl_mem handles. The
+using OpenCL, where the kernel parameters are not pointers, but \c cl_mem handles. The
 kernel thus needs to be also passed the offset within the OpenCL buffer:
 
 \code{.c}

+ 2 - 2
doc/doxygen/chapters/320_scheduling.doxy

@@ -79,7 +79,7 @@ specified for a codelet, every task built from this codelet will be scheduled
 using an <b>eager</b> fallback policy.
 
 <b>Troubleshooting:</b> Configuring and recompiling StarPU using the
-\ref enable-verbose "--enable-verbose" configure option displays some statistics at the end of
+\ref enable-verbose "--enable-verbose" \c configure option displays some statistics at the end of
 execution about the percentage of tasks which have been scheduled by a DM*
 family policy using performance model hints. A low or zero percentage may be
 the sign that performance models are not converging or that codelets do not
@@ -227,7 +227,7 @@ task graph.
 
 Other heuristics can however look at the task graph. Recording the task graph
 is expensive, so it is not available by default, the scheduling heuristic has
-to set _starpu_graph_record to 1 from the initialization function, to make it
+to set \c _starpu_graph_record to \c 1 from the initialization function, to make it
 available. Then the <c>_starpu_graph*</c> functions can be used.
 
 <c>src/sched_policies/graph_test_policy.c</c> is an example of simple greedy

+ 6 - 6
doc/doxygen/chapters/360_debugging_tools.doxy

@@ -1,6 +1,6 @@
 /* StarPU --- Runtime system for heterogeneous multicore architectures.
  *
- * Copyright (C) 2010-2017                                CNRS
+ * Copyright (C) 2010-2017, 2019                          CNRS
  * Copyright (C) 2009-2011,2014,2016                      Université de Bordeaux
  * Copyright (C) 2011,2012                                Inria
  *
@@ -24,10 +24,10 @@ can be generated and displayed graphically, see \ref GeneratingTracesWithFxT.
 \section DebuggingInGeneral Troubleshooting In General
 
 Generally-speaking, if you have troubles, pass \ref enable-debug "--enable-debug" to
-<c>./configure</c> to enable some checks which impact performance, but will
+<c>configure</c> to enable some checks which impact performance, but will
 catch common issues, possibly earlier than the actual problem you are observing,
 which may just be a consequence of a bug that happened earlier. Also, make sure
-not to have the \ref enable-fast "--enable-fast" option which drops very useful
+not to have the \ref enable-fast "--enable-fast" \c configure option which drops very useful
 catchup assertions. If your program is valgrind-safe, you can use it, see \ref
 UsingOtherDebugger.
 
@@ -83,12 +83,12 @@ For instance,
 </ul>
 
 Some functions can only work if \ref enable-debug "--enable-debug"
-was passed to <c>./configure</c>
+was passed to <c>configure</c>
 (because they impact performance).
 
 \section UsingOtherDebugger Using Other Debugging Tools
 
-Valgrind can be used on StarPU: valgrind.h just needs to be found at <c>./configure</c>
+Valgrind can be used on StarPU: valgrind.h just needs to be found at <c>configure</c>
 time, to tell valgrind about some known false positives and disable host memory
 pinning. Other known false positives can be suppressed by giving the suppression
 files in <c>tools/valgrind/*.suppr</c> to valgrind's <c>--suppressions</c> option.
@@ -104,7 +104,7 @@ StarPU can connect to Temanejo >= 1.0rc2 (see
 http://www.hlrs.de/temanejo), to permit
 nice visual task debugging. To do so, build Temanejo's <c>libayudame.so</c>,
 install <c>Ayudame.h</c> to e.g. <c>/usr/local/include</c>, apply the
-<c>tools/patch-ayudame</c> to it to fix C build, re-<c>./configure</c>, make
+<c>tools/patch-ayudame</c> to it to fix the C build, re-<c>configure</c>, make
 sure that it found it, rebuild StarPU.  Run the Temanejo GUI, give it the path
 to your application, any options you want to pass it, the path to <c>libayudame.so</c>.
 

+ 2 - 2
doc/doxygen/chapters/370_online_performance_tools.doxy

@@ -1,7 +1,7 @@
 /* StarPU --- Runtime system for heterogeneous multicore architectures.
  *
  * Copyright (C) 2011,2012,2016                           Inria
- * Copyright (C) 2010-2018                                CNRS
+ * Copyright (C) 2010-2019                                CNRS
  * Copyright (C) 2009-2011,2014,2016,2018                 Université de Bordeaux
  *
  * StarPU is free software; you can redistribute it and/or modify
@@ -451,7 +451,7 @@ directory. These files are reused when \ref STARPU_CALIBRATE
 environment variable is set to <c>1</c>, to recompute coefficients
 based on the current, but also on the previous
 executions. Additionally, when multiple linear regression models are
-disabled (using "--disable-mlr" configuration option) or when the
+disabled (using the \ref disable-mlr "--disable-mlr" \c configure option) or when the
 <c>model->combinations</c> are not defined, StarPU will still write
 output files into <c>.starpu/sampling/codelets/tmp/</c> to allow
 performing an analysis. This analysis typically aims at finding the

+ 5 - 5
doc/doxygen/chapters/380_offline_performance_tools.doxy

@@ -1,7 +1,7 @@
 /* StarPU --- Runtime system for heterogeneous multicore architectures.
  *
  * Copyright (C) 2011,2012,2015-2017                      Inria
- * Copyright (C) 2010-2018                                CNRS
+ * Copyright (C) 2010-2019                                CNRS
  * Copyright (C) 2009-2011,2014-2017                      Université de Bordeaux
  *
  * StarPU is free software; you can redistribute it and/or modify
@@ -91,7 +91,7 @@ $ ./configure --with-fxt=$FXTDIR
 
 Or you can simply point the <c>PKG_CONFIG_PATH</c> to
 <c>$FXTDIR/lib/pkgconfig</c> and pass
-\ref with-fxt "--with-fxt" to <c>./configure</c>
+\ref with-fxt "--with-fxt" to <c>configure</c>
 
 When FxT is enabled, a trace is generated when StarPU is terminated by calling
 starpu_shutdown(). The trace is a binary file whose name has the form
@@ -100,7 +100,7 @@ starpu_shutdown(). The trace is a binary file whose name has the form
 <c>/tmp/</c> directory by default, or by the directory specified by
 the environment variable \ref STARPU_FXT_PREFIX.
 
-The additional configure option \ref enable-fxt-lock "--enable-fxt-lock" can
+The additional \c configure option \ref enable-fxt-lock "--enable-fxt-lock" can
 be used to generate trace events which describe the locking behaviour during
 the execution. It is however very heavy and should not be used unless debugging
 StarPU's internal locking.
@@ -154,7 +154,7 @@ ViTE, use the following command:
 $ vite paje.trace
 \endverbatim
 
-Tasks can be assigned a name (instead of the default 'unknown') by
+Tasks can be assigned a name (instead of the default \c unknown) by
 filling the optional starpu_codelet::name, or assigning them a
 performance model. The name can also be set with the field
 starpu_task::name or by using \ref STARPU_NAME when calling
@@ -167,7 +167,7 @@ one can specify the option <c>-c</c> to <c>starpu_fxt_tool</c> or in
 \ref STARPU_GENERATE_TRACE_OPTIONS. Tasks can also be given a specific
 color by setting the field starpu_codelet::color or the
 starpu_task::color. Colors are expressed with the following format
-0xRRGGBB (e.g 0xFF0000 for red). See
+\c 0xRRGGBB (e.g. \c 0xFF0000 for red). See
 <c>basic_examples/task_insert_color</c> for examples on how to assign
 colors.
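
A hedged sketch combining the two fields named above (the name, color and implementation are arbitrary):

\code{.c}
/* sketch: tasks built from this codelet appear as "scal", drawn in red */
struct starpu_codelet cl =
{
	.cpu_funcs = {scal_cpu},	/* hypothetical implementation */
	.nbuffers = 1,
	.modes = {STARPU_RW},
	.name = "scal",
	.color = 0xFF0000,
};
\endcode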
 

+ 7 - 7
doc/doxygen/chapters/390_faq.doxy

@@ -1,6 +1,6 @@
 /* StarPU --- Runtime system for heterogeneous multicore architectures.
  *
- * Copyright (C) 2010-2018                                CNRS
+ * Copyright (C) 2010-2019                                CNRS
  * Copyright (C) 2009-2011,2014,2016,2017                 Université de Bordeaux
  * Copyright (C) 2011,2012                                Inria
  *
@@ -133,7 +133,7 @@ to be the one that runs CUDA computations for that GPU.
 
 To achieve this with StarPU, pass the option
 \ref disable-cuda-memcpy-peer "--disable-cuda-memcpy-peer"
-to <c>./configure</c> (TODO: make it dynamic), OpenGL/GLUT has to be initialized
+to <c>configure</c> (TODO: make it dynamic), OpenGL/GLUT has to be initialized
 first, and the interoperability mode has to
 be enabled by using the field
 starpu_conf::cuda_opengl_interoperability, and the driver loop has to
@@ -213,7 +213,7 @@ latency for better overall performance.
 
 If eating CPU time is a problem (e.g. application running on a desktop),
 pass option \ref enable-blocking-drivers "--enable-blocking-drivers" to
-<c>./configure</c>. This will add some overhead when putting CPU workers to
+<c>configure</c>. This will add some overhead when putting CPU workers to
 sleep or waking them, but avoid eating 100% CPU permanently.
 
 \section PauseResume Interleaving StarPU and non-StarPU code
 running on the GPU, the codelet can run some computation, which will thus be run by
 the CPU core instead of driving the GPU.
 
 One can choose to dedicate only one thread for all the CUDA devices by setting
-the STARPU_CUDA_THREAD_PER_DEV environment variable to 1. The application
-however should use STARPU_CUDA_ASYNC on its CUDA codelets (asynchronous
+the \ref STARPU_CUDA_THREAD_PER_DEV environment variable to \c 1. The application
+however should use ::STARPU_CUDA_ASYNC on its CUDA codelets (asynchronous
 execution), otherwise the execution of a synchronous CUDA codelet will
 monopolize the thread, and other CUDA devices will thus starve while it is
 executing.
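
A hedged sketch of marking a CUDA codelet asynchronous, as recommended above (the kernel launcher is hypothetical):

\code{.c}
/* sketch: declare the CUDA implementation as asynchronous */
struct starpu_codelet cl =
{
	.cuda_funcs = {vector_mult_cuda},
	.cuda_flags = {STARPU_CUDA_ASYNC},	/* submit on the stream, do not wait */
	.nbuffers = 1,
	.modes = {STARPU_RW},
};
\endcode
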
@@ -276,7 +276,7 @@ executing.
 \section CUDADrivers StarPU does not see my CUDA device
 
 First make sure that CUDA is properly running outside StarPU: build and
-run the following program with -lcudart:
+run the following program with \c -lcudart :
 
 \code{.c}
 #include <stdio.h>
@@ -326,7 +326,7 @@ setup.
 \section OpenCLDrivers StarPU does not see my OpenCL device
 
 First make sure that OpenCL is properly running outside StarPU: build and
-run the following program with -lOpenCL:
+run the following program with \c -lOpenCL :
 
 \code{.c}
 #include <CL/cl.h>

+ 19 - 19
doc/doxygen/chapters/401_out_of_core.doxy

@@ -1,6 +1,6 @@
 /* StarPU --- Runtime system for heterogeneous multicore architectures.
  *
- * Copyright (C) 2013,2014,2016-2018                      CNRS
+ * Copyright (C) 2013,2014,2016-2019                      CNRS
  * Copyright (C) 2013,2014,2017,2018                      Université de Bordeaux
  * Copyright (C) 2013                                     Inria
  * Copyright (C) 2013                                     Corentin Salingue
@@ -32,17 +32,17 @@ again for a task, StarPU will automatically fetch it back from the disk.
 
 
 The principle is that one first registers a disk location, seen by StarPU as a
-<c>void*</c>, which can be for instance a Unix path for the stdio, unistd or
-unistd_o_direct backends, or a leveldb database for the leveldb backend, an HDF5
-file path for the HDF5 backend, etc. The disk backend opens this place with the
-plug method.
+<c>void*</c>, which can be for instance a Unix path for the \c stdio, \c unistd or
+\c unistd_o_direct backends, or a leveldb database for the \c leveldb backend, an HDF5
+file path for the \c HDF5 backend, etc. The \c disk backend opens this place with the
+plug() method.
 
 StarPU can then start using it to allocate room and store data there with the
 disk write method, without user intervention.
 
 The user can also use starpu_disk_open() to explicitly open an object within the
-disk, e.g. a file name in the stdio or unistd cases, or a database key in the
-leveldb case, and then use <c>starpu_*_register</c> functions to turn it into a StarPU
+disk, e.g. a file name in the \c stdio or \c unistd cases, or a database key in the
+\c leveldb case, and then use <c>starpu_*_register</c> functions to turn it into a StarPU
 data handle. StarPU will then use this file as external source of data, and
 automatically read and write data as appropriate.
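
A hedged sketch of this explicit-open path (assuming \c new_dd is the disk node returned by starpu_disk_register() as shown below; the file name and sizes are hypothetical):

\code{.c}
/* sketch: expose an existing file on the disk node as a StarPU vector */
void *obj = starpu_disk_open(new_dd, (void *) "myfile.dat", 1024 * sizeof(double));
starpu_data_handle_t vh;
starpu_vector_data_register(&vh, new_dd, (uintptr_t) obj, 1024, sizeof(double));
\endcode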
 
@@ -54,13 +54,13 @@ To use a disk memory node, you have to register it with this function:
 int new_dd = starpu_disk_register(&starpu_disk_unistd_ops, (void *) "/tmp/", 1024*1024*200);
 \endcode
 
-Here, we use the unistd library to realize the read/write operations, i.e.
-fread/fwrite. This structure must have a path where to store files, as well as
+Here, we use the \c unistd library to realize the read/write operations, i.e.
+\c fread/\c fwrite. This call must be given a path in which to store files, as well as
 the maximum size StarPU is allowed to use on the disk.
 
 Don't forget to check if the result is correct!
 
-This can also be achieved by just setting environment variables:
+This can also be achieved by just setting environment variables \ref STARPU_DISK_SWAP, \ref STARPU_DISK_SWAP_BACKEND and \ref STARPU_DISK_SWAP_SIZE :
 
 \verbatim
 export STARPU_DISK_SWAP=/tmp
@@ -68,8 +68,8 @@ export STARPU_DISK_SWAP_BACKEND=unistd
 export STARPU_DISK_SWAP_SIZE=200
 \endverbatim
 
-The backend can be set to stdio (some caching is done by libc), unistd (only
-caching in the kernel), unistd_o_direct (no caching), leveldb, or hdf5.
+The backend can be set to \c stdio (some caching is done by \c libc), \c unistd (only
+caching in the kernel), \c unistd_o_direct (no caching), \c leveldb, or \c hdf5.
 
 When that register call is made, StarPU will benchmark the disk. This can
 take some time.
@@ -78,7 +78,7 @@ take some time.
 
 StarPU will then automatically try to evict unused data to this new disk. One
 can also use the standard StarPU memory node API to prefetch data etc., see the
-\ref API_Standard_Memory_Library and the \ref API_Data_Interfaces .
+\ref API_Standard_Memory_Library and the \ref API_Data_Interfaces.
 
 The disk is unregistered during the starpu_shutdown().
 
@@ -87,20 +87,20 @@ The disk is unregistered during the starpu_shutdown().
 StarPU will only be able to achieve Out-Of-Core eviction if it controls memory
 allocation. For instance, if the application does the following:
 
-<code>
+\code{.c}
 p = malloc(1024*1024*sizeof(float));
 fill_with_data(p);
 starpu_matrix_data_register(&h, STARPU_MAIN_RAM, (uintptr_t) p, 1024, 1024, 1024, sizeof(float));
-</code>
+\endcode
 
 StarPU will not be able to release the corresponding memory since it is the
 application which allocated it, and StarPU cannot know how it was allocated,
 and thus how to release it. One thus has to use the following instead:
 
-<code>
+\code{.c}
 starpu_matrix_data_register(&h, -1, NULL, 1024, 1024, 1024, sizeof(float));
 starpu_task_insert(cl_fill_with_data, STARPU_W, h, 0);
-</code>
+\endcode
 
 This makes StarPU automatically do the allocation when the task running
 \c cl_fill_with_data gets executed. Then, if it needs to, it will be able to
@@ -113,10 +113,10 @@ which data should be evicted to the disk. This algorithm can be hinted
 by telling it which data will not be used in the near future, thanks to
 starpu_data_wont_use(), for instance:
 
-<code>
+\code{.c}
 starpu_task_insert(&cl_work, STARPU_RW, h, 0);
 starpu_data_wont_use(h);
-</code>
+\endcode
 
 StarPU will mark the data as "inactive" and tend to evict that data to the
 disk rather than other data.

+ 24 - 24
doc/doxygen/chapters/410_mpi_support.doxy

@@ -1,6 +1,6 @@
 /* StarPU --- Runtime system for heterogeneous multicore architectures.
  *
- * Copyright (C) 2010-2018                                CNRS
+ * Copyright (C) 2010-2019                                CNRS
  * Copyright (C) 2011-2013,2016,2017                      Inria
  * Copyright (C) 2009-2011,2013-2018                      Université de Bordeaux
  *
@@ -233,10 +233,10 @@ detach state attribute which determines whether a thread will be
 created in a joinable or a detached state.
 
 For send communications, data is acquired with the mode ::STARPU_R.
-When using the configure option
+When using the \c configure option
 \ref enable-mpi-pedantic-isend "--enable-mpi-pedantic-isend", the mode
 ::STARPU_RW is used to make sure there is no more than 1 concurrent
-MPI_Isend call accessing a data.
+\c MPI_Isend() call accessing a data.
 
 Internally, each communication is divided into 2 communications: a first
 message is used to exchange an envelope describing the data (i.e. its
@@ -585,7 +585,7 @@ which tag is used to transfer its value.
 It can however be useful to register e.g. some temporary data on just one node,
 without having to register a dumb handle on all nodes, while only one node will
 actually need to know about it. In this case, nodes which will not need the data
-can just pass NULL to starpu_mpi_task_insert():
+can just pass \c NULL to starpu_mpi_task_insert():
 
 \code{.c}
 starpu_data_handle_t data0 = NULL;
@@ -597,11 +597,11 @@ if (rank == 0)
 starpu_mpi_task_insert(MPI_COMM_WORLD, &cl, STARPU_W, data0, 0); /* Executes on node 0 */
 \endcode
 
-Here, nodes whose rank is not 0 will simply not take care of the data, and consider it to be on another node.
+Here, nodes whose rank is not \c 0 will simply not take care of the data, and consider it to be on another node.
 
+This can be mixed in various ways; for instance, here node \c 1 determines that it does
-not have to care about data0, but knows that it should send the value of its
-data1 to node 0, which owns data and thus will need the value of data1 to execute the task:
+This can be mixed various way, for instance here node \c 1 determines that it does
+not have to care about \c data0, but knows that it should send the value of its
+\c data1 to node \c 0, which owns \c data and thus will need the value of \c data1 to execute the task:
 
 \code{.c}
 starpu_data_handle_t data0 = NULL, data1, data;
@@ -675,29 +675,29 @@ The data can then be used just like pernode above.
 
 All send functions have a <c>_prio</c> variant which takes an additional
 priority parameter, which lets StarPU-MPI change the order of MPI
-requests before submitting them to MPI. The default priority is 0.
+requests before submitting them to MPI. The default priority is \c 0.
 
-When using the starpu_mpi_task_insert helper, STARPU_PRIORITY defines both the
+When using the starpu_mpi_task_insert() helper, ::STARPU_PRIORITY defines both the
 task priority and the MPI requests priority.
 
 To test how much of an effect MPI priorities have on performance, you can
-set the environment variable STARPU_MPI_PRIORITIES to 0 to disable the use of
+set the environment variable \ref STARPU_MPI_PRIORITIES to \c 0 to disable the use of
 priorities in StarPU-MPI.
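
A hedged sketch of attaching a priority at insertion time (the codelet and handle are hypothetical):

\code{.c}
/* sketch: the priority applies to the task and to its MPI requests */
starpu_mpi_task_insert(MPI_COMM_WORLD, &cl,
		       STARPU_PRIORITY, 2,
		       STARPU_RW, h,
		       0);
\endcode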
 
 \section MPICache MPI cache support
 
 StarPU-MPI automatically optimizes duplicate data transmissions: if an MPI
-node B needs a piece of data D from MPI node A for several tasks, only one
-transmission of D will take place from A to B, and the value of D will be kept
-on B as long as no task modifies D.
+node \c B needs a piece of data \c D from MPI node \c A for several tasks, only one
+transmission of \c D will take place from \c A to \c B, and the value of \c D will be kept
+on \c B as long as no task modifies \c D.
 
-If a task modifies D, B will wait for all tasks which need the previous value of
-D, before invalidating the value of D. As a consequence, it releases the memory
-occupied by D. Whenever a task running on B needs the new value of D, allocation
+If a task modifies \c D, \c B will wait for all tasks which need the previous value of
+\c D, before invalidating the value of \c D. As a consequence, it releases the memory
+occupied by \c D. Whenever a task running on \c B needs the new value of \c D, allocation
 will take place again to receive it.
 
 Since tasks can be submitted dynamically, StarPU-MPI can not know whether the
-current value of data D will again be used by a newly-submitted task before
+current value of data \c D will again be used by a newly-submitted task before
 being modified by another newly-submitted task, so until a task is submitted to
 modify the current value, it can not decide by itself whether to flush the cache
 or not.  The application can however explicitly tell StarPU-MPI to flush the
@@ -705,7 +705,7 @@ cache by calling starpu_mpi_cache_flush() or starpu_mpi_cache_flush_all_data(),
 for instance in case the data will not be used at all any more (see for instance
 the cholesky example in <c>mpi/examples/matrix_decomposition</c>), or at least not in
 the close future. If a newly-submitted task actually needs the value again,
-another transmission of D will be initiated from A to B.  A mere
+another transmission of \c D will be initiated from \c A to \c B.  A mere
 starpu_mpi_cache_flush_all_data() can for instance be added at the end of the whole
 algorithm, to express that no data will be reused after this (or at least that
 it is not interesting to keep them in cache).  It may however be interesting to
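
A hedged sketch of the explicit flushes described above (the communicator and handle are hypothetical):

\code{.c}
/* sketch: release cached copies of h once no future task will read it */
starpu_mpi_cache_flush(MPI_COMM_WORLD, h);
/* or, at the very end of the algorithm: */
starpu_mpi_cache_flush_all_data(MPI_COMM_WORLD);
\endcode
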
@@ -887,11 +887,11 @@ the MPI communication thread as much.
 \section MPIDebug Debugging MPI
 
 Communication trace will be enabled when the environment variable
-\ref STARPU_MPI_COMM is set to 1, and StarPU has been configured with the
+\ref STARPU_MPI_COMM is set to \c 1, and StarPU has been configured with the
 option \ref enable-verbose "--enable-verbose".
 
 Statistics will be enabled for the communication cache when the
-environment variable \ref STARPU_MPI_CACHE_STATS is set to 1. It
+environment variable \ref STARPU_MPI_CACHE_STATS is set to \c 1. It
 prints messages on the standard output when data are added or removed
 from the received communication cache.
 
@@ -932,7 +932,7 @@ StarPU-MPI is implemented directly on top of MPI.
 Since the release 1.3.0, an implementation on top of NewMadeleine, an
 optimizing communication library for high-performance networks, is
 also provided. To use it, one needs to install NewMadeleine (see
-http://pm2.gforge.inria.fr/newmadeleine/) and enable the configure
+http://pm2.gforge.inria.fr/newmadeleine/) and enable the \c configure
 option \ref enable-nmad "--enable-nmad".
 
 Both implementations provide the same public API.
@@ -942,7 +942,7 @@ Both implementations provide the same public API.
 StarPU provides another way to execute applications across many
 nodes. The Master-Slave support permits using remote cores without
 thinking about data distribution. This support can be activated with
-the configure option \ref enable-mpi-master-slave
+the \c configure option \ref enable-mpi-master-slave
 "--enable-mpi-master-slave". However, you should not activate both MPI
 support and MPI Master-Slave support.
 
@@ -956,7 +956,7 @@ the field \ref starpu_codelet::mpi_ms_funcs.
 
 By default, one core is dedicated on the master node to manage the
 entire set of slaves. If the implementation of MPI you are using has a
-good multiple threads support, you can use the configure option
+good multithreading support, you can use the \c configure option
 \ref with-mpi-master-slave-multiple-thread "--with-mpi-master-slave-multiple-thread"
 to dedicate one core per slave.
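
For context on the starpu_codelet::mpi_ms_funcs field named above, a hedged sketch (assuming the slave-side function can reuse the CPU implementation prototype):

\code{.c}
/* sketch: provide an implementation for the MPI Master-Slave workers */
struct starpu_codelet cl =
{
	.cpu_funcs = {work_cpu},	/* hypothetical host implementation */
	.mpi_ms_funcs = {work_cpu},	/* assumed to reuse the CPU function */
	.nbuffers = 1,
	.modes = {STARPU_RW},
};
\endcode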
 

+ 3 - 3
doc/doxygen/chapters/420_fft_support.doxy

@@ -1,6 +1,6 @@
 /* StarPU --- Runtime system for heterogeneous multicore architectures.
  *
- * Copyright (C) 2010-2017                                CNRS
+ * Copyright (C) 2010-2017, 2019                          CNRS
  * Copyright (C) 2009-2011,2014,2015                      Université de Bordeaux
  * Copyright (C) 2011,2012                                Inria
  *
@@ -23,8 +23,8 @@ both <c>fftw</c> and <c>cufft</c>, the difference being that it takes benefit fr
 and GPUs. It should however be noted that GPUs do not have the same precision as
 CPUs, so the results may differ by a negligible amount.
 
-Different precisions are available, namely float, double and long
-double precisions, with the following fftw naming conventions:
+Different precisions are available, namely \c float, \c double and <c>long
+double</c> precisions, with the following \c fftw naming conventions:
 
 <ul>
 <li>

+ 9 - 9
doc/doxygen/chapters/430_mic_scc_support.doxy

@@ -1,6 +1,6 @@
 /* StarPU --- Runtime system for heterogeneous multicore architectures.
  *
- * Copyright (C) 2010-2017                                CNRS
+ * Copyright (C) 2010-2017, 2019                          CNRS
  * Copyright (C) 2011,2012,2016                           Inria
  * Copyright (C) 2009-2011,2013-2016                      Université de Bordeaux
  *
@@ -30,7 +30,7 @@ cross-compiled <c>hwloc.pc</c>.
 The script <c>mic-configure</c> can then be used to achieve the two compilations: it basically
 calls <c>configure</c> as appropriate from two new directories: <c>build_mic</c> and
 <c>build_host</c>. <c>make</c> and <c>make install</c> can then be used as usual and will
-recurse into both directories. If different configuration options are needed
+recurse into both directories. If different \c configure options are needed
 for the host and for the mic, one can use <c>--with-host-param=--with-fxt</c>
 for instance to specify the <c>--with-fxt</c> option for the host only, or
 <c>--with-mic-param=--with-fxt</c> for the mic only.
@@ -47,10 +47,10 @@ or option for the host and the device builds, for instance:
     --with-host-param=--with-mpicc=/.../mpiicc
 \endverbatim
 
-In case you have troubles with the coi or scif libraries (the Intel paths are
+In case you have troubles with the \c coi or \c scif libraries (the Intel paths are
 really not standard, it seems...), you can still make a build in native mode
 only, by using <c>mic-configure --enable-native-mic</c> (and notably without
-<c>--enable-mic</c> since in that case we don't need mic offloading support).
+<c>--enable-mic</c> since in that case we don't need \c mic offloading support).
 
 \section PortingApplicationsToMICSCC Porting Applications To MIC Xeon Phi / SCC
 
@@ -70,15 +70,15 @@ struct starpu_codelet cl =
 StarPU will thus simply use the
 existing CPU implementation (cross-rebuilt in the MIC Xeon Phi case). The
 functions have to be globally-visible (i.e. not <c>static</c>) for
-StarPU to be able to look them up, and -rdynamic must be passed to gcc (or
--export-dynamic to ld) so that symbols of the main program are visible.
+StarPU to be able to look them up, and \c -rdynamic must be passed to \c gcc (or
+\c -export-dynamic to \c ld) so that symbols of the main program are visible.
 
-If you have used the <c>.where</c> field, you additionally need to add in it
-<c>STARPU_MIC</c> for the Xeon Phi, and/or <c>STARPU_SCC</c> for the SCC.
+If you have used the starpu_codelet::where field, you additionally need to add in it
+::STARPU_MIC for the Xeon Phi, and/or ::STARPU_SCC for the SCC.
 
 For non-native MIC Xeon Phi execution, the 'main' function of the application, on the sink, should call starpu_init() immediately upon start-up; the starpu_init() function never returns. On the host, the 'main' function may freely perform application related initialization calls as usual, before calling starpu_init().
 
-For MIC Xeon Phi, the application may programmatically detect whether executing on the sink or on the host, by checking whether the STARPU_SINK environment variable is defined (on the sink) or not (on the host).
+For MIC Xeon Phi, the application may programmatically detect whether executing on the sink or on the host, by checking whether the \ref STARPU_SINK environment variable is defined (on the sink) or not (on the host).
 
 For SCC execution, the function starpu_initialize() also has to be
 used instead of starpu_init(), so as to pass <c>argc</c> and

+ 5 - 5
doc/doxygen/chapters/450_native_fortran_support.doxy

@@ -1,6 +1,6 @@
 /* StarPU --- Runtime system for heterogeneous multicore architectures.
  *
- * Copyright (C) 2016,2017                                CNRS
+ * Copyright (C) 2016,2017,2019                           CNRS
  * Copyright (C) 2014,2016                                Inria
  *
  * StarPU is free software; you can redistribute it and/or modify
@@ -22,7 +22,7 @@ most of its functionalities from Fortran 2008+ codes.
 
 All symbols (functions, constants) are defined in <c>fstarpu_mod.f90</c>.
 Every symbol of the Native Fortran support API is prefixed by
-<c>fstarpu_</c>. 
+<c>fstarpu_</c>.
 
 Note: Mixing uses of <c>fstarpu_</c> and <c>starpu_</c>
 symbols in the same Fortran code has unspecified behaviour.
@@ -50,7 +50,7 @@ The Native Fortran API is enabled and its companion
 <c>fstarpu_mod.f90</c> Fortran module source file is installed
 by default when a Fortran compiler is found, unless the detected Fortran
 compiler is known not to support the requirements for the Native Fortran
-API. The support can be disabled through the configure option \ref
+API. The support can be disabled through the \c configure option \ref
 disable-fortran "--disable-fortran". Conditional compiled source codes
 may check for the availability of the Native Fortran Support by testing
 whether the preprocessor macro <c>STARPU_HAVE_FC</c> is defined or not.
@@ -157,7 +157,7 @@ the runtime engine and frees all internal StarPU data structures.
 \section InsertTask Fortran Flavor of StarPU's Variadic Insert_task
 
 Fortran does not have a construction similar to C variadic functions on which
-<c>starpu_insert_task</c> relies at the time of this writing. However, Fortran's variable
+starpu_insert_task() relies at the time of this writing. However, Fortran's variable
 length arrays of <c>c_ptr</c> elements make it possible to emulate much of the
 convenience of C's variadic functions. This is the approach retained for
 implementing <c>fstarpu_insert_task</c>.
@@ -228,7 +228,7 @@ submit tasks to StarPU. Then, when StarPU starts a task, another C
 wrapper function calls the FORTRAN routine for the task.
 
 Note that this marshalled FORTRAN support remains available even
-when specifying configure option \ref disable-fortran "--disable-fortran"
+when specifying \c configure option \ref disable-fortran "--disable-fortran"
 (which only disables StarPU's native Fortran layer).
 
 \subsection APIMIX Valid API Mixes and Language Mixes

+ 27 - 27
doc/doxygen/chapters/470_simgrid.doxy

@@ -1,7 +1,7 @@
 /* StarPU --- Runtime system for heterogeneous multicore architectures.
  *
  * Copyright (C) 2011,2012,2014,2016,2017                 Inria
- * Copyright (C) 2010-2018                                CNRS
+ * Copyright (C) 2010-2019                                CNRS
  * Copyright (C) 2009-2011,2014-2018                      Université de Bordeaux
  * Copyright (C) 2009-2011,2014-2018                      Université de Bordeaux,
  *
@@ -34,18 +34,18 @@ performance measurements, the real time will be used, which will be bogus. To
 get the simulated time, the application has to use starpu_timing_now(), which returns the
 virtual timestamp in us.
 
-For some technical reason, the application's .c file which contains main() has
-to be recompiled with starpu_simgrid_wrap.h, which in the simgrid case will # define main()
-into starpu_main(), and it is libstarpu which will provide the real main() and
-will call the application's main().
+For some technical reason, the application's .c file which contains \c main() has
+to be recompiled with \c starpu_simgrid_wrap.h, which in the SimGrid case will <c># define main()</c>
+into <c>starpu_main()</c>, and it is \c libstarpu which will provide the real \c main() and
+will call the application's \c main().
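
A hedged sketch of that recompilation requirement (nothing else in the file is assumed to change):

\code{.c}
/* sketch: in SimGrid builds this include redefines main() into starpu_main() */
#include <starpu.h>
#include <starpu_simgrid_wrap.h>

int main(int argc, char **argv)
{
	/* usual StarPU application code */
	return 0;
}
\endcode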
 
 To be able to test with crazy data sizes, one may want to only allocate
-application data if STARPU_SIMGRID is not defined.  Passing a <c>NULL</c> pointer to
-starpu_data_register functions is fine, data will never be read/written to by
+application data if the macro \c STARPU_SIMGRID is not defined.  Passing a <c>NULL</c> pointer to
+\c starpu_data_register functions is fine, data will never be read/written to by
 StarPU in Simgrid mode anyway.
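
For instance, a hedged sketch of guarding the allocation (variable names hypothetical):

\code{.c}
#ifdef STARPU_SIMGRID
float *v = NULL;			/* no real allocation in simulation */
#else
float *v = malloc(n * sizeof(float));	/* real executions need the buffer */
#endif
starpu_vector_data_register(&h, STARPU_MAIN_RAM, (uintptr_t) v, n, sizeof(float));
\endcode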
 
 To be able to run the application with e.g. CUDA simulation on a system which
-does not have CUDA installed, one can fill the cuda_funcs with (void*)1, to
+does not have CUDA installed, one can fill the starpu_codelet::cuda_funcs with \c (void*)1, to
 express that there is a CUDA implementation, even if one does not actually
 provide it. StarPU will not actually run it in Simgrid mode anyway by default
 (unless the ::STARPU_CODELET_SIMGRID_EXECUTE or ::STARPU_CODELET_SIMGRID_EXECUTE_AND_INJECT
@@ -76,7 +76,7 @@ run several times before the model is calibrated.
 \section Simulation Simulation
 
 Then, recompile StarPU, passing \ref enable-simgrid "--enable-simgrid"
-to <c>./configure</c>. Make sure to keep all other <c>./configure</c> options
+to <c>configure</c>. Make sure to keep all other <c>configure</c> options
 the same, and notably options such as <c>--enable-maxcudadev</c>.
 
 \verbatim
@@ -84,7 +84,7 @@ $ ./configure --enable-simgrid
 \endverbatim
 
 To specify the location of SimGrid, you can either set the environment
-variables SIMGRID_CFLAGS and SIMGRID_LIBS, or use the configure
+variables \c SIMGRID_CFLAGS and \c SIMGRID_LIBS, or use the \c configure
 options \ref with-simgrid-dir "--with-simgrid-dir",
 \ref with-simgrid-include-dir "--with-simgrid-include-dir" and
 \ref with-simgrid-lib-dir "--with-simgrid-lib-dir", for example
@@ -102,7 +102,7 @@ TEST FAILED !!!
 \endverbatim
 
 It is normal that the test fails: since the computations are not actually done
-(that is the whole point of simgrid), the result is wrong, of course.
+(that is the whole point of SimGrid), the result is wrong, of course.
 
 If the performance model is not calibrated enough, the following error
 message will be displayed
@@ -123,7 +123,7 @@ with \ref STARPU_LIMIT_CUDA_MEM, \ref STARPU_LIMIT_CUDA_devid_MEM,
 
 \section SimulationOnAnotherMachine Simulation On Another Machine
 
-The simgrid support even permits to perform simulations on another machine, your
+The SimGrid support even permits performing simulations on another machine, your
 desktop, typically. To achieve this, one still needs to perform the Calibration
 step on the actual machine to be simulated, then copy the obtained models to your desktop
 machine (the <c>$STARPU_HOME/.starpu</c> directory). One can then perform the
@@ -133,16 +133,16 @@ make StarPU use the performance models of the simulated machine even
 on the desktop machine.
 
 If the desktop machine does not have CUDA or OpenCL, StarPU is still able to
-use simgrid to simulate execution with CUDA/OpenCL devices, but the application
-source code will probably disable the CUDA and OpenCL codelets in thatcd sc
-case. Since during simgrid execution, the functions of the codelet are actually
+use SimGrid to simulate execution with CUDA/OpenCL devices, but the application
+source code will probably disable the CUDA and OpenCL codelets in that
+case. Since during SimGrid execution, the functions of the codelet are actually
 not called by default, one can use dummy functions such as the following to
 still permit CUDA or OpenCL execution.
 
 \section SimulationExamples Simulation Examples
 
-StarPU ships a few performance models for a couple of systems: attila,
-mirage, idgraf, and sirocco. See section \ref SimulatedBenchmarks for the details.
+StarPU ships a few performance models for a couple of systems: \c attila,
+\c mirage, \c idgraf, and \c sirocco. See Section \ref SimulatedBenchmarks for the details.
 
 \section FakeSimulations Simulations On Fake Machines
 
@@ -157,36 +157,36 @@ The simulation can be tweaked, to be able to tune it between a very accurate
 simulation and a very simple simulation (which is thus close to scheduling
 theory results), see the \ref STARPU_SIMGRID_CUDA_MALLOC_COST,
 \ref STARPU_SIMGRID_CUDA_QUEUE_COST, \ref STARPU_SIMGRID_TASK_SUBMIT_COST,
-\ref STARPU_SIMGRID_FETCHING_INPUT_COST and STARPU_SIMGRID_SCHED_COST environment variables.
+\ref STARPU_SIMGRID_FETCHING_INPUT_COST and \ref STARPU_SIMGRID_SCHED_COST environment variables.
 
 \section SimulationMPIApplications MPI Applications
 
-StarPU-MPI applications can also be run in simgrid mode. It needs to be compiled
-with smpicc, and run using the <c>starpu_smpirun</c> script, for instance:
+StarPU-MPI applications can also be run in SimGrid mode. It needs to be compiled
+with \c smpicc, and run using the <c>starpu_smpirun</c> script, for instance:
 
 \verbatim
 $ STARPU_SCHED=dmda starpu_smpirun -platform cluster.xml -hostfile hostfile ./mpi/tests/pingpong
 \endverbatim
 
-Where cluster.xml is a Simgrid-MPI platform description, and hostfile the
+Where \c cluster.xml is a SimGrid-MPI platform description, and \c hostfile the
 list of MPI nodes to be used. StarPU currently only supports homogeneous MPI
 clusters: for each MPI node it will just replicate the architecture referred to by
 \ref STARPU_HOSTNAME.
 
 \section SimulationDebuggingApplications Debugging Applications
 
-By default, simgrid uses its own implementation of threads, which prevents gdb
+By default, SimGrid uses its own implementation of threads, which prevents \c gdb
 from being able to inspect stacks of all threads.  To be able to fully debug an
-application running with simgrid, pass the <c>--cfg=contexts/factory:thread</c>
-option to the application, to make simgrid use system threads, which gdb will be
+application running with SimGrid, pass the <c>--cfg=contexts/factory:thread</c>
+option to the application, to make SimGrid use system threads, which \c gdb will be
 able to manipulate as usual.
 
-It is also worth noting Simgrid 3.21's new parameter
+It is also worth noting SimGrid 3.21's new parameter
 <c>--cfg=simix/breakpoint</c>, which allows putting a breakpoint at a precise
 (deterministic!) timing of the execution. If for instance in an execution
 trace we see that something odd is happening at time 19000ms, we can use
-<c>--cfg=simix/breakpoint:19.000</c> and SIGTRAP will be raised at that point,
-which will thus interrupt execution within gdb, allowing to inspect e.g.
+<c>--cfg=simix/breakpoint:19.000</c> and \c SIGTRAP will be raised at that point,
+which will thus interrupt execution within \c gdb, allowing one to inspect e.g.
 scheduler state, etc.
 
 \section SimulationMemoryUsage Memory Usage

+ 5 - 5
doc/doxygen/chapters/480_openmp_runtime_support.doxy

@@ -1,6 +1,6 @@
 /* StarPU --- Runtime system for heterogeneous multicore architectures.
  *
- * Copyright (C) 2014-2017                                CNRS
+ * Copyright (C) 2014-2017, 2019                          CNRS
  * Copyright (C) 2014                                     Inria
  *
  * StarPU is free software; you can redistribute it and/or modify
@@ -49,7 +49,7 @@ SORS API functions inherit from extended semantics.
 \section Configuration Configuration
 
 The SORS can be compiled into <c>libstarpu</c> through
-the configure option \ref enable-openmp "--enable-openmp".
+the \c configure option \ref enable-openmp "--enable-openmp".
 Conditional compiled source codes may check for the
 availability of the OpenMP Runtime Support by testing whether the C
 preprocessor macro <c>STARPU_OPENMP</c> is defined or not.
@@ -137,7 +137,7 @@ implementations. The SORS supports <c>static</c>, <c>dynamic</c>, and
 <c>guided</c> loop scheduling clauses. The <c>auto</c> scheduling clause
 is implemented as <c>static</c>. The <c>runtime</c> scheduling clause
 honors the scheduling mode selected through the environment variable
-OMP_SCHEDULE or the starpu_omp_set_schedule() function. For loops with
+\c OMP_SCHEDULE or the starpu_omp_set_schedule() function. For loops with
 the <c>ordered</c> clause are also supported. An implicit barrier can be
 enforced or skipped at the end of the worksharing construct, according
 to the value of the <c>nowait</c> parameter.
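
A minimal sketch of selecting the schedule at runtime (the enum constant name is an assumption, mirroring OpenMP's <c>omp_sched_dynamic</c>):

\code{.c}
/* sketch: dynamic schedule with a chunk size of 16 for 'runtime' loops */
starpu_omp_set_schedule(starpu_omp_sched_dynamic, 16);
\endcode
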
@@ -317,7 +317,7 @@ void parallel_region_f(void *buffers[], void *args)
 \subsection DataDependencies Data Dependencies
 The SORS implements inter-tasks data dependencies as specified in OpenMP
 4.0. Data dependencies are expressed using regular StarPU data handles
-(starpu_data_handle_t) plugged into the task's <c>attr.cl</c>
+(\ref starpu_data_handle_t) plugged into the task's <c>attr.cl</c>
 codelet. The family of starpu_vector_data_register() -like functions and the
 starpu_data_lookup() function may be used to register a memory area and
 to retrieve the current data handle associated with a pointer
@@ -452,7 +452,7 @@ incur less processing overhead than Nestable Locks.
 \subsection Critical Critical Sections
 
 The SORS implements support for OpenMP critical sections through the
-family of starpu_omp_critical functions. Critical sections may optionally
+family of \ref starpu_omp_critical functions. Critical sections may optionally
 be named. There is a single, common anonymous critical section. Mutual
 exclusion only occurs within the scope of a single critical section, either
 a named one or the anonymous one.

+ 38 - 38
doc/doxygen/chapters/501_environment_variables.doxy

@@ -1,7 +1,7 @@
 /* StarPU --- Runtime system for heterogeneous multicore architectures.
  *
  * Copyright (C) 2011-2013,2015-2017                      Inria
- * Copyright (C) 2010-2018                                CNRS
+ * Copyright (C) 2010-2019                                CNRS
  * Copyright (C) 2009-2011,2013-2018                      Université de Bordeaux
  * Copyright (C) 2016                                     Uppsala University
  *
@@ -44,7 +44,7 @@ Specify the number of CPU cores that should not be used by StarPU, so the
 application can use starpu_get_next_bindid() and starpu_bind_thread_on() to bind
 its own threads.
 
-This option is ignored if \ref STARPU_NCPU or starpu_config::ncpus is set.
+This option is ignored if \ref STARPU_NCPU or starpu_conf::ncpus is set.
 </dd>
 
 <dt>STARPU_NCPUS</dt>
@@ -79,9 +79,9 @@ which will be concurrently running on the devices. The default value is 1.
 \addindex __env__STARPU_CUDA_THREAD_PER_WORKER
 Specify whether the CUDA driver should use one thread per stream (1) or
 a single thread to drive all the streams of the device or all devices (0), and
-STARPU_CUDA_THREAD_PER_DEV determines whether is it one thread per device or one
+\ref STARPU_CUDA_THREAD_PER_DEV determines whether it is one thread per device or one
 thread for all devices. The default value is 0. Setting it to 1 is contradictory
-with setting STARPU_CUDA_THREAD_PER_DEV.
+with setting \ref STARPU_CUDA_THREAD_PER_DEV.
 </dd>
 
 <dt>STARPU_CUDA_THREAD_PER_DEV</dt>
@@ -90,8 +90,8 @@ with setting STARPU_CUDA_THREAD_PER_DEV.
 \addindex __env__STARPU_CUDA_THREAD_PER_DEV
 Specify whether the CUDA driver should use one thread per device (1) or a
 single thread to drive all the devices (0). The default value is 1.  It does not
-make sense to set this variable if STARPU_CUDA_THREAD_PER_WORKER is set to to 1
-(since STARPU_CUDA_THREAD_PER_DEV is then meaningless).
+make sense to set this variable if \ref STARPU_CUDA_THREAD_PER_WORKER is set to 1
+(since \ref STARPU_CUDA_THREAD_PER_DEV is then meaningless).
 </dd>
 
 <dt>STARPU_CUDA_PIPELINE</dt>
@@ -482,7 +482,7 @@ Note: this currently only applies to <c>dm</c> and <c>dmda</c> scheduling polici
 <dd>
 \anchor STARPU_CALIBRATE_MINIMUM
 \addindex __env__STARPU_CALIBRATE_MINIMUM
-This defines the minimum number of calibration measurements that will be made
+Define the minimum number of calibration measurements that will be made
 before considering that the performance model is calibrated. The default value is 10.
 </dd>
 
@@ -497,7 +497,7 @@ If this variable is set to 1, the bus is recalibrated during intialization.
 <dd>
 \anchor STARPU_PREFETCH
 \addindex __env__STARPU_PREFETCH
-This variable indicates whether data prefetching should be enabled (0 means
+Indicate whether data prefetching should be enabled (0 means
 that it is disabled). If prefetching is enabled, when a task is scheduled to be
 executed e.g. on a GPU, StarPU will request an asynchronous transfer in
 advance, so that data is already present on the GPU when the task starts. As a
@@ -695,7 +695,7 @@ When unset or set to 1, simulate within simgrid the GPU transfer queueing.
 <dd>
 \anchor STARPU_MALLOC_SIMULATION_FOLD
 \addindex __env__STARPU_MALLOC_SIMULATION_FOLD
-This defines the size of the file used for folding virtual allocation, in
+Define the size of the file used for folding virtual allocation, in
 MiB. The default is 1, thus allowing 64GiB virtual memory when Linux's
 <c>sysctl vm.max_map_count</c> value is the default 65535.
 </dd>
@@ -738,7 +738,7 @@ it also makes simulation non-deterministic.
 <dd>
 \anchor STARPU_HOME
 \addindex __env__STARPU_HOME
-This specifies the main directory in which StarPU stores its
+Specify the main directory in which StarPU stores its
 configuration files. The default is <c>$HOME</c> on Unix environments,
 and <c>$USERPROFILE</c> on Windows environments.
 </dd>
@@ -748,7 +748,7 @@ and <c>$USERPROFILE</c> on Windows environments.
 \anchor STARPU_PATH
 \addindex __env__STARPU_PATH
 Only used on  Windows environments.
-This specifies the main directory in which StarPU is installed
+Specify the main directory in which StarPU is installed
 (\ref RunningABasicStarPUApplicationOnMicrosoft)
 </dd>
 
@@ -756,7 +756,7 @@ This specifies the main directory in which StarPU is installed
 <dd>
 \anchor STARPU_PERF_MODEL_DIR
 \addindex __env__STARPU_PERF_MODEL_DIR
-This specifies the main directory in which StarPU stores its
+Specify the main directory in which StarPU stores its
 performance model files. The default is <c>$STARPU_HOME/.starpu/sampling</c>.
 </dd>
 
@@ -834,7 +834,7 @@ machines by setting <c>export STARPU_HOSTNAME=some_global_name</c>.
 <dd>
 \anchor STARPU_OPENCL_PROGRAM_DIR
 \addindex __env__STARPU_OPENCL_PROGRAM_DIR
-This specifies the directory where the OpenCL codelet source files are
+Specify the directory where the OpenCL codelet source files are
 located. The function starpu_opencl_load_program_source() looks
 for the codelet in the current directory, in the directory specified
 by the environment variable \ref STARPU_OPENCL_PROGRAM_DIR, in the
@@ -846,37 +846,37 @@ StarPU, and finally in the source directory of StarPU.
 <dd>
 \anchor STARPU_SILENT
 \addindex __env__STARPU_SILENT
-This variable allows to disable verbose mode at runtime when StarPU
-has been configured with the option \ref enable-verbose "--enable-verbose". It also
-disables the display of StarPU information and warning messages.
+Allow to disable verbose mode at runtime when StarPU
+has been configured with the option \ref enable-verbose "--enable-verbose". Also
+disable the display of StarPU information and warning messages.
 </dd>
 
 <dt>STARPU_LOGFILENAME</dt>
 <dd>
 \anchor STARPU_LOGFILENAME
 \addindex __env__STARPU_LOGFILENAME
-This variable specifies in which file the debugging output should be saved to.
+Specify the file in which the debugging output should be saved.
 </dd>
 
 <dt>STARPU_FXT_PREFIX</dt>
 <dd>
 \anchor STARPU_FXT_PREFIX
 \addindex __env__STARPU_FXT_PREFIX
-This variable specifies in which directory to save the trace generated if FxT is enabled. It needs to have a trailing '/' character.
+Specify in which directory to save the trace generated if FxT is enabled. It needs to have a trailing '/' character.
 </dd>
 
 <dt>STARPU_FXT_TRACE</dt>
 <dd>
 \anchor STARPU_FXT_TRACE
 \addindex __env__STARPU_FXT_TRACE
-This variable specifies whether to generate (1) or not (0) the FxT trace in /tmp/prof_file_XXX_YYY . The default is 1 (generate it)
+Specify whether to generate (1) or not (0) the FxT trace in /tmp/prof_file_XXX_YYY. The default is 1 (generate it).
 </dd>
 
 <dt>STARPU_LIMIT_CUDA_devid_MEM</dt>
 <dd>
 \anchor STARPU_LIMIT_CUDA_devid_MEM
 \addindex __env__STARPU_LIMIT_CUDA_devid_MEM
-This variable specifies the maximum number of megabytes that should be
+Specify the maximum number of megabytes that should be
 available to the application on the CUDA device with the identifier
 <c>devid</c>. This variable is intended to be used for experimental
 purposes as it emulates devices that have a limited amount of memory.
@@ -888,7 +888,7 @@ When defined, the variable overwrites the value of the variable
 <dd>
 \anchor STARPU_LIMIT_CUDA_MEM
 \addindex __env__STARPU_LIMIT_CUDA_MEM
-This variable specifies the maximum number of megabytes that should be
+Specify the maximum number of megabytes that should be
 available to the application on each CUDA device. This variable is
 intended to be used for experimental purposes as it emulates devices
 that have a limited amount of memory.
@@ -898,7 +898,7 @@ that have a limited amount of memory.
 <dd>
 \anchor STARPU_LIMIT_OPENCL_devid_MEM
 \addindex __env__STARPU_LIMIT_OPENCL_devid_MEM
-This variable specifies the maximum number of megabytes that should be
+Specify the maximum number of megabytes that should be
 available to the application on the OpenCL device with the identifier
 <c>devid</c>. This variable is intended to be used for experimental
 purposes as it emulates devices that have a limited amount of memory.
@@ -910,7 +910,7 @@ When defined, the variable overwrites the value of the variable
 <dd>
 \anchor STARPU_LIMIT_OPENCL_MEM
 \addindex __env__STARPU_LIMIT_OPENCL_MEM
-This variable specifies the maximum number of megabytes that should be
+Specify the maximum number of megabytes that should be
 available to the application on each OpenCL device. This variable is
 intended to be used for experimental purposes as it emulates devices
 that have a limited amount of memory.
@@ -920,7 +920,7 @@ that have a limited amount of memory.
 <dd>
 \anchor STARPU_LIMIT_CPU_MEM
 \addindex __env__STARPU_LIMIT_CPU_MEM
-This variable specifies the maximum number of megabytes that should be
+Specify the maximum number of megabytes that should be
 available to the application in the main CPU memory. Setting it enables allocation
 cache in main memory. Setting it to zero lets StarPU overflow memory.
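+
+For example, to cap the main memory usage at 4 GiB (an arbitrary figure):
+\verbatim
+$ export STARPU_LIMIT_CPU_MEM=4096
+\endverbatim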
 </dd>
@@ -929,7 +929,7 @@ cache in main memory. Setting it to zero lets StarPU overflow memory.
 <dd>
 \anchor STARPU_LIMIT_CPU_NUMA_devid_MEM
 \addindex __env__STARPU_LIMIT_CPU_NUMA_devid_MEM
-This variable specifies the maximum number of megabytes that should be
+Specify the maximum number of megabytes that should be
 available to the application on the NUMA node with the OS identifier <c>devid</c>.
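+
+For instance, again substituting the OS identifier in the variable name, a
+hypothetical 2 GiB cap on NUMA node 1 would read:
+\verbatim
+$ export STARPU_LIMIT_CPU_NUMA_1_MEM=2048
+\endverbatim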
 </dd>
 
@@ -937,7 +937,7 @@ available to the application on the NUMA node with the OS identifier <c>devid</c
 <dd>
 \anchor STARPU_MINIMUM_AVAILABLE_MEM
 \addindex __env__STARPU_MINIMUM_AVAILABLE_MEM
-This specifies the minimum percentage of memory that should be available in GPUs
+Specify the minimum percentage of memory that should be available in GPUs
 (or in main memory, when using out of core), below which a reclaiming pass is
 performed. The default is 0%.
 </dd>
@@ -946,7 +946,7 @@ performed. The default is 0%.
 <dd>
 \anchor STARPU_TARGET_AVAILABLE_MEM
 \addindex __env__STARPU_TARGET_AVAILABLE_MEM
-This specifies the target percentage of memory that should be reached in
+Specify the target percentage of memory that should be reached in
 GPUs (or in main memory, when using out of core), when performing a periodic
 reclaiming pass. The default is 0%.
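+
+The two thresholds are naturally set together; an illustrative pairing could
+be:
+\verbatim
+# reclaim below 5% free, stop once 10% is free again (arbitrary values)
+$ export STARPU_MINIMUM_AVAILABLE_MEM=5
+$ export STARPU_TARGET_AVAILABLE_MEM=10
+\endverbatim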
 </dd>
@@ -955,7 +955,7 @@ reclaiming pass. The default is 0%.
 <dd>
 \anchor STARPU_MINIMUM_CLEAN_BUFFERS
 \addindex __env__STARPU_MINIMUM_CLEAN_BUFFERS
-This specifies the minimum percentage of number of buffers that should be clean in GPUs
+Specify the minimum percentage of buffers that should be clean in GPUs
 (or in main memory, when using out of core), below which asynchronous writebacks will be
 issued. The default is 5%.
 </dd>
@@ -964,7 +964,7 @@ issued. The default is 5%.
 <dd>
 \anchor STARPU_TARGET_CLEAN_BUFFERS
 \addindex __env__STARPU_TARGET_CLEAN_BUFFERS
-This specifies the target percentage of number of buffers that should be reached in
+Specify the target percentage of clean buffers that should be reached in
 GPUs (or in main memory, when using out of core), when performing an asynchronous
 writeback pass. The default is 10%.
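+
+Similarly, a pairing stricter than the defaults could be (arbitrary values):
+\verbatim
+$ export STARPU_MINIMUM_CLEAN_BUFFERS=10
+$ export STARPU_TARGET_CLEAN_BUFFERS=20
+\endverbatim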
 </dd>
@@ -982,7 +982,7 @@ can lead to deadlocks, so is to be considered experimental only.
 <dd>
 \anchor STARPU_DISK_SWAP
 \addindex __env__STARPU_DISK_SWAP
-This specifies a path where StarPU can push data when the main memory is getting
+Specify a path where StarPU can push data when the main memory is getting
 full.
 </dd>
 
@@ -990,7 +990,7 @@ full.
 <dd>
 \anchor STARPU_DISK_SWAP_BACKEND
 \addindex __env__STARPU_DISK_SWAP_BACKEND
-This specifies then backend to be used by StarPU to push data when the main
+Specify the backend to be used by StarPU to push data when the main
 memory is getting full. The default is unistd (i.e. using read/write functions);
 other values are stdio (i.e. using fread/fwrite), unistd_o_direct (i.e. using
 read/write with O_DIRECT), leveldb (i.e. using a leveldb database), and hdf5
@@ -1001,7 +1001,7 @@ read/write with O_DIRECT), leveldb (i.e. using a leveldb database), and hdf5
 <dd>
 \anchor STARPU_DISK_SWAP_SIZE
 \addindex __env__STARPU_DISK_SWAP_SIZE
-This specifies then maximum size in MiB to be used by StarPU to push data when the main
+Specify the maximum size in MiB to be used by StarPU to push data when the main
 memory is getting full. The default is unlimited.
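+
+As a sketch, the three disk-swap variables can be combined; the path, backend
+and size below are illustrative:
+\verbatim
+$ export STARPU_DISK_SWAP=/tmp/starpu_swap
+$ export STARPU_DISK_SWAP_BACKEND=unistd_o_direct
+$ export STARPU_DISK_SWAP_SIZE=2048   # in MiB
+\endverbatim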
 </dd>
 
@@ -1009,7 +1009,7 @@ memory is getting full. The default is unlimited.
 <dd>
 \anchor STARPU_LIMIT_MAX_SUBMITTED_TASKS
 \addindex __env__STARPU_LIMIT_MAX_SUBMITTED_TASKS
-This variable allows the user to control the task submission flow by specifying
+Allow users to control the task submission flow by specifying
 to StarPU a maximum number of submitted tasks allowed at a given time: when
 this limit is reached, task submission becomes blocking until enough tasks have
 completed, as specified by \ref STARPU_LIMIT_MIN_SUBMITTED_TASKS.
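+
+For instance, a hypothetical throttling setup, together with
+\ref STARPU_LIMIT_MIN_SUBMITTED_TASKS described below, could be:
+\verbatim
+# block submission at 10000 pending tasks, resume below 9000 (arbitrary values)
+$ export STARPU_LIMIT_MAX_SUBMITTED_TASKS=10000
+$ export STARPU_LIMIT_MIN_SUBMITTED_TASKS=9000
+\endverbatim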
@@ -1020,7 +1020,7 @@ Setting it enables allocation cache buffer reuse in main memory.
 <dd>
 \anchor STARPU_LIMIT_MIN_SUBMITTED_TASKS
 \addindex __env__STARPU_LIMIT_MIN_SUBMITTED_TASKS
-This variable allows the user to control the task submission flow by specifying
+Allow users to control the task submission flow by specifying
 to StarPU the number of submitted tasks below which task submission is unblocked. This
 variable has to be used in conjunction with \ref STARPU_LIMIT_MAX_SUBMITTED_TASKS
 which puts the task submission thread to
@@ -1031,7 +1031,7 @@ sleep.  Setting it enables allocation cache buffer reuse in main memory.
 <dd>
 \anchor STARPU_TRACE_BUFFER_SIZE
 \addindex __env__STARPU_TRACE_BUFFER_SIZE
-This sets the buffer size for recording trace events in MiB. Setting it to a big
+Set the buffer size, in MiB, for recording trace events. Setting it to a large
 size avoids pauses in the trace while it is recorded to disk, but this
 also consumes more memory. The default value is 64.
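+
+For example, to quadruple the default buffer size:
+\verbatim
+$ export STARPU_TRACE_BUFFER_SIZE=256
+\endverbatim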
 </dd>
@@ -1040,7 +1040,7 @@ however also consumes memory, of course. The default value is 64.
 <dd>
 \anchor STARPU_GENERATE_TRACE
 \addindex __env__STARPU_GENERATE_TRACE
-When set to <c>1</c>, this variable indicates that StarPU should automatically
+When set to <c>1</c>, make StarPU automatically
 generate a Paje trace when starpu_shutdown() is called.
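+
+For example:
+\verbatim
+$ export STARPU_GENERATE_TRACE=1
+\endverbatim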
 </dd>
 
@@ -1117,7 +1117,7 @@ be used in combination with \ref STARPU_WATCHDOG_CRASH
 <dd>
 \anchor STARPU_WATCHDOG_CRASH
 \addindex __env__STARPU_WATCHDOG_CRASH
-When set to a value other than 0, it triggers a crash when the watch
+When set to a value other than 0, trigger a crash when the watchdog
 is triggered, thus allowing one to catch the situation in gdb, etc.
 (see \ref DetectionStuckConditions).
 </dd>
@@ -1126,7 +1126,7 @@ dog is reached, thus allowing to catch the situation in gdb, etc
 <dd>
 \anchor STARPU_WATCHDOG_DELAY
 \addindex __env__STARPU_WATCHDOG_DELAY
-This delays the activation of the watchdog by the given time (in µs). This can
+Delay the activation of the watchdog by the given time (in µs). This can
 be convenient for letting the application initialize data etc. before starting
 to look for idle time.
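+
+An illustrative combination with the other watchdog variables (the values, in
+µs, are arbitrary, and \ref STARPU_WATCHDOG_TIMEOUT is assumed to be the
+timeout variable documented above):
+\verbatim
+$ export STARPU_WATCHDOG_TIMEOUT=10000000  # watch for 10s without a task completing
+$ export STARPU_WATCHDOG_DELAY=2000000     # but not during the first 2s
+$ export STARPU_WATCHDOG_CRASH=1
+\endverbatim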
 </dd>

+ 2 - 2
doc/doxygen/chapters/601_scaling_vector_example.doxy

@@ -1,6 +1,6 @@
 /* StarPU --- Runtime system for heterogeneous multicore architectures.
  *
- * Copyright (C) 2010-2017                                CNRS
+ * Copyright (C) 2010-2017, 2019                          CNRS
  * Copyright (C) 2009-2011,2014                           Université de Bordeaux
  * Copyright (C) 2011,2012                                Inria
  *
@@ -28,7 +28,7 @@
 
 \section CUDAKernel CUDA Kernel
 
-\snippet vector_scal_cuda.cu To be included. You should update doxygen if you see this text.
+\snippet vector_scal_cuda.c To be included. You should update doxygen if you see this text.
 
 \section OpenCLKernel OpenCL Kernel
 

doc/doxygen/chapters/code/vector_scal_cuda.cu → doc/doxygen/chapters/code/vector_scal_cuda.c