
doc/doxygen: improve doc

Nathalie Furmento, 12 years ago
Parent commit: e58ed783fc
39 changed files with 1068 additions and 985 deletions
  1. +139 -128 doc/doxygen/chapters/advanced_examples.doxy
  2. +52 -52 doc/doxygen/chapters/api/codelet_and_tasks.doxy
  3. +12 -12 doc/doxygen/chapters/api/cuda_extensions.doxy
  4. +130 -130 doc/doxygen/chapters/api/data_interfaces.doxy
  5. +33 -33 doc/doxygen/chapters/api/data_management.doxy
  6. +31 -31 doc/doxygen/chapters/api/data_partition.doxy
  7. +4 -4 doc/doxygen/chapters/api/expert_mode.doxy
  8. +10 -10 doc/doxygen/chapters/api/explicit_dependencies.doxy
  9. +11 -11 doc/doxygen/chapters/api/fft_support.doxy
  10. +7 -7 doc/doxygen/chapters/api/fxt_support.doxy
  11. +4 -4 doc/doxygen/chapters/api/implicit_dependencies.doxy
  12. +12 -12 doc/doxygen/chapters/api/initialization.doxy
  13. +13 -13 doc/doxygen/chapters/api/insert_task.doxy
  14. +8 -8 doc/doxygen/chapters/api/lower_bound.doxy
  15. +3 -3 doc/doxygen/chapters/api/misc_helpers.doxy
  16. +38 -38 doc/doxygen/chapters/api/mpi.doxy
  17. +8 -8 doc/doxygen/chapters/api/multiformat_data_interface.doxy
  18. +34 -34 doc/doxygen/chapters/api/opencl_extensions.doxy
  19. +9 -9 doc/doxygen/chapters/api/parallel_tasks.doxy
  20. +31 -31 doc/doxygen/chapters/api/performance_model.doxy
  21. +18 -18 doc/doxygen/chapters/api/profiling.doxy
  22. +6 -6 doc/doxygen/chapters/api/running_driver.doxy
  23. +33 -33 doc/doxygen/chapters/api/scheduling_context_hypervisor.doxy
  24. +41 -41 doc/doxygen/chapters/api/scheduling_contexts.doxy
  25. +23 -23 doc/doxygen/chapters/api/scheduling_policy.doxy
  26. +9 -9 doc/doxygen/chapters/api/standard_memory_library.doxy
  27. +9 -9 doc/doxygen/chapters/api/task_bundles.doxy
  28. +14 -14 doc/doxygen/chapters/api/task_lists.doxy
  29. +38 -38 doc/doxygen/chapters/api/top.doxy
  30. +5 -5 doc/doxygen/chapters/api/versioning.doxy
  31. +31 -31 doc/doxygen/chapters/api/workers.doxy
  32. +41 -41 doc/doxygen/chapters/basic_examples.doxy
  33. +11 -12 doc/doxygen/chapters/c_extensions.doxy
  34. +1 -1 doc/doxygen/chapters/fft_support.doxy
  35. +14 -15 doc/doxygen/chapters/mpi_support.doxy
  36. +27 -24 doc/doxygen/chapters/optimize_performance.doxy
  37. +51 -47 doc/doxygen/chapters/performance_feedback.doxy
  38. +54 -21 doc/doxygen/chapters/scheduling_context_hypervisor.doxy
  39. +53 -19 doc/doxygen/chapters/scheduling_contexts.doxy

+ 139 - 128
doc/doxygen/chapters/advanced_examples.doxy

@@ -52,8 +52,8 @@ implementations it was given, and pick the one that seems to be the fastest.
 Some implementations may not run on some devices. For instance, some CUDA
 devices do not support double floating point precision, and thus the kernel
 execution would just fail; or the device may not have enough shared memory for
-the implementation being used. The <c>can_execute</c> field of the <c>struct
-starpu_codelet</c> structure permits to express this. For instance:
+the implementation being used. The field starpu_codelet::can_execute
+permits to express this. For instance:
 
 \code{.c}
 static int can_execute(unsigned workerid, struct starpu_task *task, unsigned nimpl)
@@ -83,10 +83,11 @@ struct starpu_codelet cl = {
 This can be essential e.g. when running on a machine which mixes various models
 of CUDA devices, to take benefit from the new models without crashing on old models.
 
-Note: the <c>can_execute</c> function is called by the scheduler each time it
-tries to match a task with a worker, and should thus be very fast. The
-starpu_cuda_get_device_properties() provides a quick access to CUDA
-properties of CUDA devices to achieve such efficiency.
+Note: the function starpu_codelet::can_execute is called by the
+scheduler each time it tries to match a task with a worker, and should
+thus be very fast. The function starpu_cuda_get_device_properties()
+provides a quick access to CUDA properties of CUDA devices to achieve
+such efficiency.
 
 Another example is compiling CUDA code for various compute capabilities,
 resulting with two CUDA functions, e.g. <c>scal_gpu_13</c> for compute capability
@@ -208,8 +209,8 @@ starpu_data_filter f =
 starpu_data_partition(handle, &f);
 \endcode
 
-The task submission then uses starpu_data_get_sub_data() to retrieve the
-sub-handles to be passed as tasks parameters.
+The task submission then uses the function starpu_data_get_sub_data()
+to retrieve the sub-handles to be passed as tasks parameters.
 
 \code{.c}
 /* Submit a task on each sub-vector */
@@ -268,17 +269,18 @@ but applications can also write their own data interfaces and filters, see
 
 To achieve good scheduling, StarPU scheduling policies need to be able to
 estimate in advance the duration of a task. This is done by giving to codelets
-a performance model, by defining a <c>starpu_perfmodel</c> structure and
-providing its address in the <c>model</c> field of the <c>struct starpu_codelet</c>
-structure. The <c>symbol</c> and <c>type</c> fields of <c>starpu_perfmodel</c>
-are mandatory, to give a name to the model, and the type of the model, since
-there are several kinds of performance models. For compatibility, make sure to
-initialize the whole structure to zero, either by using explicit memset, or by
-letting the compiler implicitly do it as examplified below.
+a performance model, by defining a structure starpu_perfmodel and
+providing its address in the field starpu_codelet::model. The fields
+starpu_perfmodel::symbol and starpu_perfmodel::type are mandatory, to
+give a name to the model, and the type of the model, since there are
+several kinds of performance models. For compatibility, make sure to
+initialize the whole structure to zero, either by using explicit
+memset(), or by letting the compiler implicitly do it as exemplified
+below.
 
 <ul>
 <li>
-Measured at runtime (<c>STARPU_HISTORY_BASED</c> model type). This assumes that for a
+Measured at runtime (model type ::STARPU_HISTORY_BASED). This assumes that for a
 given set of data input/output sizes, the performance will always be about the
 same. This is very true for regular kernels on GPUs for instance (<0.1% error),
 and just a bit less true on CPUs (~=1% error). This also assumes that there are
@@ -287,15 +289,15 @@ the average time of previous executions on the various processing units, and use
 it as an estimation. History is done per task size, by using a hash of the input
and output sizes as an index.
 It will also save it in <c>$STARPU_HOME/.starpu/sampling/codelets</c>
-for further executions, and can be observed by using the
-<c>starpu_perfmodel_display</c> command, or drawn by using
-the <c>starpu_perfmodel_plot</c> (\ref Performance_model_calibration).  The
+for further executions, and can be observed by using the tool
+<c>starpu_perfmodel_display</c>, or drawn by using
+the tool <c>starpu_perfmodel_plot</c> (\ref Performance_model_calibration).  The
 models are indexed by machine name. To
 share the models between machines (e.g. for a homogeneous cluster), use
 <c>export STARPU_HOSTNAME=some_global_name</c>. Measurements are only done
-when using a task scheduler which makes use of it, such as 
+when using a task scheduler which makes use of it, such as
 <c>dmda</c>. Measurements can also be provided explicitly by the application, by
-using the starpu_perfmodel_update_history() function.
+using the function starpu_perfmodel_update_history().
 
 The following is a small code example.
 
@@ -323,16 +325,17 @@ struct starpu_codelet cl = {
 
 </li>
 <li>
-Measured at runtime and refined by regression (<c>STARPU_*REGRESSION_BASED</c>
+Measured at runtime and refined by regression (model types
+::STARPU_REGRESSION_BASED and ::STARPU_NL_REGRESSION_BASED).
-model type). This still assumes performance regularity, but works
+This still assumes performance regularity, but works
 with various data input sizes, by applying regression over observed
-execution times. STARPU_REGRESSION_BASED uses an a*n^b regression
-form, STARPU_NL_REGRESSION_BASED uses an a*n^b+c (more precise than
-STARPU_REGRESSION_BASED, but costs a lot more to compute).
+execution times. ::STARPU_REGRESSION_BASED uses an a*n^b regression
+form, ::STARPU_NL_REGRESSION_BASED uses an a*n^b+c (more precise than
+::STARPU_REGRESSION_BASED, but costs a lot more to compute).
 
 For instance,
 <c>tests/perfmodels/regression_based.c</c> uses a regression-based performance
-model for the <c>memset</c> operation.
+model for the function memset().
 
 Of course, the application has to issue
 tasks with varying size so that the regression can be computed. StarPU will not
@@ -341,11 +344,12 @@ and maximum observed input size. It can be useful to set the
 <c>STARPU_CALIBRATE</c> environment variable to <c>1</c> and run the application
 on varying input sizes with <c>STARPU_SCHED</c> set to <c>eager</c> scheduler,
 so as to feed the performance model for a variety of
-inputs. The application can also provide the measurements explictly by using
-starpu_perfmodel_update_history(). The <c>starpu_perfmodel_display</c> and
-<c>starpu_perfmodel_plot</c>
-tools can be used to observe how much the performance model is calibrated (\ref Performance_model_calibration); when
-their output look good, <c>STARPU_CALIBRATE</c> can be reset to <c>0</c> to let
+inputs. The application can also provide the measurements explicitly by
+using the function starpu_perfmodel_update_history(). The tools
+<c>starpu_perfmodel_display</c> and <c>starpu_perfmodel_plot</c> can
+be used to observe how much the performance model is calibrated (\ref
+Performance_model_calibration); when their output looks good,
+<c>STARPU_CALIBRATE</c> can be reset to <c>0</c> to let
 StarPU use the resulting performance model without recording new measures, and
 <c>STARPU_SCHED</c> can be set to <c>dmda</c> to benefit from the performance models. If
 the data input sizes vary a lot, it is really important to set
@@ -360,24 +364,26 @@ performance model to perform scheduling, without using regression.
 </li>
 
 <li>
-Provided as an estimation from the application itself (<c>STARPU_COMMON</c> model type and <c>cost_function</c> field),
+Provided as an estimation from the application itself (model type
+::STARPU_COMMON and field starpu_perfmodel::cost_function),
 see for instance
 <c>examples/common/blas_model.h</c> and <c>examples/common/blas_model.c</c>.
 </li>
 
 <li>
-Provided explicitly by the application (<c>STARPU_PER_ARCH</c> model type): the
-<c>.per_arch[arch][nimpl].cost_function</c> fields have to be filled with pointers to
-functions which return the expected duration of the task in micro-seconds, one
-per architecture.
+Provided explicitly by the application (model type ::STARPU_PER_ARCH):
+the fields <c>.per_arch[arch][nimpl].cost_function</c> have to be
+filled with pointers to functions which return the expected duration
+of the task in micro-seconds, one per architecture.
 </li>
 </ul>
 
-For the <c>STARPU_HISTORY_BASED</c> and <c>STARPU_*REGRESSION_BASE</c>,
-the total size of task data (both input and output) is used as an index by
-default. The <c>size_base</c> field of <c>struct starpu_perfmodel</c> however
-permits the application to override that, when for instance some of the data
-do not matter for task cost (e.g. mere reference table), or when using sparse
+For ::STARPU_HISTORY_BASED, ::STARPU_REGRESSION_BASED, and
+::STARPU_NL_REGRESSION_BASED, the total size of task data (both input
+and output) is used as an index by default. The field
+starpu_perfmodel::size_base however permits the application to
+override that, when for instance some of the data do not matter for
+task cost (e.g. mere reference table), or when using sparse
 structures (in which case it is the number of non-zeros which matter), or when
 there is some hidden parameter such as the number of iterations, etc. The
 <c>examples/pi</c> examples uses this to include the number of iterations in the
@@ -386,42 +392,45 @@ base.
 How to use schedulers which can benefit from such performance model is explained
 in \ref Task_scheduling_policy.
 
-The same can be done for task power consumption estimation, by setting the
-<c>power_model</c> field the same way as the <c>model</c> field. Note: for
-now, the application has to give to the power consumption performance model
-a name which is different from the execution time performance model.
+The same can be done for task power consumption estimation, by setting
+the field starpu_codelet::power_model the same way as the field
+starpu_codelet::model. Note: for now, the application has to give to
+the power consumption performance model a name which is different from
+the execution time performance model.
 
 The application can request time estimations from the StarPU performance
 models by filling a task structure as usual without actually submitting
-it. The data handles can be created by calling <c>starpu_*_data_register</c>
-functions with a <c>NULL</c> pointer and <c>-1</c> node and the
-desired data sizes, and need to be unregistered as usual. The
-starpu_task_expected_length() and starpu_task_expected_power() functions can then be called to get an estimation of the task cost on a given
-arch. starpu_task_footprint() can also be used to get the footprint used
-for indexing history-based performance models.
-starpu_task_destroy()
-needs to be called to destroy the dummy task afterwards. See
-<c>tests/perfmodels/regression_based.c</c> for an example.
+it. The data handles can be created by calling any of the functions
+<c>starpu_*_data_register</c> with a <c>NULL</c> pointer and <c>-1</c>
+node and the desired data sizes, and need to be unregistered as usual.
+The functions starpu_task_expected_length() and
+starpu_task_expected_power() can then be called to get an estimation
+of the task cost on a given arch. starpu_task_footprint() can also be
+used to get the footprint used for indexing history-based performance
+models. starpu_task_destroy() needs to be called to destroy the dummy
+task afterwards. See <c>tests/perfmodels/regression_based.c</c> for an example.
 
 \section Theoretical_lower_bound_on_execution_time_example Theoretical lower bound on execution time
 
-For kernels with history-based performance models (and provided that they are completely calibrated), StarPU can very easily provide a theoretical lower
-bound for the execution time of a whole set of tasks. See for
-instance <c>examples/lu/lu_example.c</c>: before submitting tasks,
-call <c>starpu_bound_start</c>, and after complete execution, call
-<c>starpu_bound_stop</c>. <c>starpu_bound_print_lp</c> or
-<c>starpu_bound_print_mps</c> can then be used to output a Linear Programming
-problem corresponding to the schedule of your tasks. Run it through
-<c>lp_solve</c> or any other linear programming solver, and that will give you a
-lower bound for the total execution time of your tasks. If StarPU was compiled
-with the glpk library installed, <c>starpu_bound_compute</c> can be used to
-solve it immediately and get the optimized minimum, in ms. Its <c>integer</c>
-parameter allows to decide whether integer resolution should be computed
-and returned too.
-
-The <c>deps</c> parameter tells StarPU whether to take tasks, implicit data, and tag
-dependencies into account. Tags released in a callback or similar
-are not taken into account, only tags associated with a task are.
+For kernels with history-based performance models (and provided that
+they are completely calibrated), StarPU can very easily provide a
+theoretical lower bound for the execution time of a whole set of
+tasks. See for instance <c>examples/lu/lu_example.c</c>: before
+submitting tasks, call the function starpu_bound_start(), and after
+complete execution, call starpu_bound_stop().
+starpu_bound_print_lp() or starpu_bound_print_mps() can then be used
+to output a Linear Programming problem corresponding to the schedule
+of your tasks. Run it through <c>lp_solve</c> or any other linear
+programming solver, and that will give you a lower bound for the total
+execution time of your tasks. If StarPU was compiled with the glpk
+library installed, starpu_bound_compute() can be used to solve it
+immediately and get the optimized minimum, in ms. Its parameter
+<c>integer</c> allows to decide whether integer resolution should be
+computed and returned too.
+
+The <c>deps</c> parameter tells StarPU whether to take tasks, implicit
+data, and tag dependencies into account. Tags released in a callback
+or similar are not taken into account, only tags associated with a task are.
 It must be understood that the linear programming
 problem size is quadratic with the number of tasks and thus the time to solve it
 will be very long, it could be minutes for just a few dozen tasks. You should
@@ -432,8 +441,9 @@ useful), but sometimes doesn't manage to converge. <c>cbc</c> might look
 slower, but it is parallel. For <c>lp_solve</c>, be sure to try at least all the
 <c>-B</c> options. For instance, we often just use <c>lp_solve -cc -B1 -Bb
 -Bg -Bp -Bf -Br -BG -Bd -Bs -BB -Bo -Bc -Bi</c> , and the <c>-gr</c> option can
-also be quite useful. The resulting schedule can be observed by using the
-<c>starpu_lp2paje</c> tool, which converts it into the Paje format.
+also be quite useful. The resulting schedule can be observed by using
+the tool <c>starpu_lp2paje</c>, which converts it into the Paje
+format.
 
 Data transfer time can only be taken into account when <c>deps</c> is set. Only
 data transfers inferred from implicit data dependencies between tasks are taken
@@ -451,9 +461,8 @@ to a less optimal solution. This increases even more computation time.
 
 \section Insert_Task_Utility Insert Task Utility
 
-StarPU provides the wrapper function <c>starpu_insert_task</c> to ease
-the creation and submission of tasks. See the definition of the
-functions in \ref Insert_Task.
+StarPU provides the wrapper function starpu_insert_task() to ease
+the creation and submission of tasks.
 
 Here the implementation of the codelet:
 
@@ -478,7 +487,7 @@ struct starpu_codelet mycodelet = {
 };
 \endcode
 
-And the call to the <c>starpu_insert_task</c> wrapper:
+And the call to the function starpu_insert_task():
 
 \code{.c}
 starpu_insert_task(&mycodelet,
@@ -488,7 +497,7 @@ starpu_insert_task(&mycodelet,
                    0);
 \endcode
 
-The call to <c>starpu_insert_task</c> is equivalent to the following
+The call to starpu_insert_task() is equivalent to the following
 code:
 
 \code{.c}
@@ -507,7 +516,7 @@ task->cl_arg_size = arg_buffer_size;
 int ret = starpu_task_submit(task);
 \endcode
 
-Here a similar call using <c>STARPU_DATA_ARRAY</c>.
+Here a similar call using ::STARPU_DATA_ARRAY.
 
 \code{.c}
 starpu_insert_task(&mycodelet,
@@ -518,7 +527,7 @@ starpu_insert_task(&mycodelet,
 \endcode
 
 If some part of the task insertion depends on the value of some computation,
-the <c>STARPU_DATA_ACQUIRE_CB</c> macro can be very convenient. For
+the macro ::STARPU_DATA_ACQUIRE_CB can be very convenient. For
 instance, assuming that the index variable <c>i</c> was registered as handle
 <c>i_handle</c>:
 
@@ -531,11 +540,11 @@ STARPU_DATA_ACQUIRE_CB(i_handle, STARPU_R,
                        starpu_insert_task(&work, STARPU_RW, A_handle[i], 0));
 \endcode
 
-The <c>STARPU_DATA_ACQUIRE_CB</c> macro submits an asynchronous request for
+The macro ::STARPU_DATA_ACQUIRE_CB submits an asynchronous request for
 acquiring data <c>i</c> for the main application, and will execute the code
 given as third parameter when it is acquired. In other words, as soon as the
 value of <c>i</c> computed by the <c>which_index</c> codelet can be read, the
-portion of code passed as third parameter of <c>STARPU_DATA_ACQUIRE_CB</c> will
+portion of code passed as third parameter of ::STARPU_DATA_ACQUIRE_CB will
 be executed, and is allowed to read from <c>i</c> to use it e.g. as an
index. Note that this macro is only available when compiling StarPU with
 the compiler <c>gcc</c>.
@@ -548,14 +557,14 @@ the histogram of a photograph, etc. When these results are produced along the
 whole machine, it would not be efficient to accumulate them in only one place,
incurring data transmission each time and access concurrency.
 
-StarPU provides a <c>STARPU_REDUX</c> mode, which permits to optimize
+StarPU provides a mode ::STARPU_REDUX, which permits to optimize
 that case: it will allocate a buffer on each memory node, and accumulate
 intermediate results there. When the data is eventually accessed in the normal
-<c>STARPU_R</c> mode, StarPU will collect the intermediate results in just one
+mode ::STARPU_R, StarPU will collect the intermediate results in just one
 buffer.
 
-For this to work, the user has to use the
-<c>starpu_data_set_reduction_methods</c> to declare how to initialize these
+For this to work, the user has to use the function
+starpu_data_set_reduction_methods() to declare how to initialize these
 buffers, and how to assemble partial results.
 
 For instance, <c>cg</c> uses that to optimize its dot product: it first defines
@@ -592,7 +601,7 @@ struct starpu_codelet accumulate_variable_cl =
 }
 \endcode
 
-and attaches them as reduction methods for its dtq handle:
+and attaches them as reduction methods for its <c>dtq</c> handle:
 
 \code{.c}
 starpu_variable_data_register(&dtq_handle, -1, NULL, sizeof(type));
@@ -600,8 +609,8 @@ starpu_data_set_reduction_methods(dtq_handle,
         &accumulate_variable_cl, &bzero_variable_cl);
 \endcode
 
-and <c>dtq_handle</c> can now be used in <c>STARPU_REDUX</c> mode for the dot products
-with partitioned vectors:
+and <c>dtq_handle</c> can now be used in mode ::STARPU_REDUX for the
+dot products with partitioned vectors:
 
 \code{.c}
 for (b = 0; b < nblocks; b++)
@@ -612,24 +621,25 @@ for (b = 0; b < nblocks; b++)
         0);
 \endcode
 
-During registration, we have here provided NULL, i.e. there is no initial value
-to be taken into account during reduction. StarPU will thus only take into
-account the contributions from the <c>dot_kernel_cl</c> tasks. Also, it will not
-allocate any memory for <c>dtq_handle</c> before <c>dot_kernel_cl</c> tasks are
-ready to run.
+During registration, we have here provided <c>NULL</c>, i.e. there is
+no initial value to be taken into account during reduction. StarPU
+will thus only take into account the contributions from the tasks
+<c>dot_kernel_cl</c>. Also, it will not allocate any memory for
+<c>dtq_handle</c> before tasks <c>dot_kernel_cl</c> are ready to run.
 
 If another dot product has to be performed, one could unregister
-<c>dtq_handle</c>, and re-register it. But one can also use
-<c>starpu_data_invalidate_submit(dtq_handle)</c>, which will clear all data from the handle,
-thus resetting it back to the initial <c>register(NULL)</c> state.
+<c>dtq_handle</c>, and re-register it. But one can also call
+starpu_data_invalidate_submit() with the parameter <c>dtq_handle</c>,
+which will clear all data from the handle, thus resetting it back to
+the initial status <c>register(NULL)</c>.
 
-The <c>cg</c> example also uses reduction for the blocked gemv kernel, leading
-to yet more relaxed dependencies and more parallelism.
+The example <c>cg</c> also uses reduction for the blocked gemv kernel,
+leading to yet more relaxed dependencies and more parallelism.
 
-STARPU_REDUX can also be passed to <c>starpu_mpi_insert_task</c> in the MPI
+::STARPU_REDUX can also be passed to starpu_mpi_insert_task() in the MPI
 case. That will however not produce any MPI communication, but just pass
-STARPU_REDUX to the underlying <c>starpu_insert_task</c>. It is up to the
-application to call <c>starpu_mpi_redux_data</c>, which posts tasks that will
+::STARPU_REDUX to the underlying starpu_insert_task(). It is up to the
+application to call starpu_mpi_redux_data(), which posts tasks that will
 reduce the partial results among MPI nodes into the MPI node which owns the
 data. For instance, some hypothetical application which collects partial results
 into data <c>res</c>, then uses it for other computation, before looping again
@@ -662,7 +672,7 @@ and destroy it on unregistration.
 
 In addition to that, it can be tedious for the application to have to unregister
 the data, since it will not use its content anyway. The unregistration can be
-done lazily by using the <c>starpu_data_unregister_submit(handle)</c> function,
+done lazily by using the function starpu_data_unregister_submit(),
 which will record that no more tasks accessing the handle will be submitted, so
 that it can be freed as soon as the last task accessing it is over.
 
@@ -725,7 +735,7 @@ Two modes of execution exist to accommodate existing usages.
 
 In the Fork mode, StarPU will call the codelet function on one
 of the CPUs of the combined worker. The codelet function can use
-<c>starpu_combined_worker_get_size()</c> to get the number of threads it is
+starpu_combined_worker_get_size() to get the number of threads it is
 allowed to start to achieve the computation. The CPU binding mask for the whole
 set of CPUs is already enforced, so that threads created by the function will
 inherit the mask, and thus execute where StarPU expected, the OS being in charge
@@ -768,9 +778,9 @@ Other examples include for instance calling a BLAS parallel CPU implementation
 
 In the SPMD mode, StarPU will call the codelet function on
 each CPU of the combined worker. The codelet function can use
-<c>starpu_combined_worker_get_size()</c> to get the total number of CPUs
+starpu_combined_worker_get_size() to get the total number of CPUs
 involved in the combined worker, and thus the number of calls that are made in
-parallel to the function, and <c>starpu_combined_worker_get_rank()</c> to get
+parallel to the function, and starpu_combined_worker_get_rank() to get
 the rank of the current CPU within the combined worker. For instance:
 
 \code{.c}
@@ -828,7 +838,7 @@ intermediate sizes. The <c>STARPU_SYNTHESIZE_ARITY_COMBINED_WORKER</c> variable
 permits to tune the maximum arity between levels of combined workers.
 
 The combined workers actually produced can be seen in the output of the
-<c>starpu_machine_display</c> tool (the <c>STARPU_SCHED</c> environment variable
+tool <c>starpu_machine_display</c> (the <c>STARPU_SCHED</c> environment variable
 has to be set to a combined worker-aware scheduler such as <c>pheft</c> or
 <c>peager</c>).
 
@@ -847,9 +857,9 @@ from different threads, due to the use of global variables in their sequential
 sections for instance.
 
 The solution is then to use only one combined worker at a time.  This can be
-done by setting <c>single_combined_worker</c> to 1 in the <c>starpu_conf</c>
-structure, or setting the <c>STARPU_SINGLE_COMBINED_WORKER</c> environment
-variable to 1. StarPU will then run only one parallel task at a time (but other
+done by setting the field starpu_conf::single_combined_worker to 1, or
+setting the <c>STARPU_SINGLE_COMBINED_WORKER</c> environment variable
+to 1. StarPU will then run only one parallel task at a time (but other 
 CPU and GPU tasks are not affected and can be run concurrently). The parallel
task scheduler will however still try varying combined worker
 sizes to look for the most efficient ones.
@@ -919,10 +929,11 @@ starpu_multiformat_data_register(handle, 0, &array_of_structs, NX, &format_ops);
 \endcode
 
 Kernels can be written almost as for any other interface. Note that
-STARPU_MULTIFORMAT_GET_CPU_PTR shall only be used for CPU kernels. CUDA kernels
-must use STARPU_MULTIFORMAT_GET_CUDA_PTR, and OpenCL kernels must use
-STARPU_MULTIFORMAT_GET_OPENCL_PTR. STARPU_MULTIFORMAT_GET_NX may be used in any
-kind of kernel.
+::STARPU_MULTIFORMAT_GET_CPU_PTR shall only be used for CPU kernels. CUDA kernels
+must use ::STARPU_MULTIFORMAT_GET_CUDA_PTR, and OpenCL kernels must use
+::STARPU_MULTIFORMAT_GET_OPENCL_PTR. ::STARPU_MULTIFORMAT_GET_NX may
+be used in any kind of kernel.
+
 \code{.c}
 static void
 multiformat_scal_cpu_func(void *buffers[], void *args)
@@ -977,7 +988,7 @@ if (ret != 0)
 A full example showing how to define a new scheduling policy is available in
 the StarPU sources in the directory <c>examples/scheduler/</c>.
 
-@pxref{Scheduling Policy}
+\ref Scheduling_Policy
 
 \code{.c}
 static struct starpu_sched_policy dummy_sched_policy = {
@@ -1008,14 +1019,15 @@ to be the one that runs CUDA computations for that GPU.
 To achieve this with StarPU, pass the <c>--disable-cuda-memcpy-peer</c> option
 to <c>./configure</c> (TODO: make it dynamic), OpenGL/GLUT has to be initialized
 first, and the interoperability mode has to
-be enabled by using the <c>cuda_opengl_interoperability</c> field of the
-<c>starpu_conf</c> structure, and the driver loop has to be run by
-the application, by using the <c>not_launched_drivers</c> field of
-<c>starpu_conf</c> to prevent StarPU from running it in a separate thread, and
-by using <c>starpu_driver_run</c> to run the loop. The <c>gl_interop</c> and
-<c>gl_interop_idle</c> examples shows how it articulates in a simple case, where
-rendering is done in task callbacks. The former uses <c>glutMainLoopEvent</c>
-to make GLUT progress from the StarPU driver loop, while the latter uses
+be enabled by using the field
+starpu_conf::cuda_opengl_interoperability, and the driver loop has to
+be run by the application, by using the field
+starpu_conf::not_launched_drivers to prevent StarPU from running it in
+a separate thread, and by using starpu_driver_run() to run the loop.
+The examples <c>gl_interop</c> and <c>gl_interop_idle</c> show how it
+articulates in a simple case, where rendering is done in task
+callbacks. The former uses <c>glutMainLoopEvent</c> to make GLUT
+progress from the StarPU driver loop, while the latter uses
 <c>glutIdleFunc</c> to make StarPU progress from the GLUT main loop.
 
 Then, to use an OpenGL buffer as a CUDA data, StarPU simply needs to be given
@@ -1060,7 +1072,7 @@ struct starpu_complex_interface
 \endcode
 
 Registering such a data to StarPU is easily done using the function
-<c>starpu_data_register</c> (@pxref{Basic Data Management API}). The last
+starpu_data_register(). The last
 parameter of the function, <c>interface_complex_ops</c>, will be
 described below.
 
@@ -1085,10 +1097,9 @@ void starpu_complex_data_register(starpu_data_handle_t *handle,
 \endcode
 
 Different operations need to be defined for a data interface through
-the type <c>struct starpu_data_interface_ops</c> (@pxref{Defining
-Interface}). We only define here the basic operations needed to
-run simple applications. The source code for the different functions
-can be found in the file
+the type starpu_data_interface_ops. We only define here the basic
+operations needed to run simple applications. The source code for the
+different functions can be found in the file
 <c>examples/interface/complex_interface.c</c>.
 
 \code{.c}

+ 52 - 52
doc/doxygen/chapters/api/codelet_and_tasks.doxy

@@ -6,75 +6,75 @@
  * See the file version.doxy for copying conditions.
  */
 
-/*! \defgroup Codelet_And_Tasks Codelet And Tasks
+/*! \defgroup API_Codelet_And_Tasks Codelet And Tasks
 
 \brief This section describes the interface to manipulate codelets and tasks.
 
 \enum starpu_codelet_type
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 \brief Describes the type of parallel task. See \ref Parallel_Tasks for details.
 \var starpu_codelet_type::STARPU_SEQ
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 (default) for classical sequential tasks.
 \var starpu_codelet_type::STARPU_SPMD
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 for a parallel task whose threads are handled by StarPU, the code has
 to use starpu_combined_worker_get_size() and
 starpu_combined_worker_get_rank() to distribute the work.
 \var starpu_codelet_type::STARPU_FORKJOIN
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 for a parallel task whose threads are started by the codelet function,
 which has to use starpu_combined_worker_get_size() to determine how
 many threads should be started.
 
 \enum starpu_task_status
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 \brief Task status
 \var starpu_task_status::STARPU_TASK_INVALID
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 The task has just been initialized.
 \var starpu_task_status::STARPU_TASK_BLOCKED
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 The task has just been submitted, and its dependencies have not been
 checked yet.
 \var starpu_task_status::STARPU_TASK_READY
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 The task is ready for execution.
 \var starpu_task_status::STARPU_TASK_RUNNING
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 The task is running on some worker.
 \var starpu_task_status::STARPU_TASK_FINISHED
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 The task is finished executing.
 \var starpu_task_status::STARPU_TASK_BLOCKED_ON_TAG
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 The task is waiting for a tag.
 \var starpu_task_status::STARPU_TASK_BLOCKED_ON_TASK
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 The task is waiting for a task.
 \var starpu_task_status::STARPU_TASK_BLOCKED_ON_DATA
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 The task is waiting for some data.
 
 
 \def STARPU_CPU
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 This macro is used when setting the field starpu_codelet::where
 to specify the codelet may be executed on a CPU processing unit.
 
 \def STARPU_CUDA
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 This macro is used when setting the field starpu_codelet::where
 to specify the codelet may be executed on a CUDA processing unit.
 
 \def STARPU_OPENCL
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 This macro is used when setting the field starpu_codelet::where to
 specify the codelet may be executed on an OpenCL processing unit.
 
 \def STARPU_MULTIPLE_CPU_IMPLEMENTATIONS
 \deprecated
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 Setting the field starpu_codelet::cpu_func with this macro
 indicates the codelet will have several implementations. The use of
 this macro is deprecated. One should always only define the field
@@ -82,7 +82,7 @@ starpu_codelet::cpu_funcs.
 
 \def STARPU_MULTIPLE_CUDA_IMPLEMENTATIONS
 \deprecated
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 Setting the field starpu_codelet::cuda_func with this macro
 indicates the codelet will have several implementations. The use of
 this macro is deprecated. One should always only define the field
@@ -90,30 +90,30 @@ starpu_codelet::cuda_funcs.
 
 \def STARPU_MULTIPLE_OPENCL_IMPLEMENTATIONS
 \deprecated
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 Setting the field starpu_codelet::opencl_func with
 this macro indicates the codelet will have several implementations.
 The use of this macro is deprecated. One should always only define the
 field starpu_codelet::opencl_funcs.
 
 \def starpu_cpu_func_t
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 CPU implementation of a codelet.
 
 \def starpu_cuda_func_t
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 CUDA implementation of a codelet.
 
 \def starpu_opencl_func_t
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 OpenCL implementation of a codelet.
 
 \def starpu_mic_func_t
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 MIC implementation of a codelet.
 
 \def starpu_scc_func_t
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 SCC implementation of a codelet.
 
 \struct starpu_codelet
@@ -122,7 +122,7 @@ implemented on various targets. For compatibility, make sure to
 initialize the whole structure to zero, either by using explicit
 memset, or the function starpu_codelet_init(), or by letting the
 compiler implicitly do it in e.g. static storage case.
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 \var starpu_codelet::where.
 Optional field to indicate which types of processing units are able to
 execute the codelet. The different values ::STARPU_CPU, ::STARPU_CUDA,
@@ -249,14 +249,14 @@ Optional name of the codelet. This can be useful for debugging
 purposes.
 
 \fn void starpu_codelet_init(struct starpu_codelet *cl)
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 Initialize \p cl with default values. Codelets should
 preferably be initialized statically as shown in \ref
 Defining_a_Codelet. However such an initialisation is not always
 possible, e.g. when using C++.
 
 \struct starpu_data_descr
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 \brief This type is used to describe a data handle along with an
 access mode.
 \var starpu_data_descr::handle
@@ -265,7 +265,7 @@ describes a data
 describes its access mode
 
 \struct starpu_task
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 \brief The structure describes a task that can be offloaded on the
 various processing units managed by StarPU. It instantiates a codelet.
 It can either be allocated dynamically with the function
@@ -473,7 +473,7 @@ Helps the hypervisor monitor the execution of this task.
 Whether the scheduler has pushed the task on some queue
 
 \fn void starpu_task_init(struct starpu_task *task)
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 Initialize \p task with default values. This function is
 implicitly called by starpu_task_create(). By default, tasks initialized
 with starpu_task_init() must be deinitialized explicitly with
@@ -481,13 +481,13 @@ starpu_task_clean(). Tasks can also be initialized statically, using
 STARPU_TASK_INITIALIZER.
 
 \def STARPU_TASK_INITIALIZER
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 It is possible to initialize statically allocated tasks with
 this value. This is equivalent to initializing a structure starpu_task
 with the function starpu_task_init().
 
 \def STARPU_TASK_GET_HANDLE(struct starpu_task *task, int i)
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 Return the \p i th data handle of the given task. If the task
 is defined with a static or dynamic number of handles, it will either
 return the \p i th element of the field starpu_task::handles or the \p
@@ -495,7 +495,7 @@ i th element of the field starpu_task::dyn_handles (see \ref
 Setting_the_Data_Handles_for_a_Task)
 
 \def STARPU_TASK_SET_HANDLE(struct starpu_task *task, starpu_data_handle_t handle, int i)
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 Set the \p i th data handle of the given task with the given
 data handle. If the task is defined with a static or dynamic number of
 handles, it will either set the \p i th element of the field
@@ -504,7 +504,7 @@ starpu_task::dyn_handles (see \ref
 Setting_the_Data_Handles_for_a_Task)
 
 \def STARPU_CODELET_GET_MODE(struct starpu_codelet *codelet, int i)
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 Return the access mode of the \p i th data handle of the given
 codelet. If the codelet is defined with a static or dynamic number of
 handles, it will either return the \p i th element of the field
@@ -513,7 +513,7 @@ starpu_codelet::dyn_modes (see \ref
 Setting_the_Data_Handles_for_a_Task)
 
 \def STARPU_CODELET_SET_MODE(struct starpu_codelet *codelet, enum starpu_data_access_mode mode, int i)
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 Set the access mode of the \p i th data handle of the given
 codelet. If the codelet is defined with a static or dynamic number of
 handles, it will either set the \p i th element of the field
@@ -522,7 +522,7 @@ starpu_codelet::dyn_modes (see \ref
 Setting_the_Data_Handles_for_a_Task)
 
 \fn struct starpu_task * starpu_task_create(void)
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 Allocate a task structure and initialize it with default
 values. Tasks allocated dynamically with starpu_task_create() are
 automatically freed when the task is terminated. This means that the
@@ -533,12 +533,12 @@ explicitly unset, the resources used by the task have to be freed by
 calling starpu_task_destroy().
 
 \fn struct starpu_task * starpu_task_dup(struct starpu_task *task)
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 Allocate a task structure which is the exact duplicate of the
 given task.
 
 \fn void starpu_task_clean(struct starpu_task *task)
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 Release all the structures automatically allocated to execute
 \p task, but not the task structure itself; values set by the user
 remain unchanged. It is thus useful for statically allocated tasks for
@@ -550,7 +550,7 @@ only after explicitly waiting for the task or after starpu_shutdown()
 manipulates the task after calling the callback).
 
 \fn void starpu_task_destroy(struct starpu_task *task)
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 Free the resource allocated during starpu_task_create() and
 associated with task. This function is already called automatically
 after the execution of a task when the field starpu_task::destroy is
@@ -559,7 +559,7 @@ Calling this function on a statically allocated task results in an
 undefined behaviour.
 
 \fn int starpu_task_wait(struct starpu_task *task)
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 This function blocks until \p task has been executed. It is not
 possible to synchronize with a task more than once. It is not possible
 to wait for synchronous or detached tasks. Upon successful completion,
@@ -567,7 +567,7 @@ this function returns 0. Otherwise, <c>-EINVAL</c> indicates that the
 specified task was either synchronous or detached.
 
 \fn int starpu_task_submit(struct starpu_task *task)
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 This function submits task to StarPU. Calling this function
 does not mean that the task will be executed immediately as there can
 be data or task (tag) dependencies that are not fulfilled yet: StarPU
@@ -585,56 +585,56 @@ functions and callbacks, provided that the field
 starpu_task::synchronous is set to 0.
 
 \fn int starpu_task_wait_for_all(void)
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 This function blocks until all the tasks that were submitted
 (to the current context or the global one if there aren't any) are
 terminated. It does not destroy these tasks.
 
 \fn int starpu_task_wait_for_all_in_ctx(unsigned sched_ctx_id)
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 This function waits until all the tasks that were already
 submitted to the context \p sched_ctx_id have been executed.
 
 \fn int starpu_task_nsubmitted(void)
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 Return the number of submitted tasks which have not completed yet.
 
 \fn int starpu_task_nready(void)
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 Return the number of submitted tasks which are ready for
 execution or are already executing. It thus does not include tasks
 waiting for dependencies.
 
 \fn struct starpu_task * starpu_task_get_current(void)
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 This function returns the task currently executed by the
 worker, or <c>NULL</c> if it is called either from a thread that is not a
 task or simply because there is no task being executed at the moment.
 
 \fn void starpu_codelet_display_stats(struct starpu_codelet *cl)
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 Output on stderr some statistics on the codelet \p cl.
 
 \fn int starpu_task_wait_for_no_ready(void)
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 This function waits until there is no more ready task.
 
 \fn void starpu_task_set_implementation(struct starpu_task *task, unsigned impl)
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 This function should be called by schedulers to specify the
 codelet implementation to be executed when executing the task.
 
 \fn unsigned starpu_task_get_implementation(struct starpu_task *task)
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 This function returns the codelet implementation to be executed
 when executing the task.
 
 \fn void starpu_create_sync_task(starpu_tag_t sync_tag, unsigned ndeps, starpu_tag_t *deps,	void (*callback)(void *), void *callback_arg)
-\ingroup Codelet_And_Tasks
+\ingroup API_Codelet_And_Tasks
 This creates (and submits) an empty task that unlocks a tag once all
 its dependencies are fulfilled.
 

+ 12 - 12
doc/doxygen/chapters/api/cuda_extensions.doxy

@@ -6,16 +6,16 @@
  * See the file version.doxy for copying conditions.
  */
 
-/*! \defgroup CUDA_Extensions CUDA Extensions
+/*! \defgroup API_CUDA_Extensions CUDA Extensions
 
 \def STARPU_USE_CUDA
-\ingroup CUDA_Extensions
+\ingroup API_CUDA_Extensions
 This macro is defined when StarPU has been installed with CUDA
 support. It should be used in your code to detect the availability of
 CUDA as shown in Full source code for the 'Scaling a Vector' example.
 
 \fn cudaStream_t starpu_cuda_get_local_stream(void)
-\ingroup CUDA_Extensions
+\ingroup API_CUDA_Extensions
 This function gets the current worker’s CUDA stream. StarPU
 provides a stream for every CUDA device controlled by StarPU. This
 function is only provided for convenience so that programmers can
@@ -27,20 +27,20 @@ allowed, but will reduce the likelihood of having all transfers
 overlapped.
 
 \fn const struct cudaDeviceProp * starpu_cuda_get_device_properties(unsigned workerid)
-\ingroup CUDA_Extensions
+\ingroup API_CUDA_Extensions
 This function returns a pointer to device properties for worker
 \p workerid (assumed to be a CUDA worker).
 
 \fn void starpu_cuda_report_error(const char *func, const char *file, int line, cudaError_t status)
-\ingroup CUDA_Extensions
+\ingroup API_CUDA_Extensions
 Report a CUDA error.
 
 \def STARPU_CUDA_REPORT_ERROR (cudaError_t status)
-\ingroup CUDA_Extensions
+\ingroup API_CUDA_Extensions
 Calls starpu_cuda_report_error(), passing the current function, file and line position.
 
 \fn int starpu_cuda_copy_async_sync (void *src_ptr, unsigned src_node, void *dst_ptr, unsigned dst_node, size_t ssize, cudaStream_t stream, enum cudaMemcpyKind kind)
-\ingroup CUDA_Extensions
+\ingroup API_CUDA_Extensions
 Copy \p ssize bytes from the pointer \p src_ptr on \p src_node
 to the pointer \p dst_ptr on \p dst_node. The function first tries to
 copy the data asynchronously (unless stream is <c>NULL</c>). If the
@@ -50,13 +50,13 @@ asynchronous launch was successful. It returns 0 if the synchronous
 copy was successful, or fails otherwise.
 
 \fn void starpu_cuda_set_device(unsigned devid)
-\ingroup CUDA_Extensions
+\ingroup API_CUDA_Extensions
 Calls cudaSetDevice(devid) or cudaGLSetGLDevice(devid),
 according to whether \p devid is among the field
 starpu_conf::cuda_opengl_interoperability.
 
 \fn void starpu_cublas_init(void)
-\ingroup CUDA_Extensions
+\ingroup API_CUDA_Extensions
 This function initializes CUBLAS on every CUDA device. The
 CUBLAS library must be initialized prior to any CUBLAS call. Calling
 starpu_cublas_init() will initialize CUBLAS on every CUDA device
@@ -64,16 +64,16 @@ controlled by StarPU. This call blocks until CUBLAS has been properly
 initialized on every device.
 
 \fn void starpu_cublas_shutdown(void)
-\ingroup CUDA_Extensions
+\ingroup API_CUDA_Extensions
 This function synchronously deinitializes the CUBLAS library on
 every CUDA device.
 
 \fn void starpu_cublas_report_error(const char *func, const char *file, int line, cublasStatus status)
-\ingroup CUDA_Extensions
+\ingroup API_CUDA_Extensions
 Report a CUBLAS error.
 
 \def STARPU_CUBLAS_REPORT_ERROR (cublasStatus status)
-\ingroup CUDA_Extensions
+\ingroup API_CUDA_Extensions
 Calls starpu_cublas_report_error(), passing the current
 function, file and line position.
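
A CUDA codelet implementation typically combines the stream and error-reporting helpers above. A minimal sketch (illustrative; the kernel launch itself is elided, and <c>my_cuda_func</c> is a hypothetical name):

```c
#include <starpu.h>
#include <cuda_runtime.h>

void my_cuda_func(void *buffers[], void *cl_arg)
{
    /* Per-worker stream, so data transfers can overlap with kernels. */
    cudaStream_t stream = starpu_cuda_get_local_stream();

    /* ... launch kernels on 'stream' here ... */

    /* Wait for the work queued on the stream and report any failure. */
    cudaError_t status = cudaStreamSynchronize(stream);
    if (status != cudaSuccess)
        STARPU_CUDA_REPORT_ERROR(status);
}
```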
 

+ 130 - 130
doc/doxygen/chapters/api/data_interfaces.doxy

@@ -6,11 +6,11 @@
  * See the file version.doxy for copying conditions.
  */
 
-/*! \defgroup Data_Interfaces Data Interfaces
+/*! \defgroup API_Data_Interfaces Data Interfaces
 
 \struct starpu_data_interface_ops
 \brief Per-interface data transfer methods.
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 \var starpu_data_interface_ops::register_data_handle
 Register an existing interface into a data handle.
 \var starpu_data_interface_ops::allocate_data_on_node
@@ -54,7 +54,7 @@ If the any_to_any method is
 provided, it will be used by default if no more specific method is
 provided. It can still be useful to provide a more specific method,
 e.g. when particular CUDA or OpenCL support is available.
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 \var starpu_data_copy_methods::ram_to_ram
 Define how to copy data from the src_interface interface on the
 src_node CPU node to the dst_interface interface on the dst_node CPU
@@ -146,7 +146,7 @@ starpu_interface_copy() calls has returned -EAGAIN (i.e. at least some
 transfer is still ongoing), and return 0 otherwise.
 
 @name Registering Data
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 
 There are several ways to register a memory region so that it can be
 managed by StarPU. The functions below allow the registration of
@@ -154,7 +154,7 @@ vectors, 2D matrices, 3D matrices as well as BCSR and CSR sparse
 matrices.
 
 \fn void starpu_void_data_register(starpu_data_handle_t *handle)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Register a void interface. There is no data really associated
 with that interface, but it may be used as a synchronization mechanism.
 It also permits expressing an abstract piece of data that is managed
 by the application internally: this makes it possible to forbid the
 concurrent execution of different tasks accessing the same <c>void</c> data
 in read-write concurrently. 
 
 \fn void starpu_variable_data_register(starpu_data_handle_t *handle, unsigned home_node, uintptr_t ptr, size_t size)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Register the \p size byte element pointed to by \p ptr, which is
 typically a scalar, and initialize \p handle to represent this data item.
 
@@ -175,7 +175,7 @@ starpu_variable_data_register(&var_handle, 0, (uintptr_t)&var, sizeof(var));
 \endcode
 
 \fn void starpu_vector_data_register(starpu_data_handle_t *handle, unsigned home_node, uintptr_t ptr, uint32_t nx, size_t elemsize)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Register the \p nx elemsize-byte elements pointed to by \p ptr and initialize \p handle to represent it.
 
 Here an example of how to use the function.
@@ -186,7 +186,7 @@ starpu_vector_data_register(&vector_handle, 0, (uintptr_t)vector, NX, sizeof(vec
 \endcode
 
 \fn void starpu_matrix_data_register(starpu_data_handle_t *handle, unsigned home_node, uintptr_t ptr, uint32_t ld, uint32_t nx, uint32_t ny, size_t elemsize)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Register the \p nx x \p ny 2D matrix of \p elemsize-byte elements pointed
 by \p ptr and initialize \p handle to represent it. \p ld specifies the number
 of elements between rows. A value greater than \p nx adds padding, which
@@ -201,7 +201,7 @@ starpu_matrix_data_register(&matrix_handle, 0, (uintptr_t)matrix, width, width,
 \endcode
 
 \fn void starpu_block_data_register(starpu_data_handle_t *handle, unsigned home_node, uintptr_t ptr, uint32_t ldy, uint32_t ldz, uint32_t nx, uint32_t ny, uint32_t nz, size_t elemsize)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Register the \p nx x \p ny x \p nz 3D matrix of \p elemsize byte elements
 pointed by \p ptr and initialize \p handle to represent it. Again, \p ldy and
 \p ldz specify the number of elements between rows and between z planes.
@@ -215,7 +215,7 @@ starpu_block_data_register(&block_handle, 0, (uintptr_t)block, nx, nx*ny, nx, ny
 \endcode
 
 \fn void starpu_bcsr_data_register(starpu_data_handle_t *handle, unsigned home_node, uint32_t nnz, uint32_t nrow, uintptr_t nzval, uint32_t *colind, uint32_t *rowptr, uint32_t firstentry, uint32_t r, uint32_t c, size_t elemsize)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 This variant of starpu_data_register() uses the BCSR (Blocked
 Compressed Sparse Row Representation) sparse matrix interface.
 Register the sparse matrix made of \p nnz non-zero blocks of elements of
@@ -227,22 +227,22 @@ blocks), \p colind[i] is the block-column index for block i in \p nzval,
 (usually 0 or 1). 
 
 \fn void starpu_csr_data_register(starpu_data_handle_t *handle, unsigned home_node, uint32_t nnz, uint32_t nrow, uintptr_t nzval, uint32_t *colind, uint32_t *rowptr, uint32_t firstentry, size_t elemsize)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 This variant of starpu_data_register() uses the CSR (Compressed
 Sparse Row Representation) sparse matrix interface. TODO
 
 \fn void starpu_coo_data_register(starpu_data_handle_t *handleptr, unsigned home_node, uint32_t nx, uint32_t ny, uint32_t n_values, uint32_t *columns, uint32_t *rows, uintptr_t values, size_t elemsize);
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Register the \p nx x \p ny 2D matrix given in the COO format, using the
 \p columns, \p rows, \p values arrays, which must have \p n_values elements of
 size \p elemsize. Initialize \p handleptr.
 
 \fn void *starpu_data_get_interface_on_node(starpu_data_handle_t handle, unsigned memory_node)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the interface associated with \p handle on \p memory_node.
 
 @name Accessing Data Interfaces
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 
 Each data interface is provided with a set of field access functions.
 The ones using a void * parameter aimed to be used in codelet
@@ -250,27 +250,27 @@ implementations (see for example the code in \ref
 Vector_Scaling_Using_StarPU_API).
 
 \fn void *starpu_data_handle_to_pointer(starpu_data_handle_t handle, unsigned node)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the pointer associated with \p handle on node \p node or <c>NULL</c>
 if handle’s interface does not support this operation or data for this
 \p handle is not allocated on that \p node.
 
 \fn void *starpu_data_get_local_ptr(starpu_data_handle_t handle)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the local pointer associated with \p handle or <c>NULL</c> if
 \p handle’s interface does not have data allocated locally.
 
 \fn enum starpu_data_interface_id starpu_data_get_interface_id(starpu_data_handle_t handle)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the unique identifier of the interface associated with
 the given \p handle.
 
 \fn size_t starpu_data_get_size(starpu_data_handle_t handle)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the size of the data associated with \p handle.
 
 \fn int starpu_data_pack(starpu_data_handle_t handle, void **ptr, starpu_ssize_t *count)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Execute the packing operation of the interface of the data
 registered at \p handle (see starpu_data_interface_ops). This
 packing operation must allocate a buffer large enough at \p ptr and copy
@@ -281,7 +281,7 @@ the size of the buffer which would have been allocated. The special
 value -1 indicates the size is yet unknown.
 
 \fn int starpu_data_unpack(starpu_data_handle_t handle, void *ptr, size_t count)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Unpack into \p handle the data located at \p ptr of size \p count as
 described by the interface of the data. The interface registered at
 \p handle must define an unpacking operation (see
@@ -289,11 +289,11 @@ starpu_data_interface_ops). The memory at the address \p ptr is freed
 after calling the data unpacking operation.
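
The pack/unpack pair can be used to move a handle's content through an opaque byte buffer, e.g. for transmission. A minimal sketch, assuming both handles were registered with an interface defining these two operations; error handling is elided and the function name is hypothetical:

```c
#include <starpu.h>

void copy_through_buffer(starpu_data_handle_t src, starpu_data_handle_t dst)
{
    void *buffer;
    starpu_ssize_t count;

    /* Allocates 'buffer' and fills it with the content of 'src'. */
    starpu_data_pack(src, &buffer, &count);

    /* Fills 'dst' from 'buffer'; the memory at 'buffer' is freed
     * after the unpacking operation, as documented above. */
    starpu_data_unpack(dst, buffer, count);
}
```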
 
 @name Accessing Variable Data Interfaces
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 
 \struct starpu_variable_interface
 \brief Variable interface for a single data (not a vector, a matrix, a list, ...)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 \var starpu_variable_interface::id
 Identifier of the interface
 \var starpu_variable_interface::ptr
@@ -306,38 +306,38 @@ offset in the variable
 size of the variable
 
 \fn size_t starpu_variable_get_elemsize(starpu_data_handle_t handle)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the size of the variable designated by \p handle.
 
 \fn uintptr_t starpu_variable_get_local_ptr(starpu_data_handle_t handle)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return a pointer to the variable designated by \p handle.
 
 \def STARPU_VARIABLE_GET_PTR(interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return a pointer to the variable designated by \p interface.
 
 \def STARPU_VARIABLE_GET_ELEMSIZE(interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the size of the variable designated by \p interface.
 
 \def STARPU_VARIABLE_GET_DEV_HANDLE(interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return a device handle for the variable designated by
 \p interface, to be used on OpenCL. The offset documented below has to be
 used in addition to this.
 
 \def STARPU_VARIABLE_GET_OFFSET()
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the offset in the variable designated by \p interface, to
 be used with the device handle.
 
 @name Accessing Vector Data Interfaces
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 
 \struct starpu_vector_interface
 \brief Vector interface
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 \var starpu_vector_interface::id
 Identifier of the interface
 \var starpu_vector_interface::ptr
@@ -352,50 +352,50 @@ number of elements on the x-axis of the vector
 size of the elements of the vector
 
 \fn uint32_t starpu_vector_get_nx(starpu_data_handle_t handle)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the number of elements registered into the array designated by \p handle.
 
 \fn size_t starpu_vector_get_elemsize(starpu_data_handle_t handle)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the size of each element of the array designated by \p handle.
 
 \fn uintptr_t starpu_vector_get_local_ptr(starpu_data_handle_t handle)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the local pointer associated with \p handle.
 
 \def STARPU_VECTOR_GET_PTR(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return a pointer to the array designated by \p interface, valid on
 CPUs and CUDA only. For OpenCL, the device handle and offset need to
 be used instead.
 
 \def STARPU_VECTOR_GET_DEV_HANDLE(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return a device handle for the array designated by \p interface,
 to be used on OpenCL. The offset documented below has to be used in
 addition to this.
 
 \def STARPU_VECTOR_GET_OFFSET(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the offset in the array designated by \p interface, to be
 used with the device handle.
 
 \def STARPU_VECTOR_GET_NX(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the number of elements registered into the array
 designated by \p interface.
 
 \def STARPU_VECTOR_GET_ELEMSIZE(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the size of each element of the array designated by
 \p interface.
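
As noted above, STARPU_VECTOR_GET_PTR is only valid on CPUs and CUDA; on OpenCL the device handle and offset must be used instead. A sketch of both access styles inside codelet implementations (function names are hypothetical, kernel invocation details elided, OpenCL headers assumed available):

```c
#include <starpu.h>

/* CPU/CUDA side: the plain pointer is valid. */
void vector_cpu_func(void *buffers[], void *cl_arg)
{
    float *v = (float *)STARPU_VECTOR_GET_PTR(buffers[0]);
    unsigned n = STARPU_VECTOR_GET_NX(buffers[0]);
    for (unsigned i = 0; i < n; i++)
        v[i] = 0.0f;
}

/* OpenCL side: pass the cl_mem handle plus the offset to the kernel. */
void vector_opencl_func(void *buffers[], void *cl_arg)
{
    cl_mem dev = (cl_mem)STARPU_VECTOR_GET_DEV_HANDLE(buffers[0]);
    unsigned offset = STARPU_VECTOR_GET_OFFSET(buffers[0]);
    unsigned n = STARPU_VECTOR_GET_NX(buffers[0]);
    /* ... clSetKernelArg() with 'dev', 'offset' and 'n', then enqueue ... */
    (void)dev; (void)offset; (void)n;
}
```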
 
 @name Accessing Matrix Data Interfaces
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 
 \struct starpu_matrix_interface
 \brief Matrix interface for dense matrices
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 \var starpu_matrix_interface::id
 Identifier of the interface
 \var starpu_matrix_interface::ptr
@@ -415,72 +415,72 @@ starpu_matrix_interface::nx when there is no padding.
 size of the elements of the matrix
 
 \fn uint32_t starpu_matrix_get_nx(starpu_data_handle_t handle)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the number of elements on the x-axis of the matrix
 designated by \p handle.
 
 \fn uint32_t starpu_matrix_get_ny(starpu_data_handle_t handle)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the number of elements on the y-axis of the matrix
 designated by \p handle.
 
 \fn uint32_t starpu_matrix_get_local_ld(starpu_data_handle_t handle)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the number of elements between each row of the matrix
 designated by \p handle. May be equal to nx when there is no padding.
 
 \fn uintptr_t starpu_matrix_get_local_ptr(starpu_data_handle_t handle)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the local pointer associated with \p handle.
 
 \fn size_t starpu_matrix_get_elemsize(starpu_data_handle_t handle)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the size of the elements registered into the matrix
 designated by \p handle.
 
 \def STARPU_MATRIX_GET_PTR(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return a pointer to the matrix designated by \p interface, valid
 on CPUs and CUDA devices only. For OpenCL devices, the device handle
 and offset need to be used instead.
 
 \def STARPU_MATRIX_GET_DEV_HANDLE(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return a device handle for the matrix designated by \p interface,
 to be used on OpenCL. The offset documented below has to be used in
 addition to this.
 
 \def STARPU_MATRIX_GET_OFFSET(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the offset in the matrix designated by \p interface, to be
 used with the device handle.
 
 \def STARPU_MATRIX_GET_NX(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the number of elements on the x-axis of the matrix
 designated by \p interface.
 
 \def STARPU_MATRIX_GET_NY(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the number of elements on the y-axis of the matrix
 designated by \p interface.
 
 \def STARPU_MATRIX_GET_LD(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the number of elements between each row of the matrix
 designated by \p interface. May be equal to nx when there is no padding.
 
 \def STARPU_MATRIX_GET_ELEMSIZE(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the size of the elements registered into the matrix
 designated by \p interface.
 
 @name Accessing Block Data Interfaces
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 
 \struct starpu_block_interface
 \brief Block interface for 3D dense blocks
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 \var starpu_block_interface::id
 identifier of the interface
 \var starpu_block_interface::ptr
@@ -503,92 +503,92 @@ number of elements between two planes
 size of the elements of the block.
 
 \fn uint32_t starpu_block_get_nx(starpu_data_handle_t handle)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the number of elements on the x-axis of the block
 designated by \p handle.
 
 \fn uint32_t starpu_block_get_ny(starpu_data_handle_t handle)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the number of elements on the y-axis of the block
 designated by \p handle.
 
 \fn uint32_t starpu_block_get_nz(starpu_data_handle_t handle)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the number of elements on the z-axis of the block
 designated by \p handle.
 
 \fn uint32_t starpu_block_get_local_ldy(starpu_data_handle_t handle)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the number of elements between each row of the block
 designated by \p handle, in the format of the current memory node.
 
 \fn uint32_t starpu_block_get_local_ldz(starpu_data_handle_t handle)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the number of elements between each z plane of the block
 designated by \p handle, in the format of the current memory node.
 
 \fn uintptr_t starpu_block_get_local_ptr(starpu_data_handle_t handle)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the local pointer associated with \p handle.
 
 \fn size_t starpu_block_get_elemsize(starpu_data_handle_t handle)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the size of the elements of the block designated by
 \p handle.
 
 \def STARPU_BLOCK_GET_PTR(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return a pointer to the block designated by \p interface.
 
 \def STARPU_BLOCK_GET_DEV_HANDLE(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return a device handle for the block designated by \p interface,
 to be used on OpenCL. The offset documented below has to be used in
 addition to this.
 
 \def STARPU_BLOCK_GET_OFFSET(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the offset in the block designated by \p interface, to be
 used with the device handle.
 
 \def STARPU_BLOCK_GET_NX(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the number of elements on the x-axis of the block
 designated by \p interface.
 
 \def STARPU_BLOCK_GET_NY(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the number of elements on the y-axis of the block
 designated by \p interface.
 
 \def STARPU_BLOCK_GET_NZ(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the number of elements on the z-axis of the block
 designated by \p interface.
 
 \def STARPU_BLOCK_GET_LDY(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the number of elements between each row of the block
 designated by \p interface. May be equal to nx when there is no padding.
 
 \def STARPU_BLOCK_GET_LDZ(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the number of elements between each z plane of the block
 designated by \p interface. May be equal to nx*ny when there is no
 padding.
 
 \def STARPU_BLOCK_GET_ELEMSIZE(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the size of the elements of the block designated by
 \p interface.
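
The ldy/ldz leading dimensions above can be seen in a plain C sketch of padded 3D indexing (names and sizes are illustrative, not part of the StarPU API):

```c
#include <assert.h>
#include <stddef.h>

/* Linear index into a padded 3D block as described by the block
 * interface: ldy elements separate consecutive rows, ldz elements
 * separate consecutive z planes (ldy >= nx, ldz >= nx*ny). */
static size_t block_index(size_t x, size_t y, size_t z,
                          size_t ldy, size_t ldz)
{
    return z * ldz + y * ldy + x;
}
```

For an unpadded nx x ny x nz block, ldy == nx and ldz == nx*ny, and the formula reduces to the usual row-major 3D index.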
 
 @name Accessing BCSR Data Interfaces
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 
 \struct starpu_bcsr_interface
 \brief BCSR interface for sparse matrices (blocked compressed sparse
 row representation)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 \var starpu_bcsr_interface::id
 Identifier of the interface
 \var starpu_bcsr_interface::nnz
@@ -611,97 +611,97 @@ size of the blocks
 size of the elements of the matrix
 
 \fn uint32_t starpu_bcsr_get_nnz(starpu_data_handle_t handle)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the number of non-zero elements in the matrix designated
 by \p handle.
 
 \fn uint32_t starpu_bcsr_get_nrow(starpu_data_handle_t handle)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the number of rows (in terms of blocks of size r*c) in
 the matrix designated by \p handle.
 
 \fn uint32_t starpu_bcsr_get_firstentry(starpu_data_handle_t handle)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the index at which all arrays (the column indexes, the
 row pointers...) of the matrix designated by \p handle start.
 
 \fn uintptr_t starpu_bcsr_get_local_nzval(starpu_data_handle_t handle)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return a pointer to the non-zero values of the matrix
 designated by \p handle.
 
 \fn uint32_t * starpu_bcsr_get_local_colind(starpu_data_handle_t handle)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return a pointer to the column index, which holds the positions
 of the non-zero entries in the matrix designated by \p handle.
 
 \fn uint32_t * starpu_bcsr_get_local_rowptr(starpu_data_handle_t handle)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the row pointer array of the matrix designated by
 \p handle.
 
 \fn uint32_t starpu_bcsr_get_r(starpu_data_handle_t handle)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the number of rows in a block.
 
 \fn uint32_t starpu_bcsr_get_c(starpu_data_handle_t handle)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the number of columns in a block.
 
 \fn size_t starpu_bcsr_get_elemsize(starpu_data_handle_t handle)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the size of the elements in the matrix designated by
 \p handle.
 
 \def STARPU_BCSR_GET_NNZ(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the number of non-zero values in the matrix designated
 by \p interface.
 
 \def STARPU_BCSR_GET_NZVAL(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return a pointer to the non-zero values of the matrix
 designated by \p interface.
 
 \def STARPU_BCSR_GET_NZVAL_DEV_HANDLE(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return a device handle for the array of non-zero values in the
 matrix designated by \p interface. The offset documented below has to be
 used in addition to this.
 
 \def STARPU_BCSR_GET_COLIND(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return a pointer to the column index of the matrix designated
 by \p interface.
 
 \def STARPU_BCSR_GET_COLIND_DEV_HANDLE(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return a device handle for the column index of the matrix
 designated by \p interface. The offset documented below has to be used in
 addition to this.
 
 \def STARPU_BCSR_GET_ROWPTR(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return a pointer to the row pointer array of the matrix
 designated by \p interface.
 
 \def STARPU_BCSR_GET_ROWPTR_DEV_HANDLE(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return a device handle for the row pointer array of the matrix
 designated by \p interface. The offset documented below has to be used in
 addition to this.
 
 \def STARPU_BCSR_GET_OFFSET(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the offset in the arrays (colind, rowptr, nzval) of the
 matrix designated by \p interface, to be used with the device handles.
 
 @name Accessing CSR Data Interfaces
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 
 \struct starpu_csr_interface
 \brief CSR interface for sparse matrices (compressed sparse row representation)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 \var starpu_csr_interface::id
 Identifier of the interface
 \var starpu_csr_interface::nnz
@@ -720,104 +720,104 @@ k for k-based indexing (0 or 1 usually). Also useful when partitioning the matr
 size of the elements of the matrix
 
 \fn uint32_t starpu_csr_get_nnz(starpu_data_handle_t handle)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the number of non-zero values in the matrix designated
 by \p handle.
 
 \fn uint32_t starpu_csr_get_nrow(starpu_data_handle_t handle)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the size of the row pointer array of the matrix
 designated by \p handle.
 
 \fn uint32_t starpu_csr_get_firstentry(starpu_data_handle_t handle)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the index at which all arrays (the column indexes, the
 row pointers...) of the matrix designated by \p handle start.
 
 \fn uintptr_t starpu_csr_get_local_nzval(starpu_data_handle_t handle)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return a local pointer to the non-zero values of the matrix
 designated by \p handle.
 
 \fn uint32_t * starpu_csr_get_local_colind(starpu_data_handle_t handle)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return a local pointer to the column index of the matrix
 designated by \p handle.
 
 \fn uint32_t * starpu_csr_get_local_rowptr(starpu_data_handle_t handle)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return a local pointer to the row pointer array of the matrix
 designated by \p handle.
 
 \fn size_t starpu_csr_get_elemsize(starpu_data_handle_t handle)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the size of the elements registered into the matrix
 designated by \p handle.
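
As a hedged illustration of this layout (plain C arrays, no StarPU calls; function and variable names are made up for the example), a sparse matrix-vector product reads the three arrays exactly as the accessors above describe them, assuming 0-based indexing (firstentry == 0):

```c
#include <assert.h>
#include <stdint.h>

/* y = A*x for a CSR matrix: nzval holds the non-zero values, colind
 * their column indexes, and rowptr[i]..rowptr[i+1] delimits row i. */
static void csr_spmv(uint32_t nrow, const double *nzval,
                     const uint32_t *colind, const uint32_t *rowptr,
                     const double *x, double *y)
{
    for (uint32_t i = 0; i < nrow; i++) {
        y[i] = 0.0;
        for (uint32_t k = rowptr[i]; k < rowptr[i + 1]; k++)
            y[i] += nzval[k] * x[colind[k]];
    }
}
```

With a non-zero firstentry (1-based indexing), firstentry would have to be subtracted from the rowptr and colind entries before they are used as array indexes.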
 
 \def STARPU_CSR_GET_NNZ(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the number of non-zero values in the matrix designated
 by \p interface.
 
 \def STARPU_CSR_GET_NROW(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the size of the row pointer array of the matrix
 designated by \p interface.
 
 \def STARPU_CSR_GET_NZVAL(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return a pointer to the non-zero values of the matrix
 designated by \p interface.
 
 \def STARPU_CSR_GET_NZVAL_DEV_HANDLE(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return a device handle for the array of non-zero values in the
 matrix designated by \p interface. The offset documented below has to be
 used in addition to this.
 
 \def STARPU_CSR_GET_COLIND(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return a pointer to the column index of the matrix designated
 by \p interface.
 
 \def STARPU_CSR_GET_COLIND_DEV_HANDLE(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return a device handle for the column index of the matrix
 designated by \p interface. The offset documented below has to be used in
 addition to this.
 
 \def STARPU_CSR_GET_ROWPTR(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return a pointer to the row pointer array of the matrix
 designated by \p interface.
 
 \def STARPU_CSR_GET_ROWPTR_DEV_HANDLE(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return a device handle for the row pointer array of the matrix
 designated by \p interface. The offset documented below has to be used in
 addition to this.
 
 \def STARPU_CSR_GET_OFFSET(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the offset in the arrays (colind, rowptr, nzval) of the
 matrix designated by \p interface, to be used with the device handles.
 
 \def STARPU_CSR_GET_FIRSTENTRY(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the index at which all arrays (the column indexes, the
 row pointers...) of the \p interface start.
 
 \def STARPU_CSR_GET_ELEMSIZE(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the size of the elements registered into the matrix
 designated by \p interface.
 
 @name Accessing COO Data Interfaces
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 
 \struct starpu_coo_interface
 \brief COO Matrices
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 \var starpu_coo_interface::id
 identifier of the interface
 \var starpu_coo_interface::columns
@@ -836,81 +836,81 @@ number of values registered in the matrix
 size of the elements of the matrix
 
 \def STARPU_COO_GET_COLUMNS(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return a pointer to the column array of the matrix designated
 by \p interface.
 
 \def STARPU_COO_GET_COLUMNS_DEV_HANDLE(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return a device handle for the column array of the matrix
 designated by \p interface, to be used on OpenCL. The offset documented
 below has to be used in addition to this.
 
 \def STARPU_COO_GET_ROWS(interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return a pointer to the rows array of the matrix designated by
 \p interface.
 
 \def STARPU_COO_GET_ROWS_DEV_HANDLE(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return a device handle for the row array of the matrix
 designated by \p interface, to be used on OpenCL. The offset documented
 below has to be used in addition to this.
 
 \def STARPU_COO_GET_VALUES(interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return a pointer to the values array of the matrix designated
 by \p interface.
 
 \def STARPU_COO_GET_VALUES_DEV_HANDLE(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return a device handle for the value array of the matrix
 designated by \p interface, to be used on OpenCL. The offset documented
 below has to be used in addition to this.
 
 \def STARPU_COO_GET_OFFSET(void *interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the offset in the arrays of the COO matrix designated by
 \p interface.
 
 \def STARPU_COO_GET_NX(interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the number of elements on the x-axis of the matrix
 designated by \p interface.
 
 \def STARPU_COO_GET_NY(interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the number of elements on the y-axis of the matrix
 designated by \p interface.
 
 \def STARPU_COO_GET_NVALUES(interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the number of values registered in the matrix designated
 by \p interface.
 
 \def STARPU_COO_GET_ELEMSIZE(interface)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the size of the elements registered into the matrix
 designated by \p interface.
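
The three parallel arrays returned by the STARPU_COO_GET_* macros describe nvalues (row, column, value) triplets. A minimal sketch in plain C (illustrative names, no StarPU calls) scatters them into a row-major dense buffer:

```c
#include <assert.h>
#include <stdint.h>

/* Scatter COO triplets into a row-major dense matrix of width nx.
 * Duplicate (row, column) entries accumulate. */
static void coo_to_dense(const uint32_t *columns, const uint32_t *rows,
                         const double *values, uint32_t nvalues,
                         uint32_t nx, double *dense)
{
    for (uint32_t k = 0; k < nvalues; k++)
        dense[rows[k] * nx + columns[k]] += values[k];
}
```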
 
 @name Defining Interface
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 
 Applications can provide their own interface as shown in \ref
 Defining_a_New_Data_Interface.
 
 \fn uintptr_t starpu_malloc_on_node(unsigned dst_node, size_t size)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Allocate \p size bytes on node \p dst_node. This returns 0 if the
 allocation failed; the allocation method should then return <c>-ENOMEM</c> as
 the allocated size.
 
 \fn void starpu_free_on_node(unsigned dst_node, uintptr_t addr, size_t size)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Free \p addr of \p size bytes on node \p dst_node.
 
 \fn int starpu_interface_copy(uintptr_t src, size_t src_offset, unsigned src_node, uintptr_t dst, size_t dst_offset, unsigned dst_node, size_t size, void *async_data)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Copy \p size bytes from byte offset \p src_offset of \p src on \p src_node
 to byte offset \p dst_offset of \p dst on \p dst_node. This is to be used in
 the any_to_any() copy method, which is provided with the async_data to
@@ -918,28 +918,28 @@ be passed to starpu_interface_copy(). This returns <c>-EAGAIN</c> if the
 transfer is still ongoing, or 0 if the transfer is already completed.
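
A host-memory-only sketch of the offset semantics follows. This is not the StarPU function: the real call also takes the two memory nodes and the async_data, and may return <c>-EAGAIN</c> while the transfer is in flight.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy `size` bytes from src+src_offset to dst+dst_offset, using the
 * same addressing convention as starpu_interface_copy().  Returning 0
 * means the transfer is already completed. */
static int plain_copy(uintptr_t src, size_t src_offset,
                      uintptr_t dst, size_t dst_offset, size_t size)
{
    memcpy((void *)(dst + dst_offset),
           (const void *)(src + src_offset), size);
    return 0;
}
```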
 
 \fn uint32_t starpu_hash_crc32c_be_n(const void *input, size_t n, uint32_t inputcrc)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Compute the CRC of a byte buffer seeded by the \p inputcrc
 <em>current state</em>. The return value should be considered as the new
 <em>current state</em> for future CRC computation. This is used for computing
 data size footprint.
 
 \fn uint32_t starpu_hash_crc32c_be(uint32_t input, uint32_t inputcrc)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Compute the CRC of a 32-bit number seeded by the \p inputcrc
 <em>current state</em>. The return value should be considered as the new
 <em>current state</em> for future CRC computation. This is used for computing
 data size footprint.
 
 \fn uint32_t starpu_hash_crc32c_string(const char *str, uint32_t inputcrc)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Compute the CRC of a string seeded by the \p inputcrc <em>current
 state</em>. The return value should be considered as the new <em>current
 state</em> for future CRC computation. This is used for computing data
 size footprint.
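
The chaining contract of these three functions (feed the previous return value back in as the seed) can be sketched with a standalone bitwise CRC32C. This is a stand-in written for the example, not StarPU's actual implementation:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78).
 * Like starpu_hash_crc32c_be_n(), it takes the previous CRC as the
 * seed, so a footprint can be accumulated over several buffers:
 * crc(a || b) == update(b, len_b, update(a, len_a, 0)). */
static uint32_t crc32c_update(const void *input, size_t n, uint32_t crc)
{
    const uint8_t *p = input;
    crc = ~crc;
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}
```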
 
 \fn int starpu_data_interface_get_next_id(void)
-\ingroup Data_Interfaces
+\ingroup API_Data_Interfaces
 Return the next available id for a newly created data interface
 (\ref Defining_a_New_Data_Interface).
 

+ 33 - 33
doc/doxygen/chapters/api/data_management.doxy

@@ -6,7 +6,7 @@
  * See the file version.doxy for copying conditions.
  */
 
-/*! \defgroup Data_Management Data Management
+/*! \defgroup API_Data_Management Data Management
 
 \brief This section describes the data management facilities provided
 by StarPU. We show how to use existing data interfaces in \ref
@@ -14,7 +14,7 @@ Data_Interfaces, but developers can design their own data interfaces
 if required.
 
 \typedef starpu_data_handle_t
-\ingroup Data_Management
+\ingroup API_Data_Management
 StarPU uses ::starpu_data_handle_t as an opaque handle to
 manage a piece of data. Once a piece of data has been registered to
 StarPU, it is associated to a starpu_data_handle_t which keeps track
@@ -22,22 +22,22 @@ of the state of the piece of data over the entire machine, so that we
 can maintain data consistency and locate data replicates for instance.
 
 \enum starpu_data_access_mode
-\ingroup Data_Management
+\ingroup API_Data_Management
 This datatype describes a data access mode.
 \var starpu_data_access_mode::STARPU_NONE
-\ingroup Data_Management
+\ingroup API_Data_Management
 TODO!
 \var starpu_data_access_mode::STARPU_R
-\ingroup Data_Management
+\ingroup API_Data_Management
 read-only mode.
 \var starpu_data_access_mode::STARPU_W
-\ingroup Data_Management
+\ingroup API_Data_Management
 write-only mode.
 \var starpu_data_access_mode::STARPU_RW
-\ingroup Data_Management
+\ingroup API_Data_Management
 read-write mode. This is equivalent to ::STARPU_R|::STARPU_W
 \var starpu_data_access_mode::STARPU_SCRATCH
-\ingroup Data_Management
+\ingroup API_Data_Management
 A temporary buffer is allocated for the task, but StarPU does not
 enforce data consistency---i.e. each device has its own buffer,
 independently from each other (even for CPUs), and no data transfer is
@@ -51,14 +51,14 @@ value.  For now, data to be used in ::STARPU_SCRATCH mode should be
 registered with node <c>-1</c> and a <c>NULL</c> pointer, since the
 value of the provided buffer is simply ignored for now.
 \var starpu_data_access_mode::STARPU_REDUX
-\ingroup Data_Management
+\ingroup API_Data_Management
 todo
 \var starpu_data_access_mode::STARPU_COMMUTE
-\ingroup Data_Management
+\ingroup API_Data_Management
 todo
 
 @name Basic Data Management API
-\ingroup Data_Management
+\ingroup API_Data_Management
 
 Data management is done at a high-level in StarPU: rather than
 accessing a mere list of contiguous buffers, the tasks may manipulate
@@ -89,7 +89,7 @@ data initially resides (we also call this memory node the home node of
 a piece of data).
 
 \fn void starpu_data_register(starpu_data_handle_t *handleptr, unsigned home_node, void *data_interface, struct starpu_data_interface_ops *ops)
-\ingroup Data_Management
+\ingroup API_Data_Management
 Register a piece of data into the handle located at the
 \p handleptr address. The \p data_interface buffer contains the initial
 description of the data in the \p home_node. The \p ops argument is a
@@ -106,12 +106,12 @@ functions (e.g. starpu_vector_data_register() or
 starpu_matrix_data_register()).
 
 \fn void starpu_data_register_same(starpu_data_handle_t *handledst, starpu_data_handle_t handlesrc)
-\ingroup Data_Management
+\ingroup API_Data_Management
 Register a new piece of data into the handle \p handledst with the
 same interface as the handle \p handlesrc.
 
 \fn void starpu_data_unregister(starpu_data_handle_t handle)
-\ingroup Data_Management
+\ingroup API_Data_Management
 This function unregisters a data handle from StarPU. If the
 data was automatically allocated by StarPU because the home node was
 -1, all automatically allocated buffers are freed. Otherwise, a valid
@@ -122,30 +122,30 @@ to update the value of the data in the home node, we can use
 the function starpu_data_unregister_no_coherency() instead.
 
 \fn void starpu_data_unregister_no_coherency(starpu_data_handle_t handle)
-\ingroup Data_Management
+\ingroup API_Data_Management
 This is the same as starpu_data_unregister(), except that
 StarPU does not put back a valid copy into the home node, in the
 buffer that was initially registered.
 
 \fn void starpu_data_unregister_submit(starpu_data_handle_t handle)
-\ingroup Data_Management
+\ingroup API_Data_Management
 Destroy the data handle once it is not needed anymore by any
 submitted task. No coherency is assumed.
 
 \fn void starpu_data_invalidate(starpu_data_handle_t handle)
-\ingroup Data_Management
+\ingroup API_Data_Management
 Destroy all replicates of the data handle immediately. After
 data invalidation, the first access to the handle must be performed in
 write-only mode. Accessing an invalidated data in read-mode results in
 undefined behaviour.
 
 \fn void starpu_data_invalidate_submit(starpu_data_handle_t handle)
-\ingroup Data_Management
+\ingroup API_Data_Management
 Submits invalidation of the data handle after completion of
 previously submitted tasks.
 
 \fn void starpu_data_set_wt_mask(starpu_data_handle_t handle, uint32_t wt_mask)
-\ingroup Data_Management
+\ingroup API_Data_Management
 This function sets the write-through mask of a given data (and
 its children), i.e. a bitmask of nodes where the data should always be
 replicated after modification. It also prevents the data from being
@@ -155,7 +155,7 @@ instance a <c>1<<0</c> write-through mask means that the CUDA workers
 will commit their changes in main memory (node 0).
 
 \fn int starpu_data_prefetch_on_node(starpu_data_handle_t handle, unsigned node, unsigned async)
-\ingroup Data_Management
+\ingroup API_Data_Management
 Issue a prefetch request for a given data to a given node, i.e.
 requests that the data be replicated to the given node, so that it is
 available there for tasks. If the \p async parameter is 0, the call will
@@ -164,35 +164,35 @@ soon as the request is scheduled (which may however have to wait for a
 task completion).
 
 \fn starpu_data_handle_t starpu_data_lookup(const void *ptr)
-\ingroup Data_Management
+\ingroup API_Data_Management
 Return the handle corresponding to the data pointed to by the \p ptr host pointer.
 
 \fn int starpu_data_request_allocation(starpu_data_handle_t handle, unsigned node)
-\ingroup Data_Management
+\ingroup API_Data_Management
 Explicitly ask StarPU to allocate room for a piece of data on
 the specified memory node.
 
 \fn void starpu_data_query_status(starpu_data_handle_t handle, int memory_node, int *is_allocated, int *is_valid, int *is_requested)
-\ingroup Data_Management
+\ingroup API_Data_Management
 Query the status of \p handle on the specified \p memory_node.
 
 \fn void starpu_data_advise_as_important(starpu_data_handle_t handle, unsigned is_important)
-\ingroup Data_Management
+\ingroup API_Data_Management
 This function allows specifying that a piece of data can be
 discarded without impacting the application.
 
 \fn void starpu_data_set_reduction_methods(starpu_data_handle_t handle, struct starpu_codelet *redux_cl, struct starpu_codelet *init_cl)
-\ingroup Data_Management
+\ingroup API_Data_Management
 This sets the codelets to be used for \p handle when it is
 accessed in STARPU_REDUX mode. Per-worker buffers will be initialized with
 the \p init_cl codelet, and reduction between per-worker buffers will be
 done with the \p redux_cl codelet.
 
 @name Access registered data from the application
-\ingroup Data_Management
+\ingroup API_Data_Management
 
 \fn int starpu_data_acquire(starpu_data_handle_t handle, enum starpu_data_access_mode mode)
-\ingroup Data_Management
+\ingroup API_Data_Management
 The application must call this function prior to accessing
 registered data from main memory outside tasks. StarPU ensures that
 the application will get an up-to-date copy of the data in main memory
@@ -210,7 +210,7 @@ callbacks (in that case, starpu_data_acquire() returns <c>-EDEADLK</c>). Upon
 successful completion, this function returns 0.
 
 \fn int starpu_data_acquire_cb(starpu_data_handle_t handle, enum starpu_data_access_mode mode, void (*callback)(void *), void *arg)
-\ingroup Data_Management
+\ingroup API_Data_Management
 Asynchronous equivalent of starpu_data_acquire(). When the data
 specified in \p handle is available in the appropriate access
 mode, the \p callback function is executed. The application may access
@@ -223,18 +223,18 @@ non-blocking and may be called from task callbacks. Upon successful
 completion, this function returns 0.
 
 \fn int starpu_data_acquire_on_node(starpu_data_handle_t handle, unsigned node, enum starpu_data_access_mode mode)
-\ingroup Data_Management
+\ingroup API_Data_Management
 This is the same as starpu_data_acquire(), except that the data
 will be available on the given memory node instead of main memory.
 
 \fn int starpu_data_acquire_on_node_cb(starpu_data_handle_t handle, unsigned node, enum starpu_data_access_mode mode, void (*callback)(void *), void *arg)
-\ingroup Data_Management
+\ingroup API_Data_Management
 This is the same as starpu_data_acquire_cb(), except that the
 data will be available on the given memory node instead of main
 memory.
 
 \def STARPU_DATA_ACQUIRE_CB(starpu_data_handle_t handle, enum starpu_data_access_mode mode, code)
-\ingroup Data_Management
+\ingroup API_Data_Management
 STARPU_DATA_ACQUIRE_CB() is the same as starpu_data_acquire_cb(),
 except that the code to be executed in a callback is directly provided
 as a macro parameter, and the data \p handle is automatically released
@@ -243,13 +243,13 @@ value of some registered data. This is non-blocking too and may be
 called from task callbacks.
 
 \fn void starpu_data_release(starpu_data_handle_t handle)
-\ingroup Data_Management
+\ingroup API_Data_Management
 This function releases the piece of data acquired by the
 application either by starpu_data_acquire() or by
 starpu_data_acquire_cb().
 
 \fn void starpu_data_release_on_node(starpu_data_handle_t handle, unsigned node)
-\ingroup Data_Management
+\ingroup API_Data_Management
 This is the same as starpu_data_release(), except that the data
 will be available on the given memory \p node instead of main memory.
 

+ 31 - 31
doc/doxygen/chapters/api/data_partition.doxy

@@ -6,11 +6,11 @@
  * See the file version.doxy for copying conditions.
  */
 
-/*! \defgroup Data_Partition Data Partition
+/*! \defgroup API_Data_Partition Data Partition
 
 \struct starpu_data_filter
 \brief The filter structure describes a data partitioning operation, to be given to the starpu_data_partition() function.
-\ingroup Data_Partition
+\ingroup API_Data_Partition
 \var starpu_data_filter::filter_func
 This function fills the child_interface structure with interface
 information for the id-th child of the parent father_interface (among
@@ -31,10 +31,10 @@ Allow to define an additional pointer parameter for the filter
 function, such as the sizes of the different parts.
 
 @name Basic API
-\ingroup Data_Partition
+\ingroup API_Data_Partition
 
 \fn void starpu_data_partition(starpu_data_handle_t initial_handle, struct starpu_data_filter *f)
-\ingroup Data_Partition
+\ingroup API_Data_Partition
 This requests the partitioning of the StarPU data \p initial_handle
 into several pieces of subdata, according to the filter \p f.
 
@@ -50,7 +50,7 @@ starpu_data_partition(A_handle, &f);
 \endcode
 
 \fn void starpu_data_unpartition(starpu_data_handle_t root_data, unsigned gathering_node)
-\ingroup Data_Partition
+\ingroup API_Data_Partition
 This unapplies one filter, thus unpartitioning the data. The
 pieces of data are collected back into one big piece in the
 \p gathering_node (usually 0). Tasks working on the partitioned data must
@@ -62,16 +62,16 @@ starpu_data_unpartition(A_handle, 0);
 \endcode
 
 \fn int starpu_data_get_nb_children(starpu_data_handle_t handle)
-\ingroup Data_Partition
+\ingroup API_Data_Partition
 Return the number of children \p handle has been partitioned into.
 
 \fn starpu_data_handle_t starpu_data_get_child(starpu_data_handle_t handle, unsigned i)
-\ingroup Data_Partition
+\ingroup API_Data_Partition
 Return the \p i th child of the given \p handle, which must have been
 partitioned beforehand.
 
 \fn starpu_data_handle_t starpu_data_get_sub_data (starpu_data_handle_t root_data, unsigned depth, ... )
-\ingroup Data_Partition
+\ingroup API_Data_Partition
 After partitioning a StarPU data by applying a filter,
 starpu_data_get_sub_data() can be used to get handles for each of the
 data portions. \p root_data is the parent data that was partitioned.
@@ -86,24 +86,24 @@ h = starpu_data_get_sub_data(A_handle, 1, taskx);
 \endcode
 
 \fn starpu_data_handle_t starpu_data_vget_sub_data(starpu_data_handle_t root_data, unsigned depth, va_list pa)
-\ingroup Data_Partition
+\ingroup API_Data_Partition
 This function is similar to starpu_data_get_sub_data() but uses a
 va_list for the parameter list.
 
 \fn void starpu_data_map_filters(starpu_data_handle_t root_data, unsigned nfilters, ...)
-\ingroup Data_Partition
+\ingroup API_Data_Partition
 Applies \p nfilters filters to the handle designated by
 \p root_data recursively. \p nfilters pointers to variables of the type
 starpu_data_filter should be given.
 
 \fn void starpu_data_vmap_filters(starpu_data_handle_t root_data, unsigned nfilters, va_list pa)
-\ingroup Data_Partition
+\ingroup API_Data_Partition
 Applies \p nfilters filters to the handle designated by
 \p root_data recursively. It uses a va_list of pointers to variables of
 the type starpu_data_filter.
 
 @name Predefined Vector Filter Functions
-\ingroup Data_Partition
+\ingroup API_Data_Partition
 
 This section gives a partial list of the predefined partitioning
 functions for vector data. Examples on how to use them are shown in
@@ -111,13 +111,13 @@ functions for vector data. Examples on how to use them are shown in
 starpu_data_filters.h.
 
 \fn void starpu_vector_filter_block(void *father_interface, void *child_interface, struct starpu_data_filter *f, unsigned id, unsigned nparts)
-\ingroup Data_Partition
+\ingroup API_Data_Partition
 Return in \p child_interface the \p id th element of the vector
 represented by \p father_interface once partitioned in \p nparts chunks of
 equal size.
 
 \fn void starpu_vector_filter_block_shadow(void *father_interface, void *child_interface, struct starpu_data_filter *f, unsigned id, unsigned nparts)
-\ingroup Data_Partition
+\ingroup API_Data_Partition
 Return in \p child_interface the \p id th element of the vector
 represented by \p father_interface once partitioned in \p nparts chunks of
 equal size with a shadow border <c>filter_arg_ptr</c>, thus getting a vector
 enforced for the shadowed parts. A usage example is available in
 examples/filters/shadow.c
 
 \fn void starpu_vector_filter_list(void *father_interface, void *child_interface, struct starpu_data_filter *f, unsigned id, unsigned nparts)
-\ingroup Data_Partition
+\ingroup API_Data_Partition
 Return in \p child_interface the \p id th element of the vector
 represented by \p father_interface once partitioned into \p nparts chunks
 according to the <c>filter_arg_ptr</c> field of \p f. The
@@ -137,13 +137,13 @@ elements, each of which specifies the number of elements in each chunk
 of the partition.
 
 \fn void starpu_vector_filter_divide_in_2(void *father_interface, void *child_interface, struct starpu_data_filter *f, unsigned id, unsigned nparts)
-\ingroup Data_Partition
+\ingroup API_Data_Partition
 Return in \p child_interface the \p id th element of the vector
 represented by \p father_interface once partitioned in <c>2</c> chunks of
 equal size, ignoring nparts. Thus, \p id must be <c>0</c> or <c>1</c>.
 
 @name Predefined Matrix Filter Functions
-\ingroup Data_Partition
+\ingroup API_Data_Partition
 
 This section gives a partial list of the predefined partitioning
 functions for matrix data. Examples on how to use them are shown in
@@ -151,13 +151,13 @@ functions for matrix data. Examples on how to use them are shown in
 starpu_data_filters.h.
 
 \fn void starpu_matrix_filter_block(void *father_interface, void *child_interface, struct starpu_data_filter *f, unsigned id, unsigned nparts)
-\ingroup Data_Partition
+\ingroup API_Data_Partition
 This partitions a dense Matrix along the x dimension, thus
 getting (x/\p nparts ,y) matrices. If \p nparts does not divide x, the
 last submatrix contains the remainder.
 
 \fn void starpu_matrix_filter_block_shadow(void *father_interface, void *child_interface, struct starpu_data_filter *f, unsigned id, unsigned nparts)
-\ingroup Data_Partition
+\ingroup API_Data_Partition
 This partitions a dense Matrix along the x dimension, with a
 shadow border <c>filter_arg_ptr</c>, thus getting ((x-2*shadow)/\p
 nparts +2*shadow,y) matrices. If \p nparts does not divide x-2*shadow,
@@ -167,13 +167,13 @@ shadowed parts. A usage example is available in
 examples/filters/shadow2d.c
 
 \fn void starpu_matrix_filter_vertical_block(void *father_interface, void *child_interface, struct starpu_data_filter *f, unsigned id, unsigned nparts)
-\ingroup Data_Partition
+\ingroup API_Data_Partition
 This partitions a dense Matrix along the y dimension, thus
 getting (x,y/\p nparts) matrices. If \p nparts does not divide y, the
 last submatrix contains the remainder.
 
 \fn void starpu_matrix_filter_vertical_block_shadow(void *father_interface, void *child_interface, struct starpu_data_filter *f, unsigned id, unsigned nparts)
-\ingroup Data_Partition
+\ingroup API_Data_Partition
 This partitions a dense Matrix along the y dimension, with a
 shadow border <c>filter_arg_ptr</c>, thus getting
 (x,(y-2*shadow)/\p nparts +2*shadow) matrices. If \p nparts does not
@@ -183,7 +183,7 @@ coherency is enforced for the shadowed parts. A usage example is
 available in examples/filters/shadow2d.c 
 
 @name Predefined Block Filter Functions
-\ingroup Data_Partition
+\ingroup API_Data_Partition
 
 This section gives a partial list of the predefined partitioning
 functions for block data. Examples on how to use them are shown in
@@ -192,13 +192,13 @@ starpu_data_filters.h. A usage example is available in
 examples/filters/shadow3d.c
 
 \fn void starpu_block_filter_block (void *father_interface, void *child_interface, struct starpu_data_filter *f, unsigned id, unsigned nparts)
-\ingroup Data_Partition
+\ingroup API_Data_Partition
 This partitions a block along the X dimension, thus getting
 (x/\p nparts ,y,z) 3D matrices. If \p nparts does not divide x, the last
 submatrix contains the remainder.
 
 \fn void starpu_block_filter_block_shadow (void *father_interface, void *child_interface, struct starpu_data_filter *f, unsigned id, unsigned nparts)
-\ingroup Data_Partition
+\ingroup API_Data_Partition
 This partitions a block along the X dimension, with a
shadow border <c>filter_arg_ptr</c>, thus getting
 ((x-2*shadow)/\p nparts +2*shadow,y,z) blocks. If \p nparts does not
@@ -207,13 +207,13 @@ This can only be used for read-only access, as no coherency is
 enforced for the shadowed parts.
 
 \fn void starpu_block_filter_vertical_block (void *father_interface, void *child_interface, struct starpu_data_filter *f, unsigned id, unsigned nparts)
-\ingroup Data_Partition
+\ingroup API_Data_Partition
 This partitions a block along the Y dimension, thus getting
 (x,y/\p nparts ,z) blocks. If \p nparts does not divide y, the last
 submatrix contains the remainder.
 
 \fn void starpu_block_filter_vertical_block_shadow (void *father_interface, void *child_interface, struct starpu_data_filter *f, unsigned id, unsigned nparts)
-\ingroup Data_Partition
+\ingroup API_Data_Partition
 This partitions a block along the Y dimension, with a
shadow border <c>filter_arg_ptr</c>, thus getting
 (x,(y-2*shadow)/\p nparts +2*shadow,z) 3D matrices. If \p nparts does not
@@ -222,13 +222,13 @@ This can only be used for read-only access, as no coherency is
 enforced for the shadowed parts.
 
 \fn void starpu_block_filter_depth_block (void *father_interface, void *child_interface, struct starpu_data_filter *f, unsigned id, unsigned nparts)
-\ingroup Data_Partition
+\ingroup API_Data_Partition
 This partitions a block along the Z dimension, thus getting
 (x,y,z/\p nparts) blocks. If \p nparts does not divide z, the last
 submatrix contains the remainder.
 
 \fn void starpu_block_filter_depth_block_shadow (void *father_interface, void *child_interface, struct starpu_data_filter *f, unsigned id, unsigned nparts)
-\ingroup Data_Partition
+\ingroup API_Data_Partition
 This partitions a block along the Z dimension, with a
shadow border <c>filter_arg_ptr</c>, thus getting
 (x,y,(z-2*shadow)/\p nparts +2*shadow) blocks. If \p nparts does not
@@ -237,7 +237,7 @@ This can only be used for read-only access, as no coherency is
 enforced for the shadowed parts.
 
 @name Predefined BCSR Filter Functions
-\ingroup Data_Partition
+\ingroup API_Data_Partition
 
 This section gives a partial list of the predefined partitioning
 functions for BCSR data. Examples on how to use them are shown in
@@ -245,11 +245,11 @@ functions for BCSR data. Examples on how to use them are shown in
 starpu_data_filters.h.
 
 \fn void starpu_bcsr_filter_canonical_block (void *father_interface, void *child_interface, struct starpu_data_filter *f, unsigned id, unsigned nparts)
-\ingroup Data_Partition
+\ingroup API_Data_Partition
 This partitions a block-sparse matrix into dense matrices.
 
 \fn void starpu_csr_filter_vertical_block (void *father_interface, void *child_interface, struct starpu_data_filter *f, unsigned id, unsigned nparts)
-\ingroup Data_Partition
+\ingroup API_Data_Partition
 This partitions a block-sparse matrix into vertical
 block-sparse matrices.
 

+ 4 - 4
doc/doxygen/chapters/api/expert_mode.doxy

@@ -6,19 +6,19 @@
  * See the file version.doxy for copying conditions.
  */
 
-/*! \defgroup Expert_Mode Expert Mode
+/*! \defgroup API_Expert_Mode Expert Mode
 
 \fn void starpu_wake_all_blocked_workers(void)
-\ingroup Expert_Mode
+\ingroup API_Expert_Mode
 Wake all the workers, so they can inspect data requests and task
 submissions again.
 
 \fn int starpu_progression_hook_register(unsigned (*func)(void *arg), void *arg)
-\ingroup Expert_Mode
+\ingroup API_Expert_Mode
 Register a progression hook, to be called when workers are idle.
 
 \fn void starpu_progression_hook_deregister(int hook_id)
-\ingroup Expert_Mode
+\ingroup API_Expert_Mode
 Unregister a given progression hook.
 
 */

+ 10 - 10
doc/doxygen/chapters/api/explicit_dependencies.doxy

@@ -6,10 +6,10 @@
  * See the file version.doxy for copying conditions.
  */
 
-/*! \defgroup Explicit_Dependencies Explicit Dependencies
+/*! \defgroup API_Explicit_Dependencies Explicit Dependencies
 
 \fn void starpu_task_declare_deps_array(struct starpu_task *task, unsigned ndeps, struct starpu_task *task_array[])
-\ingroup Explicit_Dependencies
+\ingroup API_Explicit_Dependencies
 \brief Declare task dependencies between a \p task and an array of
 tasks of length \p ndeps. This function must be called prior to the
submission of the task, but it may be called after the submission or the
@@ -23,7 +23,7 @@ in this case, the dependencies are added. It is possible to have
 redundancy in the task dependencies.
 
 \typedef starpu_tag_t
-\ingroup Explicit_Dependencies
+\ingroup API_Explicit_Dependencies
\brief This type defines a task logical identifier. It is possible to
 associate a task with a unique <em>tag</em> chosen by the application,
 and to express dependencies between tasks by the means of those tags.
@@ -34,7 +34,7 @@ will not be started until the tasks which holds the declared
 dependency tags are completed.
 
 \fn void starpu_tag_declare_deps(starpu_tag_t id, unsigned ndeps, ...)
-\ingroup Explicit_Dependencies
+\ingroup API_Explicit_Dependencies
 \brief Specify the dependencies of the task identified by tag \p id.
 The first argument specifies the tag which is configured, the second
 argument gives the number of tag(s) on which \p id depends. The
@@ -57,7 +57,7 @@ starpu_tag_declare_deps((starpu_tag_t)0x1, 2, (starpu_tag_t)0x32, (starpu_tag_t)
 \endcode
 
 \fn void starpu_tag_declare_deps_array(starpu_tag_t id, unsigned ndeps, starpu_tag_t *array)
-\ingroup Explicit_Dependencies
+\ingroup API_Explicit_Dependencies
 \brief This function is similar to starpu_tag_declare_deps(), except
that it does not take a variable number of arguments but an array of
 tags of size \p ndeps.
@@ -69,7 +69,7 @@ starpu_tag_declare_deps_array((starpu_tag_t)0x1, 2, tag_array);
 \endcode
 
 \fn int starpu_tag_wait(starpu_tag_t id)
-\ingroup Explicit_Dependencies
+\ingroup API_Explicit_Dependencies
 \brief This function blocks until the task associated to tag \p id has
 been executed. This is a blocking call which must therefore not be
 called within tasks or callbacks, but only from the application
task for which the structure starpu_task was freed (e.g. if the field
 starpu_task::destroy was enabled).
 
 \fn int starpu_tag_wait_array(unsigned ntags, starpu_tag_t *id)
-\ingroup Explicit_Dependencies
+\ingroup API_Explicit_Dependencies
 \brief This function is similar to starpu_tag_wait() except that it
 blocks until all the \p ntags tags contained in the array \p id are
 terminated.
 
 \fn void starpu_tag_restart(starpu_tag_t id)
-\ingroup Explicit_Dependencies
+\ingroup API_Explicit_Dependencies
 \brief This function can be used to clear the <em>already
 notified</em> status of a tag which is not associated with a task.
 Before that, calling starpu_tag_notify_from_apps() again will not
@@ -94,13 +94,13 @@ notify the successors. After that, the next call to
 starpu_tag_notify_from_apps() will notify the successors.
 
 \fn void starpu_tag_remove(starpu_tag_t id)
-\ingroup Explicit_Dependencies
+\ingroup API_Explicit_Dependencies
 \brief This function releases the resources associated to tag \p id.
 It can be called once the corresponding task has been executed and
when there is no other tag that depends on this tag anymore.
 
 \fn void starpu_tag_notify_from_apps (starpu_tag_t id)
-\ingroup Explicit_Dependencies
+\ingroup API_Explicit_Dependencies
 \brief This function explicitly unlocks tag \p id. It may be useful in
 the case of applications which execute part of their computation
 outside StarPU tasks (e.g. third-party libraries). It is also provided

+ 11 - 11
doc/doxygen/chapters/api/fft_support.doxy

@@ -6,56 +6,56 @@
  * See the file version.doxy for copying conditions.
  */
 
-/*! \defgroup FFT_Support FFT Support
+/*! \defgroup API_FFT_Support FFT Support
 
 \fn void * starpufft_malloc(size_t n)
-\ingroup FFT_Support
+\ingroup API_FFT_Support
 Allocates memory for \p n bytes. This is preferred over malloc(),
 since it allocates pinned memory, which allows overlapped transfers.
 
 \fn void * starpufft_free(void *p)
-\ingroup FFT_Support
+\ingroup API_FFT_Support
 Release memory previously allocated.
 
 \fn struct starpufft_plan * starpufft_plan_dft_1d(int n, int sign, unsigned flags)
-\ingroup FFT_Support
+\ingroup API_FFT_Support
 Initializes a plan for 1D FFT of size \p n. \p sign can be STARPUFFT_FORWARD
 or STARPUFFT_INVERSE. \p flags must be 0.
 
 \fn struct starpufft_plan * starpufft_plan_dft_2d(int n, int m, int sign, unsigned flags)
-\ingroup FFT_Support
+\ingroup API_FFT_Support
 Initializes a plan for 2D FFT of size (\p n, \p m). \p sign can be
STARPUFFT_FORWARD or STARPUFFT_INVERSE. \p flags must be 0.
 
 \fn struct starpu_task * starpufft_start(starpufft_plan p, void *in, void *out)
-\ingroup FFT_Support
+\ingroup API_FFT_Support
 Start an FFT previously planned as \p p, using \p in and \p out as
 input and output. This only submits the task and does not wait for it.
The application should call starpufft_cleanup() to unregister the data.
 
 \fn struct starpu_task * starpufft_start_handle(starpufft_plan p, starpu_data_handle_t in, starpu_data_handle_t out)
-\ingroup FFT_Support
+\ingroup API_FFT_Support
 Start an FFT previously planned as \p p, using data handles \p in and
 \p out as input and output (assumed to be vectors of elements of the
 expected types). This only submits the task and does not wait for it.
 
 \fn void starpufft_execute(starpufft_plan p, void *in, void *out)
-\ingroup FFT_Support
+\ingroup API_FFT_Support
 Execute an FFT previously planned as \p p, using \p in and \p out as
 input and output. This submits and waits for the task.
 
 \fn void starpufft_execute_handle(starpufft_plan p, starpu_data_handle_t in, starpu_data_handle_t out)
-\ingroup FFT_Support
+\ingroup API_FFT_Support
 Execute an FFT previously planned as \p p, using data handles \p in
 and \p out as input and output (assumed to be vectors of elements of
 the expected types). This submits and waits for the task.
 
 \fn void starpufft_cleanup(starpufft_plan p)
-\ingroup FFT_Support
+\ingroup API_FFT_Support
 Releases data for plan \p p, in the starpufft_start() case.
 
 \fn void starpufft_destroy_plan(starpufft_plan p)
-\ingroup FFT_Support
+\ingroup API_FFT_Support
 Destroys plan \p p, i.e. release all CPU (fftw) and GPU (cufft)
 resources.
 

+ 7 - 7
doc/doxygen/chapters/api/fxt_support.doxy

@@ -6,11 +6,11 @@
  * See the file version.doxy for copying conditions.
  */
 
-/*! \defgroup FxT_Support FxT Support
+/*! \defgroup API_FxT_Support FxT Support
 
 \struct starpu_fxt_codelet_event
 \brief todo
-\ingroup FxT_Support
+\ingroup API_FxT_Support
\var starpu_fxt_codelet_event::symbol[256]
 name of the codelet
 \var starpu_fxt_codelet_event::workerid
@@ -21,7 +21,7 @@ name of the codelet
 
 \struct starpu_fxt_options
 \brief todo
-\ingroup FxT_Support
+\ingroup API_FxT_Support
 \var starpu_fxt_options::per_task_colour
 \var starpu_fxt_options::no_counter
 \var starpu_fxt_options::no_bus
@@ -55,15 +55,15 @@ In case we want to dump the list of codelets to an external tool
 In case we want to dump the list of codelets to an external tool
 
 \fn void starpu_fxt_options_init(struct starpu_fxt_options *options)
-\ingroup FxT_Support
+\ingroup API_FxT_Support
 todo
 
 \fn void starpu_fxt_generate_trace(struct starpu_fxt_options *options)
-\ingroup FxT_Support
+\ingroup API_FxT_Support
 todo
 
 \fn void starpu_fxt_start_profiling(void)
-\ingroup FxT_Support
+\ingroup API_FxT_Support
 Start recording the trace. The trace is by default started from
 starpu_init() call, but can be paused by using
 starpu_fxt_stop_profiling(), in which case
@@ -71,7 +71,7 @@ starpu_fxt_start_profiling() should be called to resume recording
 events.
 
 \fn void starpu_fxt_stop_profiling(void)
-\ingroup FxT_Support
+\ingroup API_FxT_Support
 Stop recording the trace. The trace is by default stopped when calling
 starpu_shutdown(). starpu_fxt_stop_profiling() can however be used to
 stop it earlier. starpu_fxt_start_profiling() can then be called to

+ 4 - 4
doc/doxygen/chapters/api/implicit_dependencies.doxy

@@ -6,7 +6,7 @@
  * See the file version.doxy for copying conditions.
  */
 
-/*! \defgroup Implicit_Data_Dependencies Implicit Data Dependencies
+/*! \defgroup API_Implicit_Data_Dependencies Implicit Data Dependencies
 
 \brief In this section, we describe how StarPU makes it possible to
 insert implicit task dependencies in order to enforce sequential data
@@ -19,7 +19,7 @@ the two first tasks and the third one. Implicit data dependencies are
 also inserted in the case of data accesses from the application.
 
 \fn starpu_data_set_default_sequential_consistency_flag(unsigned flag)
-\ingroup Implicit_Data_Dependencies
+\ingroup API_Implicit_Data_Dependencies
 \brief Set the default sequential consistency flag. If a non-zero
 value is passed, a sequential data consistency will be enforced for
 all handles registered after this function call, otherwise it is
@@ -29,11 +29,11 @@ data handle with the function
 starpu_data_set_sequential_consistency_flag().
 
 \fn unsigned starpu_data_get_default_sequential_consistency_flag(void)
-\ingroup Implicit_Data_Dependencies
+\ingroup API_Implicit_Data_Dependencies
 \brief Return the default sequential consistency flag
 
 \fn void starpu_data_set_sequential_consistency_flag(starpu_data_handle_t handle, unsigned flag)
-\ingroup Implicit_Data_Dependencies
+\ingroup API_Implicit_Data_Dependencies
 \brief Set the data consistency mode associated to a data handle. The
 consistency mode set using this function has the priority over the
 default mode which can be set with

+ 12 - 12
doc/doxygen/chapters/api/initialization.doxy

@@ -6,11 +6,11 @@
  * See the file version.doxy for copying conditions.
  */
 
-/*! \defgroup Initialization_and_Termination Initialization and Termination
+/*! \defgroup API_Initialization_and_Termination Initialization and Termination
 
 \struct starpu_driver
 \brief structure for a driver
-\ingroup Initialization_and_Termination
+\ingroup API_Initialization_and_Termination
 \var starpu_driver::type
 The type of the driver. Only STARPU_CPU_DRIVER,
 STARPU_CUDA_DRIVER and STARPU_OPENCL_DRIVER are currently supported.
@@ -19,11 +19,11 @@ The identifier of the driver.
 
 \struct starpu_vector_interface
 \brief vector interface for contiguous (non-strided) buffers
-\ingroup Initialization_and_Termination
+\ingroup API_Initialization_and_Termination
 
 
 \struct starpu_conf
-\ingroup Initialization_and_Termination
+\ingroup API_Initialization_and_Termination
 \brief structure for configuring StarPU.
 
 This structure is passed to the starpu_init() function in order to
@@ -192,7 +192,7 @@ be interesting to specify a bigger value to avoid any
 flushing (which would disturb the trace).
 
 \fn int starpu_init(struct starpu_conf *conf)
-\ingroup Initialization_and_Termination
+\ingroup API_Initialization_and_Termination
This is the StarPU initialization method, which must be called prior to
 any other StarPU call. It is possible to specify StarPU’s
 configuration (e.g. scheduling policy, number of cores, ...) by
@@ -202,13 +202,13 @@ returns 0. Otherwise, -ENODEV indicates that no worker was available
 (so that StarPU was not initialized).
 
 \fn int starpu_initialize(struct starpu_conf *user_conf, int *argc, char ***argv)
-\ingroup Initialization_and_Termination
+\ingroup API_Initialization_and_Termination
 Alternative initialization method with argc and argv. This is used by
MIC, MPI, and SCC implementations. Do not call starpu_init() and
 starpu_initialize() in the same program.
 
 \fn int starpu_conf_init(struct starpu_conf *conf)
-\ingroup Initialization_and_Termination
+\ingroup API_Initialization_and_Termination
 This function initializes the conf structure passed as argument with
 the default values. In case some configuration parameters are already
 specified through environment variables, starpu_conf_init initializes
@@ -219,28 +219,28 @@ completion, this function returns 0. Otherwise, -EINVAL indicates that
 the argument was NULL.
 
 \fn void starpu_shutdown(void)
-\ingroup Initialization_and_Termination
+\ingroup API_Initialization_and_Termination
This is the StarPU termination method. It must be called at the end of the
 application: statistics and other post-mortem debugging information
 are not guaranteed to be available until this method has been called.
 
 \fn int starpu_asynchronous_copy_disabled(void)
-\ingroup Initialization_and_Termination
+\ingroup API_Initialization_and_Termination
 Return 1 if asynchronous data transfers between CPU and accelerators
 are disabled.
 
 \fn int starpu_asynchronous_cuda_copy_disabled(void)
-\ingroup Initialization_and_Termination
+\ingroup API_Initialization_and_Termination
 Return 1 if asynchronous data transfers between CPU and CUDA
 accelerators are disabled.
 
 \fn int starpu_asynchronous_opencl_copy_disabled(void)
-\ingroup Initialization_and_Termination
+\ingroup API_Initialization_and_Termination
 Return 1 if asynchronous data transfers between CPU and OpenCL
 accelerators are disabled.
 
 \fn void starpu_topology_print(FILE *f)
-\ingroup Initialization_and_Termination
+\ingroup API_Initialization_and_Termination
Prints a description of the topology on \p f.
 
 */

+ 13 - 13
doc/doxygen/chapters/api/insert_task.doxy

@@ -6,10 +6,10 @@
  * See the file version.doxy for copying conditions.
  */
 
-/*! \defgroup Insert_Task Insert_Task
+/*! \defgroup API_Insert_Task Insert_Task
 
 \fn int starpu_insert_task(struct starpu_codelet *cl, ...)
-\ingroup Insert_Task
+\ingroup API_Insert_Task
 \brief Create and submit a task corresponding to \p cl with the
 following arguments. The argument list must be zero-terminated.
 
@@ -34,18 +34,18 @@ starpu_codelet_unpack_args() must be called within the codelet
 implementation to retrieve them.
 
 \def STARPU_VALUE
-\ingroup Insert_Task
+\ingroup API_Insert_Task
 \brief this macro is used when calling starpu_insert_task(), and must
 be followed by a pointer to a constant value and the size of the
 constant
 
 \def STARPU_CALLBACK
-\ingroup Insert_Task
+\ingroup API_Insert_Task
 \brief this macro is used when calling starpu_insert_task(), and must
 be followed by a pointer to a callback function
 
 \def STARPU_CALLBACK_WITH_ARG
-\ingroup Insert_Task
+\ingroup API_Insert_Task
 \brief this macro is used when calling starpu_insert_task(), and must
 be followed by two pointers: one to a callback function, and the other
 to be given as an argument to the callback function; this is
@@ -53,45 +53,45 @@ equivalent to using both ::STARPU_CALLBACK and
::STARPU_CALLBACK_ARG.
 
 \def STARPU_CALLBACK_ARG
-\ingroup Insert_Task
+\ingroup API_Insert_Task
 \brief this macro is used when calling starpu_insert_task(), and must
 be followed by a pointer to be given as an argument to the callback
 function
 
 \def STARPU_PRIORITY
-\ingroup Insert_Task
+\ingroup API_Insert_Task
 \brief this macro is used when calling starpu_insert_task(), and must
be followed by an integer defining a priority level
 
 \def STARPU_DATA_ARRAY
-\ingroup Insert_Task
+\ingroup API_Insert_Task
 \brief TODO
 
 \def STARPU_TAG
-\ingroup Insert_Task
+\ingroup API_Insert_Task
 \brief this macro is used when calling starpu_insert_task(), and must be followed by a tag.
 
 \def STARPU_FLOPS
-\ingroup Insert_Task
+\ingroup API_Insert_Task
 \brief this macro is used when calling starpu_insert_task(), and must
 be followed by an amount of floating point operations, as a double.
 Users <b>MUST</b> explicitly cast into double, otherwise parameter
 passing will not work.
 
 \def STARPU_SCHED_CTX
-\ingroup Insert_Task
+\ingroup API_Insert_Task
 \brief this macro is used when calling starpu_insert_task(), and must
 be followed by the id of the scheduling context to which we want to
 submit the task.
 
 \fn void starpu_codelet_pack_args(void **arg_buffer, size_t *arg_buffer_size, ...)
-\ingroup Insert_Task
+\ingroup API_Insert_Task
 \brief Pack arguments of type ::STARPU_VALUE into a buffer which can be
 given to a codelet and later unpacked with the function
 starpu_codelet_unpack_args().
 
 \fn void starpu_codelet_unpack_args (void *cl_arg, ...)
-\ingroup Insert_Task
+\ingroup API_Insert_Task
 \brief Retrieve the arguments of type ::STARPU_VALUE associated to a
 task automatically created using the function starpu_insert_task().
 

+ 8 - 8
doc/doxygen/chapters/api/lower_bound.doxy

@@ -6,42 +6,42 @@
  * See the file version.doxy for copying conditions.
  */
 
-/*! \defgroup Theoretical_lower_bound_on_execution_time Theoretical lower bound on execution time
+/*! \defgroup API_Theoretical_Lower_Bound_on_Execution_Time Theoretical Lower Bound on Execution Time
 
 \brief Compute theoretical upper computation efficiency bound
 corresponding to some actual execution.
 
 \fn void starpu_bound_start (int deps, int prio)
-\ingroup Theoretical_lower_bound_on_execution_time
+\ingroup API_Theoretical_Lower_Bound_on_Execution_Time
 Start recording tasks (resets stats). \p deps tells whether
 dependencies should be recorded too (this is quite expensive)
 
 \fn void starpu_bound_stop (void)
-\ingroup Theoretical_lower_bound_on_execution_time
+\ingroup API_Theoretical_Lower_Bound_on_Execution_Time
 Stop recording tasks
 
 \fn void starpu_bound_print_dot (FILE *output)
-\ingroup Theoretical_lower_bound_on_execution_time
+\ingroup API_Theoretical_Lower_Bound_on_Execution_Time
 Print the DAG that was recorded
 
 \fn void starpu_bound_compute (double *res, double *integer_res, int integer)
-\ingroup Theoretical_lower_bound_on_execution_time
+\ingroup API_Theoretical_Lower_Bound_on_Execution_Time
 Get theoretical upper bound (in ms) (needs glpk support
 detected by configure script). It returns 0 if some performance models
 are not calibrated.
 
 \fn void starpu_bound_print_lp (FILE *output)
-\ingroup Theoretical_lower_bound_on_execution_time
+\ingroup API_Theoretical_Lower_Bound_on_Execution_Time
 Emit the Linear Programming system on \p output for the recorded
 tasks, in the lp format
 
 \fn void starpu_bound_print_mps (FILE *output)
-\ingroup Theoretical_lower_bound_on_execution_time
+\ingroup API_Theoretical_Lower_Bound_on_Execution_Time
 Emit the Linear Programming system on \p output for the recorded
 tasks, in the mps format
 
 \fn void starpu_bound_print (FILE *output, int integer)
-\ingroup Theoretical_lower_bound_on_execution_time
+\ingroup API_Theoretical_Lower_Bound_on_Execution_Time
 Emit statistics of actual execution vs theoretical upper bound.
 \p integer permits to choose between integer solving (which takes a
 long time but is correct), and relaxed solving (which provides an

+ 3 - 3
doc/doxygen/chapters/api/misc_helpers.doxy

@@ -6,10 +6,10 @@
  * See the file version.doxy for copying conditions.
  */
 
-/*! \defgroup Miscellaneous_helpers Miscellaneous helpers
+/*! \defgroup API_Miscellaneous_Helpers Miscellaneous Helpers
 
 \fn int starpu_data_cpy(starpu_data_handle_t dst_handle, starpu_data_handle_t src_handle, int asynchronous, void (*callback_func)(void*), void *callback_arg)
-\ingroup Miscellaneous_helpers
+\ingroup API_Miscellaneous_Helpers
 Copy the content of \p src_handle into \p dst_handle. The parameter \p
 asynchronous indicates whether the function should block or not. In
 the case of an asynchronous call, it is possible to synchronize with
the handle has been copied, and it is given the pointer \p callback_arg
as argument.
 
 \fn void starpu_execute_on_each_worker(void (*func)(void *), void *arg, uint32_t where)
-\ingroup Miscellaneous_helpers
+\ingroup API_Miscellaneous_Helpers
 This function executes the given function on a subset of workers. When
 calling this method, the offloaded function \p func is executed by
 every StarPU worker that may execute the function. The argument \p arg

+ 38 - 38
doc/doxygen/chapters/api/mpi.doxy

@@ -6,13 +6,13 @@
  * See the file version.doxy for copying conditions.
  */
 
-/*! \defgroup MPI_Support MPI Support
+/*! \defgroup API_MPI_Support MPI Support
 
 @name Initialisation
-\ingroup MPI_Support
+\ingroup API_MPI_Support
 
 \fn int starpu_mpi_init (int *argc, char ***argv, int initialize_mpi)
-\ingroup MPI_Support
+\ingroup API_MPI_Support
 Initializes the starpumpi library. \p initialize_mpi indicates if MPI
 should be initialized or not by StarPU. If the value is not 0, MPI
 will be initialized by calling <c>MPI_Init_Thread(argc, argv,
@@ -20,63 +20,63 @@ MPI_THREAD_SERIALIZED, ...)</c>.
 
 \fn int starpu_mpi_initialize (void)
 \deprecated
-\ingroup MPI_Support
+\ingroup API_MPI_Support
This function has been made deprecated. One should instead use the
function starpu_mpi_init(). This function does not call MPI_Init(),
which should be called beforehand by the application.
 
 \fn int starpu_mpi_initialize_extended (int *rank, int *world_size)
 \deprecated
-\ingroup MPI_Support
+\ingroup API_MPI_Support
This function has been made deprecated. One should instead use the
 function starpu_mpi_init(). MPI will be initialized by starpumpi by
 calling <c>MPI_Init_Thread(argc, argv, MPI_THREAD_SERIALIZED,
 ...)</c>.
 
 \fn int starpu_mpi_shutdown (void)
-\ingroup MPI_Support
+\ingroup API_MPI_Support
Cleans the starpumpi library. This must be called after all calls to
starpu_mpi functions and before starpu_shutdown(). MPI_Finalize() will be
 called if StarPU-MPI has been initialized by starpu_mpi_init().
 
 \fn void starpu_mpi_comm_amounts_retrieve (size_t *comm_amounts)
-\ingroup MPI_Support
+\ingroup API_MPI_Support
 Retrieve the current amount of communications from the current node in
the array \p comm_amounts, which must have a size greater than or equal to
 the world size. Communications statistics must be enabled (see
 STARPU_COMM_STATS).
 
 @name Communication
-\ingroup MPI_Support
+\ingroup API_MPI_Support
 
 \fn int starpu_mpi_send (starpu_data_handle_t data_handle, int dest, int mpi_tag, MPI_Comm comm)
-\ingroup MPI_Support
+\ingroup API_MPI_Support
 Performs a standard-mode, blocking send of \p data_handle to the node
 \p dest using the message tag \p mpi_tag within the communicator \p
 comm.
 
 \fn int starpu_mpi_recv (starpu_data_handle_t data_handle, int source, int mpi_tag, MPI_Comm comm, MPI_Status *status)
-\ingroup MPI_Support
+\ingroup API_MPI_Support
 Performs a standard-mode, blocking receive in \p data_handle from the
 node \p source using the message tag \p mpi_tag within the
 communicator \p comm.
 
 \fn int starpu_mpi_isend (starpu_data_handle_t data_handle, starpu_mpi_req *req, int dest, int mpi_tag, MPI_Comm comm)
-\ingroup MPI_Support
+\ingroup API_MPI_Support
 Posts a standard-mode, non blocking send of \p data_handle to the node
 \p dest using the message tag \p mpi_tag within the communicator \p
 comm. After the call, the pointer to the request \p req can be used to
 test or to wait for the completion of the communication.
 
 \fn int starpu_mpi_irecv (starpu_data_handle_t data_handle, starpu_mpi_req *req, int source, int mpi_tag, MPI_Comm comm)
-\ingroup MPI_Support
+\ingroup API_MPI_Support
 Posts a nonblocking receive in \p data_handle from the node \p source
 using the message tag \p mpi_tag within the communicator \p comm.
 After the call, the pointer to the request \p req can be used to test
 or to wait for the completion of the communication.
 
 \fn int starpu_mpi_isend_detached (starpu_data_handle_t data_handle, int dest, int mpi_tag, MPI_Comm comm, void (*callback)(void *), void *arg)
-\ingroup MPI_Support
+\ingroup API_MPI_Support
 Posts a standard-mode, non blocking send of \p data_handle to the node
 \p dest using the message tag \p mpi_tag within the communicator \p
 comm. On completion, the \p callback function is called with the
@@ -87,7 +87,7 @@ to the system, there is no need to test or to wait for the completion
 of the request.
 
 \fn int starpu_mpi_irecv_detached (starpu_data_handle_t data_handle, int source, int mpi_tag, MPI_Comm comm, void (*callback)(void *), void *arg)
-\ingroup MPI_Support
+\ingroup API_MPI_Support
 Posts a nonblocking receive in \p data_handle from the node \p source
 using the message tag \p mpi_tag within the communicator \p comm. On
 completion, the \p callback function is called with the argument \p
@@ -98,34 +98,34 @@ to the system, there is no need to test or to wait for the completion
 of the request.
 
 \fn int starpu_mpi_wait (starpu_mpi_req *req, MPI_Status *status)
-\ingroup MPI_Support
+\ingroup API_MPI_Support
 Returns when the operation identified by request \p req is complete.
 
 \fn int starpu_mpi_test (starpu_mpi_req *req, int *flag, MPI_Status *status)
-\ingroup MPI_Support
+\ingroup API_MPI_Support
 If the operation identified by \p req is complete, set \p flag to 1.
 The \p status object is set to contain information on the completed
 operation.
 
 \fn int starpu_mpi_barrier (MPI_Comm comm)
-\ingroup MPI_Support
+\ingroup API_MPI_Support
 Blocks the caller until all group members of the communicator \p comm
 have called it.
 
 \fn int starpu_mpi_isend_detached_unlock_tag (starpu_data_handle_t data_handle, int dest, int mpi_tag, MPI_Comm comm, starpu_tag_t tag)
-\ingroup MPI_Support
+\ingroup API_MPI_Support
 Posts a standard-mode, non blocking send of \p data_handle to the node
 \p dest using the message tag \p mpi_tag within the communicator \p
 comm. On completion, \p tag is unlocked.
 
 \fn int starpu_mpi_irecv_detached_unlock_tag (starpu_data_handle_t data_handle, int source, int mpi_tag, MPI_Comm comm, starpu_tag_t tag)
-\ingroup MPI_Support
+\ingroup API_MPI_Support
 Posts a nonblocking receive in \p data_handle from the node \p source
 using the message tag \p mpi_tag within the communicator \p comm. On
 completion, \p tag is unlocked.
 
 \fn int starpu_mpi_isend_array_detached_unlock_tag (unsigned array_size, starpu_data_handle_t *data_handle, int *dest, int *mpi_tag, MPI_Comm *comm, starpu_tag_t tag)
-\ingroup MPI_Support
+\ingroup API_MPI_Support
 Posts \p array_size standard-mode, non-blocking sends. Each post sends
 the n-th data of the array \p data_handle to the n-th node of the
 array \p dest using the n-th message tag of the array \p mpi_tag
@@ -133,7 +133,7 @@ within the n-th communicator of the array \p comm. On completion of
 all the requests, \p tag is unlocked.
 
 \fn int starpu_mpi_irecv_array_detached_unlock_tag (unsigned array_size, starpu_data_handle_t *data_handle, int *source, int *mpi_tag, MPI_Comm *comm, starpu_tag_t tag)
-\ingroup MPI_Support
+\ingroup API_MPI_Support
 Posts \p array_size non-blocking receives. Each post receives in the n-th
 data of the array \p data_handle from the n-th node of the array \p
 source using the n-th message tag of the array \p mpi_tag within the
@@ -141,57 +141,57 @@ n-th communicator of the array \p comm. On completion of the all the
 requests, \p tag is unlocked.
 
 @name Communication Cache
-\ingroup MPI_Support
+\ingroup API_MPI_Support
 
 \fn void starpu_mpi_cache_flush (MPI_Comm comm, starpu_data_handle_t data_handle)
-\ingroup MPI_Support
+\ingroup API_MPI_Support
 Clear the send and receive communication cache for the data
 \p data_handle. The function has to be called synchronously by all the
 MPI nodes. The function does nothing if the cache mechanism is
 disabled (see STARPU_MPI_CACHE).
 
 \fn void starpu_mpi_cache_flush_all_data (MPI_Comm comm)
-\ingroup MPI_Support
+\ingroup API_MPI_Support
 Clear the send and receive communication cache for all data. The
 function has to be called synchronously by all the MPI nodes. The
 function does nothing if the cache mechanism is disabled (see
 STARPU_MPI_CACHE).
 
 @name MPI Insert Task
-\ingroup MPI_Support
+\ingroup API_MPI_Support
 
 \fn int starpu_data_set_tag (starpu_data_handle_t handle, int tag)
-\ingroup MPI_Support
+\ingroup API_MPI_Support
 Tell StarPU-MPI which MPI tag to use when exchanging the data.
 
 \fn int starpu_data_get_tag (starpu_data_handle_t handle)
-\ingroup MPI_Support
+\ingroup API_MPI_Support
 Returns the MPI tag to be used when exchanging the data.
 
 \fn int starpu_data_set_rank (starpu_data_handle_t handle, int rank)
-\ingroup MPI_Support
+\ingroup API_MPI_Support
 Tell StarPU-MPI which MPI node "owns" a given data, that is, the node
 which will always keep an up-to-date value, and will by default
 execute tasks which write to it.
 
 \fn int starpu_data_get_rank (starpu_data_handle_t handle)
-\ingroup MPI_Support
+\ingroup API_MPI_Support
 Returns the last value set by starpu_data_set_rank().
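 
 For instance, right after registering a piece of data, the application
 can declare its owner and the MPI tag to be used for it (a sketch; the
 modulo distribution is only an illustration):
 
 \code
 starpu_data_handle_t handle;
 starpu_variable_data_register(&handle, 0, (uintptr_t) &value, sizeof(value));
 
 /* Data block i is owned by node i modulo the number of MPI nodes. */
 starpu_data_set_rank(handle, i % nb_nodes);
 /* Each handle needs a distinct tag for the implicit communications. */
 starpu_data_set_tag(handle, i);
 \endcode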
 
 \def STARPU_EXECUTE_ON_NODE
-\ingroup MPI_Support
+\ingroup API_MPI_Support
 This macro is used when calling starpu_mpi_insert_task(), and must be
 followed by an integer value specifying the node on which to
 execute the codelet.
 
 \def STARPU_EXECUTE_ON_DATA
-\ingroup MPI_Support
+\ingroup API_MPI_Support
 This macro is used when calling starpu_mpi_insert_task(), and must be
 followed by a data handle to specify that the node owning the given
 data will execute the codelet.
 
 \fn int starpu_mpi_insert_task (MPI_Comm comm, struct starpu_codelet *codelet, ...)
-\ingroup MPI_Support
+\ingroup API_MPI_Support
 Create and submit a task corresponding to \p codelet with the following
 arguments. The argument list must be zero-terminated.
 
@@ -229,28 +229,28 @@ allows not to send data twice to the same MPI node, unless the data
 has been modified. The cache can be disabled (see STARPU_MPI_CACHE).
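 
 A minimal call might look as follows (the handles and codelet are
 assumed to be already registered and defined):
 
 \code
 /* Executed on the node owning handle_B (the written data); StarPU-MPI
    automatically transfers handle_A to that node if needed. */
 starpu_mpi_insert_task(MPI_COMM_WORLD, &cl,
                        STARPU_R, handle_A,
                        STARPU_RW, handle_B,
                        0); /* the argument list is zero-terminated */
 \endcode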
 
 \fn void starpu_mpi_get_data_on_node (MPI_Comm comm, starpu_data_handle_t data_handle, int node)
-\ingroup MPI_Support
+\ingroup API_MPI_Support
 Transfer data \p data_handle to MPI node \p node, sending it from its
 owner if needed. At least the target node and the owner have to call
 the function.
 
 \fn void starpu_mpi_get_data_on_node_detached (MPI_Comm comm, starpu_data_handle_t data_handle, int node, void (*callback)(void*), void *arg)
-\ingroup MPI_Support
+\ingroup API_MPI_Support
 Transfer data \p data_handle to MPI node \p node, sending it from its
 owner if needed. At least the target node and the owner have to call
 the function. On reception, the \p callback function is called with
 the argument \p arg.
 
 @name Collective Operations
-\ingroup MPI_Support
+\ingroup API_MPI_Support
 
 \fn void starpu_mpi_redux_data (MPI_Comm comm, starpu_data_handle_t data_handle)
-\ingroup MPI_Support
+\ingroup API_MPI_Support
 Perform a reduction on the given data. All nodes send their copy of
 the data to the owner node, which performs the reduction.
 
 \fn int starpu_mpi_scatter_detached (starpu_data_handle_t *data_handles, int count, int root, MPI_Comm comm, void (*scallback)(void *), void *sarg, void (*rcallback)(void *), void *rarg)
-\ingroup MPI_Support
+\ingroup API_MPI_Support
 Scatter data among processes of the communicator based on the
 ownership of the data. For each data of the array \p data_handles, the
 process \p root sends the data to the process owning this data. Processes
@@ -261,7 +261,7 @@ rcallback function is called with the argument \p rarg on any other
 process.
 
 \fn int starpu_mpi_gather_detached (starpu_data_handle_t *data_handles, int count, int root, MPI_Comm comm, void (*scallback)(void *), void *sarg, void (*rcallback)(void *), void *rarg)
-\ingroup MPI_Support
+\ingroup API_MPI_Support
 Gather data from the different processes of the communicator onto the
 process \p root. Each process owning data handle in the array
 \p data_handles will send them to the process \p root. The process \p

+ 8 - 8
doc/doxygen/chapters/api/multiformat_data_interface.doxy

@@ -6,10 +6,10 @@
  * See the file version.doxy for copying conditions.
  */
 
-/*! \defgroup Multiformat_Data_Interface Multiformat Data Interface
+/*! \defgroup API_Multiformat_Data_Interface Multiformat Data Interface
 
 \struct starpu_multiformat_data_interface_ops
-\ingroup Multiformat_Data_Interface
+\ingroup API_Multiformat_Data_Interface
 \brief The different fields are:
 \var starpu_multiformat_data_interface_ops::cpu_elemsize
         the size of each element on CPUs
@@ -28,7 +28,7 @@
 
 \struct starpu_multiformat_interface
 \brief todo
-\ingroup Multiformat_Data_Interface
+\ingroup API_Multiformat_Data_Interface
 \var starpu_multiformat_interface::id
 \var starpu_multiformat_interface::cpu_ptr
 \var starpu_multiformat_interface::cuda_ptr
@@ -38,7 +38,7 @@
 \var starpu_multiformat_interface::ops
 
 \fn void starpu_multiformat_data_register(starpu_data_handle_t *handle, unsigned home_node, void *ptr, uint32_t nobjects, struct starpu_multiformat_data_interface_ops *format_ops)
-\ingroup Multiformat_Data_Interface
+\ingroup API_Multiformat_Data_Interface
 Register a piece of data that can be represented in different
 ways, depending upon the processing unit that manipulates it. It
 allows the programmer, for instance, to use an array of structures
@@ -47,19 +47,19 @@ GPU. \p nobjects is the number of elements in the data. \p format_ops
 describes the format.
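 
 A registration sketch, assuming the application provides the conversion
 codelets (<c>cpu_to_cuda_cl</c> and <c>cuda_to_cpu_cl</c> are
 hypothetical user-defined codelets, and the structure types are
 application-specific):
 
 \code
 static struct starpu_multiformat_data_interface_ops format_ops =
 {
         .cpu_elemsize   = sizeof(struct point),            /* array of structures on CPU */
         .cuda_elemsize  = sizeof(struct struct_of_arrays), /* structure of arrays on GPU */
         .cpu_to_cuda_cl = &cpu_to_cuda_cl,
         .cuda_to_cpu_cl = &cuda_to_cpu_cl,
 };
 
 starpu_multiformat_data_register(&handle, 0, points, N_POINTS, &format_ops);
 \endcode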
 
 \def STARPU_MULTIFORMAT_GET_CPU_PTR(void *interface)
-\ingroup Multiformat_Data_Interface
+\ingroup API_Multiformat_Data_Interface
 returns the local pointer to the data with CPU format.
 
 \def STARPU_MULTIFORMAT_GET_CUDA_PTR(void *interface)
-\ingroup Multiformat_Data_Interface
+\ingroup API_Multiformat_Data_Interface
 returns the local pointer to the data with CUDA format.
 
 \def STARPU_MULTIFORMAT_GET_OPENCL_PTR(void *interface)
-\ingroup Multiformat_Data_Interface
+\ingroup API_Multiformat_Data_Interface
 returns the local pointer to the data with OpenCL format.
 
 \def STARPU_MULTIFORMAT_GET_NX (void *interface)
-\ingroup Multiformat_Data_Interface
+\ingroup API_Multiformat_Data_Interface
 returns the number of elements in the data.
 
 */

+ 34 - 34
doc/doxygen/chapters/api/opencl_extensions.doxy

@@ -6,43 +6,43 @@
  * See the file version.doxy for copying conditions.
  */
 
-/*! \defgroup OpenCL_Extensions OpenCL Extensions
+/*! \defgroup API_OpenCL_Extensions OpenCL Extensions
 
 \def STARPU_USE_OPENCL
-\ingroup OpenCL_Extensions
+\ingroup API_OpenCL_Extensions
 \brief This macro is defined when StarPU has been installed with
 OpenCL support. It should be used in your code to detect the
 availability of OpenCL as shown in Full source code for the 'Scaling a
 Vector' example.
 
 @name Writing OpenCL kernels
-\ingroup OpenCL_Extensions
+\ingroup API_OpenCL_Extensions
 
 \fn void starpu_opencl_get_context(int devid, cl_context *context)
-\ingroup OpenCL_Extensions
+\ingroup API_OpenCL_Extensions
 Places the OpenCL context of the device designated by \p devid
 into \p context.
 
 \fn void starpu_opencl_get_device(int devid, cl_device_id *device)
-\ingroup OpenCL_Extensions
+\ingroup API_OpenCL_Extensions
 Places the cl_device_id corresponding to \p devid in \p device.
 
 \fn void starpu_opencl_get_queue(int devid, cl_command_queue *queue)
-\ingroup OpenCL_Extensions
+\ingroup API_OpenCL_Extensions
 Places the command queue of the device designated by \p devid
 into \p queue.
 
 \fn void starpu_opencl_get_current_context(cl_context *context)
-\ingroup OpenCL_Extensions
+\ingroup API_OpenCL_Extensions
 Return the context of the current worker.
 
 \fn void starpu_opencl_get_current_queue(cl_command_queue *queue)
-\ingroup OpenCL_Extensions
+\ingroup API_OpenCL_Extensions
 Return the computation kernel command queue of the current
 worker.
 
 \fn int starpu_opencl_set_kernel_args(cl_int *err, cl_kernel *kernel, ...)
-\ingroup OpenCL_Extensions
+\ingroup API_OpenCL_Extensions
 Sets the arguments of a given kernel. The list of arguments
 must be given as <c>(size_t size_of_the_argument, cl_mem *
 pointer_to_the_argument)</c>. The last argument must be 0. Returns the
@@ -65,7 +65,7 @@ if (n != 2)
 \endcode
 
 @name Compiling OpenCL kernels
-\ingroup OpenCL_Extensions
+\ingroup API_OpenCL_Extensions
 
 Source codes for OpenCL kernels can be stored in a file or in a
 string. StarPU provides functions to build the program executable for
@@ -77,26 +77,26 @@ different programs on the different OpenCL devices, for relocation
 purpose for instance).
 
 \struct starpu_opencl_program
-\ingroup OpenCL_Extensions
+\ingroup API_OpenCL_Extensions
 \brief Stores the OpenCL programs as compiled for the different OpenCL
 devices.
 \var starpu_opencl_program::programs
 Stores each program for each OpenCL device.
 
 \fn int starpu_opencl_load_opencl_from_file(const char *source_file_name, struct starpu_opencl_program *opencl_programs, const char* build_options)
-\ingroup OpenCL_Extensions
+\ingroup API_OpenCL_Extensions
 This function compiles an OpenCL source code stored in a file.
 
 \fn int starpu_opencl_load_opencl_from_string(const char *opencl_program_source, struct starpu_opencl_program *opencl_programs, const char* build_options)
-\ingroup OpenCL_Extensions
+\ingroup API_OpenCL_Extensions
 This function compiles an OpenCL source code stored in a string.
 
 \fn int starpu_opencl_unload_opencl(struct starpu_opencl_program *opencl_programs)
-\ingroup OpenCL_Extensions
+\ingroup API_OpenCL_Extensions
 This function unloads an OpenCL compiled code.
 
 \fn void starpu_opencl_load_program_source(const char *source_file_name, char *located_file_name, char *located_dir_name, char *opencl_program_source)
-\ingroup OpenCL_Extensions
+\ingroup API_OpenCL_Extensions
 Store the contents of the file \p source_file_name in the buffer
 \p opencl_program_source. The file \p source_file_name can be located in the
 current directory, or in the directory specified by the environment
@@ -109,7 +109,7 @@ where it has been located. Otherwise, they are both set to the empty
 string.
 
 \fn int starpu_opencl_compile_opencl_from_file(const char *source_file_name, const char * build_options)
-\ingroup OpenCL_Extensions
+\ingroup API_OpenCL_Extensions
 Compile the OpenCL kernel stored in the file \p source_file_name
 with the given options \p build_options and stores the result in the
 directory <c>$STARPU_HOME/.starpu/opencl</c> with the same filename as
@@ -118,7 +118,7 @@ and the filename is suffixed with the vendor id and the device id of
 the OpenCL device.
 
 \fn int starpu_opencl_compile_opencl_from_string(const char *opencl_program_source, const char *file_name, const char*build_options)
-\ingroup OpenCL_Extensions
+\ingroup API_OpenCL_Extensions
 Compile the OpenCL kernel in the string \p opencl_program_source
 with the given options \p build_options and stores the result in the
 directory <c>$STARPU_HOME/.starpu/opencl</c> with the filename \p
@@ -127,29 +127,29 @@ filename is suffixed with the vendor id and the device id of the
 OpenCL device.
 
 \fn int starpu_opencl_load_binary_opencl(const char *kernel_id, struct starpu_opencl_program *opencl_programs)
-\ingroup OpenCL_Extensions
+\ingroup API_OpenCL_Extensions
 Compile the binary OpenCL kernel identified with \p kernel_id.
 For every OpenCL device, the binary OpenCL kernel will be loaded from
 the file
 <c>$STARPU_HOME/.starpu/opencl/\<kernel_id\>.\<device_type\>.vendor_id_\<vendor_id\>_device_id_\<device_id\></c>.
 
 @name Loading OpenCL kernels
-\ingroup OpenCL_Extensions
+\ingroup API_OpenCL_Extensions
 
 \fn int starpu_opencl_load_kernel(cl_kernel *kernel, cl_command_queue *queue, struct starpu_opencl_program *opencl_programs, const char *kernel_name, int devid)
-\ingroup OpenCL_Extensions
+\ingroup API_OpenCL_Extensions
 Create a kernel \p kernel for device \p devid, on its computation
 command queue returned in \p queue, using program \p opencl_programs
 and name \p kernel_name.
 
 \fn int starpu_opencl_release_kernel(cl_kernel kernel)
-\ingroup OpenCL_Extensions
+\ingroup API_OpenCL_Extensions
 Release the given \p kernel, to be called after kernel execution.
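 
 Put together, a typical lifecycle is to compile the source once, then
 create and release the kernel from within the OpenCL codelet (the file
 name and kernel name are illustrative):
 
 \code
 static struct starpu_opencl_program programs;
 starpu_opencl_load_opencl_from_file("vector_scal_kernel.cl", &programs, NULL);
 
 /* Inside the OpenCL codelet function: */
 cl_kernel kernel;
 cl_command_queue queue;
 int devid = starpu_worker_get_devid(starpu_worker_get_id());
 starpu_opencl_load_kernel(&kernel, &queue, &programs, "vector_scal", devid);
 /* ... clSetKernelArg() and clEnqueueNDRangeKernel() calls ... */
 starpu_opencl_release_kernel(kernel);
 \endcode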
 
 @name OpenCL statistics
 
 \fn int starpu_opencl_collect_stats(cl_event event)
-\ingroup OpenCL_Extensions
+\ingroup API_OpenCL_Extensions
 This function allows collecting statistics on a kernel execution.
 After termination of the kernel, the OpenCL codelet should call this
 function, passing it the event returned by clEnqueueNDRangeKernel, to
@@ -157,48 +157,48 @@ let StarPU collect statistics about the kernel execution (used cycles,
 consumed power).
 
 @name OpenCL utilities
-\ingroup OpenCL_Extensions
+\ingroup API_OpenCL_Extensions
 
 \fn const char * starpu_opencl_error_string(cl_int status)
-\ingroup OpenCL_Extensions
+\ingroup API_OpenCL_Extensions
 Return the error message in English corresponding to \p status, an OpenCL
 error code.
 
 \fn void starpu_opencl_display_error(const char *func, const char *file, int line, const char *msg, cl_int status)
-\ingroup OpenCL_Extensions
+\ingroup API_OpenCL_Extensions
 Given a valid error status, prints the corresponding error message on
 stdout, along with the given function name \p func, the given filename
 \p file, the given line number \p line and the given message \p msg.
 
 \def STARPU_OPENCL_DISPLAY_ERROR(cl_int status)
-\ingroup OpenCL_Extensions
+\ingroup API_OpenCL_Extensions
 Call the function starpu_opencl_display_error() with the given error
 \p status, the current function name, current file and line number,
 and an empty message.
 
 \fn void starpu_opencl_report_error(const char *func, const char *file, int line, const char *msg, cl_int status)
-\ingroup OpenCL_Extensions
+\ingroup API_OpenCL_Extensions
 Call the function starpu_opencl_display_error() and abort.
 
 \def STARPU_OPENCL_REPORT_ERROR (cl_int status)
-\ingroup OpenCL_Extensions
+\ingroup API_OpenCL_Extensions
 Call the function starpu_opencl_report_error() with the given error \p
 status, with the current function name, current file and line number,
 and an empty message.
 
 \def STARPU_OPENCL_REPORT_ERROR_WITH_MSG(const char *msg, cl_int status)
-\ingroup OpenCL_Extensions
+\ingroup API_OpenCL_Extensions
 Call the function starpu_opencl_report_error() with the given \p msg
 and the given error \p status, with the current function name, current
 file and line number.
 
 \fn cl_int starpu_opencl_allocate_memory(cl_mem *addr, size_t size, cl_mem_flags flags)
-\ingroup OpenCL_Extensions
+\ingroup API_OpenCL_Extensions
 Allocate \p size bytes of memory, stored in \p addr. \p flags must be a valid
 combination of cl_mem_flags values.
 
 \fn cl_int starpu_opencl_copy_ram_to_opencl(void *ptr, unsigned src_node, cl_mem buffer, unsigned dst_node, size_t size, size_t offset, cl_event *event, int *ret)
-\ingroup OpenCL_Extensions
+\ingroup API_OpenCL_Extensions
 Copy \p size bytes from the given \p ptr on RAM \p src_node to the
 given \p buffer on OpenCL \p dst_node. \p offset is the offset, in
 bytes, in \p buffer. If \p event is <c>NULL</c>, the copy is
@@ -211,7 +211,7 @@ asynchronous launch was successful, or to 0 if \p event was
 <c>NULL</c>.
 
 \fn cl_int starpu_opencl_copy_opencl_to_ram(cl_mem buffer, unsigned src_node, void *ptr, unsigned dst_node, size_t size, size_t offset, cl_event *event, int *ret)
-\ingroup OpenCL_Extensions
+\ingroup API_OpenCL_Extensions
 Copy \p size bytes asynchronously from the given \p buffer on OpenCL
 \p src_node to the given \p ptr on RAM \p dst_node. \p offset is the
 offset, in bytes, in \p buffer. If \p event is <c>NULL</c>, the copy
@@ -224,7 +224,7 @@ asynchronous launch was successful, or to 0 if \p event was
 <c>NULL</c>.
 
 \fn cl_int starpu_opencl_copy_opencl_to_opencl(cl_mem src, unsigned src_node, size_t src_offset, cl_mem dst, unsigned dst_node, size_t dst_offset, size_t size, cl_event *event, int *ret)
-\ingroup OpenCL_Extensions
+\ingroup API_OpenCL_Extensions
 Copy \p size bytes asynchronously from byte offset \p src_offset of \p
 src on OpenCL \p src_node to byte offset \p dst_offset of \p dst on
 OpenCL \p dst_node. If \p event is <c>NULL</c>, the copy is
@@ -237,7 +237,7 @@ asynchronous launch was successful, or to 0 if \p event was
 <c>NULL</c>.
 
 \fn cl_int starpu_opencl_copy_async_sync(uintptr_t src, size_t src_offset, unsigned src_node, uintptr_t dst, size_t dst_offset, unsigned dst_node, size_t size, cl_event *event)
-\ingroup OpenCL_Extensions
+\ingroup API_OpenCL_Extensions
 Copy \p size bytes from byte offset \p src_offset of \p src on \p
 src_node to byte offset \p dst_offset of \p dst on \p dst_node. If \p
 event is <c>NULL</c>, the copy is synchronous, i.e. the queue is

+ 9 - 9
doc/doxygen/chapters/api/parallel_tasks.doxy

@@ -6,44 +6,44 @@
  * See the file version.doxy for copying conditions.
  */
 
-/*! \defgroup Parallel_Tasks Parallel Tasks
+/*! \defgroup API_Parallel_Tasks Parallel Tasks
 
 \fn int starpu_combined_worker_get_size(void)
-\ingroup Parallel_Tasks
+\ingroup API_Parallel_Tasks
 Return the size of the current combined worker, i.e. the total number
 of cpus running the same task in the case of ::STARPU_SPMD parallel
 tasks, or the total number of threads that the task is allowed to
 start in the case of ::STARPU_FORKJOIN parallel tasks.
 
 \fn int starpu_combined_worker_get_rank(void)
-\ingroup Parallel_Tasks
+\ingroup API_Parallel_Tasks
 Return the rank of the current thread within the combined worker. Can
 only be used in ::STARPU_FORKJOIN parallel tasks, to know which part
 of the task to work on.
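 
 Inside a parallel task implementation, the rank and size can be used to
 split the work among the threads of the combined worker (a sketch using
 the vector data interface):
 
 \code
 void scal_parallel_cpu(void *buffers[], void *args)
 {
         unsigned n = STARPU_VECTOR_GET_NX(buffers[0]);
         float *x = (float *) STARPU_VECTOR_GET_PTR(buffers[0]);
         int rank = starpu_combined_worker_get_rank();
         int size = starpu_combined_worker_get_size();
         unsigned i;
 
         /* Each thread handles an interleaved slice of the vector. */
         for (i = rank; i < n; i += size)
                 x[i] *= 2.0f;
 }
 \endcode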
 
 \fn unsigned starpu_combined_worker_get_count(void)
-\ingroup Parallel_Tasks
+\ingroup API_Parallel_Tasks
 Return the number of different combined workers.
 
 \fn int starpu_combined_worker_get_id(void)
-\ingroup Parallel_Tasks
+\ingroup API_Parallel_Tasks
 Return the identifier of the current combined worker.
 
 \fn int starpu_combined_worker_assign_workerid(int nworkers, int workerid_array[])
-\ingroup Parallel_Tasks
+\ingroup API_Parallel_Tasks
 Register a new combined worker and get its identifier.
 
 \fn int starpu_combined_worker_get_description(int workerid, int *worker_size, int **combined_workerid)
-\ingroup Parallel_Tasks
+\ingroup API_Parallel_Tasks
 Get the description of a combined worker.
 
 \fn int starpu_combined_worker_can_execute_task(unsigned workerid, struct starpu_task *task, unsigned nimpl)
-\ingroup Parallel_Tasks
+\ingroup API_Parallel_Tasks
 Variant of starpu_worker_can_execute_task() compatible with combined
 workers
 
 \fn void starpu_parallel_task_barrier_init(struct starpu_task*task, int workerid)
-\ingroup Parallel_Tasks
+\ingroup API_Parallel_Tasks
 Initialise the barrier for the parallel task, and dispatch the task
 between the different combined workers.
 

+ 31 - 31
doc/doxygen/chapters/api/performance_model.doxy

@@ -6,10 +6,10 @@
  * See the file version.doxy for copying conditions.
  */
 
-/*! \defgroup Performance_Model Performance Model
+/*! \defgroup API_Performance_Model Performance Model
 
 \enum starpu_perfmodel_archtype
-\ingroup Performance_Model
+\ingroup API_Performance_Model
 \brief Enumerates the various types of architectures.
 
 it is possible that we have multiple versions of the same kind of
@@ -32,38 +32,38 @@ STARPU_MAXCUDADEVS - 1 (GPU number STARPU_MAXCUDADEVS - 1).
 STARPU_MAXOPENCLDEVS - 1).
 </ul>
 \var starpu_perfmodel_archtype::STARPU_CPU_DEFAULT
-\ingroup Performance_Model
+\ingroup API_Performance_Model
 CPU combined workers between 0 and STARPU_MAXCPUS-1
 \var starpu_perfmodel_archtype::STARPU_CUDA_DEFAULT
-\ingroup Performance_Model
+\ingroup API_Performance_Model
 CUDA workers
 \var starpu_perfmodel_archtype::STARPU_OPENCL_DEFAULT
-\ingroup Performance_Model
+\ingroup API_Performance_Model
 OpenCL workers
 \var starpu_perfmodel_archtype::STARPU_MIC_DEFAULT
-\ingroup Performance_Model
+\ingroup API_Performance_Model
 MIC workers
 \var starpu_perfmodel_archtype::STARPU_SCC_DEFAULT
-\ingroup Performance_Model
+\ingroup API_Performance_Model
 SCC workers
 
 \enum starpu_perfmodel_type
-\ingroup Performance_Model
+\ingroup API_Performance_Model
 \brief TODO
 \var starpu_perfmodel_type::STARPU_PER_ARCH
-\ingroup Performance_Model
+\ingroup API_Performance_Model
 Application-provided per-arch cost model function
 \var starpu_perfmodel_type::STARPU_COMMON
-\ingroup Performance_Model
+\ingroup API_Performance_Model
 Application-provided common cost model function, with per-arch factor
 \var starpu_perfmodel_type::STARPU_HISTORY_BASED
-\ingroup Performance_Model
+\ingroup API_Performance_Model
 Automatic history-based cost model
 \var starpu_perfmodel_type::STARPU_REGRESSION_BASED
-\ingroup Performance_Model
+\ingroup API_Performance_Model
 Automatic linear regression-based cost model  (alpha * size ^ beta)
 \var starpu_perfmodel_type::STARPU_NL_REGRESSION_BASED
-\ingroup Performance_Model
+\ingroup API_Performance_Model
 Automatic non-linear regression-based cost model (a * size ^ b + c)
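 
 Declaring a model then mostly amounts to picking a type and a symbol
 name under which the calibration data will be saved (the symbol and
 codelet shown here are illustrative):
 
 \code
 static struct starpu_perfmodel vector_scal_model =
 {
         .type = STARPU_HISTORY_BASED,
         .symbol = "vector_scal" /* stored under $STARPU_HOME/.starpu */
 };
 
 static struct starpu_codelet cl =
 {
         .cpu_funcs = { scal_cpu_func, NULL },
         .nbuffers = 1,
         .model = &vector_scal_model
 };
 \endcode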
 
 \struct starpu_perfmodel
@@ -73,7 +73,7 @@ model for a codelet. For compatibility, make sure to initialize the
 whole structure to zero, either by using explicit memset, or by
 letting the compiler implicitly do it in e.g. static storage case. If
 not provided, other fields have to be zero.
-\ingroup Performance_Model
+\ingroup API_Performance_Model
 \var starpu_perfmodel::type
 is the type of performance model
 <ul>
@@ -120,7 +120,7 @@ the values (W), and making a performance estimation (R).
 
 \struct starpu_perfmodel_regression_model
 \brief ...
-\ingroup Performance_Model
+\ingroup API_Performance_Model
 \var starpu_perfmodel_regression_model::sumlny
 sum of ln(measured)
 \var starpu_perfmodel_regression_model::sumlnx
@@ -153,7 +153,7 @@ number of sample values for non-linear regression
 \struct starpu_perfmodel_per_arch
 \brief contains information about the performance model of a given
 arch.
-\ingroup Performance_Model
+\ingroup API_Performance_Model
 \var starpu_perfmodel_per_arch::cost_model
 \deprecated
 This field is deprecated. Use instead the field
@@ -181,7 +181,7 @@ regression.
 
 \struct starpu_perfmodel_history_list
 \brief todo
-\ingroup Performance_Model
+\ingroup API_Performance_Model
 \var starpu_perfmodel_history_list::next
 todo
 \var starpu_perfmodel_history_list::entry
@@ -189,7 +189,7 @@ todo
 
 \struct starpu_perfmodel_history_entry
 \brief todo
-\ingroup Performance_Model
+\ingroup API_Performance_Model
 \var starpu_perfmodel_history_entry::mean
 mean_n = 1/n sum
 \var starpu_perfmodel_history_entry::deviation
@@ -208,51 +208,51 @@ in bytes
 Provided by the application
 
 \fn int starpu_perfmodel_load_symbol(const char *symbol, struct starpu_perfmodel *model)
-\ingroup Performance_Model
+\ingroup API_Performance_Model
 loads a given performance model. The model structure has to be
 completely zero, and will be filled with the information saved in
 <c>$STARPU_HOME/.starpu</c>. The function is intended to be used by
 external tools that should read the performance model files.
 
 \fn int starpu_perfmodel_unload_model(struct starpu_perfmodel *model)
-\ingroup Performance_Model
+\ingroup API_Performance_Model
 unloads the given model which has been previously loaded
 through the function starpu_perfmodel_load_symbol()
 
 \fn void starpu_perfmodel_debugfilepath(struct starpu_perfmodel *model, enum starpu_perfmodel_archtype arch, char *path, size_t maxlen, unsigned nimpl)
-\ingroup Performance_Model
+\ingroup API_Performance_Model
 returns the path to the debugging information for the performance model.
 
 \fn void starpu_perfmodel_get_arch_name(enum starpu_perfmodel_archtype arch, char *archname, size_t maxlen, unsigned nimpl)
-\ingroup Performance_Model
+\ingroup API_Performance_Model
 returns the architecture name for \p arch
 
 \fn enum starpu_perfmodel_archtype starpu_worker_get_perf_archtype(int workerid)
-\ingroup Performance_Model
+\ingroup API_Performance_Model
 returns the architecture type of a given worker.
 
 \fn int starpu_perfmodel_list(FILE *output)
-\ingroup Performance_Model
+\ingroup API_Performance_Model
 prints a list of all performance models on \p output
 
 \fn void starpu_perfmodel_print(struct starpu_perfmodel *model, enum starpu_perfmodel_archtype arch, unsigned nimpl, char *parameter, uint32_t *footprint, FILE *output)
-\ingroup Performance_Model
+\ingroup API_Performance_Model
 todo
 
 \fn int starpu_perfmodel_print_all(struct starpu_perfmodel *model, char *arch, char *parameter, uint32_t *footprint, FILE *output)
-\ingroup Performance_Model
+\ingroup API_Performance_Model
 todo
 
 \fn void starpu_bus_print_bandwidth(FILE *f)
-\ingroup Performance_Model
+\ingroup API_Performance_Model
 prints a matrix of bus bandwidths on \p f.
 
 \fn void starpu_bus_print_affinity(FILE *f)
-\ingroup Performance_Model
+\ingroup API_Performance_Model
 prints the affinity devices on \p f.
 
 \fn void starpu_perfmodel_update_history(struct starpu_perfmodel *model, struct starpu_task *task, enum starpu_perfmodel_archtype arch, unsigned cpuid, unsigned nimpl, double measured);
-\ingroup Performance_Model
+\ingroup API_Performance_Model
 This feeds the performance model \p model with an explicit
 measurement \p measured, in addition to measurements done by StarPU
 itself. This can be useful when the application already has an
@@ -261,11 +261,11 @@ could benefit from instead of doing on-line measurements. And example
 of use can be seen in \ref Performance_model_example.
 
 \fn double starpu_get_bandwidth_RAM_CUDA(unsigned cudadev)
-\ingroup Performance_Model
+\ingroup API_Performance_Model
 Used to compute the velocity of resources
 
 \fn double starpu_get_latency_RAM_CUDA(unsigned cudadev)
-\ingroup Performance_Model
+\ingroup API_Performance_Model
 Used to compute the velocity of resources
 
 */

+ 18 - 18
doc/doxygen/chapters/api/profiling.doxy

@@ -6,10 +6,10 @@
  * See the file version.doxy for copying conditions.
  */
 
-/*! \defgroup Profiling Profiling
+/*! \defgroup API_Profiling Profiling
 
 \struct starpu_profiling_task_info
-\ingroup Profiling
+\ingroup API_Profiling
 \brief This structure contains information about the execution of a
 task. It is accessible from the field starpu_task::profiling_info if
 profiling was enabled.
@@ -68,7 +68,7 @@ Time when the worker finished releasing data.
 \brief This structure contains the profiling information associated to
 a worker. The timing is provided since the previous call to
 starpu_profiling_worker_get_info()
-\ingroup Profiling
+\ingroup API_Profiling
 \var starpu_profiling_worker_info::start_time
         Starting date for the reported profiling measurements.
 \var starpu_profiling_worker_info::total_time
@@ -88,7 +88,7 @@ starpu_profiling_worker_get_info()
 
 \struct starpu_profiling_bus_info
 \brief todo
-\ingroup Profiling
+\ingroup API_Profiling
 \var starpu_profiling_bus_info::start_time
         Time of bus profiling startup.
 \var starpu_profiling_bus_info::total_time
@@ -99,7 +99,7 @@ starpu_profiling_worker_get_info()
         Number of transfers during profiling.
 
 \fn int starpu_profiling_status_set(int status)
-\ingroup Profiling
+\ingroup API_Profiling
 This function sets the profiling status. Profiling is activated
 by passing STARPU_PROFILING_ENABLE in status. Passing
 STARPU_PROFILING_DISABLE disables profiling. Calling this function
@@ -110,17 +110,17 @@ of the task. Negative return values indicate an error, otherwise the
 previous status is returned.
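 
 For instance (a sketch; the task is assumed to be kept with
 <c>task->destroy = 0</c> so that its profiling structure remains
 accessible after completion):
 
 \code
 starpu_profiling_status_set(STARPU_PROFILING_ENABLE);
 
 /* ... submit the task and wait for it ... */
 
 struct starpu_profiling_task_info *info = task->profiling_info;
 if (info)
 {
         double us = starpu_timing_timespec_delay_us(&info->submit_time,
                                                     &info->end_time);
         fprintf(stderr, "task took %.2lf us from submission to completion\n", us);
 }
 \endcode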
 
 \fn int starpu_profiling_status_get(void)
-\ingroup Profiling
+\ingroup API_Profiling
 Return the current profiling status or a negative value in case
 there was an error.
 
 \fn void starpu_profiling_set_id(int new_id)
-\ingroup Profiling
+\ingroup API_Profiling
 This function sets the ID used for profiling trace filename. It
 needs to be called before starpu_init().
 
 \fn int starpu_profiling_worker_get_info(int workerid, struct starpu_profiling_worker_info *worker_info)
-\ingroup Profiling
+\ingroup API_Profiling
 Get the profiling info associated to the worker identified by
 \p workerid, and reset the profiling measurements. If the argument \p
 worker_info is NULL, only reset the counters associated to worker
@@ -128,47 +128,47 @@ worker_info is NULL, only reset the counters associated to worker
 Otherwise, a negative value is returned.
 
 \fn int starpu_bus_get_profiling_info(int busid, struct starpu_profiling_bus_info *bus_info)
-\ingroup Profiling
+\ingroup API_Profiling
 todo
 
 \fn int starpu_bus_get_count(void)
-\ingroup Profiling
+\ingroup API_Profiling
 Return the number of buses in the machine
 
 \fn int starpu_bus_get_id(int src, int dst)
-\ingroup Profiling
+\ingroup API_Profiling
 Return the identifier of the bus between \p src and \p dst
 
 \fn int starpu_bus_get_src(int busid)
-\ingroup Profiling
+\ingroup API_Profiling
 Return the source point of bus \p busid
 
 \fn int starpu_bus_get_dst(int busid)
-\ingroup Profiling
+\ingroup API_Profiling
 Return the destination point of bus \p busid
 
 \fn double starpu_timing_timespec_delay_us(struct timespec *start, struct timespec *end)
-\ingroup Profiling
+\ingroup API_Profiling
 Returns the time elapsed between \p start and \p end in microseconds.
 
 \fn double starpu_timing_timespec_to_us(struct timespec *ts)
-\ingroup Profiling
+\ingroup API_Profiling
 Converts the given timespec \p ts into microseconds
 
 \fn void starpu_profiling_bus_helper_display_summary(void)
-\ingroup Profiling
+\ingroup API_Profiling
 Displays statistics about the bus on stderr if the environment
 variable STARPU_BUS_STATS is defined. The function is called
 automatically by starpu_shutdown().
 
 \fn void starpu_profiling_worker_helper_display_summary(void)
-\ingroup Profiling
+\ingroup API_Profiling
 Displays statistics about the workers on stderr if the
 environment variable STARPU_WORKER_STATS is defined. The function is
 called automatically by starpu_shutdown().
 
 \fn void starpu_data_display_memory_stats()
-\ingroup Profiling
+\ingroup API_Profiling
 Display statistics about the current data handles registered
 within StarPU. StarPU must have been configured with the option
 <c>--enable-memory-stats</c> (see \ref Memory_feedback).

+ 6 - 6
doc/doxygen/chapters/api/running_driver.doxy

@@ -6,10 +6,10 @@
  * See the file version.doxy for copying conditions.
  */
 
-/*! \defgroup Running_Drivers Running Drivers
+/*! \defgroup API_Running_Drivers Running Drivers
 
 \fn int starpu_driver_run(struct starpu_driver *d)
-\ingroup Running_Drivers
+\ingroup API_Running_Drivers
 Initialize the given driver, run it until it receives a request to
 terminate, deinitialize it and return 0 on success. It returns
 <c>-EINVAL</c> if <c>d->type</c> is not a valid StarPU device type
@@ -19,21 +19,21 @@ starpu_driver_init(), then calling starpu_driver_run_once() in a loop,
 and eventually starpu_driver_deinit().
 
 \fn int starpu_driver_init(struct starpu_driver *d)
-\ingroup Running_Drivers
+\ingroup API_Running_Drivers
 Initialize the given driver. Returns 0 on success, <c>-EINVAL</c> if
 <c>d->type</c> is not a valid ::starpu_worker_archtype.
 
 \fn int starpu_driver_run_once(struct starpu_driver *d)
-\ingroup Running_Drivers
+\ingroup API_Running_Drivers
 Run the driver once, then returns 0 on success, <c>-EINVAL</c> if <c>d->type</c> is not a valid ::starpu_worker_archtype.
 
 \fn int starpu_driver_deinit(struct starpu_driver *d)
-\ingroup Running Drivers
+\ingroup API_Running_Drivers
 Deinitialize the given driver. Returns 0 on success, <c>-EINVAL</c> if
 <c>d->type</c> is not a valid ::starpu_worker_archtype.
 
 \fn void starpu_drivers_request_termination(void)
-\ingroup Running Drivers
+\ingroup API_Running_Drivers
 Notify all running drivers they should terminate.
 
 */

+ 33 - 33
doc/doxygen/chapters/api/scheduling_context_hypervisor.doxy

@@ -6,10 +6,10 @@
  * See the file version.doxy for copying conditions.
  */
 
-/*! \defgroup Scheduling_Context_Hypervisor Scheduling Context Hypervisor
+/*! \defgroup API_Scheduling_Context_Hypervisor Scheduling Context Hypervisor
 
 \struct sc_hypervisor_policy
-\ingroup Scheduling_Context_Hypervisor
+\ingroup API_Scheduling_Context_Hypervisor
 \brief This structure contains all the methods that implement a hypervisor resizing policy.
 \var sc_hypervisor_policy::name
         Indicates the name of the policy; if no custom policy is provided, the policy corresponding to this name will be used by the hypervisor
@@ -27,7 +27,7 @@
         It is called whenever a tag task has just been executed. The table of resize requests is provided as well as the tag
 
 \struct sc_hypervisor_policy_config
-\ingroup Scheduling_Context_Hypervisor
+\ingroup API_Scheduling_Context_Hypervisor
 \brief This structure contains all configuration information of a
 context. It contains configuration information for each context, which
 can be used to construct new resize strategies.
@@ -47,7 +47,7 @@ can be used to construct new resize strategies.
         Indicates the maximum idle time accepted before a resize is triggered for the workers that just arrived in the new context
 
 \struct sc_hypervisor_wrapper
-\ingroup Scheduling_Context_Hypervisor
+\ingroup API_Scheduling_Context_Hypervisor
 \brief This structure is a wrapper of the contexts available in StarPU
 and contains all information about a context obtained by incrementing
 the performance counters.
@@ -75,7 +75,7 @@ the performance counters.
         The structure confirming the last resize finished and a new one can be done
 
 \struct sc_hypervisor_resize_ack
-\ingroup Scheduling_Context_Hypervisor
+\ingroup API_Scheduling_Context_Hypervisor
 \brief This structures checks if the workers moved to another context
 are actually taken into account in that context.
 \var sc_hypervisor_resize_ack::receiver_sched_ctx
@@ -90,7 +90,7 @@ are actually taken into account in that context.
 
 \struct sc_hypervisor_policy_task_pool
 \brief task wrapper linked list
-\ingroup Scheduling_Context_Hypervisor
+\ingroup API_Scheduling_Context_Hypervisor
 \var sc_hypervisor_policy_task_pool::cl
 Which codelet has been executed
 \var sc_hypervisor_policy_task_pool::footprint
@@ -103,7 +103,7 @@ Number of tasks of this kind
 Other task kinds
 
 @name Managing the hypervisor
-\ingroup Scheduling_Context_Hypervisor
+\ingroup API_Scheduling_Context_Hypervisor
 
 There is a single hypervisor that is in charge of resizing contexts
 and the resizing strategy is chosen at the initialization of the
@@ -115,7 +115,7 @@ the hypervisor in the resizing decision making process. TODO maybe
 they should be hidden to the user
 
 \fn struct starpu_sched_ctx_performance_counters *sc_hypervisor_init(struct sc_hypervisor_policy * policy)
-\ingroup Scheduling_Context_Hypervisor
+\ingroup API_Scheduling_Context_Hypervisor
 Initializes the hypervisor to use the strategy provided as parameter
 and creates the performance counters (see \ref Performance_Counters).
 These performance counters represent actually some callbacks that will
@@ -127,14 +127,14 @@ certain conditions trigger the resizing process (there is no
 additional thread assigned to the hypervisor).
 
 \fn void sc_hypervisor_shutdown(void)
-\ingroup Scheduling_Context_Hypervisor
+\ingroup API_Scheduling_Context_Hypervisor
 The hypervisor and all information concerning it are cleaned. There is
 no synchronization between this function and starpu_shutdown(). Thus,
 this should be called after starpu_shutdown(), because the performance
 counters will still need their callback functions allocated.
 
 @name Registering Scheduling Contexts to the hypervisor
-\ingroup Scheduling_Context_Hypervisor
+\ingroup API_Scheduling_Context_Hypervisor
 
 Scheduling Contexts that have to be resized by the hypervisor must be
 first registered to the hypervisor. Whenever we want to exclude
@@ -142,39 +142,39 @@ contexts from the resizing process we have to unregister them from the
 hypervisor.
 
 \fn void sc_hypervisor_register_ctx(unsigned sched_ctx, double total_flops)
-\ingroup Scheduling_Context_Hypervisor
+\ingroup API_Scheduling_Context_Hypervisor
 Register the context to the hypervisor, and indicate the number of
 flops the context will execute (needed for the Gflops rate based
 strategy, see \ref Resizing_strategies, or any other custom strategy
 needing it; for the others 0.0 can be passed)
 
 \fn void sc_hypervisor_unregister_ctx (unsigned sched_ctx)
-\ingroup Scheduling_Context_Hypervisor
+\ingroup API_Scheduling_Context_Hypervisor
 Unregister the context from the hypervisor.
 
 @name The user’s input in the resizing process
-\ingroup Scheduling_Context_Hypervisor
+\ingroup API_Scheduling_Context_Hypervisor
 
 The user can totally forbid the resizing of a certain context and can
 later change his mind and allow it (in this case the resizing is
 managed by the hypervisor, which can forbid it or allow it)
 
 \fn void sc_hypervisor_stop_resize(unsigned sched_ctx)
-\ingroup Scheduling_Context_Hypervisor
+\ingroup API_Scheduling_Context_Hypervisor
 Forbid resizing of a context
 
 \fn void sc_hypervisor_start_resize(unsigned sched_ctx)
-\ingroup Scheduling_Context_Hypervisor
+\ingroup API_Scheduling_Context_Hypervisor
 Allow resizing of a context. The user can then provide information to
 the hypervisor concerning the conditions of resizing.
 
 \fn void sc_hypervisor_ioctl(unsigned sched_ctx, ...)
-\ingroup Scheduling_Context_Hypervisor
+\ingroup API_Scheduling_Context_Hypervisor
 Inputs conditions to the context sched_ctx with the following
 arguments. The argument list must be zero-terminated.
 
 \def HYPERVISOR_MAX_IDLE
-\ingroup Scheduling_Context_Hypervisor
+\ingroup API_Scheduling_Context_Hypervisor
 This macro is used when calling sc_hypervisor_ioctl() and must be
 followed by 3 arguments: an array of int for the workerids to apply
 the condition, an int to indicate the size of the array, and a double
@@ -182,7 +182,7 @@ value indicating the maximum idle time allowed for a worker before the
 resizing process should be triggered
 
 \def HYPERVISOR_PRIORITY
-\ingroup Scheduling_Context_Hypervisor
+\ingroup API_Scheduling_Context_Hypervisor
 This macro is used when calling sc_hypervisor_ioctl() and must be
 followed by 3 arguments: an array of int for the workerids to apply
 the condition, an int to indicate the size of the array, and an int
@@ -190,20 +190,20 @@ value indicating the priority of the workers previously mentioned. The
 workers with the smallest priority are moved the first.
 
 \def HYPERVISOR_MIN_WORKERS
-\ingroup Scheduling_Context_Hypervisor
+\ingroup API_Scheduling_Context_Hypervisor
 This macro is used when calling sc_hypervisor_ioctl() and must be
 followed by 1 argument (int) indicating the minimum number of workers a
 context should have; below this limit the context cannot execute.
 
 \def HYPERVISOR_MAX_WORKERS
-\ingroup Scheduling_Context_Hypervisor
+\ingroup API_Scheduling_Context_Hypervisor
 This macro is used when calling sc_hypervisor_ioctl() and must be
 followed by 1 argument (int) indicating the maximum number of workers a
 context should have; above this limit the context cannot scale.
 
 \def HYPERVISOR_GRANULARITY
-\ingroup Scheduling_Context_Hypervisor
+\ingroup API_Scheduling_Context_Hypervisor
 This macro is used when calling sc_hypervisor_ioctl() and must be
 followed by 1 argument (int) indicating the granularity of the resizing
 process (the number of workers should be moved from the context once
@@ -212,14 +212,14 @@ strategy see Resizing strategies, the number of workers that have to
 be moved is calculated by the strategy.
 
 \def HYPERVISOR_FIXED_WORKERS
-\ingroup Scheduling_Context_Hypervisor
+\ingroup API_Scheduling_Context_Hypervisor
 This macro is used when calling sc_hypervisor_ioctl() and must be
 followed by 2 arguments: an array of int for the workerids to apply
 the condition and an int to indicate the size of the array. These
 workers are not allowed to be moved from the context.
 
 \def HYPERVISOR_MIN_TASKS
-\ingroup Scheduling_Context_Hypervisor
+\ingroup API_Scheduling_Context_Hypervisor
 This macro is used when calling sc_hypervisor_ioctl() and must be
 followed by 1 argument (int) that indicates the minimum number of
 tasks that have to be executed before the context can be resized.
@@ -228,20 +228,20 @@ Resizing strategies where the user indicates exactly when the resize
 should be done.
 
 \def HYPERVISOR_NEW_WORKERS_MAX_IDLE
-\ingroup Scheduling_Context_Hypervisor
+\ingroup API_Scheduling_Context_Hypervisor
 This macro is used when calling sc_hypervisor_ioctl() and must be
 followed by 1 argument, a double value indicating the maximum idle
 time allowed for workers that have just been moved from other contexts
 in the current context.
 
 \def HYPERVISOR_TIME_TO_APPLY
-\ingroup Scheduling_Context_Hypervisor
+\ingroup API_Scheduling_Context_Hypervisor
 This macro is used when calling sc_hypervisor_ioctl() and must be
 followed by 1 argument (int) indicating the tag an executed task
 should have such that this configuration should be taken into account.
 
 @name Defining a new hypervisor policy
-\ingroup Scheduling_Context_Hypervisor
+\ingroup API_Scheduling_Context_Hypervisor
 
 While Scheduling Context Hypervisor Plugin comes with a variety of
 resizing policies (see \ref Resizing_strategies), it may sometimes be
@@ -264,31 +264,31 @@ struct sc_hypervisor_policy dummy_policy =
 \endcode
 
 \fn void sc_hypervisor_move_workers(unsigned sender_sched_ctx, unsigned receiver_sched_ctx, int *workers_to_move, unsigned nworkers_to_move, unsigned now);
-\ingroup Scheduling_Context_Hypervisor
+\ingroup API_Scheduling_Context_Hypervisor
     Moves workers from one context to another
 
 \fn struct sc_hypervisor_policy_config * sc_hypervisor_get_config(unsigned sched_ctx);
-\ingroup Scheduling_Context_Hypervisor
+\ingroup API_Scheduling_Context_Hypervisor
     Returns the configuration structure of a context
 
 \fn int * sc_hypervisor_get_sched_ctxs();
-\ingroup Scheduling_Contex_Hypervisor
+\ingroup API_Scheduling_Context_Hypervisor
     Gets the contexts managed by the hypervisor
 
 \fn int sc_hypervisor_get_nsched_ctxs();
-\ingroup Scheduling_Context_Hypervisor
+\ingroup API_Scheduling_Context_Hypervisor
     Gets the number of contexts managed by the hypervisor
 
 \fn struct sc_hypervisor_wrapper * sc_hypervisor_get_wrapper(unsigned sched_ctx);
-\ingroup Scheduling_Context_Hypervisor
+\ingroup API_Scheduling_Context_Hypervisor
     Returns the wrapper corresponding to the context \p sched_ctx
 
 \fn double sc_hypervisor_get_elapsed_flops_per_sched_ctx(struct sc_hypervisor_wrapper * sc_w);
-\ingroup Scheduling_Context_Hypervisor
+\ingroup API_Scheduling_Context_Hypervisor
     Returns the number of flops executed by a context since the last resize
 
 \fn char * sc_hypervisor_get_policy();
-\ingroup Scheduling_Context_Hypervisor
+\ingroup API_Scheduling_Context_Hypervisor
     Returns the name of the resizing policy the hypervisor uses
 
 */

+ 41 - 41
doc/doxygen/chapters/api/scheduling_contexts.doxy

@@ -6,7 +6,7 @@
  * See the file version.doxy for copying conditions.
  */
 
-/*! \defgroup Scheduling_Contexts Scheduling Contexts
+/*! \defgroup API_Scheduling_Contexts Scheduling Contexts
 
 \brief StarPU permits on one hand grouping workers in combined workers
 in order to execute a parallel task and on the other hand grouping
@@ -17,16 +17,16 @@ the context. Scheduling contexts can be created, deleted and modified
 dynamically.
 
 \enum starpu_worker_collection_type
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
 types of structures the worker collection can implement
 \var starpu_worker_collection_type::STARPU_WORKER_LIST
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
 List of workers
 
 \struct starpu_sched_ctx_performance_counters
 \brief Performance counters used by StarPU to indicate to the
 hypervisor how the application and the resources are executing.
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
 \var starpu_sched_ctx_performance_counters::notify_idle_cycle
         Informs the hypervisor for how long a worker has been idle in the specified context
 \var starpu_sched_ctx_performance_counters::notify_idle_end
@@ -39,10 +39,10 @@ hypervisor how the application and the resources are executing.
         Notifies the hypervisor a task has just been executed
 
 @name Scheduling Contexts Basic API
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
 
 \fn unsigned starpu_sched_ctx_create(const char *policy_name, int *workerids_ctx, int nworkers_ctx, const char *sched_ctx_name)
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
 This function creates a scheduling context which uses the scheduling
 policy \p policy_name and assigns the workers in \p workerids_ctx to
 execute the tasks submitted to it.
@@ -52,102 +52,102 @@ tasks will be submitted to. The return value should be at most
 STARPU_NMAX_SCHED_CTXS.
 
 \fn unsigned starpu_sched_ctx_create_inside_interval(const char *policy_name, const char *sched_name, int min_ncpus, int max_ncpus, int min_ngpus, int max_ngpus, unsigned allow_overlap)
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
 Create a context indicating an approximate interval of resources
 
 \fn void starpu_sched_ctx_delete(unsigned sched_ctx_id)
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
 Delete scheduling context \p sched_ctx_id and transfer remaining
 workers to the inheritor scheduling context.
 
 \fn void starpu_sched_ctx_add_workers(int *workerids_ctx, int nworkers_ctx, unsigned sched_ctx_id)
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
 This function dynamically adds the workers in \p workerids_ctx to the
 context \p sched_ctx_id. The last argument cannot be greater than
 STARPU_NMAX_SCHED_CTXS.
 
 \fn void starpu_sched_ctx_remove_workers(int *workerids_ctx, int nworkers_ctx, unsigned sched_ctx_id)
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
 This function removes the workers in \p workerids_ctx from the context
 \p sched_ctx_id. The last argument cannot be greater than
 STARPU_NMAX_SCHED_CTXS.
 
 \fn void starpu_sched_ctx_set_inheritor(unsigned sched_ctx_id, unsigned inheritor)
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
 Indicate which context will inherit the resources of this context
 when it is deleted.
 
 \fn void starpu_sched_ctx_set_context(unsigned *sched_ctx_id)
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
 Set the scheduling context the subsequent tasks will be submitted to
 
 \fn unsigned starpu_sched_ctx_get_context(void)
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
 Return the scheduling context the tasks are currently submitted to
 
 \fn void starpu_sched_ctx_stop_task_submission(void)
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
 Stop submitting tasks from the empty context list until the next time
 the context has time to check the empty context list
 
 \fn void starpu_sched_ctx_finished_submit(unsigned sched_ctx_id);
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
 Indicate to StarPU that the application has finished submitting to
 this context, so that the workers can be moved to the inheritor as
 soon as possible.
 
 \fn unsigned starpu_sched_ctx_get_nworkers(unsigned sched_ctx_id)
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
 Return the number of workers managed by the specified contexts
 (Usually needed to verify if it manages any workers or if it should be
 blocked)
 
 \fn unsigned starpu_sched_ctx_get_nshared_workers(unsigned sched_ctx_id, unsigned sched_ctx_id2)
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
     Return the number of workers shared by two contexts.
 
 \fn unsigned starpu_sched_ctx_contains_worker(int workerid, unsigned sched_ctx_id)
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
 Return 1 if the worker belongs to the context and 0 otherwise
 
 \fn unsigned starpu_sched_ctx_overlapping_ctxs_on_worker(int workerid)
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
 Check if a worker is shared between several contexts
 
 \fn unsigned starpu_sched_ctx_is_ctxs_turn(int workerid, unsigned sched_ctx_id)
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
 Manage sharing of resources between contexts: check which ctx has
 its turn to pop.
 
 \fn void starpu_sched_ctx_set_turn_to_other_ctx(int workerid, unsigned sched_ctx_id)
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
 Manage sharing of resources between contexts: by default a round_robin
 strategy is executed but the user can interfere to tell which ctx has
 its turn to pop.
 
 \fn double starpu_sched_ctx_get_max_time_worker_on_ctx(void)
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
 When time-sharing resources, indicate how long a worker has been
 active in the current sched_ctx.
 
 @name Scheduling Context Priorities
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
 
 \def STARPU_MIN_PRIO
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
 Provided for legacy reasons.
 
 \def STARPU_MAX_PRIO
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
 Provided for legacy reasons.
 
 \def STARPU_DEFAULT_PRIO
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
 By convention, the default priority level should be 0 so that we can
 statically allocate tasks with a default priority.
 
 \fn int starpu_sched_ctx_set_min_priority(unsigned sched_ctx_id, int min_prio)
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
 Defines the minimum task priority level supported by the scheduling
 policy of the given scheduler context. The default minimum priority
 level is the same as the default priority level which is 0 by
@@ -157,7 +157,7 @@ be called from the initialization method of the scheduling policy, and
 should not be used directly from the application.
 
 \fn int starpu_sched_ctx_set_max_priority(unsigned sched_ctx_id, int max_prio)
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
 Defines the maximum priority level supported by the scheduling policy
 of the given scheduler context. The default maximum priority level is
 1. The application may access that value by calling the
@@ -166,26 +166,26 @@ be called from the initialization method of the scheduling policy, and
 should not be used directly from the application.
 
 \fn int starpu_sched_ctx_get_min_priority(unsigned sched_ctx_id)
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
 Returns the current minimum priority level supported by the scheduling
 policy of the given scheduler context.
 
 \fn int starpu_sched_ctx_get_max_priority(unsigned sched_ctx_id)
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
 Returns the current maximum priority level supported by the scheduling
 policy of the given scheduler context.
 
 @name Scheduling Context Worker Collection
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
 
 \struct starpu_sched_ctx_iterator
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
 \brief todo
 \var starpu_sched_ctx_iterator::cursor
 todo
 
 \struct starpu_worker_collection
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
 \brief A scheduling context manages a collection of workers that can
 be memorized using different data structures. Thus, a generic
 structure is available in order to simplify the choice of its type.
@@ -213,36 +213,36 @@ structures(like tree) implementations are foreseen.
         Initialize the cursor if there is one
 
 \fn struct starpu_worker_collection* starpu_sched_ctx_create_worker_collection(unsigned sched_ctx_id, enum starpu_worker_collection_type type)
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
 Create a worker collection of the type indicated by the last parameter
 for the context specified through the first parameter.
 
 \fn void starpu_sched_ctx_delete_worker_collection(unsigned sched_ctx_id)
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
 Delete the worker collection of the specified scheduling context
 
 \fn struct starpu_worker_collection* starpu_sched_ctx_get_worker_collection(unsigned sched_ctx_id)
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
 Return the worker collection managed by the indicated context
 
 @name Scheduling Context Link with Hypervisor
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
 
 \fn void starpu_sched_ctx_set_perf_counters(unsigned sched_ctx_id, struct starpu_sched_ctx_performance_counters *perf_counters)
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
 Indicates to StarPU the pointer to the performance counters
 
 \fn void starpu_sched_ctx_call_pushed_task_cb(int workerid, unsigned sched_ctx_id)
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
 Callback that lets the scheduling policy tell the hypervisor that a
 task was pushed on a worker
 
 \fn void starpu_sched_ctx_notify_hypervisor_exists(void)
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
 Allow the hypervisor to let StarPU know it has been initialised
 
 \fn unsigned starpu_sched_ctx_check_if_hypervisor_exists(void)
-\ingroup Scheduling_Contexts
+\ingroup API_Scheduling_Contexts
 Ask StarPU whether it has been informed that the hypervisor is initialised
 
 */

+ 23 - 23
doc/doxygen/chapters/api/scheduling_policy.doxy

@@ -6,7 +6,7 @@
  * See the file version.doxy for copying conditions.
  */
 
-/*! \defgroup Scheduling_Policy Scheduling Policy
+/*! \defgroup API_Scheduling_Policy Scheduling Policy
 
 \brief TODO. While StarPU comes with a variety of scheduling policies
 (see \ref Task_scheduling_policy), it may sometimes be desirable to
@@ -14,7 +14,7 @@ implement custom policies to address specific problems. The API
 described below allows users to write their own scheduling policy.
 
 \struct starpu_sched_policy
-\ingroup Scheduling_Policy
+\ingroup API_Scheduling_Policy
 \brief This structure contains all the methods that implement a
 scheduling policy. An application may specify which scheduling
 strategy in the field starpu_conf::sched_policy passed to the function
@@ -59,12 +59,12 @@ starpu_init().
         Optional field. Human readable description of the policy.
 
 \fn struct starpu_sched_policy ** starpu_sched_get_predefined_policies()
-\ingroup Scheduling_Policy
+\ingroup API_Scheduling_Policy
 Return a NULL-terminated array of all the predefined scheduling
 policies.
 
 \fn void starpu_worker_get_sched_condition(int workerid, starpu_pthread_mutex_t **sched_mutex, starpu_pthread_cond_t **sched_cond)
-\ingroup Scheduling_Policy
+\ingroup API_Scheduling_Policy
 When there is no available task for a worker, StarPU blocks this
 worker on a condition variable. This function specifies which
 condition variable (and the associated mutex) should be used to block
@@ -74,17 +74,17 @@ with a single task queue, the same condition variable would be used to
 block and wake up all workers.
 
 \fn void starpu_sched_ctx_set_policy_data(unsigned sched_ctx_id, void * policy_data)
-\ingroup Scheduling_Policy
+\ingroup API_Scheduling_Policy
 Each scheduling policy uses some specific data (queues, variables,
 additional condition variables), stored in a local structure. This
 function assigns it to a scheduling context.
 
 \fn void* starpu_sched_ctx_get_policy_data(unsigned sched_ctx_id)
-\ingroup Scheduling_Policy
+\ingroup API_Scheduling_Policy
 Returns the policy data previously assigned to a context
 
 \fn int starpu_sched_set_min_priority(int min_prio)
-\ingroup Scheduling_Policy
+\ingroup API_Scheduling_Policy
 Defines the minimum task priority level supported by the scheduling
 policy. The default minimum priority level is the same as the default
 priority level which is 0 by convention. The application may access
@@ -94,7 +94,7 @@ the scheduling policy, and should not be used directly from the
 application.
 
 \fn int starpu_sched_set_max_priority(int max_prio)
-\ingroup Scheduling_Policy
+\ingroup API_Scheduling_Policy
 Defines the maximum priority level supported by the scheduling policy.
 The default maximum priority level is 1. The application may access
 that value by calling the function starpu_sched_get_max_priority().
@@ -103,17 +103,17 @@ the scheduling policy, and should not be used directly from the
 application.
 
 \fn int starpu_sched_get_min_priority(void)
-\ingroup Scheduling_Policy
+\ingroup API_Scheduling_Policy
 Returns the current minimum priority level supported by the scheduling
 policy
 
 \fn int starpu_sched_get_max_priority(void)
-\ingroup Scheduling_Policy
+\ingroup API_Scheduling_Policy
 Returns the current maximum priority level supported by the scheduling
 policy
 
 \fn int starpu_push_local_task(int workerid, struct starpu_task *task, int back)
-\ingroup Scheduling_Policy
+\ingroup API_Scheduling_Policy
 The scheduling policy may put tasks directly into a worker’s local
 queue so that it is not always necessary to create its own queue when
 the local queue is sufficient. If \p back is not 0, \p task is put
@@ -121,54 +121,54 @@ at the back of the queue where the worker will pop tasks first.
 Setting \p back to 0 therefore ensures a FIFO ordering.
 
 \fn int starpu_push_task_end(struct starpu_task *task)
-\ingroup Scheduling_Policy
+\ingroup API_Scheduling_Policy
 This function must be called by a scheduler to notify that the given
 task has just been pushed.
 
 \fn int starpu_worker_can_execute_task(unsigned workerid, struct starpu_task *task, unsigned nimpl)
-\ingroup Scheduling_Policy
+\ingroup API_Scheduling_Policy
 Check if the worker specified by \p workerid can execute the codelet.
 Schedulers need to call it before assigning a task to a worker,
 otherwise the task may fail to execute.
 
 \fn double starpu_timing_now(void)
-\ingroup Scheduling_Policy
+\ingroup API_Scheduling_Policy
 Return the current date in micro-seconds.
 
 \fn uint32_t starpu_task_footprint(struct starpu_perfmodel *model, struct starpu_task * task, enum starpu_perfmodel_archtype arch, unsigned nimpl)
-\ingroup Scheduling_Policy
+\ingroup API_Scheduling_Policy
 Returns the footprint for a given task
 
 \fn double starpu_task_expected_length(struct starpu_task *task, enum starpu_perfmodel_archtype arch, unsigned nimpl)
-\ingroup Scheduling_Policy
+\ingroup API_Scheduling_Policy
 Returns expected task duration in micro-seconds.
 
 \fn double starpu_worker_get_relative_speedup(enum starpu_perfmodel_archtype perf_archtype)
-\ingroup Scheduling_Policy
+\ingroup API_Scheduling_Policy
 Returns an estimated speedup factor relative to CPU speed
 
 \fn double starpu_task_expected_data_transfer_time(unsigned memory_node, struct starpu_task *task)
-\ingroup Scheduling_Policy
+\ingroup API_Scheduling_Policy
 Returns expected data transfer time in micro-seconds.
 
 \fn double starpu_data_expected_transfer_time(starpu_data_handle_t handle, unsigned memory_node, enum starpu_data_access_mode mode)
-\ingroup Scheduling_Policy
+\ingroup API_Scheduling_Policy
 Predict the transfer time (in micro-seconds) to move \p handle to a memory node
 
 \fn double starpu_task_expected_power(struct starpu_task *task, enum starpu_perfmodel_archtype arch, unsigned nimpl)
-\ingroup Scheduling_Policy
+\ingroup API_Scheduling_Policy
 Returns expected power consumption in J
 
 \fn double starpu_task_expected_conversion_time(struct starpu_task *task, enum starpu_perfmodel_archtype arch, unsigned nimpl)
-\ingroup Scheduling_Policy
+\ingroup API_Scheduling_Policy
 Returns expected conversion time in ms (multiformat interface only)
 
 \fn int starpu_get_prefetch_flag(void)
-\ingroup Scheduling_Policy
+\ingroup API_Scheduling_Policy
 Whether STARPU_PREFETCH was set
 
 \fn int starpu_prefetch_task_input_on_node(struct starpu_task *task, unsigned node)
-\ingroup Scheduling_Policy
+\ingroup API_Scheduling_Policy
 Prefetch data for a given task on a given node
 
 */

+ 9 - 9
doc/doxygen/chapters/api/standard_memory_library.doxy

@@ -6,14 +6,14 @@
  * See the file version.doxy for copying conditions.
  */
 
-/*! \defgroup Standard_Memory_Library Standard Memory Library
+/*! \defgroup API_Standard_Memory_Library Standard Memory Library
 
 \def STARPU_MALLOC_PINNED
-\ingroup Standard_Memory_Library
+\ingroup API_Standard_Memory_Library
 \brief Value passed to the function starpu_malloc_flags() to indicate the memory allocation should be pinned. 
 
 \def STARPU_MALLOC_COUNT
-\ingroup Standard_Memory_Library
+\ingroup API_Standard_Memory_Library
 \brief Value passed to the function starpu_malloc_flags() to indicate
 the memory allocation should be in the limit defined by the
 environment variables <c>STARPU_LIMIT_CUDA_devid_MEM</c>,
@@ -25,19 +25,19 @@ Memory allocated this way needs to be freed by calling the
 starpu_free_flags() function with the same flag. 
 
 \fn int starpu_malloc_flags(void **A, size_t dim, int flags)
-\ingroup Standard_Memory_Library
+\ingroup API_Standard_Memory_Library
 \brief Performs a memory allocation based on the constraints defined
 by the given flag.
 
 \fn void starpu_malloc_set_align(size_t align)
-\ingroup Standard_Memory_Library
+\ingroup API_Standard_Memory_Library
 \brief This function sets an alignment constraint for starpu_malloc()
 allocations. \p align must be a power of two. This is for instance called
 automatically by the OpenCL driver to specify its own alignment
 constraints.
 
 \fn int starpu_malloc(void **A, size_t dim)
-\ingroup Standard_Memory_Library
+\ingroup API_Standard_Memory_Library
 \brief This function allocates data of the given size in main memory.
 It will also try to pin it in CUDA or OpenCL, so that data transfers
 from this buffer can be asynchronous, and thus permit data transfer
@@ -45,18 +45,18 @@ and computation overlapping. The allocated buffer must be freed thanks
 to the starpu_free() function.
 
 \fn int starpu_free(void *A)
-\ingroup Standard_Memory_Library
+\ingroup API_Standard_Memory_Library
 \brief This function frees memory which has previously been allocated
 with starpu_malloc().
 
 \fn int starpu_free_flags(void *A, size_t dim, int flags)
-\ingroup Standard_Memory_Library
+\ingroup API_Standard_Memory_Library
 \brief This function frees memory by specifying its size. The given
 flags should be consistent with the ones given to starpu_malloc_flags()
 when allocating the memory.
 
 \fn ssize_t starpu_memory_get_available(unsigned node)
-\ingroup Standard_Memory_Library
+\ingroup API_Standard_Memory_Library
 \brief If a memory limit is defined on the given node (see Section \ref
 How_to_limit_memory_per_node), return the amount of available memory
 on the node. Otherwise return -1.

+ 9 - 9
doc/doxygen/chapters/api/task_bundles.doxy

@@ -6,22 +6,22 @@
  * See the file version.doxy for copying conditions.
  */
 
-/*! \defgroup Task_Bundles Task Bundles
+/*! \defgroup API_Task_Bundles Task Bundles
 
 \typedef starpu_task_bundle_t
-\ingroup Task_Bundles
+\ingroup API_Task_Bundles
 Opaque structure describing a list of tasks that should be scheduled
 on the same worker whenever it’s possible. It must be considered as a
 hint given to the scheduler as there is no guarantee that they will be
 executed on the same worker.
 
 \fn void starpu_task_bundle_create (starpu_task_bundle_t *bundle)
-\ingroup Task_Bundles
+\ingroup API_Task_Bundles
 Factory function creating and initializing \p bundle. When the call
 returns, the memory needed is allocated and \p bundle is ready to use.
 
 \fn int starpu_task_bundle_insert (starpu_task_bundle_t bundle, struct starpu_task *task)
-\ingroup Task_Bundles
+\ingroup API_Task_Bundles
 Insert \p task in \p bundle. Until \p task is removed from \p bundle
 its expected length and data transfer time will be considered along
 with those of the other tasks of the bundle. This function must not be called
@@ -31,7 +31,7 @@ is already closed it returns <c>-EPERM</c>, if \p task was already
 submitted it returns <c>-EINVAL</c>.
 
 \fn int starpu_task_bundle_remove (starpu_task_bundle_t bundle, struct starpu_task *task)
-\ingroup Task_Bundles
+\ingroup API_Task_Bundles
 Remove \p task from \p bundle. Of course \p task must have been
 previously inserted in \p bundle. This function must not be called if
 \p bundle is already closed and/or \p task is already submitted. Doing
@@ -39,21 +39,21 @@ so would result in undefined behaviour. On success, it returns 0. If
 \p bundle is already closed it returns <c>-ENOENT</c>.
 
 \fn void starpu_task_bundle_close (starpu_task_bundle_t bundle)
-\ingroup Task_Bundles
+\ingroup API_Task_Bundles
 Inform the runtime that the user will not modify \p bundle anymore, which
 means no more inserting or removing tasks. Thus the runtime can destroy
 it when possible.
 
 \fn double starpu_task_bundle_expected_length (starpu_task_bundle_t bundle, enum starpu_perfmodel_archtype arch, unsigned nimpl)
-\ingroup Task_Bundles
+\ingroup API_Task_Bundles
 Return the expected duration of \p bundle in micro-seconds.
 
 \fn double starpu_task_bundle_expected_power (starpu_task_bundle_t bundle, enum starpu_perfmodel_archtype arch, unsigned nimpl)
-\ingroup Task_Bundles
+\ingroup API_Task_Bundles
 Return the expected power consumption of \p bundle in J.
 
 \fn double starpu_task_bundle_expected_data_transfer_time (starpu_task_bundle_t bundle, unsigned memory_node)
-\ingroup Task_Bundles
+\ingroup API_Task_Bundles
 Return the time (in micro-seconds) expected to transfer all data used within \p bundle.
 
 */

+ 14 - 14
doc/doxygen/chapters/api/task_lists.doxy

@@ -6,62 +6,62 @@
  * See the file version.doxy for copying conditions.
  */
 
-/*! \defgroup Task_Lists Task Lists
+/*! \defgroup API_Task_Lists Task Lists
 
 \struct starpu_task_list
 \brief Stores a double-chained list of tasks
-\ingroup Task_Lists
+\ingroup API_Task_Lists
 \var starpu_task_list::head
 head of the list
 \var starpu_task_list::tail
 tail of the list
 
 \fn void starpu_task_list_init(struct starpu_task_list *list)
-\ingroup Task_Lists
+\ingroup API_Task_Lists
 Initialize a list structure
 
 \fn void starpu_task_list_push_front(struct starpu_task_list *list, struct starpu_task *task)
-\ingroup Task_Lists
+\ingroup API_Task_Lists
 Push \p task at the front of \p list
 
 \fn void starpu_task_list_push_back(struct starpu_task_list *list, struct starpu_task *task)
-\ingroup Task_Lists
+\ingroup API_Task_Lists
 Push \p task at the back of \p list
 
 \fn struct starpu_task * starpu_task_list_front(struct starpu_task_list *list)
-\ingroup Task_Lists
+\ingroup API_Task_Lists
 Get the front of \p list (without removing it)
 
 \fn struct starpu_task * starpu_task_list_back(struct starpu_task_list *list)
-\ingroup Task_Lists
+\ingroup API_Task_Lists
 Get the back of \p list (without removing it)
 
 \fn int starpu_task_list_empty(struct starpu_task_list *list)
-\ingroup Task_Lists
+\ingroup API_Task_Lists
 Test if \p list is empty
 
 \fn void starpu_task_list_erase(struct starpu_task_list *list, struct starpu_task *task)
-\ingroup Task_Lists
+\ingroup API_Task_Lists
 Remove \p task from \p list
 
 \fn struct starpu_task * starpu_task_list_pop_front(struct starpu_task_list *list)
-\ingroup Task_Lists
+\ingroup API_Task_Lists
 Remove the element at the front of \p list
 
 \fn struct starpu_task * starpu_task_list_pop_back(struct starpu_task_list *list)
-\ingroup Task_Lists
+\ingroup API_Task_Lists
 Remove the element at the back of \p list
 
 \fn struct starpu_task * starpu_task_list_begin(struct starpu_task_list *list)
-\ingroup Task_Lists
+\ingroup API_Task_Lists
 Get the first task of \p list.
 
 \fn struct starpu_task * starpu_task_list_end(struct starpu_task_list *list)
-\ingroup Task_Lists
+\ingroup API_Task_Lists
 Get the end of \p list.
 
 \fn struct starpu_task * starpu_task_list_next(struct starpu_task *task)
-\ingroup Task_Lists
+\ingroup API_Task_Lists
 Get the next task of \p list. This is not erase-safe.
 
 */

+ 38 - 38
doc/doxygen/chapters/api/top.doxy

@@ -6,65 +6,65 @@
  * See the file version.doxy for copying conditions.
  */
 
-/*! \defgroup StarPU-Top_Interface StarPU-Top Interface
+/*! \defgroup API_StarPU-Top_Interface StarPU-Top Interface
 
 \enum starpu_top_data_type
-\ingroup StarPU-Top_Interface
+\ingroup API_StarPU-Top_Interface
 \brief StarPU-Top Data type
 \var starpu_top_data_type::STARPU_TOP_DATA_BOOLEAN
-\ingroup StarPU-Top_Interface
+\ingroup API_StarPU-Top_Interface
 todo
 \var starpu_top_data_type::STARPU_TOP_DATA_INTEGER
-\ingroup StarPU-Top_Interface
+\ingroup API_StarPU-Top_Interface
 todo
 \var starpu_top_data_type::STARPU_TOP_DATA_FLOAT
-\ingroup StarPU-Top_Interface
+\ingroup API_StarPU-Top_Interface
 todo
 
 \enum starpu_top_param_type
-\ingroup StarPU-Top_Interface
+\ingroup API_StarPU-Top_Interface
 \brief StarPU-Top Parameter type
 \var starpu_top_param_type::STARPU_TOP_PARAM_BOOLEAN
-\ingroup StarPU-Top_Interface
+\ingroup API_StarPU-Top_Interface
 todo
 \var starpu_top_param_type::STARPU_TOP_PARAM_INTEGER
-\ingroup StarPU-Top_Interface
+\ingroup API_StarPU-Top_Interface
 todo
 \var starpu_top_param_type::STARPU_TOP_PARAM_FLOAT
-\ingroup StarPU-Top_Interface
+\ingroup API_StarPU-Top_Interface
 todo
 \var starpu_top_param_type::STARPU_TOP_PARAM_ENUM
-\ingroup StarPU-Top_Interface
+\ingroup API_StarPU-Top_Interface
 todo
 
 \enum starpu_top_message_type
-\ingroup StarPU-Top_Interface
+\ingroup API_StarPU-Top_Interface
 \brief StarPU-Top Message type
 \var starpu_top_message_type::TOP_TYPE_GO
-\ingroup StarPU-Top_Interface
+\ingroup API_StarPU-Top_Interface
 todo
 \var starpu_top_message_type::TOP_TYPE_SET
-\ingroup StarPU-Top_Interface
+\ingroup API_StarPU-Top_Interface
 todo
 \var starpu_top_message_type::TOP_TYPE_CONTINUE
-\ingroup StarPU-Top_Interface
+\ingroup API_StarPU-Top_Interface
 todo
 \var starpu_top_message_type::TOP_TYPE_ENABLE
-\ingroup StarPU-Top_Interface
+\ingroup API_StarPU-Top_Interface
 todo
 \var starpu_top_message_type::TOP_TYPE_DISABLE
-\ingroup StarPU-Top_Interface
+\ingroup API_StarPU-Top_Interface
 todo
 \var starpu_top_message_type::TOP_TYPE_DEBUG
-\ingroup StarPU-Top_Interface
+\ingroup API_StarPU-Top_Interface
 todo
 \var starpu_top_message_type::TOP_TYPE_UNKNOW
-\ingroup StarPU-Top_Interface
+\ingroup API_StarPU-Top_Interface
 todo
 
 \struct starpu_top_data
 \brief todo
-\ingroup StarPU-Top_Interface
+\ingroup API_StarPU-Top_Interface
 \var starpu_top_data::id
 todo
 \var starpu_top_data::name
@@ -86,7 +86,7 @@ todo
 
 \struct starpu_top_param
 \brief todo
-\ingroup StarPU-Top_Interface
+\ingroup API_StarPU-Top_Interface
 \var starpu_top_param::id
 todo
 \var starpu_top_param::name
@@ -113,98 +113,98 @@ todo
 todo
 
 @name Functions to call before the initialisation
-\ingroup StarPU-Top_Interface
+\ingroup API_StarPU-Top_Interface
 
 \fn struct starpu_top_data *starpu_top_add_data_boolean(const char* data_name, int active)
-\ingroup StarPU-Top_Interface
+\ingroup API_StarPU-Top_Interface
 This function registers a data named \p data_name of type boolean.
 If \p active=0, the value will NOT be displayed to user by default.
 Any other value will make the value displayed by default.
 
 \fn struct starpu_top_data * starpu_top_add_data_integer(const char* data_name, int minimum_value, int maximum_value, int active)
-\ingroup StarPU-Top_Interface
+\ingroup API_StarPU-Top_Interface
 This function registers a data named \p data_name of type integer. The
 minimum and maximum values will be useful to define the scale in the UI.
 If \p active=0, the value will NOT be displayed to user by default.
 Any other value will make the value displayed by default.
 
 \fn struct starpu_top_data* starpu_top_add_data_float(const char* data_name, double minimum_value, double maximum_value, int active)
-\ingroup StarPU-Top_Interface
+\ingroup API_StarPU-Top_Interface
 This function registers a data named \p data_name of type float. The
 minimum and maximum values will be useful to define the scale in the UI.
 If \p active=0, the value will NOT be displayed to user by default.
 Any other value will make the value displayed by default.
 
 \fn struct starpu_top_param* starpu_top_register_parameter_boolean(const char* param_name, int* parameter_field, void (*callback)(struct starpu_top_param*))
-\ingroup StarPU-Top_Interface
+\ingroup API_StarPU-Top_Interface
 This function registers a parameter named \p param_name, of type
 boolean. The \p callback function will be called when the parameter is
 modified by the UI, and can be null.
 
 \fn struct starpu_top_param* starpu_top_register_parameter_float(const char* param_name, double* parameter_field, double minimum_value, double maximum_value, void (*callback)(struct starpu_top_param*))
-\ingroup StarPU-Top_Interface
+\ingroup API_StarPU-Top_Interface
 This function registers a parameter named \p param_name, of type
 float. Minimum and maximum values will be used to prevent the user from
 setting an incorrect value. The \p callback function will be called when
 the parameter is modified by the UI, and can be null.
 
 \fn struct starpu_top_param* starpu_top_register_parameter_integer(const char* param_name, int* parameter_field, int minimum_value, int maximum_value, void (*callback)(struct starpu_top_param*))
-\ingroup StarPU-Top_Interface
+\ingroup API_StarPU-Top_Interface
 This function registers a parameter named \p param_name, of type
 integer. Minimum and maximum values will be used to prevent the user
 from setting an incorrect value. The \p callback function will be called
 when the parameter is modified by the UI, and can be null.
 
 \fn struct starpu_top_param* starpu_top_register_parameter_enum(const char* param_name, int* parameter_field, char** values, int nb_values, void (*callback)(struct starpu_top_param*))
-\ingroup StarPU-Top_Interface
+\ingroup API_StarPU-Top_Interface
 This function registers a parameter named \p param_name, of type enum.
 The given values and their number will be used to prevent the user from
 setting an incorrect value. The \p callback function will be called when
 the parameter is modified by the UI, and can be null.
 
 @name Initialisation
-\ingroup StarPU-Top_Interface
+\ingroup API_StarPU-Top_Interface
 
 \fn void starpu_top_init_and_wait(const char *server_name)
-\ingroup StarPU-Top_Interface
+\ingroup API_StarPU-Top_Interface
 This function must be called when all parameters and data have been
 registered AND initialised (for parameters). This function will wait
 for a TOP to connect, send initialisation sentences, and wait for the
 GO message.
 
 @name To call after initialisation
-\ingroup StarPU-Top_Interface
+\ingroup API_StarPU-Top_Interface
 
 \fn void starpu_top_update_parameter(const struct starpu_top_param *param)
-\ingroup StarPU-Top_Interface
+\ingroup API_StarPU-Top_Interface
 This function should be called after every modification of a parameter
 from something other than starpu_top. It notifies the UI that the
 configuration changed.
 
 \fn void starpu_top_update_data_boolean(const struct starpu_top_data *data, int value)
-\ingroup StarPU-Top_Interface
+\ingroup API_StarPU-Top_Interface
 This function updates the value of the starpu_top_data on the UI.
 
 \fn void starpu_top_update_data_integer(const struct starpu_top_data *data, int value)
-\ingroup StarPU-Top_Interface
+\ingroup API_StarPU-Top_Interface
 This function updates the value of the starpu_top_data on the UI.
 
 \fn void starpu_top_update_data_float(const struct starpu_top_data *data, double value)
-\ingroup StarPU-Top_Interface
+\ingroup API_StarPU-Top_Interface
 This function updates the value of the starpu_top_data on the UI.
 
 \fn void starpu_top_task_prevision(struct starpu_task *task, int devid, unsigned long long start, unsigned long long end)
-\ingroup StarPU-Top_Interface
+\ingroup API_StarPU-Top_Interface
 This function notifies the UI that the task has been planned to run from \p start to \p end on the given computation core.
 
 \fn void starpu_top_debug_log(const char *message)
-\ingroup StarPU-Top_Interface
+\ingroup API_StarPU-Top_Interface
 This function is useful in debug mode. The StarPU developer doesn't
 need to check if the debug mode is active: this is checked by
 StarPU-Top itself. It just sends a message to be displayed by the UI.
 
 \fn void starpu_top_debug_lock(const char *message)
-\ingroup StarPU-Top_Interface
+\ingroup API_StarPU-Top_Interface
 This function is useful in debug mode. The StarPU developer doesn't
 need to check if the debug mode is active: this is checked by
 StarPU-Top itself. It sends a message and waits for a continue message

+ 5 - 5
doc/doxygen/chapters/api/versioning.doxy

@@ -6,22 +6,22 @@
  * See the file version.doxy for copying conditions.
  */
 
-/*! \defgroup Versioning Versioning
+/*! \defgroup API_Versioning Versioning
 
 \def STARPU_MAJOR_VERSION
-\ingroup Versioning
+\ingroup API_Versioning
 \brief Define the major version of StarPU. This is the version used when compiling the application.
 
 \def STARPU_MINOR_VERSION
-\ingroup Versioning
+\ingroup API_Versioning
 \brief Define the minor version of StarPU. This is the version used when compiling the application.
 
 \def STARPU_RELEASE_VERSION
-\ingroup Versioning
+\ingroup API_Versioning
 \brief Define the release version of StarPU. This is the version used when compiling the application.
 
 \fn void starpu_get_version(int *major, int *minor, int *release)
-\ingroup Versioning
+\ingroup API_Versioning
 \brief Return as 3 integers the version of StarPU used when running the application.
 
 */

+ 31 - 31
doc/doxygen/chapters/api/workers.doxy

@@ -6,87 +6,87 @@
  * See the file version.doxy for copying conditions.
  */
 
-/*! \defgroup Workers_Properties Workers’ Properties
+/*! \defgroup API_Workers_Properties Workers’ Properties
 
 \enum starpu_node_kind
-\ingroup Workers_Properties
+\ingroup API_Workers_Properties
 TODO
 \var starpu_node_kind::STARPU_UNUSED
-\ingroup Workers_Properties
+\ingroup API_Workers_Properties
 TODO
-\ingroup Workers_Properties
+\ingroup API_Workers_Properties
 \var starpu_node_kind::STARPU_CPU_RAM
-\ingroup Workers_Properties
+\ingroup API_Workers_Properties
 TODO
 \var starpu_node_kind::STARPU_CUDA_RAM
-\ingroup Workers_Properties
+\ingroup API_Workers_Properties
 TODO
 \var starpu_node_kind::STARPU_OPENCL_RAM
-\ingroup Workers_Properties
+\ingroup API_Workers_Properties
 TODO
 \var starpu_node_kind::STARPU_MIC_RAM
-\ingroup Workers_Properties
+\ingroup API_Workers_Properties
 TODO
 \var starpu_node_kind::STARPU_SCC_RAM
-\ingroup Workers_Properties
+\ingroup API_Workers_Properties
 This node kind is not used anymore, but implementations in interfaces
 will be useful for MPI.
 \var starpu_node_kind::STARPU_SCC_SHM
-\ingroup Workers_Properties
+\ingroup API_Workers_Properties
 TODO
 
 \enum starpu_worker_archtype
-\ingroup Workers_Properties
+\ingroup API_Workers_Properties
 \brief Worker Architecture Type
 \var starpu_worker_archtype::STARPU_ANY_WORKER
-\ingroup Workers_Properties
+\ingroup API_Workers_Properties
 any worker, used in the hypervisor
 \var starpu_worker_archtype::STARPU_CPU_WORKER
-\ingroup Workers_Properties
+\ingroup API_Workers_Properties
 CPU core
 \var starpu_worker_archtype::STARPU_CUDA_WORKER
-\ingroup Workers_Properties
+\ingroup API_Workers_Properties
 NVIDIA CUDA device
 \var starpu_worker_archtype::STARPU_OPENCL_WORKER
-\ingroup Workers_Properties
+\ingroup API_Workers_Properties
 OpenCL device
 \var starpu_worker_archtype::STARPU_MIC_WORKER
-\ingroup Workers_Properties
+\ingroup API_Workers_Properties
 Intel MIC device
 \var starpu_worker_archtype::STARPU_SCC_WORKER
-\ingroup Workers_Properties
+\ingroup API_Workers_Properties
 Intel SCC device
 
 
 \fn unsigned starpu_worker_get_count(void)
-\ingroup Workers_Properties
+\ingroup API_Workers_Properties
 \brief This function returns the number of workers (i.e. processing
 units executing StarPU tasks). The returned value should be at most
 STARPU_NMAXWORKERS.
 
 \fn int starpu_worker_get_count_by_type(enum starpu_worker_archtype type)
-\ingroup Workers_Properties
+\ingroup API_Workers_Properties
 \brief Returns the number of workers of the given type. A positive (or
 zero) value is returned in case of success; otherwise -EINVAL indicates
 that the type is not valid.
 
 \fn unsigned starpu_cpu_worker_get_count(void)
-\ingroup Workers_Properties
+\ingroup API_Workers_Properties
 \brief This function returns the number of CPUs controlled by StarPU. The
 returned value should be at most STARPU_MAXCPUS.
 
 \fn unsigned starpu_cuda_worker_get_count(void)
-\ingroup Workers_Properties
+\ingroup API_Workers_Properties
 \brief This function returns the number of CUDA devices controlled by
 StarPU. The returned value should be at most STARPU_MAXCUDADEVS.
 
 \fn unsigned starpu_opencl_worker_get_count(void)
-\ingroup Workers_Properties
+\ingroup API_Workers_Properties
 \brief This function returns the number of OpenCL devices controlled by
 StarPU. The returned value should be at most STARPU_MAXOPENCLDEVS.
 
 \fn int starpu_worker_get_id (void)
-\ingroup Workers_Properties
+\ingroup API_Workers_Properties
 \brief This function returns the identifier of the current worker, i.e.
 the one associated with the calling thread. The returned value is either
 -1 if the current context is not a StarPU worker (i.e. when called
@@ -94,7 +94,7 @@ from the application outside a task or a callback), or an integer
 between 0 and starpu_worker_get_count() - 1.
 
 \fn int starpu_worker_get_ids_by_type(enum starpu_worker_archtype type, int *workerids, int maxsize)
-\ingroup Workers_Properties
+\ingroup API_Workers_Properties
 \brief This function gets the list of identifiers of workers with the
 given type. It fills the workerids array with the identifiers of the
 workers that have the type indicated in the first argument. The
@@ -108,18 +108,18 @@ function, or by passing a value greater or equal to
 STARPU_NMAXWORKERS.
 
 \fn int starpu_worker_get_by_type(enum starpu_worker_archtype type, int num)
-\ingroup Workers_Properties
+\ingroup API_Workers_Properties
 \brief This returns the identifier of the num-th worker that has the
 specified type \p type. If there is no such worker, -1 is returned.
 
 \fn int starpu_worker_get_by_devid(enum starpu_worker_archtype type, int devid)
-\ingroup Workers_Properties
+\ingroup API_Workers_Properties
 \brief This returns the identifier of the worker that has the specified
 type \p type and device id \p devid (which may not be the n-th, if some
 devices are skipped for instance). If there is no such worker, -1 is returned.
 
 \fn int starpu_worker_get_devid(int id)
-\ingroup Workers_Properties
+\ingroup API_Workers_Properties
 \brief This function returns the device id of the given worker. The
 worker should be identified with the value returned by the
 starpu_worker_get_id() function. In the case of a CUDA worker, this
@@ -130,7 +130,7 @@ which the worker was bound; this identifier is either provided by the
 OS or by the hwloc library in case it is available.
 
 \fn enum starpu_worker_archtype starpu_worker_get_type(int id)
-\ingroup Workers_Properties
+\ingroup API_Workers_Properties
 \brief This function returns the type of processing unit associated to
 a worker. The worker identifier is a value returned by the
 starpu_worker_get_id() function). The returned value indicates the
@@ -140,7 +140,7 @@ OpenCL device. The value returned for an invalid identifier is
 unspecified.
 
 \fn void starpu_worker_get_name(int id, char *dst, size_t maxlen)
-\ingroup Workers_Properties
+\ingroup API_Workers_Properties
 \brief This function gets the name of a given worker. StarPU
 associates a unique human-readable string to each processing unit.
 This function copies at most the \p maxlen first bytes of the unique
@@ -150,12 +150,12 @@ valid pointer to a buffer of maxlen bytes at least. Calling this
 function on an invalid identifier results in an unspecified behaviour.
 
 \fn unsigned starpu_worker_get_memory_node(unsigned workerid)
-\ingroup Workers_Properties
+\ingroup API_Workers_Properties
 \brief This function returns the identifier of the memory node
 associated to the worker identified by workerid.
 
 \fn enum starpu_node_kind starpu_node_get_kind(unsigned node)
-\ingroup Workers_Properties
+\ingroup API_Workers_Properties
 \brief Returns the type of the given node as defined by
 ::starpu_node_kind. For example, when defining a new data interface,
 this function should be used in the allocation function to determine

+ 41 - 41
doc/doxygen/chapters/basic_examples.doxy

@@ -52,7 +52,7 @@ section using StarPU's standard C API.
 
 \subsection Required_Headers Required Headers
 
-The starpu.h header should be included in any code using StarPU.
+The header starpu.h should be included in any code using StarPU.
 
 \code{.c}
 #include <starpu.h>
@@ -85,29 +85,28 @@ A codelet is a structure that represents a computational kernel. Such a codelet
 may contain an implementation of the same kernel on different architectures
 (e.g. CUDA, x86, ...). For compatibility, make sure that the whole
 structure is properly initialized to zero, either by using the
-function starpu_codelet_init (@pxref{starpu_codelet_init}), or by letting the
+function starpu_codelet_init(), or by letting the
 compiler implicitly do it as exemplified above.
 
 The <c>nbuffers</c> field specifies the number of data buffers that are
 manipulated by the codelet: here the codelet does not access or modify any data
 that is controlled by our data management library. Note that the argument
-passed to the codelet (the <c>cl_arg</c> field of the <c>starpu_task</c>
-structure) does not count as a buffer since it is not managed by our data
-management library, but just contain trivial parameters.
+passed to the codelet (the field starpu_task::cl_arg) does not count
+as a buffer since it is not managed by our data management library,
+but just contains trivial parameters.
 
 \internal
 TODO need a crossref to the proper description of "where" see bla for more ...
 \endinternal
 
-We create a codelet which may only be executed on the CPUs. The <c>where</c>
-field is a bitmask that defines where the codelet may be executed. Here, the
-<c>STARPU_CPU</c> value means that only CPUs can execute this codelet
-(@pxref{Codelets and Tasks} for more details on this field). Note that
-the <c>where</c> field is optional, when unset its value is
-automatically set based on the availability of the different
-<c>XXX_funcs</c> fields.
-When a CPU core executes a codelet, it calls the <c>cpu_func</c> function,
-which \em must have the following prototype:
+We create a codelet which may only be executed on the CPUs. The field
+starpu_codelet::where is a bitmask that defines where the codelet may
+be executed. Here, the value ::STARPU_CPU means that only CPUs can
+execute this codelet. Note that field starpu_codelet::where is
+optional, when unset its value is automatically set based on the
+availability of the different fields <c>XXX_funcs</c>.
+When a CPU core executes a codelet, it calls the function
+<c>cpu_func</c>, which \em must have the following prototype:
 
 \code{.c}
 void (*cpu_func)(void *buffers[], void *cl_arg);
@@ -117,8 +116,7 @@ In this example, we can ignore the first argument of this function which gives a
 description of the input and output buffers (e.g. the size and the location of
 the matrices) since there is none.
 The second argument is a pointer to a buffer passed as an
-argument to the codelet by the means of the <c>cl_arg</c> field of the
-<c>starpu_task</c> structure.
+argument to the codelet by the means of the field starpu_task::cl_arg.
 
 \internal
 TODO rewrite so that it is a little clearer ?
@@ -175,8 +173,8 @@ starpu_shutdown().
 
 In the example above, a task structure is allocated by a call to
 starpu_task_create(). This function only allocates and fills the
-corresponding structure with the default settings (@pxref{Codelets and
-Tasks, starpu_task_create}), but it does not submit the task to StarPU.
+corresponding structure with the default settings, but it does not
+submit the task to StarPU.
 
 \internal
 not really clear ;)
@@ -210,13 +208,14 @@ void (*callback_function)(void *);
 \endcode
 
 If the <c>synchronous</c> field is non-zero, task submission will be
-synchronous: the starpu_task_submit() function will not return until the
-task was executed. Note that the starpu_shutdown() function does not
-guarantee that asynchronous tasks have been executed before it returns,
-starpu_task_wait_for_all() can be used to that effect, or data can be
-unregistered (starpu_data_unregister()), which will
-implicitly wait for all the tasks scheduled to work on it, unless explicitly
-disabled thanks to starpu_data_set_default_sequential_consistency_flag() or
+synchronous: the function starpu_task_submit() will not return until
+the task has been executed. Note that the function starpu_shutdown() does
+not guarantee that asynchronous tasks have been executed before it
+returns, starpu_task_wait_for_all() can be used to that effect, or
+data can be unregistered (starpu_data_unregister()), which will
+implicitly wait for all the tasks scheduled to work on it, unless
+explicitly disabled thanks to
+starpu_data_set_default_sequential_consistency_flag() or
 starpu_data_set_sequential_consistency_flag().
 
 \subsection Execution_of_Hello_World Execution of Hello World
@@ -458,11 +457,12 @@ modified by a task, and StarPU makes sure that when a computational kernel
 starts somewhere (e.g. on a GPU), its data are available locally.
 
 Before submitting those tasks, the programmer first needs to declare the
-different pieces of data to StarPU using the <c>starpu_*_data_register</c>
-functions. To ease the development of applications for StarPU, it is possible
-to describe multiple types of data layout. A type of data layout is called an
-<b>interface</b>. There are different predefined interfaces available in StarPU:
-here we will consider the <b>vector interface</b>.
+different pieces of data to StarPU using the functions
+<c>starpu_*_data_register</c>. To ease the development of applications
+for StarPU, it is possible to describe multiple types of data layout.
+A type of data layout is called an <b>interface</b>. There are
+different predefined interfaces available in StarPU: here we will
+consider the <b>vector interface</b>.
 
 The following lines show how to declare an array of <c>NX</c> elements of type
 <c>float</c> using the vector interface:
@@ -503,8 +503,8 @@ can just be passed through the <c>cl_arg</c> pointer like in the previous
 example.  The vector parameter is described by its handle.
 There are two fields in each element of the <c>buffers</c> array.
 <c>handle</c> is the handle of the data, and <c>mode</c> specifies how the
-kernel will access the data (<c>STARPU_R</c> for read-only, <c>STARPU_W</c> for
-write-only and <c>STARPU_RW</c> for read and write access).
+kernel will access the data (::STARPU_R for read-only, ::STARPU_W for
+write-only and ::STARPU_RW for read and write access).
 
 The definition of the codelet can be written as follows:
 
@@ -563,9 +563,9 @@ only be executed by the CPUs, but also by a CUDA device.
 
 The CUDA implementation can be written as follows. It needs to be compiled with
 a CUDA compiler such as nvcc, the NVIDIA CUDA compiler driver. It must be noted
-that the vector pointer returned by STARPU_VECTOR_GET_PTR is here a pointer in GPU
-memory, so that it can be passed as such to the <c>vector_mult_cuda</c> kernel
-call.
+that the vector pointer returned by ::STARPU_VECTOR_GET_PTR is here a
+pointer in GPU memory, so that it can be passed as such to the
+<c>vector_mult_cuda</c> kernel call.
 
 \code{.c}
 #include <starpu.h>
@@ -589,10 +589,10 @@ extern "C" void scal_cuda_func(void *buffers[], void *_args)
     unsigned threads_per_block = 64;
     unsigned nblocks = (n + threads_per_block-1) / threads_per_block;
 
-@i{    vector_mult_cuda<<<nblocks,threads_per_block, 0, starpu_cuda_get_local_stream()>>>}
-@i{                    (n, val, *factor);}
+    vector_mult_cuda<<<nblocks,threads_per_block, 0, starpu_cuda_get_local_stream()>>>
+                    (n, val, *factor);
 
-@i{    cudaStreamSynchronize(starpu_cuda_get_local_stream());}
+    cudaStreamSynchronize(starpu_cuda_get_local_stream());
 }
 \endcode
 
@@ -611,10 +611,10 @@ __kernel void vector_mult_opencl(int nx, __global float* val, float factor)
 }
 \endcode
 
-Contrary to CUDA and CPU, <c>STARPU_VECTOR_GET_DEV_HANDLE</c> has to be used,
+Contrary to CUDA and CPU, ::STARPU_VECTOR_GET_DEV_HANDLE has to be used,
 which returns a <c>cl_mem</c> (which is not a device pointer, but an OpenCL
 handle), which can be passed as such to the OpenCL kernel. The difference is
-important when using partitioning, see @ref{Partitioning Data}.
+important when using partitioning, see \ref Partitioning_Data.
 
 \code{.c}
 #include <starpu.h>
@@ -764,7 +764,7 @@ The Makefile given at the beginning of the section must be extended to
 give the rules to compile the CUDA source code. Note that the source
 file of the OpenCL kernel does not need to be compiled now, it will
 be compiled at run-time when calling the function
-starpu_opencl_load_opencl_from_file() (@pxref{starpu_opencl_load_opencl_from_file}).
+starpu_opencl_load_opencl_from_file().
 
 \verbatim
 CFLAGS  += $(shell pkg-config --cflags starpu-1.1)

+ 11 - 12
doc/doxygen/chapters/c_extensions.doxy

@@ -42,8 +42,7 @@ in source files.
 This section describes the C extensions implemented by StarPU's GCC
 plug-in.  It does not require detailed knowledge of the StarPU library.
 
-Note: as of StarPU @value{VERSION}, this is still an area under
-development and subject to change.
+Note: this is still an area under development and subject to change.
 
 \section Defining_Tasks Defining Tasks
 
@@ -85,22 +84,22 @@ actual definition of a task's body is automatically generated by the
 compiler.
 
 Under the hood, declaring a task leads to the declaration of the
-corresponding <c>codelet</c> (@pxref{Codelet and Tasks}).  If one or
+corresponding <c>codelet</c> (\ref Codelet_and_Tasks).  If one or
 more task implementations are declared in the same compilation unit,
 then the codelet and the function itself are also defined; they inherit
 the scope of the task.
 
 Scalar arguments to the task are passed by value and copied to the
 target device if need be---technically, they are passed as the
-<c>cl_arg</c> buffer (@pxref{Codelets and Tasks, <c>cl_arg</c>}).
+<c>cl_arg</c> buffer (\ref Codelets_and_Tasks).
 
 Pointer arguments are assumed to be registered data buffers---the
-<c>buffers</c> argument of a task (@pxref{Codelets and Tasks,
-<c>buffers</c>}); <c>const</c>-qualified pointer arguments are viewed as
-read-only buffers (<c>STARPU_R</c>), and non-<c>const</c>-qualified
-buffers are assumed to be used read-write (<c>STARPU_RW</c>).  In
-addition, the <c>output</c> type attribute can be as a type qualifier
-for output pointer or array parameters (<c>STARPU_W</c>).
+buffer argument of a task; <c>const</c>-qualified
+pointer arguments are viewed as read-only buffers (::STARPU_R), and
+non-<c>const</c>-qualified buffers are assumed to be used read-write
+(::STARPU_RW).  In addition, the <c>output</c> type attribute can be
+used as a type qualifier for output pointer or array parameters
+(::STARPU_W).
 </dd>
 
 <dt><c>task_implementation (target, task)</c></dt>
@@ -197,8 +196,8 @@ static void matmul_opencl (const float *A, const float *B, float *C,
 \endcode
 
 The CUDA and OpenCL implementations typically either invoke a kernel
-written in CUDA or OpenCL (for similar code, @pxref{CUDA Kernel}, and
-@pxref{OpenCL Kernel}), or call a library function that uses CUDA or
+written in CUDA or OpenCL (for similar code, \ref CUDA_Kernel, and
+\ref OpenCL_Kernel), or call a library function that uses CUDA or
 OpenCL under the hood, such as CUBLAS functions:
 
 \code{.c}

+ 1 - 1
doc/doxygen/chapters/fft_support.doxy

@@ -54,7 +54,7 @@ the task completion, and thus permits to enqueue a series of tasks.
 </li>
 </ul>
 
-All functions are defined in @ref{FFT Support}.
+All functions are defined in \ref FFT_Support.
 
 \section Compilation Compilation
 

+ 14 - 15
doc/doxygen/chapters/mpi_support.doxy

@@ -12,7 +12,7 @@ The integration of MPI transfers within task parallelism is done in a
 very natural way by the means of asynchronous interactions between the
 application and StarPU.  This is implemented in a separate libstarpumpi library
 which basically provides "StarPU" equivalents of <c>MPI_*</c> functions, where
-<c>void *</c> buffers are replaced with <c>starpu_data_handle_t</c>s, and all
+<c>void *</c> buffers are replaced with starpu_data_handle_t, and all
 GPU-RAM-NIC transfers are handled efficiently by StarPU-MPI.  The user has to
 use the usual <c>mpirun</c> command of the MPI implementation to start StarPU on
 the different MPI nodes.
@@ -131,8 +131,8 @@ hashmap if it is a receive request.
 
 Internally, all MPI communications submitted by StarPU use a unique
 tag which has a default value, and can be accessed with the functions
-@ref{starpu_mpi_get_communication_tag} and
-@ref{starpu_mpi_set_communication_tag}.
+starpu_mpi_get_communication_tag() and
+starpu_mpi_set_communication_tag().
 
 The matching of tags with corresponding requests is done into StarPU-MPI.
 To handle this, any communication is a double-communication based on a
 will arrive just after, so that when the corresponding receive request
 is submitted by the application, it will copy this temporary handle
 into its own instead of submitting a new StarPU-MPI request.
 
-@ref{Communication} gives the list of all the point to point
+\ref Communication gives the list of all the point to point
 communications defined in StarPU-MPI.
 
 \section Exchanging_User_Defined_Data_Interface Exchanging User Defined Data Interface
 
-New data interfaces defined as explained in @ref{Defining a New Data
-Interface} can also be used within StarPU-MPI and exchanged between
-nodes. Two functions needs to be defined through
-the type <c>struct starpu_data_interface_ops</c> (@pxref{Defining
-Interface}). The pack function takes a handle and returns a
-contiguous memory buffer along with its size where data to be conveyed to another node
-should be copied. The reversed operation is implemented in the unpack
-function which takes a contiguous memory buffer and recreates the data
-handle.
+New data interfaces defined as explained in
+\ref Defining_a_New_Data_Interface can also be used within StarPU-MPI
+and exchanged between nodes. Two functions need to be defined through
+the type starpu_data_interface_ops. The pack function takes a handle
+and returns a contiguous memory buffer along with its size where data
+to be conveyed to another node should be copied. The reversed
+operation is implemented in the unpack function which takes a
+contiguous memory buffer and recreates the data handle.
 
 \code{.c}
 static int complex_pack_data(starpu_data_handle_t handle, unsigned node, void **ptr, ssize_t *count)
@@ -230,7 +229,7 @@ exchange the content of the handle. All MPI nodes then process the whole task
 graph, and StarPU automatically determines which node actually execute which
 task, and trigger the required MPI transfers.
 
-The list of functions is described in @ref{MPI Insert Task}.
+The list of functions is described in \ref MPI_Insert_Task.
 
 Here is a stencil example showing how to use starpu_mpi_insert_task(). One
 first needs to define a distribution function which specifies the
@@ -320,7 +319,7 @@ execute them, or to send the required data).
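The distribution function mentioned above simply maps the coordinates of a piece of data to the MPI rank that owns it. A minimal sketch follows; the name <c>my_distrib</c> and the 2D block-cyclic formula are illustrative choices for this example, not part of the StarPU API:

```c
/* Hypothetical distribution function: maps the (x, y) coordinates of a
 * data block to the MPI rank owning it, laying blocks out cyclically
 * over a nb_horiz x nb_vert grid of nodes. */
static int my_distrib(int x, int y, int nb_horiz, int nb_vert)
{
    return (x % nb_horiz) + (y % nb_vert) * nb_horiz;
}
```

Each node would then register only the data whose owner, according to this function, is its own rank, and pass the same function's result to starpu_data_set_rank() so that StarPU-MPI knows where each handle lives.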
 
 \section MPI_Collective_Operations MPI Collective Operations
 
-The functions are described in @ref{Collective Operations}.
+The functions are described in \ref Collective_Operations.
 
 \code{.c}
 if (rank == root)

+ 27 - 24
doc/doxygen/chapters/optimize_performance.doxy

@@ -16,13 +16,13 @@ few additional changes are needed.
 
 \section Data_management Data management
 
-When the application allocates data, whenever possible it should use the
-starpu_malloc() function, which will ask CUDA or
-OpenCL to make the allocation itself and pin the corresponding allocated
-memory. This is needed to permit asynchronous data transfer, i.e. permit data
-transfer to overlap with computations. Otherwise, the trace will show that the
-<c>DriverCopyAsync</c> state takes a lot of time, this is because CUDA or OpenCL
-then reverts to synchronous transfers.
+When the application allocates data, whenever possible it should use
+the function starpu_malloc(), which will ask CUDA or OpenCL to make
+the allocation itself and pin the corresponding allocated memory. This
+is needed to permit asynchronous data transfer, i.e. permit data
+transfer to overlap with computations. Otherwise, the trace will show
+that the <c>DriverCopyAsync</c> state takes a lot of time, this is
+because CUDA or OpenCL then reverts to synchronous transfers.
 
 By default, StarPU leaves replicates of data wherever they were used, in case they
 will be re-used by other tasks, thus saving the data transfer time. When some
@@ -54,11 +54,11 @@ Implicit data dependency computation can become expensive if a lot
 of tasks access the same piece of data. If no dependency is required
 on some piece of data (e.g. because it is only accessed in read-only
 mode, or because write accesses are actually commutative), use the
-starpu_data_set_sequential_consistency_flag() function to disable implicit
-dependencies on that data.
+function starpu_data_set_sequential_consistency_flag() to disable
+implicit dependencies on that data.
 
 In the same vein, accumulation of results in the same data can become a
-bottleneck. The use of the <c>STARPU_REDUX</c> mode permits to optimize such
+bottleneck. The use of the mode ::STARPU_REDUX permits to optimize such
 accumulation (see \ref Data_reduction).
 
 Applications often need a data just for temporary results.  In such a case,
@@ -184,13 +184,15 @@ to configure a performance model for the codelets of the application (see
 use on-line calibration.  StarPU will automatically calibrate codelets
 which have never been calibrated yet, and save the result in
 <c>$STARPU_HOME/.starpu/sampling/codelets</c>.
-The models are indexed by machine name. To share the models between machines (e.g. for a homogeneous cluster), use <c>export STARPU_HOSTNAME=some_global_name</c>. To force continuing calibration, use
-<c>export STARPU_CALIBRATE=1</c> . This may be necessary if your application
+The models are indexed by machine name. To share the models between
+machines (e.g. for a homogeneous cluster), use <c>export
+STARPU_HOSTNAME=some_global_name</c>. To force continuing calibration,
+use <c>export STARPU_CALIBRATE=1</c>. This may be necessary if your application
 has not-so-stable performance. StarPU will force calibration (and thus ignore
-the current result) until 10 (_STARPU_CALIBRATION_MINIMUM) measurements have been
+the current result) until 10 (<c>_STARPU_CALIBRATION_MINIMUM</c>) measurements have been
 made on each architecture, to avoid badly scheduling tasks just because the
 first measurements were not so good. Details on the current performance model status
-can be obtained from the <c>starpu_perfmodel_display</c> command: the <c>-l</c>
+can be obtained from the command <c>starpu_perfmodel_display</c>: the <c>-l</c>
 option lists the available performance models, and the <c>-s</c> option permits
 to choose the performance model to be displayed. The result looks like:
 
@@ -208,7 +210,7 @@ execution time on CPUs was about 11ms, with a 3ms standard deviation, over
 1240 samples. It is a good idea to check this before doing actual performance
 measurements.
 
-A graph can be drawn by using the <c>starpu_perfmodel_plot</c>:
+A graph can be drawn by using the tool <c>starpu_perfmodel_plot</c>:
 
 \verbatim
 $ starpu_perfmodel_plot -s starpu_dlu_lu_model_22
@@ -278,12 +280,13 @@ tries to minimize is <c>alpha * T_execution + beta * T_data_transfer</c>, where
 <c>T_execution</c> is the estimated execution time of the codelet (usually
 accurate), and <c>T_data_transfer</c> is the estimated data transfer time. The
 latter is estimated based on bus calibration before execution start,
-i.e. with an idle machine, thus without contention. You can force bus re-calibration by running
-<c>starpu_calibrate_bus</c>. The beta parameter defaults to 1, but it can be
-worth trying to tweak it by using <c>export STARPU_SCHED_BETA=2</c> for instance,
-since during real application execution, contention makes transfer times bigger.
-This is of course imprecise, but in practice, a rough estimation already gives
-the good results that a precise estimation would give.
+i.e. with an idle machine, thus without contention. You can force bus
+re-calibration by running the tool <c>starpu_calibrate_bus</c>. The
+beta parameter defaults to 1, but it can be worth trying to tweak it
+by using <c>export STARPU_SCHED_BETA=2</c> for instance, since during
+real application execution, contention makes transfer times bigger.
+This is of course imprecise, but in practice, a rough estimation
+already gives results as good as a precise estimation would.
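The objective described above boils down to simple arithmetic over the two estimated times. The helper below is only a sketch of that cost function for illustration (the function name and values are made up; the real estimation is internal to the scheduler):

```c
/* Illustrative sketch of the scheduler's objective: among the candidate
 * workers, pick the one minimizing alpha * T_execution + beta * T_data_transfer.
 * Times are in micro-seconds; alpha and beta both default to 1. */
static double predicted_cost(double t_execution, double t_data_transfer,
                             double alpha, double beta)
{
    return alpha * t_execution + beta * t_data_transfer;
}
```

Raising beta (e.g. with <c>export STARPU_SCHED_BETA=2</c>) makes the transfer term weigh more, which favors workers that already hold the data.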
 
 \section Data_prefetch Data prefetch
 
@@ -299,7 +302,7 @@ setting up an initial statically-computed data distribution on the machine
 before submitting tasks, which will thus guide StarPU toward an initial task
 distribution (since StarPU will try to avoid further transfers).
 
-This can be achieved by giving the starpu_data_prefetch_on_node() function
+This can be achieved by giving the function starpu_data_prefetch_on_node()
 the handle and the desired target memory node.
 
 \section Power-based_scheduling Power-based scheduling
@@ -326,8 +329,8 @@ The power actually consumed by the total execution can be displayed by setting
 
 On-line task consumption measurement is currently only supported through the
 <c>CL_PROFILING_POWER_CONSUMED</c> OpenCL extension, implemented in the MoviSim
-simulator. Applications can however provide explicit measurements by using the
-starpu_perfmodel_update_history() function (examplified in \ref Performance_model_example
+simulator. Applications can however provide explicit measurements by
+using the function starpu_perfmodel_update_history() (exemplified in \ref Performance_model_example
 with the <c>power_model</c> performance model). Fine-grain
 measurement is often not feasible with the feedback provided by the hardware, so
 the user can for instance run a given task a thousand times, measure the global

+ 51 - 47
doc/doxygen/chapters/performance_feedback.doxy

@@ -29,71 +29,75 @@ Temanejo (so as to distinguish them from tasks).
 
 \subsection Enabling_on-line_performance_monitoring Enabling on-line performance monitoring
 
-In order to enable online performance monitoring, the application can call
-<c>starpu_profiling_status_set(STARPU_PROFILING_ENABLE)</c>. It is possible to
-detect whether monitoring is already enabled or not by calling
-starpu_profiling_status_get(). Enabling monitoring also reinitialize all
-previously collected feedback. The <c>STARPU_PROFILING</c> environment variable
-can also be set to 1 to achieve the same effect.
+In order to enable online performance monitoring, the application can
+call starpu_profiling_status_set() with the parameter
+::STARPU_PROFILING_ENABLE. It is possible to detect whether monitoring
+is already enabled or not by calling starpu_profiling_status_get().
+Enabling monitoring also reinitializes all previously collected
+feedback. The <c>STARPU_PROFILING</c> environment variable can also be
+set to 1 to achieve the same effect.
 
 Likewise, performance monitoring is stopped by calling
-<c>starpu_profiling_status_set(STARPU_PROFILING_DISABLE)</c>. Note that this
-does not reset the performance counters so that the application may consult
-them later on.
+starpu_profiling_status_set() with the parameter
+::STARPU_PROFILING_DISABLE. Note that this does not reset the
+performance counters so that the application may consult them later
+on.
 
 More details about the performance monitoring API are available in section
-@ref{Profiling API}.
+\ref Profiling_API.
 
 \subsection Per-Task_feedback Per-task feedback
 
-If profiling is enabled, a pointer to a <c>struct starpu_profiling_task_info</c>
-is put in the <c>.profiling_info</c> field of the <c>starpu_task</c>
-structure when a task terminates.
-This structure is automatically destroyed when the task structure is destroyed,
-either automatically or by calling starpu_task_destroy().
+If profiling is enabled, a pointer to a struct
+starpu_profiling_task_info is put in the field
+starpu_task::profiling_info when a task terminates. This structure is
+automatically destroyed when the task structure is destroyed, either
+automatically or by calling starpu_task_destroy().
 
-The <c>struct starpu_profiling_task_info</c> indicates the date when the
+The structure starpu_profiling_task_info indicates the date when the
 task was submitted (<c>submit_time</c>), started (<c>start_time</c>), and
 terminated (<c>end_time</c>), relative to the initialization of
 StarPU with starpu_init(). It also specifies the identifier of the worker
 that has executed the task (<c>workerid</c>).
 These dates are stored as <c>timespec</c> structures which the user may convert
-into micro-seconds using the starpu_timing_timespec_to_us() helper
-function.
+into micro-seconds using the helper function
+starpu_timing_timespec_to_us().
 
 It is worth noting that the application may directly access this structure from
-the callback executed at the end of the task. The <c>starpu_task</c> structure
+the callback executed at the end of the task. The structure starpu_task
 associated to the callback currently being executed is indeed accessible with
-the starpu_task_get_current() function.
+the function starpu_task_get_current().
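The conversion performed by starpu_timing_timespec_to_us() amounts to folding the seconds and nanoseconds fields of a <c>timespec</c> into one micro-second count. A sketch of that arithmetic (the helper name is ours, for illustration only):

```c
#include <time.h>

/* Illustration of the conversion done by starpu_timing_timespec_to_us():
 * fold tv_sec (seconds) and tv_nsec (nanoseconds) into micro-seconds. */
static double timespec_to_us(const struct timespec *ts)
{
    return ts->tv_sec * 1e6 + ts->tv_nsec / 1000.0;
}
```

A task's duration can then be obtained as the difference between the converted <c>end_time</c> and <c>start_time</c> fields of the profiling structure.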
 
 \subsection Per-codelet_feedback Per-codelet feedback
 
-The <c>per_worker_stats</c> field of the <c>struct starpu_codelet</c> structure is
+The field starpu_codelet::per_worker_stats is
 an array of counters. The i-th entry of the array is incremented every time a
 task implementing the codelet is executed on the i-th worker.
 This array is not reinitialized when profiling is enabled or disabled.
 
 \subsection Per-worker_feedback Per-worker feedback
 
-The second argument returned by the starpu_profiling_worker_get_info()
-function is a <c>struct starpu_profiling_worker_info</c> that gives
-statistics about the specified worker. This structure specifies when StarPU
-started collecting profiling information for that worker (<c>start_time</c>),
-the duration of the profiling measurement interval (<c>total_time</c>), the
-time spent executing kernels (<c>executing_time</c>), the time spent sleeping
-because there is no task to execute at all (<c>sleeping_time</c>), and the
-number of tasks that were executed while profiling was enabled.
-These values give an estimation of the proportion of time spent do real work,
-and the time spent either sleeping because there are not enough executable
-tasks or simply wasted in pure StarPU overhead.
+The second argument returned by the function
+starpu_profiling_worker_get_info() is a structure
+starpu_profiling_worker_info that gives statistics about the specified
+worker. This structure specifies when StarPU started collecting
+profiling information for that worker (<c>start_time</c>), the
+duration of the profiling measurement interval (<c>total_time</c>),
+the time spent executing kernels (<c>executing_time</c>), the time
+spent sleeping because there is no task to execute at all
+(<c>sleeping_time</c>), and the number of tasks that were executed
+while profiling was enabled. These values give an estimation of the
+proportion of time spent doing real work, and the time spent either
+sleeping because there are not enough executable tasks or simply
+wasted in pure StarPU overhead.
 
 Calling starpu_profiling_worker_get_info() resets the profiling
 information associated to a worker.
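From these counters one can derive, for instance, the fraction of the measurement interval a worker spent doing real work. A back-of-the-envelope helper (not part of the StarPU API, shown only to make the ratio explicit):

```c
/* Illustrative helper: fraction of the profiling interval spent
 * executing kernels, computed from the executing_time and total_time
 * counters (both expressed here in micro-seconds for simplicity). */
static double worker_efficiency(double executing_time_us, double total_time_us)
{
    if (total_time_us <= 0.0)
        return 0.0; /* no measurement interval yet */
    return executing_time_us / total_time_us;
}
```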
 
 When an FxT trace is generated (see \ref Generating_traces_with_FxT), it is also
-possible to use the <c>starpu_workers_activity</c> script (see \ref Monitoring_activity) to
-generate a graphic showing the evolution of these values during the time, for
-the different workers.
+possible to use the tool <c>starpu_workers_activity</c>
+(see \ref Monitoring_activity) to generate a graphic showing the
+evolution of these values over time, for the different workers.
 
 \subsection Bus-related_feedback Bus-related feedback
 
@@ -104,8 +108,8 @@ how to enable/disable performance monitoring
 what kind of information do we get ?
 \endinternal
 
-The bus speed measured by StarPU can be displayed by using the
-<c>starpu_machine_display</c> tool, for instance:
+The bus speed measured by StarPU can be displayed by using the tool
+<c>starpu_machine_display</c>, for instance:
 
 \verbatim
 StarPU has found:
@@ -125,9 +129,9 @@ CUDA 2  4534.229519     2417.069025     2417.060863     0.000000
 StarPU-Top is an interface which remotely displays the on-line state of a StarPU
 application and permits the user to change parameters on the fly.
 
-Variables to be monitored can be registered by calling the
+Variables to be monitored can be registered by calling the functions
 starpu_top_add_data_boolean(), starpu_top_add_data_integer(),
-starpu_top_add_data_float() functions, e.g.:
+starpu_top_add_data_float(), e.g.:
 
 \code{.c}
 starpu_top_data *data = starpu_top_add_data_integer("mynum", 0, 100, 1);
@@ -342,7 +346,7 @@ schedulable yet are shown in grey.
 \section Performance_of_codelets Performance of codelets
 
 The performance model of codelets (see \ref Performance_model_example) can be examined by using the
-<c>starpu_perfmodel_display</c> tool:
+tool <c>starpu_perfmodel_display</c>:
 
 \verbatim
 $ starpu_perfmodel_display -l
@@ -401,13 +405,13 @@ a3d3725e	4096           	4.763200e+00   	7.650928e-01   	100
 \endverbatim
 
 The same can also be achieved by using StarPU's library API, see
-@ref{Performance Model API} and notably the starpu_perfmodel_load_symbol()
-function. The source code of the <c>starpu_perfmodel_display</c> tool can be a
-useful example.
+\ref Performance_Model_API and notably the function
+starpu_perfmodel_load_symbol(). The source code of the tool
+<c>starpu_perfmodel_display</c> can be a useful example.
 
-The <c>starpu_perfmodel_plot</c> tool can be used to draw performance models.
-It writes a <c>.gp</c> file in the current directory, to be run in the
-<c>gnuplot</c> tool, which shows the corresponding curve.
+The tool <c>starpu_perfmodel_plot</c> can be used to draw performance
+models. It writes a <c>.gp</c> file in the current directory, to be
+run in the <c>gnuplot</c> tool, which shows the corresponding curve.
 
 When the <c>flops</c> field of tasks is set, <c>starpu_perfmodel_plot</c> can
 directly draw a GFlops curve, by simply adding the <c>-f</c> option:
@@ -431,7 +435,7 @@ This will create profiling data files, and a <c>.gp</c> file in the current
 directory, which draws the distribution of codelet time over the application
 execution, according to data input size.
 
-This is also available in the <c>starpu_perfmodel_plot</c> tool, by passing it
+This is also available in the tool <c>starpu_perfmodel_plot</c>, by passing it
 the fxt trace:
 
 \verbatim

+ 54 - 21
doc/doxygen/chapters/scheduling_context_hypervisor.doxy

@@ -10,36 +10,65 @@
 
 \section What_is_the_Hypervisor What is the Hypervisor
 
-StarPU proposes a platform for constructing Scheduling Contexts, for deleting and modifying them dynamically.
-A parallel kernel, can thus be isolated into a scheduling context and interferences between several parallel kernels are avoided.
-If the user knows exactly how many workers each scheduling context needs, he can assign them to the contexts at their creation time or modify them during the execution of the program.
-
-The Scheduling Context Hypervisor Plugin is available for the users who do not dispose of a regular parallelism, who cannot know in advance the exact size of the context and need to resize the contexts according to the behavior of the parallel kernels.
-The Hypervisor receives information from StarPU concerning the execution of the tasks, the efficiency of the resources, etc. and it decides accordingly when and how the contexts can be resized.
-Basic strategies of resizing scheduling contexts already exist but a platform for implementing additional custom ones is available.
+StarPU proposes a platform for constructing Scheduling Contexts, for
+deleting and modifying them dynamically. A parallel kernel can thus
+be isolated into a scheduling context and interferences between
+several parallel kernels are avoided. If the user knows exactly how
+many workers each scheduling context needs, he can assign them to the
+contexts at their creation time or modify them during the execution of
+the program.
+
+The Scheduling Context Hypervisor Plugin is available for users whose
+applications do not exhibit a regular parallelism, who cannot know in
+advance the exact size of the context and need to resize the contexts
+according to the behavior of the parallel kernels.
+
+The Hypervisor receives information from StarPU concerning the
+execution of the tasks, the efficiency of the resources, etc. and it
+decides accordingly when and how the contexts can be resized. Basic
+strategies of resizing scheduling contexts already exist but a
+platform for implementing additional custom ones is available.
 
 \section Start_the_Hypervisor Start the Hypervisor
 
-The Hypervisor must be initialised once at the beging of the application. At this point a resizing policy should be indicated. This strategy depends on the information the application is able to provide to the hypervisor as well
-as on the accuracy needed for the resizing procedure. For exemple, the application may be able to provide an estimation of the workload of the contexts. In this situation the hypervisor may decide what resources the contexts need.
-However, if no information is provided the hypervisor evaluates the behavior of the resources and of the application and makes a guess about the future.
+The Hypervisor must be initialised once at the beginning of the
+application. At this point a resizing policy should be indicated. This
+strategy depends on the information the application is able to provide
+to the hypervisor as well as on the accuracy needed for the resizing
+procedure. For example, the application may be able to provide an
+estimation of the workload of the contexts. In this situation the
+hypervisor may decide what resources the contexts need. However, if no
+information is provided the hypervisor evaluates the behavior of the
+resources and of the application and makes a guess about the future.
 The hypervisor resizes only the registered contexts.
 
 \section Interrogate_the_runtime Interrogate the runtime
 
-The runtime provides the hypervisor with information concerning the behavior of the resources and the application. This is done by using the performance_counters, some callbacks indicating when the resources are idle or not efficient, when the application submits tasks or when it becames to slow.
+The runtime provides the hypervisor with information concerning the
+behavior of the resources and the application. This is done by using
+the performance_counters, some callbacks indicating when the resources
+are idle or not efficient, when the application submits tasks or when
+it becomes too slow.
 
 \section Trigger_the_Hypervisor Trigger the Hypervisor
 
-The resizing is triggered either when the application requires it or when the initials distribution of resources alters the performance of the application( the application is to slow or the resource are idle for too long time, threashold indicated by the user). When this happens different resizing strategy are applied that target minimising the total execution of the application, the instant speed or the idle time of the resources.
+The resizing is triggered either when the application requires it or
+when the initial distribution of resources alters the performance of
+the application (the application is too slow or the resources are idle
+for too long a time, a threshold indicated by the user). When this
+happens, different resizing strategies are applied that target
+minimising the total execution time of the application, the instant
+speed or the idle time of the resources.
 
 \section Resizing_strategies Resizing strategies
 
 The plugin proposes several strategies for resizing the scheduling context.
 
 The <b>Application driven</b> strategy uses the user's input concerning the moment when he wants to resize the contexts.
-Thus, the users tags the task that should trigger the resizing process. We can set directly the corresponding field in the <c>starpu_task</c> data structure is <c>hypervisor_tag</c> or
-use the macro <c>STARPU_HYPERVISOR_TAG</c> in the <c>starpu_insert_task</c> function.
+Thus, the user tags the task that should trigger the resizing
+process. We can directly set the field starpu_task::hypervisor_tag or
+use the macro ::STARPU_HYPERVISOR_TAG in the function
+starpu_insert_task().
 
 \code{.c}
 task.hypervisor_tag = 2;
@@ -72,7 +101,7 @@ sc_hypervisor_ioctl(sched_ctx,
 
 
 The <b>Idleness</b> based strategy resizes the scheduling contexts every time one of their workers stays idle
-for a period longer than the one imposed by the user (see @pxref{The user's input in the resizing process})
+for a period longer than the one imposed by the user (see \ref The_user_input_in_the_resizing_process).
 
 \code{.c}
 int workerids[3] = {1, 3, 10};
@@ -86,12 +115,16 @@ sc_hypervisor_ioctl(sched_ctx_id,
 The <b>Gflops rate</b> based strategy resizes the scheduling contexts such that they all finish at the same time.
 The velocity of each of them is considered and once one of them is significantly slower the resizing process is triggered.
 In order to do these computations the user has to input the total number of instructions needed to be executed by the
-parallel kernels and the number of instruction to be executed by each task.
-The number of flops to be executed by a context are passed as parameter when they are registered to the hypervisor,
- (<c>sc_hypervisor_register_ctx(sched_ctx_id, flops)</c>) and the one to be executed by each task are passed when the task is submitted.
-The corresponding field in the <c>starpu_task</c> data structure is <c>flops</c> and
-the corresponding macro in the starpu_insert_task() function is <c>STARPU_FLOPS</c>. When the task is executed
-the resizing process is triggered.
+parallel kernels and the number of instructions to be executed by each
+task.
+
+The number of flops to be executed by a context is passed as a
+parameter when the context is registered to the hypervisor
+(<c>sc_hypervisor_register_ctx(sched_ctx_id, flops)</c>), and the
+number to be executed by each task is passed when the task is
+submitted. The corresponding field is starpu_task::flops and the
+corresponding macro in the function starpu_insert_task() is
+::STARPU_FLOPS. When the task is executed the resizing process is
+triggered.
 
 \code{.c}
 task.flops = 100;

+ 53 - 19
doc/doxygen/chapters/scheduling_contexts.doxy

@@ -12,17 +12,31 @@ TODO: improve!
 
 \section General_Idea General Idea
 
-Scheduling contexts represent abstracts sets of workers that allow the programmers to control the distribution of computational resources (i.e. CPUs and
-GPUs) to concurrent parallel kernels. The main goal is to minimize interferences between the execution of multiple parallel kernels, by partitioning the underlying pool of workers using contexts.
+Scheduling contexts represent abstract sets of workers that allow the
+programmers to control the distribution of computational resources
+(i.e. CPUs and GPUs) to concurrent parallel kernels. The main goal is
+to minimize interferences between the execution of multiple parallel
+kernels, by partitioning the underlying pool of workers using
+contexts.
 
 \section Create_a_Context Create a Context
 
-By default, the application submits tasks to an initial context, which disposes of all the computation ressources available to StarPU (all the workers).
-If the application programmer plans to launch several parallel kernels simultaneusly, by default these kernels will be executed within this initial context, using a single scheduler policy(see \ref Task_scheduling_policy).
-Meanwhile, if the application programmer is aware of the demands of these kernels and of the specificity of the machine used to execute them, the workers can be divided between several contexts.
-These scheduling contexts will isolate the execution of each kernel and they will permit the use of a scheduling policy proper to each one of them.
-In order to create the contexts, you have to know the indentifiers of the workers running within StarPU.
-By passing a set of workers together with the scheduling policy to the function starpu_sched_ctx_create(), you will get an identifier of the context created which you will use to indicate the context you want to submit the tasks to.
+By default, the application submits tasks to an initial context, which
+disposes of all the computation resources available to StarPU (all
+the workers). If the application programmer plans to launch several
+parallel kernels simultaneously, by default these kernels will be
+executed within this initial context, using a single scheduler
+policy (see \ref Task_scheduling_policy). Meanwhile, if the application
+programmer is aware of the demands of these kernels and of the
+specificity of the machine used to execute them, the workers can be
+divided between several contexts. These scheduling contexts will
+isolate the execution of each kernel and they will permit the use of a
+scheduling policy proper to each one of them. In order to create the
+contexts, you have to know the identifiers of the workers running
+within StarPU. By passing a set of workers together with the
+scheduling policy to the function starpu_sched_ctx_create(), you will
+get an identifier of the context created which you will use to
+indicate the context you want to submit the tasks to.
 
 \code{.c}
 /* the list of resources the context will manage */
@@ -44,8 +58,13 @@ Combined workers are constructed depending on the entire topology of the machine
 
 \section Modify_a_Context Modify a Context
 
+A scheduling context can be modified dynamically. The application may
+change its requirements during the execution, and the programmer can
+add workers to a context or remove them if no longer needed. In the
+following example we have two scheduling contexts <c>sched_ctx1</c>
+and <c>sched_ctx2</c>. After executing a part of the tasks, some of
+the workers of <c>sched_ctx1</c> will be moved to context
+<c>sched_ctx2</c>.
 
 \code{.c}
 /* the list of resources that context 1 will give away */
@@ -60,9 +79,14 @@ starpu_sched_ctx_remove_workers(workerids, 3, sched_ctx1);
 
 \section Delete_a_Context Delete a Context
 
-When a context is no longer needed it must be deleted. The application can indicate which context should keep the resources of a deleted one.
-All the tasks of the context should be executed before doing this. If the application need to avoid a barrier before moving the resources from the deleted context to the inheritor one, the application can just indicate
-when the last task was submitted. Thus, when this last task was submitted the resources will be move, but the context should still be deleted at some point of the application.
+When a context is no longer needed, it must be deleted. The
+application can indicate which context should inherit the resources of
+the deleted one. All the tasks of the context should be executed
+before doing this. If the application needs to avoid a barrier before
+moving the resources from the deleted context to the inheritor one, it
+can simply indicate when the last task was submitted. The resources
+will then be moved as soon as this last task has been submitted, but
+the context must still be deleted at some point of the application.
 
 \code{.c}
 /* when context 2 is deleted, context 1 will inherit its resources */
@@ -89,14 +113,24 @@ starpu_sched_ctx_delete(sched_ctx1);
 
 \section Empty_Context Empty Context
 
-A context may not have any resources at the begining or at a certain moment of the execution. Task can still be submitted to these contexts and they will execute them as soon as they will have resources.
-A list of tasks pending to be executed is kept and when workers are added to the contexts the tasks are submitted. However, if no resources are allocated the program will not terminate.
-If these tasks have not much priority the programmer can forbid the application to submitted them by calling the function starpu_sched_ctx_stop_task_submission().
+A context may not have any resources at the beginning or at a certain
+moment of the execution. Tasks can still be submitted to these
+contexts, and they will be executed as soon as the contexts have
+resources. A list of pending tasks is kept, and these tasks are
+submitted for execution when workers are added to the contexts.
+However, if no resources are ever allocated, the program will not
+terminate. If these tasks have low priority, the programmer can
+prevent the application from submitting them by calling the function
+starpu_sched_ctx_stop_task_submission().
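The pattern above can be sketched as follows. This is only an illustration: the exact signature of starpu_sched_ctx_create() differs between StarPU versions, so the creation call below is an assumption to adapt to your release; the worker identifiers are arbitrary.

```c
/* sketch of the empty-context pattern described above */
unsigned sched_ctx;
int workerids[2];

/* create a context without any worker (assumed signature) */
sched_ctx = starpu_sched_ctx_create(NULL, 0, "empty_ctx");

/* tasks submitted to sched_ctx at this point stay pending */

/* later, give the context two workers so the pending tasks can run */
workerids[0] = 0;
workerids[1] = 1;
starpu_sched_ctx_add_workers(workerids, 2, sched_ctx);

/* alternatively, if the pending tasks are not important, stop their
 * submission so that the program can terminate */
starpu_sched_ctx_stop_task_submission();
```

In a real application, only one of the two branches (adding workers or stopping submission) would be taken.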
 
 \section Contexts_Sharing_Workers Contexts Sharing Workers
 
-Contexts may share workers when a single context cannot execute efficiently enough alone on these workers or when the application decides to express a hierarchy of contexts. The workers apply
-an alogrithm of ``Round-Robin'' to chose the context on which they will ``pop'' next. By using the function <c>void starpu_sched_ctx_set_turn_to_other_ctx(int workerid, unsigned sched_ctx_id)</c>
-the programmer can impose the <c>workerid</c> to ``pop'' in the context <c>sched_ctx_id</c> next.
+Contexts may share workers when a single context cannot execute
+efficiently enough alone on these workers, or when the application
+decides to express a hierarchy of contexts. The workers apply a
+Round-Robin algorithm to choose the context from which they will pop
+tasks next. By using the function
+starpu_sched_ctx_set_turn_to_other_ctx(), the programmer can force the
+worker <c>workerid</c> to pop next from the context
+<c>sched_ctx_id</c>.
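For illustration, a minimal sketch of forcing a worker's next pop, using the signature quoted in the text; the context identifier <c>sched_ctx2</c> is assumed to have been created earlier and to share worker 0 with another context.

```c
/* force worker 0 to pop its next task from sched_ctx2 instead of
 * letting the Round-Robin choice decide */
int workerid = 0;
starpu_sched_ctx_set_turn_to_other_ctx(workerid, sched_ctx2);
```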
 
 */