/*
 * This file is part of the StarPU Handbook.
 * Copyright (C) 2009--2011  Universit@'e de Bordeaux
 * Copyright (C) 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017  CNRS
 * Copyright (C) 2011, 2012, 2017  INRIA
 * See the file version.doxy for copying conditions.
 */

/*! \defgroup API_Codelet_And_Tasks Codelet And Tasks

\brief This section describes the interface to manipulate codelets and tasks.

\enum starpu_codelet_type
\ingroup API_Codelet_And_Tasks
Describes the type of parallel task. See \ref ParallelTasks for details.
\var starpu_codelet_type::STARPU_SEQ
    (default) for classical sequential tasks.
\var starpu_codelet_type::STARPU_SPMD
    for a parallel task whose threads are handled by StarPU, the code
    has to use starpu_combined_worker_get_size() and
    starpu_combined_worker_get_rank() to distribute the work.
\var starpu_codelet_type::STARPU_FORKJOIN
    for a parallel task whose threads are started by the codelet
    function, which has to use starpu_combined_worker_get_size() to
    determine how many threads should be started.

\enum starpu_task_status
\ingroup API_Codelet_And_Tasks
Task status
\var starpu_task_status::STARPU_TASK_INVALID
    The task has just been initialized.
\var starpu_task_status::STARPU_TASK_BLOCKED
    The task has just been submitted, and its dependencies has not
    been checked yet.
\var starpu_task_status::STARPU_TASK_READY
    The task is ready for execution.
\var starpu_task_status::STARPU_TASK_RUNNING
    The task is running on some worker.
\var starpu_task_status::STARPU_TASK_FINISHED
    The task is finished executing.
\var starpu_task_status::STARPU_TASK_BLOCKED_ON_TAG
    The task is waiting for a tag.
\var starpu_task_status::STARPU_TASK_BLOCKED_ON_TASK
    The task is waiting for a task.
\var starpu_task_status::STARPU_TASK_BLOCKED_ON_DATA
    The task is waiting for some data.
\var starpu_task_status::STARPU_TASK_STOPPED
    The task is stopped.

\def STARPU_NOWHERE
\ingroup API_Codelet_And_Tasks
This macro is used when setting the field starpu_codelet::where
to specify that the codelet has no computation part, and thus does not need
to be scheduled, and data does not need to be actually loaded. This is thus
essentially used for synchronization tasks.

\def STARPU_CPU
\ingroup API_Codelet_And_Tasks
This macro is used when setting the field starpu_codelet::where (or starpu_task::where)
to specify the codelet (or the task) may be executed on a CPU processing unit.

\def STARPU_CUDA
\ingroup API_Codelet_And_Tasks
This macro is used when setting the field starpu_codelet::where (or starpu_task::where)
to specify the codelet (or the task) may be executed on a CUDA processing unit.

\def STARPU_OPENCL
\ingroup API_Codelet_And_Tasks
This macro is used when setting the field starpu_codelet::where (or starpu_task::where) to
specify the codelet (or the task) may be executed on a OpenCL processing unit.

\def STARPU_MIC
\ingroup API_Codelet_And_Tasks
This macro is used when setting the field starpu_codelet::where (or starpu_task::where) to
specify the codelet (or the task) may be executed on a MIC processing unit.

\def STARPU_MPI_MS
\ingroup API_Codelet_And_Tasks
This macro is used when setting the field starpu_codelet::where (or starpu_task::where) to
specify the codelet (or the task) may be executed on a MPI Slave processing unit.

\def STARPU_SCC
\ingroup API_Codelet_And_Tasks
This macro is used when setting the field starpu_codelet::where (or starpu_task::where) to
specify the codelet (or the task) may be executed on an SCC processing unit.

\def STARPU_MAIN_RAM
\ingroup API_Codelet_And_Tasks
This macro is used when the RAM memory node is specified.

\def STARPU_MULTIPLE_CPU_IMPLEMENTATIONS
\deprecated
\ingroup API_Codelet_And_Tasks
Setting the field starpu_codelet::cpu_func with this macro
indicates the codelet will have several implementations. The use of
this macro is deprecated. One should always only define the field
starpu_codelet::cpu_funcs.

\def STARPU_MULTIPLE_CUDA_IMPLEMENTATIONS
\deprecated
\ingroup API_Codelet_And_Tasks
Setting the field starpu_codelet::cuda_func with this macro
indicates the codelet will have several implementations. The use of
this macro is deprecated. One should always only define the field
starpu_codelet::cuda_funcs.

\def STARPU_MULTIPLE_OPENCL_IMPLEMENTATIONS
\deprecated
\ingroup API_Codelet_And_Tasks
Setting the field starpu_codelet::opencl_func with
this macro indicates the codelet will have several implementations.
The use of this macro is deprecated. One should always only define the
field starpu_codelet::opencl_funcs.

\def STARPU_NMAXBUFS
\ingroup API_Codelet_And_Tasks
Defines the maximum number of buffers that tasks will be able to take
as parameters. The default value is 8, it can be changed by using the
configure option \ref enable-maxbuffers "--enable-maxbuffers".

\def STARPU_VARIABLE_NBUFFERS
\ingroup API_Codelet_And_Tasks
Value to set in starpu_codelet::nbuffers to specify that the codelet can accept
a variable number of buffers, specified in starpu_task::nbuffers.

\def STARPU_CUDA_ASYNC
\ingroup API_Codelet_And_Tasks
Value to be set in starpu_codelet::cuda_flags to allow asynchronous CUDA kernel execution.

\def STARPU_OPENCL_ASYNC
\ingroup API_Codelet_And_Tasks
Value to be set in starpu_codelet::opencl_flags to allow asynchronous OpenCL kernel execution.

\def STARPU_CODELET_SIMGRID_EXECUTE
\ingroup API_Codelet_And_Tasks
Value to be set in starpu_codelet::flags to execute the codelet functions even in simgrid mode.

\def STARPU_CODELET_SIMGRID_EXECUTE_AND_INJECT
\ingroup API_Codelet_And_Tasks
Value to be set in starpu_codelet::flags to execute the codelet functions even in simgrid mode,
and later inject the measured timing inside the simulation.

\typedef starpu_cpu_func_t
\ingroup API_Codelet_And_Tasks
CPU implementation of a codelet.

\typedef starpu_cuda_func_t
\ingroup API_Codelet_And_Tasks
CUDA implementation of a codelet.

\typedef starpu_opencl_func_t
\ingroup API_Codelet_And_Tasks
OpenCL implementation of a codelet.

\typedef starpu_mic_func_t
\ingroup API_Codelet_And_Tasks
MIC implementation of a codelet.

\typedef starpu_mpi_ms_func_t
\ingroup API_Codelet_And_Tasks
MPI Master Slave implementation of a codelet.

\typedef starpu_scc_func_t
\ingroup API_Codelet_And_Tasks
SCC implementation of a codelet.

\typedef starpu_mic_kernel_t
\ingroup API_Codelet_And_Tasks
MIC kernel for a codelet

\typedef starpu_mpi_ms_kernel_t
\ingroup API_Codelet_And_Tasks
MPI Master Slave kernel for a codelet

\typedef starpu_scc_kernel_t
\ingroup API_Codelet_And_Tasks
SCC kernel for a codelet

\struct starpu_codelet
\ingroup API_Codelet_And_Tasks
The codelet structure describes a kernel that is possibly
implemented on various targets. For compatibility, make sure to
initialize the whole structure to zero, either by using explicit
memset, or the function starpu_codelet_init(), or by letting the
compiler implicitly do it in e.g. static storage case.
\var uint32_t starpu_codelet::where
    Optional field to indicate which types of processing units are
    able to execute the codelet. The different values ::STARPU_CPU,
    ::STARPU_CUDA, ::STARPU_OPENCL can be combined to specify on which
    types of processing units the codelet can be executed.
    ::STARPU_CPU|::STARPU_CUDA for instance indicates that the codelet
    is implemented for both CPU cores and CUDA devices while
    ::STARPU_OPENCL indicates that it is only available on OpenCL
    devices. If the field is unset, its value will be automatically
    set based on the availability of the XXX_funcs fields defined
    below. It can also be set to ::STARPU_NOWHERE to specify that no
    computation has to be actually done.

\var int (*starpu_codelet::can_execute)(unsigned workerid, struct starpu_task *task, unsigned nimpl)
    Define a function which should return 1 if the worker designated
    by \p workerid can execute the \p nimpl -th implementation of \p
    task, 0 otherwise.

\var enum starpu_codelet_type starpu_codelet::type
    Optional field to specify the type of the codelet. The default is
    ::STARPU_SEQ, i.e. usual sequential implementation. Other values
    (::STARPU_SPMD or ::STARPU_FORKJOIN declare that a parallel
    implementation is also available. See \ref ParallelTasks for
    details.

\var int starpu_codelet::max_parallelism
    Optional field. If a parallel implementation is available, this
    denotes the maximum combined worker size that StarPU will use to
    execute parallel tasks for this codelet.

\var starpu_cpu_func_t starpu_codelet::cpu_func
\deprecated
    Optional field which has been made deprecated. One should use
    instead the field starpu_codelet::cpu_funcs.

\var starpu_cuda_func_t starpu_codelet::cuda_func
\deprecated
    Optional field which has been made deprecated. One should use
    instead the starpu_codelet::cuda_funcs field.

\var starpu_opencl_func_t starpu_codelet::opencl_func
\deprecated
    Optional field which has been made deprecated. One should use
    instead the starpu_codelet::opencl_funcs field.

\var starpu_cpu_func_t starpu_codelet::cpu_funcs[STARPU_MAXIMPLEMENTATIONS]
    Optional array of function pointers to the CPU implementations of
    the codelet. The functions prototype must be:
    \code{.c}
    void cpu_func(void *buffers[], void *cl_arg)
    \endcode
    The first argument being the array of data managed by the data
    management library, and the second argument is a pointer to the
    argument passed from the field starpu_task::cl_arg. If the field
    starpu_codelet::where is set, then the field
    starpu_codelet::cpu_funcs is ignored if ::STARPU_CPU does not
    appear in the field starpu_codelet::where, it must be
    non-<c>NULL</c> otherwise.

\var char *starpu_codelet::cpu_funcs_name[STARPU_MAXIMPLEMENTATIONS]
    Optional array of strings which provide the name of the CPU
    functions referenced in the array starpu_codelet::cpu_funcs. This
    can be used when running on MIC devices or the SCC platform, for
    StarPU to simply look up the MIC function implementation through
    its name.

\var starpu_cuda_func_t starpu_codelet::cuda_funcs[STARPU_MAXIMPLEMENTATIONS]
    Optional array of function pointers to the CUDA implementations of
    the codelet. The functions must be host-functions written in the
    CUDA runtime API. Their prototype must be:
    \code{.c}
    void cuda_func(void *buffers[], void *cl_arg)
    \endcode
    If the field starpu_codelet::where is set, then the field
    starpu_codelet::cuda_funcs is ignored if ::STARPU_CUDA does not
    appear in the field starpu_codelet::where, it must be
    non-<c>NULL</c> otherwise.

\var char starpu_codelet::cuda_flags[STARPU_MAXIMPLEMENTATIONS]
    Optional array of flags for CUDA execution. They specify some
    semantic details about CUDA kernel execution, such as asynchronous
    execution.

\var starpu_opencl_func_t starpu_codelet::opencl_funcs[STARPU_MAXIMPLEMENTATIONS]
    Optional array of function pointers to the OpenCL implementations
    of the codelet. The functions prototype must be:
    \code{.c}
    void opencl_func(void *buffers[], void *cl_arg)
    \endcode
    If the field starpu_codelet::where field is set, then the field
    starpu_codelet::opencl_funcs is ignored if ::STARPU_OPENCL does
    not appear in the field starpu_codelet::where, it must be
    non-<c>NULL</c> otherwise.

\var char starpu_codelet::opencl_flags[STARPU_MAXIMPLEMENTATIONS]
    Optional array of flags for OpenCL execution. They specify some
    semantic details about OpenCL kernel execution, such as
    asynchronous execution.

\var starpu_mic_func_t starpu_codelet::mic_funcs[STARPU_MAXIMPLEMENTATIONS]
    Optional array of function pointers to a function which returns
    the MIC implementation of the codelet. The functions prototype
    must be:
    \code{.c}
    starpu_mic_kernel_t mic_func(struct starpu_codelet *cl, unsigned nimpl)
    \endcode
    If the field starpu_codelet::where is set, then the field
    starpu_codelet::mic_funcs is ignored if ::STARPU_MIC does not
    appear in the field starpu_codelet::where. It can be <c>NULL</c>
    if starpu_codelet::cpu_funcs_name is non-<c>NULL</c>, in which
    case StarPU will simply make a symbol lookup to get the
    implementation.

\var starpu_mpi_ms_func_t starpu_codelet::mpi_ms_funcs[STARPU_MAXIMPLEMENTATIONS]
    Optional array of function pointers to a function which returns
    the MPI Master Slave implementation of the codelet. The functions
    prototype must be:
    \code{.c}
    starpu_mpi_ms_kernel_t mpi_ms_func(struct starpu_codelet *cl, unsigned nimpl)
    \endcode
    If the field starpu_codelet::where is set, then the field
    starpu_codelet::mpi_ms_funcs is ignored if ::STARPU_MPI_MS does
    not appear in the field starpu_codelet::where. It can be
    <c>NULL</c> if starpu_codelet::cpu_funcs_name is non-<c>NULL</c>,
    in which case StarPU will simply make a symbol lookup to get the
    implementation.

\var starpu_scc_func_t starpu_codelet::scc_funcs[STARPU_MAXIMPLEMENTATIONS]
    Optional array of function pointers to a function which returns
    the SCC implementation of the codelet. The functions prototype
    must be:
    \code{.c}
    starpu_scc_kernel_t scc_func(struct starpu_codelet *cl, unsigned nimpl)
    \endcode
    If the field starpu_codelet::where is set, then the field
    starpu_codelet::scc_funcs is ignored if ::STARPU_SCC does not
    appear in the field starpu_codelet::where. It can be <c>NULL</c>
    if starpu_codelet::cpu_funcs_name is non-<c>NULL</c>, in which
    case StarPU will simply make a symbol lookup to get the
    implementation.

\var int starpu_codelet::nbuffers
    Specify the number of arguments taken by the codelet. These
    arguments are managed by the DSM and are accessed from the <c>void
    *buffers[]</c> array. The constant argument passed with the field
    starpu_task::cl_arg is not counted in this number. This value
    should not be above \ref STARPU_NMAXBUFS. It may be set to \ref
    STARPU_VARIABLE_NBUFFERS to specify that the number of buffers and
    their access modes will be set in starpu_task::nbuffers and
    starpu_task::modes or starpu_task::dyn_modes, which thus permits
    to define codelets with a varying number of data.

\var enum starpu_data_access_mode starpu_codelet::modes[STARPU_NMAXBUFS]
    Is an array of ::starpu_data_access_mode. It describes the
    required access modes to the data neeeded by the codelet (e.g.
    ::STARPU_RW). The number of entries in this array must be
    specified in the field starpu_codelet::nbuffers, and should not
    exceed \ref STARPU_NMAXBUFS. If unsufficient, this value can be
    set with the configure option
    \ref enable-maxbuffers "--enable-maxbuffers".

\var enum starpu_data_access_mode *starpu_codelet::dyn_modes
    Is an array of ::starpu_data_access_mode. It describes the
    required access modes to the data needed by the codelet (e.g.
    ::STARPU_RW). The number of entries in this array must be
    specified in the field starpu_codelet::nbuffers. This field should
    be used for codelets having a number of datas greater than
    \ref STARPU_NMAXBUFS (see \ref SettingManyDataHandlesForATask).
    When defining a codelet, one should either define this field or
    the field starpu_codelet::modes defined above.

\var unsigned starpu_codelet::specific_nodes
    Default value is 0. If this flag is set, StarPU will not
    systematically send all data to the memory node where the task will
    be executing, it will read the starpu_codelet::nodes or
    starpu_codelet::dyn_nodes array to determine, for each data,
    whether to send it on the memory node where the task will be
    executing (-1), or on a specific node (!= -1).

\var int starpu_codelet::nodes[STARPU_NMAXBUFS]
    Optional field. When starpu_codelet::specific_nodes is 1, this
    specifies the memory nodes where each data should be sent to for
    task execution. The number of entries in this array is
    starpu_codelet::nbuffers, and should not exceed \ref STARPU_NMAXBUFS.

\var int *starpu_codelet::dyn_nodes
    Optional field. When starpu_codelet::specific_nodes is 1, this
    specifies the memory nodes where each data should be sent to for
    task execution. The number of entries in this array is
    starpu_codelet::nbuffers. This field should be used for codelets
    having a number of datas greater than \ref STARPU_NMAXBUFS
    (see \ref SettingManyDataHandlesForATask). When defining a
    codelet, one should either define this field or the field
    starpu_codelet::nodes defined above.

\var struct starpu_perfmodel *starpu_codelet::model
    Optional pointer to the task duration performance model associated
    to this codelet. This optional field is ignored when set to
    <c>NULL</c> or when its field starpu_perfmodel::symbol is not set.

\var struct starpu_perfmodel *starpu_codelet::energy_model
    Optional pointer to the task energy consumption performance model
    associated to this codelet. This optional field is ignored when
    set to <c>NULL</c> or when its field starpu_perfmodel::symbol is
    not set. In the case of parallel codelets, this has to account for
    all processing units involved in the parallel execution.

\var unsigned long starpu_codelet::per_worker_stats[STARPU_NMAXWORKERS]
    Optional array for statistics collected at runtime: this is filled
    by StarPU and should not be accessed directly, but for example by
    calling the function starpu_codelet_display_stats() (See
    starpu_codelet_display_stats() for details).

\var const char *starpu_codelet::name
    Optional name of the codelet. This can be useful for debugging
    purposes.

\var const char *starpu_codelet::flags
    Various flags for the codelet.

\fn void starpu_codelet_init(struct starpu_codelet *cl)
\ingroup API_Codelet_And_Tasks
Initialize \p cl with default values. Codelets should
preferably be initialized statically as shown in
\ref DefiningACodelet. However such a initialisation is not always
possible, e.g. when using C++.

\struct starpu_data_descr
\ingroup API_Codelet_And_Tasks
This type is used to describe a data handle along with an access mode.
\var starpu_data_handle_t starpu_data_descr::handle
    describes a data
\var enum starpu_data_access_mode starpu_data_descr::mode
    describes its access mode

\struct starpu_task
\ingroup API_Codelet_And_Tasks
The structure describes a task that can be offloaded on the
various processing units managed by StarPU. It instantiates a codelet.
It can either be allocated dynamically with the function
starpu_task_create(), or declared statically. In the latter case, the
programmer has to zero the structure starpu_task and to fill the
different fields properly. The indicated default values correspond to
the configuration of a task allocated with starpu_task_create().

\var const char *starpu_task::name
    Optional name of the task. This can be useful for debugging
    purposes.

\var struct starpu_codelet *starpu_task::cl
    Is a pointer to the corresponding structure starpu_codelet. This
    describes where the kernel should be executed, and supplies the
    appropriate implementations. When set to <c>NULL</c>, no code is
    executed during the tasks, such empty tasks can be useful for
    synchronization purposes.
    This field has been made deprecated. One should use instead the
    field starpu_task::handles to specify the data handles accessed by
    the task. The access modes are now defined in the field
    starpu_codelet::modes.

\var uint32_t starpu_task::where
    When set, specifies where the task is allowed to be executed.
    When unset, it takes the value of starpu_codelet::where.

\var int starpu_task::nbuffers
    Specifies the number of buffers. This is only used when
    starpu_codelet::nbuffers is \ref STARPU_VARIABLE_NBUFFERS.

\var starpu_data_handle_t starpu_task::handles[STARPU_NMAXBUFS]
    Is an array of ::starpu_data_handle_t. It specifies the handles to
    the different pieces of data accessed by the task. The number of
    entries in this array must be specified in the field
    starpu_codelet::nbuffers, and should not exceed
    \ref STARPU_NMAXBUFS. If unsufficient, this value can be set with
    the configure option \ref enable-maxbuffers "--enable-maxbuffers".

\var starpu_data_handle_t *starpu_task::dyn_handles
    Is an array of ::starpu_data_handle_t. It specifies the handles to
    the different pieces of data accessed by the task. The number of
    entries in this array must be specified in the field
    starpu_codelet::nbuffers. This field should be used for tasks
    having a number of datas greater than \ref STARPU_NMAXBUFS (see
    \ref SettingManyDataHandlesForATask).
    When defining a task, one should either define this field or the
    field starpu_task::handles defined above.

\var void *starpu_task::interfaces[STARPU_NMAXBUFS]
    The actual data pointers to the memory node where execution will
    happen, managed by the DSM.

\var void **starpu_task::dyn_interfaces
    The actual data pointers to the memory node where execution will
    happen, managed by the DSM. Is used when the field
    starpu_task::dyn_handles is defined.

\var enum starpu_data_access_mode starpu_task::modes[STARPU_NMAXBUFS]
    Is used only when starpu_codelet::nbuffers is
    \ref STARPU_VARIABLE_NBUFFERS.
    It is an array of ::starpu_data_access_mode. It describes the
    required access modes to the data neeeded by the codelet (e.g.
    ::STARPU_RW). The number of entries in this array must be
    specified in the field starpu_task::nbuffers, and should not
    exceed \ref STARPU_NMAXBUFS. If unsufficient, this value can be
    set with the configure option
    \ref enable-maxbuffers "--enable-maxbuffers".

\var enum starpu_data_access_mode *starpu_task::dyn_modes
    Is used only when starpu_codelet::nbuffers is \ref
    STARPU_VARIABLE_NBUFFERS. It is an array of
    ::starpu_data_access_mode. It describes the required access modes
    to the data needed by the codelet (e.g. ::STARPU_RW). The number
    of entries in this array must be specified in the field
    starpu_codelet::nbuffers. This field should be used for codelets
    having a number of datas greater than \ref STARPU_NMAXBUFS
    (see \ref SettingManyDataHandlesForATask). When defining a
    codelet, one should either define this field or the field
    starpu_task::modes defined above.

\var void *starpu_task::cl_arg
    Optional pointer which is passed to the codelet through the second
    argument of the codelet implementation (e.g.
    starpu_codelet::cpu_func or starpu_codelet::cuda_func). The
    default value is <c>NULL</c>. starpu_codelet_pack_args() and
    starpu_codelet_unpack_args() are helpers that can can be used to
    respectively pack and unpack data into and from it, but the
    application can manage it any way, the only requirement is that
    the size of the data must be set in starpu_task::cl_arg_size .

\var size_t starpu_task::cl_arg_size
    Optional field. For some specific drivers, the pointer
    starpu_task::cl_arg cannot not be directly given to the driver
    function. A buffer of size starpu_task::cl_arg_size needs to be
    allocated on the driver. This buffer is then filled with the
    starpu_task::cl_arg_size bytes starting at address
    starpu_task::cl_arg. In this case, the argument given to the
    codelet is therefore not the starpu_task::cl_arg pointer, but the
    address of the buffer in local store (LS) instead. This field is
    ignored for CPU, CUDA and OpenCL codelets, where the
    starpu_task::cl_arg pointer is given as such.

\var unsigned starpu_task::cl_arg_free
    Optional field. In case starpu_task::cl_arg was allocated by the
    application through <c>malloc()</c>, setting
    starpu_task::cl_arg_free to 1 makes StarPU automatically call
    <c>free(cl_arg)</c> when destroying the task. This saves the user
    from defining a callback just for that. This is mostly useful when
    targetting MIC or SCC, where the codelet does not execute in the
    same memory space as the main thread.

\var void (*starpu_task::callback_func)(void *)
    Optional field, the default value is <c>NULL</c>. This is a
    function pointer of prototype <c>void (*f)(void *)</c> which
    specifies a possible callback. If this pointer is non-<c>NULL</c>,
    the callback function is executed on the host after the execution
    of the task. Tasks which depend on it might already be executing.
    The callback is passed the value contained in the
    starpu_task::callback_arg field. No callback is executed if the
    field is set to <c>NULL</c>.

\var void *starpu_task::callback_arg (optional) (default: <c>NULL</c>)
    Optional field, the default value is <c>NULL</c>. This is the
    pointer passed to the callback function. This field is ignored if
    the field starpu_task::callback_func is set to <c>NULL</c>.

\var unsigned starpu_task::callback_arg_free
    Optional field. In case starpu_task::callback_arg was allocated by
    the application through <c>malloc()</c>, setting
    starpu_task::callback_arg_free to 1 makes StarPU automatically
    call <c>free(callback_arg)</c> when destroying the task.

\var void (*starpu_task::prologue_callback_func)(void *)
    Optional field, the default value is <c>NULL</c>. This is a
    function pointer of prototype <c>void (*f)(void *)</c> which
    specifies a possible callback.
    If this pointer is non-<c>NULL</c>, the callback function is
    executed on the host when the task becomes ready for execution,
    before getting scheduled. The callback is passed the value
    contained in the starpu_task::prologue_callback_arg field. No
    callback is executed if the field is set to <c>NULL</c>.

\var void *starpu_task::prologue_callback_arg (optional) (default: <c>NULL</c>)
    Optional field, the default value is <c>NULL</c>. This is the
    pointer passed to the prologue callback function. This field is
    ignored if the field starpu_task::prologue_callback_func is set to
    <c>NULL</c>.

\var unsigned starpu_task::prologue_callback_arg_free
    Optional field. In case starpu_task::prologue_callback_arg was
    allocated by the application through <c>malloc()</c>, setting
    starpu_task::prologue_callback_arg_free to 1 makes StarPU
    automatically call <c>free(prologue_callback_arg)</c> when
    destroying the task.

\var void (*starpu_task::prologue_callback_pop_func)(void *)
    todo
\var void *starpu_task::prologue_callback_pop_arg (optional) (default: <c>NULL</c>)
    todo
\var unsigned starpu_task::prologue_callback_pop_arg_free
    todo

\var unsigned starpu_task::use_tag
    Optional field, the default value is 0. If set, this flag
    indicates that the task should be associated with the tag
    contained in the starpu_task::tag_id field. Tag allow the
    application to synchronize with the task and to express task
    dependencies easily.

\var starpu_tag_t starpu_task::tag_id
    This optional field contains the tag associated to the task if the
    field starpu_task::use_tag is set, it is ignored otherwise.

\var unsigned starpu_task::sequential_consistency
    If this flag is set (which is the default), sequential consistency
    is enforced for the data parameters of this task for which
    sequential consistency is enabled. Clearing this flag permits to
    disable sequential consistency for this task, even if data have it
    enabled.

\var unsigned starpu_task::synchronous
    If this flag is set, the function starpu_task_submit() is blocking
    and returns only when the task has been executed (or if no worker
    is able to process the task). Otherwise, starpu_task_submit()
    returns immediately.

\var int starpu_task::priority
    Optional field, the default value is ::STARPU_DEFAULT_PRIO. This
    field indicates a level of priority for the task. This is an
    integer value that must be set between the return values of the
    function starpu_sched_get_min_priority() for the least important
    tasks, and that of the function starpu_sched_get_max_priority()
    for the most important tasks (included). The ::STARPU_MIN_PRIO and
    ::STARPU_MAX_PRIO macros are provided for convenience and
    respectively returns the value of starpu_sched_get_min_priority()
    and starpu_sched_get_max_priority(). Default priority is
    ::STARPU_DEFAULT_PRIO, which is always defined as 0 in order to
    allow static task initialization. Scheduling strategies that take
    priorities into account can use this parameter to take better
    scheduling decisions, but the scheduling policy may also ignore
    it.

\var unsigned starpu_task::execute_on_a_specific_worker
    Default value is 0. If this flag is set, StarPU will bypass the
    scheduler and directly affect this task to the worker specified by
    the field starpu_task::workerid.

\var unsigned starpu_task::workerid
    Optional field. If the field
    starpu_task::execute_on_a_specific_worker is set, this field
    indicates the identifier of the worker that should process this
    task (as returned by starpu_worker_get_id()). This field is
    ignored if the field starpu_task::execute_on_a_specific_worker is
    set to 0.

\var unsigned starpu_task::workerorder
    Optional field. If the field
    starpu_task::execute_on_a_specific_worker is set, this field
    indicates the per-worker consecutive order in which tasks should
    be executed on the worker. Tasks will be executed in consecutive
    starpu_task::workerorder values, thus ignoring the availability
    order or task priority. See \ref StaticScheduling for more
    details. This field is ignored if the field
    starpu_task::execute_on_a_specific_worker is set to 0.

\var starpu_task_bundle_t starpu_task::bundle
    Optional field. The bundle that includes this task. If no bundle
    is used, this should be <c>NULL</c>.

\var unsigned starpu_task::detach
    Optional field, default value is 1. If this flag is set, it is not
    possible to synchronize with the task by the means of
    starpu_task_wait() later on. Internal data structures are only
    guaranteed to be freed once starpu_task_wait() is called if the
    flag is not set.

\var unsigned starpu_task::destroy
    Optional value. Default value is 0 for starpu_task_init(), and 1
    for starpu_task_create(). If this flag is set, the task structure
    will automatically be freed, either after the execution of the
    callback if the task is detached, or during starpu_task_wait()
    otherwise. If this flag is not set, dynamically allocated data
    structures will not be freed until starpu_task_destroy() is called
    explicitly. Setting this flag for a statically allocated task
    structure will result in undefined behaviour. The flag is set to 1
    when the task is created by calling starpu_task_create(). Note
    that starpu_task_wait_for_all() will not free any task.

\var unsigned starpu_task::regenerate
    Optional field. If this flag is set, the task will be re-submitted
    to StarPU once it has been executed. This flag must not be set if
    the flag starpu_task::destroy is set. This flag must be set before
    making another task depend on this one.

\var enum starpu_task_status starpu_task::status
    Optional field. Current state of the task.

\var struct starpu_profiling_task_info *starpu_task::profiling_info
    Optional field. Profiling information for the task.

\var double starpu_task::predicted
    Output field. Predicted duration of the task. This field is only
    set if the scheduling strategy uses performance models.

\var double starpu_task::predicted_transfer
    Optional field. Predicted data transfer duration for the task in
    microseconds. This field is only valid if the scheduling strategy
    uses performance models.

\var double starpu_task::predicted_start
    todo

\var struct starpu_task *starpu_task::prev
\private
    A pointer to the previous task. This should only be used by
    StarPU.

\var struct starpu_task *starpu_task::next
\private
    A pointer to the next task. This should only be used by StarPU.

\var unsigned int starpu_task::mf_skip
\private
    This is only used for tasks that use multiformat handle. This
    should only be used by StarPU.

\var double starpu_task::flops
    This can be set to the number of floating points operations that
    the task will have to achieve. This is useful for easily getting
    GFlops curves from the tool <c>starpu_perfmodel_plot</c>, and for
    the hypervisor load balancing.

\var void *starpu_task::starpu_private
\private
    This is private to StarPU, do not modify. If the task is allocated
    by hand (without starpu_task_create()), this field should be set
    to <c>NULL</c>.

\var int starpu_task::magic
\private
    This field is set when initializing a task. The function
    starpu_task_submit() will fail if the field does not have the
    right value. This will hence avoid submitting tasks which have not
    been properly initialised.

\var unsigned starpu_task::sched_ctx
    Scheduling context.

\var int starpu_task::hypervisor_tag
    Helps the hypervisor monitor the execution of this task.

\var unsigned starpu_task::possibly_parallel
    todo

\var unsigned starpu_task::prefetched
    todo

\var unsigned starpu_task::scheduled
    Whether the scheduler has pushed the task on some queue

\var struct starpu_omp_task *starpu_task::omp_task
    todo

\fn void starpu_task_init(struct starpu_task *task)
\ingroup API_Codelet_And_Tasks
Initialize \p task with default values. This function is
implicitly called by starpu_task_create(). By default, tasks initialized
with starpu_task_init() must be deinitialized explicitly with
starpu_task_clean(). Tasks can also be initialized statically, using
::STARPU_TASK_INITIALIZER.

\def STARPU_TASK_INITIALIZER
\ingroup API_Codelet_And_Tasks
It is possible to initialize statically allocated tasks with
this value. This is equivalent to initializing a structure starpu_task
with the function starpu_task_init().

\def STARPU_TASK_GET_NBUFFERS(task)
\ingroup API_Codelet_And_Tasks
Return the number of buffers for \p task, i.e. starpu_codelet::nbuffers, or
starpu_task::nbuffers if the former is \ref STARPU_VARIABLE_NBUFFERS.

\def STARPU_TASK_GET_HANDLE(task, i)
\ingroup API_Codelet_And_Tasks
Return the \p i -th data handle of \p task. If \p task
is defined with a static or dynamic number of handles, will either
return the \p i -th element of the field starpu_task::handles or the \p
i -th element of the field starpu_task::dyn_handles
(see \ref SettingManyDataHandlesForATask)

\def STARPU_TASK_SET_HANDLE(task, handle, i)
\ingroup API_Codelet_And_Tasks
Set the \p i -th data handle of \p task with \p handle.
If \p task is defined with a static or dynamic number of
handles, will either set the \p i -th element of the field
starpu_task::handles or the \p i -th element of the field
starpu_task::dyn_handles
(see \ref SettingManyDataHandlesForATask)

\def STARPU_CODELET_GET_MODE(codelet, i)
\ingroup API_Codelet_And_Tasks
Return the access mode of the \p i -th data handle of \p codelet.
If \p codelet is defined with a static or dynamic number of
handles, will either return the \p i -th element of the field
starpu_codelet::modes or the \p i -th element of the field
starpu_codelet::dyn_modes
(see \ref SettingManyDataHandlesForATask)

\def STARPU_CODELET_SET_MODE(codelet, mode, i)
\ingroup API_Codelet_And_Tasks
Set the access mode of the \p i -th data handle of \p codelet.
If \p codelet is defined with a static or dynamic number of
handles, will either set the \p i -th element of the field
starpu_codelet::modes or the \p i -th element of the field
starpu_codelet::dyn_modes
(see \ref SettingManyDataHandlesForATask)

\def STARPU_TASK_GET_MODE(task, i)
\ingroup API_Codelet_And_Tasks
Return the access mode of the \p i -th data handle of \p task.
If \p task is defined with a static or dynamic number of
handles, will either return the \p i -th element of the field
starpu_task::modes or the \p i -th element of the field
starpu_task::dyn_modes
(see \ref SettingManyDataHandlesForATask)

\def STARPU_TASK_SET_MODE(task, mode, i)
\ingroup API_Codelet_And_Tasks
Set the access mode of the \p i -th data handle of \p task.
If \p task is defined with a static or dynamic number of
handles, will either set the \p i -th element of the field
starpu_task::modes or the \p i -th element of the field
starpu_task::dyn_modes
(see \ref SettingManyDataHandlesForATask)

\fn struct starpu_task *starpu_task_create(void)
\ingroup API_Codelet_And_Tasks
Allocate a task structure and initialize it with default
values. Tasks allocated dynamically with starpu_task_create() are
automatically freed when the task is terminated. This means that the
task pointer can not be used any more once the task is submitted,
since it can be executed at any time (unless dependencies make it
wait) and thus freed at any time. If the field starpu_task::destroy is
explicitly unset, the resources used by the task have to be freed by
calling starpu_task_destroy().

\fn struct starpu_task *starpu_task_dup(struct starpu_task *task)
\ingroup API_Codelet_And_Tasks
Allocate a task structure which is the exact duplicate of \p task.

\fn void starpu_task_clean(struct starpu_task *task)
\ingroup API_Codelet_And_Tasks
Release all the structures automatically allocated to execute
\p task, but not the task structure itself and values set by the user
remain unchanged. It is thus useful for statically allocated tasks for
instance. It is also useful when users want to execute the same
operation several times with as least overhead as possible. It is
called automatically by starpu_task_destroy(). It has to be called
only after explicitly waiting for the task or after starpu_shutdown()
(waiting for the callback is not enough, since StarPU still
manipulates the task after calling the callback).

\fn void starpu_task_destroy(struct starpu_task *task)
\ingroup API_Codelet_And_Tasks
Free the resource allocated during starpu_task_create() and
associated with \p task. This function is called automatically
after the execution of a task when the field starpu_task::destroy is
set, which is the default for tasks created by starpu_task_create().
Calling this function on a statically allocated task results in an
undefined behaviour.

\fn int starpu_task_wait(struct starpu_task *task)
\ingroup API_Codelet_And_Tasks
Block until \p task has been executed. It is not
possible to synchronize with a task more than once. It is not possible
to wait for synchronous or detached tasks. Upon successful completion,
this function returns 0. Otherwise, <c>-EINVAL</c> indicates that the
specified task was either synchronous or detached.

\fn int starpu_task_wait_array(struct starpu_task **tasks, unsigned nb_tasks)
\ingroup API_Codelet_And_Tasks
Allow to wait for an array of tasks. Upon successful completion,
this function returns 0. Otherwise, <c>-EINVAL</c> indicates that one of the tasks
was either synchronous or detached.

\fn int starpu_task_submit(struct starpu_task *task)
\ingroup API_Codelet_And_Tasks
Submit \p task to StarPU. Calling this function
does not mean that the task will be executed immediately as there can
be data or task (tag) dependencies that are not fulfilled yet: StarPU
will take care of scheduling this task with respect to such
dependencies. This function returns immediately if the field
starpu_task::synchronous is set to 0, and block until the
termination of the task otherwise.
It is also possible to synchronize
the application with asynchronous tasks by the means of tags, using
the function starpu_tag_wait() function for instance. In case of
success, this function returns 0, a return value of <c>-ENODEV</c>
means that there is no worker able to process this task (e.g. there is
no GPU available and this task is only implemented for CUDA devices).
starpu_task_submit() can be called from anywhere, including codelet
functions and callbacks, provided that the field
starpu_task::synchronous is set to 0.

\fn int starpu_task_submit_to_ctx(struct starpu_task *task, unsigned sched_ctx_id)
\ingroup API_Codelet_And_Tasks
Submit \p task to the context \p sched_ctx_id.
By default, starpu_task_submit() submits the task to a global context that is
created automatically by StarPU.

\fn int starpu_task_wait_for_all(void)
\ingroup API_Codelet_And_Tasks
Block until all the tasks that were submitted (to the
current context or the global one if there is no current context) are terminated.
It does not destroy these tasks.

\fn int starpu_task_wait_for_all_in_ctx(unsigned sched_ctx_id)
\ingroup API_Codelet_And_Tasks
Wait until all the tasks that were already submitted to the context \p
sched_ctx_id have been terminated.

\fn int starpu_task_wait_for_n_submitted(unsigned n)
\ingroup API_Codelet_And_Tasks
Block until there are \p n submitted tasks left (to the
current context or the global one if there is no current context) to be executed. It does
not destroy these tasks.

\fn int starpu_task_wait_for_n_submitted_in_ctx(unsigned sched_ctx_id, unsigned n)
\ingroup API_Codelet_And_Tasks
Wait until there are \p n tasks submitted left to be
executed that were already submitted to the context \p sched_ctx_id.

\fn int starpu_task_nready(void)
\ingroup API_Codelet_And_Tasks
TODO

\fn int starpu_task_nsubmitted(void)
\ingroup API_Codelet_And_Tasks
Return the number of submitted tasks which have not completed yet.

\fn int starpu_task_nready(void)
\ingroup API_Codelet_And_Tasks
Return the number of submitted tasks which are ready for
execution are already executing. It thus does not include tasks
waiting for dependencies.

\fn struct starpu_task *starpu_task_get_current(void)
\ingroup API_Codelet_And_Tasks
Return the task currently executed by the
worker, or <c>NULL</c> if it is called either from a thread that is not a
task or simply because there is no task being executed at the moment.

\fn const char *starpu_task_get_name(struct starpu_task *task)
\ingroup API_Codelet_And_Tasks
Return the name of \p task, i.e. either its starpu_task::name field, or
the name of the corresponding performance model.

\fn const char *starpu_task_get_model_name(struct starpu_task *task)
\ingroup API_Codelet_And_Tasks
Return the name of the performance model of \p task.

\fn void starpu_codelet_display_stats(struct starpu_codelet *cl)
\ingroup API_Codelet_And_Tasks
Output on \c stderr some statistics on the codelet \p cl.

\fn int starpu_task_wait_for_no_ready(void)
\ingroup API_Codelet_And_Tasks
Wait until there is no more ready task.

\fn void starpu_task_set_implementation(struct starpu_task *task, unsigned impl)
\ingroup API_Codelet_And_Tasks
This function should be called by schedulers to specify the
codelet implementation to be executed when executing \p task.

\fn unsigned starpu_task_get_implementation(struct starpu_task *task)
\ingroup API_Codelet_And_Tasks
Return the codelet implementation to be executed
when executing \p task.

\fn void starpu_iteration_push(unsigned long iteration)
\ingroup API_Codelet_And_Tasks
Sets the iteration number for all the tasks to be submitted after
this call. This is typically called at the beginning of a task
submission loop. This number will then show up in tracing tools. A
corresponding starpu_iteration_pop() call must be made to match the call to
starpu_iteration_push(), at the end of the same task submission loop, typically.

Nested calls to starpu_iteration_push and starpu_iteration_pop are allowed, to
describe a loop nest for instance, provided that they match properly.

\fn void starpu_iteration_pop(void)
\ingroup API_Codelet_And_Tasks
Drops the iteration number for submitted tasks. This must match a previous
call to starpu_iteration_push(), and is typically called at the end of a task
submission loop.

\fn void starpu_create_sync_task(starpu_tag_t sync_tag, unsigned ndeps, starpu_tag_t *deps, void (*callback)(void *), void *callback_arg)
\ingroup API_Codelet_And_Tasks
Create (and submit) an empty task that unlocks a tag once all its dependencies are fulfilled.

*/