/*
 * This file is part of the StarPU Handbook.
 * Copyright (C) 2009--2011  Universit@'e de Bordeaux 1
 * Copyright (C) 2010, 2011, 2012, 2013, 2014  Centre National de la Recherche Scientifique
 * Copyright (C) 2011, 2012 Institut National de Recherche en Informatique et Automatique
 * See the file version.doxy for copying conditions.
 */

/*! \defgroup API_Data_Management Data Management

\brief This section describes the data management facilities provided
by StarPU. We show how to use existing data interfaces in \ref
API_Data_Interfaces, but developers can design their own data interfaces if
required.

\typedef starpu_data_handle_t
\ingroup API_Data_Management
StarPU uses ::starpu_data_handle_t as an opaque handle to
manage a piece of data. Once a piece of data has been registered to
StarPU, it is associated to a ::starpu_data_handle_t which keeps track
of the state of the piece of data over the entire machine, so that we
can maintain data consistency and locate data replicates for instance.

\enum starpu_data_access_mode
\ingroup API_Data_Management
This datatype describes a data access mode.
\var starpu_data_access_mode::STARPU_NONE
\ingroup API_Data_Management
TODO
\var starpu_data_access_mode::STARPU_R
\ingroup API_Data_Management
read-only mode.
\var starpu_data_access_mode::STARPU_W
\ingroup API_Data_Management
write-only mode.
\var starpu_data_access_mode::STARPU_RW
\ingroup API_Data_Management
read-write mode. This is equivalent to ::STARPU_R|::STARPU_W
\var starpu_data_access_mode::STARPU_SCRATCH
\ingroup API_Data_Management
A temporary buffer is allocated for the task, but StarPU does not
enforce data consistency---i.e. each device has its own buffer,
independently from each other (even for CPUs), and no data transfer is
ever performed. This is useful for temporary variables to avoid
allocating/freeing buffers inside each task. Currently, no behavior is
defined concerning the relation with the ::STARPU_R and ::STARPU_W modes
and the value provided at registration --- i.e., the value of the
scratch buffer is undefined at entry of the codelet function.  It is
being considered for future extensions at least to define the initial
value.  For now, data to be used in ::STARPU_SCRATCH mode should be
registered with node <c>-1</c> and a <c>NULL</c> pointer, since the
value of the provided buffer is simply ignored for now.
\var starpu_data_access_mode::STARPU_REDUX
\ingroup API_Data_Management
todo
\var starpu_data_access_mode::STARPU_COMMUTE
\ingroup API_Data_Management
In addition to that, ::STARPU_COMMUTE can be passed along ::STARPU_W
or ::STARPU_RW to express that StarPU can let tasks commute, which is
useful e.g. when bringing a contribution into some data, which can be
done in any order (but still require sequential consistency against
reads or non-commutative writes).

@name Basic Data Management API
\ingroup API_Data_Management

Data management is done at a high-level in StarPU: rather than
accessing a mere list of contiguous buffers, the tasks may manipulate
data that are described by a high-level construct which we call data
interface.

An example of data interface is the "vector" interface which describes
a contiguous data array on a spefic memory node. This interface is a
simple structure containing the number of elements in the array, the
size of the elements, and the address of the array in the appropriate
address space (this address may be invalid if there is no valid copy
of the array in the memory node). More informations on the data
interfaces provided by StarPU are given in \ref API_Data_Interfaces.

When a piece of data managed by StarPU is used by a task, the task
implementation is given a pointer to an interface describing a valid
copy of the data that is accessible from the current processing unit.

Every worker is associated to a memory node which is a logical
abstraction of the address space from which the processing unit gets
its data. For instance, the memory node associated to the different
CPU workers represents main memory (RAM), the memory node associated
to a GPU is DRAM embedded on the device. Every memory node is
identified by a logical index which is accessible from the
function starpu_worker_get_memory_node(). When registering a piece of
data to StarPU, the specified memory node indicates where the piece of
data initially resides (we also call this memory node the home node of
a piece of data).

\fn void starpu_data_register(starpu_data_handle_t *handleptr, unsigned home_node, void *data_interface, struct starpu_data_interface_ops *ops)
\ingroup API_Data_Management
Register a piece of data into the handle located at the
\p handleptr address. The \p data_interface buffer contains the initial
description of the data in the \p home_node. The \p ops argument is a
pointer to a structure describing the different methods used to
manipulate this type of interface. See starpu_data_interface_ops for
more details on this structure.
If \p home_node is -1, StarPU will automatically allocate the memory when
it is used for the first time in write-only mode. Once such data
handle has been automatically allocated, it is possible to access it
using any access mode.
Note that StarPU supplies a set of predefined types of interface (e.g.
vector or matrix) which can be registered by the means of helper
functions (e.g. starpu_vector_data_register() or
starpu_matrix_data_register()).

\fn void starpu_data_ptr_register(starpu_data_handle_t handle, unsigned node)
\ingroup API_Data_Management
Register that a buffer for \p handle on \p node will be set. This is typically
used by starpu_*_ptr_register helpers before setting the interface pointers for
this node, to tell the core that that is now allocated.

\fn void starpu_data_register_same(starpu_data_handle_t *handledst, starpu_data_handle_t handlesrc)
\ingroup API_Data_Management
Register a new piece of data into the handle \p handledst with the
same interface as the handle \p handlesrc.

\fn void starpu_data_unregister(starpu_data_handle_t handle)
\ingroup API_Data_Management
This function unregisters a data handle from StarPU. If the
data was automatically allocated by StarPU because the home node was
-1, all automatically allocated buffers are freed. Otherwise, a valid
copy of the data is put back into the home node in the buffer that was
initially registered. Using a data handle that has been unregistered
from StarPU results in an undefined behaviour. In case we do not need
to update the value of the data in the home node, we can use
the function starpu_data_unregister_no_coherency() instead.

\fn void starpu_data_unregister_no_coherency(starpu_data_handle_t handle)
\ingroup API_Data_Management
This is the same as starpu_data_unregister(), except that
StarPU does not put back a valid copy into the home node, in the
buffer that was initially registered.

\fn void starpu_data_unregister_submit(starpu_data_handle_t handle)
\ingroup API_Data_Management
Destroy the data handle once it is not needed anymore by any
submitted task. No coherency is assumed.

\fn void starpu_data_invalidate(starpu_data_handle_t handle)
\ingroup API_Data_Management
Destroy all replicates of the data handle immediately. After
data invalidation, the first access to the handle must be performed in
write-only mode. Accessing an invalidated data in read-mode results in
undefined behaviour.

\fn void starpu_data_invalidate_submit(starpu_data_handle_t handle)
\ingroup API_Data_Management
Submits invalidation of the data handle after completion of
previously submitted tasks.

\fn void starpu_data_set_wt_mask(starpu_data_handle_t handle, uint32_t wt_mask)
\ingroup API_Data_Management
This function sets the write-through mask of a given data (and
its children), i.e. a bitmask of nodes where the data should be always
replicated after modification. It also prevents the data from being
evicted from these nodes when memory gets scarse. When the data is
modified, it is automatically transfered into those memory node. For
instance a <c>1<<0</c> write-through mask means that the CUDA workers
will commit their changes in main memory (node 0).

\fn int starpu_data_prefetch_on_node(starpu_data_handle_t handle, unsigned node, unsigned async)
\ingroup API_Data_Management
Issue a prefetch request for a given data to a given node, i.e.
requests that the data be replicated to the given node, so that it is
available there for tasks. If the \p async parameter is 0, the call will
block until the transfer is achieved, else the call will return immediately,
after having just queued the request. In the latter case, the request will
asynchronously wait for the completion of any task writing on the data.

\fn starpu_data_handle_t starpu_data_lookup(const void *ptr)
\ingroup API_Data_Management
Return the handle corresponding to the data pointed to by the \p ptr host pointer.

\fn int starpu_data_request_allocation(starpu_data_handle_t handle, unsigned node)
\ingroup API_Data_Management
Explicitly ask StarPU to allocate room for a piece of data on
the specified memory node.

\fn void starpu_data_query_status(starpu_data_handle_t handle, int memory_node, int *is_allocated, int *is_valid, int *is_requested)
\ingroup API_Data_Management
Query the status of \p handle on the specified \p memory_node.

\fn void starpu_data_advise_as_important(starpu_data_handle_t handle, unsigned is_important)
\ingroup API_Data_Management
This function allows to specify that a piece of data can be
discarded without impacting the application.

\fn void starpu_data_set_reduction_methods(starpu_data_handle_t handle, struct starpu_codelet *redux_cl, struct starpu_codelet *init_cl)
\ingroup API_Data_Management
This sets the codelets to be used for \p handle when it is
accessed in the mode ::STARPU_REDUX. Per-worker buffers will be initialized with
the codelet \p init_cl, and reduction between per-worker buffers will be
done with the codelet \p redux_cl.

\fn struct starpu_data_interface_ops* starpu_data_get_interface_ops(starpu_data_handle_t handle)
\ingroup API_Data_Management
todo

@name Access registered data from the application
\ingroup API_Data_Management

\fn int starpu_data_acquire(starpu_data_handle_t handle, enum starpu_data_access_mode mode)
\ingroup API_Data_Management
The application must call this function prior to accessing
registered data from main memory outside tasks. StarPU ensures that
the application will get an up-to-date copy of the data in main memory
located where the data was originally registered, and that all
concurrent accesses (e.g. from tasks) will be consistent with the
access mode specified in the mode argument. starpu_data_release() must
be called once the application does not need to access the piece of
data anymore. Note that implicit data dependencies are also enforced
by starpu_data_acquire(), i.e. starpu_data_acquire() will wait for all
tasks scheduled to work on the data, unless they have been disabled
explictly by calling starpu_data_set_default_sequential_consistency_flag() or
starpu_data_set_sequential_consistency_flag(). starpu_data_acquire() is a
blocking call, so that it cannot be called from tasks or from their
callbacks (in that case, starpu_data_acquire() returns <c>-EDEADLK</c>). Upon
successful completion, this function returns 0.

\fn int starpu_data_acquire_cb(starpu_data_handle_t handle, enum starpu_data_access_mode mode, void (*callback)(void *), void *arg)
\ingroup API_Data_Management
Asynchronous equivalent of starpu_data_acquire(). When the data
specified in \p handle is available in the appropriate access
mode, the \p callback function is executed. The application may access
the requested data during the execution of this \p callback. The \p callback
function must call starpu_data_release() once the application does not
need to access the piece of data anymore. Note that implicit data
dependencies are also enforced by starpu_data_acquire_cb() in case they
are not disabled. Contrary to starpu_data_acquire(), this function is
non-blocking and may be called from task callbacks. Upon successful
completion, this function returns 0.

\fn int starpu_data_acquire_cb_sequential_consistency(starpu_data_handle_t handle, enum starpu_data_access_mode mode, void (*callback)(void *), void *arg, int sequential_consistency)
\ingroup API_Data_Management
Equivalent of starpu_data_acquire_cb() with the possibility of enabling or disabling data dependencies.
When the data specified in \p handle is available in the appropriate access
mode, the \p callback function is executed. The application may access
the requested data during the execution of this \p callback. The \p callback
function must call starpu_data_release() once the application does not
need to access the piece of data anymore. Note that implicit data
dependencies are also enforced by starpu_data_acquire_cb_sequential_consistency() in case they
are not disabled specifically for the given \p handle or by the parameter \p sequential_consistency.
Similarly to starpu_data_acquire_cb(), this function is
non-blocking and may be called from task callbacks. Upon successful
completion, this function returns 0.

\fn int starpu_data_acquire_on_node(starpu_data_handle_t handle, unsigned node, enum starpu_data_access_mode mode)
\ingroup API_Data_Management
This is the same as starpu_data_acquire(), except that the data
will be available on the given memory node instead of main memory.

\fn int starpu_data_acquire_on_node_cb(starpu_data_handle_t handle, unsigned node, enum starpu_data_access_mode mode, void (*callback)(void *), void *arg)
\ingroup API_Data_Management
This is the same as starpu_data_acquire_cb(), except that the
data will be available on the given memory node instead of main
memory.

\fn int starpu_data_acquire_on_node_cb_sequential_consistency(starpu_data_handle_t handle, unsigned node, enum starpu_data_access_mode mode, void (*callback)(void *), void *arg, int sequential_consistency)
\ingroup API_Data_Management
This is the same as starpu_data_acquire_cb_sequential_consistency(), except that the
data will be available on the given memory node instead of main
memory.

\def STARPU_DATA_ACQUIRE_CB(handle, mode, code)
\ingroup API_Data_Management
STARPU_DATA_ACQUIRE_CB() is the same as starpu_data_acquire_cb(),
except that the code to be executed in a callback is directly provided
as a macro parameter, and the data \p handle is automatically released
after it. This permits to easily execute code which depends on the
value of some registered data. This is non-blocking too and may be
called from task callbacks.

\fn void starpu_data_release(starpu_data_handle_t handle)
\ingroup API_Data_Management
This function releases the piece of data acquired by the
application either by starpu_data_acquire() or by
starpu_data_acquire_cb().

\fn void starpu_data_release_on_node(starpu_data_handle_t handle, unsigned node)
\ingroup API_Data_Management
This is the same as starpu_data_release(), except that the data
will be available on the given memory \p node instead of main memory.

*/