@@ -2264,6 +2264,10 @@ GPU-RAM-NIC transfers are handled efficiently by StarPU-MPI. The user has to
 use the usual @code{mpirun} command of the MPI implementation to start StarPU on
 the different MPI nodes.

+An MPI Insert Task function provides an even more seamless transition to a
+distributed application, by automatically issuing all required data transfers
+according to the task graph and an application-provided distribution.
+
 @menu
 * The API::
 * Simple Example::
@@ -2274,6 +2278,16 @@ the different MPI nodes.
 @node The API
 @section The API

+@subsection Compilation
+
+The flags required to compile or link against the MPI layer are
+accessible with the following commands:
+
+@example
+% pkg-config --cflags libstarpumpi # options for the compiler
+% pkg-config --libs libstarpumpi # options for the linker
+@end example
+
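+For instance, an application could then be compiled and linked against both MPI
+and the StarPU MPI layer along the following lines (@code{mpicc} is assumed to
+be the MPI compiler wrapper, and the source file name is only an example):
+
+@example
+# `ring.c' is just a placeholder for the application source file
+% mpicc ring.c -o ring $(pkg-config --cflags --libs libstarpumpi)
+@end example
+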
 @subsection Initialisation

 @deftypefun int starpu_mpi_initialize (void)
@@ -2432,6 +2446,19 @@ int main(int argc, char **argv)
 @node MPI Insert Task Utility
 @section MPI Insert Task Utility

+To save the programmer from having to state all communications explicitly,
+StarPU provides an "MPI Insert Task Utility". The principle is that the
+application decides a distribution of the data over the MPI nodes by
+allocating it and notifying StarPU of that decision, i.e. it tells StarPU
+which MPI node "owns" which data. All MPI nodes then process the whole task
+graph, and StarPU automatically determines which node actually executes which
+task, as well as the required MPI transfers.
+
+@deftypefun int starpu_data_set_rank (starpu_data_handle @var{handle}, int @var{mpi_rank})
+Tell StarPU-MPI which MPI node "owns" a given piece of data, that is, the node
+which will always keep an up-to-date value of it, and will by default execute
+tasks which write to it.
+@end deftypefun
+
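+As a minimal sketch (the variable names and the choice of owner are only
+illustrative), a data could be registered as usual and then declared to be
+owned by MPI node 1:
+
+@cartouche
+@smallexample
+unsigned token = 42;
+starpu_data_handle token_handle;
+
+/* Register the data with StarPU as usual (illustrative names)... */
+starpu_variable_data_register(&token_handle, 0,
+                              (uintptr_t)&token, sizeof(token));
+
+/* ...and tell StarPU-MPI that MPI node 1 owns it. */
+starpu_data_set_rank(token_handle, 1);
+@end smallexample
+@end cartouche
+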
 @deftypefun void starpu_mpi_insert_task (MPI_Comm @var{comm}, starpu_codelet *@var{cl}, ...)
 Create and submit a task corresponding to @var{cl} with the following
 arguments. The argument list must be zero-terminated.
@@ -2439,28 +2466,29 @@ arguments. The argument list must be zero-terminated.
 The arguments following the codelets are the same types as for the
 function @code{starpu_insert_task} defined in @ref{Insert Task
 Utility}. The extra argument @code{STARPU_EXECUTE_ON_NODE} followed by an
-integer allows to specify the node to execute the codelet. It is also
+integer specifies the MPI node on which to execute the codelet. It is also
 possible to specify that the node owning a specific data will execute
 the codelet, by using @code{STARPU_EXECUTE_ON_DATA} followed by a data
 handle.

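+For instance, with a hypothetical codelet @code{cl} and data handles
+@code{handle_a} and @code{handle_b} (possibly owned by different MPI nodes),
+the task could be forced to run on MPI node 0:
+
+@cartouche
+@smallexample
+/* handle_a and handle_b may be owned by different MPI nodes, so we
+   explicitly request execution on MPI node 0. */
+starpu_mpi_insert_task(MPI_COMM_WORLD, &cl,
+                       STARPU_RW, handle_a,
+                       STARPU_RW, handle_b,
+                       STARPU_EXECUTE_ON_NODE, 0,
+                       0);
+@end smallexample
+@end cartouche
+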
-The algorithm is as follows:
+The internal algorithm is as follows:
 @enumerate
-@item Find out whether we are to execute the codelet because we own the
-data to be written to. If different tasks own data to be written to,
-the argument @code{STARPU_EXECUTE_ON_NODE} or
-@code{STARPU_EXECUTE_ON_DATA} should be used to specify the executing
-task @code{ET}.
-@item Send and receive data as requested. Tasks owning data which need
-to be read by the executing task @code{ET} are sending them to @code{ET}.
-@item Execute the codelet. This is done by the task selected in the
+@item Find out whether we (as an MPI node) are to execute the codelet
+because we own the data to be written to. If different nodes own data
+to be written to, the argument @code{STARPU_EXECUTE_ON_NODE} or
+@code{STARPU_EXECUTE_ON_DATA} has to be used to specify which MPI node will
+execute the task.
+@item Send and receive data as requested. Nodes owning data which needs to be
+read by the task send it to the MPI node which will execute the task; the
+latter receives it.
+@item Execute the codelet. This is done by the MPI node selected in the
 1st step of the algorithm.
-@item In the case when different tasks own data to be written to, send
-W data back to their owners.
+@item If different MPI nodes own data to be written to, send the written data
+back to their owners.
 @end enumerate

 The algorithm also includes a cache mechanism that allows not to send
-data twice to the same task, unless the data has been modified.
+data twice to the same MPI node, unless the data has been modified.

 @end deftypefun

@@ -2469,7 +2497,7 @@ data twice to the same MPI node, unless the data has been modified.

 @page

-Here an example showing how to use @code{starpu_mpi_insert_task}. One
+Here is a stencil example showing how to use @code{starpu_mpi_insert_task}. One
 first needs to define a distribution function which specifies the
 locality of the data. Note that that distribution information needs to
 be given to StarPU by calling @code{starpu_data_set_rank}.
@@ -2492,6 +2520,9 @@ the lazy allocation mechanism, i.e. with a @code{home_node} set to -1.
 StarPU will automatically allocate the memory when it is used for the
 first time.

+One can note an optimization here (the @code{else if} test): we only register
+data which will be needed by the tasks that we will execute.
+
 @cartouche
 @smallexample
 unsigned matrix[X][Y];
@@ -2537,6 +2568,11 @@ steps of the application.
 @end smallexample
 @end cartouche

+In other words, all MPI nodes process the whole task graph, but as mentioned
+above, for each task only the MPI node which owns the data being written to
+(here, @code{data_handles[x][y]}) will actually run the task. The other MPI
+nodes will automatically send it the required data.
+
 @node MPI Collective Operations
 @section MPI Collective Operations
