add more pieces to the doc for the MPI master-slave

Corentin Salingue 8 years ago
parent
commit
74882f89bc

+ 6 - 6
doc/doxygen/chapters/410_mpi_support.doxy

@@ -729,25 +729,25 @@ data transfers and supports data matrices which do not fit in memory (out-of-cor
 
 StarPU includes another way to execute an application across many nodes. The Master
 Slave support makes it possible to use remote cores without thinking about data distribution. This
-support can be activated with the <c>--enable-mpi-master-slave</c>. However, you should not activate
+support can be activated with the configure option \ref enable-mpi-master-slave "--enable-mpi-master-slave". However, you should not activate
 both MPI support and MPI Master-Slave support.
 
 If a codelet contains a kernel for CPU devices, it is automatically eligible to be executed
 on an MPI Slave device. However, you can decide to execute the codelet on an MPI Slave by filling
-the <c>mpi_ms_funcs</c> variable. The functions have to be globally-visible (i.e. not static ) for
-StarPU to be able to look them up, and -rdynamic must be passed to gcc (or -export-dynamic to ld)
+the \ref starpu_codelet::mpi_ms_funcs field. The functions have to be globally visible (i.e. not <c>static</c>) for
+StarPU to be able to look them up, and <c>-rdynamic</c> must be passed to gcc (or <c>-export-dynamic</c> to ld)
 so that symbols of the main program are visible.
 
 By default, one core is dedicated on the master to manage the entire set of slaves. If MPI
-has a good multiple threads support, you can use <c>--with-mpi-master-slave-multiple-thread</c> to
+has good multi-threading support, you can use \ref with-mpi-master-slave-multiple-thread "--with-mpi-master-slave-multiple-thread" to
 dedicate one core per slave.
 
-If you want to chose the number of cores on the slave device, use the <c>STARPU_NMPIMSTHREADS=\<number\></c>
+To choose the number of cores used on the slave device, set the environment variable \ref STARPU_NMPIMSTHREADS "STARPU_NMPIMSTHREADS=\<number\>",
 where <c>\<number\></c> is the number of cores wanted. By default, all the slave's cores are used. To select
 the number of slave nodes, change the <c>-n</c> parameter when executing the application with mpirun
 or mpiexec.
 
 By default, the master is the node with MPI rank 0. To change this, use the environment variable
-<c>STARPU_MPI_MASTER_NODE=\<number\></c> with <c>\<number\></c> is the MPI rank wanted.
+\ref STARPU_MPI_MASTER_NODE "STARPU_MPI_MASTER_NODE=\<number\>", where <c>\<number\></c> is the MPI rank wanted.
 
 */
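
Review note: the defaults described in this hunk (master on MPI rank 0, all of the slave's cores used) can be sketched as a small lookup. This is only an illustration. The helpers `parse_count`, `master_rank` and `slave_threads` are hypothetical and not part of StarPU, and StarPU's own handling of these variables may differ, in particular for invalid values.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper (not a StarPU function): parse a non-negative
 * decimal integer, returning `fallback` for NULL, empty or invalid input. */
int parse_count(const char *value, int fallback)
{
	char *end;
	long v;

	if (value == NULL || *value == '\0')
		return fallback;
	v = strtol(value, &end, 10);
	if (*end != '\0' || v < 0 || v > INT_MAX)
		return fallback;
	return (int) v;
}

/* MPI rank acting as master: rank 0 unless STARPU_MPI_MASTER_NODE is set. */
int master_rank(void)
{
	return parse_count(getenv("STARPU_MPI_MASTER_NODE"), 0);
}

/* Cores to use on each slave; -1 is this sketch's own marker for
 * "all the slave's cores", which the text above gives as the default. */
int slave_threads(void)
{
	return parse_count(getenv("STARPU_NMPIMSTHREADS"), -1);
}
```

With `STARPU_MPI_MASTER_NODE=1` in the environment, `master_rank()` returns 1; with the variable unset it returns 0.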

+ 29 - 0
doc/doxygen/chapters/501_environment_variables.doxy

@@ -123,6 +123,28 @@ MIC devices to use.
 Number of threads to use on the MIC devices.
 </dd>
 
+<dt>STARPU_NMPI_MS</dt>
+<dd>
+\anchor STARPU_NMPI_MS
+\addindex __env__STARPU_NMPI_MS
+MPI Master Slave equivalent of the environment variable \ref STARPU_NCUDA, i.e. the number of
+MPI Master Slave devices to use.
+</dd>
+
+<dt>STARPU_NMPIMSTHREADS</dt>
+<dd>
+\anchor STARPU_NMPIMSTHREADS
+\addindex __env__STARPU_NMPIMSTHREADS
+Number of threads to use on the MPI Slave devices.
+</dd>
+
+<dt>STARPU_MPI_MASTER_NODE</dt>
+<dd>
+\anchor STARPU_MPI_MASTER_NODE
+\addindex __env__STARPU_MPI_MASTER_NODE
+This variable selects which MPI node, identified by its MPI rank, will be the master.
+</dd>
+
 <dt>STARPU_NSCC</dt>
 <dd>
 \anchor STARPU_NSCC
@@ -310,6 +332,13 @@ it is therefore necessary to disable asynchronous data transfers.
 Disable asynchronous copies between CPU and MIC devices.
 </dd>
 
+<dt>STARPU_DISABLE_ASYNCHRONOUS_MPI_MS_COPY</dt>
+<dd>
+\anchor STARPU_DISABLE_ASYNCHRONOUS_MPI_MS_COPY 
+\addindex __env__STARPU_DISABLE_ASYNCHRONOUS_MPI_MS_COPY
+Disable asynchronous copies between CPU and MPI Slave devices.
+</dd>
+
 <dt>STARPU_ENABLE_CUDA_GPU_GPU_DIRECT</dt>
 <dd>
 \anchor STARPU_ENABLE_CUDA_GPU_GPU_DIRECT

+ 14 - 0
doc/doxygen/chapters/510_configure_options.doxy

@@ -298,6 +298,13 @@ Specify the maximum number of MIC threads
 Disable asynchronous copies between CPU and MIC devices.
 </dd>
 
+<dt>--disable-asynchronous-mpi-master-slave-copy</dt>
+<dd>
+\anchor disable-asynchronous-mpi-master-slave-copy
+\addindex __configure__--disable-asynchronous-mpi-master-slave-copy
+Disable asynchronous copies between CPU and MPI Slave devices.
+</dd>
+
 <dt>--enable-maxnodes=<c>count</c></dt>
 <dd>
 \anchor enable-maxnodes
@@ -327,6 +334,13 @@ Disable the build of libstarpumpi. By default, it is enabled when MPI is found.
 Enable the MPI Master-Slave support. By default, it is disabled.
 </dd>
 
+<dt>--with-mpi-master-slave-multiple-thread</dt>
+<dd>
+\anchor with-mpi-master-slave-multiple-thread
+\addindex __configure__--with-mpi-master-slave-multiple-thread
+Create one thread per MPI Slave on the MPI master to manage communications.
+</dd>
+
 <dt>--disable-fortran</dt>
 <dd>
 \anchor disable-fortran

+ 26 - 1
doc/doxygen/chapters/api/codelet_and_tasks.doxy

@@ -2,7 +2,7 @@
  * This file is part of the StarPU Handbook.
  * Copyright (C) 2009--2011  Universit@'e de Bordeaux
  * Copyright (C) 2010, 2011, 2012, 2013, 2014, 2015, 2016  CNRS
- * Copyright (C) 2011, 2012 INRIA
+ * Copyright (C) 2011, 2012, 2017  INRIA
  * See the file version.doxy for copying conditions.
  */
 
@@ -83,6 +83,11 @@ specify the codelet may be executed on a OpenCL processing unit.
 This macro is used when setting the field starpu_codelet::where to
 specify the codelet may be executed on a MIC processing unit.
 
+\def STARPU_MPI_MS
+\ingroup API_Codelet_And_Tasks
+This macro is used when setting the field starpu_codelet::where to
+specify the codelet may be executed on an MPI Slave processing unit.
+
 \def STARPU_SCC
 \ingroup API_Codelet_And_Tasks
 This macro is used when setting the field starpu_codelet::where to
@@ -152,6 +157,10 @@ OpenCL implementation of a codelet.
 \ingroup API_Codelet_And_Tasks
 MIC implementation of a codelet.
 
+\typedef starpu_mpi_ms_func_t
+\ingroup API_Codelet_And_Tasks
+MPI Master Slave implementation of a codelet.
+
 \typedef starpu_scc_func_t
 \ingroup API_Codelet_And_Tasks
 SCC implementation of a codelet.
@@ -160,6 +169,10 @@ SCC implementation of a codelet.
 \ingroup API_Codelet_And_Tasks
 MIC kernel for a codelet
 
+\typedef starpu_mpi_ms_kernel_t
+\ingroup API_Codelet_And_Tasks
+MPI Master Slave kernel for a codelet
+
 \typedef starpu_scc_kernel_t
 \ingroup API_Codelet_And_Tasks
 SCC kernel for a codelet
@@ -277,6 +290,18 @@ in the field starpu_codelet::where. It can be null if
 starpu_codelet::cpu_funcs_name is non-NULL, in which case StarPU will
 simply make a symbol lookup to get the implementation.
 
+\var starpu_mpi_ms_func_t starpu_codelet::mpi_ms_funcs[STARPU_MAXIMPLEMENTATIONS]
+Optional array of pointers to functions which return the
+MPI Master Slave implementation of the codelet. The prototype of these functions must be:
+\code{.c}
+starpu_mpi_ms_kernel_t mpi_ms_func(struct starpu_codelet *cl, unsigned nimpl)
+\endcode
+If the field starpu_codelet::where is set, then the field
+starpu_codelet::mpi_ms_funcs is ignored if ::STARPU_MPI_MS does not appear
+in the field starpu_codelet::where. It can be null if
+starpu_codelet::cpu_funcs_name is non-NULL, in which case StarPU will
+simply make a symbol lookup to get the implementation.
+
 \var starpu_scc_func_t starpu_codelet::scc_funcs[STARPU_MAXIMPLEMENTATIONS]
 Optional array of function pointers to a function which returns the
 SCC implementation of the codelet. The functions prototype must be:

+ 33 - 1
doc/doxygen/chapters/api/data_interfaces.doxy

@@ -2,7 +2,7 @@
  * This file is part of the StarPU Handbook.
  * Copyright (C) 2009--2011  Universit@'e de Bordeaux
  * Copyright (C) 2010, 2011, 2012, 2013, 2014, 2016  CNRS
- * Copyright (C) 2011, 2012 INRIA
+ * Copyright (C) 2011, 2012, 2017  INRIA
  * See the file version.doxy for copying conditions.
  */
 
@@ -136,6 +136,19 @@ Must return 0 if the transfer was actually completed completely
 synchronously, or -EAGAIN if at least some transfers are still ongoing
 and should be awaited for by the core.
 
+\var int (*starpu_data_copy_methods::ram_to_mpi_ms)(void *src_interface, unsigned src_node, void *dst_interface, unsigned dst_node)
+Define how to copy data from the \p src_interface interface on the
+\p src_node CPU node to the \p dst_interface interface on the \p dst_node MPI Slave
+node. Return 0 on success.
+\var int (*starpu_data_copy_methods::mpi_ms_to_ram)(void *src_interface, unsigned src_node, void *dst_interface, unsigned dst_node)
+Define how to copy data from the \p src_interface interface on the
+\p src_node MPI Slave node to the \p dst_interface interface on the \p dst_node CPU
+node. Return 0 on success.
+\var int (*starpu_data_copy_methods::mpi_ms_to_mpi_ms)(void *src_interface, unsigned src_node, void *dst_interface, unsigned dst_node)
+Define how to copy data from the \p src_interface interface on the
+\p src_node MPI Slave node to the \p dst_interface interface on the \p dst_node
+MPI Slave node. Return 0 on success.
+
 \var int (*starpu_data_copy_methods::ram_to_cuda_async)(void *src_interface, unsigned src_node, void *dst_interface, unsigned dst_node, cudaStream_t stream)
 Define how to copy data from the \p src_interface interface on the
 \p src_node CPU node to the \p dst_interface interface on the \p dst_node CUDA
@@ -180,6 +193,25 @@ actually completed completely synchronously, or -EAGAIN if at least
 some transfers are still ongoing and should be awaited for by the
 core.
 
+\var int (*starpu_data_copy_methods::ram_to_mpi_ms_async)(void *src_interface, unsigned src_node, void *dst_interface, unsigned dst_node, void * event)
+Define how to copy data from the \p src_interface interface on the
+\p src_node CPU node to the \p dst_interface interface on the \p dst_node MPI Slave
+node, with the given event. Must return 0 if the transfer was
+actually completed synchronously, or -EAGAIN if at least
+some transfers are still ongoing and should be awaited by the core.
+\var int (*starpu_data_copy_methods::mpi_ms_to_ram_async)(void *src_interface, unsigned src_node, void *dst_interface, unsigned dst_node, void * event)
+Define how to copy data from the \p src_interface interface on the
+\p src_node MPI Slave node to the \p dst_interface interface on the \p dst_node CPU
+node, with the given event. Must return 0 if the transfer was
+actually completed synchronously, or -EAGAIN if at least
+some transfers are still ongoing and should be awaited by the core.
+\var int (*starpu_data_copy_methods::mpi_ms_to_mpi_ms_async)(void *src_interface, unsigned src_node, void *dst_interface, unsigned dst_node, void * event)
+Define how to copy data from the \p src_interface interface on the
+\p src_node MPI Slave node to the \p dst_interface interface on the \p dst_node MPI Slave
+node, with the given event. Must return 0 if the transfer was
+actually completed synchronously, or -EAGAIN if at least
+some transfers are still ongoing and should be awaited by the core.
+
 \var int (*starpu_data_copy_methods::ram_to_mic_async)(void *src_interface, unsigned src_node, void *dst_interface, unsigned dst_node)
 Define how to copy data from the \p src_interface interface on the
 \p src_node CPU node to the \p dst_interface interface on the \p dst_node

+ 25 - 1
doc/doxygen/chapters/api/initialization.doxy

@@ -2,7 +2,7 @@
  * This file is part of the StarPU Handbook.
  * Copyright (C) 2009--2011  Universit@'e de Bordeaux
  * Copyright (C) 2010, 2011, 2012, 2013, 2014, 2016  CNRS
- * Copyright (C) 2011, 2012 INRIA
+ * Copyright (C) 2011, 2012, 2017  INRIA
  * See the file version.doxy for copying conditions.
  */
 
@@ -52,6 +52,10 @@ be specified with the environment variable \ref STARPU_NMIC.
 This is the number of SCC devices that StarPU can use. This can also
 be specified with the environment variable \ref STARPU_NSCC.
 (default = -1)
+\var int starpu_conf::nmpi_ms
+This is the number of MPI Master Slave devices that StarPU can use. This can also
+be specified with the environment variable \ref STARPU_NMPI_MS.
+(default = -1)
 
 \var unsigned starpu_conf::use_explicit_workers_bindid
 If this flag is set, the starpu_conf::workers_bindid array indicates
@@ -105,6 +109,14 @@ array contains the logical identifiers of the SCC devices to be used.
 Otherwise, StarPU assigns the SCC devices in a round-robin fashion.
 This can also be specified with the environment variable
 \ref STARPU_WORKERS_SCCID.
+\var unsigned starpu_conf::use_explicit_workers_mpi_ms_deviceid
+If this flag is set, the MPI Master Slave workers will be attached to the MPI Master Slave
+devices specified in the array starpu_conf::workers_mpi_ms_deviceid.
+Otherwise, StarPU assigns the MPI Master Slave devices in a round-robin fashion.
+(default = 0)
+\var unsigned starpu_conf::workers_mpi_ms_deviceid[STARPU_NMAXWORKERS]
+If the flag starpu_conf::use_explicit_workers_mpi_ms_deviceid is set, the
+array contains the logical identifiers of the MPI Master Slave devices to be used.
 
 \var int starpu_conf::bus_calibrate
 If this flag is set, StarPU will recalibrate the bus.  If this value
@@ -176,6 +188,13 @@ environment variable \ref STARPU_DISABLE_ASYNCHRONOUS_MIC_COPY.
 This can also be specified at compilation time by giving to the
 configure script the option \ref disable-asynchronous-mic-copy "--disable-asynchronous-mic-copy".
 (default = 0).
+\var int starpu_conf::disable_asynchronous_mpi_ms_copy
+This flag should be set to 1 to disable asynchronous copies between
+CPUs and MPI Master Slave devices. This can also be specified with the
+environment variable \ref STARPU_DISABLE_ASYNCHRONOUS_MPI_MS_COPY.
+This can also be specified at compilation time by giving to the
+configure script the option \ref disable-asynchronous-mpi-master-slave-copy "--disable-asynchronous-mpi-master-slave-copy".
+(default = 0).
 
 \var unsigned *starpu_conf::cuda_opengl_interoperability
 Enable CUDA/OpenGL interoperation on these CUDA
@@ -269,6 +288,11 @@ accelerators are disabled.
 Return 1 if asynchronous data transfers between CPU and MIC
 devices are disabled.
 
+\fn int starpu_asynchronous_mpi_ms_copy_disabled(void)
+\ingroup API_Initialization_and_Termination
+Return 1 if asynchronous data transfers between CPU and MPI Slave
+devices are disabled.
+
 \fn void starpu_topology_print(FILE *f)
 \ingroup API_Initialization_and_Termination
 Prints a description of the topology on f.

+ 11 - 1
doc/doxygen/chapters/api/mpi.doxy

@@ -2,7 +2,7 @@
  * This file is part of the StarPU Handbook.
  * Copyright (C) 2009--2011  Universit@'e de Bordeaux
  * Copyright (C) 2010, 2011, 2012, 2013, 2014, 2015, 2016  CNRS
- * Copyright (C) 2011, 2012 INRIA
+ * Copyright (C) 2011, 2012, 2017  INRIA
  * See the file version.doxy for copying conditions.
  */
 
@@ -484,4 +484,14 @@ of the collective communication, the \p rcallback function is called
 with the argument \p rarg on the process root, the \p scallback
 function is called with the argument \p sarg on any other process.
 
+@name MPI Master Slave
+\anchor MPIMasterSlave
+\ingroup API_MPI_Support
+
+\def STARPU_USE_MPI_MASTER_SLAVE
+\ingroup API_MPI_Support
+This macro is defined when StarPU has been installed with MPI Master Slave
+support. It should be used in your code to detect the availability of
+MPI Master Slave.
+
 */
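
Review note: the compile-time check described for STARPU_USE_MPI_MASTER_SLAVE could look roughly like the sketch below. The helper name `have_mpi_master_slave` is hypothetical, and the sketch assumes the macro becomes visible by including `<starpu.h>`; the include is left as a comment so the fragment compiles on its own.

```c
/* In a real application, <starpu.h> would be included here. This sketch
 * assumes that header is what makes STARPU_USE_MPI_MASTER_SLAVE visible. */

/* Hypothetical helper (not a StarPU function): returns 1 when the code was
 * compiled against a StarPU built with MPI Master Slave support, else 0. */
int have_mpi_master_slave(void)
{
#ifdef STARPU_USE_MPI_MASTER_SLAVE
	return 1;
#else
	return 0;
#endif
}
```

Compiled without the macro defined, as in this standalone fragment, the function returns 0, so callers can fall back to plain CPU execution.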

+ 14 - 1
doc/doxygen/chapters/api/workers.doxy

@@ -2,7 +2,7 @@
  * This file is part of the StarPU Handbook.
  * Copyright (C) 2009--2011  Universit@'e de Bordeaux
  * Copyright (C) 2010, 2011, 2012, 2013, 2014, 2016  CNRS
- * Copyright (C) 2011, 2012 INRIA
+ * Copyright (C) 2011, 2012, 2017  INRIA
  * See the file version.doxy for copying conditions.
  */
 
@@ -39,6 +39,9 @@ TODO
 \var starpu_node_kind::STARPU_OPENCL_RAM
 \ingroup API_Workers_Properties
 TODO
+\var starpu_node_kind::STARPU_DISK_RAM
+\ingroup API_Workers_Properties
+TODO
 \var starpu_node_kind::STARPU_MIC_RAM
 \ingroup API_Workers_Properties
 TODO
@@ -49,6 +52,9 @@ will be useful for MPI.
 \var starpu_node_kind::STARPU_SCC_SHM
 \ingroup API_Workers_Properties
 TODO
+\var starpu_node_kind::STARPU_MPI_MS_RAM
+\ingroup API_Workers_Properties
+TODO
 
 \enum starpu_worker_archtype
 \ingroup API_Workers_Properties
@@ -71,6 +77,9 @@ Intel MIC device
 \var starpu_worker_archtype::STARPU_SCC_WORKER
 \ingroup API_Workers_Properties
 Intel SCC device
+\var starpu_worker_archtype::STARPU_MPI_MS_WORKER
+\ingroup API_Workers_Properties
+MPI Slave device
 
 
 \struct starpu_worker_collection
@@ -147,6 +156,10 @@ This function returns the number of MIC workers controlled by StarPU.
 This function returns the number of MIC devices controlled by StarPU.
 The returned value should be at most \ref STARPU_MAXMICDEVS.
 
+\fn unsigned starpu_mpi_ms_worker_get_count(void)
+\ingroup API_Workers_Properties
+This function returns the number of MPI Master Slave workers controlled by StarPU.
+
 \fn unsigned starpu_scc_worker_get_count(void)
 \ingroup API_Workers_Properties
 This function returns the number of SCC devices controlled by StarPU.