|
@@ -123,6 +123,20 @@ Allow for at most @var{count} scheduling contexts
|
|
|
This information is then available as the
|
|
|
@code{STARPU_NMAX_SCHED_CTXS} macro.
|
|
|
|
|
|
+@item --disable-asynchronous-copy
|
|
|
+Disable asynchronous copies between CPU and GPU devices.
|
|
|
+The AMD implementation of OpenCL is known to
|
|
|
+fail when copying data asynchronously. When using this implementation,
|
|
|
+it is therefore necessary to disable asynchronous data transfers.
|
|
|
+
|
|
|
+@item --disable-asynchronous-cuda-copy
|
|
|
+Disable asynchronous copies between CPU and CUDA devices.
|
|
|
+
|
|
|
+@item --disable-asynchronous-opencl-copy
|
|
|
+Disable asynchronous copies between CPU and OpenCL devices.
|
|
|
+The AMD implementation of OpenCL is known to
|
|
|
+fail when copying data asynchronously. When using this implementation,
|
|
|
+it is therefore necessary to disable asynchronous data transfers.
|
|
|
@end table
|
|
|
|
|
|
@node Extension configuration
|
|
@@ -222,6 +236,7 @@ Enables the Scheduling Context Hypervisor plugin(@pxref{Scheduling Context Hyper
|
|
|
By default, it is disabled.
|
|
|
|
|
|
@end table
|
|
|
+
|
|
|
@node Execution configuration through environment variables
|
|
|
@section Execution configuration through environment variables
|
|
|
|
|
@@ -234,55 +249,31 @@ By default, it is disabled.
|
|
|
@node Workers
|
|
|
@subsection Configuring workers
|
|
|
|
|
|
-@menu
|
|
|
-* STARPU_NCPU:: Number of CPU workers
|
|
|
-* STARPU_NCUDA:: Number of CUDA workers
|
|
|
-* STARPU_NOPENCL:: Number of OpenCL workers
|
|
|
-* STARPU_NGORDON:: Number of SPU workers (Cell)
|
|
|
-* STARPU_WORKERS_NOBIND:: Do not bind workers
|
|
|
-* STARPU_WORKERS_CPUID:: Bind workers to specific CPUs
|
|
|
-* STARPU_WORKERS_CUDAID:: Select specific CUDA devices
|
|
|
-* STARPU_WORKERS_OPENCLID:: Select specific OpenCL devices
|
|
|
-* STARPU_SINGLE_COMBINED_WORKER:: Do not use concurrent workers
|
|
|
-* STARPU_MIN_WORKERSIZE:: Minimum size of the combined workers
|
|
|
-* STARPU_MAX_WORKERSIZE:: Maximum size of the combined workers
|
|
|
-@end menu
|
|
|
-
|
|
|
-@node STARPU_NCPU
|
|
|
-@subsubsection @code{STARPU_NCPU} -- Number of CPU workers
|
|
|
+@table @code
|
|
|
|
|
|
+@item @code{STARPU_NCPU}
|
|
|
Specify the number of CPU workers (thus not including workers dedicated to controlling accelerators). Note that by default, StarPU will not allocate
|
|
|
more CPU workers than there are physical CPUs, and that some CPUs are used to control
|
|
|
the accelerators.
|
|
|
|
|
|
-@node STARPU_NCUDA
|
|
|
-@subsubsection @code{STARPU_NCUDA} -- Number of CUDA workers
|
|
|
-
|
|
|
+@item @code{STARPU_NCUDA}
|
|
|
Specify the number of CUDA devices that StarPU can use. If
|
|
|
@code{STARPU_NCUDA} is lower than the number of physical devices, it is
|
|
|
possible to select which CUDA devices should be used by means of the
|
|
|
@code{STARPU_WORKERS_CUDAID} environment variable. By default, StarPU will
|
|
|
create as many CUDA workers as there are CUDA devices.
|
|
|
|
|
|
-@node STARPU_NOPENCL
|
|
|
-@subsubsection @code{STARPU_NOPENCL} -- Number of OpenCL workers
|
|
|
-
|
|
|
+@item @code{STARPU_NOPENCL}
|
|
|
OpenCL equivalent of the @code{STARPU_NCUDA} environment variable.
|
|
|
|
|
|
-@node STARPU_NGORDON
|
|
|
-@subsubsection @code{STARPU_NGORDON} -- Number of SPU workers (Cell)
|
|
|
-
|
|
|
+@item @code{STARPU_NGORDON}
|
|
|
Specify the number of SPUs that StarPU can use.
|
|
|
|
|
|
-@node STARPU_WORKERS_NOBIND
|
|
|
-@subsubsection @code{STARPU_WORKERS_NOBIND} -- Do not bind workers to specific CPUs
|
|
|
-
|
|
|
+@item @code{STARPU_WORKERS_NOBIND}
|
|
|
Setting it to non-zero will prevent StarPU from binding its threads to
|
|
|
CPUs. This is for instance useful when running the testsuite in parallel.
|
|
|
|
|
|
-@node STARPU_WORKERS_CPUID
|
|
|
-@subsubsection @code{STARPU_WORKERS_CPUID} -- Bind workers to specific CPUs
|
|
|
-
|
|
|
+@item @code{STARPU_WORKERS_CPUID}
|
|
|
Passing an array of integers (starting from 0) in @code{STARPU_WORKERS_CPUID}
|
|
|
specifies on which logical CPU the different workers should be
|
|
|
bound. For instance, if @code{STARPU_WORKERS_CPUID = "0 1 4 5"}, the first
|
|
@@ -306,9 +297,7 @@ third (resp. second and fourth) workers will be put on CPU #0 (resp. CPU #1).
|
|
|
This variable is ignored if the @code{use_explicit_workers_bindid} flag of the
|
|
|
@code{starpu_conf} structure passed to @code{starpu_init} is set.
|
|
|
|
|
|
-@node STARPU_WORKERS_CUDAID
|
|
|
-@subsubsection @code{STARPU_WORKERS_CUDAID} -- Select specific CUDA devices
|
|
|
-
|
|
|
+@item @code{STARPU_WORKERS_CUDAID}
|
|
|
Similarly to the @code{STARPU_WORKERS_CPUID} environment variable, it is
|
|
|
possible to select which CUDA devices should be used by StarPU. On a machine
|
|
|
equipped with 4 GPUs, setting @code{STARPU_WORKERS_CUDAID = "1 3"} and
|
|
@@ -319,56 +308,57 @@ the one reported by CUDA).
|
|
|
This variable is ignored if the @code{use_explicit_workers_cuda_gpuid} flag of
|
|
|
the @code{starpu_conf} structure passed to @code{starpu_init} is set.
|
|
|
|
|
|
-@node STARPU_WORKERS_OPENCLID
|
|
|
-@subsubsection @code{STARPU_WORKERS_OPENCLID} -- Select specific OpenCL devices
|
|
|
-
|
|
|
+@item @code{STARPU_WORKERS_OPENCLID}
|
|
|
OpenCL equivalent of the @code{STARPU_WORKERS_CUDAID} environment variable.
|
|
|
|
|
|
This variable is ignored if the @code{use_explicit_workers_opencl_gpuid} flag of
|
|
|
the @code{starpu_conf} structure passed to @code{starpu_init} is set.
|
|
|
|
|
|
-@node STARPU_SINGLE_COMBINED_WORKER
|
|
|
-@subsubsection @code{STARPU_SINGLE_COMBINED_WORKER} -- Do not use concurrent workers
|
|
|
-
|
|
|
+@item @code{STARPU_SINGLE_COMBINED_WORKER}
|
|
|
If set, StarPU will create several workers which won't be able to work
|
|
|
concurrently. It will create combined workers whose size ranges from 1 to the
|
|
|
total number of CPU workers in the system.
|
|
|
|
|
|
-@node STARPU_MIN_WORKERSIZE
|
|
|
-@subsubsection @code{STARPU_MIN_WORKERSIZE} -- Minimum size of the combined workers
|
|
|
+@item @code{SYNTHESIZE_ARITY_COMBINED_WORKER}
|
|
|
|
|
|
+@item @code{STARPU_MIN_WORKERSIZE}
|
|
|
Let the user give a hint to StarPU about how many workers
|
|
|
(minimum boundary) the combined workers should contain.
|
|
|
|
|
|
-@node STARPU_MAX_WORKERSIZE
|
|
|
-@subsubsection @code{STARPU_MAX_WORKERSIZE} -- Maximum size of the combined workers
|
|
|
-
|
|
|
+@item @code{STARPU_MAX_WORKERSIZE}
|
|
|
Let the user give a hint to StarPU about how many workers
|
|
|
(maximum boundary) the combined workers should contain.
|
|
|
|
|
|
+@item @code{STARPU_DISABLE_ASYNCHRONOUS_COPY}
|
|
|
+Disable asynchronous copies between CPU and GPU devices.
|
|
|
+The AMD implementation of OpenCL is known to
|
|
|
+fail when copying data asynchronously. When using this implementation,
|
|
|
+it is therefore necessary to disable asynchronous data transfers.
|
|
|
+
|
|
|
+@item @code{STARPU_DISABLE_ASYNCHRONOUS_CUDA_COPY}
|
|
|
+Disable asynchronous copies between CPU and CUDA devices.
|
|
|
+
|
|
|
+@item @code{STARPU_DISABLE_ASYNCHRONOUS_OPENCL_COPY}
|
|
|
+Disable asynchronous copies between CPU and OpenCL devices.
|
|
|
+The AMD implementation of OpenCL is known to
|
|
|
+fail when copying data asynchronously. When using this implementation,
|
|
|
+it is therefore necessary to disable asynchronous data transfers.
|
|
|
+
|
|
|
+@end table
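
The worker-related variables above are typically set in the shell before launching a StarPU program. The following is a minimal sketch, not a recommended configuration: the application name @code{./my_starpu_app} is hypothetical, the value @code{1} for @code{STARPU_DISABLE_ASYNCHRONOUS_OPENCL_COPY} is an assumption (the text does not state which value enables it), and pairing @code{STARPU_NCUDA=2} with the device list @code{"1 3"} assumes a machine with 4 GPUs.

```shell
#!/bin/sh
# Sketch: configure StarPU workers through the environment.

export STARPU_NCPU=4                   # create four CPU workers
export STARPU_WORKERS_CPUID="0 1 4 5"  # bind them to logical CPUs 0, 1, 4 and 5
export STARPU_NCUDA=2                  # use two CUDA devices
export STARPU_WORKERS_CUDAID="1 3"     # select CUDA devices #1 and #3

# Assumption: a value of 1 turns the option on.
export STARPU_DISABLE_ASYNCHRONOUS_OPENCL_COPY=1

# Hypothetical application name; replace it with your own program.
app="./my_starpu_app"
if [ -x "$app" ]; then
    "$app"
else
    echo "skipping launch: $app not found"
fi
```

Keep in mind that the CPU and device lists are ignored when the corresponding @code{use_explicit_workers_*} flag is set in the @code{starpu_conf} structure passed to @code{starpu_init}.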
|
|
|
+
|
|
|
|
|
|
@node Scheduling
|
|
|
@subsection Configuring the Scheduling engine
|
|
|
|
|
|
-@menu
|
|
|
-* STARPU_SCHED:: Scheduling policy
|
|
|
-* STARPU_CALIBRATE:: Calibrate performance models
|
|
|
-* STARPU_BUS_CALIBRATE:: Calibrate bus
|
|
|
-* STARPU_PREFETCH:: Use data prefetch
|
|
|
-* STARPU_SCHED_ALPHA:: Computation factor
|
|
|
-* STARPU_SCHED_BETA:: Communication factor
|
|
|
-@end menu
|
|
|
-
|
|
|
-@node STARPU_SCHED
|
|
|
-@subsubsection @code{STARPU_SCHED} -- Scheduling policy
|
|
|
+@table @code
|
|
|
|
|
|
+@item @code{STARPU_SCHED}
|
|
|
Choose between the different scheduling policies proposed by StarPU: random,
work stealing, greedy, with performance models, etc.
|
|
|
|
|
|
Use @code{STARPU_SCHED=help} to get the list of available schedulers.
|
|
|
|
|
|
-@node STARPU_CALIBRATE
|
|
|
-@subsubsection @code{STARPU_CALIBRATE} -- Calibrate performance models
|
|
|
-
|
|
|
+@item @code{STARPU_CALIBRATE}
|
|
|
If this variable is set to 1, the performance models are calibrated during
|
|
|
the execution. If it is set to 2, the previous values are dropped to restart
|
|
|
calibration from scratch. Setting this variable to 0 disables calibration; this
|
|
@@ -376,14 +366,11 @@ is the default behaviour.
|
|
|
|
|
|
Note: this currently only applies to @code{dm}, @code{dmda} and @code{heft} scheduling policies.
|
|
|
|
|
|
-@node STARPU_BUS_CALIBRATE
|
|
|
-@subsubsection @code{STARPU_BUS_CALIBRATE} -- Calibrate bus
|
|
|
-
|
|
|
+@item @code{STARPU_BUS_CALIBRATE}
|
|
|
If this variable is set to 1, the bus is recalibrated during initialization.
|
|
|
|
|
|
-@node STARPU_PREFETCH
|
|
|
-@subsubsection @code{STARPU_PREFETCH} -- Use data prefetch
|
|
|
-
|
|
|
+@item @code{STARPU_PREFETCH}
|
|
|
+@anchor{STARPU_PREFETCH}
|
|
|
This variable indicates whether data prefetching should be enabled (0 means
|
|
|
that it is disabled). If prefetching is enabled, when a task is scheduled to be
|
|
|
executed e.g. on a GPU, StarPU will request an asynchronous transfer in
|
|
@@ -391,58 +378,42 @@ advance, so that data is already present on the GPU when the task starts. As a
|
|
|
result, computation and data transfers are overlapped.
|
|
|
Note that prefetching is enabled by default in StarPU.
|
|
|
|
|
|
-@node STARPU_SCHED_ALPHA
|
|
|
-@subsubsection @code{STARPU_SCHED_ALPHA} -- Computation factor
|
|
|
-
|
|
|
+@item @code{STARPU_SCHED_ALPHA}
|
|
|
To estimate the cost of a task StarPU takes into account the estimated
|
|
|
computation time (obtained thanks to performance models). The alpha factor is
|
|
|
the coefficient to be applied to it before adding it to the communication part.
|
|
|
|
|
|
-@node STARPU_SCHED_BETA
|
|
|
-@subsubsection @code{STARPU_SCHED_BETA} -- Communication factor
|
|
|
-
|
|
|
+@item @code{STARPU_SCHED_BETA}
|
|
|
To estimate the cost of a task StarPU takes into account the estimated
|
|
|
data transfer time (obtained thanks to performance models). The beta factor is
|
|
|
the coefficient to be applied to it before adding it to the computation part.
|
|
|
|
|
|
+@end table
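
As an illustration of how the scheduling variables combine, the sketch below selects a performance-model-based policy and enables calibration. The values chosen for @code{STARPU_SCHED_ALPHA} and @code{STARPU_SCHED_BETA} are purely illustrative assumptions, not recommended settings, and the application name is hypothetical.

```shell
#!/bin/sh
# Sketch: select a scheduling policy and tune its cost model.

export STARPU_SCHED=dmda         # one of the policies that honours STARPU_CALIBRATE
export STARPU_CALIBRATE=1        # calibrate performance models during execution
export STARPU_BUS_CALIBRATE=1    # recalibrate the bus during initialization
export STARPU_PREFETCH=1         # keep data prefetching enabled (0 disables it)

# Illustrative weights only: alpha scales the estimated computation time,
# beta scales the estimated data transfer time.
export STARPU_SCHED_ALPHA=1.0
export STARPU_SCHED_BETA=2.0

# Hypothetical application name; replace it with your own program.
app="./my_starpu_app"
if [ -x "$app" ]; then
    "$app"
else
    echo "skipping launch: $app not found"
fi
```

Running a program with @code{STARPU_SCHED=help} instead prints the list of schedulers available in the installed build.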
|
|
|
+
|
|
|
@node Misc
|
|
|
@subsection Miscellaneous and debug
|
|
|
|
|
|
-@menu
|
|
|
-* STARPU_SILENT:: Disable verbose mode
|
|
|
-* STARPU_LOGFILENAME:: Select debug file name
|
|
|
-* STARPU_FXT_PREFIX:: FxT trace location
|
|
|
-* STARPU_LIMIT_GPU_MEM:: Restrict memory size on the GPUs
|
|
|
-* STARPU_GENERATE_TRACE:: Generate a Paje trace when StarPU is shut down
|
|
|
-@end menu
|
|
|
-
|
|
|
-@node STARPU_SILENT
|
|
|
-@subsubsection @code{STARPU_SILENT} -- Disable verbose mode
|
|
|
+@table @code
|
|
|
|
|
|
+@item @code{STARPU_SILENT}
|
|
|
This variable allows verbose mode to be disabled at runtime when StarPU
|
|
|
has been configured with the option @code{--enable-verbose}.
|
|
|
|
|
|
-@node STARPU_LOGFILENAME
|
|
|
-@subsubsection @code{STARPU_LOGFILENAME} -- Select debug file name
|
|
|
-
|
|
|
+@item @code{STARPU_LOGFILENAME}
|
|
|
This variable specifies the file in which the debugging output should be saved.
|
|
|
|
|
|
-@node STARPU_FXT_PREFIX
|
|
|
-@subsubsection @code{STARPU_FXT_PREFIX} -- FxT trace location
|
|
|
-
|
|
|
+@item @code{STARPU_FXT_PREFIX}
|
|
|
This variable specifies in which directory to save the trace generated if FxT is enabled. It needs to have a trailing '/' character.
|
|
|
|
|
|
-@node STARPU_LIMIT_GPU_MEM
|
|
|
-@subsubsection @code{STARPU_LIMIT_GPU_MEM} -- Restrict memory size on the GPUs
|
|
|
-
|
|
|
+@item @code{STARPU_LIMIT_GPU_MEM}
|
|
|
This variable specifies the maximum number of megabytes that should be
|
|
|
available to the application on each GPU. In case this value is smaller than
|
|
|
the size of the memory of a GPU, StarPU pre-allocates a buffer to waste memory
|
|
|
on the device. This variable is intended to be used for experimental purposes
|
|
|
as it emulates devices that have a limited amount of memory.
|
|
|
|
|
|
-@node STARPU_GENERATE_TRACE
|
|
|
-@subsubsection @code{STARPU_GENERATE_TRACE} -- Generate a Paje trace when StarPU is shut down
|
|
|
-
|
|
|
+@item @code{STARPU_GENERATE_TRACE}
|
|
|
When set to 1, this variable indicates that StarPU should automatically
|
|
|
generate a Paje trace when @code{starpu_shutdown} is called.
|
|
|
+
|
|
|
+@end table
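
The debugging variables above can be combined as in the following sketch. The directory and log file names are hypothetical, the 512 MB limit is an arbitrary illustrative value, and setting @code{STARPU_SILENT} to @code{1} is an assumption, since the text does not state which value disables verbose mode.

```shell
#!/bin/sh
# Sketch: set up tracing and debugging output for a StarPU run.

# Hypothetical trace directory; note the required trailing '/'.
trace_dir="${TMPDIR:-/tmp}/starpu_traces/"
mkdir -p "$trace_dir"

export STARPU_FXT_PREFIX="$trace_dir"               # where the FxT trace is saved
export STARPU_LOGFILENAME="${trace_dir}starpu.log"  # file receiving debugging output
export STARPU_GENERATE_TRACE=1                      # generate a Paje trace at shutdown
export STARPU_LIMIT_GPU_MEM=512                     # emulate GPUs with 512 MB each

# Assumption: a value of 1 disables verbose mode.
export STARPU_SILENT=1

echo "traces will be written under $STARPU_FXT_PREFIX"
```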
|