@c -*-texinfo-*-

@c This file is part of the StarPU Handbook.
@c Copyright (C) 2009--2011  Universit@'e de Bordeaux 1
@c Copyright (C) 2010, 2011, 2012  Centre National de la Recherche Scientifique
@c Copyright (C) 2011, 2012 Institut National de Recherche en Informatique et Automatique
@c See the file starpu.texi for copying conditions.

@menu
* Compilation configuration::   
* Execution configuration through environment variables::  
@end menu

@node Compilation configuration
@section Compilation configuration

The following arguments can be given to the @code{configure} script.

@menu
* Common configuration::
* Configuring workers::
* Extension configuration::
* Advanced configuration::
@end menu

@node Common configuration
@subsection Common configuration

@defvr {Configure option} --enable-debug
Enable debugging messages.
@end defvr

@defvr {Configure option} --enable-fast
Disable assertion checks, which saves computation time.
@end defvr

@defvr {Configure option} --enable-verbose
Increase the verbosity of the debugging messages.  This can be disabled
at runtime by setting the environment variable @code{STARPU_SILENT} to
any value.

@smallexample
% STARPU_SILENT=1 ./vector_scal
@end smallexample
@end defvr

@defvr {Configure option} --enable-coverage
Enable flags for the @code{gcov} coverage tool.
@end defvr

@defvr {Configure option} --enable-quick-check
Specify that tests and examples should be run on a smaller data set,
which allows a faster execution time.
@end defvr

@defvr {Configure option} --with-hwloc
Specify that hwloc should be used by StarPU. hwloc is then looked up
by means of the @code{pkg-config} tool.
@end defvr

@defvr {Configure option} --with-hwloc=@var{prefix}
Specify that hwloc should be used by StarPU. hwloc is then looked up in
the directory specified by @var{prefix}.
@end defvr

@defvr {Configure option} --without-hwloc
Specify that hwloc should not be used by StarPU.
@end defvr
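For example, assuming hwloc was installed under the hypothetical prefix
@file{/opt/hwloc} rather than in a location known to @code{pkg-config},
@code{configure} can be pointed at it explicitly:

@smallexample
% ./configure --with-hwloc=/opt/hwloc
@end smallexample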

@defvr {Configure option} --disable-build-doc
Disable the creation of the documentation. This is useful on a
machine which does not have the @code{makeinfo} and @code{tex} tools.
@end defvr

Additionally, the @command{configure} script recognizes many variables, which
can be listed by typing @code{./configure --help}. For example,
@code{./configure NVCCFLAGS="-arch sm_13"} adds a flag for the compilation of
CUDA kernels.

@node Configuring workers
@subsection Configuring workers

@defvr {Configure option} --enable-maxcpus=@var{count}
Use at most @var{count} CPU cores.  This information is then
available as the @code{STARPU_MAXCPUS} macro.
@end defvr

@defvr {Configure option} --disable-cpu
Disable the use of the machine's CPUs. Only GPUs and other accelerators
will be used.
@end defvr

@defvr {Configure option} --enable-maxcudadev=@var{count}
Use at most @var{count} CUDA devices.  This information is then
available as the @code{STARPU_MAXCUDADEVS} macro.
@end defvr

@defvr {Configure option} --disable-cuda
Disable the use of CUDA, even if a valid CUDA installation was detected.
@end defvr

@defvr {Configure option} --with-cuda-dir=@var{prefix}
Search for CUDA under @var{prefix}, which should notably contain
@file{include/cuda.h}.
@end defvr

@defvr {Configure option} --with-cuda-include-dir=@var{dir}
Search for CUDA headers under @var{dir}, which should
notably contain @code{cuda.h}. This defaults to @code{/include} appended to the
value given to @code{--with-cuda-dir}.
@end defvr

@defvr {Configure option} --with-cuda-lib-dir=@var{dir}
Search for CUDA libraries under @var{dir}, which should notably contain
the CUDA shared libraries---e.g., @file{libcuda.so}.  This defaults to
@code{/lib} appended to the value given to @code{--with-cuda-dir}.
@end defvr
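As a sketch, assume a CUDA toolkit installed under the hypothetical
directory @file{/usr/local/cuda}. The first command below relies on the
default @file{include} and @file{lib} subdirectories. The second spells
both directories out, which helps when the libraries live elsewhere, for
instance in a hypothetical @file{lib64} subdirectory:

@smallexample
% ./configure --with-cuda-dir=/usr/local/cuda
% ./configure --with-cuda-include-dir=/usr/local/cuda/include \
              --with-cuda-lib-dir=/usr/local/cuda/lib64
@end smallexample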

@defvr {Configure option} --disable-cuda-memcpy-peer
Explicitly disable peer transfers when using CUDA 4.0.
@end defvr

@defvr {Configure option} --enable-maxopencldev=@var{count}
Use at most @var{count} OpenCL devices.  This information is then
available as the @code{STARPU_MAXOPENCLDEVS} macro.
@end defvr

@defvr {Configure option} --disable-opencl
Disable the use of OpenCL, even if the SDK is detected.
@end defvr

@defvr {Configure option} --with-opencl-dir=@var{prefix}
Search for an OpenCL implementation under @var{prefix}, which should
notably contain @file{include/CL/cl.h} (or @file{include/OpenCL/cl.h} on
Mac OS).
@end defvr

@defvr {Configure option} --with-opencl-include-dir=@var{dir}
Search for OpenCL headers under @var{dir}, which should notably contain
@file{CL/cl.h} (or @file{OpenCL/cl.h} on Mac OS).  This defaults to
@code{/include} appended to the value given to @code{--with-opencl-dir}.
@end defvr

@defvr {Configure option} --with-opencl-lib-dir=@var{dir}
Search for an OpenCL library under @var{dir}, which should notably
contain the OpenCL shared libraries---e.g. @file{libOpenCL.so}. This defaults to
@code{/lib} appended to the value given to @code{--with-opencl-dir}.
@end defvr

@defvr {Configure option} --enable-opencl-simulator
Treat the provided OpenCL implementation as a simulator, i.e. use
the kernel duration returned by OpenCL profiling information as wallclock time
instead of the actual measured real time. This requires SimGrid support.
@end defvr

@defvr {Configure option} --enable-maximplementations=@var{count}
Allow for at most @var{count} codelet implementations for the same
target device.  This information is then available as the
@code{STARPU_MAXIMPLEMENTATIONS} macro.
@end defvr

@defvr {Configure option} --enable-max-sched-ctxs=@var{count}
Allow for at most @var{count} scheduling contexts.
This information is then available as the
@code{STARPU_NMAX_SCHED_CTXS} macro.
@end defvr

@defvr {Configure option} --disable-asynchronous-copy
Disable asynchronous copies between CPU and GPU devices.
The AMD implementation of OpenCL is known to
fail when copying data asynchronously. When using this implementation,
it is therefore necessary to disable asynchronous data transfers.
@end defvr

@defvr {Configure option} --disable-asynchronous-cuda-copy
Disable asynchronous copies between CPU and CUDA devices.
@end defvr

@defvr {Configure option} --disable-asynchronous-opencl-copy
Disable asynchronous copies between CPU and OpenCL devices.
The AMD implementation of OpenCL is known to
fail when copying data asynchronously. When using this implementation,
it is therefore necessary to disable asynchronous data transfers.
@end defvr

@node Extension configuration
@subsection Extension configuration

@defvr {Configure option} --disable-socl
Disable the SOCL extension (@pxref{SOCL OpenCL Extensions}).  By
default, it is enabled when an OpenCL implementation is found.
@end defvr

@defvr {Configure option} --disable-starpu-top
Disable the StarPU-Top interface (@pxref{StarPU-Top}).  By default, it
is enabled when the required dependencies are found.
@end defvr

@defvr {Configure option} --disable-gcc-extensions
Disable the GCC plug-in (@pxref{C Extensions}).  By default, it is
enabled when the GCC compiler provides plug-in support.
@end defvr

@defvr {Configure option} --with-mpicc=@var{path}
Use the @command{mpicc} compiler at @var{path}, for starpumpi
(@pxref{StarPU MPI support}).
@end defvr

@node Advanced configuration
@subsection Advanced configuration

@defvr {Configure option} --enable-perf-debug
Enable performance debugging through gprof.
@end defvr

@defvr {Configure option} --enable-model-debug
Enable performance model debugging.
@end defvr

@defvr {Configure option} --enable-stats
@c see ../../src/datawizard/datastats.c
Enable gathering of various data statistics (@pxref{Data statistics}).
@end defvr

@defvr {Configure option} --enable-maxbuffers
Define the maximum number of buffers that tasks will be able to take
as parameters, then available as the @code{STARPU_NMAXBUFS} macro.
@end defvr

@defvr {Configure option} --enable-allocation-cache
Enable the use of a data allocation cache, to avoid the cost of data
allocation with CUDA. Still experimental.
@end defvr

@defvr {Configure option} --enable-opengl-render
Enable the use of OpenGL for the rendering of some examples.
@c TODO: rather default to enabled when detected
@end defvr

@defvr {Configure option} --enable-blas-lib=@var{name}
Specify the BLAS library to be used by some of the examples. @var{name}
has to be @code{atlas} or @code{goto}.
@end defvr

@defvr {Configure option} --disable-starpufft
Disable the build of libstarpufft, even if fftw or cuFFT is available.
@end defvr

@defvr {Configure option} --with-magma=@var{prefix}
Search for MAGMA under @var{prefix}.  @var{prefix} should notably
contain @file{include/magmablas.h}.
@end defvr

@defvr {Configure option} --with-fxt=@var{prefix}
Search for FxT under @var{prefix}.
@url{http://savannah.nongnu.org/projects/fkt, FxT} is used to generate
traces of scheduling events, which can then be rendered using ViTE
(@pxref{Off-line, off-line performance feedback}).  @var{prefix} should
notably contain @file{include/fxt/fxt.h}.
@end defvr
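For instance, assuming FxT was installed under the hypothetical prefix
@file{$HOME/fxt}, tracing support could be enabled with:

@smallexample
% ./configure --with-fxt=$HOME/fxt
@end smallexample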

@defvr {Configure option} --with-perf-model-dir=@var{dir}
Store performance models under @var{dir}, instead of the current user's
home.
@end defvr

@defvr {Configure option} --with-goto-dir=@var{prefix}
Search for GotoBLAS under @var{prefix}, which should notably contain @file{libgoto.so} or @file{libgoto2.so}.
@end defvr

@defvr {Configure option} --with-atlas-dir=@var{prefix}
Search for ATLAS under @var{prefix}, which should notably contain
@file{include/cblas.h}.
@end defvr
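A possible invocation that selects ATLAS for the examples, assuming ATLAS
was installed under the hypothetical prefix @file{/opt/atlas}:

@smallexample
% ./configure --enable-blas-lib=atlas --with-atlas-dir=/opt/atlas
@end smallexample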

@defvr {Configure option} --with-mkl-cflags=@var{cflags}
Use @var{cflags} to compile code that uses the MKL library.
@end defvr

@defvr {Configure option} --with-mkl-ldflags=@var{ldflags}
Use @var{ldflags} when linking code that uses the MKL library.  Note
that the
@url{http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/,
MKL website} provides a script to determine the linking flags.
@end defvr

@defvr {Configure option} --disable-build-examples
Disable the build of examples.
@end defvr


@defvr {Configure option} --enable-sched-ctx-hypervisor
Enable the Scheduling Context Hypervisor plugin (@pxref{Scheduling Context Hypervisor}).
By default, it is disabled.
@end defvr

@defvr {Configure option} --enable-memory-stats
Enable memory statistics (@pxref{Memory feedback}).
@end defvr

@defvr {Configure option} --enable-simgrid
Enable simulation of execution in SimGrid, to allow easy experimentation with
various numbers of cores and GPUs, or amounts of memory, etc. Experimental.
@end defvr

@node Execution configuration through environment variables
@section Execution configuration through environment variables

@menu
* Workers::                     Configuring workers
* Scheduling::                  Configuring the Scheduling engine
* Extensions::
* Misc::                        Miscellaneous and debug
@end menu

@node Workers
@subsection Configuring workers

@defvr {Environment variable} STARPU_NCPU
Specify the number of CPU workers (thus not including workers dedicated to control accelerators). Note that by default, StarPU will not allocate
more CPU workers than there are physical CPUs, and that some CPUs are used to control
the accelerators.
@end defvr

@defvr {Environment variable} STARPU_NCPUS
This variable is deprecated. You should use @code{STARPU_NCPU}.
@end defvr

@defvr {Environment variable} STARPU_NCUDA
Specify the number of CUDA devices that StarPU can use. If
@code{STARPU_NCUDA} is lower than the number of physical devices, it is
possible to select which CUDA devices should be used by the means of the
@code{STARPU_WORKERS_CUDAID} environment variable. By default, StarPU will
create as many CUDA workers as there are CUDA devices.
@end defvr

@defvr {Environment variable} STARPU_NOPENCL
OpenCL equivalent of the @code{STARPU_NCUDA} environment variable.
@end defvr

@defvr {Environment variable} STARPU_OPENCL_ON_CPUS
By default, the OpenCL driver only enables GPU and accelerator
devices. By setting the environment variable
@code{STARPU_OPENCL_ON_CPUS} to 1, the OpenCL driver will also enable
CPU devices.
@end defvr

@defvr {Environment variable} STARPU_WORKERS_NOBIND
Setting it to a non-zero value prevents StarPU from binding its threads to
CPUs. This is useful, for instance, when running the testsuite in parallel.
@end defvr

@defvr {Environment variable} STARPU_WORKERS_CPUID
Setting @code{STARPU_WORKERS_CPUID} to a space-separated list of integers
(starting from 0) specifies on which logical CPU the different workers should
be bound. For instance, if @code{STARPU_WORKERS_CPUID="0 1 4 5"}, the first
worker will be bound to logical CPU #0, the second worker will be bound to
logical CPU #1 and so on.  Note that the logical ordering of the CPUs is either
determined by the OS, or provided by the @code{hwloc} library in case it is
available.

Note that the first workers correspond to the CUDA workers, then come the
OpenCL workers, and finally the CPU workers. For example, with
@code{STARPU_NCUDA=1}, @code{STARPU_NOPENCL=1}, @code{STARPU_NCPU=2}
and @code{STARPU_WORKERS_CPUID="0 2 1 3"}, the CUDA device will be controlled
by logical CPU #0, the OpenCL device will be controlled by logical CPU #2, and
the logical CPUs #1 and #3 will be used by the CPU workers.

If there are more workers than entries in
@code{STARPU_WORKERS_CPUID}, the workers are bound to the logical CPUs in a
round-robin fashion: if @code{STARPU_WORKERS_CPUID="0 1"}, the first and the
third (resp. second and fourth) workers will be put on CPU #0 (resp. CPU #1).

This variable is ignored if the @code{use_explicit_workers_bindid} flag of the
@code{starpu_conf} structure passed to @code{starpu_init} is set.
@end defvr
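The binding example above, with one CUDA worker, one OpenCL worker and two
CPU workers, could be launched as follows. Here @code{vector_scal} stands
for any StarPU application. The list is quoted and there is no space around
@samp{=}, as the shell requires:

@smallexample
% STARPU_NCUDA=1 STARPU_NOPENCL=1 STARPU_NCPU=2 \
  STARPU_WORKERS_CPUID="0 2 1 3" ./vector_scal
@end smallexample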

@defvr {Environment variable} STARPU_WORKERS_CUDAID
Similarly to the @code{STARPU_WORKERS_CPUID} environment variable, it is
possible to select which CUDA devices should be used by StarPU. On a machine
equipped with 4 GPUs, setting @code{STARPU_WORKERS_CUDAID="1 3"} and
@code{STARPU_NCUDA=2} specifies that 2 CUDA workers should be created, and that
they should use CUDA devices #1 and #3 (the logical ordering of the devices is
the one reported by CUDA).

This variable is ignored if the @code{use_explicit_workers_cuda_gpuid} flag of
the @code{starpu_conf} structure passed to @code{starpu_init} is set.
@end defvr
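On the 4-GPU machine described above, this selection could be requested as
follows. Again, @code{vector_scal} stands for any StarPU application:

@smallexample
% STARPU_NCUDA=2 STARPU_WORKERS_CUDAID="1 3" ./vector_scal
@end smallexample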

@defvr {Environment variable} STARPU_WORKERS_OPENCLID
OpenCL equivalent of the @code{STARPU_WORKERS_CUDAID} environment variable.

This variable is ignored if the @code{use_explicit_workers_opencl_gpuid} flag of
the @code{starpu_conf} structure passed to @code{starpu_init} is set.
@end defvr

@defvr {Environment variable} STARPU_SINGLE_COMBINED_WORKER
If set, StarPU will create several workers which won't be able to work
concurrently. It will create combined workers whose size ranges from 1 to
the total number of CPU workers in the system.
@end defvr

@defvr {Environment variable} STARPU_SYNTHESIZE_ARITY_COMBINED_WORKER
Let the user decide how many elements are allowed between combined workers
created from hwloc information, that is, the arity of the tree of combined
workers. For instance, in the case of sockets with 6
cores without shared L2 caches, if @code{STARPU_SYNTHESIZE_ARITY_COMBINED_WORKER} is
set to 6, no combined worker will be synthesized beyond one for the socket
and one per core. If it is set to 3, 3 intermediate combined workers will be
synthesized, to divide the socket cores into 3 chunks of 2 cores. If it is set to
2, 2 intermediate combined workers will be synthesized, to divide the socket
cores into 2 chunks of 3 cores, and then 3 additional combined workers will be
synthesized, to divide the former synthesized workers into a bunch of 2 cores,
and the remaining core (for which no combined worker is synthesized since there
is already a normal worker for it).

The default, 2, thus makes StarPU tend to build binary trees of combined
workers.
@end defvr
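As an illustration of the arity-3 case described above, the following sketch
shows the resulting hierarchy for one 6-core socket. It is requested with
@code{STARPU_SYNTHESIZE_ARITY_COMBINED_WORKER=3}. The core numbering within
each chunk is an assumption made for readability. The actual grouping is
derived from the hwloc information:

@smallexample
socket: combined worker over cores 0-5
  +-- combined worker over cores 0-1
  +-- combined worker over cores 2-3
  +-- combined worker over cores 4-5
(each core also keeps its normal CPU worker)
@end smallexample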

@defvr {Environment variable} STARPU_DISABLE_ASYNCHRONOUS_COPY
Disable asynchronous copies between CPU and GPU devices.
The AMD implementation of OpenCL is known to
fail when copying data asynchronously. When using this implementation,
it is therefore necessary to disable asynchronous data transfers.
@end defvr

@defvr {Environment variable} STARPU_DISABLE_ASYNCHRONOUS_CUDA_COPY
Disable asynchronous copies between CPU and CUDA devices.
@end defvr

@defvr {Environment variable} STARPU_DISABLE_ASYNCHRONOUS_OPENCL_COPY
Disable asynchronous copies between CPU and OpenCL devices.
The AMD implementation of OpenCL is known to
fail when copying data asynchronously. When using this implementation,
it is therefore necessary to disable asynchronous data transfers.
@end defvr
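For instance, to work around the AMD OpenCL issue mentioned above without
rebuilding StarPU, asynchronous OpenCL transfers could be disabled for a
single run. This sketch assumes a non-zero value enables the setting, and
@code{vector_scal} stands for any StarPU application:

@smallexample
% STARPU_DISABLE_ASYNCHRONOUS_OPENCL_COPY=1 ./vector_scal
@end smallexample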

@defvr {Environment variable} STARPU_DISABLE_CUDA_GPU_GPU_DIRECT
Disable direct CUDA transfers from GPU to GPU, and let CUDA copy through RAM
instead. This makes it possible to measure the performance impact of GPU-Direct.
@end defvr

@node Scheduling
@subsection Configuring the Scheduling engine

@defvr {Environment variable} STARPU_SCHED
Choose between the different scheduling policies proposed by StarPU: random,
work stealing, greedy, with performance models, etc.

Use @code{STARPU_SCHED=help} to get the list of available schedulers.
@end defvr
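For example, the first command below lists the available policies. The
second runs an application with the @code{dmda} policy. In both cases,
@code{vector_scal} stands for any StarPU application:

@smallexample
% STARPU_SCHED=help ./vector_scal
% STARPU_SCHED=dmda ./vector_scal
@end smallexample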

@defvr {Environment variable} STARPU_CALIBRATE
If this variable is set to 1, the performance models are calibrated during
the execution. If it is set to 2, the previous values are dropped to restart
calibration from scratch. Setting this variable to 0 disables calibration,
which is the default behaviour.

Note: this currently only applies to @code{dm} and @code{dmda} scheduling policies.
@end defvr
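A possible workflow, sketched under the assumption that the models recorded
during a calibration run are reused by later runs, is to calibrate once
under @code{dmda} and then run with the default setting:

@smallexample
% STARPU_SCHED=dmda STARPU_CALIBRATE=1 ./vector_scal
% STARPU_SCHED=dmda ./vector_scal
@end smallexample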

@defvr {Environment variable} STARPU_BUS_CALIBRATE
If this variable is set to 1, the bus is recalibrated during initialization.
@end defvr

@defvr {Environment variable} STARPU_PREFETCH
@anchor{STARPU_PREFETCH}
This variable indicates whether data prefetching should be enabled (0 means
that it is disabled). If prefetching is enabled, when a task is scheduled to be
executed e.g. on a GPU, StarPU will request an asynchronous transfer in
advance, so that data is already present on the GPU when the task starts. As a
result, computation and data transfers are overlapped.
Note that prefetching is enabled by default in StarPU.
@end defvr

@defvr {Environment variable} STARPU_SCHED_ALPHA
To estimate the cost of a task StarPU takes into account the estimated
computation time (obtained thanks to performance models). The alpha factor is
the coefficient to be applied to it before adding it to the communication part.
@end defvr

@defvr {Environment variable} STARPU_SCHED_BETA
To estimate the cost of a task StarPU takes into account the estimated
data transfer time (obtained thanks to performance models). The beta factor is
the coefficient to be applied to it before adding it to the computation part.
@end defvr
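Together, the two factors weight the two terms of the estimated cost of
running a task on a given worker. The following is a simplified sketch.
The actual policies may combine further terms, such as the power penalty
set by @code{STARPU_SCHED_GAMMA}:

@smallexample
cost = STARPU_SCHED_ALPHA * estimated_computation_time
     + STARPU_SCHED_BETA  * estimated_transfer_time
@end smallexample

Raising @code{STARPU_SCHED_BETA} relative to @code{STARPU_SCHED_ALPHA}
therefore gives data transfers more weight in the scheduling decision.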

@defvr {Environment variable} STARPU_SCHED_GAMMA
Define the execution time penalty of a joule (@pxref{Power-based scheduling}).
@end defvr

@defvr {Environment variable} STARPU_IDLE_POWER
Define the idle power of the machine (@pxref{Power-based scheduling}).
@end defvr

@defvr {Environment variable} STARPU_PROFILING
Enable on-line performance monitoring (@pxref{Enabling on-line performance monitoring}).
@end defvr

@node Extensions
@subsection Extensions

@defvr {Environment variable} SOCL_OCL_LIB_OPENCL
The SOCL test suite is only run when the environment variable
@code{SOCL_OCL_LIB_OPENCL} is defined. It should contain the location
of the @file{libOpenCL.so} file of the OCL ICD implementation.
@end defvr

@defvr {Environment variable} STARPU_COMM_STATS
@anchor{STARPU_COMM_STATS}
Communication statistics for starpumpi (@pxref{StarPU MPI support})
will be enabled when the environment variable @code{STARPU_COMM_STATS}
is set to a value other than 0.
@end defvr

@defvr {Environment variable} STARPU_MPI_CACHE
@anchor{STARPU_MPI_CACHE}
Communication cache for starpumpi (@pxref{StarPU MPI support}) will be
disabled when the environment variable @code{STARPU_MPI_CACHE} is set
to 0. It is enabled by default or for any other values of the variable
@code{STARPU_MPI_CACHE}.
@end defvr

@node Misc
@subsection Miscellaneous and debug

@defvr {Environment variable} STARPU_OPENCL_PROGRAM_DIR
@anchor{STARPU_OPENCL_PROGRAM_DIR}
This specifies the directory where the OpenCL codelet source files are
located. The function @ref{starpu_opencl_load_program_source} looks
for the codelet in the current directory, in the directory specified
by the environment variable @code{STARPU_OPENCL_PROGRAM_DIR}, in the
directory @code{share/starpu/opencl} of the installation directory of
StarPU, and finally in the source directory of StarPU.
@end defvr

@defvr {Environment variable} STARPU_SILENT
This variable disables verbose mode at runtime when StarPU
has been configured with the option @code{--enable-verbose}. It also
disables the display of StarPU information and warning messages.
@end defvr

@defvr {Environment variable} STARPU_LOGFILENAME
This variable specifies the file to which the debugging output should be saved.
@end defvr

@defvr {Environment variable} STARPU_FXT_PREFIX
This variable specifies the directory in which to save the trace generated when
FxT is enabled. The value must end with a @samp{/} character.
@end defvr

@defvr {Environment variable} STARPU_LIMIT_GPU_MEM
This variable specifies the maximum number of megabytes that should be
available to the application on each GPU. In case this value is smaller than
the size of the memory of a GPU, StarPU pre-allocates a buffer to waste memory
on the device. This variable is intended to be used for experimental purposes
as it emulates devices that have a limited amount of memory.
@end defvr
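For instance, to emulate GPUs limited to 1024 megabytes of memory, one might
run the following, where @code{vector_scal} stands for any StarPU
application:

@smallexample
% STARPU_LIMIT_GPU_MEM=1024 ./vector_scal
@end smallexample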

@defvr {Environment variable} STARPU_GENERATE_TRACE
When set to 1, this variable indicates that StarPU should automatically
generate a Paje trace when @code{starpu_shutdown()} is called.
@end defvr
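The following sketch combines this variable with @code{STARPU_FXT_PREFIX}.
It assumes StarPU was built with FxT support (@code{--with-fxt}) and uses a
hypothetical @file{/tmp/traces/} directory. Note the trailing @samp{/}:

@smallexample
% STARPU_FXT_PREFIX=/tmp/traces/ STARPU_GENERATE_TRACE=1 ./vector_scal
@end smallexample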

@defvr {Environment variable} STARPU_MEMORY_STATS
When set to 0, disable the display of memory statistics on data which
have not been unregistered at the end of the execution (@pxref{Memory
feedback}).
@end defvr

@defvr {Environment variable} STARPU_BUS_STATS
When defined, statistics about data transfers will be displayed when calling
@code{starpu_shutdown()} (@pxref{Profiling}).
@end defvr

@defvr {Environment variable} STARPU_WORKER_STATS
When defined, statistics about the workers will be displayed when calling
@code{starpu_shutdown()} (@pxref{Profiling}). When combined with the
environment variable @code{STARPU_PROFILING}, it displays the power
consumption (@pxref{Power-based scheduling}).
@end defvr
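For example, the following displays per-worker statistics, including the
power consumption, when @code{starpu_shutdown()} is called at the end of
the run:

@smallexample
% STARPU_PROFILING=1 STARPU_WORKER_STATS=1 ./vector_scal
@end smallexample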

@defvr {Environment variable} STARPU_STATS
When set to 0, data statistics will not be displayed at the
end of the execution of an application (@pxref{Data statistics}).
@end defvr