@c -*-texinfo-*-
@c This file is part of the StarPU Handbook.
@c Copyright (C) 2009--2011 Universit@'e de Bordeaux 1
@c Copyright (C) 2010, 2011, 2012 Centre National de la Recherche Scientifique
@c Copyright (C) 2011, 2012 Institut National de Recherche en Informatique et Automatique
@c See the file starpu.texi for copying conditions.

@menu
* Compilation configuration::
* Execution configuration through environment variables::
@end menu
@node Compilation configuration
@section Compilation configuration

The following arguments can be given to the @code{configure} script.

@menu
* Common configuration::
* Configuring workers::
* Extension configuration::
* Advanced configuration::
@end menu
@node Common configuration
@subsection Common configuration

@table @code
@item --enable-debug
Enable debugging messages.
@item --enable-fast
Disable assertion checks, which saves computation time.
@item --enable-verbose
Increase the verbosity of the debugging messages. This can be disabled
at runtime by setting the environment variable @code{STARPU_SILENT} to
any value.

@smallexample
% STARPU_SILENT=1 ./vector_scal
@end smallexample

@item --enable-coverage
Enable flags for the @code{gcov} coverage tool.
@item --with-hwloc
Specify that hwloc should be used by StarPU. hwloc should then be found by
means of the @code{pkg-config} tool.
@item --with-hwloc=@var{prefix}
Specify that hwloc should be used by StarPU. hwloc should then be found in the
directory specified by @var{prefix}.
@item --without-hwloc
Specify that hwloc should not be used by StarPU.
@end table
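As an illustration of these options, a debugging build that picks up hwloc
from a custom installation prefix (the path below is only an example) could
be configured as follows:

@smallexample
% ./configure --enable-debug --with-hwloc=/opt/hwloc
@end smallexample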
Additionally, the @command{configure} script recognizes many variables, which
can be listed by typing @code{./configure --help}. For example,
@code{./configure NVCCFLAGS="-arch sm_13"} adds a flag for the compilation of
CUDA kernels.
@node Configuring workers
@subsection Configuring workers

@table @code
@item --enable-maxcpus=@var{count}
Use at most @var{count} CPU cores. This information is then
available as the @code{STARPU_MAXCPUS} macro.
@item --disable-cpu
Disable the use of the machine's CPUs; only GPUs etc. will be used.
@item --enable-maxcudadev=@var{count}
Use at most @var{count} CUDA devices. This information is then
available as the @code{STARPU_MAXCUDADEVS} macro.
@item --disable-cuda
Disable the use of CUDA, even if a valid CUDA installation was detected.
@item --with-cuda-dir=@var{prefix}
Search for CUDA under @var{prefix}, which should notably contain
@file{include/cuda.h}.
@item --with-cuda-include-dir=@var{dir}
Search for CUDA headers under @var{dir}, which should
notably contain @file{cuda.h}. This defaults to @code{/include} appended to the
value given to @code{--with-cuda-dir}.
@item --with-cuda-lib-dir=@var{dir}
Search for CUDA libraries under @var{dir}, which should notably contain
the CUDA shared libraries---e.g., @file{libcuda.so}. This defaults to
@code{/lib} appended to the value given to @code{--with-cuda-dir}.
@item --disable-cuda-memcpy-peer
Explicitly disable peer transfers when using CUDA 4.0.
@item --enable-maxopencldev=@var{count}
Use at most @var{count} OpenCL devices. This information is then
available as the @code{STARPU_MAXOPENCLDEVS} macro.
@item --disable-opencl
Disable the use of OpenCL, even if the SDK is detected.
@item --with-opencl-dir=@var{prefix}
Search for an OpenCL implementation under @var{prefix}, which should
notably contain @file{include/CL/cl.h} (or @file{include/OpenCL/cl.h} on
Mac OS).
@item --with-opencl-include-dir=@var{dir}
Search for OpenCL headers under @var{dir}, which should notably contain
@file{CL/cl.h} (or @file{OpenCL/cl.h} on Mac OS). This defaults to
@code{/include} appended to the value given to @code{--with-opencl-dir}.
@item --with-opencl-lib-dir=@var{dir}
Search for an OpenCL library under @var{dir}, which should notably
contain the OpenCL shared libraries---e.g., @file{libOpenCL.so}. This defaults
to @code{/lib} appended to the value given to @code{--with-opencl-dir}.
@item --enable-gordon
Enable the use of the Gordon runtime for Cell SPUs.
@c TODO: rather default to enabled when detected
@item --with-gordon-dir=@var{prefix}
Search for the Gordon SDK under @var{prefix}.
@item --enable-maximplementations=@var{count}
Allow for at most @var{count} codelet implementations for the same
target device. This information is then available as the
@code{STARPU_MAXIMPLEMENTATIONS} macro.
@item --disable-asynchronous-copy
Disable asynchronous copies between CPU and GPU devices.
The AMD implementation of OpenCL is known to
fail when copying data asynchronously. When using this implementation,
it is therefore necessary to disable asynchronous data transfers.
@item --disable-asynchronous-cuda-copy
Disable asynchronous copies between CPU and CUDA devices.
@item --disable-asynchronous-opencl-copy
Disable asynchronous copies between CPU and OpenCL devices.
The AMD implementation of OpenCL is known to
fail when copying data asynchronously. When using this implementation,
it is therefore necessary to disable asynchronous data transfers.
@end table
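Combining some of the options above, a build that supports up to 4 CUDA
devices and does not use OpenCL at all (the count is illustrative) could be
configured as:

@smallexample
% ./configure --enable-maxcudadev=4 --disable-opencl
@end smallexample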
@node Extension configuration
@subsection Extension configuration

@table @code
@item --disable-socl
Disable the SOCL extension (@pxref{SOCL OpenCL Extensions}). By
default, it is enabled when an OpenCL implementation is found.
@item --disable-starpu-top
Disable the StarPU-Top interface (@pxref{StarPU-Top}). By default, it
is enabled when the required dependencies are found.
@item --disable-gcc-extensions
Disable the GCC plug-in (@pxref{C Extensions}). By default, it is
enabled when the GCC compiler provides plug-in support.
@item --with-mpicc=@var{path}
Use the @command{mpicc} compiler at @var{path}, for starpumpi
(@pxref{StarPU MPI support}).
@item --enable-comm-stats
@anchor{enable-comm-stats}
Enable communication statistics for starpumpi (@pxref{StarPU MPI
support}).
@end table
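For instance, starpumpi could be built with a specific MPI compiler and
communication statistics enabled (the @command{mpicc} path is only an
example) as follows:

@smallexample
% ./configure --with-mpicc=/usr/local/bin/mpicc --enable-comm-stats
@end smallexample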
@node Advanced configuration
@subsection Advanced configuration

@table @code
@item --enable-perf-debug
Enable performance debugging through gprof.
@item --enable-model-debug
Enable performance model debugging.
@item --enable-stats
@c see ../../src/datawizard/datastats.c
Enable gathering of memory transfer statistics.
@item --enable-maxbuffers
Define the maximum number of buffers that tasks will be able to take
as parameters, then available as the @code{STARPU_NMAXBUFS} macro.
@item --enable-allocation-cache
Enable the use of a data allocation cache, which avoids the cost of
repeated allocations with CUDA. Still experimental.
@item --enable-opengl-render
Enable the use of OpenGL for the rendering of some examples.
@c TODO: rather default to enabled when detected
@item --enable-blas-lib
Specify the BLAS library to be used by some of the examples. The
library has to be @code{atlas} or @code{goto}.
@item --disable-starpufft
Disable the build of libstarpufft, even if FFTW or cuFFT is available.
@item --with-magma=@var{prefix}
Search for MAGMA under @var{prefix}. @var{prefix} should notably
contain @file{include/magmablas.h}.
@item --with-fxt=@var{prefix}
Search for FxT under @var{prefix}.
@url{http://savannah.nongnu.org/projects/fkt, FxT} is used to generate
traces of scheduling events, which can then be rendered using ViTE
(@pxref{Off-line, off-line performance feedback}). @var{prefix} should
notably contain @file{include/fxt/fxt.h}.
@item --with-perf-model-dir=@var{dir}
Store performance models under @var{dir}, instead of the current user's
home directory.
@item --with-goto-dir=@var{prefix}
Search for GotoBLAS under @var{prefix}, which should notably contain
@file{libgoto.so} or @file{libgoto2.so}.
@item --with-atlas-dir=@var{prefix}
Search for ATLAS under @var{prefix}, which should notably contain
@file{include/cblas.h}.
@item --with-mkl-cflags=@var{cflags}
Use @var{cflags} to compile code that uses the MKL library.
@item --with-mkl-ldflags=@var{ldflags}
Use @var{ldflags} when linking code that uses the MKL library. Note
that the
@url{http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/,
MKL website} provides a script to determine the linking flags.
@item --disable-build-examples
Disable the build of examples.
@end table
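As an illustration, a build with FxT tracing support and a custom performance
model directory (both paths below are only examples) could be configured as:

@smallexample
% ./configure --with-fxt=$HOME/fxt --with-perf-model-dir=/tmp/starpu-models
@end smallexample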
@node Execution configuration through environment variables
@section Execution configuration through environment variables

@menu
* Workers::     Configuring workers
* Scheduling::  Configuring the Scheduling engine
* Extensions::
* Misc::        Miscellaneous and debug
@end menu
@node Workers
@subsection Configuring workers

@table @code
@item @code{STARPU_NCPU}
Specify the number of CPU workers (thus not including workers dedicated to
controlling accelerators). Note that by default, StarPU will not allocate
more CPU workers than there are physical CPUs, and that some CPUs are used
to control the accelerators.
@item @code{STARPU_NCUDA}
Specify the number of CUDA devices that StarPU can use. If
@code{STARPU_NCUDA} is lower than the number of physical devices, it is
possible to select which CUDA devices should be used by means of the
@code{STARPU_WORKERS_CUDAID} environment variable. By default, StarPU will
create as many CUDA workers as there are CUDA devices.
@item @code{STARPU_NOPENCL}
OpenCL equivalent of the @code{STARPU_NCUDA} environment variable.
@item @code{STARPU_NGORDON}
Specify the number of SPUs that StarPU can use.
@item @code{STARPU_WORKERS_NOBIND}
Setting it to non-zero will prevent StarPU from binding its threads to
CPUs. This is for instance useful when running the testsuite in parallel.
@item @code{STARPU_WORKERS_CPUID}
Passing an array of integers (starting from 0) in @code{STARPU_WORKERS_CPUID}
specifies on which logical CPU the different workers should be
bound. For instance, if @code{STARPU_WORKERS_CPUID = "0 1 4 5"}, the first
worker will be bound to logical CPU #0, the second CPU worker will be bound to
logical CPU #1, and so on. Note that the logical ordering of the CPUs is either
determined by the OS, or provided by the @code{hwloc} library in case it is
available.

Note that the first workers correspond to the CUDA workers, then come the
OpenCL and the SPU workers, and finally the CPU workers. For example, if
we have @code{STARPU_NCUDA=1}, @code{STARPU_NOPENCL=1}, @code{STARPU_NCPU=2}
and @code{STARPU_WORKERS_CPUID = "0 2 1 3"}, the CUDA device will be controlled
by logical CPU #0, the OpenCL device will be controlled by logical CPU #2, and
the logical CPUs #1 and #3 will be used by the CPU workers.

If the number of workers is larger than the array given in
@code{STARPU_WORKERS_CPUID}, the workers are bound to the logical CPUs in a
round-robin fashion: if @code{STARPU_WORKERS_CPUID = "0 1"}, the first and the
third (resp. second and fourth) workers will be put on CPU #0 (resp. CPU #1).

This variable is ignored if the @code{use_explicit_workers_bindid} flag of the
@code{starpu_conf} structure passed to @code{starpu_init} is set.
@item @code{STARPU_WORKERS_CUDAID}
Similarly to the @code{STARPU_WORKERS_CPUID} environment variable, it is
possible to select which CUDA devices should be used by StarPU. On a machine
equipped with 4 GPUs, setting @code{STARPU_WORKERS_CUDAID = "1 3"} and
@code{STARPU_NCUDA=2} specifies that 2 CUDA workers should be created, and that
they should use CUDA devices #1 and #3 (the logical ordering of the devices is
the one reported by CUDA).

This variable is ignored if the @code{use_explicit_workers_cuda_gpuid} flag of
the @code{starpu_conf} structure passed to @code{starpu_init} is set.
@item @code{STARPU_WORKERS_OPENCLID}
OpenCL equivalent of the @code{STARPU_WORKERS_CUDAID} environment variable.

This variable is ignored if the @code{use_explicit_workers_opencl_gpuid} flag
of the @code{starpu_conf} structure passed to @code{starpu_init} is set.
@item @code{STARPU_SINGLE_COMBINED_WORKER}
If set, StarPU will create combined workers whose sizes range from 1 to the
total number of CPU workers in the system; these workers will not be able to
run concurrently.
@item @code{SYNTHESIZE_ARITY_COMBINED_WORKER}
This variable lets the user decide how many elements are allowed between
combined workers created from hwloc information. For instance, in the case of
sockets with 6 cores without shared L2 caches, if
@code{SYNTHESIZE_ARITY_COMBINED_WORKER} is set to 6, no combined worker will
be synthesized beyond one for the socket and one per core. If it is set to 3,
3 intermediate combined workers will be synthesized, to divide the socket
cores into 3 chunks of 2 cores. If it is set to 2, 2 intermediate combined
workers will be synthesized, to divide the socket cores into 2 chunks of 3
cores, and then 3 additional combined workers will be synthesized, to divide
the former synthesized workers into a bunch of 2 cores and the remaining core
(for which no combined worker is synthesized since there is already a normal
worker for it).

The default, 2, thus makes StarPU tend to build binary trees of combined
workers.
@item @code{STARPU_DISABLE_ASYNCHRONOUS_COPY}
Disable asynchronous copies between CPU and GPU devices.
The AMD implementation of OpenCL is known to
fail when copying data asynchronously. When using this implementation,
it is therefore necessary to disable asynchronous data transfers.
@item @code{STARPU_DISABLE_ASYNCHRONOUS_CUDA_COPY}
Disable asynchronous copies between CPU and CUDA devices.
@item @code{STARPU_DISABLE_ASYNCHRONOUS_OPENCL_COPY}
Disable asynchronous copies between CPU and OpenCL devices.
The AMD implementation of OpenCL is known to
fail when copying data asynchronously. When using this implementation,
it is therefore necessary to disable asynchronous data transfers.
@item @code{STARPU_DISABLE_CUDA_GPU_GPU_DIRECT}
Disable direct CUDA transfers from GPU to GPU, and let CUDA copy through RAM
instead. This permits testing the performance effect of GPU-Direct.
@end table
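Following the @code{STARPU_WORKERS_CUDAID} example above, these variables can
simply be set on the command line; with a hypothetical application
@code{./my_app}, the following requests 2 CUDA workers running on CUDA
devices #1 and #3:

@smallexample
% STARPU_NCUDA=2 STARPU_WORKERS_CUDAID="1 3" ./my_app
@end smallexample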
@node Scheduling
@subsection Configuring the Scheduling engine

@table @code
@item @code{STARPU_SCHED}
Choose between the different scheduling policies proposed by StarPU: work
stealing, random, greedy, with performance models, etc.

Use @code{STARPU_SCHED=help} to get the list of available schedulers.
@item @code{STARPU_CALIBRATE}
If this variable is set to 1, the performance models are calibrated during
the execution. If it is set to 2, the previous values are dropped to restart
calibration from scratch. Setting this variable to 0 disables calibration;
this is the default behaviour.

Note: this currently only applies to the @code{dm}, @code{dmda} and
@code{heft} scheduling policies.
@item @code{STARPU_BUS_CALIBRATE}
If this variable is set to 1, the bus is recalibrated during initialization.
@item @code{STARPU_PREFETCH}
@anchor{STARPU_PREFETCH}
This variable indicates whether data prefetching should be enabled (0 means
that it is disabled). If prefetching is enabled, when a task is scheduled to
be executed e.g. on a GPU, StarPU will request an asynchronous transfer in
advance, so that data is already present on the GPU when the task starts. As
a result, computation and data transfers are overlapped.
Note that prefetching is enabled by default in StarPU.
@item @code{STARPU_SCHED_ALPHA}
To estimate the cost of a task, StarPU takes into account the estimated
computation time (obtained thanks to performance models). The alpha factor is
the coefficient to be applied to it before adding it to the communication part.
@item @code{STARPU_SCHED_BETA}
To estimate the cost of a task, StarPU takes into account the estimated
data transfer time (obtained thanks to performance models). The beta factor is
the coefficient to be applied to it before adding it to the computation part.
@end table
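For example, with a hypothetical application @code{./my_app}, one can first
list the available schedulers, then run with the @code{dmda}
performance-model-based policy while calibrating the models:

@smallexample
% STARPU_SCHED=help ./my_app
% STARPU_SCHED=dmda STARPU_CALIBRATE=1 ./my_app
@end smallexample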
@node Extensions
@subsection Extensions

@table @code
@item @code{SOCL_OCL_LIB_OPENCL}
The SOCL test suite is only run when the environment variable
@code{SOCL_OCL_LIB_OPENCL} is defined. It should contain the location
of the @file{libOpenCL.so} file of the OCL ICD implementation.
@item @code{STARPU_COMM_STATS}
Communication statistics for starpumpi (@pxref{StarPU MPI support})
will be enabled when the environment variable @code{STARPU_COMM_STATS}
is defined. The statistics can also be enabled by configuring StarPU
with the option @code{--enable-comm-stats} (@pxref{enable-comm-stats}).
@end table
@node Misc
@subsection Miscellaneous and debug

@table @code
@item @code{STARPU_SILENT}
This variable allows one to disable verbose mode at runtime when StarPU
has been configured with the option @code{--enable-verbose}.
@item @code{STARPU_LOGFILENAME}
This variable specifies the file in which the debugging output should be
saved.
@item @code{STARPU_FXT_PREFIX}
This variable specifies the directory in which to save the trace generated
when FxT is enabled. It needs to have a trailing '/' character.
@item @code{STARPU_LIMIT_GPU_MEM}
This variable specifies the maximum number of megabytes that should be
available to the application on each GPU. In case this value is smaller than
the size of the memory of a GPU, StarPU pre-allocates a buffer to waste memory
on the device. This variable is intended to be used for experimental purposes,
as it emulates devices that have a limited amount of memory.
@item @code{STARPU_GENERATE_TRACE}
When set to 1, this variable indicates that StarPU should automatically
generate a Paje trace when @code{starpu_shutdown} is called.
@end table
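Combining some of these variables, a hypothetical application @code{./my_app}
can be run quietly while generating a Paje trace on shutdown (assuming StarPU
was built with FxT support):

@smallexample
% STARPU_SILENT=1 STARPU_GENERATE_TRACE=1 STARPU_FXT_PREFIX=/tmp/ ./my_app
@end smallexample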