@c -*-texinfo-*-
@c This file is part of the StarPU Handbook.
@c Copyright (C) 2009--2011  Universit@'e de Bordeaux 1
@c Copyright (C) 2010, 2011, 2012  Centre National de la Recherche Scientifique
@c Copyright (C) 2011, 2012 Institut National de Recherche en Informatique et Automatique
@c See the file starpu.texi for copying conditions.
 
@menu
* Compilation configuration::
* Execution configuration through environment variables::
@end menu

@node Compilation configuration
@section Compilation configuration

The following arguments can be given to the @code{configure} script.
 
@menu
* Common configuration::
* Configuring workers::
* Extension configuration::
* Advanced configuration::
@end menu

@node Common configuration
@subsection Common configuration

@table @code

@item --enable-debug
Enable debugging messages.

@item --enable-fast
Disable assertion checks, which saves computation time.

@item --enable-verbose
Increase the verbosity of the debugging messages.  This can be disabled
at runtime by setting the environment variable @code{STARPU_SILENT} to
any value.

@smallexample
% STARPU_SILENT=1 ./vector_scal
@end smallexample

@item --enable-coverage
Enable flags for the @code{gcov} coverage tool.

@item --with-hwloc
Specify that hwloc should be used by StarPU. hwloc should be found by
means of the @code{pkg-config} tool.

@item --with-hwloc=@var{prefix}
Specify that hwloc should be used by StarPU. hwloc should be found in the
directory specified by @var{prefix}.

@item --without-hwloc
Specify that hwloc should not be used by StarPU.

@end table

Additionally, the @command{configure} script recognizes many variables, which
can be listed by typing @code{./configure --help}. For example,
@code{./configure NVCCFLAGS="-arch sm_13"} adds a flag for the compilation of
CUDA kernels.
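For instance, a typical invocation combining such options might look as
follows (the hwloc installation prefix shown is hypothetical):

@smallexample
% ./configure --with-hwloc=/opt/hwloc --enable-verbose NVCCFLAGS="-arch sm_13"
@end smallexample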
 
@node Configuring workers
@subsection Configuring workers

@table @code

@item --enable-maxcpus=@var{count}
Use at most @var{count} CPU cores.  This information is then
available as the @code{STARPU_MAXCPUS} macro.

@item --disable-cpu
Disable the use of the CPUs of the machine. Only GPUs etc. will be used.

@item --enable-maxcudadev=@var{count}
Use at most @var{count} CUDA devices.  This information is then
available as the @code{STARPU_MAXCUDADEVS} macro.

@item --disable-cuda
Disable the use of CUDA, even if a valid CUDA installation was detected.

@item --with-cuda-dir=@var{prefix}
Search for CUDA under @var{prefix}, which should notably contain
@file{include/cuda.h}.

@item --with-cuda-include-dir=@var{dir}
Search for CUDA headers under @var{dir}, which should notably contain
@file{cuda.h}. This defaults to @code{/include} appended to the
value given to @code{--with-cuda-dir}.

@item --with-cuda-lib-dir=@var{dir}
Search for CUDA libraries under @var{dir}, which should notably contain
the CUDA shared libraries---e.g., @file{libcuda.so}.  This defaults to
@code{/lib} appended to the value given to @code{--with-cuda-dir}.

@item --disable-cuda-memcpy-peer
Explicitly disable peer transfers when using CUDA 4.0.

@item --enable-maxopencldev=@var{count}
Use at most @var{count} OpenCL devices.  This information is then
available as the @code{STARPU_MAXOPENCLDEVS} macro.

@item --disable-opencl
Disable the use of OpenCL, even if the SDK is detected.

@item --with-opencl-dir=@var{prefix}
Search for an OpenCL implementation under @var{prefix}, which should
notably contain @file{include/CL/cl.h} (or @file{include/OpenCL/cl.h} on
Mac OS).

@item --with-opencl-include-dir=@var{dir}
Search for OpenCL headers under @var{dir}, which should notably contain
@file{CL/cl.h} (or @file{OpenCL/cl.h} on Mac OS).  This defaults to
@code{/include} appended to the value given to @code{--with-opencl-dir}.

@item --with-opencl-lib-dir=@var{dir}
Search for an OpenCL library under @var{dir}, which should notably
contain the OpenCL shared libraries---e.g. @file{libOpenCL.so}. This defaults to
@code{/lib} appended to the value given to @code{--with-opencl-dir}.

@item --enable-gordon
Enable the use of the Gordon runtime for Cell SPUs.
@c TODO: rather default to enabled when detected

@item --with-gordon-dir=@var{prefix}
Search for the Gordon SDK under @var{prefix}.

@item --enable-maximplementations=@var{count}
Allow for at most @var{count} codelet implementations for the same
target device.  This information is then available as the
@code{STARPU_MAXIMPLEMENTATIONS} macro.

@item --disable-asynchronous-copy
Disable asynchronous copies between CPU and GPU devices.
The AMD implementation of OpenCL is known to
fail when copying data asynchronously. When using this implementation,
it is therefore necessary to disable asynchronous data transfers.

@item --disable-asynchronous-cuda-copy
Disable asynchronous copies between CPU and CUDA devices.

@item --disable-asynchronous-opencl-copy
Disable asynchronous copies between CPU and OpenCL devices.
The AMD implementation of OpenCL is known to
fail when copying data asynchronously. When using this implementation,
it is therefore necessary to disable asynchronous data transfers.

@end table
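As an illustration, configuring a build that looks for CUDA in a
non-standard location and caps the number of CPU workers could be done as
follows (the path and count shown are hypothetical):

@smallexample
% ./configure --with-cuda-dir=/usr/local/cuda --enable-maxcpus=12
@end smallexample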
 
@node Extension configuration
@subsection Extension configuration

@table @code

@item --disable-socl
Disable the SOCL extension (@pxref{SOCL OpenCL Extensions}).  By
default, it is enabled when an OpenCL implementation is found.

@item --disable-starpu-top
Disable the StarPU-Top interface (@pxref{StarPU-Top}).  By default, it
is enabled when the required dependencies are found.

@item --disable-gcc-extensions
Disable the GCC plug-in (@pxref{C Extensions}).  By default, it is
enabled when the GCC compiler provides plug-in support.

@item --with-mpicc=@var{path}
Use the @command{mpicc} compiler at @var{path}, for starpumpi
(@pxref{StarPU MPI support}).

@item --enable-comm-stats
Enable communication statistics for starpumpi (@pxref{StarPU MPI
support}).

@end table
 
@node Advanced configuration
@subsection Advanced configuration

@table @code

@item --enable-perf-debug
Enable performance debugging through gprof.

@item --enable-model-debug
Enable performance model debugging.

@item --enable-stats
@c see ../../src/datawizard/datastats.c
Enable gathering of memory transfer statistics.

@item --enable-maxbuffers
Define the maximum number of buffers that tasks will be able to take
as parameters, then available as the @code{STARPU_NMAXBUFS} macro.

@item --enable-allocation-cache
Enable the use of a data allocation cache, to avoid the cost of memory
allocations with CUDA. Still experimental.

@item --enable-opengl-render
Enable the use of OpenGL for the rendering of some examples.
@c TODO: rather default to enabled when detected

@item --enable-blas-lib
Specify the BLAS library to be used by some of the examples. The
library has to be @code{atlas} or @code{goto}.

@item --disable-starpufft
Disable the build of libstarpufft, even if fftw or cuFFT is available.

@item --with-magma=@var{prefix}
Search for MAGMA under @var{prefix}.  @var{prefix} should notably
contain @file{include/magmablas.h}.

@item --with-fxt=@var{prefix}
Search for FxT under @var{prefix}.
@url{http://savannah.nongnu.org/projects/fkt, FxT} is used to generate
traces of scheduling events, which can then be rendered using ViTE
(@pxref{Off-line, off-line performance feedback}).  @var{prefix} should
notably contain @file{include/fxt/fxt.h}.

@item --with-perf-model-dir=@var{dir}
Store performance models under @var{dir}, instead of the current user's
home directory.

@item --with-goto-dir=@var{prefix}
Search for GotoBLAS under @var{prefix}, which should notably contain
@file{libgoto.so} or @file{libgoto2.so}.

@item --with-atlas-dir=@var{prefix}
Search for ATLAS under @var{prefix}, which should notably contain
@file{include/cblas.h}.

@item --with-mkl-cflags=@var{cflags}
Use @var{cflags} to compile code that uses the MKL library.

@item --with-mkl-ldflags=@var{ldflags}
Use @var{ldflags} when linking code that uses the MKL library.  Note
that the
@url{http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/,
MKL website} provides a script to determine the linking flags.

@item --disable-build-examples
Disable the build of examples.

@end table
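For instance, enabling FxT tracing while redirecting performance models to
a shared directory could look as follows (both paths shown are
hypothetical):

@smallexample
% ./configure --with-fxt=/opt/fxt --with-perf-model-dir=/tmp/perfmodels
@end smallexample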
 
@node Execution configuration through environment variables
@section Execution configuration through environment variables

@menu
* Workers::                     Configuring workers
* Scheduling::                  Configuring the Scheduling engine
* Extensions::
* Misc::                        Miscellaneous and debug
@end menu
 
@node Workers
@subsection Configuring workers

@table @code

@item @code{STARPU_NCPU}
Specify the number of CPU workers (thus not including workers dedicated to
controlling accelerators). Note that by default, StarPU will not allocate
more CPU workers than there are physical CPUs, and that some CPUs are used
to control the accelerators.

@item @code{STARPU_NCUDA}
Specify the number of CUDA devices that StarPU can use. If
@code{STARPU_NCUDA} is lower than the number of physical devices, it is
possible to select which CUDA devices should be used by means of the
@code{STARPU_WORKERS_CUDAID} environment variable. By default, StarPU will
create as many CUDA workers as there are CUDA devices.

@item @code{STARPU_NOPENCL}
OpenCL equivalent of the @code{STARPU_NCUDA} environment variable.

@item @code{STARPU_NGORDON}
Specify the number of SPUs that StarPU can use.

@item @code{STARPU_WORKERS_NOBIND}
Setting it to non-zero will prevent StarPU from binding its threads to
CPUs. This is for instance useful when running the testsuite in parallel.

@item @code{STARPU_WORKERS_CPUID}
Passing an array of integers (starting from 0) in @code{STARPU_WORKERS_CPUID}
specifies on which logical CPU the different workers should be
bound. For instance, if @code{STARPU_WORKERS_CPUID = "0 1 4 5"}, the first
worker will be bound to logical CPU #0, the second CPU worker will be bound to
logical CPU #1 and so on.  Note that the logical ordering of the CPUs is either
determined by the OS, or provided by the @code{hwloc} library in case it is
available.

Note that the first workers correspond to the CUDA workers, then come the
OpenCL and SPU workers, and finally the CPU workers. For example, if
we have @code{STARPU_NCUDA=1}, @code{STARPU_NOPENCL=1}, @code{STARPU_NCPU=2}
and @code{STARPU_WORKERS_CPUID = "0 2 1 3"}, the CUDA device will be controlled
by logical CPU #0, the OpenCL device will be controlled by logical CPU #2, and
the logical CPUs #1 and #3 will be used by the CPU workers.

If the number of workers is larger than the array given in
@code{STARPU_WORKERS_CPUID}, the workers are bound to the logical CPUs in a
round-robin fashion: if @code{STARPU_WORKERS_CPUID = "0 1"}, the first and the
third (resp. second and fourth) workers will be put on CPU #0 (resp. CPU #1).

This variable is ignored if the @code{use_explicit_workers_bindid} flag of the
@code{starpu_conf} structure passed to @code{starpu_init} is set.

@item @code{STARPU_WORKERS_CUDAID}
Similarly to the @code{STARPU_WORKERS_CPUID} environment variable, it is
possible to select which CUDA devices should be used by StarPU. On a machine
equipped with 4 GPUs, setting @code{STARPU_WORKERS_CUDAID = "1 3"} and
@code{STARPU_NCUDA=2} specifies that 2 CUDA workers should be created, and that
they should use CUDA devices #1 and #3 (the logical ordering of the devices is
the one reported by CUDA).

This variable is ignored if the @code{use_explicit_workers_cuda_gpuid} flag of
the @code{starpu_conf} structure passed to @code{starpu_init} is set.

@item @code{STARPU_WORKERS_OPENCLID}
OpenCL equivalent of the @code{STARPU_WORKERS_CUDAID} environment variable.

This variable is ignored if the @code{use_explicit_workers_opencl_gpuid} flag of
the @code{starpu_conf} structure passed to @code{starpu_init} is set.

@item @code{STARPU_SINGLE_COMBINED_WORKER}
If set, StarPU will create several workers which will not be able to work
concurrently. It will create combined workers whose size goes from 1 to the
total number of CPU workers in the system.

@item @code{SYNTHESIZE_ARITY_COMBINED_WORKER}
Let the user decide how many elements are allowed between combined workers
created from hwloc information. For instance, in the case of sockets with 6
cores without shared L2 caches, if @code{SYNTHESIZE_ARITY_COMBINED_WORKER} is
set to 6, no combined worker will be synthesized beyond one for the socket
and one per core. If it is set to 3, 3 intermediate combined workers will be
synthesized, to divide the socket cores into 3 chunks of 2 cores. If it is set
to 2, 2 intermediate combined workers will be synthesized, to divide the socket
cores into 2 chunks of 3 cores, and then 3 additional combined workers will be
synthesized, to divide the former synthesized workers into groups of 2 cores,
plus the remaining core (for which no combined worker is synthesized since
there is already a normal worker for it).

The default, 2, thus makes StarPU tend to build binary trees of combined
workers.

@item @code{STARPU_DISABLE_ASYNCHRONOUS_COPY}
Disable asynchronous copies between CPU and GPU devices.
The AMD implementation of OpenCL is known to
fail when copying data asynchronously. When using this implementation,
it is therefore necessary to disable asynchronous data transfers.

@item @code{STARPU_DISABLE_ASYNCHRONOUS_CUDA_COPY}
Disable asynchronous copies between CPU and CUDA devices.

@item @code{STARPU_DISABLE_ASYNCHRONOUS_OPENCL_COPY}
Disable asynchronous copies between CPU and OpenCL devices.
The AMD implementation of OpenCL is known to
fail when copying data asynchronously. When using this implementation,
it is therefore necessary to disable asynchronous data transfers.

@item @code{STARPU_DISABLE_CUDA_GPU_GPU_DIRECT}
Disable direct CUDA transfers from GPU to GPU, and let CUDA copy through RAM
instead. This makes it possible to test the performance impact of GPU-Direct.

@end table
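For example, the binding scheme described for @code{STARPU_WORKERS_CPUID}
above can be reproduced on the command line as follows (the executable
name @file{./my_app} is hypothetical):

@smallexample
% STARPU_NCUDA=1 STARPU_NOPENCL=1 STARPU_NCPU=2 \
  STARPU_WORKERS_CPUID="0 2 1 3" ./my_app
@end smallexample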
 
@node Scheduling
@subsection Configuring the Scheduling engine

@table @code

@item @code{STARPU_SCHED}
Choose between the different scheduling policies proposed by StarPU: work
stealing, random, greedy, with performance models, etc.

Use @code{STARPU_SCHED=help} to get the list of available schedulers.

@item @code{STARPU_CALIBRATE}
If this variable is set to 1, the performance models are calibrated during
the execution. If it is set to 2, the previous values are dropped to restart
calibration from scratch. Setting this variable to 0 disables calibration;
this is the default behaviour.

Note: this currently only applies to the @code{dm}, @code{dmda} and
@code{heft} scheduling policies.

@item @code{STARPU_BUS_CALIBRATE}
If this variable is set to 1, the bus is recalibrated during initialization.

@item @code{STARPU_PREFETCH}
@anchor{STARPU_PREFETCH}
This variable indicates whether data prefetching should be enabled (0 means
that it is disabled). If prefetching is enabled, when a task is scheduled to be
executed e.g. on a GPU, StarPU will request an asynchronous transfer in
advance, so that data is already present on the GPU when the task starts. As a
result, computation and data transfers are overlapped.
Note that prefetching is enabled by default in StarPU.

@item @code{STARPU_SCHED_ALPHA}
To estimate the cost of a task, StarPU takes into account the estimated
computation time (obtained thanks to performance models). The alpha factor is
the coefficient to be applied to it before adding it to the communication part.

@item @code{STARPU_SCHED_BETA}
To estimate the cost of a task, StarPU takes into account the estimated
data transfer time (obtained thanks to performance models). The beta factor is
the coefficient to be applied to it before adding it to the computation part.

@end table
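As an illustration, selecting a performance-model-based policy and
enabling calibration can be combined on the command line as follows (the
executable name @file{./my_app} is hypothetical):

@smallexample
% STARPU_SCHED=dmda STARPU_CALIBRATE=1 ./my_app
@end smallexample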
 
@node Extensions
@subsection Extensions

@table @code

@item @code{SOCL_OCL_LIB_OPENCL}
The SOCL test suite is only run when the environment variable
@code{SOCL_OCL_LIB_OPENCL} is defined. It should contain the location
of the @file{libOpenCL.so} file of the OCL ICD implementation.

@end table
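For instance, before running the test suite, the variable could be set as
follows (the library path shown is hypothetical and depends on which
OpenCL ICD is installed on the machine):

@smallexample
% export SOCL_OCL_LIB_OPENCL=/usr/lib/libOpenCL.so
@end smallexample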
 
@node Misc
@subsection Miscellaneous and debug

@table @code

@item @code{STARPU_SILENT}
This variable makes it possible to disable verbose mode at runtime when
StarPU has been configured with the option @code{--enable-verbose}.

@item @code{STARPU_LOGFILENAME}
This variable specifies in which file the debugging output should be saved.

@item @code{STARPU_FXT_PREFIX}
This variable specifies in which directory to save the trace generated if
FxT is enabled. It needs to have a trailing @code{/} character.

@item @code{STARPU_LIMIT_GPU_MEM}
This variable specifies the maximum number of megabytes that should be
available to the application on each GPU. In case this value is smaller than
the size of the memory of a GPU, StarPU pre-allocates a buffer to waste memory
on the device. This variable is intended to be used for experimental purposes
as it emulates devices that have a limited amount of memory.

@item @code{STARPU_GENERATE_TRACE}
When set to 1, this variable indicates that StarPU should automatically
generate a Paje trace when @code{starpu_shutdown} is called.

@end table
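For example, trace generation can be combined with a custom trace
directory as follows (the directory and executable names shown are
hypothetical):

@smallexample
% STARPU_GENERATE_TRACE=1 STARPU_FXT_PREFIX=/tmp/traces/ ./my_app
@end smallexample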
 
 