@c -*-texinfo-*-

@c This file is part of the StarPU Handbook.
@c Copyright (C) 2009--2011  Universit@'e de Bordeaux 1
@c Copyright (C) 2010, 2011, 2012, 2013  Centre National de la Recherche Scientifique
@c Copyright (C) 2011, 2012 Institut National de Recherche en Informatique et Automatique
@c See the file starpu.texi for copying conditions.

@menu
* Compilation configuration::
* Execution configuration through environment variables::
@end menu

@node Compilation configuration
@section Compilation configuration

The following arguments can be given to the @code{configure} script.

@menu
* Common configuration::
* Configuring workers::
* Extension configuration::
* Advanced configuration::
@end menu

@node Common configuration
@subsection Common configuration

@defvr {Configure option} --enable-debug
Enable debugging messages.
@end defvr

@defvr {Configure option} --enable-fast
Disable assertion checks, which saves computation time.
@end defvr

@defvr {Configure option} --enable-verbose
Increase the verbosity of the debugging messages.  This can be disabled
at runtime by setting the environment variable @code{STARPU_SILENT} to
any value.

@smallexample
$ STARPU_SILENT=1 ./vector_scal
@end smallexample
@end defvr

@defvr {Configure option} --enable-coverage
Enable flags for the @code{gcov} coverage tool.
@end defvr

@defvr {Configure option} --enable-quick-check
Specify that tests and examples should be run on a smaller data set, i.e.
allowing a faster execution time.
@end defvr

@defvr {Configure option} --with-hwloc
Specify that hwloc should be used by StarPU. hwloc should be found by
means of the @code{pkg-config} tool.
@end defvr

@defvr {Configure option} --with-hwloc=@var{prefix}
Specify that hwloc should be used by StarPU. hwloc should be found in the
directory specified by @var{prefix}.
@end defvr

@defvr {Configure option} --without-hwloc
Specify that hwloc should not be used by StarPU.
@end defvr

@defvr {Configure option} --disable-build-doc
Disable the creation of the documentation. This should be done on a
machine which does not have the tools @code{makeinfo} and @code{tex}.
@end defvr

Additionally, the @command{configure} script recognizes many variables, which
can be listed by typing @code{./configure --help}.
For example,
@code{./configure NVCCFLAGS="-arch sm_13"} adds a flag for the compilation of
CUDA kernels.

@node Configuring workers
@subsection Configuring workers

@defvr {Configure option} --enable-maxcpus=@var{count}
Use at most @var{count} CPU cores.  This information is then
available as the @code{STARPU_MAXCPUS} macro.
@end defvr

@defvr {Configure option} --disable-cpu
Disable the use of CPUs of the machine. Only GPUs etc. will be used.
@end defvr

@defvr {Configure option} --enable-maxcudadev=@var{count}
Use at most @var{count} CUDA devices.  This information is then
available as the @code{STARPU_MAXCUDADEVS} macro.
@end defvr

@defvr {Configure option} --disable-cuda
Disable the use of CUDA, even if a valid CUDA installation was detected.
@end defvr

@defvr {Configure option} --with-cuda-dir=@var{prefix}
Search for CUDA under @var{prefix}, which should notably contain
@file{include/cuda.h}.
@end defvr

@defvr {Configure option} --with-cuda-include-dir=@var{dir}
Search for CUDA headers under @var{dir}, which should
notably contain @code{cuda.h}. This defaults to @code{/include} appended to the
value given to @code{--with-cuda-dir}.
@end defvr

@defvr {Configure option} --with-cuda-lib-dir=@var{dir}
Search for CUDA libraries under @var{dir}, which should notably contain
the CUDA shared libraries---e.g., @file{libcuda.so}.  This defaults to
@code{/lib} appended to the value given to @code{--with-cuda-dir}.
@end defvr

@defvr {Configure option} --disable-cuda-memcpy-peer
Explicitly disable peer transfers when using CUDA 4.0.
@end defvr

@defvr {Configure option} --enable-maxopencldev=@var{count}
Use at most @var{count} OpenCL devices.
This information is then
available as the @code{STARPU_MAXOPENCLDEVS} macro.
@end defvr

@defvr {Configure option} --disable-opencl
Disable the use of OpenCL, even if the SDK is detected.
@end defvr

@defvr {Configure option} --with-opencl-dir=@var{prefix}
Search for an OpenCL implementation under @var{prefix}, which should
notably contain @file{include/CL/cl.h} (or @file{include/OpenCL/cl.h} on
Mac OS).
@end defvr

@defvr {Configure option} --with-opencl-include-dir=@var{dir}
Search for OpenCL headers under @var{dir}, which should notably contain
@file{CL/cl.h} (or @file{OpenCL/cl.h} on Mac OS).  This defaults to
@code{/include} appended to the value given to @code{--with-opencl-dir}.
@end defvr

@defvr {Configure option} --with-opencl-lib-dir=@var{dir}
Search for an OpenCL library under @var{dir}, which should notably
contain the OpenCL shared libraries---e.g. @file{libOpenCL.so}. This defaults to
@code{/lib} appended to the value given to @code{--with-opencl-dir}.
@end defvr

@defvr {Configure option} --enable-opencl-simulator
Treat the provided OpenCL implementation as a simulator, i.e. use
the kernel duration returned by OpenCL profiling information as wallclock time
instead of the actual measured real time. This requires simgrid support.
@end defvr

@defvr {Configure option} --enable-maximplementations=@var{count}
Allow for at most @var{count} codelet implementations for the same
target device.  This information is then available as the
@code{STARPU_MAXIMPLEMENTATIONS} macro.
@end defvr

@defvr {Configure option} --enable-max-sched-ctxs=@var{count}
Allow for at most @var{count} scheduling contexts.
This information is then available as the
@code{STARPU_NMAX_SCHED_CTXS} macro.
@end defvr

@defvr {Configure option} --disable-asynchronous-copy
Disable asynchronous copies between CPU and GPU devices.
The AMD implementation of OpenCL is known to
fail when copying data asynchronously.
When using this implementation,
it is therefore necessary to disable asynchronous data transfers.
@end defvr

@defvr {Configure option} --disable-asynchronous-cuda-copy
Disable asynchronous copies between CPU and CUDA devices.
@end defvr

@defvr {Configure option} --disable-asynchronous-opencl-copy
Disable asynchronous copies between CPU and OpenCL devices.
The AMD implementation of OpenCL is known to
fail when copying data asynchronously. When using this implementation,
it is therefore necessary to disable asynchronous data transfers.
@end defvr

@node Extension configuration
@subsection Extension configuration

@defvr {Configure option} --disable-socl
Disable the SOCL extension (@pxref{SOCL OpenCL Extensions}).  By
default, it is enabled when an OpenCL implementation is found.
@end defvr

@defvr {Configure option} --disable-starpu-top
Disable the StarPU-Top interface (@pxref{StarPU-Top}).  By default, it
is enabled when the required dependencies are found.
@end defvr

@defvr {Configure option} --disable-gcc-extensions
Disable the GCC plug-in (@pxref{C Extensions}).
By default, it is
enabled when the GCC compiler provides plug-in support.
@end defvr

@defvr {Configure option} --with-mpicc=@var{path}
Use the @command{mpicc} compiler at @var{path}, for starpumpi
(@pxref{StarPU MPI support}).
@end defvr

@node Advanced configuration
@subsection Advanced configuration

@defvr {Configure option} --enable-perf-debug
Enable performance debugging through gprof.
@end defvr

@defvr {Configure option} --enable-model-debug
Enable performance model debugging.
@end defvr

@defvr {Configure option} --enable-stats
@c see ../../src/datawizard/datastats.c
Enable gathering of various data statistics (@pxref{Data statistics}).
@end defvr

@defvr {Configure option} --enable-maxbuffers
Define the maximum number of buffers that tasks will be able to take
as parameters, then available as the @code{STARPU_NMAXBUFS} macro.
@end defvr

@defvr {Configure option} --enable-allocation-cache
Enable the use of a data allocation cache to avoid the cost of data
allocation with CUDA. Still experimental.
@end defvr

@defvr {Configure option} --enable-opengl-render
Enable the use of OpenGL for the rendering of some examples.
@c TODO: rather default to enabled when detected
@end defvr

@defvr {Configure option} --enable-blas-lib
Specify the blas library to be used by some of the examples. The
library has to be 'atlas' or 'goto'.
@end defvr

@defvr {Configure option} --disable-starpufft
Disable the build of libstarpufft, even if fftw or cuFFT is available.
@end defvr

@defvr {Configure option} --with-magma=@var{prefix}
Search for MAGMA under @var{prefix}.  @var{prefix} should notably
contain @file{include/magmablas.h}.
@end defvr

@defvr {Configure option} --with-fxt=@var{prefix}
Search for FxT under @var{prefix}.
@url{http://savannah.nongnu.org/projects/fkt, FxT} is used to generate
traces of scheduling events, which can then be rendered using ViTE
(@pxref{Off-line, off-line performance feedback}).
@var{prefix} should
notably contain @code{include/fxt/fxt.h}.
@end defvr

@defvr {Configure option} --with-perf-model-dir=@var{dir}
Store performance models under @var{dir}, instead of the current user's
home.
@end defvr

@defvr {Configure option} --with-goto-dir=@var{prefix}
Search for GotoBLAS under @var{prefix}, which should notably contain
@file{libgoto.so} or @file{libgoto2.so}.
@end defvr

@defvr {Configure option} --with-atlas-dir=@var{prefix}
Search for ATLAS under @var{prefix}, which should notably contain
@file{include/cblas.h}.
@end defvr

@defvr {Configure option} --with-mkl-cflags=@var{cflags}
Use @var{cflags} to compile code that uses the MKL library.
@end defvr

@defvr {Configure option} --with-mkl-ldflags=@var{ldflags}
Use @var{ldflags} when linking code that uses the MKL library.  Note
that the
@url{http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/,
MKL website} provides a script to determine the linking flags.
@end defvr

@defvr {Configure option} --disable-build-examples
Disable the build of examples.
@end defvr

@defvr {Configure option} --enable-sched-ctx-hypervisor
Enable the Scheduling Context Hypervisor plugin
(@pxref{Scheduling Context Hypervisor}).
By default, it is disabled.
@end defvr

@defvr {Configure option} --enable-memory-stats
Enable memory statistics (@pxref{Memory feedback}).
@end defvr

@defvr {Configure option} --enable-simgrid
Enable simulation of execution in simgrid, to allow easy experimentation with
various numbers of cores and GPUs, or amount of memory, etc.
Experimental.

The path to simgrid can be specified through the @code{SIMGRID_CFLAGS} and
@code{SIMGRID_LIBS} environment variables, for instance:

@example
export SIMGRID_CFLAGS="-I/usr/local/simgrid/include"
export SIMGRID_LIBS="-L/usr/local/simgrid/lib -lsimgrid"
@end example
@end defvr

@node Execution configuration through environment variables
@section Execution configuration through environment variables

@menu
* Workers::                     Configuring workers
* Scheduling::                  Configuring the Scheduling engine
* Extensions::
* Misc::                        Miscellaneous and debug
@end menu

@node Workers
@subsection Configuring workers

@defvr {Environment variable} STARPU_NCPU
Specify the number of CPU workers (thus not including workers dedicated to
control accelerators). Note that by default, StarPU will not allocate
more CPU workers than there are physical CPUs, and that some CPUs are used to control
the accelerators.
@end defvr

@defvr {Environment variable} STARPU_NCPUS
This variable is deprecated. You should use @code{STARPU_NCPU}.
@end defvr

@defvr {Environment variable} STARPU_NCUDA
Specify the number of CUDA devices that StarPU can use. If
@code{STARPU_NCUDA} is lower than the number of physical devices, it is
possible to select which CUDA devices should be used by the means of the
@code{STARPU_WORKERS_CUDAID} environment variable. By default, StarPU will
create as many CUDA workers as there are CUDA devices.
@end defvr

@defvr {Environment variable} STARPU_NOPENCL
OpenCL equivalent of the @code{STARPU_NCUDA} environment variable.
@end defvr

@defvr {Environment variable} STARPU_OPENCL_ON_CPUS
By default, the OpenCL driver only enables GPU and accelerator
devices. By setting the environment variable
@code{STARPU_OPENCL_ON_CPUS} to 1, the OpenCL driver will also enable
CPU devices.
@end defvr

@defvr {Environment variable} STARPU_OPENCL_ONLY_ON_CPUS
By default, the OpenCL driver enables GPU and accelerator
devices.
By setting the environment variable
@code{STARPU_OPENCL_ONLY_ON_CPUS} to 1, the OpenCL driver will ONLY enable
CPU devices.
@end defvr

@defvr {Environment variable} STARPU_WORKERS_NOBIND
Setting it to non-zero will prevent StarPU from binding its threads to
CPUs. This is for instance useful when running the testsuite in parallel.
@end defvr

@defvr {Environment variable} STARPU_WORKERS_CPUID
Passing an array of integers (starting from 0) in @code{STARPU_WORKERS_CPUID}
specifies on which logical CPU the different workers should be
bound. For instance, if @code{STARPU_WORKERS_CPUID = "0 1 4 5"}, the first
worker will be bound to logical CPU #0, the second CPU worker will be bound to
logical CPU #1 and so on.  Note that the logical ordering of the CPUs is either
determined by the OS, or provided by the @code{hwloc} library in case it is
available.

Note that the first workers correspond to the CUDA workers, then come the
OpenCL workers, and finally the CPU workers. For example if
we have @code{STARPU_NCUDA=1}, @code{STARPU_NOPENCL=1}, @code{STARPU_NCPU=2}
and @code{STARPU_WORKERS_CPUID = "0 2 1 3"}, the CUDA device will be controlled
by logical CPU #0, the OpenCL device will be controlled by logical CPU #2, and
the logical CPUs #1 and #3 will be used by the CPU workers.

If the number of workers is larger than the array given in
@code{STARPU_WORKERS_CPUID}, the workers are bound to the logical CPUs in a
round-robin fashion: if @code{STARPU_WORKERS_CPUID = "0 1"}, the first and the
third (resp. second and fourth) workers will be put on CPU #0 (resp. CPU #1).

This variable is ignored if the @code{use_explicit_workers_bindid} flag of the
@code{starpu_conf} structure passed to @code{starpu_init} is set.
@end defvr

@defvr {Environment variable} STARPU_WORKERS_CUDAID
Similarly to the @code{STARPU_WORKERS_CPUID} environment variable, it is
possible to select which CUDA devices should be used by StarPU.
On a machine
equipped with 4 GPUs, setting @code{STARPU_WORKERS_CUDAID = "1 3"} and
@code{STARPU_NCUDA=2} specifies that 2 CUDA workers should be created, and that
they should use CUDA devices #1 and #3 (the logical ordering of the devices is
the one reported by CUDA).

This variable is ignored if the @code{use_explicit_workers_cuda_gpuid} flag of
the @code{starpu_conf} structure passed to @code{starpu_init} is set.
@end defvr

@defvr {Environment variable} STARPU_WORKERS_OPENCLID
OpenCL equivalent of the @code{STARPU_WORKERS_CUDAID} environment variable.

This variable is ignored if the @code{use_explicit_workers_opencl_gpuid} flag of
the @code{starpu_conf} structure passed to @code{starpu_init} is set.
@end defvr

@defvr {Environment variable} STARPU_SINGLE_COMBINED_WORKER
If set, StarPU will create several workers which won't be able to work
concurrently. It will create combined workers whose size goes from 1 to the
total number of CPU workers in the system.
@end defvr

@defvr {Environment variable} STARPU_SYNTHESIZE_ARITY_COMBINED_WORKER
Let the user decide how many elements are allowed between combined workers
created from hwloc information. For instance, in the case of sockets with 6
cores without shared L2 caches, if @code{STARPU_SYNTHESIZE_ARITY_COMBINED_WORKER} is
set to 6, no combined worker will be synthesized beyond one for the socket
and one per core. If it is set to 3, 3 intermediate combined workers will be
synthesized, to divide the socket cores into 3 chunks of 2 cores.
If it is set to
2, 2 intermediate combined workers will be synthesized, to divide the socket
cores into 2 chunks of 3 cores, and then 3 additional combined workers will be
synthesized, to divide the former synthesized workers into a bunch of 2 cores,
and the remaining core (for which no combined worker is synthesized since there
is already a normal worker for it).

The default, 2, thus makes StarPU tend to build a binary tree of combined
workers.
@end defvr

@defvr {Environment variable} STARPU_DISABLE_ASYNCHRONOUS_COPY
Disable asynchronous copies between CPU and GPU devices.
The AMD implementation of OpenCL is known to
fail when copying data asynchronously. When using this implementation,
it is therefore necessary to disable asynchronous data transfers.
@end defvr

@defvr {Environment variable} STARPU_DISABLE_ASYNCHRONOUS_CUDA_COPY
Disable asynchronous copies between CPU and CUDA devices.
@end defvr

@defvr {Environment variable} STARPU_DISABLE_ASYNCHRONOUS_OPENCL_COPY
Disable asynchronous copies between CPU and OpenCL devices.
The AMD implementation of OpenCL is known to
fail when copying data asynchronously. When using this implementation,
it is therefore necessary to disable asynchronous data transfers.
@end defvr

@defvr {Environment variable} STARPU_DISABLE_CUDA_GPU_GPU_DIRECT
Disable direct CUDA transfers from GPU to GPU, and let CUDA copy through RAM
instead. This makes it possible to test the performance effect of GPU-Direct.
@end defvr

@node Scheduling
@subsection Configuring the Scheduling engine

@defvr {Environment variable} STARPU_SCHED
Choose between the different scheduling policies proposed by StarPU:
random, work stealing, greedy, with performance models, etc.

Use @code{STARPU_SCHED=help} to get the list of available schedulers.
@end defvr

@defvr {Environment variable} STARPU_CALIBRATE
If this variable is set to 1, the performance models are calibrated during
the execution. If it is set to 2, the previous values are dropped to restart
calibration from scratch.
Setting this variable to 0 disables calibration; this
is the default behaviour.

Note: this currently only applies to @code{dm} and @code{dmda} scheduling policies.
@end defvr

@defvr {Environment variable} STARPU_BUS_CALIBRATE
If this variable is set to 1, the bus is recalibrated during initialization.
@end defvr

@defvr {Environment variable} STARPU_PREFETCH
@anchor{STARPU_PREFETCH}
This variable indicates whether data prefetching should be enabled (0 means
that it is disabled). If prefetching is enabled, when a task is scheduled to be
executed e.g. on a GPU, StarPU will request an asynchronous transfer in
advance, so that data is already present on the GPU when the task starts. As a
result, computation and data transfers are overlapped.
Note that prefetching is enabled by default in StarPU.
@end defvr

@defvr {Environment variable} STARPU_SCHED_ALPHA
To estimate the cost of a task StarPU takes into account the estimated
computation time (obtained thanks to performance models). The alpha factor is
the coefficient to be applied to it before adding it to the communication part.
@end defvr

@defvr {Environment variable} STARPU_SCHED_BETA
To estimate the cost of a task StarPU takes into account the estimated
data transfer time (obtained thanks to performance models). The beta factor is
the coefficient to be applied to it before adding it to the computation part.
@end defvr

@defvr {Environment variable} STARPU_SCHED_GAMMA
Define the execution time penalty of a joule (@pxref{Power-based scheduling}).
@end defvr

@defvr {Environment variable} STARPU_IDLE_POWER
Define the idle power of the machine (@pxref{Power-based scheduling}).
@end defvr

@defvr {Environment variable} STARPU_PROFILING
Enable on-line performance monitoring (@pxref{Enabling on-line performance monitoring}).
@end defvr

@node Extensions
@subsection Extensions

@defvr {Environment variable} SOCL_OCL_LIB_OPENCL
The SOCL test suite is only run when the environment variable
@code{SOCL_OCL_LIB_OPENCL} is defined.
It should contain the location
of the libOpenCL.so file of the OCL ICD implementation.
@end defvr

@defvr {Environment variable} STARPU_COMM_STATS
@anchor{STARPU_COMM_STATS}
Communication statistics for starpumpi (@pxref{StarPU MPI support})
will be enabled when the environment variable @code{STARPU_COMM_STATS}
is defined to a value other than 0.
@end defvr

@defvr {Environment variable} STARPU_MPI_CACHE
@anchor{STARPU_MPI_CACHE}
Communication cache for starpumpi (@pxref{StarPU MPI support}) will be
disabled when the environment variable @code{STARPU_MPI_CACHE} is set
to 0. It is enabled by default or for any other values of the variable
@code{STARPU_MPI_CACHE}.
@end defvr

@node Misc
@subsection Miscellaneous and debug

@defvr {Environment variable} STARPU_OPENCL_PROGRAM_DIR
@anchor{STARPU_OPENCL_PROGRAM_DIR}
This specifies the directory where the OpenCL codelet source files are
located. The function @ref{starpu_opencl_load_program_source} looks
for the codelet in the current directory, in the directory specified
by the environment variable @code{STARPU_OPENCL_PROGRAM_DIR}, in the
directory @code{share/starpu/opencl} of the installation directory of
StarPU, and finally in the source directory of StarPU.
@end defvr

@defvr {Environment variable} STARPU_SILENT
This variable disables verbose mode at runtime when StarPU
has been configured with the option @code{--enable-verbose}. It also
disables the display of StarPU information and warning messages.
@end defvr

@defvr {Environment variable} STARPU_LOGFILENAME
This variable specifies the file in which the debugging output should be saved.
@end defvr

@defvr {Environment variable} STARPU_FXT_PREFIX
This variable specifies in which directory to save the trace generated if FxT
is enabled. It needs to have a trailing '/' character.
@end defvr

@defvr {Environment variable} STARPU_LIMIT_GPU_MEM
This variable specifies the maximum number of megabytes that should be
available to the application on each GPU.
In case this value is smaller than
the size of the memory of a GPU, StarPU pre-allocates a buffer to waste memory
on the device. This variable is intended to be used for experimental purposes
as it emulates devices that have a limited amount of memory.
@end defvr

@defvr {Environment variable} STARPU_GENERATE_TRACE
When set to @code{1}, this variable indicates that StarPU should automatically
generate a Paje trace when @code{starpu_shutdown()} is called.
@end defvr

@defvr {Environment variable} STARPU_MEMORY_STATS
When set to 0, disable the display of memory statistics on data which
have not been unregistered at the end of the execution (@pxref{Memory
feedback}).
@end defvr

@defvr {Environment variable} STARPU_BUS_STATS
When defined, statistics about data transfers will be displayed when calling
@code{starpu_shutdown()} (@pxref{Profiling}).
@end defvr

@defvr {Environment variable} STARPU_WORKER_STATS
When defined, statistics about the workers will be displayed when calling
@code{starpu_shutdown()} (@pxref{Profiling}). When combined with the
environment variable @code{STARPU_PROFILING}, it displays the power
consumption (@pxref{Power-based scheduling}).
@end defvr

@defvr {Environment variable} STARPU_STATS
When set to 0, data statistics will not be displayed at the
end of the execution of an application (@pxref{Data statistics}).
@end defvr
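
@noindent
As a minimal sketch of how the statistics variables above can be combined, the
following command line runs the @code{vector_scal} example already used to
illustrate @code{STARPU_SILENT}.  The value @code{1} given to each variable is
an assumption made for illustration (the descriptions above only require
@code{STARPU_BUS_STATS} and @code{STARPU_WORKER_STATS} to be defined), and
what is printed depends on the machine and on how StarPU was configured.

@smallexample
$ STARPU_PROFILING=1 STARPU_WORKER_STATS=1 STARPU_BUS_STATS=1 ./vector_scal
@end smallexample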