@c -*-texinfo-*-
@c This file is part of the StarPU Handbook.
@c Copyright (C) 2009--2011 Universit@'e de Bordeaux 1
@c Copyright (C) 2010, 2011, 2012 Centre National de la Recherche Scientifique
@c Copyright (C) 2011, 2012 Institut National de Recherche en Informatique et Automatique
@c See the file starpu.texi for copying conditions.

@menu
* Compilation configuration::
* Execution configuration through environment variables::
@end menu
@node Compilation configuration
@section Compilation configuration

The following arguments can be given to the @code{configure} script.

@menu
* Common configuration::
* Configuring workers::
* Extension configuration::
* Advanced configuration::
@end menu

@node Common configuration
@subsection Common configuration

@table @code

@item --enable-debug
Enable debugging messages.

@item --enable-fast
Disable assertion checks, which saves computation time.

@item --enable-verbose
Increase the verbosity of the debugging messages. This can be disabled
at runtime by setting the environment variable @code{STARPU_SILENT} to
any value.

@smallexample
% STARPU_SILENT=1 ./vector_scal
@end smallexample

@item --enable-coverage
Enable flags for the @code{gcov} coverage tool.
@item --enable-quick-check
Specify that tests and examples should be run on a smaller data set,
i.e., allowing a faster execution time.

@item --with-hwloc
Specify that hwloc should be used by StarPU. hwloc should be found by
means of the @code{pkg-config} tool.
@item --with-hwloc=@var{prefix}
Specify that hwloc should be used by StarPU. hwloc should be found in
the directory specified by @var{prefix}.

@item --without-hwloc
Specify that hwloc should not be used by StarPU.

@end table
Additionally, the @command{configure} script recognizes many variables, which
can be listed by typing @code{./configure --help}. For example,
@code{./configure NVCCFLAGS="-arch sm_13"} adds a flag for the compilation of
CUDA kernels.
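
Options and variables can be combined in a single invocation; for
instance, the following sketch (the particular combination being merely
illustrative) enables fast mode, requests hwloc, and passes the CUDA
flag shown above:

@smallexample
% ./configure --enable-fast --with-hwloc NVCCFLAGS="-arch sm_13"
@end smallexample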
@node Configuring workers
@subsection Configuring workers

@table @code

@item --enable-maxcpus=@var{count}
Use at most @var{count} CPU cores. This information is then
available as the @code{STARPU_MAXCPUS} macro.

@item --disable-cpu
Disable the use of CPUs of the machine. Only GPUs etc. will be used.

@item --enable-maxcudadev=@var{count}
Use at most @var{count} CUDA devices. This information is then
available as the @code{STARPU_MAXCUDADEVS} macro.

@item --disable-cuda
Disable the use of CUDA, even if a valid CUDA installation was detected.

@item --with-cuda-dir=@var{prefix}
Search for CUDA under @var{prefix}, which should notably contain
@file{include/cuda.h}.

@item --with-cuda-include-dir=@var{dir}
Search for CUDA headers under @var{dir}, which should
notably contain @file{cuda.h}. This defaults to @code{/include} appended to the
value given to @code{--with-cuda-dir}.

@item --with-cuda-lib-dir=@var{dir}
Search for CUDA libraries under @var{dir}, which should notably contain
the CUDA shared libraries---e.g., @file{libcuda.so}. This defaults to
@code{/lib} appended to the value given to @code{--with-cuda-dir}.
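
For instance, for a CUDA toolkit installed under @file{/usr/local/cuda}
(an example path, to be adapted to the local installation):

@smallexample
% ./configure --with-cuda-dir=/usr/local/cuda  # example path
@end smallexample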
@item --disable-cuda-memcpy-peer
Explicitly disable peer transfers when using CUDA 4.0.

@item --enable-maxopencldev=@var{count}
Use at most @var{count} OpenCL devices. This information is then
available as the @code{STARPU_MAXOPENCLDEVS} macro.

@item --disable-opencl
Disable the use of OpenCL, even if the SDK is detected.

@item --with-opencl-dir=@var{prefix}
Search for an OpenCL implementation under @var{prefix}, which should
notably contain @file{include/CL/cl.h} (or @file{include/OpenCL/cl.h} on
Mac OS).

@item --with-opencl-include-dir=@var{dir}
Search for OpenCL headers under @var{dir}, which should notably contain
@file{CL/cl.h} (or @file{OpenCL/cl.h} on Mac OS). This defaults to
@code{/include} appended to the value given to @code{--with-opencl-dir}.

@item --with-opencl-lib-dir=@var{dir}
Search for an OpenCL library under @var{dir}, which should notably
contain the OpenCL shared libraries---e.g. @file{libOpenCL.so}. This defaults to
@code{/lib} appended to the value given to @code{--with-opencl-dir}.

@item --enable-maximplementations=@var{count}
Allow for at most @var{count} codelet implementations for the same
target device. This information is then available as the
@code{STARPU_MAXIMPLEMENTATIONS} macro.
@item --enable-max-sched-ctxs=@var{count}
Allow for at most @var{count} scheduling contexts.
This information is then available as the
@code{STARPU_NMAX_SCHED_CTXS} macro.
@item --disable-asynchronous-copy
Disable asynchronous copies between CPU and GPU devices.
The AMD implementation of OpenCL is known to
fail when copying data asynchronously. When using this implementation,
it is therefore necessary to disable asynchronous data transfers.

@item --disable-asynchronous-cuda-copy
Disable asynchronous copies between CPU and CUDA devices.

@item --disable-asynchronous-opencl-copy
Disable asynchronous copies between CPU and OpenCL devices.
The AMD implementation of OpenCL is known to
fail when copying data asynchronously. When using this implementation,
it is therefore necessary to disable asynchronous data transfers.

@end table
@node Extension configuration
@subsection Extension configuration

@table @code

@item --disable-socl
Disable the SOCL extension (@pxref{SOCL OpenCL Extensions}). By
default, it is enabled when an OpenCL implementation is found.

@item --disable-starpu-top
Disable the StarPU-Top interface (@pxref{StarPU-Top}). By default, it
is enabled when the required dependencies are found.
@item --disable-gcc-extensions
Disable the GCC plug-in (@pxref{C Extensions}). By default, it is
enabled when the GCC compiler provides plug-in support.

@item --with-mpicc=@var{path}
Use the @command{mpicc} compiler at @var{path}, for starpumpi
(@pxref{StarPU MPI support}).
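
For instance, with an MPI installation whose @command{mpicc} lives in
@file{/usr/local/bin} (an example path):

@smallexample
% ./configure --with-mpicc=/usr/local/bin/mpicc  # example path
@end smallexample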
@item --enable-comm-stats
@anchor{enable-comm-stats}
Enable communication statistics for starpumpi (@pxref{StarPU MPI
support}).

@end table
@node Advanced configuration
@subsection Advanced configuration

@table @code

@item --enable-perf-debug
Enable performance debugging through gprof.

@item --enable-model-debug
Enable performance model debugging.

@item --enable-stats
@c see ../../src/datawizard/datastats.c
Enable gathering of memory transfer statistics.
@item --enable-maxbuffers=@var{count}
Define the maximum number of buffers that tasks will be able to take
as parameters, then available as the @code{STARPU_NMAXBUFS} macro.

@item --enable-allocation-cache
Enable the use of a data allocation cache to avoid the cost of memory
allocation with CUDA. Still experimental.
@item --enable-opengl-render
Enable the use of OpenGL for the rendering of some examples.
@c TODO: rather default to enabled when detected
@item --enable-blas-lib
Specify the BLAS library to be used by some of the examples. The
library has to be @code{atlas} or @code{goto}.
@item --disable-starpufft
Disable the build of libstarpufft, even if fftw or cuFFT is available.

@item --with-magma=@var{prefix}
Search for MAGMA under @var{prefix}. @var{prefix} should notably
contain @file{include/magmablas.h}.
@item --with-fxt=@var{prefix}
Search for FxT under @var{prefix}.
@url{http://savannah.nongnu.org/projects/fkt, FxT} is used to generate
traces of scheduling events, which can then be rendered using ViTE
(@pxref{Off-line, off-line performance feedback}). @var{prefix} should
notably contain @file{include/fxt/fxt.h}.
@item --with-perf-model-dir=@var{dir}
Store performance models under @var{dir}, instead of the current user's
home.

@item --with-goto-dir=@var{prefix}
Search for GotoBLAS under @var{prefix}, which should notably contain
@file{libgoto.so} or @file{libgoto2.so}.

@item --with-atlas-dir=@var{prefix}
Search for ATLAS under @var{prefix}, which should notably contain
@file{include/cblas.h}.

@item --with-mkl-cflags=@var{cflags}
Use @var{cflags} to compile code that uses the MKL library.

@item --with-mkl-ldflags=@var{ldflags}
Use @var{ldflags} when linking code that uses the MKL library. Note
that the
@url{http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/,
MKL website} provides a script to determine the linking flags.
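
As a sketch, assuming MKL is installed under @file{/opt/intel/mkl}
(example paths and flags only; the link line advisor mentioned above
gives the exact values for a given installation):

@smallexample
% # paths and flags below are examples only
% ./configure --with-mkl-cflags="-I/opt/intel/mkl/include" \
              --with-mkl-ldflags="-L/opt/intel/mkl/lib/intel64 -lmkl_rt"
@end smallexample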
@item --disable-build-examples
Disable the build of examples.
@item --enable-sched-ctx-hypervisor
Enable the Scheduling Context Hypervisor plugin (@pxref{Scheduling
Context Hypervisor}). By default, it is disabled.

@end table
@node Execution configuration through environment variables
@section Execution configuration through environment variables

@menu
* Workers:: Configuring workers
* Scheduling:: Configuring the Scheduling engine
* Extensions::
* Misc:: Miscellaneous and debug
@end menu

@node Workers
@subsection Configuring workers

@table @code
@item @code{STARPU_NCPU}
Specify the number of CPU workers (thus not including workers dedicated
to controlling accelerators). Note that by default, StarPU will not
allocate more CPU workers than there are physical CPUs, and that some
CPUs are used to control the accelerators.
@item @code{STARPU_NCUDA}
Specify the number of CUDA devices that StarPU can use. If
@code{STARPU_NCUDA} is lower than the number of physical devices, it is
possible to select which CUDA devices should be used by means of the
@code{STARPU_WORKERS_CUDAID} environment variable. By default, StarPU will
create as many CUDA workers as there are CUDA devices.

@item @code{STARPU_NOPENCL}
OpenCL equivalent of the @code{STARPU_NCUDA} environment variable.
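
The number of workers of each kind can for instance be set on the
command line (reusing the @code{vector_scal} example from above):

@smallexample
% STARPU_NCPU=2 STARPU_NCUDA=1 STARPU_NOPENCL=1 ./vector_scal
@end smallexample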
@item @code{STARPU_WORKERS_NOBIND}
Setting it to non-zero will prevent StarPU from binding its threads to
CPUs. This is for instance useful when running the testsuite in parallel.
@item @code{STARPU_WORKERS_CPUID}
Passing an array of integers (starting from 0) in @code{STARPU_WORKERS_CPUID}
specifies on which logical CPU the different workers should be
bound. For instance, if @code{STARPU_WORKERS_CPUID = "0 1 4 5"}, the first
worker will be bound to logical CPU #0, the second CPU worker will be bound to
logical CPU #1 and so on. Note that the logical ordering of the CPUs is either
determined by the OS, or provided by the @code{hwloc} library in case it is
available.

Note that the first workers correspond to the CUDA workers, then come the
OpenCL workers, and finally the CPU workers. For example if
we have @code{STARPU_NCUDA=1}, @code{STARPU_NOPENCL=1}, @code{STARPU_NCPU=2}
and @code{STARPU_WORKERS_CPUID = "0 2 1 3"}, the CUDA device will be controlled
by logical CPU #0, the OpenCL device will be controlled by logical CPU #2, and
the logical CPUs #1 and #3 will be used by the CPU workers.

If the number of workers is larger than the array given in
@code{STARPU_WORKERS_CPUID}, the workers are bound to the logical CPUs in a
round-robin fashion: if @code{STARPU_WORKERS_CPUID = "0 1"}, the first and the
third (resp. second and fourth) workers will be put on CPU #0 (resp. CPU #1).

This variable is ignored if the @code{use_explicit_workers_bindid} flag of the
@code{starpu_conf} structure passed to @code{starpu_init} is set.
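
On the command line, the example above thus reads:

@smallexample
% STARPU_NCUDA=1 STARPU_NOPENCL=1 STARPU_NCPU=2 \
  STARPU_WORKERS_CPUID="0 2 1 3" ./vector_scal
@end smallexample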
@item @code{STARPU_WORKERS_CUDAID}
Similarly to the @code{STARPU_WORKERS_CPUID} environment variable, it is
possible to select which CUDA devices should be used by StarPU. On a machine
equipped with 4 GPUs, setting @code{STARPU_WORKERS_CUDAID = "1 3"} and
@code{STARPU_NCUDA=2} specifies that 2 CUDA workers should be created, and that
they should use CUDA devices #1 and #3 (the logical ordering of the devices is
the one reported by CUDA).

This variable is ignored if the @code{use_explicit_workers_cuda_gpuid} flag of
the @code{starpu_conf} structure passed to @code{starpu_init} is set.
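
That is, on the command line:

@smallexample
% STARPU_NCUDA=2 STARPU_WORKERS_CUDAID="1 3" ./vector_scal
@end smallexample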
@item @code{STARPU_WORKERS_OPENCLID}
OpenCL equivalent of the @code{STARPU_WORKERS_CUDAID} environment variable.

This variable is ignored if the @code{use_explicit_workers_opencl_gpuid} flag of
the @code{starpu_conf} structure passed to @code{starpu_init} is set.
@item @code{STARPU_SINGLE_COMBINED_WORKER}
If set, StarPU will create several workers which won't be able to work
concurrently. It will create combined workers whose sizes range from 1 to the
total number of CPU workers in the system.
@item @code{SYNTHESIZE_ARITY_COMBINED_WORKER}
Let the user decide how many elements are allowed between combined workers
created from hwloc information. For instance, in the case of sockets with 6
cores without shared L2 caches, if @code{SYNTHESIZE_ARITY_COMBINED_WORKER} is
set to 6, no combined worker will be synthesized beyond one for the socket
and one per core. If it is set to 3, 3 intermediate combined workers will be
synthesized, to divide the socket cores into 3 chunks of 2 cores. If it is set
to 2, 2 intermediate combined workers will be synthesized, to divide the socket
cores into 2 chunks of 3 cores, and then 3 additional combined workers will be
synthesized, to divide the former synthesized workers into a bunch of 2 cores
and the remaining core (for which no combined worker is synthesized, since
there is already a normal worker for it).

The default, 2, thus makes StarPU tend to build binary trees of combined
workers.
@item @code{STARPU_DISABLE_ASYNCHRONOUS_COPY}
Disable asynchronous copies between CPU and GPU devices.
The AMD implementation of OpenCL is known to
fail when copying data asynchronously. When using this implementation,
it is therefore necessary to disable asynchronous data transfers.

@item @code{STARPU_DISABLE_ASYNCHRONOUS_CUDA_COPY}
Disable asynchronous copies between CPU and CUDA devices.

@item @code{STARPU_DISABLE_ASYNCHRONOUS_OPENCL_COPY}
Disable asynchronous copies between CPU and OpenCL devices.
The AMD implementation of OpenCL is known to
fail when copying data asynchronously. When using this implementation,
it is therefore necessary to disable asynchronous data transfers.
@item @code{STARPU_DISABLE_CUDA_GPU_GPU_DIRECT}
Disable direct CUDA transfers from GPU to GPU, and let CUDA copy through RAM
instead. This permits testing the performance effect of GPU-Direct.

@end table
@node Scheduling
@subsection Configuring the Scheduling engine

@table @code

@item @code{STARPU_SCHED}
Choose between the different scheduling policies proposed by StarPU:
work stealing, random, greedy, with performance models, etc.

Use @code{STARPU_SCHED=help} to get the list of available schedulers.
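
For instance, to select the @code{dmda} policy mentioned below (any
name listed by @code{STARPU_SCHED=help} can be used instead):

@smallexample
% STARPU_SCHED=dmda ./vector_scal
@end smallexample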
@item @code{STARPU_CALIBRATE}
If this variable is set to 1, the performance models are calibrated during
the execution. If it is set to 2, the previous values are dropped to restart
calibration from scratch. Setting this variable to 0 disables calibration,
which is the default behaviour.

Note: this currently only applies to the @code{dm}, @code{dmda} and
@code{heft} scheduling policies.
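
Restarting calibration from scratch with one of these policies can thus
be sketched as:

@smallexample
% STARPU_CALIBRATE=2 STARPU_SCHED=dmda ./vector_scal
@end smallexample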
@item @code{STARPU_BUS_CALIBRATE}
If this variable is set to 1, the bus is recalibrated during initialization.
@item @code{STARPU_PREFETCH}
@anchor{STARPU_PREFETCH}
This variable indicates whether data prefetching should be enabled (0 means
that it is disabled). If prefetching is enabled, when a task is scheduled to be
executed e.g. on a GPU, StarPU will request an asynchronous transfer in
advance, so that data is already present on the GPU when the task starts. As a
result, computation and data transfers are overlapped.
Note that prefetching is enabled by default in StarPU.
@item @code{STARPU_SCHED_ALPHA}
To estimate the cost of a task StarPU takes into account the estimated
computation time (obtained thanks to performance models). The alpha factor is
the coefficient to be applied to it before adding it to the communication part.

@item @code{STARPU_SCHED_BETA}
To estimate the cost of a task StarPU takes into account the estimated
data transfer time (obtained thanks to performance models). The beta factor is
the coefficient to be applied to it before adding it to the computation part.
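
Taken together, the cost estimated by the model-based policies can thus
be sketched as follows (the names @code{computation_time} and
@code{data_transfer_time} are illustrative, standing for the
performance-model estimations):

@smallexample
cost = STARPU_SCHED_ALPHA * computation_time
     + STARPU_SCHED_BETA  * data_transfer_time
@end smallexample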
@end table

@node Extensions
@subsection Extensions

@table @code
@item @code{SOCL_OCL_LIB_OPENCL}
The SOCL test suite is only run when the environment variable
@code{SOCL_OCL_LIB_OPENCL} is defined. It should contain the location
of the @file{libOpenCL.so} file of the OCL ICD implementation.
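
As a sketch, assuming the ICD's @file{libOpenCL.so} lives in
@file{/usr/lib} and that the test suite is run through the usual
@code{make check} target (both being assumptions to adapt):

@smallexample
% SOCL_OCL_LIB_OPENCL=/usr/lib/libOpenCL.so make check  # example path
@end smallexample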
@item @code{STARPU_COMM_STATS}
Communication statistics for starpumpi (@pxref{StarPU MPI support})
will be enabled when the environment variable @code{STARPU_COMM_STATS}
is defined. The statistics can also be enabled by configuring StarPU
with the option @code{--enable-comm-stats} (@pxref{enable-comm-stats}).

@end table
@node Misc
@subsection Miscellaneous and debug

@table @code
@item @code{STARPU_SILENT}
This variable allows verbose mode to be disabled at runtime when StarPU
has been configured with the option @code{--enable-verbose}.

@item @code{STARPU_LOGFILENAME}
This variable specifies in which file the debugging output should be saved.
@item @code{STARPU_FXT_PREFIX}
This variable specifies in which directory to save the trace generated
if FxT is enabled. It needs to have a trailing @code{/} character.
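
For instance (the directory is an example; note the mandatory trailing
slash):

@smallexample
% STARPU_FXT_PREFIX=/tmp/fxt_traces/ ./vector_scal  # example directory
@end smallexample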
@item @code{STARPU_LIMIT_GPU_MEM}
This variable specifies the maximum number of megabytes that should be
available to the application on each GPU. In case this value is smaller than
the size of the memory of a GPU, StarPU pre-allocates a buffer to waste the
remaining memory on the device. This variable is intended to be used for
experimental purposes as it emulates devices that have a limited amount of
memory.
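
For instance, to emulate GPUs exposing only 1024 megabytes to the
application (an illustrative value):

@smallexample
% STARPU_LIMIT_GPU_MEM=1024 ./vector_scal  # value in megabytes
@end smallexample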
@item @code{STARPU_GENERATE_TRACE}
When set to 1, this variable indicates that StarPU should automatically
generate a Paje trace when @code{starpu_shutdown} is called.

@end table