
@c -*-texinfo-*-
@c This file is part of the StarPU Handbook.
@c Copyright (C) 2009--2011 Universit@'e de Bordeaux 1
@c Copyright (C) 2010, 2011, 2012 Centre National de la Recherche Scientifique
@c Copyright (C) 2011, 2012 Institut National de Recherche en Informatique et Automatique
@c See the file starpu.texi for copying conditions.

@menu
* Compilation configuration::
* Execution configuration through environment variables::
@end menu

@node Compilation configuration
@section Compilation configuration

The following arguments can be given to the @code{configure} script.

@menu
* Common configuration::
* Configuring workers::
* Extension configuration::
* Advanced configuration::
@end menu

@node Common configuration
@subsection Common configuration

@defvr {Configure option} --enable-debug
Enable debugging messages.
@end defvr
@defvr {Configure option} --enable-fast
Disable assertion checks, which saves computation time.
@end defvr
@defvr {Configure option} --enable-verbose
Increase the verbosity of the debugging messages. This can be disabled
at runtime by setting the environment variable @code{STARPU_SILENT} to
any value.
@smallexample
% STARPU_SILENT=1 ./vector_scal
@end smallexample
@end defvr
@defvr {Configure option} --enable-coverage
Enable flags for the @code{gcov} coverage tool.
@end defvr
@defvr {Configure option} --enable-quick-check
Specify that tests and examples should be run on a smaller data set, i.e.
allowing a faster execution time.
@end defvr
@defvr {Configure option} --with-hwloc
Specify that hwloc should be used by StarPU. hwloc should be found by
means of the @code{pkg-config} tool.
@end defvr
@defvr {Configure option} --with-hwloc=@var{prefix}
Specify that hwloc should be used by StarPU. hwloc should be found in the
directory specified by @var{prefix}.
@end defvr
@defvr {Configure option} --without-hwloc
Specify that hwloc should not be used by StarPU.
@end defvr
Additionally, the @command{configure} script recognizes many variables, which
can be listed by typing @code{./configure --help}. For example,
@code{./configure NVCCFLAGS="-arch sm_13"} adds a flag for the compilation of
CUDA kernels.
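As a sketch, such a variable can be combined with the options above on a
single @command{configure} invocation; the hwloc installation path below is
only a placeholder and should be adapted to the actual installation.
@smallexample
% ./configure --with-hwloc=/opt/hwloc NVCCFLAGS="-arch sm_13"
@end smallexample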
@node Configuring workers
@subsection Configuring workers
@defvr {Configure option} --enable-maxcpus=@var{count}
Use at most @var{count} CPU cores. This information is then
available as the @code{STARPU_MAXCPUS} macro.
@end defvr
@defvr {Configure option} --disable-cpu
Disable the use of the CPUs of the machine. Only GPUs etc. will be used.
@end defvr
@defvr {Configure option} --enable-maxcudadev=@var{count}
Use at most @var{count} CUDA devices. This information is then
available as the @code{STARPU_MAXCUDADEVS} macro.
@end defvr
@defvr {Configure option} --disable-cuda
Disable the use of CUDA, even if a valid CUDA installation was detected.
@end defvr
@defvr {Configure option} --with-cuda-dir=@var{prefix}
Search for CUDA under @var{prefix}, which should notably contain
@file{include/cuda.h}.
@end defvr
@defvr {Configure option} --with-cuda-include-dir=@var{dir}
Search for CUDA headers under @var{dir}, which should
notably contain @file{cuda.h}. This defaults to @code{/include} appended to the
value given to @code{--with-cuda-dir}.
@end defvr
@defvr {Configure option} --with-cuda-lib-dir=@var{dir}
Search for CUDA libraries under @var{dir}, which should notably contain
the CUDA shared libraries---e.g., @file{libcuda.so}. This defaults to
@code{/lib} appended to the value given to @code{--with-cuda-dir}.
@end defvr
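For instance, a CUDA installation can be pointed to as follows; the prefix
shown is only an illustrative path.
@smallexample
% ./configure --with-cuda-dir=/usr/local/cuda
@end smallexample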
@defvr {Configure option} --disable-cuda-memcpy-peer
Explicitly disable peer transfers when using CUDA 4.0.
@end defvr
@defvr {Configure option} --enable-maxopencldev=@var{count}
Use at most @var{count} OpenCL devices. This information is then
available as the @code{STARPU_MAXOPENCLDEVS} macro.
@end defvr
@defvr {Configure option} --disable-opencl
Disable the use of OpenCL, even if the SDK is detected.
@end defvr
@defvr {Configure option} --with-opencl-dir=@var{prefix}
Search for an OpenCL implementation under @var{prefix}, which should
notably contain @file{include/CL/cl.h} (or @file{include/OpenCL/cl.h} on
Mac OS).
@end defvr
@defvr {Configure option} --with-opencl-include-dir=@var{dir}
Search for OpenCL headers under @var{dir}, which should notably contain
@file{CL/cl.h} (or @file{OpenCL/cl.h} on Mac OS). This defaults to
@code{/include} appended to the value given to @code{--with-opencl-dir}.
@end defvr
@defvr {Configure option} --with-opencl-lib-dir=@var{dir}
Search for an OpenCL library under @var{dir}, which should notably
contain the OpenCL shared libraries---e.g. @file{libOpenCL.so}. This defaults to
@code{/lib} appended to the value given to @code{--with-opencl-dir}.
@end defvr
@defvr {Configure option} --enable-maximplementations=@var{count}
Allow for at most @var{count} codelet implementations for the same
target device. This information is then available as the
@code{STARPU_MAXIMPLEMENTATIONS} macro.
@end defvr
@defvr {Configure option} --disable-asynchronous-copy
Disable asynchronous copies between CPU and GPU devices.
The AMD implementation of OpenCL is known to
fail when copying data asynchronously. When using this implementation,
it is therefore necessary to disable asynchronous data transfers.
@end defvr
@defvr {Configure option} --disable-asynchronous-cuda-copy
Disable asynchronous copies between CPU and CUDA devices.
@end defvr
@defvr {Configure option} --disable-asynchronous-opencl-copy
Disable asynchronous copies between CPU and OpenCL devices.
The AMD implementation of OpenCL is known to
fail when copying data asynchronously. When using this implementation,
it is therefore necessary to disable asynchronous data transfers.
@end defvr
@node Extension configuration
@subsection Extension configuration
@defvr {Configure option} --disable-socl
Disable the SOCL extension (@pxref{SOCL OpenCL Extensions}). By
default, it is enabled when an OpenCL implementation is found.
@end defvr
@defvr {Configure option} --disable-starpu-top
Disable the StarPU-Top interface (@pxref{StarPU-Top}). By default, it
is enabled when the required dependencies are found.
@end defvr
@defvr {Configure option} --disable-gcc-extensions
Disable the GCC plug-in (@pxref{C Extensions}). By default, it is
enabled when the GCC compiler provides plug-in support.
@end defvr
@defvr {Configure option} --with-mpicc=@var{path}
Use the @command{mpicc} compiler at @var{path}, for starpumpi
(@pxref{StarPU MPI support}).
@end defvr
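For example, a specific MPI compiler wrapper can be selected at configure
time; the path below is only a placeholder for the actual @command{mpicc}
location.
@smallexample
% ./configure --with-mpicc=/usr/bin/mpicc
@end smallexample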
@node Advanced configuration
@subsection Advanced configuration
@defvr {Configure option} --enable-perf-debug
Enable performance debugging through gprof.
@end defvr
@defvr {Configure option} --enable-model-debug
Enable performance model debugging.
@end defvr
@defvr {Configure option} --enable-stats
@c see ../../src/datawizard/datastats.c
Enable gathering of memory transfer statistics.
@end defvr
@defvr {Configure option} --enable-maxbuffers
Define the maximum number of buffers that tasks will be able to take
as parameters, then available as the @code{STARPU_NMAXBUFS} macro.
@end defvr
@defvr {Configure option} --enable-allocation-cache
Enable the use of a data allocation cache to avoid the cost of repeated
allocations with CUDA. Still experimental.
@end defvr
@defvr {Configure option} --enable-opengl-render
Enable the use of OpenGL for the rendering of some examples.
@c TODO: rather default to enabled when detected
@end defvr
@defvr {Configure option} --enable-blas-lib
Specify the BLAS library to be used by some of the examples. The
library has to be 'atlas' or 'goto'.
@end defvr
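For instance, one of the two supported BLAS libraries can be selected like
this (a sketch; the value must be one of the two names above):
@smallexample
% ./configure --enable-blas-lib=atlas
@end smallexample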
@defvr {Configure option} --disable-starpufft
Disable the build of libstarpufft, even if FFTW or cuFFT is available.
@end defvr
@defvr {Configure option} --with-magma=@var{prefix}
Search for MAGMA under @var{prefix}. @var{prefix} should notably
contain @file{include/magmablas.h}.
@end defvr
@defvr {Configure option} --with-fxt=@var{prefix}
Search for FxT under @var{prefix}.
@url{http://savannah.nongnu.org/projects/fkt, FxT} is used to generate
traces of scheduling events, which can then be rendered using ViTE
(@pxref{Off-line, off-line performance feedback}). @var{prefix} should
notably contain @file{include/fxt/fxt.h}.
@end defvr
@defvr {Configure option} --with-perf-model-dir=@var{dir}
Store performance models under @var{dir}, instead of the current user's
home directory.
@end defvr
@defvr {Configure option} --with-goto-dir=@var{prefix}
Search for GotoBLAS under @var{prefix}, which should notably contain @file{libgoto.so} or @file{libgoto2.so}.
@end defvr
@defvr {Configure option} --with-atlas-dir=@var{prefix}
Search for ATLAS under @var{prefix}, which should notably contain
@file{include/cblas.h}.
@end defvr
@defvr {Configure option} --with-mkl-cflags=@var{cflags}
Use @var{cflags} to compile code that uses the MKL library.
@end defvr
@defvr {Configure option} --with-mkl-ldflags=@var{ldflags}
Use @var{ldflags} when linking code that uses the MKL library. Note
that the
@url{http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/,
MKL website} provides a script to determine the linking flags.
@end defvr
@defvr {Configure option} --disable-build-examples
Disable the build of examples.
@end defvr
@node Execution configuration through environment variables
@section Execution configuration through environment variables
@menu
* Workers:: Configuring workers
* Scheduling:: Configuring the Scheduling engine
* Extensions::
* Misc:: Miscellaneous and debug
@end menu
@node Workers
@subsection Configuring workers
@defvr {Environment variable} STARPU_NCPU
Specify the number of CPU workers (thus not including workers dedicated to
controlling accelerators). Note that by default, StarPU will not allocate
more CPU workers than there are physical CPUs, and that some CPUs are used to control
the accelerators.
@end defvr
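As a usage sketch, the number of CPU workers can be limited for a given run;
the count is arbitrary and the program name reuses the example used earlier
in this section.
@smallexample
% STARPU_NCPU=4 ./vector_scal
@end smallexample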
@defvr {Environment variable} STARPU_NCUDA
Specify the number of CUDA devices that StarPU can use. If
@code{STARPU_NCUDA} is lower than the number of physical devices, it is
possible to select which CUDA devices should be used by means of the
@code{STARPU_WORKERS_CUDAID} environment variable. By default, StarPU will
create as many CUDA workers as there are CUDA devices.
@end defvr
@defvr {Environment variable} STARPU_NOPENCL
OpenCL equivalent of the @code{STARPU_NCUDA} environment variable.
@end defvr
@defvr {Environment variable} STARPU_WORKERS_NOBIND
Setting it to non-zero will prevent StarPU from binding its threads to
CPUs. This is for instance useful when running the testsuite in parallel.
@end defvr
@defvr {Environment variable} STARPU_WORKERS_CPUID
Passing an array of integers (starting from 0) in @code{STARPU_WORKERS_CPUID}
specifies on which logical CPU the different workers should be
bound. For instance, if @code{STARPU_WORKERS_CPUID = "0 1 4 5"}, the first
worker will be bound to logical CPU #0, the second CPU worker will be bound to
logical CPU #1 and so on. Note that the logical ordering of the CPUs is either
determined by the OS, or provided by the @code{hwloc} library in case it is
available.

Note that the first workers correspond to the CUDA workers, then come the
OpenCL workers, and finally the CPU workers. For example if
we have @code{STARPU_NCUDA=1}, @code{STARPU_NOPENCL=1}, @code{STARPU_NCPU=2}
and @code{STARPU_WORKERS_CPUID = "0 2 1 3"}, the CUDA device will be controlled
by logical CPU #0, the OpenCL device will be controlled by logical CPU #2, and
the logical CPUs #1 and #3 will be used by the CPU workers.

If the number of workers is larger than the array given in
@code{STARPU_WORKERS_CPUID}, the workers are bound to the logical CPUs in a
round-robin fashion: if @code{STARPU_WORKERS_CPUID = "0 1"}, the first and the
third (resp. second and fourth) workers will be put on CPU #0 (resp. CPU #1).

This variable is ignored if the @code{use_explicit_workers_bindid} flag of the
@code{starpu_conf} structure passed to @code{starpu_init} is set.
@end defvr
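As a concrete sketch of the mapping described above (the application name is
only a placeholder):
@smallexample
% STARPU_NCUDA=1 STARPU_NOPENCL=1 STARPU_NCPU=2 STARPU_WORKERS_CPUID="0 2 1 3" ./my_app
@end smallexample
With this setting, the CUDA worker runs on logical CPU #0, the OpenCL worker
on logical CPU #2, and the two CPU workers on logical CPUs #1 and #3.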
@defvr {Environment variable} STARPU_WORKERS_CUDAID
Similarly to the @code{STARPU_WORKERS_CPUID} environment variable, it is
possible to select which CUDA devices should be used by StarPU. On a machine
equipped with 4 GPUs, setting @code{STARPU_WORKERS_CUDAID = "1 3"} and
@code{STARPU_NCUDA=2} specifies that 2 CUDA workers should be created, and that
they should use CUDA devices #1 and #3 (the logical ordering of the devices is
the one reported by CUDA).

This variable is ignored if the @code{use_explicit_workers_cuda_gpuid} flag of
the @code{starpu_conf} structure passed to @code{starpu_init} is set.
@end defvr
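The setting above corresponds to the following invocation (a sketch; the
application name is only a placeholder):
@smallexample
% STARPU_NCUDA=2 STARPU_WORKERS_CUDAID="1 3" ./my_app
@end smallexample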
@defvr {Environment variable} STARPU_WORKERS_OPENCLID
OpenCL equivalent of the @code{STARPU_WORKERS_CUDAID} environment variable.
This variable is ignored if the @code{use_explicit_workers_opencl_gpuid} flag of
the @code{starpu_conf} structure passed to @code{starpu_init} is set.
@end defvr
@defvr {Environment variable} STARPU_SINGLE_COMBINED_WORKER
If set, StarPU will create several workers which won't be able to work
concurrently. It will create combined workers whose size goes from 1 to the
total number of CPU workers in the system.
@end defvr
@defvr {Environment variable} SYNTHESIZE_ARITY_COMBINED_WORKER
Let the user decide how many elements are allowed between combined workers
created from hwloc information. For instance, in the case of sockets with 6
cores without shared L2 caches, if @code{SYNTHESIZE_ARITY_COMBINED_WORKER} is
set to 6, no combined worker will be synthesized beyond one for the socket
and one per core. If it is set to 3, 3 intermediate combined workers will be
synthesized, to divide the socket cores into 3 chunks of 2 cores. If it is set
to 2, 2 intermediate combined workers will be synthesized, to divide the socket
cores into 2 chunks of 3 cores, and then 3 additional combined workers will be
synthesized, to divide the former synthesized workers into a bunch of 2 cores,
and the remaining core (for which no combined worker is synthesized since there
is already a normal worker for it).

The default, 2, thus makes StarPU tend to build binary trees of combined
workers.
@end defvr
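A possible way to experiment with this parameter on a given application (the
value and the program name are only placeholders):
@smallexample
% SYNTHESIZE_ARITY_COMBINED_WORKER=3 ./my_app
@end smallexample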
@defvr {Environment variable} STARPU_DISABLE_ASYNCHRONOUS_COPY
Disable asynchronous copies between CPU and GPU devices.
The AMD implementation of OpenCL is known to
fail when copying data asynchronously. When using this implementation,
it is therefore necessary to disable asynchronous data transfers.
@end defvr
@defvr {Environment variable} STARPU_DISABLE_ASYNCHRONOUS_CUDA_COPY
Disable asynchronous copies between CPU and CUDA devices.
@end defvr
@defvr {Environment variable} STARPU_DISABLE_ASYNCHRONOUS_OPENCL_COPY
Disable asynchronous copies between CPU and OpenCL devices.
The AMD implementation of OpenCL is known to
fail when copying data asynchronously. When using this implementation,
it is therefore necessary to disable asynchronous data transfers.
@end defvr
@defvr {Environment variable} STARPU_DISABLE_CUDA_GPU_GPU_DIRECT
Disable direct CUDA transfers from GPU to GPU, and let CUDA copy through RAM
instead. This permits testing the performance effect of GPU-Direct.
@end defvr
@node Scheduling
@subsection Configuring the Scheduling engine
@defvr {Environment variable} STARPU_SCHED
Choose between the different scheduling policies proposed by StarPU: random,
work stealing, greedy, with performance models, etc.
Use @code{STARPU_SCHED=help} to get the list of available schedulers.
@end defvr
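For example, one of the performance-model-based policies mentioned below can
be selected for a run (a sketch; @code{dmda} is one of the policy names
reported by @code{STARPU_SCHED=help}):
@smallexample
% STARPU_SCHED=dmda ./vector_scal
@end smallexample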
@defvr {Environment variable} STARPU_CALIBRATE
If this variable is set to 1, the performance models are calibrated during
the execution. If it is set to 2, the previous values are dropped to restart
calibration from scratch. Setting this variable to 0 disables calibration;
this is the default behaviour.
Note: this currently only applies to the @code{dm}, @code{dmda} and @code{heft} scheduling policies.
@end defvr
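For instance, a run that restarts calibration from scratch with one of these
policies might look as follows (a sketch reusing the example program above):
@smallexample
% STARPU_SCHED=dmda STARPU_CALIBRATE=2 ./vector_scal
@end smallexample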
@defvr {Environment variable} STARPU_BUS_CALIBRATE
If this variable is set to 1, the bus is recalibrated during initialization.
@end defvr
@defvr {Environment variable} STARPU_PREFETCH
@anchor{STARPU_PREFETCH}
This variable indicates whether data prefetching should be enabled (0 means
that it is disabled). If prefetching is enabled, when a task is scheduled to be
executed e.g. on a GPU, StarPU will request an asynchronous transfer in
advance, so that data is already present on the GPU when the task starts. As a
result, computation and data transfers are overlapped.
Note that prefetching is enabled by default in StarPU.
@end defvr
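Since prefetching is on by default, disabling it for a comparison run is
simply a matter of setting the variable to 0 (sketch):
@smallexample
% STARPU_PREFETCH=0 ./vector_scal
@end smallexample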
@defvr {Environment variable} STARPU_SCHED_ALPHA
To estimate the cost of a task StarPU takes into account the estimated
computation time (obtained thanks to performance models). The alpha factor is
the coefficient to be applied to it before adding it to the communication part.
@end defvr
@defvr {Environment variable} STARPU_SCHED_BETA
To estimate the cost of a task StarPU takes into account the estimated
data transfer time (obtained thanks to performance models). The beta factor is
the coefficient to be applied to it before adding it to the computation part.
@end defvr
@node Extensions
@subsection Extensions
@defvr {Environment variable} SOCL_OCL_LIB_OPENCL
The SOCL test suite is only run when the environment variable
@code{SOCL_OCL_LIB_OPENCL} is defined. It should contain the location
of the @file{libOpenCL.so} file of the OCL ICD implementation.
@end defvr
@defvr {Environment variable} STARPU_COMM_STATS
@anchor{STARPU_COMM_STATS}
Communication statistics for starpumpi (@pxref{StarPU MPI support})
will be enabled when the environment variable @code{STARPU_COMM_STATS}
is defined to a value other than 0.
@end defvr
@defvr {Environment variable} STARPU_MPI_CACHE
@anchor{STARPU_MPI_CACHE}
The communication cache for starpumpi (@pxref{StarPU MPI support}) will be
disabled when the environment variable @code{STARPU_MPI_CACHE} is set
to 0. It is enabled by default and for any other value of the variable
@code{STARPU_MPI_CACHE}.
@end defvr
@node Misc
@subsection Miscellaneous and debug
@defvr {Environment variable} STARPU_SILENT
This variable allows the verbose mode to be disabled at runtime when StarPU
has been configured with the option @code{--enable-verbose}. It also
disables the display of StarPU information and warning messages.
@end defvr
@defvr {Environment variable} STARPU_LOGFILENAME
This variable specifies the file in which the debugging output should be saved.
@end defvr
@defvr {Environment variable} STARPU_FXT_PREFIX
This variable specifies the directory in which to save the trace generated when FxT is enabled. It needs to have a trailing '/' character.
@end defvr
@defvr {Environment variable} STARPU_LIMIT_GPU_MEM
This variable specifies the maximum number of megabytes that should be
available to the application on each GPU. In case this value is smaller than
the size of the memory of a GPU, StarPU pre-allocates a buffer to waste the
remaining memory on the device. This variable is intended to be used for
experimental purposes as it emulates devices that have a limited amount of
memory.
@end defvr
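For example, to emulate a device with roughly 1 GiB of available memory (the
figure is arbitrary and given only as an illustration):
@smallexample
% STARPU_LIMIT_GPU_MEM=1024 ./vector_scal
@end smallexample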
@defvr {Environment variable} STARPU_GENERATE_TRACE
When set to 1, this variable indicates that StarPU should automatically
generate a Paje trace when @code{starpu_shutdown} is called.
@end defvr