
@c -*-texinfo-*-
@c This file is part of the StarPU Handbook.
@c Copyright (C) 2009--2011 Universit@'e de Bordeaux 1
@c Copyright (C) 2010, 2011, 2012 Centre National de la Recherche Scientifique
@c Copyright (C) 2011 Institut National de Recherche en Informatique et Automatique
@c See the file starpu.texi for copying conditions.

@node Configuring StarPU
@chapter Configuring StarPU

@menu
* Compilation configuration::
* Execution configuration through environment variables::
@end menu

@node Compilation configuration
@section Compilation configuration

The following arguments can be given to the @code{configure} script.
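For instance, several of the options described below can be combined in a single invocation (the CUDA path is hypothetical and depends on the local installation):

@smallexample
% ./configure --enable-maxcpus=64 --with-cuda-dir=/usr/local/cuda
@end smallexample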
@menu
* Common configuration::
* Configuring workers::
* Advanced configuration::
@end menu

@node Common configuration
@subsection Common configuration

@menu
* --enable-debug::
* --enable-fast::
* --enable-verbose::
* --enable-coverage::
@end menu

@node --enable-debug
@subsubsection @code{--enable-debug}

Enable debugging messages.

@node --enable-fast
@subsubsection @code{--enable-fast}
Do not enforce assertions; this saves the significant time otherwise spent checking them.
@node --enable-verbose
@subsubsection @code{--enable-verbose}

Augment the verbosity of the debugging messages. This can be disabled
at runtime by setting the environment variable @code{STARPU_SILENT} to
any value.

@smallexample
% STARPU_SILENT=1 ./vector_scal
@end smallexample

@node --enable-coverage
@subsubsection @code{--enable-coverage}

Enable flags for the @code{gcov} coverage tool.

@node Configuring workers
@subsection Configuring workers

@menu
* --enable-maxcpus::
* --disable-cpu::
* --enable-maxcudadev::
* --disable-cuda::
* --with-cuda-dir::
* --with-cuda-include-dir::
* --with-cuda-lib-dir::
* --disable-cuda-memcpy-peer::
* --enable-maxopencldev::
* --disable-opencl::
* --with-opencl-dir::
* --with-opencl-include-dir::
* --with-opencl-lib-dir::
* --enable-gordon::
* --with-gordon-dir::
* --enable-maximplementations::
@end menu

@node --enable-maxcpus
@subsubsection @code{--enable-maxcpus=<number>}

Define the maximum number of CPU cores that StarPU will support, then
available as the @code{STARPU_MAXCPUS} macro.

@node --disable-cpu
@subsubsection @code{--disable-cpu}

Disable the use of CPUs of the machine. Only GPUs etc. will be used.

@node --enable-maxcudadev
@subsubsection @code{--enable-maxcudadev=<number>}

Define the maximum number of CUDA devices that StarPU will support, then
available as the @code{STARPU_MAXCUDADEVS} macro.

@node --disable-cuda
@subsubsection @code{--disable-cuda}

Disable the use of CUDA, even if a valid CUDA installation was detected.

@node --with-cuda-dir
@subsubsection @code{--with-cuda-dir=<path>}

Specify the directory where CUDA is installed. This directory should notably contain
@code{include/cuda.h}.

@node --with-cuda-include-dir
@subsubsection @code{--with-cuda-include-dir=<path>}

Specify the directory where CUDA headers are installed. This directory should
notably contain @code{cuda.h}. This defaults to @code{/include} appended to the
value given to @code{--with-cuda-dir}.

@node --with-cuda-lib-dir
@subsubsection @code{--with-cuda-lib-dir=<path>}

Specify the directory where the CUDA library is installed. This directory should
notably contain the CUDA shared libraries (e.g. @code{libcuda.so}). This defaults to
@code{/lib} appended to the value given to @code{--with-cuda-dir}.
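As a sketch, for a CUDA installation whose headers and libraries live in separate, non-standard directories (the paths below are hypothetical), the two options can be combined:

@smallexample
% ./configure --with-cuda-include-dir=/opt/cuda/include \
              --with-cuda-lib-dir=/opt/cuda/lib64
@end smallexample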
@node --disable-cuda-memcpy-peer
@subsubsection @code{--disable-cuda-memcpy-peer}

Explicitly disable peer transfers when using CUDA 4.0.
@node --enable-maxopencldev
@subsubsection @code{--enable-maxopencldev=<number>}

Define the maximum number of OpenCL devices that StarPU will support, then
available as the @code{STARPU_MAXOPENCLDEVS} macro.

@node --disable-opencl
@subsubsection @code{--disable-opencl}

Disable the use of OpenCL, even if the SDK is detected.

@node --with-opencl-dir
@subsubsection @code{--with-opencl-dir=<path>}

Specify the location of the OpenCL SDK. This directory should notably contain
@code{include/CL/cl.h} (or @code{include/OpenCL/cl.h} on Mac OS).

@node --with-opencl-include-dir
@subsubsection @code{--with-opencl-include-dir=<path>}

Specify the location of OpenCL headers. This directory should notably contain
@code{CL/cl.h} (or @code{OpenCL/cl.h} on Mac OS). This defaults to
@code{/include} appended to the value given to @code{--with-opencl-dir}.

@node --with-opencl-lib-dir
@subsubsection @code{--with-opencl-lib-dir=<path>}

Specify the location of the OpenCL library. This directory should notably
contain the OpenCL shared libraries (e.g. @code{libOpenCL.so}). This defaults to
@code{/lib} appended to the value given to @code{--with-opencl-dir}.

@node --enable-gordon
@subsubsection @code{--enable-gordon}

Enable the use of the Gordon runtime for Cell SPUs.
@c TODO: rather default to enabled when detected

@node --with-gordon-dir
@subsubsection @code{--with-gordon-dir=<path>}

Specify the location of the Gordon SDK.
@node --enable-maximplementations
@subsubsection @code{--enable-maximplementations=<number>}

Define the maximum number of implementations that can be defined for a single kind of
device. It is then available as the @code{STARPU_MAXIMPLEMENTATIONS} macro.
@node Advanced configuration
@subsection Advanced configuration

@menu
* --enable-perf-debug::
* --enable-model-debug::
* --enable-stats::
* --enable-maxbuffers::
* --enable-allocation-cache::
* --enable-opengl-render::
* --enable-blas-lib::
* --with-magma::
* --with-fxt::
* --with-perf-model-dir::
* --with-mpicc::
* --with-goto-dir::
* --with-atlas-dir::
* --with-mkl-cflags::
* --with-mkl-ldflags::
@end menu

@node --enable-perf-debug
@subsubsection @code{--enable-perf-debug}

Enable performance debugging through gprof.

@node --enable-model-debug
@subsubsection @code{--enable-model-debug}

Enable performance model debugging.

@node --enable-stats
@subsubsection @code{--enable-stats}

Enable statistics.

@node --enable-maxbuffers
@subsubsection @code{--enable-maxbuffers=<nbuffers>}

Define the maximum number of buffers that tasks will be able to take
as parameters, then available as the @code{STARPU_NMAXBUFS} macro.

@node --enable-allocation-cache
@subsubsection @code{--enable-allocation-cache}
Enable the use of a data allocation cache, to avoid the cost of repeated
memory allocation and deallocation with CUDA. Still experimental.
@node --enable-opengl-render
@subsubsection @code{--enable-opengl-render}

Enable the use of OpenGL for the rendering of some examples.
@c TODO: rather default to enabled when detected

@node --enable-blas-lib
@subsubsection @code{--enable-blas-lib=<name>}
Specify the BLAS library to be used by some of the examples. The
library must be either @code{atlas} or @code{goto}.
@node --with-magma
@subsubsection @code{--with-magma=<path>}

Specify where magma is installed. This directory should notably contain
@code{include/magmablas.h}.

@node --with-fxt
@subsubsection @code{--with-fxt=<path>}

Specify the location of FxT (for generating traces and rendering them
using ViTE). This directory should notably contain
@code{include/fxt/fxt.h}.
@c TODO add ref to other section

@node --with-perf-model-dir
@subsubsection @code{--with-perf-model-dir=<dir>}

Specify where performance models should be stored (instead of defaulting to the
current user's home).

@node --with-mpicc
@subsubsection @code{--with-mpicc=<path to mpicc>}

Specify the location of the @code{mpicc} compiler to be used for starpumpi.

@node --with-goto-dir
@subsubsection @code{--with-goto-dir=<dir>}

Specify the location of GotoBLAS.

@node --with-atlas-dir
@subsubsection @code{--with-atlas-dir=<dir>}

Specify the location of ATLAS. This directory should notably contain
@code{include/cblas.h}.

@node --with-mkl-cflags
@subsubsection @code{--with-mkl-cflags=<cflags>}

Specify the compilation flags for the MKL Library.

@node --with-mkl-ldflags
@subsubsection @code{--with-mkl-ldflags=<ldflags>}

Specify the linking flags for the MKL Library. Note that the
@url{http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/}
website provides a script to determine the linking flags.
@node Execution configuration through environment variables
@section Execution configuration through environment variables

@menu
* Workers:: Configuring workers
* Scheduling:: Configuring the Scheduling engine
* Misc:: Miscellaneous and debug
@end menu
Note: the values given in the @code{starpu_conf} structure passed when
calling @code{starpu_init} override the values of these environment
variables.
@node Workers
@subsection Configuring workers

@menu
* STARPU_NCPUS:: Number of CPU workers
* STARPU_NCUDA:: Number of CUDA workers
* STARPU_NOPENCL:: Number of OpenCL workers
* STARPU_NGORDON:: Number of SPU workers (Cell)
* STARPU_WORKERS_CPUID:: Bind workers to specific CPUs
* STARPU_WORKERS_CUDAID:: Select specific CUDA devices
* STARPU_WORKERS_OPENCLID:: Select specific OpenCL devices
@end menu

@node STARPU_NCPUS
@subsubsection @code{STARPU_NCPUS} -- Number of CPU workers
Specify the number of CPU workers (thus not including the workers dedicated to
controlling accelerators). Note that by default, StarPU will not allocate more
CPU workers than there are physical CPUs, and that some CPUs are used to
control the accelerators.
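For example, to restrict the @code{vector_scal} example shown earlier to 2 CPU workers:

@smallexample
% STARPU_NCPUS=2 ./vector_scal
@end smallexample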
@node STARPU_NCUDA
@subsubsection @code{STARPU_NCUDA} -- Number of CUDA workers

Specify the number of CUDA devices that StarPU can use. If
@code{STARPU_NCUDA} is lower than the number of physical devices, it is
possible to select which CUDA devices should be used by the means of the
@code{STARPU_WORKERS_CUDAID} environment variable. By default, StarPU will
create as many CUDA workers as there are CUDA devices.

@node STARPU_NOPENCL
@subsubsection @code{STARPU_NOPENCL} -- Number of OpenCL workers

OpenCL equivalent of the @code{STARPU_NCUDA} environment variable.

@node STARPU_NGORDON
@subsubsection @code{STARPU_NGORDON} -- Number of SPU workers (Cell)

Specify the number of SPUs that StarPU can use.

@node STARPU_WORKERS_CPUID
@subsubsection @code{STARPU_WORKERS_CPUID} -- Bind workers to specific CPUs

Passing an array of integers (starting from 0) in @code{STARPU_WORKERS_CPUID}
specifies on which logical CPU the different workers should be
bound. For instance, if @code{STARPU_WORKERS_CPUID = "0 1 4 5"}, the first
worker will be bound to logical CPU #0, the second CPU worker will be bound to
logical CPU #1 and so on. Note that the logical ordering of the CPUs is either
determined by the OS, or provided by the @code{hwloc} library in case it is
available.

Note that the first workers correspond to the CUDA workers, then come the
OpenCL and the SPU, and finally the CPU workers. For example if
we have @code{STARPU_NCUDA=1}, @code{STARPU_NOPENCL=1}, @code{STARPU_NCPUS=2}
and @code{STARPU_WORKERS_CPUID = "0 2 1 3"}, the CUDA device will be controlled
by logical CPU #0, the OpenCL device will be controlled by logical CPU #2, and
the logical CPUs #1 and #3 will be used by the CPU workers.

If the number of workers is larger than the array given in
@code{STARPU_WORKERS_CPUID}, the workers are bound to the logical CPUs in a
round-robin fashion: if @code{STARPU_WORKERS_CPUID = "0 1"}, the first and the
third (resp. second and fourth) workers will be put on CPU #0 (resp. CPU #1).

This variable is ignored if the @code{use_explicit_workers_bindid} flag of the
@code{starpu_conf} structure passed to @code{starpu_init} is set.
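The binding example described above can be reproduced on the command line as follows (reusing the @code{vector_scal} example):

@smallexample
% STARPU_NCUDA=1 STARPU_NOPENCL=1 STARPU_NCPUS=2 \
  STARPU_WORKERS_CPUID="0 2 1 3" ./vector_scal
@end smallexample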
@node STARPU_WORKERS_CUDAID
@subsubsection @code{STARPU_WORKERS_CUDAID} -- Select specific CUDA devices

Similarly to the @code{STARPU_WORKERS_CPUID} environment variable, it is
possible to select which CUDA devices should be used by StarPU. On a machine
equipped with 4 GPUs, setting @code{STARPU_WORKERS_CUDAID = "1 3"} and
@code{STARPU_NCUDA=2} specifies that 2 CUDA workers should be created, and that
they should use CUDA devices #1 and #3 (the logical ordering of the devices is
the one reported by CUDA).

This variable is ignored if the @code{use_explicit_workers_cuda_gpuid} flag of
the @code{starpu_conf} structure passed to @code{starpu_init} is set.
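On the command line, the 4-GPU scenario described above would thus read:

@smallexample
% STARPU_NCUDA=2 STARPU_WORKERS_CUDAID="1 3" ./vector_scal
@end smallexample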
@node STARPU_WORKERS_OPENCLID
@subsubsection @code{STARPU_WORKERS_OPENCLID} -- Select specific OpenCL devices

OpenCL equivalent of the @code{STARPU_WORKERS_CUDAID} environment variable.

This variable is ignored if the @code{use_explicit_workers_opencl_gpuid} flag of
the @code{starpu_conf} structure passed to @code{starpu_init} is set.

@node Scheduling
@subsection Configuring the Scheduling engine

@menu
* STARPU_SCHED:: Scheduling policy
* STARPU_CALIBRATE:: Calibrate performance models
* STARPU_PREFETCH:: Use data prefetch
* STARPU_SCHED_ALPHA:: Computation factor
* STARPU_SCHED_BETA:: Communication factor
@end menu

@node STARPU_SCHED
@subsubsection @code{STARPU_SCHED} -- Scheduling policy
Choose among the different scheduling policies proposed by StarPU: work
stealing, random, greedy, policies based on performance models, etc.

Use @code{STARPU_SCHED=help} to get the list of available schedulers.
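For instance, to list the available policies and then run with the @code{dmda} policy (mentioned below):

@smallexample
% STARPU_SCHED=help ./vector_scal
% STARPU_SCHED=dmda ./vector_scal
@end smallexample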
@node STARPU_CALIBRATE
@subsubsection @code{STARPU_CALIBRATE} -- Calibrate performance models

If this variable is set to 1, the performance models are calibrated during
the execution. If it is set to 2, the previous values are dropped so that
calibration restarts from scratch. Setting this variable to 0 disables
calibration; this is the default behaviour.

Note: this currently only applies to the @code{dm}, @code{dmda} and @code{heft} scheduling policies.
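For instance, to drop the previous values and recalibrate the models from scratch while using the @code{dmda} policy:

@smallexample
% STARPU_CALIBRATE=2 STARPU_SCHED=dmda ./vector_scal
@end smallexample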
@node STARPU_PREFETCH
@subsubsection @code{STARPU_PREFETCH} -- Use data prefetch

This variable indicates whether data prefetching should be enabled (0 means
that it is disabled). If prefetching is enabled, when a task is scheduled to be
executed e.g. on a GPU, StarPU will request an asynchronous transfer in
advance, so that data is already present on the GPU when the task starts. As a
result, computation and data transfers are overlapped.

Note that prefetching is enabled by default in StarPU.
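To disable it explicitly, e.g. to measure the impact of transfer overlap:

@smallexample
% STARPU_PREFETCH=0 ./vector_scal
@end smallexample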
@node STARPU_SCHED_ALPHA
@subsubsection @code{STARPU_SCHED_ALPHA} -- Computation factor

To estimate the cost of a task StarPU takes into account the estimated
computation time (obtained thanks to performance models). The alpha factor is
the coefficient to be applied to it before adding it to the communication part.

@node STARPU_SCHED_BETA
@subsubsection @code{STARPU_SCHED_BETA} -- Communication factor

To estimate the cost of a task StarPU takes into account the estimated
data transfer time (obtained thanks to performance models). The beta factor is
the coefficient to be applied to it before adding it to the computation part.

@node Misc
@subsection Miscellaneous and debug

@menu
* STARPU_SILENT:: Disable verbose mode
* STARPU_LOGFILENAME:: Select debug file name
* STARPU_FXT_PREFIX:: FxT trace location
* STARPU_LIMIT_GPU_MEM:: Restrict memory size on the GPUs
* STARPU_GENERATE_TRACE:: Generate a Paje trace when StarPU is shut down
@end menu

@node STARPU_SILENT
@subsubsection @code{STARPU_SILENT} -- Disable verbose mode
This variable makes it possible to disable verbose mode at runtime when StarPU
has been configured with the @code{--enable-verbose} option.
@node STARPU_LOGFILENAME
@subsubsection @code{STARPU_LOGFILENAME} -- Select debug file name
This variable specifies the file in which the debugging output should be saved.
@node STARPU_FXT_PREFIX
@subsubsection @code{STARPU_FXT_PREFIX} -- FxT trace location

This variable specifies in which directory to save the trace generated if FxT
is enabled. It needs to have a trailing @code{/} character.
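For example, with a hypothetical @code{/tmp/traces/} directory (note the trailing slash):

@smallexample
% STARPU_FXT_PREFIX=/tmp/traces/ ./vector_scal
@end smallexample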
@node STARPU_LIMIT_GPU_MEM
@subsubsection @code{STARPU_LIMIT_GPU_MEM} -- Restrict memory size on the GPUs

This variable specifies the maximum number of megabytes that should be
available to the application on each GPU. When this value is smaller than the
actual size of a GPU's memory, StarPU pre-allocates a buffer that wastes the
remaining memory on the device. This variable is intended for experimental
purposes, as it emulates devices that have a limited amount of memory.
@node STARPU_GENERATE_TRACE
@subsubsection @code{STARPU_GENERATE_TRACE} -- Generate a Paje trace when StarPU is shut down

When set to 1, this variable indicates that StarPU should automatically
generate a Paje trace when @code{starpu_shutdown} is called.