@c -*-texinfo-*-
@c This file is part of the StarPU Handbook.
@c Copyright (C) 2009--2011 Universit@'e de Bordeaux 1
@c Copyright (C) 2010, 2011, 2012 Centre National de la Recherche Scientifique
@c Copyright (C) 2011, 2012 Institut National de Recherche en Informatique et Automatique
@c See the file starpu.texi for copying conditions.

@menu
* Compilation configuration::
* Execution configuration through environment variables::
@end menu

@node Compilation configuration
@section Compilation configuration

The following arguments can be given to the @code{configure} script.
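For instance, a debugging build with verbose messages, using the two
options described below, can be configured as follows:

@smallexample
% ./configure --enable-debug --enable-verbose
@end smallexample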
@menu
* Common configuration::
* Configuring workers::
* Advanced configuration::
@end menu

@node Common configuration
@subsection Common configuration

@table @code

@item --enable-debug
Enable debugging messages.

@item --enable-fast
Disable assertion checks, which saves computation time.

@item --enable-verbose
Increase the verbosity of the debugging messages. This can be disabled
at runtime by setting the environment variable @code{STARPU_SILENT} to
any value.

@smallexample
% STARPU_SILENT=1 ./vector_scal
@end smallexample

@item --enable-coverage
Enable flags for the @code{gcov} coverage tool.

@end table
@node Configuring workers
@subsection Configuring workers

@menu
* --enable-maxcpus::
* --disable-cpu::
* --enable-maxcudadev::
* --disable-cuda::
* --with-cuda-dir::
* --with-cuda-include-dir::
* --with-cuda-lib-dir::
* --disable-cuda-memcpy-peer::
* --enable-maxopencldev::
* --disable-opencl::
* --with-opencl-dir::
* --with-opencl-include-dir::
* --with-opencl-lib-dir::
* --enable-gordon::
* --with-gordon-dir::
* --enable-maximplementations::
@end menu
@node --enable-maxcpus
@subsubsection @code{--enable-maxcpus=<number>}

Define the maximum number of CPU cores that StarPU will support, then
available as the @code{STARPU_MAXCPUS} macro.
@node --disable-cpu
@subsubsection @code{--disable-cpu}

Disable the use of the machine's CPUs; only GPUs and other accelerators
will be used.
@node --enable-maxcudadev
@subsubsection @code{--enable-maxcudadev=<number>}

Define the maximum number of CUDA devices that StarPU will support, then
available as the @code{STARPU_MAXCUDADEVS} macro.

@node --disable-cuda
@subsubsection @code{--disable-cuda}

Disable the use of CUDA, even if a valid CUDA installation was detected.

@node --with-cuda-dir
@subsubsection @code{--with-cuda-dir=<path>}

Specify the directory where CUDA is installed. This directory should notably contain
@code{include/cuda.h}.
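For instance, assuming CUDA is installed under @code{/usr/local/cuda} (a
common, but not universal, location):

@smallexample
% ./configure --with-cuda-dir=/usr/local/cuda
@end smallexample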
@node --with-cuda-include-dir
@subsubsection @code{--with-cuda-include-dir=<path>}

Specify the directory where CUDA headers are installed. This directory should
notably contain @code{cuda.h}. This defaults to @code{/include} appended to the
value given to @code{--with-cuda-dir}.

@node --with-cuda-lib-dir
@subsubsection @code{--with-cuda-lib-dir=<path>}

Specify the directory where the CUDA library is installed. This directory should
notably contain the CUDA shared libraries (e.g. @file{libcuda.so}). This defaults to
@code{/lib} appended to the value given to @code{--with-cuda-dir}.

@node --disable-cuda-memcpy-peer
@subsubsection @code{--disable-cuda-memcpy-peer}
Explicitly disable peer transfers when using CUDA 4.0.
@node --enable-maxopencldev
@subsubsection @code{--enable-maxopencldev=<number>}

Define the maximum number of OpenCL devices that StarPU will support, then
available as the @code{STARPU_MAXOPENCLDEVS} macro.

@node --disable-opencl
@subsubsection @code{--disable-opencl}

Disable the use of OpenCL, even if the SDK is detected.

@node --with-opencl-dir
@subsubsection @code{--with-opencl-dir=<path>}

Specify the location of the OpenCL SDK. This directory should notably contain
@code{include/CL/cl.h} (or @code{include/OpenCL/cl.h} on Mac OS).
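For instance, assuming the SDK is installed under @code{/opt/opencl-sdk}
(an arbitrary example path):

@smallexample
% ./configure --with-opencl-dir=/opt/opencl-sdk
@end smallexample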
@node --with-opencl-include-dir
@subsubsection @code{--with-opencl-include-dir=<path>}

Specify the location of OpenCL headers. This directory should notably contain
@code{CL/cl.h} (or @code{OpenCL/cl.h} on Mac OS). This defaults to
@code{/include} appended to the value given to @code{--with-opencl-dir}.

@node --with-opencl-lib-dir
@subsubsection @code{--with-opencl-lib-dir=<path>}

Specify the location of the OpenCL library. This directory should notably
contain the OpenCL shared libraries (e.g. @file{libOpenCL.so}). This defaults to
@code{/lib} appended to the value given to @code{--with-opencl-dir}.

@node --enable-gordon
@subsubsection @code{--enable-gordon}

Enable the use of the Gordon runtime for Cell SPUs.
@c TODO: rather default to enabled when detected

@node --with-gordon-dir
@subsubsection @code{--with-gordon-dir=<path>}

Specify the location of the Gordon SDK.
@node --enable-maximplementations
@subsubsection @code{--enable-maximplementations=<number>}

Define the maximum number of implementations that can be defined for a single
kind of device. It is then available as the @code{STARPU_MAXIMPLEMENTATIONS}
macro.
@node Advanced configuration
@subsection Advanced configuration

@table @code

@item --enable-perf-debug
Enable performance debugging through gprof.

@item --enable-model-debug
Enable performance model debugging.

@item --enable-stats
@c see ../../src/datawizard/datastats.c
Enable gathering of memory transfer statistics.

@item --enable-maxbuffers
Define the maximum number of buffers that tasks will be able to take
as parameters, then available as the @code{STARPU_NMAXBUFS} macro.
@item --enable-allocation-cache
Enable the use of a data allocation cache to avoid the cost of memory
allocations with CUDA. Still experimental.
@item --enable-opengl-render
Enable the use of OpenGL for the rendering of some examples.
@c TODO: rather default to enabled when detected

@item --enable-blas-lib
Specify the BLAS library to be used by some of the examples. The
library has to be @code{atlas} or @code{goto}.
@item --disable-starpufft
Disable the build of libstarpufft, even if FFTW or cuFFT is available.

@item --with-magma=@var{prefix}
Search for MAGMA under @var{prefix}. @var{prefix} should notably
contain @file{include/magmablas.h}.

@item --with-fxt=@var{prefix}
Search for FxT under @var{prefix}.
@url{http://savannah.nongnu.org/projects/fkt, FxT} is used to generate
traces of scheduling events, which can then be rendered using ViTE
(@pxref{Off-line, off-line performance feedback}). @var{prefix} should
notably contain @code{include/fxt/fxt.h}.
@item --with-perf-model-dir=@var{dir}
Store performance models under @var{dir}, instead of the current user's
home directory.
@item --with-mpicc=@var{path}
Use the @command{mpicc} compiler at @var{path}, for starpumpi
(@pxref{StarPU MPI support}).

@item --with-goto-dir=@var{prefix}
Search for GotoBLAS under @var{prefix}.

@item --with-atlas-dir=@var{prefix}
Search for ATLAS under @var{prefix}, which should notably contain
@file{include/cblas.h}.

@item --with-mkl-cflags=@var{cflags}
Use @var{cflags} to compile code that uses the MKL library.

@item --with-mkl-ldflags=@var{ldflags}
Use @var{ldflags} when linking code that uses the MKL library. Note
that the
@url{http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/,
MKL website} provides a script to determine the linking flags.
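The flags reported by that advisor can then be passed along these lines
(the exact values depend on the MKL version and platform, so this is only
a sketch, assuming @code{MKLROOT} was set by MKL's environment script):

@smallexample
% ./configure --with-mkl-cflags="-I$MKLROOT/include" \
              --with-mkl-ldflags="-L$MKLROOT/lib/intel64 -lmkl_rt"
@end smallexample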
@item --disable-gcc-extensions
Disable the GCC plug-in (@pxref{C Extensions}). By default, it is
enabled when the GCC compiler provides plug-in support.
@item --disable-socl
Disable the SOCL extension (@pxref{SOCL OpenCL Extensions}). By
default, it is enabled when an OpenCL implementation is found.

@item --disable-starpu-top
Disable the StarPU-Top interface (@pxref{starpu-top}). By default, it
is enabled when the required dependencies are found.

@end table
@node Execution configuration through environment variables
@section Execution configuration through environment variables

@menu
* Workers:: Configuring workers
* Scheduling:: Configuring the Scheduling engine
* Misc:: Miscellaneous and debug
@end menu
Note: the values given in the @code{starpu_conf} structure passed when
calling @code{starpu_init} will override the values of the environment
variables.
@node Workers
@subsection Configuring workers

@menu
* STARPU_NCPUS:: Number of CPU workers
* STARPU_NCUDA:: Number of CUDA workers
* STARPU_NOPENCL:: Number of OpenCL workers
* STARPU_NGORDON:: Number of SPU workers (Cell)
* STARPU_WORKERS_CPUID:: Bind workers to specific CPUs
* STARPU_WORKERS_CUDAID:: Select specific CUDA devices
* STARPU_WORKERS_OPENCLID:: Select specific OpenCL devices
@end menu

@node STARPU_NCPUS
@subsubsection @code{STARPU_NCPUS} -- Number of CPU workers
Specify the number of CPU workers (thus not including workers dedicated to
controlling accelerators). Note that by default, StarPU will not allocate
more CPU workers than there are physical CPUs, and that some CPUs are used
to control the accelerators.
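For example, to restrict an execution to four CPU workers:

@smallexample
% STARPU_NCPUS=4 ./vector_scal
@end smallexample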
@node STARPU_NCUDA
@subsubsection @code{STARPU_NCUDA} -- Number of CUDA workers

Specify the number of CUDA devices that StarPU can use. If
@code{STARPU_NCUDA} is lower than the number of physical devices, it is
possible to select which CUDA devices should be used by means of the
@code{STARPU_WORKERS_CUDAID} environment variable. By default, StarPU will
create as many CUDA workers as there are CUDA devices.
@node STARPU_NOPENCL
@subsubsection @code{STARPU_NOPENCL} -- Number of OpenCL workers

OpenCL equivalent of the @code{STARPU_NCUDA} environment variable.

@node STARPU_NGORDON
@subsubsection @code{STARPU_NGORDON} -- Number of SPU workers (Cell)

Specify the number of SPUs that StarPU can use.
@node STARPU_WORKERS_CPUID
@subsubsection @code{STARPU_WORKERS_CPUID} -- Bind workers to specific CPUs

Passing an array of integers (starting from 0) in @code{STARPU_WORKERS_CPUID}
specifies to which logical CPUs the different workers should be
bound. For instance, if @code{STARPU_WORKERS_CPUID = "0 1 4 5"}, the first
worker will be bound to logical CPU #0, the second CPU worker will be bound to
logical CPU #1 and so on. Note that the logical ordering of the CPUs is either
determined by the OS, or provided by the @code{hwloc} library if it is
available.

Note that the first workers correspond to the CUDA workers, then come the
OpenCL and SPU workers, and finally the CPU workers. For example, if
we have @code{STARPU_NCUDA=1}, @code{STARPU_NOPENCL=1}, @code{STARPU_NCPUS=2}
and @code{STARPU_WORKERS_CPUID = "0 2 1 3"}, the CUDA device will be controlled
by logical CPU #0, the OpenCL device will be controlled by logical CPU #2, and
the logical CPUs #1 and #3 will be used by the CPU workers.

If the number of workers is larger than the array given in
@code{STARPU_WORKERS_CPUID}, the workers are bound to the logical CPUs in a
round-robin fashion: if @code{STARPU_WORKERS_CPUID = "0 1"}, the first and the
third (resp. second and fourth) workers will be put on CPU #0 (resp. CPU #1).

This variable is ignored if the @code{use_explicit_workers_bindid} flag of the
@code{starpu_conf} structure passed to @code{starpu_init} is set.
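The combined example above can thus be reproduced from the command line as
follows:

@smallexample
% STARPU_NCUDA=1 STARPU_NOPENCL=1 STARPU_NCPUS=2 \
  STARPU_WORKERS_CPUID="0 2 1 3" ./vector_scal
@end smallexample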
@node STARPU_WORKERS_CUDAID
@subsubsection @code{STARPU_WORKERS_CUDAID} -- Select specific CUDA devices

Similarly to the @code{STARPU_WORKERS_CPUID} environment variable, it is
possible to select which CUDA devices should be used by StarPU. On a machine
equipped with 4 GPUs, setting @code{STARPU_WORKERS_CUDAID = "1 3"} and
@code{STARPU_NCUDA=2} specifies that 2 CUDA workers should be created, and that
they should use CUDA devices #1 and #3 (the logical ordering of the devices is
the one reported by CUDA).

This variable is ignored if the @code{use_explicit_workers_cuda_gpuid} flag of
the @code{starpu_conf} structure passed to @code{starpu_init} is set.
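On the 4-GPU machine described above, this corresponds to:

@smallexample
% STARPU_NCUDA=2 STARPU_WORKERS_CUDAID="1 3" ./vector_scal
@end smallexample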
@node STARPU_WORKERS_OPENCLID
@subsubsection @code{STARPU_WORKERS_OPENCLID} -- Select specific OpenCL devices

OpenCL equivalent of the @code{STARPU_WORKERS_CUDAID} environment variable.

This variable is ignored if the @code{use_explicit_workers_opencl_gpuid} flag of
the @code{starpu_conf} structure passed to @code{starpu_init} is set.

@node Scheduling
@subsection Configuring the Scheduling engine

@menu
* STARPU_SCHED:: Scheduling policy
* STARPU_CALIBRATE:: Calibrate performance models
* STARPU_PREFETCH:: Use data prefetch
* STARPU_SCHED_ALPHA:: Computation factor
* STARPU_SCHED_BETA:: Communication factor
@end menu
@node STARPU_SCHED
@subsubsection @code{STARPU_SCHED} -- Scheduling policy

Choose between the different scheduling policies proposed by StarPU: random,
work stealing, greedy, with performance models, etc.

Use @code{STARPU_SCHED=help} to get the list of available schedulers.
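For instance, to list the available policies, then run with the @code{dmda}
policy (one of the performance-model-based policies mentioned below):

@smallexample
% STARPU_SCHED=help ./vector_scal
% STARPU_SCHED=dmda ./vector_scal
@end smallexample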
@node STARPU_CALIBRATE
@subsubsection @code{STARPU_CALIBRATE} -- Calibrate performance models

If this variable is set to 1, the performance models are calibrated during
the execution. If it is set to 2, the previous values are dropped to restart
calibration from scratch. Setting this variable to 0 disables calibration;
this is the default behaviour.

Note: this currently only applies to the @code{dm}, @code{dmda} and
@code{heft} scheduling policies.
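For example, to restart calibration from scratch with such a policy:

@smallexample
% STARPU_CALIBRATE=2 STARPU_SCHED=dmda ./vector_scal
@end smallexample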
@node STARPU_PREFETCH
@subsubsection @code{STARPU_PREFETCH} -- Use data prefetch

This variable indicates whether data prefetching should be enabled (0 means
that it is disabled). If prefetching is enabled, when a task is scheduled to be
executed e.g. on a GPU, StarPU will request an asynchronous transfer in
advance, so that data is already present on the GPU when the task starts. As a
result, computation and data transfers are overlapped.

Note that prefetching is enabled by default in StarPU.
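Prefetching can be disabled explicitly, e.g. to measure the benefit of the
overlapping:

@smallexample
% STARPU_PREFETCH=0 ./vector_scal
@end smallexample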
@node STARPU_SCHED_ALPHA
@subsubsection @code{STARPU_SCHED_ALPHA} -- Computation factor

To estimate the cost of a task, StarPU takes into account the estimated
computation time (obtained thanks to performance models). The alpha factor is
the coefficient to be applied to it before adding it to the communication part.

@node STARPU_SCHED_BETA
@subsubsection @code{STARPU_SCHED_BETA} -- Communication factor

To estimate the cost of a task, StarPU takes into account the estimated
data transfer time (obtained thanks to performance models). The beta factor is
the coefficient to be applied to it before adding it to the computation part.
@node Misc
@subsection Miscellaneous and debug

@menu
* STARPU_SILENT:: Disable verbose mode
* STARPU_LOGFILENAME:: Select debug file name
* STARPU_FXT_PREFIX:: FxT trace location
* STARPU_LIMIT_GPU_MEM:: Restrict memory size on the GPUs
* STARPU_GENERATE_TRACE:: Generate a Paje trace when StarPU is shut down
@end menu
@node STARPU_SILENT
@subsubsection @code{STARPU_SILENT} -- Disable verbose mode

This variable allows verbose mode to be disabled at runtime when StarPU
has been configured with the option @code{--enable-verbose}.
@node STARPU_LOGFILENAME
@subsubsection @code{STARPU_LOGFILENAME} -- Select debug file name

This variable specifies the file in which the debugging output should be
saved.
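For instance (the path is of course only an example):

@smallexample
% STARPU_LOGFILENAME=/tmp/starpu.log ./vector_scal
@end smallexample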
@node STARPU_FXT_PREFIX
@subsubsection @code{STARPU_FXT_PREFIX} -- FxT trace location

This variable specifies the directory in which to save the generated trace
when FxT is enabled. It needs to have a trailing @code{/} character.
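For instance, to store traces under @file{/tmp/traces/} (note the mandatory
trailing slash):

@smallexample
% STARPU_FXT_PREFIX=/tmp/traces/ ./vector_scal
@end smallexample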
@node STARPU_LIMIT_GPU_MEM
@subsubsection @code{STARPU_LIMIT_GPU_MEM} -- Restrict memory size on the GPUs

This variable specifies the maximum number of megabytes that should be
available to the application on each GPU. If this value is smaller than
the memory size of a GPU, StarPU pre-allocates a buffer to waste the remaining
memory on the device. This variable is intended to be used for experimental
purposes as it emulates devices that have a limited amount of memory.
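For instance, to emulate GPUs with only 1 GiB of memory:

@smallexample
% STARPU_LIMIT_GPU_MEM=1024 ./vector_scal
@end smallexample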
@node STARPU_GENERATE_TRACE
@subsubsection @code{STARPU_GENERATE_TRACE} -- Generate a Paje trace when StarPU is shut down

When set to 1, this variable indicates that StarPU should automatically
generate a Paje trace when @code{starpu_shutdown} is called.
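For example, assuming StarPU was built with FxT support (see
@code{--with-fxt} above):

@smallexample
% STARPU_GENERATE_TRACE=1 ./vector_scal
@end smallexample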