
@c -*-texinfo-*-
@c This file is part of the StarPU Handbook.
@c Copyright (C) 2009--2011 Universit@'e de Bordeaux 1
@c Copyright (C) 2010, 2011, 2012 Centre National de la Recherche Scientifique
@c Copyright (C) 2011, 2012 Institut National de Recherche en Informatique et Automatique
@c See the file starpu.texi for copying conditions.

@menu
* Compilation configuration::
* Execution configuration through environment variables::
@end menu

@node Compilation configuration
@section Compilation configuration

The following arguments can be given to the @code{configure} script.

@menu
* Common configuration::
* Configuring workers::
* Advanced configuration::
@end menu
@node Common configuration
@subsection Common configuration

@table @code
@item --enable-debug
Enable debugging messages.
@item --enable-fast
Disable assertion checks, which saves computation time.
@item --enable-verbose
Increase the verbosity of the debugging messages. This can be disabled
at runtime by setting the environment variable @code{STARPU_SILENT} to
any value.
@smallexample
% STARPU_SILENT=1 ./vector_scal
@end smallexample
@item --enable-coverage
Enable flags for the @code{gcov} coverage tool.
@end table
@node Configuring workers
@subsection Configuring workers

@table @code
@item --enable-maxcpus=@var{count}
Use at most @var{count} CPU cores. This information is then
available as the @code{STARPU_MAXCPUS} macro.
@item --disable-cpu
Disable the use of the machine's CPUs; only GPUs and other accelerators
will be used.
@item --enable-maxcudadev=@var{count}
Use at most @var{count} CUDA devices. This information is then
available as the @code{STARPU_MAXCUDADEVS} macro.
@item --disable-cuda
Disable the use of CUDA, even if a valid CUDA installation was detected.
@item --with-cuda-dir=@var{prefix}
Search for CUDA under @var{prefix}, which should notably contain
@file{include/cuda.h}.
@item --with-cuda-include-dir=@var{dir}
Search for CUDA headers under @var{dir}, which should
notably contain @file{cuda.h}. This defaults to @code{/include} appended to the
value given to @code{--with-cuda-dir}.
@item --with-cuda-lib-dir=@var{dir}
Search for CUDA libraries under @var{dir}, which should notably contain
the CUDA shared libraries---e.g., @file{libcuda.so}. This defaults to
@code{/lib} appended to the value given to @code{--with-cuda-dir}.
@item --disable-cuda-memcpy-peer
Explicitly disable peer transfers when using CUDA 4.0.
@item --enable-maxopencldev=@var{count}
Use at most @var{count} OpenCL devices. This information is then
available as the @code{STARPU_MAXOPENCLDEVS} macro.
@item --disable-opencl
Disable the use of OpenCL, even if the SDK is detected.
@item --with-opencl-dir=@var{prefix}
Search for an OpenCL implementation under @var{prefix}, which should
notably contain @file{include/CL/cl.h} (or @file{include/OpenCL/cl.h} on
Mac OS).
@item --with-opencl-include-dir=@var{dir}
Search for OpenCL headers under @var{dir}, which should notably contain
@file{CL/cl.h} (or @file{OpenCL/cl.h} on Mac OS). This defaults to
@code{/include} appended to the value given to @code{--with-opencl-dir}.
@item --with-opencl-lib-dir=@var{dir}
Search for an OpenCL library under @var{dir}, which should notably
contain the OpenCL shared libraries---e.g., @file{libOpenCL.so}. This defaults
to @code{/lib} appended to the value given to @code{--with-opencl-dir}.
@item --enable-gordon
Enable the use of the Gordon runtime for Cell SPUs.
@c TODO: rather default to enabled when detected
@item --with-gordon-dir=@var{prefix}
Search for the Gordon SDK under @var{prefix}.
@item --enable-maximplementations=@var{count}
Allow for at most @var{count} codelet implementations for the same
target device. This information is then available as the
@code{STARPU_MAXIMPLEMENTATIONS} macro.
@end table
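For instance, several of the options above can be combined in a single
@code{configure} invocation; the CUDA installation path shown here is only
an example:

@smallexample
% ./configure --enable-maxcpus=12 --enable-maxcudadev=2 \
              --with-cuda-dir=/usr/local/cuda
@end smallexample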
@node Advanced configuration
@subsection Advanced configuration

@table @code
@item --enable-perf-debug
Enable performance debugging through @code{gprof}.
@item --enable-model-debug
Enable performance model debugging.
@item --enable-stats
@c see ../../src/datawizard/datastats.c
Enable gathering of memory transfer statistics.
@item --enable-maxbuffers
Define the maximum number of buffers that tasks will be able to take
as parameters, then available as the @code{STARPU_NMAXBUFS} macro.
@item --enable-allocation-cache
Enable the use of a data allocation cache to avoid the cost of repeated
memory allocation with CUDA. Still experimental.
@item --enable-opengl-render
Enable the use of OpenGL for the rendering of some examples.
@c TODO: rather default to enabled when detected
@item --enable-blas-lib
Specify the BLAS library to be used by some of the examples. The
library has to be @code{atlas} or @code{goto}.
@item --disable-starpufft
Disable the build of libstarpufft, even if FFTW or cuFFT is available.
@item --with-magma=@var{prefix}
Search for MAGMA under @var{prefix}, which should notably
contain @file{include/magmablas.h}.
@item --with-fxt=@var{prefix}
Search for FxT under @var{prefix}.
@url{http://savannah.nongnu.org/projects/fkt, FxT} is used to generate
traces of scheduling events, which can then be rendered using ViTE
(@pxref{Off-line, off-line performance feedback}). @var{prefix} should
notably contain @file{include/fxt/fxt.h}.
@item --with-perf-model-dir=@var{dir}
Store performance models under @var{dir}, instead of the current user's
home directory.
@item --with-mpicc=@var{path}
Use the @command{mpicc} compiler at @var{path}, for starpumpi
(@pxref{StarPU MPI support}).
@item --with-goto-dir=@var{prefix}
Search for GotoBLAS under @var{prefix}.
@item --with-atlas-dir=@var{prefix}
Search for ATLAS under @var{prefix}, which should notably contain
@file{include/cblas.h}.
@item --with-mkl-cflags=@var{cflags}
Use @var{cflags} to compile code that uses the MKL library.
@item --with-mkl-ldflags=@var{ldflags}
Use @var{ldflags} when linking code that uses the MKL library. Note
that the
@url{http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/,
MKL website} provides a script to determine the linking flags.
@item --disable-gcc-extensions
Disable the GCC plug-in (@pxref{C Extensions}). By default, it is
enabled when the GCC compiler provides plug-in support.
@item --disable-socl
Disable the SOCL extension (@pxref{SOCL OpenCL Extensions}). By
default, it is enabled when an OpenCL implementation is found.
@item --disable-starpu-top
Disable the StarPU-Top interface (@pxref{starpu-top}). By default, it
is enabled when the required dependencies are found.
@end table
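As an illustration, the MKL flags could be passed as follows. The exact
values depend on the MKL installation; those shown here (the
@code{MKLROOT} variable and the @code{-lmkl_rt} library) are only
indicative, and the MKL link line advisor mentioned above gives the
authoritative ones:

@smallexample
% ./configure --with-mkl-cflags="-I$MKLROOT/include" \
              --with-mkl-ldflags="-L$MKLROOT/lib/intel64 -lmkl_rt"
@end smallexample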
@node Execution configuration through environment variables
@section Execution configuration through environment variables

@menu
* Workers:: Configuring workers
* Scheduling:: Configuring the Scheduling engine
* Misc:: Miscellaneous and debug
@end menu

Note: the values given in the @code{starpu_conf} structure passed when
calling @code{starpu_init} will override the values of the environment
variables.
@node Workers
@subsection Configuring workers

@menu
* STARPU_NCPUS:: Number of CPU workers
* STARPU_NCUDA:: Number of CUDA workers
* STARPU_NOPENCL:: Number of OpenCL workers
* STARPU_NGORDON:: Number of SPU workers (Cell)
* STARPU_WORKERS_NOBIND:: Do not bind workers
* STARPU_WORKERS_CPUID:: Bind workers to specific CPUs
* STARPU_WORKERS_CUDAID:: Select specific CUDA devices
* STARPU_WORKERS_OPENCLID:: Select specific OpenCL devices
@end menu
@node STARPU_NCPUS
@subsubsection @code{STARPU_NCPUS} -- Number of CPU workers

Specify the number of CPU workers (thus not including the workers dedicated
to controlling accelerators). Note that by default, StarPU will not allocate
more CPU workers than there are physical CPUs, and that some CPUs are used
to control the accelerators.
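For instance, to restrict StarPU to 4 CPU workers (@code{./my_app} being a
hypothetical application):

@smallexample
% STARPU_NCPUS=4 ./my_app
@end smallexample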
@node STARPU_NCUDA
@subsubsection @code{STARPU_NCUDA} -- Number of CUDA workers

Specify the number of CUDA devices that StarPU can use. If
@code{STARPU_NCUDA} is lower than the number of physical devices, it is
possible to select which CUDA devices should be used by means of the
@code{STARPU_WORKERS_CUDAID} environment variable. By default, StarPU will
create as many CUDA workers as there are CUDA devices.
@node STARPU_NOPENCL
@subsubsection @code{STARPU_NOPENCL} -- Number of OpenCL workers

OpenCL equivalent of the @code{STARPU_NCUDA} environment variable.

@node STARPU_NGORDON
@subsubsection @code{STARPU_NGORDON} -- Number of SPU workers (Cell)

Specify the number of SPUs that StarPU can use.

@node STARPU_WORKERS_NOBIND
@subsubsection @code{STARPU_WORKERS_NOBIND} -- Do not bind workers to specific CPUs

Setting this variable to a non-zero value prevents StarPU from binding its
threads to CPUs. This is for instance useful when running the test suite in
parallel.
@node STARPU_WORKERS_CPUID
@subsubsection @code{STARPU_WORKERS_CPUID} -- Bind workers to specific CPUs

Passing an array of integers (starting from 0) in @code{STARPU_WORKERS_CPUID}
specifies on which logical CPU the different workers should be
bound. For instance, if @code{STARPU_WORKERS_CPUID = "0 1 4 5"}, the first
worker will be bound to logical CPU #0, the second CPU worker will be bound to
logical CPU #1, and so on. Note that the logical ordering of the CPUs is
either determined by the OS, or provided by the @code{hwloc} library in case
it is available.

Note that the first workers correspond to the CUDA workers, then come the
OpenCL and the SPU workers, and finally the CPU workers. For example, if
we have @code{STARPU_NCUDA=1}, @code{STARPU_NOPENCL=1}, @code{STARPU_NCPUS=2}
and @code{STARPU_WORKERS_CPUID = "0 2 1 3"}, the CUDA device will be controlled
by logical CPU #0, the OpenCL device will be controlled by logical CPU #2, and
the logical CPUs #1 and #3 will be used by the CPU workers.

If the number of workers is larger than the array given in
@code{STARPU_WORKERS_CPUID}, the workers are bound to the logical CPUs in a
round-robin fashion: if @code{STARPU_WORKERS_CPUID = "0 1"}, the first and
third (resp. second and fourth) workers will be put on CPU #0 (resp. CPU #1).

This variable is ignored if the @code{use_explicit_workers_bindid} flag of the
@code{starpu_conf} structure passed to @code{starpu_init} is set.
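The second example above can thus be launched as follows (@code{./my_app} is
a hypothetical application):

@smallexample
% STARPU_NCUDA=1 STARPU_NOPENCL=1 STARPU_NCPUS=2 \
  STARPU_WORKERS_CPUID="0 2 1 3" ./my_app
@end smallexample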
@node STARPU_WORKERS_CUDAID
@subsubsection @code{STARPU_WORKERS_CUDAID} -- Select specific CUDA devices

Similarly to the @code{STARPU_WORKERS_CPUID} environment variable, it is
possible to select which CUDA devices should be used by StarPU. On a machine
equipped with 4 GPUs, setting @code{STARPU_WORKERS_CUDAID = "1 3"} and
@code{STARPU_NCUDA=2} specifies that 2 CUDA workers should be created, and that
they should use CUDA devices #1 and #3 (the logical ordering of the devices is
the one reported by CUDA).

This variable is ignored if the @code{use_explicit_workers_cuda_gpuid} flag of
the @code{starpu_conf} structure passed to @code{starpu_init} is set.
@node STARPU_WORKERS_OPENCLID
@subsubsection @code{STARPU_WORKERS_OPENCLID} -- Select specific OpenCL devices

OpenCL equivalent of the @code{STARPU_WORKERS_CUDAID} environment variable.

This variable is ignored if the @code{use_explicit_workers_opencl_gpuid} flag
of the @code{starpu_conf} structure passed to @code{starpu_init} is set.
@node Scheduling
@subsection Configuring the Scheduling engine

@menu
* STARPU_SCHED:: Scheduling policy
* STARPU_CALIBRATE:: Calibrate performance models
* STARPU_PREFETCH:: Use data prefetch
* STARPU_SCHED_ALPHA:: Computation factor
* STARPU_SCHED_BETA:: Communication factor
@end menu

@node STARPU_SCHED
@subsubsection @code{STARPU_SCHED} -- Scheduling policy

Choose between the different scheduling policies proposed by StarPU: work
stealing, random, greedy, with performance models, etc.

Use @code{STARPU_SCHED=help} to get the list of available schedulers.
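For instance, to select a performance-model policy such as @code{dmda}
(@code{./my_app} is a hypothetical application; check
@code{STARPU_SCHED=help} for the schedulers actually available in your
build):

@smallexample
% STARPU_SCHED=dmda ./my_app
@end smallexample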
@node STARPU_CALIBRATE
@subsubsection @code{STARPU_CALIBRATE} -- Calibrate performance models

If this variable is set to 1, the performance models are calibrated during
the execution. If it is set to 2, the previous values are dropped to restart
calibration from scratch. Setting this variable to 0 disables calibration;
this is the default behaviour.

Note: this currently only applies to the @code{dm}, @code{dmda} and
@code{heft} scheduling policies.
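For instance, to calibrate the models while running under one of these
policies (@code{./my_app} is a hypothetical application):

@smallexample
% STARPU_SCHED=dmda STARPU_CALIBRATE=1 ./my_app
@end smallexample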
@node STARPU_PREFETCH
@subsubsection @code{STARPU_PREFETCH} -- Use data prefetch

This variable indicates whether data prefetching should be enabled (0 means
that it is disabled). If prefetching is enabled, when a task is scheduled to be
executed e.g. on a GPU, StarPU will request an asynchronous transfer in
advance, so that data is already present on the GPU when the task starts. As a
result, computation and data transfers are overlapped.
Note that prefetching is enabled by default in StarPU.
@node STARPU_SCHED_ALPHA
@subsubsection @code{STARPU_SCHED_ALPHA} -- Computation factor

To estimate the cost of a task, StarPU takes into account the estimated
computation time (obtained thanks to performance models). The alpha factor is
the coefficient to be applied to it before adding it to the communication part.

@node STARPU_SCHED_BETA
@subsubsection @code{STARPU_SCHED_BETA} -- Communication factor

To estimate the cost of a task, StarPU takes into account the estimated
data transfer time (obtained thanks to performance models). The beta factor is
the coefficient to be applied to it before adding it to the computation part.
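Combining the two factors, the cost used by the performance-model schedulers
to compare workers can be sketched as follows (an informal summary of the two
paragraphs above, not the exact internal formula):

@smallexample
cost = alpha * estimated_computation_time
     + beta  * estimated_data_transfer_time
@end smallexample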
@node Misc
@subsection Miscellaneous and debug

@menu
* STARPU_SILENT:: Disable verbose mode
* STARPU_LOGFILENAME:: Select debug file name
* STARPU_FXT_PREFIX:: FxT trace location
* STARPU_LIMIT_GPU_MEM:: Restrict memory size on the GPUs
* STARPU_GENERATE_TRACE:: Generate a Paje trace when StarPU is shut down
@end menu

@node STARPU_SILENT
@subsubsection @code{STARPU_SILENT} -- Disable verbose mode

This variable allows verbose mode to be disabled at runtime when StarPU
has been configured with the @code{--enable-verbose} option.
@node STARPU_LOGFILENAME
@subsubsection @code{STARPU_LOGFILENAME} -- Select debug file name

This variable specifies the file in which the debugging output should be
saved.

@node STARPU_FXT_PREFIX
@subsubsection @code{STARPU_FXT_PREFIX} -- FxT trace location

This variable specifies the directory in which to save the trace generated
when FxT is enabled. It needs to have a trailing @code{/} character.
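For instance (note the trailing slash; @code{./my_app} is a hypothetical
application):

@smallexample
% STARPU_FXT_PREFIX=/tmp/traces/ ./my_app
@end smallexample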
@node STARPU_LIMIT_GPU_MEM
@subsubsection @code{STARPU_LIMIT_GPU_MEM} -- Restrict memory size on the GPUs

This variable specifies the maximum number of megabytes that should be
available to the application on each GPU. When this value is smaller than
the size of a GPU's memory, StarPU pre-allocates a buffer that wastes the
remaining memory on the device. This variable is intended to be used for
experimental purposes, as it emulates devices that have a limited amount of
memory.
@node STARPU_GENERATE_TRACE
@subsubsection @code{STARPU_GENERATE_TRACE} -- Generate a Paje trace when StarPU is shut down

When set to 1, this variable indicates that StarPU should automatically
generate a Paje trace when @code{starpu_shutdown} is called.