
/*
 * This file is part of the StarPU Handbook.
 * Copyright (C) 2009--2011  Université de Bordeaux
 * Copyright (C) 2010, 2011, 2012, 2013, 2014  CNRS
 * Copyright (C) 2011, 2012  INRIA
 * See the file version.doxy for copying conditions.
 */
/*! \page FrequentlyAskedQuestions Frequently Asked Questions

\section HowToInitializeAComputationLibraryOnceForEachWorker How To Initialize A Computation Library Once For Each Worker?

Some libraries need to be initialized once for each concurrent instance that
may run on the machine. For instance, a C++ computation class may not be
thread-safe by itself, even though several instantiated objects of that class
can be used concurrently. This can be handled in StarPU by initializing one such
object per worker. For instance, the libstarpufft example does the following to
be able to use FFTW on CPUs.

A global array stores the instantiated objects:

\code{.c}
fftw_plan plan_cpu[STARPU_NMAXWORKERS];
\endcode

At initialization time of libstarpu, the objects are initialized:

\code{.c}
unsigned workerid;
for (workerid = 0; workerid < starpu_worker_get_count(); workerid++)
{
	switch (starpu_worker_get_type(workerid))
	{
	case STARPU_CPU_WORKER:
		plan_cpu[workerid] = fftw_plan(...);
		break;
	}
}
\endcode

And in the codelet body, they are used:

\code{.c}
static void fft(void *descr[], void *_args)
{
	int workerid = starpu_worker_get_id();
	fftw_plan plan = plan_cpu[workerid];
	...
	fftw_execute(plan, ...);
}
\endcode

This however is not sufficient for FFT on CUDA: there, the initialization has
to be done by the workers themselves. This can be achieved with
starpu_execute_on_each_worker(). For instance, libstarpufft does the following.

\code{.c}
static void fft_plan_gpu(void *args)
{
	plan plan = args;
	int n2 = plan->n2[0];
	int workerid = starpu_worker_get_id();

	cufftPlan1d(&plan->plans[workerid].plan_cuda, n2, CUFFT_C2C, 1);
	cufftSetStream(plan->plans[workerid].plan_cuda, starpu_cuda_get_local_stream());
}

void starpufft_plan(void)
{
	starpu_execute_on_each_worker(fft_plan_gpu, plan, STARPU_CUDA);
}
\endcode

\section UsingTheDriverAPI Using The Driver API

\ref API_Running_Drivers

\code{.c}
int ret;
struct starpu_driver d =
{
	.type = STARPU_CUDA_WORKER,
	.id.cuda_id = 0
};

ret = starpu_driver_init(&d);
if (ret != 0)
	error();
while (some_condition)
{
	ret = starpu_driver_run_once(&d);
	if (ret != 0)
		error();
}
ret = starpu_driver_deinit(&d);
if (ret != 0)
	error();
\endcode

To add a new kind of device to the structure starpu_driver, one needs to:
<ol>
<li> Add a member to the union starpu_driver::id
</li>
<li> Modify the internal function <c>_starpu_launch_drivers()</c> to
make sure the driver is not always launched.
</li>
<li> Modify the function starpu_driver_run() so that it can handle
another kind of architecture.
</li>
<li> Write the new function <c>_starpu_run_foobar()</c> in the
corresponding driver.
</li>
</ol>

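As an illustration of the first step, here is a standalone, hypothetical sketch of the shape the structure could take for a new "foobar" device kind. The real structure and enumeration live in StarPU's headers; the <c>driver_sketch</c> and <c>worker_archtype</c> names, the <c>FOOBAR_WORKER</c> value and the <c>foobar_id</c> member are all invented for the example:

\code{.c}
#include <assert.h>

/* Standalone stand-in for enum starpu_worker_archtype, with the
 * hypothetical new device kind appended */
enum worker_archtype { CPU_WORKER, CUDA_WORKER, FOOBAR_WORKER };

/* Standalone stand-in for struct starpu_driver */
struct driver_sketch
{
	enum worker_archtype type;
	union
	{
		unsigned cpu_id;
		unsigned cuda_id;
		unsigned foobar_id; /* step 1: new member in the id union */
	} id;
};

int main(void)
{
	/* A driver descriptor for the hypothetical foobar device 0 */
	struct driver_sketch d = { .type = FOOBAR_WORKER, .id.foobar_id = 0 };
	assert(d.type == FOOBAR_WORKER);
	assert(d.id.foobar_id == 0);
	return 0;
}
\endcode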
\section On-GPURendering On-GPU Rendering

Graphical applications need to draw the result of their computations,
typically on the very GPU where these happened. Technologies such as OpenGL/CUDA
interoperability allow CUDA to work directly on the OpenGL buffers, making
them immediately ready for drawing, by mapping OpenGL buffer, texture or
renderbuffer objects into CUDA. CUDA however imposes some technical
constraints: peer memcpy has to be disabled, and the thread that runs OpenGL has
to be the one that runs CUDA computations for that GPU.

To achieve this with StarPU, pass the option
\ref disable-cuda-memcpy-peer "--disable-cuda-memcpy-peer"
to <c>./configure</c> (TODO: make it dynamic), initialize OpenGL/GLUT
first, enable the interoperability mode through the field
starpu_conf::cuda_opengl_interoperability, and run the driver loop from
the application itself: use the field
starpu_conf::not_launched_drivers to prevent StarPU from running it in
a separate thread, and call starpu_driver_run() to run the loop.
The examples <c>gl_interop</c> and <c>gl_interop_idle</c> show how this
works in a simple case, where rendering is done in task
callbacks. The former uses <c>glutMainLoopEvent</c> to make GLUT
progress from the StarPU driver loop, while the latter uses
<c>glutIdleFunc</c> to make StarPU progress from the GLUT main loop.

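Putting the pieces above together, a minimal setup sketch, modeled on the
<c>gl_interop</c> example, could look as follows. It assumes a single CUDA
device 0 and omits error checking; check the starpu_conf fields against the
headers of your StarPU version:

\code{.c}
struct starpu_conf conf;
unsigned interop_devices[] = { 0 };	/* CUDA device 0 */
struct starpu_driver drivers[] =
{
	{ .type = STARPU_CUDA_WORKER, .id.cuda_id = 0 }
};

/* OpenGL/GLUT must already be initialized at this point */
starpu_conf_init(&conf);
conf.cuda_opengl_interoperability = interop_devices;
conf.n_cuda_opengl_interoperability = 1;
conf.not_launched_drivers = drivers;
conf.n_not_launched_drivers = 1;
starpu_init(&conf);

/* ... submit tasks ..., then run the driver loop from the OpenGL thread */
starpu_driver_run(&drivers[0]);
\endcode
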
Then, to use an OpenGL buffer as CUDA data, StarPU simply needs to be given
the CUDA pointer at registration, for instance:

\code{.c}
/* Get the CUDA worker id */
for (workerid = 0; workerid < starpu_worker_get_count(); workerid++)
	if (starpu_worker_get_type(workerid) == STARPU_CUDA_WORKER)
		break;

/* Build a CUDA pointer pointing at the OpenGL buffer */
cudaGraphicsResourceGetMappedPointer((void**)&output, &num_bytes, resource);

/* And register it to StarPU */
starpu_vector_data_register(&handle, starpu_worker_get_memory_node(workerid),
                            output, num_bytes / sizeof(float4), sizeof(float4));

/* The handle can now be used as usual */
starpu_task_insert(&cl, STARPU_RW, handle, 0);

/* ... */

/* This gets back data into the OpenGL buffer */
starpu_data_unregister(handle);
\endcode

and display it e.g. in the callback function.

\section UsingStarPUWithMKL Using StarPU With MKL 11 (Intel Composer XE 2013)

Some users have had issues with MKL 11 and StarPU (versions 1.1rc1 and
1.0.5) on Linux, when using a single thread for MKL and doing all the
parallelism with StarPU (no multithreaded tasks), i.e. setting the
environment variable MKL_NUM_THREADS to 1 and linking with the threaded
MKL library (with iomp5).

With this configuration, StarPU uses only 1 core, no matter the value of
\ref STARPU_NCPU. The problem is actually a thread-pinning issue with MKL.
The solution is to set the environment variable KMP_AFFINITY to <c>disabled</c>
(http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011Update/compiler_c/optaps/common/optaps_openmp_thread_affinity.htm).
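
For instance, with a Bourne-style shell, this can be set in the environment
before launching the application:

\verbatim
export KMP_AFFINITY=disabled
\endverbatim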
\section ThreadBindingOnNetBSD Thread Binding on NetBSD

When using StarPU on a NetBSD machine, if the topology
discovery library <c>hwloc</c> is used, thread binding will fail. To
prevent this problem, you should use at least version 1.7 of
<c>hwloc</c>, and also issue the following call:

\verbatim
$ sysctl -w security.models.extensions.user_set_cpu_affinity=1
\endverbatim

Or add the following line to the file <c>/etc/sysctl.conf</c>:

\verbatim
security.models.extensions.user_set_cpu_affinity=1
\endverbatim

\section PauseResume Interleaving StarPU and non-StarPU code

If your application only partially uses StarPU, and you do not want to
call starpu_init() / starpu_shutdown() at the beginning/end
of each section, StarPU workers will poll for work between the
sections. To avoid this behavior, you can "pause" StarPU with the
starpu_pause() function. This will prevent the StarPU workers from
accepting new work (tasks that are already in progress will not be
frozen), and stop them from polling for more work.

Note that this does not prevent you from submitting new tasks, but
they will not execute until starpu_resume() is called. Also note
that StarPU must not be paused when you call starpu_shutdown(), and
that this function pair works in a push/pull manner, i.e. you need to
match the number of calls to these functions to clear their effect.

One way to use these functions could be:

\code{.c}
starpu_init(NULL);
starpu_pause(); // To submit all the tasks without a single one executing
submit_some_tasks();
starpu_resume(); // The tasks start executing

starpu_task_wait_for_all();
starpu_pause(); // Stop the workers from polling

// Non-StarPU code

starpu_resume();
// ...
starpu_shutdown();
\endcode

*/