| | 
							- /*
 
-  * This file is part of the StarPU Handbook.
 
-  * Copyright (C) 2009--2011  Université de Bordeaux
 
-  * Copyright (C) 2010, 2011, 2012, 2013, 2014  CNRS
 
-  * Copyright (C) 2011, 2012 INRIA
 
-  * See the file version.doxy for copying conditions.
 
-  */
 
- /*! \page FrequentlyAskedQuestions Frequently Asked Questions
 
- \section HowToInitializeAComputationLibraryOnceForEachWorker How To Initialize A Computation Library Once For Each Worker?
 
- Some libraries need to be initialized once for each concurrent instance that

- may run on the machine. A typical case is a C++ computation class which is not

- thread-safe by itself, but of which several instantiated objects can be used

- concurrently. Such a library can be used with StarPU by initializing one

- object per worker. For instance, the libstarpufft example does the following to

- be able to use FFTW on CPUs.
 
- A global array stores the instantiated objects:
 
- \code{.c}
 
- fftw_plan plan_cpu[STARPU_NMAXWORKERS];
 
- \endcode
 
- At initialization time of libstarpu, the objects are initialized:
 
- \code{.c}
 
- int workerid;
 
- for (workerid = 0; workerid < starpu_worker_get_count(); workerid++) {
 
-     switch (starpu_worker_get_type(workerid)) {
 
-         case STARPU_CPU_WORKER:
 
-             plan_cpu[workerid] = fftw_plan(...);
 
-             break;
 
-     }
 
- }
 
- \endcode
 
- And in the codelet body, they are used:
 
- \code{.c}
 
- static void fft(void *descr[], void *_args)
 
- {
 
-     int workerid = starpu_worker_get_id();
 
-     fftw_plan plan = plan_cpu[workerid];
 
-     ...
 
-     fftw_execute(plan, ...);
 
- }
 
- \endcode
 
- This however is not sufficient for FFT on CUDA: initialization has
 
- to be done from the workers themselves.  This can be done thanks to
 
- starpu_execute_on_each_worker().  For instance libstarpufft does the following.
 
- \code{.c}
 
- static void fft_plan_gpu(void *args)
 
- {
 
-     plan plan = args;
 
-     int n2 = plan->n2[0];
 
-     int workerid = starpu_worker_get_id();
 
-     cufftPlan1d(&plan->plans[workerid].plan_cuda, n2, _CUFFT_C2C, 1);
 
-     cufftSetStream(plan->plans[workerid].plan_cuda, starpu_cuda_get_local_stream());
 
- }
 
- void starpufft_plan(void)
 
- {
 
-     starpu_execute_on_each_worker(fft_plan_gpu, plan, STARPU_CUDA);
 
- }
 
- \endcode
 
- \section UsingTheDriverAPI Using The Driver API
 
- \ref API_Running_Drivers
 
- \code{.c}
 
- int ret;
 
- struct starpu_driver d = {
 
-     .type = STARPU_CUDA_WORKER,
 
-     .id.cuda_id = 0
 
- };
 
- ret = starpu_driver_init(&d);
 
- if (ret != 0)
 
-     error();
 
- while (some_condition) {
 
-     ret = starpu_driver_run_once(&d);
 
-     if (ret != 0)
 
-         error();
 
- }
 
- ret = starpu_driver_deinit(&d);
 
- if (ret != 0)
 
-     error();
 
- \endcode
 
- To add a new kind of device to the structure starpu_driver, one needs to:
 
- <ol>
 
- <li> Add a member to the union starpu_driver::id
 
- </li>
 
- <li> Modify the internal function <c>_starpu_launch_drivers()</c> to
 
- make sure the driver is not always launched.
 
- </li>
 
- <li> Modify the function starpu_driver_run() so that it can handle
 
- another kind of architecture.
 
- </li>
 
- <li> Write the new function <c>_starpu_run_foobar()</c> in the
 
- corresponding driver.
 
- </li>
 
- </ol>
 
- \section On-GPURendering On-GPU Rendering
 
- Graphics-oriented applications need to draw the result of their computations,

- typically on the very GPU where these happened. Technologies such as OpenGL/CUDA

- interoperability let CUDA work directly on OpenGL buffers, making

- them immediately ready for drawing, by mapping OpenGL buffers, textures or

- renderbuffer objects into CUDA.  CUDA however imposes some technical

- constraints: peer memcpy has to be disabled, and the thread that runs OpenGL has

- to be the one that runs CUDA computations for that GPU.
 
- To achieve this with StarPU, four things are needed. First, pass the option

- \ref disable-cuda-memcpy-peer "--disable-cuda-memcpy-peer"

- to <c>./configure</c> (TODO: make it dynamic). Second, OpenGL/GLUT has to be

- initialized before StarPU. Third, the interoperability mode has to be enabled

- by using the field starpu_conf::cuda_opengl_interoperability. Finally, the

- driver loop has to be run by the application itself: use the field

- starpu_conf::not_launched_drivers to prevent StarPU from running it in

- a separate thread, and call starpu_driver_run() to run the loop.
 
- The examples <c>gl_interop</c> and <c>gl_interop_idle</c> show how this

- fits together in a simple case, where rendering is done in task
 
- callbacks. The former uses <c>glutMainLoopEvent</c> to make GLUT
 
- progress from the StarPU driver loop, while the latter uses
 
- <c>glutIdleFunc</c> to make StarPU progress from the GLUT main loop.
 
- Then, to use an OpenGL buffer as CUDA data, StarPU simply needs to be given
 
- the CUDA pointer at registration, for instance:
 
- \code{.c}
 
- /* Get the CUDA worker id */
 
- for (workerid = 0; workerid < starpu_worker_get_count(); workerid++)
 
-         if (starpu_worker_get_type(workerid) == STARPU_CUDA_WORKER)
 
-                 break;
 
- /* Build a CUDA pointer pointing at the OpenGL buffer */
 
- cudaGraphicsResourceGetMappedPointer((void**)&output, &num_bytes, resource);
 
- /* And register it to StarPU */
 
- starpu_vector_data_register(&handle, starpu_worker_get_memory_node(workerid),
 
-                             output, num_bytes / sizeof(float4), sizeof(float4));
 
- /* The handle can now be used as usual */
 
- starpu_task_insert(&cl, STARPU_RW, handle, 0);
 
- /* ... */
 
- /* This gets back data into the OpenGL buffer */
 
- starpu_data_unregister(handle);
 
- \endcode
 
- The result can then be displayed, e.g. in the callback function.
 
- \section UsingStarPUWithMKL Using StarPU With MKL 11 (Intel Composer XE 2013)
 
- Some users had issues with MKL 11 and StarPU (versions 1.1rc1 and

- 1.0.5) on Linux when using 1 thread for MKL and doing all the

- parallelism with StarPU (no multithreaded tasks), i.e. setting the

- environment variable MKL_NUM_THREADS to 1 and using the threaded MKL library

- with iomp5.
 
- Using this configuration, StarPU uses only 1 core, no matter the value of
 
- \ref STARPU_NCPU. The problem is actually a thread pinning issue with MKL.
 
- The solution is to set the environment variable KMP_AFFINITY to <c>disabled</c>
 
- (http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011Update/compiler_c/optaps/common/optaps_openmp_thread_affinity.htm).
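
The variables are normally set in the shell before launching the application. When that is not convenient, they can also be set from the program itself. The following is only a sketch: the helper name <c>configure_mkl_env()</c> is hypothetical, and it assumes that it is called before StarPU and MKL are initialized, so that the Intel OpenMP runtime has not read its environment yet.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L // exposes setenv() in <stdlib.h>
#endif
#include <stdlib.h>

// Hypothetical helper: set the MKL-related environment described above.
// Returns 0 on success, -1 if a variable could not be set.
// Assumption: called before StarPU, MKL and the OpenMP runtime start.
int configure_mkl_env(void)
{
    // One MKL thread per task: StarPU provides the parallelism.
    if (setenv("MKL_NUM_THREADS", "1", 1) != 0)
        return -1;
    // Keep the OpenMP runtime from pinning its threads.
    if (setenv("KMP_AFFINITY", "disabled", 1) != 0)
        return -1;
    return 0;
}
```

The intended use is to call <c>configure_mkl_env()</c> as the first statement of <c>main()</c>, before starpu_init().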
 
- \section ThreadBindingOnNetBSD Thread Binding on NetBSD
 
- When using StarPU on a NetBSD machine, if the topology
 
- discovery library <c>hwloc</c> is used, thread binding will fail. To
 
- prevent the problem, you should use at least version 1.7 of
 
- <c>hwloc</c>, and also issue the following call:
 
- \verbatim
 
- $ sysctl -w security.models.extensions.user_set_cpu_affinity=1
 
- \endverbatim
 
- Or add the following line to the file <c>/etc/sysctl.conf</c>:
 
- \verbatim
 
- security.models.extensions.user_set_cpu_affinity=1
 
- \endverbatim
 
- \section PauseResume Interleaving StarPU and non-StarPU code
 
- If your application only partially uses StarPU, and you do not want to
 
- call starpu_init() / starpu_shutdown() at the beginning/end
 
- of each section, StarPU workers will poll for work between the
 
- sections. To avoid this behavior, you can "pause" StarPU with the 
 
- starpu_pause() function. This will prevent the StarPU workers from
 
- accepting new work (tasks that are already in progress will not be
 
- frozen), and stop them from polling for more work.
 
- Note that this does not prevent you from submitting new tasks, but
 
- they won't execute until starpu_resume() is called. Also note
 
- that StarPU must not be paused when you call starpu_shutdown(), and
 
- that this function pair works in a push/pull manner, i.e. you need to
 
- match the number of calls to these functions to clear their effect.
 
- One way to use these functions could be:
 
- \code{.c}
 
- starpu_init(NULL);
 
- starpu_pause(); // To submit all the tasks without a single one executing
 
- submit_some_tasks();
 
- starpu_resume(); // The tasks start executing
 
- starpu_task_wait_for_all();
 
- starpu_pause(); // Stop the workers from polling
 
- // Non-StarPU code
 
- starpu_resume();
 
- // ...
 
- starpu_shutdown();
 
- \endcode
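
The matching rule for starpu_pause() and starpu_resume() can be pictured as a nesting counter. The following stand-alone model is an illustration only, not StarPU code: the names <c>model_pause()</c>, <c>model_resume()</c> and <c>model_workers_active()</c> are hypothetical.

```c
#include <assert.h>

// Hypothetical stand-alone model of the pause/resume matching rule.
// It does not call StarPU; it only counts outstanding pauses.
static int pause_depth = 0;

// Models starpu_pause(): one more pause is outstanding.
void model_pause(void)
{
    pause_depth++;
}

// Models starpu_resume(): cancels exactly one earlier pause.
void model_resume(void)
{
    assert(pause_depth > 0); // a resume without a matching pause is a bug
    pause_depth--;
}

// Workers execute tasks only when every pause has been matched.
int model_workers_active(void)
{
    return pause_depth == 0;
}
```

In this model, two calls to <c>model_pause()</c> followed by a single <c>model_resume()</c> leave the workers paused; a second <c>model_resume()</c> is needed before <c>model_workers_active()</c> returns 1.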
 
- */
 
 
  |