- /*
- * This file is part of the StarPU Handbook.
- * Copyright (C) 2009--2011 Université de Bordeaux 1
- * Copyright (C) 2010, 2011, 2012, 2013 Centre National de la Recherche Scientifique
- * Copyright (C) 2011, 2012 Institut National de Recherche en Informatique et Automatique
- * See the file version.doxy for copying conditions.
- */
- /*! \page TipsAndTricksToKnowAbout Tips and Tricks To Know About
- \section HowToInitializeAComputationLibraryOnceForEachWorker How To Initialize A Computation Library Once For Each Worker?
- Some libraries need to be initialized once for each concurrent instance that
- may run on the machine. For instance, a C++ computation class may not be
- thread-safe by itself, while several instantiated objects of that class
- can still be used concurrently. Such a library can be used in StarPU by initializing one
- object per worker. For instance, the libstarpufft example does the following to
- be able to use FFTW on CPUs.
- A global array stores the instantiated objects:
- \code{.c}
- fftw_plan plan_cpu[STARPU_NMAXWORKERS];
- \endcode
- At initialization time of libstarpu, the objects are created:
- \code{.c}
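- int workerid;
- for (workerid = 0; workerid < starpu_worker_get_count(); workerid++) {
-     switch (starpu_worker_get_type(workerid)) {
-     case STARPU_CPU_WORKER:
-         /* fftw_plan is the plan type: the plan itself is built by an FFTW planner such as fftw_plan_dft_1d() */
-         plan_cpu[workerid] = fftw_plan_dft_1d(...);
-         break;
-     default:
-         /* other worker types do not use FFTW */
-         break;
-     }
- }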
- \endcode
- And in the codelet body, they are used:
- \code{.c}
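- static void fft(void *descr[], void *_args)
- {
-     /* Each worker picks up its own plan, so no locking is needed. */
-     int workerid = starpu_worker_get_id();
-     fftw_plan plan = plan_cpu[workerid];
-     ...
-     /* The plan is reused for the data of each task, hence the new-array execute function. */
-     fftw_execute_dft(plan, ...);
- }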
- \endcode
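- The same pattern can be sketched without StarPU or FFTW. The following is only an illustration, assuming that POSIX threads stand in for StarPU workers; <c>struct counter</c>, <c>NWORKERS</c>, <c>TASKS_PER_WORKER</c> and <c>run_workers()</c> are hypothetical names which are not part of StarPU.
- \code{.c}
- #include <pthread.h>
- #include <stddef.h>
-
- #define NWORKERS 4            /* hypothetical number of workers */
- #define TASKS_PER_WORKER 1000 /* hypothetical number of tasks run by each worker */
-
- /* Hypothetical non-thread-safe object: safe as long as each thread uses its own instance. */
- struct counter { long calls; };
-
- /* One instance per worker, indexed by worker identifier, like plan_cpu[] above. */
- static struct counter counters[NWORKERS];
- static int worker_ids[NWORKERS];
-
- /* Stand-in for the codelet body: the worker only touches its own instance. */
- static void *worker_main(void *arg)
- {
-     int workerid = *(int *)arg;
-     struct counter *mine = &counters[workerid];
-     for (int i = 0; i < TASKS_PER_WORKER; i++)
-         mine->calls++;
-     return NULL;
- }
-
- /* Initialize one object per worker, run the workers, and return the total number of calls. */
- long run_workers(void)
- {
-     pthread_t threads[NWORKERS];
-     long total = 0;
-     int started = 0;
-     for (int i = 0; i < NWORKERS; i++) {
-         worker_ids[i] = i;
-         counters[i].calls = 0;
-     }
-     for (int i = 0; i < NWORKERS; i++) {
-         if (pthread_create(&threads[i], NULL, worker_main, &worker_ids[i]) != 0)
-             break;
-         started++;
-     }
-     for (int i = 0; i < started; i++)
-         pthread_join(threads[i], NULL);
-     for (int i = 0; i < NWORKERS; i++)
-         total += counters[i].calls;
-     return total;
- }
- \endcode
- Since each thread only touches the entry matching its own identifier, no lock is needed around the object, and <c>run_workers()</c> returns <c>NWORKERS * TASKS_PER_WORKER</c> when all threads could be started.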
- Initializing the objects from the main thread is however not sufficient for FFT on CUDA: the initialization has
- to be done from the workers themselves. This can be done thanks to
- starpu_execute_on_each_worker(). For instance, libstarpufft does the following.
- \code{.c}
- static void fft_plan_gpu(void *args)
- {
-     plan plan = args;
-     int n2 = plan->n2[0];
-     int workerid = starpu_worker_get_id();
-     /* Executed by each CUDA worker: create the plan and bind it to the stream of that worker. */
-     cufftPlan1d(&plan->plans[workerid].plan_cuda, n2, _CUFFT_C2C, 1);
-     cufftSetStream(plan->plans[workerid].plan_cuda, starpu_cuda_get_local_stream());
- }
- void starpufft_plan(void)
- {
-     /* Run fft_plan_gpu() once on every CUDA worker. */
-     starpu_execute_on_each_worker(fft_plan_gpu, plan, STARPU_CUDA);
- }
- \endcode
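- The reason why such an initialization has to run on the workers can be illustrated with thread-local state. The following is only a sketch, assuming that POSIX threads stand in for StarPU workers and that a thread-local integer stands in for per-thread state such as a CUDA context; <c>run_on_each_worker()</c>, <c>init_handle()</c>, <c>local_handle</c> and <c>seen_by_worker</c> are hypothetical names. Unlike starpu_execute_on_each_worker(), which runs the function on the existing workers, this sketch spawns threads for the duration of the call.
- \code{.c}
- #include <pthread.h>
-
- #define NWORKERS 3 /* hypothetical number of workers */
-
- /* Stand-in for state attached to the calling thread. */
- static _Thread_local int local_handle = -1;
-
- /* Value of local_handle observed by each worker after initialization. */
- static int seen_by_worker[NWORKERS];
-
- struct job { int workerid; void (*func)(void *); void *arg; };
-
- static void *job_main(void *p)
- {
-     struct job *j = p;
-     j->func(j->arg); /* the initialization runs on the worker thread */
-     seen_by_worker[j->workerid] = local_handle;
-     return NULL;
- }
-
- /* Call func(arg) once on each worker thread. Returns 0 on success, -1 otherwise. */
- int run_on_each_worker(void (*func)(void *), void *arg)
- {
-     pthread_t threads[NWORKERS];
-     struct job jobs[NWORKERS];
-     int started = 0, err = 0;
-     for (int i = 0; i < NWORKERS; i++) {
-         jobs[i] = (struct job){ i, func, arg };
-         if (pthread_create(&threads[i], NULL, job_main, &jobs[i]) != 0) {
-             err = -1;
-             break;
-         }
-         started++;
-     }
-     for (int i = 0; i < started; i++)
-         pthread_join(threads[i], NULL);
-     return err;
- }
-
- /* Initialization routine: it only affects the thread which calls it. */
- void init_handle(void *arg)
- {
-     local_handle = *(int *)arg;
- }
- \endcode
- After <c>run_on_each_worker(init_handle, &value)</c>, every worker has seen <c>value</c> in its own <c>local_handle</c>, while the <c>local_handle</c> of the calling thread is left untouched. Conversely, calling <c>init_handle()</c> from the main thread alone would not initialize anything for the workers.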
- \section HowToLimitMemoryPerNode How To Limit Memory Per Node
- TODO: this section remains to be written. It should cover the environment variables
- \ref STARPU_LIMIT_CUDA_devid_MEM, \ref STARPU_LIMIT_CUDA_MEM,
- \ref STARPU_LIMIT_OPENCL_devid_MEM, \ref STARPU_LIMIT_OPENCL_MEM
- and \ref STARPU_LIMIT_CPU_MEM, as well as the function
- starpu_memory_get_available().
- \section ThreadBindingOnNetBSD Thread Binding on NetBSD
- When using StarPU on a NetBSD machine, thread binding fails if the topology
- discovery library <c>hwloc</c> is used. To
- prevent the problem, you should use at least version 1.7 of
- <c>hwloc</c>, and also run the following command:
- \verbatim
- $ sysctl -w security.models.extensions.user_set_cpu_affinity=1
- \endverbatim
- Alternatively, add the following line to the file <c>/etc/sysctl.conf</c>:
- \verbatim
- security.models.extensions.user_set_cpu_affinity=1
- \endverbatim
- \section UsingStarPUWithMKL Using StarPU With MKL 11 (Intel Composer XE 2013)
- Some users had issues with MKL 11 and StarPU (versions 1.1rc1 and
- 1.0.5) on Linux when using a single thread for MKL and doing all the
- parallelism with StarPU (no multithreaded tasks), i.e. when setting the
- environment variable MKL_NUM_THREADS to 1 and using the threaded MKL library
- with iomp5.
- With this configuration, StarPU uses only 1 core, whatever the value of
- \ref STARPU_NCPU. The problem is actually a thread pinning issue with MKL.
- The solution is to set the environment variable KMP_AFFINITY to <c>disabled</c>
- (http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011Update/compiler_c/optaps/common/optaps_openmp_thread_affinity.htm).
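- When the environment of the application cannot easily be changed from the shell, the variables may also be set by the program itself. The following is only a sketch: <c>configure_mkl_env()</c> is a hypothetical helper, and it assumes that the OpenMP runtime reads these variables when it is first used, so that the helper must be called before starpu_init() and before the first MKL or OpenMP call. Setting the variables in the shell remains the more robust route.
- \code{.c}
- #define _POSIX_C_SOURCE 200112L /* for setenv() */
- #include <stdlib.h>
-
- /*
-  * Hypothetical helper, to be called at the very beginning of main().
-  * Returns 0 on success, -1 if a variable could not be set.
-  */
- int configure_mkl_env(void)
- {
-     /* Keep MKL sequential: StarPU provides the parallelism. */
-     if (setenv("MKL_NUM_THREADS", "1", 1) != 0)
-         return -1;
-     /* Prevent the Intel OpenMP runtime from pinning threads. */
-     if (setenv("KMP_AFFINITY", "disabled", 1) != 0)
-         return -1;
-     return 0;
- }
- \endcode
- The last argument of <c>setenv()</c> is non-zero so that a value inherited from the shell is overwritten.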
- */