exa2pro
/
starpu-max


			
				
					
						
						
							123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115
							/*
 * This file is part of the StarPU Handbook.
 * Copyright (C) 2009--2011  Universit@'e de Bordeaux 1
 * Copyright (C) 2010, 2011, 2012, 2013  Centre National de la Recherche Scientifique
 * Copyright (C) 2011, 2012 Institut National de Recherche en Informatique et Automatique
 * See the file version.doxy for copying conditions.
 */

/*! \page TipsAndTricksToKnowAbout Tips and Tricks To Know About

\section HowToInitializeAComputationLibraryOnceForEachWorker How To Initialize A Computation Library Once For Each Worker?

Some libraries need to be initialized once for each concurrent instance that
may run on the machine. For instance, a C++ computation class which is not
thread-safe by itself, but for which several instanciated objects of that class
can be used concurrently. This can be used in StarPU by initializing one such
object per worker. For instance, the libstarpufft example does the following to
be able to use FFTW on CPUs.

Some global array stores the instanciated objects:

\code{.c}
fftw_plan plan_cpu[STARPU_NMAXWORKERS];
\endcode

At initialisation time of libstarpu, the objects are initialized:

\code{.c}
int workerid;
for (workerid = 0; workerid < starpu_worker_get_count(); workerid++) {
    switch (starpu_worker_get_type(workerid)) {
        case STARPU_CPU_WORKER:
            plan_cpu[workerid] = fftw_plan(...);
            break;
    }
}
\endcode

And in the codelet body, they are used:

\code{.c}
static void fft(void *descr[], void *_args)
{
    int workerid = starpu_worker_get_id();
    fftw_plan plan = plan_cpu[workerid];
    ...

    fftw_execute(plan, ...);
}
\endcode

This however is not sufficient for FFT on CUDA: initialization has
to be done from the workers themselves.  This can be done thanks to
starpu_execute_on_each_worker().  For instance libstarpufft does the following.

\code{.c}
static void fft_plan_gpu(void *args)
{
    plan plan = args;
    int n2 = plan->n2[0];
    int workerid = starpu_worker_get_id();

    cufftPlan1d(&plan->plans[workerid].plan_cuda, n, _CUFFT_C2C, 1);
    cufftSetStream(plan->plans[workerid].plan_cuda, starpu_cuda_get_local_stream());
}
void starpufft_plan(void)
{
    starpu_execute_on_each_worker(fft_plan_gpu, plan, STARPU_CUDA);
}
\endcode

\section HowToLimitMemoryPerNode How to limit memory per node

TODO

Talk about
\ref STARPU_LIMIT_CUDA_devid_MEM, \ref STARPU_LIMIT_CUDA_MEM,
\ref STARPU_LIMIT_OPENCL_devid_MEM, \ref STARPU_LIMIT_OPENCL_MEM
and \ref STARPU_LIMIT_CPU_MEM

starpu_memory_get_available()

\section ThreadBindingOnNetBSD Thread Binding on NetBSD

When using StarPU on a NetBSD machine, if the topology
discovery library <c>hwloc</c> is used, thread binding will fail. To
prevent the problem, you should at least use the version 1.7 of
<c>hwloc</c>, and also issue the following call:

\verbatim
$ sysctl -w security.models.extensions.user_set_cpu_affinity=1
\endverbatim

Or add the following line in the file <c>/etc/sysctl.conf</c>

\verbatim
security.models.extensions.user_set_cpu_affinity=1
\endverbatim

\section UsingStarPUWithMKL Using StarPU With MKL 11 (Intel Composer XE 2013)

Some users had issues with MKL 11 and StarPU (versions 1.1rc1 and
1.0.5) on Linux with MKL, using 1 thread for MKL and doing all the
parallelism using StarPU (no multithreaded tasks), setting the
environment variable MKL_NUM_THREADS to 1, and using the threaded MKL library,
with iomp5.

Using this configuration, StarPU uses only 1 core, no matter the value of
\ref STARPU_NCPU. The problem is actually a thread pinning issue with MKL.

The solution is to set the environment variable KMP_AFFINITY to <c>disabled</c>
(http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011Update/compiler_c/optaps/common/optaps_openmp_thread_affinity.htm).

*/