/*
 * This file is part of the StarPU Handbook.
 * Copyright (C) 2009-2011  Université de Bordeaux 1
 * Copyright (C) 2010, 2011, 2012, 2013  Centre National de la Recherche Scientifique
 * Copyright (C) 2011, 2012 Institut National de Recherche en Informatique et Automatique
 * See the file version.doxy for copying conditions.
 */

/*! \page tipsTricks Tips and Tricks to know about

\section How_to_initialize_a_computation_library_once_for_each_worker How to initialize a computation library once for each worker?

Some libraries need to be initialized once for each concurrent instance that
may run on the machine. A typical case is a C++ computation class which is not
thread-safe by itself, but of which several instantiated objects can be used
concurrently. Such a library can be used with StarPU by initializing one
object per worker. For instance, the libstarpufft example does the following to
be able to use FFTW.

A global array stores the instantiated objects, one slot per worker:

\code{.c}
fftw_plan plan_cpu[STARPU_NMAXWORKERS];
\endcode

At libstarpu initialization time, the objects are created, one for each CPU worker:

\code{.c}
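int workerid;
for (workerid = 0; workerid < starpu_worker_get_count(); workerid++) {
    switch (starpu_worker_get_type(workerid)) {
        case STARPU_CPU_WORKER:
            // fftw_plan is a type, so the plan is created with an FFTW
            // planner function (fftw_plan_dft_1d() here, or any other one)
            plan_cpu[workerid] = fftw_plan_dft_1d(...);
            break;
        default:
            // only CPU workers need an FFTW plan
            break;
    }
}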
\endcode

The codelet body then picks the object that belongs to the worker it runs on:

\code{.c}
static void fft(void *descr[], void *_args)
{
    int workerid = starpu_worker_get_id();
    fftw_plan plan = plan_cpu[workerid];
    ...

    fftw_execute(plan, ...);
}
\endcode
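
The same pattern does not depend on FFTW. The following is only a minimal
sketch in plain C, without any StarPU call, of a table holding one object per
worker. The names <c>MAX_WORKERS</c>, <c>struct accumulator</c>,
<c>init_per_worker</c> and <c>worker_accumulate</c> are hypothetical and only
serve the illustration. It assumes that a worker runs one task at a time.

```c
#include <assert.h>

#define MAX_WORKERS 64 // hypothetical bound, standing in for STARPU_NMAXWORKERS

// Hypothetical non-thread-safe object: a running sum.
struct accumulator {
    int initialized;
    long sum;
};

// One instance per worker, indexed by worker id (like plan_cpu[] above).
static struct accumulator acc_per_worker[MAX_WORKERS];

// Create one instance for each of the nworkers workers.
// Returns 0 on success, -1 if nworkers does not fit in the table.
static int init_per_worker(int nworkers)
{
    int id;
    if (nworkers < 0 || nworkers > MAX_WORKERS)
        return -1;
    for (id = 0; id < nworkers; id++) {
        acc_per_worker[id].initialized = 1;
        acc_per_worker[id].sum = 0;
    }
    return 0;
}

// Stand-in for a codelet body: it only touches the instance owned by the
// worker it runs on, so two workers never share an object.
// Returns the updated sum of that worker.
static long worker_accumulate(int workerid, long value)
{
    struct accumulator *acc;
    assert(workerid >= 0 && workerid < MAX_WORKERS);
    acc = &acc_per_worker[workerid];
    assert(acc->initialized);
    acc->sum += value;
    return acc->sum;
}
```

In an actual codelet, the worker identifier would come from
starpu_worker_get_id() and the number of workers from
starpu_worker_get_count(), as in the libstarpufft excerpts above.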

Another approach, which is sometimes required, is to execute some code from the
workers themselves with starpu_execute_on_each_worker(). CUDA may need this to
behave properly, due to threading issues. For instance, StarPU's
starpu_cublas_init() looks like the following, so that
<c>cublasInit</c> is called from the workers themselves:

\code{.c}
static void init_cublas_func(void *args STARPU_ATTRIBUTE_UNUSED)
{
    cublasStatus cublasst = cublasInit();
    cublasSetKernelStream(starpu_cuda_get_local_stream());
}
void starpu_cublas_init(void)
{
    starpu_execute_on_each_worker(init_cublas_func, NULL, STARPU_CUDA);
}
\endcode

\section How_to_limit_memory_per_node How to limit memory per node

TODO: this section remains to be written. It should cover the environment
variables
<c>STARPU_LIMIT_CUDA_devid_MEM</c>, <c>STARPU_LIMIT_CUDA_MEM</c>,
<c>STARPU_LIMIT_OPENCL_devid_MEM</c>, <c>STARPU_LIMIT_OPENCL_MEM</c>
and <c>STARPU_LIMIT_CPU_MEM</c>, as well as the function
starpu_memory_get_available().
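
As an illustration of the environment variables listed above, the following
sketch sets <c>STARPU_LIMIT_CPU_MEM</c> from within the program instead of
from the shell. It rests on two assumptions which should be checked against
the reference documentation of the environment variables: that the value is
expressed in megabytes, and that StarPU reads the variable when
starpu_init() is called, so that the helper has to run before it. The helper
name <c>limit_cpu_mem</c> is hypothetical, and the code only uses POSIX
<c>setenv</c>, not any StarPU call.

```c
#define _POSIX_C_SOURCE 200112L // for setenv()
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper: export STARPU_LIMIT_CPU_MEM=<megabytes> in the
// environment of the current process.
// Returns 0 on success, -1 on error.
static int limit_cpu_mem(unsigned long megabytes)
{
    char value[32];
    int n = snprintf(value, sizeof(value), "%lu", megabytes);
    if (n < 0 || (size_t) n >= sizeof(value))
        return -1;
    // The last argument asks setenv() to overwrite any existing value.
    return setenv("STARPU_LIMIT_CPU_MEM", value, 1);
}
```

Under these assumptions, calling <c>limit_cpu_mem(1024)</c> before
starpu_init() would have the same effect as running the application with
<c>STARPU_LIMIT_CPU_MEM=1024</c> set in the shell.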

\section Thread_Binding_on_NetBSD Thread Binding on NetBSD

When using StarPU on a NetBSD machine, thread binding fails if the topology
discovery library <c>hwloc</c> is used. To prevent the problem, use
<c>hwloc</c> version 1.7 or later, and also run the following command:

\verbatim
$ sysctl -w security.models.extensions.user_set_cpu_affinity=1
\endverbatim

Alternatively, add the following line to the file <c>/etc/sysctl.conf</c>:

\verbatim
security.models.extensions.user_set_cpu_affinity=1
\endverbatim

*/