/*
 * This file is part of the StarPU Handbook.
 * Copyright (C) 2015 Universit@'e de Bordeaux
 * Copyright (C) 2015 CNRS
 * Copyright (C) 2015 INRIA
 * See the file version.doxy for copying conditions.
 */
/*! \page ClusteringAMachine Clustering A Machine
TODO: clarify and put more explanations, express how to create clusters
using the context API.
\section GeneralIdeas General Ideas
Clusters are a concept introduced in this
paper. This
comes from a basic idea, making use of two level of parallelism in a DAG.
We keep the DAG parallelism but consider on top of it that a task can
contain internal parallelism. A good example is if each task in the DAG
is OpenMP enabled.
The particularity of such tasks is that we will combine the power of two
runtime systems: StarPU will manage the DAG parallelism and another
runtime (e.g. OpenMP) will manage the internal parallelism. The challenge
is in creating an interface between the two runtime systems so that StarPU
can regroup cores inside a machine (creating what we call a "cluster") on
top of which the parallel tasks (e.g. OpenMP tasks) will be ran in a
contained fashion.
The aim of the cluster API is to facilitate this process in an automatic
fashion. For this purpose, we depend on the hwloc tool to detect the
machine configuration and then partition it into usable clusters.
An example of code running on clusters is available in
examples/sched_ctx/parallel_tasks_with_cluster_api.c.
Let's first look at how to create one in practice, then we will detail
their internals.
\section CreatingClusters Creating Clusters
Partitioning a machine into clusters with the cluster API is fairly
straightforward. The simplest way is to state under which machine
topology level we wish to regroup all resources. This level is an HwLoc
object, of the type hwloc_obj_type_t. More can be found in the
hwloc
documentation.
Once a cluster is created, the full machine is represented with an opaque
structure named starpu_cluster_machine. This can be printed to show the
current machine state.
\code{.c}
struct starpu_cluster_machine *clusters;
clusters = starpu_cluster_machine(HWLOC_OBJ_SOCKET, 0);
starpu_cluster_print(clusters);
//... submit some tasks with OpenMP computations 
starpu_uncluster_machine(clusters);
//... we are back in the default starpu state
\endcode
The following graphic is an example of what a particular machine can
look like once clusterized. The main difference is that we have less
worker queues and tasks which will be executed on several resources at
once. The execution of these tasks will be left to the internal runtime
system, represented with a dashed box around the resources.
\image latex runtime-par.eps "StarPU using parallel tasks" width=0.5\textwidth
\image html runtime-par.png "StarPU using parallel tasks"
Creating clusters as shown in the example above will create workers able to
execute OpenMP code by default. The cluster API aims in allowing to
parametrize the cluster creation and can take a va_list of arguments
as input after the HwLoc object (always terminated by a 0 value). These can
help creating clusters of a type different from OpenMP, or create a more
precise partition of the machine.
\section ExampleOfConstrainingOpenMP Example Of Constraining OpenMP
Clusters require being able to constrain the runtime managing the internal
task parallelism (internal runtime) to the resources set by StarPU. The
purpose of this is to express how StarPU must communicate with the internal
runtime to achieve the required cooperation. In the case of OpenMP, StarPU
will provide an awake thread from the cluster to execute this liaison. It
will then provide on demand the process ids of the other resources supposed
to be in the region. Finally, thanks to an OpenMP region we can create the
required number of threads and bind each of them on the correct region.
These will then be reused each time we encounter a \#pragma omp
parallel in the following computations of our program.
The following graphic is an example of what an OpenMP-type cluster looks
like and how it represented in StarPU. We can see that one StarPU (black)
thread is awake, and we need to create on the other resources the OpenMP
threads (in pink).
\image latex parallel_worker2.eps "StarPU with an OpenMP cluster" width=0.3\textwidth
\image html parallel_worker2.png "StarPU with an OpenMP cluster"
Finally, the following code shows how to create OpenMP cooperate with StarPU
and create the aforementioned OpenMP threads constrained in the cluster's
resources set:
\code{.c}
void starpu_openmp_prologue(void * sched_ctx_id)
        int sched_ctx = *(int*)sched_ctx_id;
		int *cpuids = NULL;
		int ncpuids = 0;
		int workerid = starpu_worker_get_id();
        //we can target only CPU workers
		if (starpu_worker_get_type(workerid) == STARPU_CPU_WORKER)
		{
                //grab all the ids inside the cluster
				starpu_sched_ctx_get_available_cpuids(sched_ctx, &cpuids, &ncpuids);
                //set the number of threads
				omp_set_num_threads(ncpuids);
#pragma omp parallel
                {
                        //bind each threads to its respective resource
						starpu_sched_ctx_bind_current_thread_to_cpuid(cpuids[omp_get_thread_num()]);
                }
                free(cpuids);
		}
		return;
}
\endcode
This is in fact exactly the default function used when we don't specify
anything. As can be seen, we based the clusters on several tools and
models present in the StarPU contexts, and merely extended them to allow
to represent and carry clusters. More on contexts can be read here
\ref SchedulingContexts.
\section CreatingCustomClusters Creating Custom Clusters
As was previously said it is possible to create clusters using another
cluster type, in order to bind another internal runtime inside StarPU.
This can be done with in several ways:
- By using the currently available functions
- By passing as argument a user defined function
Here are two examples:
\code{.c}
struct starpu_cluster_machine *clusters;
clusters = starpu_cluster_machine(HWLOC_OBJ_SOCKET,
                                 STARPU_CLUSTER_TYPE, GNU_OPENMP_MKL,
                                 0);
\endcode
This type of clusters is available by default only if StarPU is compiled
with MKL. It uses MKL functions to set the number of threads which is
more reliable when using an OpenMP implementation different from the
Intel one.
\code{.c}
void foo_func(void* foo_arg);
\\...
int foo_arg = 0;
struct starpu_cluster_machine *clusters;
clusters = starpu_cluster_machine(HWLOC_OBJ_SOCKET,
                                  STARPU_CLUSTER_CREATE_FUNC, &foo_func,
                                  STARPU_CLUSTER_CREATE_FUNC_ARG, &foo_arg,
                                  0);
\endcode
*/