/*
 * This file is part of the StarPU Handbook.
 * Copyright (C) 2015 Universit@'e de Bordeaux
 * Copyright (C) 2015 CNRS
 * Copyright (C) 2015 INRIA
 * See the file version.doxy for copying conditions.
 */

/*! \page ClusteringAMachine Clustering A Machine

TODO: clarify and put more explanations, express how to create clusters
using the context API.

\section GeneralIdeas General Ideas

Clusters are a concept introduced in this
<a href="https://hal.inria.fr/view/index/docid/1181135">paper</a>. The
basic idea is to exploit two levels of parallelism in a DAG.
We keep the DAG parallelism but consider, on top of it, that a task can
contain internal parallelism. A good example is a DAG in which each task
is OpenMP-enabled.

The particularity of such tasks is that we combine the power of two
runtime systems: StarPU manages the DAG parallelism and another
runtime (e.g. OpenMP) manages the internal parallelism. The challenge
is to create an interface between the two runtime systems so that StarPU
can regroup cores inside a machine (creating what we call a "cluster") on
top of which the parallel tasks (e.g. OpenMP tasks) will be run in a
contained fashion.

The aim of the cluster API is to automate this process.
For this purpose, we depend on the hwloc tool to detect the
machine configuration and then partition it into usable clusters.

An example of code running on clusters is available in
<c>examples/sched_ctx/parallel_tasks_with_cluster_api.c</c>.

Let's first look at how to create clusters in practice, then we will detail
their internals.

\section CreatingClusters Creating Clusters

Partitioning a machine into clusters with the cluster API is fairly
straightforward. The simplest way is to state under which machine
topology level we wish to regroup all resources. This level is an hwloc
object, of the type <c>hwloc_obj_type_t</c>. More can be found in the
<a href="https://www.open-mpi.org/projects/hwloc/doc/v1.11.0/a00076.php">hwloc
documentation</a>.

Once a cluster is created, the full machine is represented with an opaque
structure named <c>starpu_cluster_machine</c>. This can be printed to show the
current machine state.

\code{.c}
struct starpu_cluster_machine *clusters;
clusters = starpu_cluster_machine(HWLOC_OBJ_SOCKET, 0);
starpu_cluster_print(clusters);

//... submit some tasks with OpenMP computations

starpu_uncluster_machine(clusters);
//... we are back in the default starpu state
\endcode

The following graphic is an example of what a particular machine can
look like once clusterized. The main difference is that we have fewer
worker queues, and tasks which will be executed on several resources at
once. The execution of these tasks is left to the internal runtime
system, represented with a dashed box around the resources.

\image latex runtime-par.eps "StarPU using parallel tasks" width=0.5\textwidth
\image html runtime-par.png "StarPU using parallel tasks"

Creating clusters as shown in the example above will create workers able to
execute OpenMP code by default. The cluster API also allows the cluster
creation to be parametrized and can take a <c>va_list</c> of arguments
as input after the hwloc object (always terminated by a 0 value).
These can
help create clusters of a type different from OpenMP, or create a more
precise partition of the machine.

\section ExampleOfConstrainingOpenMP Example Of Constraining OpenMP

Clusters require being able to constrain the runtime managing the internal
task parallelism (internal runtime) to the resources set by StarPU. The
purpose of this is to express how StarPU must communicate with the internal
runtime to achieve the required cooperation. In the case of OpenMP, StarPU
will provide an awake thread from the cluster to execute this liaison. It
will then provide on demand the process ids of the other resources supposed
to be in the region. Finally, thanks to an OpenMP region we can create the
required number of threads and bind each of them to the correct resource.
These will then be reused each time we encounter a <c>\#pragma omp
parallel</c> in the following computations of our program.

The following graphic is an example of what an OpenMP-type cluster looks
like and how it is represented in StarPU.
We can see that one StarPU (black)
thread is awake, and we need to create the OpenMP threads (in pink) on
the other resources.

\image latex parallel_worker2.eps "StarPU with an OpenMP cluster" width=0.3\textwidth
\image html parallel_worker2.png "StarPU with an OpenMP cluster"

Finally, the following code shows how to make OpenMP cooperate with StarPU
and create the aforementioned OpenMP threads constrained to the cluster's
resource set:

\code{.c}
#include <stdlib.h>
#include <omp.h>
#include <starpu.h>

void starpu_openmp_prologue(void *sched_ctx_id)
{
        int sched_ctx = *(int*)sched_ctx_id;
        int *cpuids = NULL;
        int ncpuids = 0;
        int workerid = starpu_worker_get_id();

        //we can target only CPU workers
        if (starpu_worker_get_type(workerid) == STARPU_CPU_WORKER)
        {
                //grab all the ids inside the cluster
                starpu_sched_ctx_get_available_cpuids(sched_ctx, &cpuids, &ncpuids);
                //set the number of threads
                omp_set_num_threads(ncpuids);
#pragma omp parallel
                {
                        //bind each thread to its respective resource
                        starpu_sched_ctx_bind_current_thread_to_cpuid(cpuids[omp_get_thread_num()]);
                }
                free(cpuids);
        }
        return;
}
\endcode

This is in fact exactly the default function used when we don't specify
anything. As can be seen, we based the clusters on several tools and
models present in the StarPU contexts, and merely extended them to
represent and carry clusters.
More on contexts can be read in
\ref SchedulingContexts.

\section CreatingCustomClusters Creating Custom Clusters

As previously said, it is possible to create clusters using another
cluster type, in order to bind another internal runtime inside StarPU.
This can be done in several ways:
- By using the currently available functions
- By passing a user-defined function as argument

Here are two examples:

\code{.c}
struct starpu_cluster_machine *clusters;
clusters = starpu_cluster_machine(HWLOC_OBJ_SOCKET,
                                  STARPU_CLUSTER_TYPE, GNU_OPENMP_MKL,
                                  0);
\endcode

This type of cluster is available by default only if StarPU is compiled
with MKL. It uses MKL functions to set the number of threads, which is
more reliable when using an OpenMP implementation different from the
Intel one.

\code{.c}
void foo_func(void* foo_arg);

//...
int foo_arg = 0;
struct starpu_cluster_machine *clusters;
clusters = starpu_cluster_machine(HWLOC_OBJ_SOCKET,
                                  STARPU_CLUSTER_CREATE_FUNC, &foo_func,
                                  STARPU_CLUSTER_CREATE_FUNC_ARG, &foo_arg,
                                  0);
\endcode

*/