/* StarPU --- Runtime system for heterogeneous multicore architectures.
 *
 * Copyright (C) 2015-2019 CNRS
 * Copyright (C) 2015,2018 Université de Bordeaux
 * Copyright (C) 2015,2016 Inria
 *
 * StarPU is free software; you can redistribute it and/or modify
 * it under the terms of the GNU Lesser General Public License as published by
 * the Free Software Foundation; either version 2.1 of the License, or (at
 * your option) any later version.
 *
 * StarPU is distributed in the hope that it will be useful, but
 * WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 *
 * See the GNU Lesser General Public License in COPYING.LGPL for more details.
 */

/*! \page ClusteringAMachine Clustering A Machine

TODO: clarify and put more explanations, express how to create clusters
using the context API.

\section GeneralIdeas General Ideas

Clusters are a concept introduced in this
<a href="https://hal.inria.fr/view/index/docid/1181135">paper</a>. The
basic idea is to make use of two levels of parallelism in a DAG: we
keep the parallelism between the tasks of the DAG, but consider on top
of it that a task can itself contain internal parallelism. A good
example is a DAG in which each task is OpenMP-enabled.

The particularity of such tasks is that we combine the power of two
runtime systems: StarPU manages the DAG parallelism, while another
runtime (e.g. OpenMP) manages the internal parallelism of each task.
The challenge is to create an interface between the two runtime
systems so that StarPU can regroup cores inside a machine (creating
what we call a "cluster") on top of which the parallel tasks (e.g.
OpenMP tasks) will be run in a contained fashion.

The aim of the cluster API is to facilitate this process
automatically. For this purpose, we rely on the \c hwloc library to
detect the machine configuration and then partition it into usable
clusters.

An example of code running on clusters is available in
<c>examples/sched_ctx/parallel_tasks_with_cluster_api.c</c>.

Let's first look at how to create a cluster.

\section CreatingClusters Creating Clusters

Partitioning a machine into clusters with the cluster API is fairly
straightforward. The simplest way is to state under which machine
topology level we wish to regroup all resources. This level is an \c
hwloc object of type <c>hwloc_obj_type_t</c>. More information can be
found in the
<a href="https://www.open-mpi.org/projects/hwloc/doc/v2.0.3/">hwloc
documentation</a>.

Once a cluster is created, the full machine is represented with an
opaque structure starpu_cluster_machine, which can be printed to show
the current machine state.

\code{.c}
struct starpu_cluster_machine *clusters;
clusters = starpu_cluster_machine(HWLOC_OBJ_SOCKET, 0);
starpu_cluster_print(clusters);
//... submit some tasks with OpenMP computations
starpu_uncluster_machine(clusters);
//... we are back in the default StarPU state
\endcode

The following graphic is an example of what a particular machine can
look like once clusterized. The main differences are that there are
fewer worker queues, and that tasks will be executed on several
resources at once. The execution of these tasks is left to the
internal runtime system, represented with a dashed box around the
resources.

\image latex runtime-par.eps "StarPU using parallel tasks" width=0.5\textwidth
\image html runtime-par.png "StarPU using parallel tasks"

Creating clusters as shown in the example above will by default create
workers able to execute OpenMP code. The cluster creation can be
parametrized: starpu_cluster_machine() accepts a <c>va_list</c> of
arguments after the \c hwloc object (always terminated by a 0 value).
These arguments can help create clusters of a type different from
OpenMP, or create a more precise partition of the machine.

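For instance, one can request a given number of clusters at the chosen
topology level. The code below is a minimal sketch which assumes the
::STARPU_CLUSTER_NB parameter is available in your StarPU version;
check the starpu_cluster_machine() documentation for the exact list of
accepted parameters.

\code{.c}
//sketch: ask for exactly 2 clusters at the socket level,
//assuming the STARPU_CLUSTER_NB parameter is supported
struct starpu_cluster_machine *clusters;
clusters = starpu_cluster_machine(HWLOC_OBJ_SOCKET,
                                  STARPU_CLUSTER_NB, 2,
                                  0);
starpu_cluster_print(clusters);
\endcode
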
\section ExampleOfConstrainingOpenMP Example Of Constraining OpenMP

Clusters require being able to constrain the runtime managing the
internal task parallelism (the internal runtime) to the resource set
chosen by StarPU. The purpose of this is to express how StarPU must
communicate with the internal runtime to achieve the required
cooperation. In the case of OpenMP, StarPU provides an awake thread
from the cluster to execute this liaison. It then provides, on demand,
the ids of the other resources supposed to be in the region. Finally,
thanks to an OpenMP region, we can create the required number of
threads and bind each of them to its resource. These threads are then
reused each time a <c>\#pragma omp parallel</c> is encountered in the
subsequent computations of the program.

The following graphic is an example of what an OpenMP-type cluster
looks like and how it is represented in StarPU. We can see that one
StarPU (black) thread is awake, and that the OpenMP threads (in pink)
need to be created on the other resources.

\image latex parallel_worker2.eps "StarPU with an OpenMP cluster" width=0.3\textwidth
\image html parallel_worker2.png "StarPU with an OpenMP cluster"

Finally, the following code shows how to force OpenMP to cooperate
with StarPU and create the aforementioned OpenMP threads, constrained
to the cluster's resource set:

\code{.c}
void starpu_openmp_prologue(void *sched_ctx_id)
{
  int sched_ctx = *(int*)sched_ctx_id;
  int *cpuids = NULL;
  int ncpuids = 0;
  int workerid = starpu_worker_get_id();

  //we can target only CPU workers
  if (starpu_worker_get_type(workerid) == STARPU_CPU_WORKER)
  {
    //grab all the ids inside the cluster
    starpu_sched_ctx_get_available_cpuids(sched_ctx, &cpuids, &ncpuids);
    //set the number of threads
    omp_set_num_threads(ncpuids);
#pragma omp parallel
    {
      //bind each thread to its respective resource
      starpu_sched_ctx_bind_current_thread_to_cpuid(cpuids[omp_get_thread_num()]);
    }
    free(cpuids);
  }
}
\endcode

This function is the default one used when calling
starpu_cluster_machine() without extra parameters.

Clusters are based on several tools and models already available
within StarPU contexts, and merely extend contexts. More on contexts
can be read in Section \ref SchedulingContexts.

\section CreatingCustomClusters Creating Custom Clusters

Clusters can be created either with the predefined functions provided
within StarPU, or with user-defined functions to bind another runtime
inside StarPU.

The predefined cluster types provided by StarPU are
::STARPU_CLUSTER_OPENMP, ::STARPU_CLUSTER_INTEL_OPENMP_MKL and
::STARPU_CLUSTER_GNU_OPENMP_MKL. The last one is only provided if
StarPU is compiled with the \c MKL library. It uses MKL functions to
set the number of threads, which is more reliable when using an OpenMP
implementation different from the Intel one.

Here is an example of creating an MKL cluster:

\code{.c}
struct starpu_cluster_machine *clusters;
clusters = starpu_cluster_machine(HWLOC_OBJ_SOCKET,
                                  STARPU_CLUSTER_TYPE, STARPU_CLUSTER_GNU_OPENMP_MKL,
                                  0);
\endcode

Using the default type ::STARPU_CLUSTER_OPENMP is similar to calling
starpu_cluster_machine() without any extra parameter.

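In other words, the two calls below should create the same OpenMP
clusters, the second one merely making the default type explicit:

\code{.c}
struct starpu_cluster_machine *clusters;
//implicit default type
clusters = starpu_cluster_machine(HWLOC_OBJ_SOCKET, 0);
//or equivalently (after unclustering first), with the type spelled out
clusters = starpu_cluster_machine(HWLOC_OBJ_SOCKET,
                                  STARPU_CLUSTER_TYPE, STARPU_CLUSTER_OPENMP,
                                  0);
\endcode
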
Users can also define their own function:

\code{.c}
void foo_func(void* foo_arg);

//...

int foo_arg = 0;
struct starpu_cluster_machine *clusters;
clusters = starpu_cluster_machine(HWLOC_OBJ_SOCKET,
                                  STARPU_CLUSTER_CREATE_FUNC, &foo_func,
                                  STARPU_CLUSTER_CREATE_FUNC_ARG, &foo_arg,
                                  0);
\endcode

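The following is a purely illustrative sketch of what such a
user-defined function could contain, following the pattern of
starpu_openmp_prologue() shown above. Here \c my_runtime_prologue,
\c my_runtime_set_num_threads() and \c my_runtime_bind_threads() are
hypothetical names standing in for another runtime's API, and we
assume the function receives the scheduling context id in the same way
as the default prologue.

\code{.c}
//hypothetical sketch of a custom cluster creation function
void my_runtime_prologue(void *sched_ctx_id)
{
  int sched_ctx = *(int*)sched_ctx_id;
  int *cpuids = NULL;
  int ncpuids = 0;

  if (starpu_worker_get_type(starpu_worker_get_id()) == STARPU_CPU_WORKER)
  {
    //grab all the ids inside the cluster...
    starpu_sched_ctx_get_available_cpuids(sched_ctx, &cpuids, &ncpuids);
    //...and hand them over to the other runtime (hypothetical calls)
    my_runtime_set_num_threads(ncpuids);
    my_runtime_bind_threads(cpuids, ncpuids);
    free(cpuids);
  }
}
\endcode
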
\section ClustersWithSchedulingContextsAPI Clusters With Scheduling Contexts API

As previously mentioned, the cluster API is implemented on top of
\ref SchedulingContexts. Its main addition is to ease the creation of
a machine CPU partition with no overlap, by using \c hwloc, whereas
scheduling contexts can use any number of any resources.

It is therefore possible, but not recommended, to create clusters
using the scheduling contexts API. This is mostly useful for the most
complex machine configurations, where users have to dimension their
clusters precisely by hand, using their own algorithm.

\code{.c}
/* the list of resources the context will manage */
int workerids[3] = {1, 3, 10};

/* indicate the list of workers assigned to it, the number of workers,
   the name of the context and the scheduling policy to be used within
   the context */
int id_ctx = starpu_sched_ctx_create(workerids, 3, "my_ctx", 0);

/* let StarPU know that the following tasks will be submitted to this context */
starpu_sched_ctx_set_task_context(id_ctx);

/* create a task and set the interface function between StarPU and the
   other runtime */
struct starpu_task *task = starpu_task_create();
task->prologue_callback_pop_func = &runtime_interface_function_here;

/* submit the task to StarPU */
starpu_task_submit(task);
\endcode

As this example illustrates, creating a context without a scheduling
policy will create a cluster. The important change is that users have
to specify an interface function between StarPU and the other runtime,
in the field starpu_task::prologue_callback_pop_func. Such a function
can be similar to the OpenMP thread team creation one shown above.

Note that the OpenMP mode is the default one, both for clusters and
for contexts. The result of a cluster creation is a woken-up master
worker and sleeping "slaves", which allow the master to run tasks on
their resources. To create a cluster whose workers are all woken up,
one can use the flag \ref STARPU_SCHED_CTX_AWAKE_WORKERS with the
scheduling context API, or \ref STARPU_CLUSTER_AWAKE_WORKERS with the
cluster API, as parameter to the creation function.

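With the cluster API, this could look as follows. This is a minimal
sketch assuming ::STARPU_CLUSTER_AWAKE_WORKERS is passed as a plain
flag without an extra value; check the starpu_cluster_machine()
documentation for the exact calling convention.

\code{.c}
//create socket-level clusters whose workers all stay awake
struct starpu_cluster_machine *clusters;
clusters = starpu_cluster_machine(HWLOC_OBJ_SOCKET,
                                  STARPU_CLUSTER_AWAKE_WORKERS,
                                  0);
\endcode
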
*/