- /* StarPU --- Runtime system for heterogeneous multicore architectures.
- *
- * Copyright (C) 2015-2019 CNRS
- * Copyright (C) 2015,2018 Université de Bordeaux
- * Copyright (C) 2015,2016 Inria
- *
- * StarPU is free software; you can redistribute it and/or modify
- * it under the terms of the GNU Lesser General Public License as published by
- * the Free Software Foundation; either version 2.1 of the License, or (at
- * your option) any later version.
- *
- * StarPU is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
- *
- * See the GNU Lesser General Public License in COPYING.LGPL for more details.
- */
- /*! \page ClusteringAMachine Clustering A Machine
- \section GeneralIdeas General Ideas
- Clusters are a concept introduced in this
- <a href="https://hal.inria.fr/view/index/docid/1181135">paper</a>.
- The granularity problem is tackled by using resource aggregation:
- instead of dynamically splitting tasks, resources are aggregated
- to process coarse grain tasks in a parallel fashion. This is built on
- top of scheduling contexts to be able to handle any type of parallel
- tasks.
- This comes from a simple idea: making use of two levels of parallelism
- in a DAG.
- We keep the DAG parallelism, but consider on top of it that each task
- can contain internal parallelism, for example if every task in the DAG
- is OpenMP-enabled.
- The particularity of such tasks is that we will combine the power of two
- runtime systems: StarPU will manage the DAG parallelism and another
- runtime (e.g. OpenMP) will manage the internal parallelism. The challenge
- is in creating an interface between the two runtime systems so that StarPU
- can regroup cores inside a machine (creating what we call a \b cluster) on
- top of which the parallel tasks (e.g. OpenMP tasks) will be run in a
- contained fashion.
- The aim of the cluster API is to facilitate this process in an automatic
- fashion. For this purpose, we depend on the \c hwloc tool to detect the
- machine configuration and then partition it into usable clusters.
- <br>
- An example of code running on clusters is available in
- <c>examples/sched_ctx/parallel_tasks_with_cluster_api.c</c>.
- <br>
- To enable clusters in StarPU, one needs to set the configure option
- \ref enable-cluster "--enable-cluster".
- Let's now look at how to create a cluster.
- \section CreatingClusters Creating Clusters
- Partitioning a machine into clusters with the cluster API is fairly
- straightforward. The simplest way is to state under which machine
- topology level we wish to regroup all resources. This level is an \c hwloc
- object, of the type <c>hwloc_obj_type_t</c>. More information can be found in the
- <a href="https://www.open-mpi.org/projects/hwloc/doc/v2.0.3/">hwloc
- documentation</a>.
- Once a cluster is created, the full machine is represented with an opaque
- structure starpu_cluster_machine. This can be printed to show the
- current machine state.
- \code{.c}
- struct starpu_cluster_machine *clusters;
- clusters = starpu_cluster_machine(HWLOC_OBJ_SOCKET, 0);
- starpu_cluster_print(clusters);
- /* submit some tasks with OpenMP computations */
- starpu_uncluster_machine(clusters);
- /* we are back in the default StarPU state */
- \endcode
- The following graphic is an example of what a particular machine can
- look like once clusterized. The main difference is that there are fewer
- worker queues, and tasks will be executed on several resources at
- once. The execution of these tasks is left to the internal runtime
- system, represented by a dashed box around the resources.
- \image latex runtime-par.eps "StarPU using parallel tasks" width=0.5\textwidth
- \image html runtime-par.png "StarPU using parallel tasks"
- Creating clusters as shown in the example above will, by default,
- create workers able to execute OpenMP code. The cluster creation
- function starpu_cluster_machine() takes optional parameters after the
- \c hwloc object (the list is always terminated by the value \c 0)
- which parametrize the cluster creation. These parameters can be used
- to create clusters of a type other than OpenMP, or to partition the
- machine more precisely.
- This is explained in Section \ref CreatingCustomClusters.
- \section ExampleOfConstrainingOpenMP Example Of Constraining OpenMP
- Clusters require the ability to constrain the runtime managing the
- internal task parallelism (the internal runtime) to the resource set
- chosen by StarPU. The purpose of this is to express how StarPU must
- communicate with the internal runtime to achieve the required
- cooperation. In the case of OpenMP, StarPU provides an awake thread
- from the cluster to execute this liaison. It then provides, on demand,
- the ids of the other resources supposed to be in the region. Finally,
- through an OpenMP parallel region, we can create the required number
- of threads and bind each of them to the correct resource. These
- threads are then reused each time a <c>\#pragma omp parallel</c> is
- encountered in the subsequent computations of the program.
- The following graphic is an example of what an OpenMP-type cluster
- looks like and how it is represented in StarPU. We can see that one
- StarPU (black) thread is awake, and the OpenMP threads (in pink) need
- to be created on the other resources.
- \image latex parallel_worker2.eps "StarPU with an OpenMP cluster" width=0.3\textwidth
- \image html parallel_worker2.png "StarPU with an OpenMP cluster"
- Finally, the following code shows how to force OpenMP to cooperate with StarPU
- and create the aforementioned OpenMP threads constrained in the cluster's
- resources set:
- \code{.c}
- void starpu_openmp_prologue(void *sched_ctx_id)
- {
-     int sched_ctx = *(int*)sched_ctx_id;
-     int *cpuids = NULL;
-     int ncpuids = 0;
-     int workerid = starpu_worker_get_id();
-
-     /* we can only target CPU workers */
-     if (starpu_worker_get_type(workerid) == STARPU_CPU_WORKER)
-     {
-         /* grab all the ids inside the cluster */
-         starpu_sched_ctx_get_available_cpuids(sched_ctx, &cpuids, &ncpuids);
-         /* set the number of threads */
-         omp_set_num_threads(ncpuids);
-         #pragma omp parallel
-         {
-             /* bind each thread to its respective resource */
-             starpu_sched_ctx_bind_current_thread_to_cpuid(cpuids[omp_get_thread_num()]);
-         }
-         free(cpuids);
-     }
- }
- \endcode
- This function is the default function used when calling starpu_cluster_machine() without any extra parameter.
- Clusters are based on several tools and models already available
- within StarPU contexts, and merely extend contexts. More on contexts
- can be read in Section \ref SchedulingContexts.
- \section CreatingCustomClusters Creating Custom Clusters
- Clusters can be created either with the predefined types provided
- within StarPU, or with user-defined functions to bind another runtime
- inside StarPU.
- The predefined cluster types provided by StarPU are
- ::STARPU_CLUSTER_OPENMP, ::STARPU_CLUSTER_INTEL_OPENMP_MKL and
- ::STARPU_CLUSTER_GNU_OPENMP_MKL. The last one is only provided if
- StarPU is compiled with the \c MKL library. It uses MKL functions to
- set the number of threads, which is more reliable when using an OpenMP
- implementation other than the Intel one.
- The cluster type is set when calling the function
- starpu_cluster_machine() with the parameter ::STARPU_CLUSTER_TYPE, as
- in the example below, which creates an \c MKL cluster.
- \code{.c}
- struct starpu_cluster_machine *clusters;
- clusters = starpu_cluster_machine(HWLOC_OBJ_SOCKET,
- STARPU_CLUSTER_TYPE, STARPU_CLUSTER_GNU_OPENMP_MKL,
- 0);
- \endcode
- Using the default type ::STARPU_CLUSTER_OPENMP is similar to calling
- starpu_cluster_machine() without any extra parameter.
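- For instance, the two calls below create equivalent clusters, the
- second one merely making the default type explicit:
- \code{.c}
- struct starpu_cluster_machine *c1, *c2;
- /* Default type (OpenMP): */
- c1 = starpu_cluster_machine(HWLOC_OBJ_SOCKET, 0);
- /* Explicitly requesting the same default type: */
- c2 = starpu_cluster_machine(HWLOC_OBJ_SOCKET,
-                             STARPU_CLUSTER_TYPE, STARPU_CLUSTER_OPENMP,
-                             0);
- \endcode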
- <br>
- Users can also define their own function.
- \code{.c}
- void foo_func(void* foo_arg);
- int foo_arg = 0;
- struct starpu_cluster_machine *clusters;
- clusters = starpu_cluster_machine(HWLOC_OBJ_SOCKET,
- STARPU_CLUSTER_CREATE_FUNC, &foo_func,
- STARPU_CLUSTER_CREATE_FUNC_ARG, &foo_arg,
- 0);
- \endcode
- Parameters that can be given to starpu_cluster_machine() are
- ::STARPU_CLUSTER_MIN_NB,
- ::STARPU_CLUSTER_MAX_NB, ::STARPU_CLUSTER_NB,
- ::STARPU_CLUSTER_POLICY_NAME, ::STARPU_CLUSTER_POLICY_STRUCT,
- ::STARPU_CLUSTER_KEEP_HOMOGENEOUS, ::STARPU_CLUSTER_PREFERE_MIN,
- ::STARPU_CLUSTER_CREATE_FUNC, ::STARPU_CLUSTER_CREATE_FUNC_ARG,
- ::STARPU_CLUSTER_TYPE, ::STARPU_CLUSTER_AWAKE_WORKERS,
- ::STARPU_CLUSTER_PARTITION_ONE, ::STARPU_CLUSTER_NEW and
- ::STARPU_CLUSTER_NCORES.
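- Several of these parameters can be combined in a single call. The
- sketch below is only an illustration: the values expected after
- ::STARPU_CLUSTER_NB and ::STARPU_CLUSTER_NCORES are assumed from the
- parameter names, so check the API reference for the exact semantics.
- \code{.c}
- struct starpu_cluster_machine *clusters;
- /* Hypothetical combination: request 2 clusters of 4 cores each,
-    keeping the partition homogeneous (assumed semantics). */
- clusters = starpu_cluster_machine(HWLOC_OBJ_SOCKET,
-                                   STARPU_CLUSTER_NB, 2,
-                                   STARPU_CLUSTER_NCORES, 4,
-                                   STARPU_CLUSTER_KEEP_HOMOGENEOUS, 1,
-                                   0);
- starpu_cluster_print(clusters);
- /* ... submit tasks ... */
- starpu_uncluster_machine(clusters);
- \endcode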
- \section ClustersWithSchedulingContextsAPI Clusters With Scheduling Contexts API
- As previously mentioned, the cluster API is implemented
- on top of \ref SchedulingContexts. Its main addition is to ease the
- creation of a non-overlapping CPU partition of the machine by using
- \c hwloc, whereas scheduling contexts can use any number of resources
- of any type.
- It is therefore possible, but not recommended, to create clusters
- using the scheduling contexts API. This is mostly useful for the most
- complex machine configurations, where users have to dimension their
- clusters precisely by hand using their own algorithm.
- \code{.c}
- /* the list of resources the context will manage */
- int workerids[3] = {1, 3, 10};
- /* indicate the list of workers assigned to it, the number of workers,
- the name of the context and the scheduling policy to be used within
- the context */
- int id_ctx = starpu_sched_ctx_create(workerids, 3, "my_ctx", 0);
- /* let StarPU know that the following tasks will be submitted to this context */
- starpu_sched_ctx_set_task_context(id_ctx);
- task->prologue_callback_pop_func=&runtime_interface_function_here;
- /* submit the task to StarPU */
- starpu_task_submit(task);
- \endcode
- As this example illustrates, creating a context without a scheduling
- policy will create a cluster. The interface function between StarPU
- and the other runtime must be specified through the field
- starpu_task::prologue_callback_pop_func. Such a function can be
- similar to the OpenMP thread team creation one (see above).
- <br>
- Note that the OpenMP mode is the default mode both for clusters and
- contexts. The result of a cluster creation is a woken-up master worker
- and sleeping "slaves" which allow the master to run tasks on their
- resources.
- To create a cluster with woken-up workers, the flag
- ::STARPU_SCHED_CTX_AWAKE_WORKERS must be set when using the scheduling
- context API function starpu_sched_ctx_create(), or the flag
- ::STARPU_CLUSTER_AWAKE_WORKERS must be set when using the cluster API
- function starpu_cluster_machine().
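- As a sketch, the two variants can be written as follows; whether each
- flag expects an accompanying value is an assumption to be checked
- against the API reference:
- \code{.c}
- /* Cluster API: create clusters whose workers are all awake. */
- struct starpu_cluster_machine *clusters;
- clusters = starpu_cluster_machine(HWLOC_OBJ_SOCKET,
-                                   STARPU_CLUSTER_AWAKE_WORKERS, 1,
-                                   0);
-
- /* Scheduling context API: the same effect through a context flag. */
- int workerids[3] = {1, 3, 10};
- unsigned id_ctx = starpu_sched_ctx_create(workerids, 3, "my_ctx",
-                                           STARPU_SCHED_CTX_AWAKE_WORKERS,
-                                           0);
- \endcode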
- */