| 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263 | 
							- /* StarPU --- Runtime system for heterogeneous multicore architectures.
 
-  *
 
-  * Copyright (C) 2015-2020  Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
 
-  *
 
-  * StarPU is free software; you can redistribute it and/or modify
 
-  * it under the terms of the GNU Lesser General Public License as published by
 
-  * the Free Software Foundation; either version 2.1 of the License, or (at
 
-  * your option) any later version.
 
-  *
 
-  * StarPU is distributed in the hope that it will be useful, but
 
-  * WITHOUT ANY WARRANTY; without even the implied warranty of
 
-  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 
-  *
 
-  * See the GNU Lesser General Public License in COPYING.LGPL for more details.
 
-  */
 
- /*! \page ClusteringAMachine Clustering A Machine
 
- \section ClusteringGeneralIdeas General Ideas
 
- Clusters are a concept introduced in this
 
- <a href="https://hal.inria.fr/view/index/docid/1181135">paper</a>.
 
- The granularity problem is tackled by using resource aggregation:
 
- instead of dynamically splitting tasks, resources are aggregated
 
- to process coarse grain tasks in a parallel fashion. This is built on
 
- top of scheduling contexts to be able to handle any type of parallel
 
- tasks.
 
- This comes from a basic idea, making use of two levels of parallelism
 
- in a DAG.
 
- We keep the DAG parallelism but consider on top of it that a task can
 
- contain internal parallelism. A good example is if each task in the DAG
 
- is OpenMP enabled.
 
- The particularity of such tasks is that we will combine the power of two
 
- runtime systems: StarPU will manage the DAG parallelism and another
 
- runtime (e.g. OpenMP) will manage the internal parallelism. The challenge
 
- is in creating an interface between the two runtime systems so that StarPU
 
- can regroup cores inside a machine (creating what we call a \b cluster) on
 
- top of which the parallel tasks (e.g. OpenMP tasks) will be run in a
 
- contained fashion.
 
- The aim of the cluster API is to facilitate this process in an automatic
 
- fashion. For this purpose, we depend on the \c hwloc tool to detect the
 
- machine configuration and then partition it into usable clusters.
 
- <br>
 
- An example of code running on clusters is available in
 
- <c>examples/sched_ctx/parallel_tasks_with_cluster_api.c</c>.
 
- <br>
 
- Let's first look at how to create a cluster.
 
- To enable clusters in StarPU, one needs to set the configure option
 
- \ref enable-cluster "--enable-cluster".
 
- \section CreatingClusters Creating Clusters
 
- Partitioning a machine into clusters with the cluster API is fairly
 
- straightforward. The simplest way is to state under which machine
 
- topology level we wish to regroup all resources. This level is an \c hwloc
 
- object, of the type <c>hwloc_obj_type_t</c>. More information can be found in the
 
- <a href="https://www.open-mpi.org/projects/hwloc/doc/v2.0.3/">hwloc
 
- documentation</a>.
 
- Once a cluster is created, the full machine is represented with an opaque
 
- structure starpu_cluster_machine. This can be printed to show the
 
- current machine state.
 
- \code{.c}
 
- struct starpu_cluster_machine *clusters;
 
- clusters = starpu_cluster_machine(HWLOC_OBJ_SOCKET, 0);
 
- starpu_cluster_print(clusters);
 
- /* submit some tasks with OpenMP computations */
 
- starpu_uncluster_machine(clusters);
 
- /* we are back in the default StarPU state */
 
- \endcode
 
- The following graphic is an example of what a particular machine can
 
- look like once clusterized. The main difference is that we have less
 
- worker queues and tasks which will be executed on several resources at
 
- once. The execution of these tasks will be left to the internal runtime
 
- system, represented with a dashed box around the resources.
 
- \image latex runtime-par.eps "StarPU using parallel tasks" width=0.5\textwidth
 
- \image html runtime-par.png "StarPU using parallel tasks"
 
- Creating clusters as shown in the example above will create workers able to
 
- execute OpenMP code by default. The cluster creation function
 
- starpu_cluster_machine() takes optional parameters after the \c hwloc
 
- object (always terminated by the value \c 0) which allow to parametrize the
 
- cluster creation. These parameters can help creating clusters of a
 
- type different from OpenMP, or create a more precise partition of the
 
- machine.
 
- This is explained in Section \ref CreatingCustomClusters.
 
- \section ExampleOfConstrainingOpenMP Example Of Constraining OpenMP
 
- Clusters require being able to constrain the runtime managing the internal
 
- task parallelism (internal runtime) to the resources set by StarPU. The
 
- purpose of this is to express how StarPU must communicate with the internal
 
- runtime to achieve the required cooperation. In the case of OpenMP, StarPU
 
- will provide an awake thread from the cluster to execute this liaison. It
 
- will then provide on demand the process ids of the other resources supposed
 
- to be in the region. Finally, thanks to an OpenMP region we can create the
 
- required number of threads and bind each of them on the correct region.
 
- These will then be reused each time we encounter a <c>\#pragma omp
 
- parallel</c> in the following computations of our program.
 
- The following graphic is an example of what an OpenMP-type cluster looks
 
- like and how it represented in StarPU. We can see that one StarPU (black)
 
- thread is awake, and we need to create on the other resources the OpenMP
 
- threads (in pink).
 
- \image latex parallel_worker2.eps "StarPU with an OpenMP cluster" width=0.3\textwidth
 
- \image html parallel_worker2.png "StarPU with an OpenMP cluster"
 
- Finally, the following code shows how to force OpenMP to cooperate with StarPU
 
- and create the aforementioned OpenMP threads constrained in the cluster's
 
- resources set:
 
- \code{.c}
 
- void starpu_openmp_prologue(void * sched_ctx_id)
 
- {
 
-   int sched_ctx = *(int*)sched_ctx_id;
 
-   int *cpuids = NULL;
 
-   int ncpuids = 0;
 
-   int workerid = starpu_worker_get_id();
 
-   //we can target only CPU workers
 
-   if (starpu_worker_get_type(workerid) == STARPU_CPU_WORKER)
 
-   {
 
-     //grab all the ids inside the cluster
 
-     starpu_sched_ctx_get_available_cpuids(sched_ctx, &cpuids, &ncpuids);
 
-     //set the number of threads
 
-     omp_set_num_threads(ncpuids);
 
- #pragma omp parallel
 
-     {
 
-       //bind each threads to its respective resource
 
-       starpu_sched_ctx_bind_current_thread_to_cpuid(cpuids[omp_get_thread_num()]);
 
-     }
 
-     free(cpuids);
 
-   }
 
-   return;
 
- }
 
- \endcode
 
- This function is the default function used when calling starpu_cluster_machine() without extra parameter.
 
- Cluster are based on several tools and models already available within
 
- StarPU contexts, and merely extend contexts. More on contexts can be
 
- read in Section \ref SchedulingContexts.
 
- \section CreatingCustomClusters Creating Custom Clusters
 
- Clusters can be created either with the predefined types provided
 
- within StarPU, or with user-defined functions to bind another runtime
 
- inside StarPU.
 
- The predefined cluster types provided by StarPU are
 
- ::STARPU_CLUSTER_OPENMP, ::STARPU_CLUSTER_INTEL_OPENMP_MKL and
 
- ::STARPU_CLUSTER_GNU_OPENMP_MKL. The last one is only provided if
 
- StarPU is compiled with the \c MKL library.  It uses MKL functions to
 
- set the number of threads which is more reliable when using an OpenMP
 
- implementation different from the Intel one.
 
- The cluster type is set when calling the function
 
- starpu_cluster_machine() with the parameter ::STARPU_CLUSTER_TYPE as
 
- in the example below, which is creating a \c MKL cluster.
 
- \code{.c}
 
- struct starpu_cluster_machine *clusters;
 
- clusters = starpu_cluster_machine(HWLOC_OBJ_SOCKET,
 
-                                  STARPU_CLUSTER_TYPE, STARPU_CLUSTER_GNU_OPENMP_MKL,
 
-                                  0);
 
- \endcode
 
- Using the default type ::STARPU_CLUSTER_OPENMP is similar to calling
 
- starpu_cluster_machine() without any extra parameter.
 
- <br>
 
- Users can also define their own function.
 
- \code{.c}
 
- void foo_func(void* foo_arg);
 
- int foo_arg = 0;
 
- struct starpu_cluster_machine *clusters;
 
- clusters = starpu_cluster_machine(HWLOC_OBJ_SOCKET,
 
-                                   STARPU_CLUSTER_CREATE_FUNC, &foo_func,
 
-                                   STARPU_CLUSTER_CREATE_FUNC_ARG, &foo_arg,
 
-                                   0);
 
- \endcode
 
- Parameters that can be given to starpu_cluster_machine() are
 
- ::STARPU_CLUSTER_MIN_NB,
 
- ::STARPU_CLUSTER_MAX_NB, ::STARPU_CLUSTER_NB,
 
- ::STARPU_CLUSTER_POLICY_NAME, ::STARPU_CLUSTER_POLICY_STRUCT,
 
- ::STARPU_CLUSTER_KEEP_HOMOGENEOUS, ::STARPU_CLUSTER_PREFERE_MIN,
 
- ::STARPU_CLUSTER_CREATE_FUNC, ::STARPU_CLUSTER_CREATE_FUNC_ARG,
 
- ::STARPU_CLUSTER_TYPE, ::STARPU_CLUSTER_AWAKE_WORKERS,
 
- ::STARPU_CLUSTER_PARTITION_ONE, ::STARPU_CLUSTER_NEW and
 
- ::STARPU_CLUSTER_NCORES.
 
- \section ClustersWithSchedulingContextsAPI Clusters With Scheduling
 
- As previously mentioned, the cluster API is implemented
 
- on top of \ref SchedulingContexts. Its main addition is to ease the
 
- creation of a machine CPU partition with no overlapping by using
 
- \c hwloc, whereas scheduling contexts can use any number of any type
 
- of resources.
 
- It is therefore possible, but not recommended, to create clusters
 
- using the scheduling contexts API. This can be useful mostly in the
 
- most complex machine configurations where users have to dimension
 
- precisely clusters by hand using their own algorithm.
 
- \code{.c}
 
- /* the list of resources the context will manage */
 
- int workerids[3] = {1, 3, 10};
 
- /* indicate the list of workers assigned to it, the number of workers,
 
- the name of the context and the scheduling policy to be used within
 
- the context */
 
- int id_ctx = starpu_sched_ctx_create(workerids, 3, "my_ctx", 0);
 
- /* let StarPU know that the following tasks will be submitted to this context */
 
- starpu_sched_ctx_set_task_context(id);
 
- task->prologue_callback_pop_func=&runtime_interface_function_here;
 
- /* submit the task to StarPU */
 
- starpu_task_submit(task);
 
- \endcode
 
- As this example illustrates, creating a context without scheduling
 
- policy will create a cluster. The interface function between StarPU
 
- and the other runtime must be specified through the field
 
- starpu_task::prologue_callback_pop_func. Such a function can be
 
- similar to the OpenMP thread team creation one (see above).
 
- <br>
 
- Note that the OpenMP mode is the default mode both for clusters and
 
- contexts. The result of a cluster creation is a woken-up master worker
 
- and sleeping "slaves" which allow the master to run tasks on their
 
- resources.
 
- To create a cluster with woken-up workers, the flag
 
- ::STARPU_SCHED_CTX_AWAKE_WORKERS must be set when using the scheduling
 
- context API function starpu_sched_ctx_create(), or the flag
 
- ::STARPU_CLUSTER_AWAKE_WORKERS must be set when using the cluster API
 
- function starpu_cluster_machine().
 
- */
 
 
  |