# StarPU --- Runtime system for heterogeneous multicore architectures.
#
# Copyright (C) 2013 Simon Archipoff
#
# StarPU is free software; you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation; either version 2.1 of the License, or (at
# your option) any later version.
#
# StarPU is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# See the GNU Lesser General Public License in COPYING.LGPL for more details.
Mutex policy
The scheduler has to be protected when the hypervisor is modifying it.
There is a mutex in struct starpu_sched_tree which should be taken by
the application to push a task, and one mutex per worker which should
be taken by workers when they pop or push a task.
The hypervisor must take all of them before modifying the scheduler.
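
The following is a minimal sketch of that locking discipline, not the
actual StarPU code: the field names (lock, worker_locks, nworkers) are
assumptions made for illustration.

    #include <pthread.h>

    /* Reduced stand-in for struct starpu_sched_tree: one tree-level
     * mutex plus one mutex per worker, as described above. */
    struct sched_tree_sketch
    {
        pthread_mutex_t lock;          /* taken by the application to push a task */
        pthread_mutex_t *worker_locks; /* taken by a worker when it pushes or pops */
        unsigned nworkers;
    };

    /* The hypervisor takes every mutex before modifying the scheduler,
     * so no concurrent push or pop can observe a half-modified tree. */
    static void hypervisor_modify_tree(struct sched_tree_sketch *t)
    {
        unsigned i;
        pthread_mutex_lock(&t->lock);
        for (i = 0; i < t->nworkers; i++)
            pthread_mutex_lock(&t->worker_locks[i]);

        /* ... restructure the scheduler here ... */

        for (i = 0; i < t->nworkers; i++)
            pthread_mutex_unlock(&t->worker_locks[i]);
        pthread_mutex_unlock(&t->lock);
    }
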
Creation/Destruction
All the struct starpu_sched_node * starpu_sched_node_foo_create()
functions return an initialized struct starpu_sched_node.
The void starpu_sched_node_destroy(struct starpu_sched_node * node)
function calls node->deinit_data(node) to free the data allocated
during creation.
Worker nodes are special: there is no creation function, only an
accessor, which guarantees the uniqueness of worker nodes.
worker_node->workers and worker_node->workers_in_ctx should not be
modified.
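
As a sketch of that pattern (the struct is reduced to the two fields
mentioned above, "foo" stands for any concrete node type, and the
allocation is a placeholder):

    #include <stdlib.h>

    struct node_sketch
    {
        void *data;                                /* node-private data */
        void (*deinit_data)(struct node_sketch *); /* called on destroy */
    };

    static void foo_deinit_data(struct node_sketch *node)
    {
        free(node->data); /* release whatever foo_create() allocated */
    }

    /* Every foo_create() returns a fully initialized node. */
    struct node_sketch *node_foo_create_sketch(void)
    {
        struct node_sketch *node = calloc(1, sizeof(*node));
        node->data = malloc(64); /* placeholder for foo-specific state */
        node->deinit_data = foo_deinit_data;
        return node;
    }

    /* destroy() delegates to deinit_data() to free creation-time data. */
    void node_destroy_sketch(struct node_sketch *node)
    {
        node->deinit_data(node);
        free(node);
    }
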
Add/Remove workers
I see two ways of adding/removing workers to/from the scheduler.
In the first one, the hypervisor blocks all scheduling, modifies the
scheduler the way it wants, and then updates all the
node->workers_in_ctx bitmaps, which all node->push_task
implementations should respect.
The second one may be done in an atomic way: the struct
starpu_sched_tree holds a struct starpu_bitmap * that represents the
workers available in the context. All nodes can call struct starpu_bitmap
* starpu_sched_node_get_worker_mask(unsigned sched_ctx_id) to see
where they can push a task according to the available workers (see the
sketch below).
But this way raises a problem for node->estimated_end: in the case of
a fifo, we have to know how many workers are available to the fifo
node. We also have a problem with shared objects. The first way seems
to be better.
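
A sketch of the second approach: starpu_sched_node_get_worker_mask()
is quoted from the text above, while starpu_bitmap_get() (test whether
bit e is set) is an accessor assumed here for illustration.

    /* Forward declarations so the sketch stands alone. */
    struct starpu_bitmap;
    struct starpu_bitmap *starpu_sched_node_get_worker_mask(unsigned sched_ctx_id);
    int starpu_bitmap_get(struct starpu_bitmap *b, int e); /* assumed accessor */

    /* A node checks the context's worker mask before pushing a task
     * toward a given worker. */
    static int can_push_to_worker(unsigned sched_ctx_id, int workerid)
    {
        struct starpu_bitmap *mask = starpu_sched_node_get_worker_mask(sched_ctx_id);
        return starpu_bitmap_get(mask, workerid);
    }
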
Hierarchical construction
Bugs everywhere; works only in simple and particular cases.
It is difficult to guess where we should plug accelerators because we
cannot rely on the hwloc topology. Hierarchical heft seems to work on
simple machines with NUMA nodes and GPUs.
This fails if hwloc_socket_composed_sched_node or
hwloc_cache_composed_sched_node is not NULL.
Various things
In several places realloc is used (in prio_deque and for
starpu_sched_node_add_child), because we should not have many
different priority levels nor add too many children.
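
A sketch of that realloc pattern for adding a child (the struct and
its fields are assumptions for illustration): since a node rarely has
many children, growing the array by one element per insertion is
cheap enough.

    #include <stdlib.h>

    struct child_array_sketch
    {
        struct child_array_sketch **childs;
        int nchilds;
    };

    static void add_child_sketch(struct child_array_sketch *node,
                                 struct child_array_sketch *child)
    {
        node->childs = realloc(node->childs,
                               sizeof(*node->childs) * (node->nchilds + 1));
        node->childs[node->nchilds++] = child;
    }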