
Add implicit support for asynchronous partition planning. This means one does not need to call starpu_data_partition_submit etc. explicitly any more, StarPU will make the appropriate calls as needed.
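A rough sketch of what this enables in application code (not part of the change itself; <c>cl_part</c> and <c>cl_whole</c> are hypothetical codelets, <c>handle</c> a previously registered data handle, and the partition-plan calls are the existing API shown in the documentation diff below):

\code{.c}
/* Sketch only: plan the partitioning once; sub_handle, PARTS, cl_part and
 * cl_whole are hypothetical names used for illustration. */
starpu_data_handle_t sub_handle[PARTS];
struct starpu_data_filter f =
{
	.filter_func = starpu_matrix_filter_block,
	.nchildren = PARTS
};
starpu_data_partition_plan(handle, &f, sub_handle);

/* No starpu_data_partition_submit() call needed before this loop */
for (i = 0; i < PARTS; i++)
	starpu_task_insert(&cl_part, STARPU_RW, sub_handle[i], 0);

/* No starpu_data_unpartition_submit() call needed before using the whole data again */
starpu_task_insert(&cl_whole, STARPU_RW, handle, 0);

starpu_data_partition_clean(handle, PARTS, sub_handle);
\endcode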

Samuel Thibault, 7 years ago
commit abfe581427

+ 3 - 0
ChangeLog

@@ -32,6 +32,9 @@ New features:
   * Add a new implementation of StarPU-MPI on top of NewMadeleine
   * Add optional callbacks to notify an external resource manager
     about workers going to sleep and waking up
+  * Add implicit support for asynchronous partition planning. This means one
+    does not need to call starpu_data_partition_submit etc. explicitly any
+    more, StarPU will make the appropriate calls as needed.
 
 Small features:
   * Scheduling contexts may now be associated a user data pointer at creation

+ 21 - 30
doc/doxygen/chapters/310_data_management.doxy

@@ -341,11 +341,15 @@ currently working on the data.  This can be a bottleneck for the application.
 
 An asynchronous API also exists, it works only on handles with sequential
 consistency. The principle is to first plan the partitioning, which returns
-data handles of the partition, which are not functional yet. Along other task
-submission, one can submit the actual partitioning, and then use the handles
-of the partition. Before using the handle of the whole data, one has to submit
-the unpartitioning. <c>fmultiple_submit</c> is a complete example using this
-technique.
+data handles of the partition, which are not functional yet. When submitting
+tasks, one can mix using the handles of the partition and of the whole data. One
+can even partition recursively and mix using handles at different levels of the
+recursion. Of course, StarPU will have to introduce coherency synchronization.
+
+<c>fmultiple_submit_implicit</c> is a complete example using this technique.
+One can also look at <c>fmultiple_submit_readonly</c>, which contains the
+explicit coherency synchronizations which are automatically introduced by StarPU
+for <c>fmultiple_submit_implicit</c>.
 
 In short, we first register a matrix and plan the partitioning:
 
@@ -361,33 +365,20 @@ starpu_data_partition_plan(handle, &f_vert, vert_handle);
 
 starpu_data_partition_plan() returns the handles for the partition in <c>vert_handle</c>.
 
-One can submit tasks working on the main handle, but not yet on the <c>vert_handle</c>
-handles. Now we submit the partitioning:
-
-\code{.c}
-starpu_data_partition_submit(handle, PARTS, vert_handle);
-\endcode
-
-And now we can submit tasks working on <c>vert_handle</c> handles (and not on the main
-handle any more). Eventually we want to work on the main handle again, so we
-submit the unpartitioning:
-
-\code{.c}
-starpu_data_unpartition_submit(handle, PARTS, vert_handle, -1);
-\endcode
-
-And now we can submit tasks working on the main handle again.
+One can then submit tasks working on the main handle, and tasks working on the
+<c>vert_handle</c> handles. When switching between the main handle and the
+<c>vert_handle</c> handles, StarPU will automatically call
+starpu_data_partition_submit() and starpu_data_unpartition_submit() as needed.
 
 All this code is asynchronous, just submitting which tasks, partitioning and
-unpartitioning should be done at runtime.
-
-Planning several partitioning of the same data is also possible, one just has
-to submit unpartitioning (to get back to the initial handle) before submitting
-another partitioning.
-
-It is also possible to activate several partitioning at the same time, in
-read-only mode, by using starpu_data_partition_readonly_submit(). A complete
-example is available in <c>examples/filters/fmultiple_submit_readonly.c</c>.
+unpartitioning will be done at runtime.
+
+Planning several partitionings of the same data is also possible: StarPU will
+unpartition and repartition as needed when mixing accesses to different
+partitions. If the data is accessed in read-only mode, StarPU will allow the
+different partitionings to coexist. As soon as the data is accessed in read-write
+mode, StarPU will automatically unpartition everything and activate only the
+partitioning leading to the data being written to.
 
 \section ManualPartitioning Manual Partitioning
 

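For reference, a sketch (not part of the patch) of the behaviour described in the updated chapter, mirroring <c>fmultiple_submit_implicit</c>; <c>handle</c>, <c>f_vert</c> and <c>vert_handle</c> are the names used in the chapter, while <c>f_horiz</c>, <c>horiz_handle</c>, <c>cl_fill</c>, <c>cl_check</c> and <c>cl_check_scale</c> are hypothetical names borrowed from the example:

\code{.c}
/* Plan two partitionings of the same matrix; the sub-handles are not
 * functional yet. */
starpu_data_partition_plan(handle, &f_vert, vert_handle);
starpu_data_partition_plan(handle, &f_horiz, horiz_handle);

/* Write the whole data */
starpu_task_insert(&cl_fill, STARPU_W, handle, 0);

/* Read-only accesses: both partitionings (and the whole handle) may coexist */
for (i = 0; i < PARTS; i++)
{
	starpu_task_insert(&cl_check, STARPU_R, vert_handle[i], 0);
	starpu_task_insert(&cl_check, STARPU_R, horiz_handle[i], 0);
}
starpu_task_insert(&cl_check, STARPU_R, handle, 0);

/* Read-write access to the whole data: StarPU unpartitions everything first */
starpu_task_insert(&cl_check_scale, STARPU_RW, handle, 0);
\endcode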
+ 5 - 0
doc/doxygen/chapters/api/codelet_and_tasks.doxy

@@ -150,6 +150,11 @@ Value to be set in starpu_codelet::flags to execute the codelet functions even i
 Value to be set in starpu_codelet::flags to execute the codelet functions even in simgrid mode,
 and later inject the measured timing inside the simulation.
 
+\def STARPU_CODELET_NOPLANS
+\ingroup API_Codelet_And_Tasks
+Value to be set in starpu_codelet::flags to make starpu_task_submit not submit
+automatic asynchronous partitioning/unpartitioning.
+
 \typedef starpu_cpu_func_t
 \ingroup API_Codelet_And_Tasks
 CPU implementation of a codelet.

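A sketch of how the new flag would be set on a user codelet (hypothetical kernel function <c>my_kernel</c>; the flag name and the starpu_codelet::flags field are the ones documented above):

\code{.c}
/* Sketch: a codelet whose task submissions will not trigger the automatic
 * asynchronous partitioning/unpartitioning; my_kernel is hypothetical. */
struct starpu_codelet cl_noplans =
{
	.cpu_funcs = {my_kernel},
	.nbuffers = 1,
	.modes = {STARPU_RW},
	.flags = STARPU_CODELET_NOPLANS,
	.name = "cl_noplans"
};
\endcode

The patch itself sets this flag on the internal data_partition_switch codelet, so that the switch tasks do not themselves trigger further automatic planning.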
+ 9 - 0
examples/Makefile.am

@@ -224,6 +224,7 @@ STARPU_EXAMPLES +=				\
 	filters/fmultiple_manual		\
 	filters/fmultiple_submit		\
 	filters/fmultiple_submit_readonly	\
+	filters/fmultiple_submit_implicit	\
 	tag_example/tag_example			\
 	tag_example/tag_example2		\
 	tag_example/tag_example3		\
@@ -569,6 +570,14 @@ filters_fmultiple_submit_readonly_SOURCES +=	\
 	filters/fmultiple_cuda.cu
 endif
 
+filters_fmultiple_submit_implicit_SOURCES =		\
+	filters/fmultiple_submit_implicit.c
+
+if STARPU_USE_CUDA
+filters_fmultiple_submit_implicit_SOURCES +=		\
+	filters/fmultiple_cuda.cu
+endif
+
 examplebin_PROGRAMS +=				\
 	filters/shadow				\
 	filters/shadow2d			\

+ 9 - 5
examples/filters/fmultiple_submit.c

@@ -1,7 +1,7 @@
 /* StarPU --- Runtime system for heterogeneous multicore architectures.
  *
  * Copyright (C) 2017                                     CNRS
- * Copyright (C) 2015                                     Université de Bordeaux
+ * Copyright (C) 2015,2017                                Université de Bordeaux
  *
  * StarPU is free software; you can redistribute it and/or modify
  * it under the terms of the GNU Lesser General Public License as published by
@@ -89,11 +89,9 @@ struct starpu_codelet cl_check_scale =
 #ifdef STARPU_USE_CUDA
 	.cuda_funcs = {fmultiple_check_scale_cuda},
 	.cuda_flags = {STARPU_CUDA_ASYNC},
-#else
-	/* Only enable it on CPUs if we don't have a CUDA device, to force remote execution on the CUDA device */
+#endif
 	.cpu_funcs = {fmultiple_check_scale},
 	.cpu_funcs_name = {"fmultiple_check_scale"},
-#endif
 	.nbuffers = 1,
 	.modes = {STARPU_RW},
 	.name = "fmultiple_check_scale"
@@ -118,6 +116,12 @@ int main(void)
 		return 77;
 	STARPU_CHECK_RETURN_VALUE(ret, "starpu_init");
 
+	/* Disable codelet on CPUs if we have a CUDA device, to force remote execution on the CUDA device */
+	if (starpu_cuda_worker_get_count()) {
+		cl_check_scale.cpu_funcs[0] = NULL;
+		cl_check_scale.cpu_funcs_name[0] = NULL;
+	}
+
 	/* Declare the whole matrix to StarPU */
 	starpu_matrix_data_register(&handle, STARPU_MAIN_RAM, (uintptr_t)matrix, NX, NX, NY, sizeof(matrix[0][0]));
 
@@ -182,7 +186,7 @@ int main(void)
 	/* Now switch back to total view of the matrix */
 	starpu_data_unpartition_submit(handle, PARTS, horiz_handle, -1);
 
-	/* And check the values of the whole matrix */
+	/* And check and scale the values of the whole matrix */
 	int factor = 4;
 	int start = 0;
 	ret = starpu_task_insert(&cl_check_scale,

+ 362 - 0
examples/filters/fmultiple_submit_implicit.c

@@ -0,0 +1,362 @@
+/* StarPU --- Runtime system for heterogeneous multicore architectures.
+ *
+ * Copyright (C) 2017                                     CNRS
+ * Copyright (C) 2015,2017                                Université de Bordeaux
+ *
+ * StarPU is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published by
+ * the Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ *
+ * StarPU is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ *
+ * See the GNU Lesser General Public License in COPYING.LGPL for more details.
+ */
+
+/*
+ * This exemplifies how to access the same matrix with different partitioned
+ * views, doing the coherency through partition planning, without explicitly
+ * submitting the partitioning/unpartitioning.
+ *
+ * We first run a kernel on the whole matrix to fill it, then check the value
+ * in parallel from the whole handle, from the horizontal slices, and from the
+ * vertical slices. Then we switch back to the whole matrix to check and scale
+ * it. Then we check the result again from the whole handle, the horizontal
+ * slices, and the vertical slices. Then we switch to read-write on the
+ * horizontal slices to check and scale them. Then we check again from the
+ * whole handle, the horizontal slices, and the vertical slices. Eventually we
+ * switch back to the whole matrix to check and scale it.
+ *
+ * Please keep this in sync with fmultiple_submit_readonly.c
+ */
+
+#include <starpu.h>
+
+#define NX    6
+#define NY    6
+#define PARTS 2
+
+#define FPRINTF(ofile, fmt, ...) do { if (!getenv("STARPU_SSILENT")) {fprintf(ofile, fmt, ## __VA_ARGS__); }} while(0)
+
+void matrix_fill(void *buffers[], void *cl_arg)
+{
+	unsigned i, j;
+	(void)cl_arg;
+
+	/* length of the matrix */
+	unsigned nx = STARPU_MATRIX_GET_NX(buffers[0]);
+	unsigned ny = STARPU_MATRIX_GET_NY(buffers[0]);
+	unsigned ld = STARPU_MATRIX_GET_LD(buffers[0]);
+	int *val = (int *)STARPU_MATRIX_GET_PTR(buffers[0]);
+
+	for(j=0; j<ny ; j++)
+	{
+		for(i=0; i<nx ; i++)
+			val[(j*ld)+i] = i+100*j;
+	}
+}
+
+struct starpu_codelet cl_fill =
+{
+	.cpu_funcs = {matrix_fill},
+	.cpu_funcs_name = {"matrix_fill"},
+	.nbuffers = 1,
+	.modes = {STARPU_W},
+	.name = "matrix_fill"
+};
+
+void fmultiple_check_scale(void *buffers[], void *cl_arg)
+{
+	int start, factor;
+	unsigned i, j;
+
+	/* length of the matrix */
+	unsigned nx = STARPU_MATRIX_GET_NX(buffers[0]);
+	unsigned ny = STARPU_MATRIX_GET_NY(buffers[0]);
+	unsigned ld = STARPU_MATRIX_GET_LD(buffers[0]);
+	int *val = (int *)STARPU_MATRIX_GET_PTR(buffers[0]);
+
+	starpu_codelet_unpack_args(cl_arg, &start, &factor);
+
+	for(j=0; j<ny ; j++)
+	{
+		for(i=0; i<nx ; i++)
+		{
+			STARPU_ASSERT(val[(j*ld)+i] == start + factor*((int)(i+100*j)));
+			val[(j*ld)+i] *= 2;
+		}
+	}
+}
+
+#ifdef STARPU_USE_CUDA
+extern void fmultiple_check_scale_cuda(void *buffers[], void *cl_arg);
+#endif
+struct starpu_codelet cl_check_scale =
+{
+#ifdef STARPU_USE_CUDA
+	.cuda_funcs = {fmultiple_check_scale_cuda},
+	.cuda_flags = {STARPU_CUDA_ASYNC},
+#endif
+	.cpu_funcs = {fmultiple_check_scale},
+	.cpu_funcs_name = {"fmultiple_check_scale"},
+	.nbuffers = 1,
+	.modes = {STARPU_RW},
+	.name = "fmultiple_check_scale"
+};
+
+void fmultiple_check(void *buffers[], void *cl_arg)
+{
+	int start, factor;
+	unsigned i, j;
+
+	/* length of the matrix */
+	unsigned nx = STARPU_MATRIX_GET_NX(buffers[0]);
+	unsigned ny = STARPU_MATRIX_GET_NY(buffers[0]);
+	unsigned ld = STARPU_MATRIX_GET_LD(buffers[0]);
+	int *val = (int *)STARPU_MATRIX_GET_PTR(buffers[0]);
+
+	starpu_codelet_unpack_args(cl_arg, &start, &factor);
+
+	for(j=0; j<ny ; j++)
+	{
+		for(i=0; i<nx ; i++)
+		{
+			STARPU_ASSERT(val[(j*ld)+i] == start + factor*((int)(i+100*j)));
+		}
+	}
+}
+
+#ifdef STARPU_USE_CUDA
+extern void fmultiple_check_cuda(void *buffers[], void *cl_arg);
+#endif
+struct starpu_codelet cl_check =
+{
+#ifdef STARPU_USE_CUDA
+	.cuda_funcs = {fmultiple_check_cuda},
+	.cuda_flags = {STARPU_CUDA_ASYNC},
+#endif
+	.cpu_funcs = {fmultiple_check},
+	.cpu_funcs_name = {"fmultiple_check"},
+	.nbuffers = 1,
+	.modes = {STARPU_R},
+	.name = "fmultiple_check"
+};
+
+int main(void)
+{
+	int start, factor;
+	unsigned j, n=1;
+	int matrix[NX][NY];
+	int ret, i;
+
+	/* We haven't taken care otherwise */
+	STARPU_ASSERT((NX%PARTS) == 0);
+	STARPU_ASSERT((NY%PARTS) == 0);
+
+	starpu_data_handle_t handle;
+	starpu_data_handle_t vert_handle[PARTS];
+	starpu_data_handle_t horiz_handle[PARTS];
+
+	ret = starpu_init(NULL);
+	if (ret == -ENODEV)
+		return 77;
+	STARPU_CHECK_RETURN_VALUE(ret, "starpu_init");
+
+	/* Disable codelet on CPUs if we have a CUDA device, to force remote execution on the CUDA device */
+	if (starpu_cuda_worker_get_count()) {
+		cl_check_scale.cpu_funcs[0] = NULL;
+		cl_check_scale.cpu_funcs_name[0] = NULL;
+		cl_check.cpu_funcs[0] = NULL;
+		cl_check.cpu_funcs_name[0] = NULL;
+	}
+
+	/* Declare the whole matrix to StarPU */
+	starpu_matrix_data_register(&handle, STARPU_MAIN_RAM, (uintptr_t)matrix, NX, NX, NY, sizeof(matrix[0][0]));
+
+	/* Partition the matrix in PARTS vertical slices */
+	struct starpu_data_filter f_vert =
+	{
+		.filter_func = starpu_matrix_filter_block,
+		.nchildren = PARTS
+	};
+	starpu_data_partition_plan(handle, &f_vert, vert_handle);
+
+	/* Partition the matrix in PARTS horizontal slices */
+	struct starpu_data_filter f_horiz =
+	{
+		.filter_func = starpu_matrix_filter_vertical_block,
+		.nchildren = PARTS
+	};
+	starpu_data_partition_plan(handle, &f_horiz, horiz_handle);
+
+	/* Fill the matrix */
+	ret = starpu_task_insert(&cl_fill, STARPU_W, handle, 0);
+	if (ret == -ENODEV) goto enodev;
+	STARPU_CHECK_RETURN_VALUE(ret, "starpu_task_submit");
+	factor = 1;
+
+	/* Look at readonly vertical and horizontal view of the matrix */
+
+	/* Check the values of the vertical slices */
+	for (i = 0; i < PARTS; i++)
+	{
+		start = i*(NX/PARTS);
+		ret = starpu_task_insert(&cl_check,
+				STARPU_R, vert_handle[i],
+				STARPU_VALUE, &start, sizeof(start),
+				STARPU_VALUE, &factor, sizeof(factor),
+				0);
+		if (ret == -ENODEV) goto enodev;
+		STARPU_CHECK_RETURN_VALUE(ret, "starpu_task_submit");
+	}
+	/* Check the values of the horizontal slices */
+	for (i = 0; i < PARTS; i++)
+	{
+		start = factor*100*i*(NY/PARTS);
+		ret = starpu_task_insert(&cl_check,
+				STARPU_R, horiz_handle[i],
+				STARPU_VALUE, &start, sizeof(start),
+				STARPU_VALUE, &factor, sizeof(factor),
+				0);
+		if (ret == -ENODEV) goto enodev;
+		STARPU_CHECK_RETURN_VALUE(ret, "starpu_task_submit");
+	}
+	/* And of the main matrix */
+	start = 0;
+	ret = starpu_task_insert(&cl_check,
+			STARPU_R, handle,
+			STARPU_VALUE, &start, sizeof(start),
+			STARPU_VALUE, &factor, sizeof(factor),
+			0);
+	if (ret == -ENODEV) goto enodev;
+	STARPU_CHECK_RETURN_VALUE(ret, "starpu_task_submit");
+
+	/* Now look at the total view of the matrix */
+
+	/* Check and scale it */
+	start = 0;
+	ret = starpu_task_insert(&cl_check_scale,
+			STARPU_RW, handle,
+			STARPU_VALUE, &start, sizeof(start),
+			STARPU_VALUE, &factor, sizeof(factor),
+			0);
+	if (ret == -ENODEV) goto enodev;
+	STARPU_CHECK_RETURN_VALUE(ret, "starpu_task_submit");
+	factor = 2;
+
+	/* Look again readonly vertical and horizontal slices */
+
+	/* Check the values of the vertical slices */
+	for (i = 0; i < PARTS; i++)
+	{
+		start = factor*i*(NX/PARTS);
+		ret = starpu_task_insert(&cl_check,
+				STARPU_R, vert_handle[i],
+				STARPU_VALUE, &start, sizeof(start),
+				STARPU_VALUE, &factor, sizeof(factor),
+				0);
+		if (ret == -ENODEV) goto enodev;
+		STARPU_CHECK_RETURN_VALUE(ret, "starpu_task_submit");
+	}
+	/* Check the values of the horizontal slices */
+	for (i = 0; i < PARTS; i++)
+	{
+		start = factor*100*i*(NY/PARTS);
+		ret = starpu_task_insert(&cl_check,
+				STARPU_R, horiz_handle[i],
+				STARPU_VALUE, &start, sizeof(start),
+				STARPU_VALUE, &factor, sizeof(factor),
+				0);
+		if (ret == -ENODEV) goto enodev;
+		STARPU_CHECK_RETURN_VALUE(ret, "starpu_task_submit");
+	}
+	/* And of the main matrix */
+	start = 0;
+	ret = starpu_task_insert(&cl_check,
+			STARPU_R, handle,
+			STARPU_VALUE, &start, sizeof(start),
+			STARPU_VALUE, &factor, sizeof(factor),
+			0);
+	if (ret == -ENODEV) goto enodev;
+	STARPU_CHECK_RETURN_VALUE(ret, "starpu_task_submit");
+
+	/* Now try to touch horizontal slices */
+
+	/* Check and scale the values of the horizontal slices */
+	for (i = 0; i < PARTS; i++)
+	{
+		start = factor*100*i*(NY/PARTS);
+		ret = starpu_task_insert(&cl_check_scale,
+				STARPU_RW, horiz_handle[i],
+				STARPU_VALUE, &start, sizeof(start),
+				STARPU_VALUE, &factor, sizeof(factor),
+				0);
+		if (ret == -ENODEV) goto enodev;
+		STARPU_CHECK_RETURN_VALUE(ret, "starpu_task_submit");
+	}
+	factor = 4;
+
+	/* And come back to read-only */
+
+	/* Check the values of the vertical slices */
+	for (i = 0; i < PARTS; i++)
+	{
+		start = factor*i*(NX/PARTS);
+		ret = starpu_task_insert(&cl_check,
+				STARPU_R, vert_handle[i],
+				STARPU_VALUE, &start, sizeof(start),
+				STARPU_VALUE, &factor, sizeof(factor),
+				0);
+		if (ret == -ENODEV) goto enodev;
+		STARPU_CHECK_RETURN_VALUE(ret, "starpu_task_submit");
+	}
+	/* Check the values of the horizontal slices */
+	for (i = 0; i < PARTS; i++)
+	{
+		start = factor*100*i*(NY/PARTS);
+		ret = starpu_task_insert(&cl_check,
+				STARPU_R, horiz_handle[i],
+				STARPU_VALUE, &start, sizeof(start),
+				STARPU_VALUE, &factor, sizeof(factor),
+				0);
+		if (ret == -ENODEV) goto enodev;
+		STARPU_CHECK_RETURN_VALUE(ret, "starpu_task_submit");
+	}
+	/* And of the main matrix */
+	start = 0;
+	ret = starpu_task_insert(&cl_check,
+			STARPU_R, handle,
+			STARPU_VALUE, &start, sizeof(start),
+			STARPU_VALUE, &factor, sizeof(factor),
+			0);
+	if (ret == -ENODEV) goto enodev;
+	STARPU_CHECK_RETURN_VALUE(ret, "starpu_task_submit");
+
+	/* And access the whole matrix again */
+
+	/* And check and scale the values of the whole matrix */
+	start = 0;
+	ret = starpu_task_insert(&cl_check_scale,
+			STARPU_RW, handle,
+			STARPU_VALUE, &start, sizeof(start),
+			STARPU_VALUE, &factor, sizeof(factor),
+			0);
+	if (ret == -ENODEV) goto enodev;
+	STARPU_CHECK_RETURN_VALUE(ret, "starpu_task_submit");
+	factor = 8;
+
+	/*
+	 * Unregister data from StarPU and shutdown.
+	 */
+	starpu_data_partition_clean(handle, PARTS, vert_handle);
+	starpu_data_partition_clean(handle, PARTS, horiz_handle);
+	starpu_data_unregister(handle);
+	starpu_shutdown();
+
+	return ret;
+
+enodev:
+	starpu_shutdown();
+	return 77;
+}

+ 15 - 8
examples/filters/fmultiple_submit_readonly.c

@@ -1,7 +1,7 @@
 /* StarPU --- Runtime system for heterogeneous multicore architectures.
  *
  * Copyright (C) 2017                                     CNRS
- * Copyright (C) 2015                                     Université de Bordeaux
+ * Copyright (C) 2015,2017                                Université de Bordeaux
  *
  * StarPU is free software; you can redistribute it and/or modify
  * it under the terms of the GNU Lesser General Public License as published by
@@ -17,7 +17,8 @@
 
 /*
  * This examplifies how to access the same matrix with different partitioned
- * views, doing the coherency through partition planning.
+ * views, doing the coherency through partition planning, but without having to
+ * explicitly submit partitioning/unpartitioning.
  *
  * We first run a kernel on the whole matrix to fill it, then check the value
  * in parallel from the whole handle, from the horizontal slices, and from the
@@ -27,6 +28,8 @@
  * horizontal slices to check and scale them. Then we check again from the
  * whole handle, the horizontal slices, and the vertical slices. Eventually we
  * switch back to the whole matrix to check and scale it.
+ *
+ * Please keep this in sync with fmultiple_submit_implicit.c
  */
 
 #include <starpu.h>
@@ -95,11 +98,9 @@ struct starpu_codelet cl_check_scale =
 #ifdef STARPU_USE_CUDA
 	.cuda_funcs = {fmultiple_check_scale_cuda},
 	.cuda_flags = {STARPU_CUDA_ASYNC},
-#else
-	/* Only enable it on CPUs if we don't have a CUDA device, to force remote execution on the CUDA device */
+#endif
 	.cpu_funcs = {fmultiple_check_scale},
 	.cpu_funcs_name = {"fmultiple_check_scale"},
-#endif
 	.nbuffers = 1,
 	.modes = {STARPU_RW},
 	.name = "fmultiple_check_scale"
@@ -135,11 +136,9 @@ struct starpu_codelet cl_check =
 #ifdef STARPU_USE_CUDA
 	.cuda_funcs = {fmultiple_check_cuda},
 	.cuda_flags = {STARPU_CUDA_ASYNC},
-#else
-	/* Only enable it on CPUs if we don't have a CUDA device, to force remote execution on the CUDA device */
+#endif
 	.cpu_funcs = {fmultiple_check},
 	.cpu_funcs_name = {"fmultiple_check"},
-#endif
 	.nbuffers = 1,
 	.modes = {STARPU_R},
 	.name = "fmultiple_check"
@@ -165,6 +164,14 @@ int main(void)
 		return 77;
 	STARPU_CHECK_RETURN_VALUE(ret, "starpu_init");
 
+	/* Disable codelet on CPUs if we have a CUDA device, to force remote execution on the CUDA device */
+	if (starpu_cuda_worker_get_count()) {
+		cl_check_scale.cpu_funcs[0] = NULL;
+		cl_check_scale.cpu_funcs_name[0] = NULL;
+		cl_check.cpu_funcs[0] = NULL;
+		cl_check.cpu_funcs_name[0] = NULL;
+	}
+
 	/* Declare the whole matrix to StarPU */
 	starpu_matrix_data_register(&handle, STARPU_MAIN_RAM, (uintptr_t)matrix, NX, NX, NY, sizeof(matrix[0][0]));
 

+ 1 - 0
include/starpu_data_filters.h

@@ -49,6 +49,7 @@ void starpu_data_partition_submit(starpu_data_handle_t initial_handle, unsigned
 void starpu_data_partition_readonly_submit(starpu_data_handle_t initial_handle, unsigned nparts, starpu_data_handle_t *children);
 void starpu_data_partition_readwrite_upgrade_submit(starpu_data_handle_t initial_handle, unsigned nparts, starpu_data_handle_t *children);
 void starpu_data_unpartition_submit(starpu_data_handle_t initial_handle, unsigned nparts, starpu_data_handle_t *children, int gathering_node);
+void starpu_data_unpartition_submit_r(starpu_data_handle_t initial_handle, int gathering_node);
 void starpu_data_unpartition_readonly_submit(starpu_data_handle_t initial_handle, unsigned nparts, starpu_data_handle_t *children, int gathering_node);
 void starpu_data_partition_clean(starpu_data_handle_t root_data, unsigned nparts, starpu_data_handle_t *children);
 

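The declaration added above is the recursive counterpart of starpu_data_unpartition_submit(). A hedged sketch of a call site (<c>handle</c> being a hypothetical handle whose partition plans were activated implicitly):

\code{.c}
/* Sketch: gather back everything that is currently partitioned below
 * 'handle', collecting the pieces in main memory. */
starpu_data_unpartition_submit_r(handle, STARPU_MAIN_RAM);
\endcode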
+ 1 - 0
include/starpu_task.h

@@ -47,6 +47,7 @@ extern "C"
 
 #define STARPU_CODELET_SIMGRID_EXECUTE	(1<<0)
 #define STARPU_CODELET_SIMGRID_EXECUTE_AND_INJECT	(1<<1)
+#define STARPU_CODELET_NOPLANS	(1<<2)
 #define STARPU_CUDA_ASYNC	(1<<0)
 #define STARPU_OPENCL_ASYNC	(1<<0)
 

+ 9 - 0
src/core/task.c

@@ -584,6 +584,7 @@ static int _starpu_task_submit_head(struct starpu_task *task)
 		for (i = 0; i < nbuffers; i++)
 		{
 			starpu_data_handle_t handle = STARPU_TASK_GET_HANDLE(task, i);
+			enum starpu_data_access_mode mode = STARPU_TASK_GET_MODE(task, i);
 			/* Make sure handles are valid */
 			STARPU_ASSERT_MSG(handle->magic == _STARPU_TASK_MAGIC, "data %p is invalid (was it already unregistered?)", handle);
 			/* Make sure handles are not partitioned */
@@ -592,6 +593,14 @@ static int _starpu_task_submit_head(struct starpu_task *task)
 			 * for can_execute hooks */
 			if (handle->home_node != -1)
 				_STARPU_TASK_SET_INTERFACE(task, starpu_data_get_interface_on_node(handle, handle->home_node), i);
+			if (!(task->cl->flags & STARPU_CODELET_NOPLANS) &&
+			    ((handle->nplans && !handle->nchildren) || handle->siblings))
+				/* This handle is involved with asynchronous
+				 * partitioning as a parent or a child, make
+				 * sure the right plan is active, submit
+				 * appropriate partitioning / unpartitioning if
+				 * not */
+				_starpu_data_partition_access_submit(handle, (mode & STARPU_W) != 0);
 		}
 
 		/* Check the type of worker(s) required by the task exist */

+ 12 - 1
src/datawizard/coherency.h

@@ -144,9 +144,16 @@ struct _starpu_data_state
 	/* In case we user filters, the handle may describe a sub-data */
 	struct _starpu_data_state *root_handle; /* root of the tree */
 	struct _starpu_data_state *father_handle; /* father of the node, NULL if the current node is the root */
+	starpu_data_handle_t *active_children; /* The currently active set of read-write children */
+	starpu_data_handle_t **active_readonly_children; /* The currently active set of read-only children */
+	unsigned nactive_readonly_children; /* Size of active_readonly_children array */
+	/* Our siblings in the father partitioning */
+	unsigned nsiblings; /* How many siblings */
+	starpu_data_handle_t *siblings;
 	unsigned sibling_index; /* indicate which child this node is from the father's perpsective (if any) */
 	unsigned depth; /* what's the depth of the tree ? */
 
+	/* Synchronous partitioning */
 	starpu_data_handle_t children;
 	unsigned nchildren;
 	/* How many partition plans this handle has */
@@ -163,7 +170,11 @@ struct _starpu_data_state
 	 */
 	unsigned partitioned;
 	/* Whether a partition plan is currently submitted in readonly mode */
-	unsigned readonly;
+	unsigned readonly:1;
+
+	/* Whether our father is currently partitioned into ourself */
+	unsigned active:1;
+	unsigned active_ro:1;
 
 	/* describe the state of the data in term of coherency */
 	struct _starpu_data_replicate per_node[STARPU_MAXNODES];

+ 181 - 5
src/datawizard/filters.c

@@ -19,6 +19,8 @@
  * See the GNU Lesser General Public License in COPYING.LGPL for more details.
  */
 
+#define STARPU_VERBOSE
+
 #include <datawizard/filters.h>
 #include <datawizard/footprint.h>
 #include <datawizard/interfaces/data_interface.h>
@@ -234,9 +236,19 @@ static void _starpu_data_partition(starpu_data_handle_t initial_handle, starpu_d
 		child->switch_cl = NULL;
 		child->partitioned = 0;
 		child->readonly = 0;
+		child->active = inherit_state;
+		child->active_ro = 0;
                 child->mpi_data = initial_handle->mpi_data;
 		child->root_handle = initial_handle->root_handle;
 		child->father_handle = initial_handle;
+		child->active_children = NULL;
+		child->active_readonly_children = NULL;
+		child->nactive_readonly_children = 0;
+		child->nsiblings = nparts;
+		if (inherit_state)
+			child->siblings = NULL;
+		else
+			child->siblings = childrenp;
 		child->sibling_index = i;
 		child->depth = initial_handle->depth + 1;
 
@@ -586,6 +598,7 @@ void starpu_data_partition_plan(starpu_data_handle_t initial_handle, struct star
 	STARPU_ASSERT_MSG(initial_handle->sequential_consistency, "partition planning is currently only supported for data with sequential consistency");
 	struct starpu_codelet *cl = initial_handle->switch_cl;
 	int home_node = initial_handle->home_node;
+	starpu_data_handle_t *children;
 	if (home_node == -1)
 		/* Nothing better for now */
 		/* TODO: pass -1, and make _starpu_fetch_nowhere_task_input
@@ -594,11 +607,13 @@ void starpu_data_partition_plan(starpu_data_handle_t initial_handle, struct star
 		 */
 		home_node = STARPU_MAIN_RAM;
 
+	_STARPU_MALLOC(children, nparts * sizeof(*children));
 	for (i = 0; i < nparts; i++)
 	{
-		_STARPU_CALLOC(childrenp[i], 1, sizeof(struct _starpu_data_state));
+		_STARPU_CALLOC(children[i], 1, sizeof(struct _starpu_data_state));
+		childrenp[i] = children[i];
 	}
-	_starpu_data_partition(initial_handle, childrenp, nparts, f, 0);
+	_starpu_data_partition(initial_handle, children, nparts, f, 0);
 
 	if (!cl)
 	{
@@ -607,6 +622,7 @@ void starpu_data_partition_plan(starpu_data_handle_t initial_handle, struct star
 		cl = initial_handle->switch_cl;
 		cl->where = STARPU_NOWHERE;
 		cl->nbuffers = STARPU_VARIABLE_NBUFFERS;
+		cl->flags = STARPU_CODELET_NOPLANS;
 		cl->name = "data_partition_switch";
 		cl->specific_nodes = 1;
 	}
@@ -624,6 +640,15 @@ void starpu_data_partition_clean(starpu_data_handle_t root_handle, unsigned npar
 {
 	unsigned i;
 
+	if (children[0]->active) {
+#ifdef STARPU_DEVEL
+#warning FIXME: better choose gathering node
+#endif
+		starpu_data_unpartition_submit(root_handle, nparts, children, STARPU_MAIN_RAM);
+	}
+
+	free(children[0]->siblings);
+
 	for (i = 0; i < nparts; i++)
 		starpu_data_unregister_submit(children[i]);
 
@@ -634,17 +659,26 @@ void starpu_data_partition_clean(starpu_data_handle_t root_handle, unsigned npar
 
 void starpu_data_partition_submit(starpu_data_handle_t initial_handle, unsigned nparts, starpu_data_handle_t *children)
 {
+	unsigned i;
 	STARPU_ASSERT_MSG(initial_handle->sequential_consistency, "partition planning is currently only supported for data with sequential consistency");
 	_starpu_spin_lock(&initial_handle->header_lock);
 	STARPU_ASSERT_MSG(initial_handle->partitioned == 0, "One can't submit several partition plannings at the same time");
 	STARPU_ASSERT_MSG(initial_handle->readonly == 0, "One can't submit a partition planning while a readonly partitioning is active");
 	initial_handle->partitioned++;
+	initial_handle->active_children = children[0]->siblings;
 	_starpu_spin_unlock(&initial_handle->header_lock);
 
+	for (i = 0; i < nparts; i++)
+	{
+		_starpu_spin_lock(&children[i]->header_lock);
+		children[i]->active = 1;
+		_starpu_spin_unlock(&children[i]->header_lock);
+	}
+
 	if (!initial_handle->initialized)
 		/* No need for coherency, it is not initialized */
 		return;
-	unsigned i;
+
 	struct starpu_data_descr descr[nparts];
 	for (i = 0; i < nparts; i++)
 	{
@@ -659,15 +693,27 @@ void starpu_data_partition_submit(starpu_data_handle_t initial_handle, unsigned
 
 void starpu_data_partition_readonly_submit(starpu_data_handle_t initial_handle, unsigned nparts, starpu_data_handle_t *children)
 {
+	unsigned i;
 	STARPU_ASSERT_MSG(initial_handle->sequential_consistency, "partition planning is currently only supported for data with sequential consistency");
 	_starpu_spin_lock(&initial_handle->header_lock);
 	STARPU_ASSERT_MSG(initial_handle->partitioned == 0 || initial_handle->readonly, "One can't submit a readonly partition planning at the same time as a readwrite partition planning");
 	initial_handle->partitioned++;
 	initial_handle->readonly = 1;
+	if (initial_handle->nactive_readonly_children < initial_handle->partitioned) {
+		_STARPU_REALLOC(initial_handle->active_readonly_children, initial_handle->partitioned * sizeof(initial_handle->active_readonly_children));
+		initial_handle->nactive_readonly_children = initial_handle->partitioned;
+	}
+	initial_handle->active_readonly_children[initial_handle->partitioned-1] = children[0]->siblings;
 	_starpu_spin_unlock(&initial_handle->header_lock);
 
+	for (i = 0; i < nparts; i++)
+	{
+		_starpu_spin_lock(&children[i]->header_lock);
+		children[i]->active = 1;
+		_starpu_spin_unlock(&children[i]->header_lock);
+	}
+
 	STARPU_ASSERT_MSG(initial_handle->initialized, "It is odd to read-only-partition a data which does not have a value yet");
-	unsigned i;
 	struct starpu_data_descr descr[nparts];
 	for (i = 0; i < nparts; i++)
 	{
@@ -686,6 +732,8 @@ void starpu_data_partition_readwrite_upgrade_submit(starpu_data_handle_t initial
 	STARPU_ASSERT_MSG(initial_handle->partitioned == 1, "One can't upgrade a readonly partition planning to readwrite while other readonly partition plannings are active");
 	STARPU_ASSERT_MSG(initial_handle->readonly == 1, "One can only upgrade a readonly partition planning");
 	initial_handle->readonly = 0;
+	initial_handle->active_children = initial_handle->active_readonly_children[0];
+	initial_handle->active_readonly_children[0] = NULL;
 	_starpu_spin_unlock(&initial_handle->header_lock);
 
 	unsigned i;
@@ -703,16 +751,37 @@ void starpu_data_partition_readwrite_upgrade_submit(starpu_data_handle_t initial
 
 void starpu_data_unpartition_submit(starpu_data_handle_t initial_handle, unsigned nparts, starpu_data_handle_t *children, int gather_node)
 {
+	unsigned i;
 	STARPU_ASSERT_MSG(initial_handle->sequential_consistency, "partition planning is currently only supported for data with sequential consistency");
 	STARPU_ASSERT_MSG(gather_node == initial_handle->home_node || gather_node == -1, "gathering node different from home node is currently not supported");
 	_starpu_spin_lock(&initial_handle->header_lock);
 	STARPU_ASSERT_MSG(initial_handle->partitioned >= 1, "No partition planning is active for this handle");
+	if (initial_handle->readonly) {
+		/* Replace this children set with the last set in the list of readonly children sets */
+		for (i = 0; i < initial_handle->partitioned-1; i++) {
+			if (initial_handle->active_readonly_children[i] == children[0]->siblings) {
+				initial_handle->active_readonly_children[i] = initial_handle->active_readonly_children[initial_handle->partitioned-1];
+				initial_handle->active_readonly_children[initial_handle->partitioned-1] = NULL;
+				break;
+			}
+		}
+	} else {
+		initial_handle->active_children = NULL;
+	}
 	initial_handle->partitioned--;
 	if (!initial_handle->partitioned)
 		initial_handle->readonly = 0;
+	initial_handle->active_children = NULL;
 	_starpu_spin_unlock(&initial_handle->header_lock);
 
-	unsigned i, n;
+	for (i = 0; i < nparts; i++)
+	{
+		_starpu_spin_lock(&children[i]->header_lock);
+		children[i]->active = 0;
+		_starpu_spin_unlock(&children[i]->header_lock);
+	}
+
+	unsigned n;
 	struct starpu_data_descr descr[nparts];
 	for (i = 0, n = 0; i < nparts; i++)
 	{
@@ -755,6 +824,113 @@ void starpu_data_unpartition_readonly_submit(starpu_data_handle_t initial_handle
 	starpu_task_insert(initial_handle->switch_cl, STARPU_W, initial_handle, STARPU_DATA_MODE_ARRAY, descr, n, 0);
 }
 
+/* Unpartition everything below ancestor */
+void starpu_data_unpartition_submit_r(starpu_data_handle_t ancestor, int gathering_node)
+{
+	unsigned i, j, nsiblings;
+	if (!ancestor->partitioned)
+		/* It's already unpartitioned */
+		return;
+	_STARPU_DEBUG("ancestor %p needs unpartitioning\n", ancestor);
+	if (ancestor->readonly)
+	{
+		/* Uh, has to go through all read-only partitions */
+		for (i = 0; i < ancestor->partitioned; i++) {
+			starpu_data_handle_t *children = ancestor->active_readonly_children[i];
+			_STARPU_DEBUG("unpartition readonly children %p\n", children);
+			nsiblings = children[0]->nsiblings;
+			for (j = 0; j < nsiblings; j++) {
+				/* Make sure our children are unpartitioned */
+				starpu_data_unpartition_submit_r(children[j], gathering_node);
+			}
+			/* And unpartition them */
+			starpu_data_unpartition_submit(ancestor, nsiblings, children, gathering_node);
+		}
+	}
+	else
+	{
+		_STARPU_DEBUG("unpartition children %p\n", ancestor->active_children);
+		/* Only one partition */
+		nsiblings = ancestor->active_children[0]->nsiblings;
+		for (i = 0; i < nsiblings; i++)
+			starpu_data_unpartition_submit_r(ancestor->active_children[i], gathering_node);
+		/* And unpartition ourself */
+		starpu_data_unpartition_submit(ancestor, nsiblings, ancestor->active_children, gathering_node);
+	}
+}
+
+/* Make ancestor partition itself properly for target */
+static void _starpu_data_partition_access_look_up(starpu_data_handle_t ancestor, starpu_data_handle_t target, int write)
+{
+	/* First make sure ancestor has proper state, if not, ask father */
+	if (!ancestor->active || (write && ancestor->active_ro))
+	{
+		/* (The root is always active-rw) */
+		STARPU_ASSERT(ancestor->father_handle);
+		_STARPU_DEBUG("ancestor %p is not ready: %s, asking father %p\n", ancestor, ancestor->active ? ancestor->active_ro ? "RO" : "RW" : "NONE", ancestor->father_handle);
+		_starpu_data_partition_access_look_up(ancestor->father_handle, ancestor, write);
+		_STARPU_DEBUG("ancestor %p is now ready\n", ancestor);
+	}
+	else
+		_STARPU_DEBUG("ancestor %p was ready\n", ancestor);
+
+	/* We shouldn't be called for nothing */
+	STARPU_ASSERT(!ancestor->partitioned || !target || ancestor->active_children != target->siblings || (ancestor->readonly && write));
+
+	/* Then unpartition ancestor if needed */
+	if (ancestor->partitioned &&
+			/* Not the right children, unpartition ourself */
+			((target && ancestor->active_children != target->siblings) ||
+			/* We are partitioned and we want to write or some child
+			 * is writing and we want to read, unpartition ourself*/
+			(!target && (write || !ancestor->readonly))))
+	{
+#ifdef STARPU_DEVEL
+#warning FIXME: better choose gathering node
+#endif
+		starpu_data_unpartition_submit_r(ancestor, STARPU_MAIN_RAM);
+	}
+
+	if (!target)
+	{
+		_STARPU_DEBUG("ancestor %p is done\n", ancestor);
+		/* No child target, nothing more to do actually.  */
+		return;
+	}
+
+	/* Then partition ancestor towards target */
+	if (ancestor->partitioned)
+	{
+		_STARPU_DEBUG("ancestor %p is already partitioned RO, turn RW\n", ancestor);
+		/* Already partitioned, normally it's already for the target */
+		STARPU_ASSERT(ancestor->active_children == target->siblings);
+		/* And we are here just because we haven't partitioned rw */
+		STARPU_ASSERT(ancestor->readonly && write);
+		/* So we just need to upgrade ro to rw */
+		starpu_data_partition_readwrite_upgrade_submit(ancestor, target->nsiblings, target->siblings);
+	}
+	else
+	{
+		/* Just need to partition properly for the child */
+		if (write)
+		{
+			_STARPU_DEBUG("partition ancestor %p RW\n", ancestor);
+			starpu_data_partition_submit(ancestor, target->nsiblings, target->siblings);
+		}
+		else
+		{
+			_STARPU_DEBUG("partition ancestor %p RO\n", ancestor);
+			starpu_data_partition_readonly_submit(ancestor, target->nsiblings, target->siblings);
+		}
+	}
+}
+
+void _starpu_data_partition_access_submit(starpu_data_handle_t target, int write)
+{
+	_STARPU_DEBUG("accessing %p %s\n", target, write ? "RW" : "RO");
+	_starpu_data_partition_access_look_up(target, NULL, write);
+}
+
 /*
  * Given an integer N, NPARTS the number of parts it must be divided in, ID the
  * part currently considered, determines the CHUNK_SIZE and the OFFSET, taking

+ 5 - 1
src/datawizard/filters.h

@@ -1,7 +1,7 @@
 /* StarPU --- Runtime system for heterogeneous multicore architectures.
  *
  * Copyright (C) 2012                                     Inria
- * Copyright (C) 2008-2011,2014                           Université de Bordeaux
+ * Copyright (C) 2008-2011,2014,2017                      Université de Bordeaux
  * Copyright (C) 2010,2015                                CNRS
  *
  * StarPU is free software; you can redistribute it and/or modify
@@ -31,4 +31,8 @@ _starpu_filter_nparts_compute_chunk_size_and_offset(unsigned n, unsigned nparts,
 					     size_t elemsize, unsigned id,
 					     unsigned ld, unsigned *chunk_size,
 					     size_t *offset);
+
+
+/* submit asynchronous unpartitioning / partitioning to make target active read-only or read-write */
+void _starpu_data_partition_access_submit(starpu_data_handle_t target, int write);
 #endif

+ 7 - 0
src/datawizard/interfaces/data_interface.c

@@ -280,8 +280,15 @@ static void _starpu_register_new_data(starpu_data_handle_t handle,
 	handle->switch_cl = NULL;
 	handle->partitioned = 0;
 	handle->readonly = 0;
+	handle->active = 1;
+	handle->active_ro = 0;
 	handle->root_handle = handle;
 	handle->father_handle = NULL;
+	handle->active_children = NULL;
+	handle->active_readonly_children = NULL;
+	handle->nactive_readonly_children = 0;
+	handle->nsiblings = 0;
+	handle->siblings = NULL;
 	handle->sibling_index = 0; /* could be anything for the root */
 	handle->depth = 1; /* the tree is just a node yet */
         handle->mpi_data = NULL; /* invalid until set */

+ 1 - 0
tests/Makefile.am

@@ -321,6 +321,7 @@ myPROGRAMS +=				\
 	datawizard/test_arbiter			\
 	datawizard/invalidate_pending_requests	\
 	datawizard/temporary_partition		\
+	datawizard/temporary_partition_implicit	\
 	datawizard/redux_acquire		\
 	disk/disk_copy				\
 	disk/disk_copy_unpack			\

+ 2 - 2
tests/datawizard/temporary_partition.c

@@ -2,7 +2,7 @@
  *
  * Copyright (C) 2012-2013                                Inria
  * Copyright (C) 2010-2013,2015,2017                      CNRS
- * Copyright (C) 2010,2013-2014,2016                      Université de Bordeaux
+ * Copyright (C) 2010,2013-2014,2016-2017                 Université de Bordeaux
  *
  * StarPU is free software; you can redistribute it and/or modify
  * it under the terms of the GNU Lesser General Public License as published by
@@ -82,7 +82,7 @@ int main(void)
 	/* Invalidate one random piece we don't care coherency about */
 	starpu_data_invalidate_submit(handles[NPARTS/2]);
 
-	/* Join */
+	/* Clean */
 	starpu_data_unpartition_submit(handle, NPARTS, handles, -1);
 	starpu_data_partition_clean(handle, NPARTS, handles);
 

+ 107 - 0
tests/datawizard/temporary_partition_implicit.c

@@ -0,0 +1,107 @@
+/* StarPU --- Runtime system for heterogeneous multicore architectures.
+ *
+ * Copyright (C) 2012-2013                                Inria
+ * Copyright (C) 2010-2013,2015,2017                      CNRS
+ * Copyright (C) 2010,2013-2014,2016-2017                 Université de Bordeaux
+ *
+ * StarPU is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published by
+ * the Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ *
+ * StarPU is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ *
+ * See the GNU Lesser General Public License in COPYING.LGPL for more details.
+ */
+
+#include <starpu.h>
+#include "../helper.h"
+
+#define SIZE (1<<20)
+#define NPARTS 16
+
+/*
+ * Test asynchronous partitioning on a temporary data without submitting explicit
+ * partitioning/unpartitioning.
+ */
+
+static void codelet(void *descr[], void *_args)
+{
+	(void)descr;
+	(void)_args;
+}
+
+static struct starpu_codelet clw =
+{
+	.where = STARPU_CPU,
+	.cpu_funcs = {codelet},
+	.nbuffers = 1,
+	.modes = {STARPU_W}
+};
+
+static struct starpu_codelet clr =
+{
+	.where = STARPU_CPU,
+	.cpu_funcs = {codelet},
+	.nbuffers = 1,
+	.modes = {STARPU_R}
+};
+
+int main(void)
+{
+	int ret;
+	starpu_data_handle_t handle, handles[NPARTS];
+	int i;
+
+	ret = starpu_init(NULL);
+	if (ret == -ENODEV) return STARPU_TEST_SKIPPED;
+	STARPU_CHECK_RETURN_VALUE(ret, "starpu_init");
+
+	starpu_vector_data_register(&handle, -1, 0, SIZE, sizeof(char));
+
+	/* Fork */
+	struct starpu_data_filter f =
+	{
+		.filter_func = starpu_vector_filter_block,
+		.nchildren = NPARTS
+	};
+	starpu_data_partition_plan(handle, &f, handles);
+
+	/* Process in parallel */
+	for (i = 0; i < NPARTS; i++)
+	{
+		ret = starpu_task_insert(&clw,
+					 STARPU_W, handles[i],
+					 0);
+		if (ret == -ENODEV) goto enodev;
+		STARPU_CHECK_RETURN_VALUE(ret, "starpu_task_insert");
+	}
+
+	/* Invalidate one random piece we don't care coherency about */
+	starpu_data_invalidate_submit(handles[NPARTS/2]);
+
+	/* Clean */
+	starpu_data_unpartition_submit(handle, NPARTS, handles, -1);
+	starpu_data_partition_clean(handle, NPARTS, handles);
+
+	/* Read result */
+	starpu_task_insert(&clr, STARPU_R, handle, 0);
+
+	starpu_data_unregister(handle);
+
+	starpu_shutdown();
+
+	return 0;
+
+enodev:
+	starpu_data_unpartition_submit(handle, NPARTS, handles, -1);
+	starpu_data_partition_clean(handle, NPARTS, handles);
+	starpu_data_unregister(handle);
+	starpu_shutdown();
+	/* yes, we do not perform the computation but we did detect that no one
+	 * could perform the kernel, so this is not an error from StarPU */
+	fprintf(stderr, "WARNING: No one can execute this task\n");
+	return STARPU_TEST_SKIPPED;
+}