Merge branch 'master' of https://scm.gforge.inria.fr/anonscm/git/starpu/starpu

root, 5 years ago
parent
current commit
879cdfc927
100 files changed, including 2336 insertions and 2465 deletions
  1. 4 0
      ChangeLog
  2. 8 4
      Makefile.am
  3. 4 0
      STARPU-VERSION
  4. 77 34
      configure.ac
  5. 3 2
      contrib/ci.inria.fr/job-1-check.sh
  6. 4 0
      doc/doxygen/Makefile.am
  7. 11 4
      doc/doxygen/chapters/470_simgrid.doxy
  8. 32 0
      doc/doxygen/chapters/501_environment_variables.doxy
  9. 34 0
      doc/doxygen/dev/starpu_check_include.sh
  10. 4 6
      examples/Makefile.am
  11. 21 1
      examples/cholesky/cholesky.sh
  12. 10 3
      examples/common/blas.h
  13. 2 0
      examples/mlr/mlr.c
  14. 1 1
      examples/mult/sgemm.sh
  15. 14 4
      examples/mult/xgemm.c
  16. 2 4
      examples/stencil/Makefile.am
  17. 8 0
      examples/tag_example/tag_example.c
  18. 11 0
      include/starpu.h
  19. 246 15
      include/starpu_bitmap.h
  20. 3 3
      include/starpu_sched_component.h
  21. 3 1
      include/starpu_task.h
  22. 27 0
      julia/Makefile.am
  23. 127 0
      julia/Manifest.toml
  24. 3 0
      julia/StarPU.jl/Project.toml
  25. 53 0
      julia/README
  26. 0 8
      julia/StarPU.jl/Makefile
  27. 0 4
      julia/StarPU.jl/Manifest.toml
  28. 0 2
      julia/StarPU.jl/REQUIRE
  29. 0 1290
      julia/StarPU.jl/src/StarPU.jl
  30. 0 349
      julia/StarPU.jl/src/compiler/cuda.jl
  31. 0 13
      julia/StarPU.jl/src/compiler/include.jl
  32. 0 38
      julia/StarPU.jl/src/compiler/utils.jl
  33. 0 133
      julia/StarPU.jl/src/jlstarpu_data_handles.c
  34. 0 73
      julia/StarPU.jl/src/jlstarpu_task.h
  35. 0 208
      julia/StarPU.jl/src/jlstarpu_task_submit.c
  36. 0 67
      julia/StarPU.jl/src/jlstarpu_utils.h
  37. 139 0
      julia/examples/Makefile.am
  38. 110 0
      julia/examples/axpy/axpy.jl
  39. 19 0
      julia/examples/axpy/axpy.sh
  40. 1 0
      julia/black_scholes/black_scholes.c
  41. 15 2
      julia/black_scholes/black_scholes.jl
  42. 93 0
      julia/examples/callback/callback.c
  43. 76 0
      julia/examples/callback/callback.jl
  44. 19 0
      julia/examples/callback/callback.sh
  45. 32 0
      julia/examples/check_deps/check_deps.jl
  46. 20 0
      julia/examples/check_deps/check_deps.sh
  47. 104 0
      julia/examples/dependency/end_dep.jl
  48. 18 0
      julia/examples/dependency/end_dep.sh
  49. 122 0
      julia/examples/dependency/tag_dep.jl
  50. 18 0
      julia/examples/dependency/tag_dep.sh
  51. 88 0
      julia/examples/dependency/task_dep.jl
  52. 18 0
      julia/examples/dependency/task_dep.sh
  53. 47 0
      julia/examples/execute.sh.in
  54. 130 0
      julia/examples/gemm/gemm.jl
  55. 21 0
      julia/examples/gemm/gemm.sh
  56. 56 0
      julia/examples/gemm/gemm_native.jl
  57. 5 7
      julia/mandelbrot/Makefile
  58. 79 0
      julia/examples/mandelbrot/cpu_mandelbrot.c
  59. 8 10
      julia/StarPU.jl/src/jlstarpu_simple_functions.c
  60. 78 56
      julia/mandelbrot/mandelbrot.c
  61. 26 12
      julia/mandelbrot/mandelbrot.jl
  62. 21 0
      julia/examples/mandelbrot/mandelbrot.sh
  63. 22 5
      julia/mandelbrot/mandelbrot_native.jl
  64. 11 15
      julia/mult/Makefile
  65. 0 0
      julia/examples/mult/README
  66. 24 13
      julia/mult/cpu_mult.c
  67. 2 1
      julia/mult/gpu_mult.cu
  68. 50 59
      julia/mult/mult.c
  69. 35 18
      julia/mult/mult.jl
  70. 0 0
      julia/examples/mult/mult.plot
  71. 57 0
      julia/examples/mult/mult_native.jl
  72. 22 0
      julia/examples/mult/mult_starpu.sh
  73. 38 0
      julia/examples/mult/perf.sh
  74. 0 0
      julia/examples/mult/res/mult_cstarpu_gcc9_s72_2x2_b4x2.dat
  75. 0 0
      julia/examples/mult/res/mult_gen_gcc9_1x4.dat
  76. 0 0
      julia/examples/mult/res/mult_gen_gcc9_4x1.dat
  77. 0 0
      julia/examples/mult/res/mult_gen_gcc9_s100_4x1.dat
  78. 0 0
      julia/examples/mult/res/mult_gen_gcc9_s50_4x1.dat
  79. 0 0
      julia/examples/mult/res/mult_gen_gcc9_s64_16x16_b4x2.dat
  80. 0 0
      julia/examples/mult/res/mult_gen_gcc9_s64_4x4_b4x2.dat
  81. 0 0
      julia/examples/mult/res/mult_gen_gcc9_s64_8x1_b4x2.dat
  82. 0 0
      julia/examples/mult/res/mult_gen_gcc9_s64_8x8_b4x2.dat
  83. 0 0
      julia/examples/mult/res/mult_gen_gcc9_s72_16x18_b4x2.dat
  84. 0 0
      julia/examples/mult/res/mult_gen_gcc9_s72_16x8_b4x2.dat
  85. 0 0
      julia/examples/mult/res/mult_gen_gcc9_s72_2x2.dat
  86. 0 0
      julia/examples/mult/res/mult_gen_gcc9_s72_2x2_b4x2.dat
  87. 0 0
      julia/examples/mult/res/mult_gen_gcc9_s72_2x2_b4x4.dat
  88. 0 0
      julia/examples/mult/res/mult_gen_gcc9_s72_2x2_b8x2.dat
  89. 0 0
      julia/examples/mult/res/mult_gen_gcc9_s72_4x1.dat
  90. 0 0
      julia/examples/mult/res/mult_gen_gcc9_s72_4x4_b4x2.dat
  91. 0 0
      julia/examples/mult/res/mult_gen_gcc9_s72_8x8_b4x2.dat
  92. 0 0
      julia/examples/mult/res/mult_gen_gcc9_s80_4x1.dat
  93. 0 0
      julia/examples/mult/res/mult_gen_icc_s72_2x1_b4x2.dat
  94. 0 0
      julia/examples/mult/res/mult_gen_icc_s72_4x4_b4x2.dat
  95. 0 0
      julia/examples/mult/res/mult_native.dat
  96. 0 0
      julia/examples/mult/res/mult_nogen_gcc9_s72_2x2_b2x2.dat
  97. 0 0
      julia/examples/mult/res/mult_nogen_gcc9_s72_2x2_b4x2.dat
  98. 0 0
      julia/examples/mult/res/mult_nogen_icc_s72-36_2x2_b4x2.dat
  99. 0 0
      julia/examples/mult/res/mult_nogen_icc_s72_2x2_b4x2.dat
  100. 0 0
      julia/mult/res/mult_nogen_icc_s72x2_2x2_b4x2.dat

+ 4 - 0
ChangeLog

@@ -18,6 +18,7 @@ StarPU 1.4.0 (git revision xxxx)
 ==============================================
 New features:
   * Fault tolerance support with starpu_task_ft_failed().
+  * Julia programming interface.
   * Add get_max_size method to data interfaces for applications using data with
     variable size to express their maximal potential size.
   * New offline tool to draw graph showing elapsed time between sent
@@ -52,6 +53,9 @@ Small features:
   * Add STARPU_LIMIT_CPU_NUMA_MEM environment variable.
   * Add STARPU_WORKERS_GETBIND environment variable.
   * Add STARPU_SCHED_SIMPLE_DECIDE_ALWAYS modular scheduler flag.
+  * Add STARPU_LIMIT_BANDWIDTH environment variable.
+  * Add field starpu_conf::precedence_over_environment_variables to ignore
+    environment variables when parameters are set directly in starpu_conf
 
 StarPU 1.3.3 (git revision 11afc5b007fe1ab1c729b55b47a5a98ef7f3cfad)
 ====================================================================

+ 8 - 4
Makefile.am

@@ -57,6 +57,10 @@ if STARPU_BUILD_SC_HYPERVISOR
 SUBDIRS += sc_hypervisor
 endif
 
+if STARPU_USE_JULIA
+SUBDIRS += julia
+endif
+
 pkgconfigdir = $(libdir)/pkgconfig
 pkgconfig_DATA = libstarpu.pc starpu-1.0.pc starpu-1.1.pc starpu-1.2.pc starpu-1.3.pc
 
@@ -159,28 +163,28 @@ DISTCLEANFILES = STARPU-REVISION
 recheck:
 	RET=0 ; \
 	for i in $(SUBDIRS) ; do \
-		make -C $$i recheck || RET=1 ; \
+		$(MAKE) -C $$i recheck || RET=1 ; \
 	done ; \
 	exit $$RET
 
 showfailed:
 	@RET=0 ; \
 	for i in $(SUBDIRS) ; do \
-		make -s -C $$i showfailed || RET=1 ; \
+		$(MAKE) -s -C $$i showfailed || RET=1 ; \
 	done ; \
 	exit $$RET
 
 showcheck:
 	RET=0 ; \
 	for i in $(SUBDIRS) ; do \
-		make -C $$i showcheck || RET=1 ; \
+		$(MAKE) -C $$i showcheck || RET=1 ; \
 	done ; \
 	exit $$RET
 
 showsuite:
 	RET=0 ; \
 	for i in $(SUBDIRS) ; do \
-		make -C $$i showsuite || RET=1 ; \
+		$(MAKE) -C $$i showsuite || RET=1 ; \
 	done ; \
 	exit $$RET
 

+ 4 - 0
STARPU-VERSION

@@ -60,3 +60,7 @@ LIBSOCL_INTERFACE_AGE=0		# set to CURRENT - PREVIOUS interface
 LIBSTARPURM_INTERFACE_CURRENT=0	# increment upon ABI change
 LIBSTARPURM_INTERFACE_REVISION=0	# increment upon implementation change
 LIBSTARPURM_INTERFACE_AGE=0	# set to CURRENT - PREVIOUS interface
+
+LIBSTARPUJULIA_INTERFACE_CURRENT=0	# increment upon ABI change
+LIBSTARPUJULIA_INTERFACE_REVISION=0	# increment upon implementation change
+LIBSTARPUJULIA_INTERFACE_AGE=0		# set to CURRENT - PREVIOUS interface

+ 77 - 34
configure.ac

@@ -59,6 +59,9 @@ AC_SUBST([LIBSTARPURM_INTERFACE_AGE])
 AC_SUBST([LIBSOCL_INTERFACE_CURRENT])
 AC_SUBST([LIBSOCL_INTERFACE_REVISION])
 AC_SUBST([LIBSOCL_INTERFACE_AGE])
+AC_SUBST([LIBSTARPUJULIA_INTERFACE_CURRENT])
+AC_SUBST([LIBSTARPUJULIA_INTERFACE_REVISION])
+AC_SUBST([LIBSTARPUJULIA_INTERFACE_AGE])
 
 AC_CANONICAL_SYSTEM
 
@@ -88,11 +91,21 @@ AC_CHECK_PROGS(PROG_DATE,gdate date)
 dnl locate pkg-config
 PKG_PROG_PKG_CONFIG
 
+AC_ARG_ENABLE(simgrid, [AS_HELP_STRING([--enable-simgrid],
+			[Enable simulating execution in simgrid])],
+			enable_simgrid=$enableval, enable_simgrid=no)
+
 if test x$enable_perf_debug = xyes; then
     enable_shared=no
 fi
+
 default_enable_mpi_check=maybe
-default_enable_mpi=maybe
+
+if test x$enable_simgrid = xyes ; then
+	default_enable_mpi=no
+else
+	default_enable_mpi=maybe
+fi
 
 ###############################################################################
 #                                                                             #
@@ -135,9 +148,6 @@ AC_ARG_WITH(simgrid-lib-dir,
 		enable_simgrid=yes
 	], [simgrid_lib_dir=no])
 
-AC_ARG_ENABLE(simgrid, [AS_HELP_STRING([--enable-simgrid],
-			[Enable simulating execution in simgrid])],
-			enable_simgrid=$enableval, enable_simgrid=no)
 if test x$enable_simgrid = xyes ; then
    	if test -n "$SIMGRID_CFLAGS" ; then
 	   	CFLAGS="$SIMGRID_CFLAGS $CFLAGS"
@@ -189,8 +199,8 @@ if test x$enable_simgrid = xyes ; then
 	AC_CHECK_TYPES([smx_actor_t], [AC_DEFINE([STARPU_HAVE_SMX_ACTOR_T], [1], [Define to 1 if you have the smx_actor_t type.])], [], [[#include <simgrid/simix.h>]])
 
 	# Latest functions
-	AC_CHECK_FUNCS([MSG_process_attach sg_actor_attach sg_actor_init sg_actor_set_stacksize MSG_zone_get_hosts sg_zone_get_hosts MSG_process_self_name MSG_process_userdata_init sg_actor_data])
-	AC_CHECK_FUNCS([xbt_mutex_try_acquire smpi_process_set_user_data SMPI_thread_create sg_zone_get_by_name sg_link_name sg_host_route sg_host_self sg_host_list sg_host_speed simcall_process_create sg_config_continue_after_help])
+	AC_CHECK_FUNCS([MSG_process_attach sg_actor_attach sg_actor_init sg_actor_set_stacksize sg_actor_on_exit MSG_zone_get_hosts sg_zone_get_hosts MSG_process_self_name MSG_process_userdata_init sg_actor_data])
+	AC_CHECK_FUNCS([xbt_mutex_try_acquire smpi_process_set_user_data SMPI_thread_create sg_zone_get_by_name sg_link_name sg_link_bandwidth_set sg_host_route sg_host_self sg_host_list sg_host_speed simcall_process_create sg_config_continue_after_help])
 	AC_CHECK_FUNCS([simgrid_init], [AC_DEFINE([STARPU_SIMGRID_HAVE_SIMGRID_INIT], [1], [Define to 1 if you have the `simgrid_init' function.])])
 	AC_CHECK_FUNCS([xbt_barrier_init], [AC_DEFINE([STARPU_SIMGRID_HAVE_XBT_BARRIER_INIT], [1], [Define to 1 if you have the `xbt_barrier_init' function.])])
 	AC_CHECK_FUNCS([sg_actor_sleep_for sg_actor_self sg_actor_ref sg_host_get_properties sg_host_send_to sg_host_sendto sg_cfg_set_int sg_actor_self_execute sg_actor_execute simgrid_get_clock])
@@ -372,6 +382,30 @@ AC_MSG_CHECKING(whether mpicxx is available)
 AC_MSG_RESULT($mpicxx_path)
 AC_SUBST(MPICXX, $mpicxx_path)
 
+# Check if mpiexec is available
+if test x$enable_simgrid = xyes ; then
+    DEFAULT_MPIEXEC=smpirun
+    AC_ARG_WITH(smpirun, [AS_HELP_STRING([--with-smpirun[=<name of smpirun or path to smpirun>]], [Name or path of the smpirun helper])], [DEFAULT_MPIEXEC=$withval])
+else
+    DEFAULT_MPIEXEC=mpiexec
+    AC_ARG_WITH(mpiexec, [AS_HELP_STRING([--with-mpiexec=<name of mpiexec or path to mpiexec>], [Name or path of mpiexec])], [DEFAULT_MPIEXEC=$withval])
+fi
+
+case $DEFAULT_MPIEXEC in
+    /*) mpiexec_path="$DEFAULT_MPIEXEC" ;;
+    *)  AC_PATH_PROG(mpiexec_path, $DEFAULT_MPIEXEC, [no], [$MPIPATH])
+esac
+AC_MSG_CHECKING(whether mpiexec is available)
+AC_MSG_RESULT($mpiexec_path)
+
+# We test if MPIEXEC exists
+if test ! -x $mpiexec_path; then
+    AC_MSG_RESULT(The mpiexec script '$mpiexec_path' is not valid)
+    default_enable_mpi_check=no
+    mpiexec_path=""
+fi
+AC_SUBST(MPIEXEC,$mpiexec_path)
+
 ###############################################################################
 #                                                                             #
 #                                    MPI                                      #
@@ -504,32 +538,6 @@ if test x$enable_mpi = xno ; then
     running_mpi_check=no
 fi
 
-if test x$enable_mpi = xyes -a x$running_mpi_check = xyes ; then
-    # Check if mpiexec is available
-    if test x$enable_simgrid = xyes ; then
-	DEFAULT_MPIEXEC=smpirun
-        AC_ARG_WITH(smpirun, [AS_HELP_STRING([--with-smpirun[=<name of smpirun or path to smpirun>]], [Name or path of the smpirun helper])], [DEFAULT_MPIEXEC=$withval])
-    else
-	DEFAULT_MPIEXEC=mpiexec
-	AC_ARG_WITH(mpiexec, [AS_HELP_STRING([--with-mpiexec=<name of mpiexec or path to mpiexec>], [Name or path of mpiexec])], [DEFAULT_MPIEXEC=$withval])
-    fi
-
-    case $DEFAULT_MPIEXEC in
-	/*) mpiexec_path="$DEFAULT_MPIEXEC" ;;
-	*)  AC_PATH_PROG(mpiexec_path, $DEFAULT_MPIEXEC, [no], [$MPIPATH])
-    esac
-    AC_MSG_CHECKING(whether mpiexec is available)
-    AC_MSG_RESULT($mpiexec_path)
-
-    # We test if MPIEXEC exists
-    if test ! -x $mpiexec_path; then
-        AC_MSG_RESULT(The mpiexec script '$mpiexec_path' is not valid)
-        running_mpi_check=no
-        mpiexec_path=""
-    fi
-    AC_SUBST(MPIEXEC,$mpiexec_path)
-fi
-
 AM_CONDITIONAL(STARPU_MPI_CHECK, test x$running_mpi_check = xyes)
 AC_MSG_CHECKING(whether MPI tests should be run)
 AC_MSG_RESULT($running_mpi_check)
@@ -552,7 +560,7 @@ fi
 if test x$enable_mpi = xyes ; then
     if test x$enable_simgrid = xyes ; then
         if test x$enable_shared = xyes ; then
-	    AC_MSG_ERROR([MPI with simgrid can not work with shared libraries, if you need the MPI support, theb use --disable-shared to fix this, else disable MPI with --disable-mpi])
+	    AC_MSG_ERROR([MPI with simgrid can not work with shared libraries, if you need the MPI support, then use --disable-shared to fix this, else disable MPI with --disable-mpi])
         else
 	    CFLAGS="$CFLAGS -fPIC"
 	    CXXFLAGS="$CXXFLAGS -fPIC"
@@ -1273,7 +1281,9 @@ AC_MSG_CHECKING(whether CUDA should be used)
 AC_MSG_RESULT($enable_cuda)
 AC_SUBST(STARPU_USE_CUDA, $enable_cuda)
 AM_CONDITIONAL(STARPU_USE_CUDA, test x$enable_cuda = xyes)
+cc_or_nvcc=$CC
 if test x$enable_cuda = xyes; then
+   	cc_or_nvcc=$NVCC
 	AC_DEFINE(STARPU_USE_CUDA, [1], [CUDA support is activated])
 
 	# On Darwin, the libstdc++ dependency is not automatically added by nvcc
@@ -1361,6 +1371,8 @@ if test x$enable_cuda = xyes; then
 	LIBS="${SAVED_LIBS}"
 fi
 
+AC_SUBST(CC_OR_NVCC, $cc_or_nvcc)
+
 have_magma=no
 if test x$enable_cuda = xyes; then
 	PKG_CHECK_MODULES([MAGMA],  [magma], [
@@ -3408,6 +3420,27 @@ AM_CONDITIONAL(AVAILABLE_DOC, [test x$available_doc != xno])
 
 ###############################################################################
 #                                                                             #
+#                                Julia                                        #
+#                                                                             #
+###############################################################################
+AC_ARG_ENABLE(julia, [AS_HELP_STRING([--enable-julia],
+			[enable the Julia extension])],
+			enable_julia=$enableval, enable_julia=no)
+if test "$enable_julia" = "yes" ; then
+   # Check whether the julia compiler is available
+   AC_PATH_PROG(juliapath, julia)
+   AC_MSG_CHECKING(whether julia is available)
+   AC_MSG_RESULT($juliapath)
+   if test ! -x "$juliapath" ; then
+      AC_MSG_ERROR(Julia compiler '$juliapath' is not valid)
+      enable_julia=no
+   fi
+fi
+AM_CONDITIONAL([STARPU_USE_JULIA], [test "x$enable_julia" = "xyes"])
+AC_SUBST(JULIA, $juliapath)
+
+###############################################################################
+#                                                                             #
 #                                Final settings                               #
 #                                                                             #
 ###############################################################################
@@ -3486,6 +3519,10 @@ AC_CONFIG_COMMANDS([executable-scripts], [
   test -e tools/starpu_paje_state_stats.R || ln -sf $ac_abs_top_srcdir/tools/starpu_paje_state_stats.R tools/starpu_paje_state_stats.R
   test -e tools/starpu_trace_state_stats.py || ln -sf $ac_abs_top_srcdir/tools/starpu_trace_state_stats.py tools/starpu_trace_state_stats.py
   chmod +x tools/starpu_trace_state_stats.py
+  chmod +x julia/examples/execute.sh
+  for x in julia/examples/check_deps/check_deps.sh julia/examples/mult/mult_starpu.sh julia/examples/mult/perf.sh julia/examples/variable/variable.sh julia/examples/task_insert_color/task_insert_color.sh julia/examples/vector_scal/vector_scal.sh julia/examples/mandelbrot/mandelbrot.sh julia/examples/callback/callback.sh julia/examples/dependency/task_dep.sh julia/examples/dependency/tag_dep.sh julia/examples/dependency/end_dep.sh julia/examples/axpy/axpy.sh julia/examples/gemm/gemm.sh; do
+      test -e $x || mkdir -p $(dirname $x) && ln -sf $ac_abs_top_srcdir/$x $(dirname $x)
+  done
 ])
 
 # Create links to ICD files in build/socl/vendors directory. SOCL will use this
@@ -3512,7 +3549,6 @@ AC_OUTPUT([
 	Makefile
 	src/Makefile
 	tools/Makefile
-	tools/replay-mpi/Makefile
 	tools/starpu_env
 	tools/starpu_codelet_profile
 	tools/starpu_codelet_histo_profile
@@ -3563,6 +3599,7 @@ AC_OUTPUT([
 	mpi/src/Makefile
 	mpi/tests/Makefile
 	mpi/examples/Makefile
+	mpi/tools/Makefile
 	sc_hypervisor/Makefile
 	sc_hypervisor/src/Makefile
 	sc_hypervisor/examples/Makefile
@@ -3575,6 +3612,11 @@ AC_OUTPUT([
 	doc/doxygen_dev/doxygen_filter.sh
 	tools/msvc/starpu_var.bat
 	min-dgels/Makefile
+	julia/Makefile
+	julia/src/Makefile
+	julia/src/dynamic_compiler/Makefile
+	julia/examples/Makefile
+	julia/examples/execute.sh
 ])
 
 AC_MSG_NOTICE([
@@ -3627,6 +3669,7 @@ AC_MSG_NOTICE([
 	       Native fortran support:                        $enable_build_fortran
 	       Native MPI fortran support:                    $use_mpi_fort
 	       Support for multiple linear regression models: $support_mlr
+	       JULIA enabled:                                 $enable_julia
 ])
 
 if test "$build_socl" = "yes" -a "$run_socl_check" = "no" ; then
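
Note on the mpiexec hunk above: the launcher lookup now runs unconditionally, instead of only when MPI and the MPI checks are enabled. The lookup itself can be sketched in plain shell as follows. This is a simplified illustration, not the configure code: `resolve_launcher` is a hypothetical helper, and it searches `PATH` with `command -v` where the real macro `AC_PATH_PROG` searches `$MPIPATH`.

```shell
#!/bin/sh
# Hypothetical helper (not part of StarPU): resolve a launcher given either
# an absolute path or a bare program name, as the case statement does.
resolve_launcher() {
	case $1 in
		/*) path=$1 ;;                              # absolute path: take it as given
		*)  path=$(command -v "$1") || path=no ;;   # bare name: search PATH (assumption: PATH, not $MPIPATH)
	esac
	# Mirror the validity check: an unusable launcher yields an empty result.
	if [ -x "$path" ]; then
		echo "$path"
	else
		echo ""
	fi
}

resolve_launcher sh
```

An empty result corresponds to the branch where configure sets `mpiexec_path=""` and turns the default for the MPI checks off.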

+ 3 - 2
contrib/ci.inria.fr/job-1-check.sh

@@ -63,12 +63,13 @@ fi
 export CC=gcc
 
 CONFIGURE_OPTIONS="--enable-debug --enable-verbose --enable-mpi-check --disable-build-doc"
+CONFIGURE_CHECK=""
 day=$(date +%u)
 if test $day -le 5
 then
     CONFIGURE_CHECK="--enable-quick-check"
-else
-    CONFIGURE_CHECK="--enable-long-check"
+#else
+    # we do a normal check, a long check takes too long on VM nodes
 fi
 ../configure $CONFIGURE_OPTIONS $CONFIGURE_CHECK  $STARPU_CONFIGURE_OPTIONS
 

+ 4 - 0
doc/doxygen/Makefile.am

@@ -200,7 +200,9 @@ dox_inputs = $(DOX_CONFIG) 				\
 	$(top_srcdir)/include/starpu_expert.h		\
 	$(top_srcdir)/include/starpu_fxt.h		\
 	$(top_srcdir)/include/starpu_hash.h		\
+	$(top_srcdir)/include/starpu_helper.h		\
 	$(top_srcdir)/include/starpu_mic.h		\
+	$(top_srcdir)/include/starpu_mpi_ms.h		\
 	$(top_srcdir)/include/starpu_mod.f90		\
 	$(top_srcdir)/include/starpu_opencl.h		\
 	$(top_srcdir)/include/starpu_openmp.h		\
@@ -227,6 +229,8 @@ dox_inputs = $(DOX_CONFIG) 				\
 	$(top_srcdir)/include/starpu_util.h		\
 	$(top_srcdir)/include/starpu_worker.h		\
 	$(top_srcdir)/include/fstarpu_mod.f90		\
+	$(top_srcdir)/include/schedulers/starpu_heteroprio.h	\
+	$(top_srcdir)/starpufft/include/starpufft.h 	\
 	$(top_srcdir)/mpi/include/starpu_mpi.h 		\
 	$(top_srcdir)/mpi/include/starpu_mpi_lb.h	\
 	$(top_srcdir)/mpi/include/fstarpu_mpi_mod.f90		\

+ 11 - 4
doc/doxygen/chapters/470_simgrid.doxy

@@ -1,6 +1,7 @@
 /* StarPU --- Runtime system for heterogeneous multicore architectures.
  *
  * Copyright (C) 2009-2020  Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+ * Copyright (C) 2020       Federal University of Rio Grande do Sul (UFRGS)
  *
  * StarPU is free software; you can redistribute it and/or modify
  * it under the terms of the GNU Lesser General Public License as published by
@@ -132,7 +133,10 @@ machine (the <c>$STARPU_HOME/.starpu</c> directory). One can then perform the
 Simulation step on the desktop machine, by setting the environment
 variable \ref STARPU_HOSTNAME to the name of the actual machine, to
 make StarPU use the performance models of the simulated machine even
-on the desktop machine.
+on the desktop machine. To use multiple performance models in different ranks,
+in case of smpi executions in a heterogeneous platform, it is possible to use the
+option <c>-hostfile-platform</c> in <c>starpu_smpirun</c>, that will define
+\ref STARPU_MPI_HOSTNAMES with the hostnames of your hostfile.
 
 If the desktop machine does not have CUDA or OpenCL, StarPU is still able to
 use SimGrid to simulate execution with CUDA/OpenCL devices, but the application
@@ -171,9 +175,12 @@ $ STARPU_SCHED=dmda starpu_smpirun -platform cluster.xml -hostfile hostfile ./mp
 \endverbatim
 
 Where \c cluster.xml is a SimGrid-MPI platform description, and \c hostfile the
-list of MPI nodes to be used. StarPU currently only supports homogeneous MPI
-clusters: for each MPI node it will just replicate the architecture referred by
-\ref STARPU_HOSTNAME.
+list of MPI nodes to be used. In homogeneous MPI clusters: for each MPI node it
+will just replicate the architecture referred by
+\ref STARPU_HOSTNAME. To use multiple performance models in different ranks,
+in case of a heterogeneous platform, it is possible to use the
+option <c>-hostfile-platform</c> in <c>starpu_smpirun</c>, that will define
+\ref STARPU_MPI_HOSTNAMES with the hostnames of your hostfile.
 
 \section SimulationDebuggingApplications Debugging Applications
 

+ 32 - 0
doc/doxygen/chapters/501_environment_variables.doxy

@@ -2,6 +2,7 @@
  *
  * Copyright (C) 2009-2020  Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
  * Copyright (C) 2016       Uppsala University
+ * Copyright (C) 2020       Federal University of Rio Grande do Sul (UFRGS)
  *
  * StarPU is free software; you can redistribute it and/or modify
  * it under the terms of the GNU Lesser General Public License as published by
@@ -866,6 +867,20 @@ a homogenenous cluster, it is possible to share the models between
 machines by setting <c>export STARPU_HOSTNAME=some_global_name</c>.
 </dd>
 
+<dt>STARPU_MPI_HOSTNAMES</dt>
+<dd>
+\anchor STARPU_MPI_HOSTNAMES
+\addindex __env__STARPU_MPI_HOSTNAMES
+Similar to \ref STARPU_HOSTNAME but to define multiple nodes on a
+heterogeneous cluster. The variable is a list of hostnames that will be assigned
+to each StarPU-MPI rank considering their position and the value of
+\ref starpu_mpi_world_rank on each rank. When running, for example, on a
+heterogeneous cluster, it is possible to set individual models for each machine
+by setting <c>export STARPU_MPI_HOSTNAMES="name0 name1 name2"</c>. Where rank 0
+will receive name0, rank1 will receive name1, and so on.
+This variable has precedence over \ref STARPU_HOSTNAME.
+</dd>
+
 <dt>STARPU_OPENCL_PROGRAM_DIR</dt>
 <dd>
 \anchor STARPU_OPENCL_PROGRAM_DIR
@@ -986,6 +1001,23 @@ NUMA nodes used by StarPU. Any \ref STARPU_LIMIT_CPU_NUMA_devid_MEM additionally
 specified will take over STARPU_LIMIT_CPU_NUMA_MEM.
 </dd>
 
+<dt>STARPU_LIMIT_BANDWIDTH</dt>
+<dd>
+\anchor STARPU_LIMIT_BANDWIDTH
+\addindex __env__STARPU_LIMIT_BANDWIDTH
+Specify the maximum available PCI bandwidth of the system in MB/s. This can only
+be effective with simgrid simulation. This allows to easily override the
+bandwidths stored in the platform file generated from measurements on the native
+system. This can be used e.g. for convenient
+
+Specify the maximum number of megabytes that should be available to the
+application on each NUMA node. This is the same as specifying that same amount
+with \ref STARPU_LIMIT_CPU_NUMA_devid_MEM for each NUMA node number. The total
+memory available to StarPU will thus be this amount multiplied by the number of
+NUMA nodes used by StarPU. Any \ref STARPU_LIMIT_CPU_NUMA_devid_MEM additionally
+specified will take over STARPU_LIMIT_BANDWIDTH.
+</dd>
+
 <dt>STARPU_MINIMUM_AVAILABLE_MEM</dt>
 <dd>
 \anchor STARPU_MINIMUM_AVAILABLE_MEM
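
Note on STARPU_MPI_HOSTNAMES above: the documented mapping (rank 0 receives the first name, rank 1 the second, and so on) can be illustrated with a small shell sketch. `hostname_for_rank` is a hypothetical helper, not a StarPU tool. The fallback to STARPU_HOSTNAME for ranks beyond the end of the list is an assumption made for the example; the documentation only states that STARPU_MPI_HOSTNAMES takes precedence.

```shell
#!/bin/sh
# Hypothetical helper: print the hostname assigned to a given MPI rank.
hostname_for_rank() {
	rank=$1
	i=0
	# The variable is a whitespace-separated list; position i maps to rank i.
	for name in $STARPU_MPI_HOSTNAMES; do
		if [ "$i" -eq "$rank" ]; then
			echo "$name"
			return 0
		fi
		i=$((i + 1))
	done
	# Assumption: fall back to STARPU_HOSTNAME when the list has no entry.
	echo "$STARPU_HOSTNAME"
}

STARPU_MPI_HOSTNAMES="name0 name1 name2"
STARPU_HOSTNAME=some_global_name
hostname_for_rank 1
```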

+ 34 - 0
doc/doxygen/dev/starpu_check_include.sh

@@ -0,0 +1,34 @@
+#!/bin/bash
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+
+dir=$(dirname $0)
+
+cd $dir/../../../
+for d in $(find . -name include -not -wholename "*/build/*")
+do
+    for f in $(find $d -name "*h")
+    do
+	for i in doxygen-config.cfg.in Makefile.am
+	do
+	    x=`grep $f $dir/../$i`
+	    if test -z "$x"
+	    then
+		echo $f missing in $i
+	    fi
+	done
+    done
+done

+ 4 - 6
examples/Makefile.am

@@ -158,11 +158,8 @@ SHELL_TESTS			+=	mult/sgemm.sh
 endif
 endif
 
-if STARPU_HAVE_WINDOWS
 check_PROGRAMS		=	$(STARPU_EXAMPLES)
-else
-check_PROGRAMS		=	$(LOADER) $(STARPU_EXAMPLES)
-endif
+noinst_PROGRAMS		=
 
 if !STARPU_HAVE_WINDOWS
 ## test loader program
@@ -171,6 +168,7 @@ LOADER			=	loader
 loader_CPPFLAGS 	=	$(AM_CFLAGS) $(AM_CPPFLAGS) -I$(top_builddir)/src/
 LOADER_BIN		=	$(abs_top_builddir)/examples/$(LOADER)
 loader_SOURCES		=	../tests/loader.c
+noinst_PROGRAMS		+=	loader
 else
 LOADER			=
 LOADER_BIN		=	$(top_builddir)/examples/loader-cross.sh
@@ -1118,10 +1116,10 @@ endif
 # - link over source file to build our own object
 fortran90/starpu_mod.f90:
 	@$(MKDIR_P) $(dir $@)
-	$(LN_S) $(abs_top_srcdir)/include/$(notdir $@) $@
+	$(V_ln) $(LN_S) $(abs_top_srcdir)/include/$(notdir $@) $@
 native_fortran/fstarpu_mod.f90:
 	@$(MKDIR_P) $(dir $@)
-	$(LN_S) $(abs_top_srcdir)/include/$(notdir $@) $@
+	$(V_ln) $(LN_S) $(abs_top_srcdir)/include/$(notdir $@) $@
 
 if STARPU_HAVE_FC
 # Fortran90 example

+ 21 - 1
examples/cholesky/cholesky.sh

@@ -22,6 +22,26 @@ ROOT=${0%.sh}
 [ -n "$STARPU_HOSTNAME" ] || export STARPU_HOSTNAME=mirage
 unset MALLOC_PERTURB_
 
+INCR=2
+STOP=32
+
+if [ -n "$STARPU_SIMGRID" ]
+then
+	INCR=4
+	STOP=14
+	# These use the thread factory, and are thus much longer
+	if [ -n "$STARPU_QUICK_CHECK" ]
+	then
+		INCR=8
+		STOP=10
+	fi
+	if [ -n "$STARPU_LONG_CHECK" ]
+	then
+		INCR=4
+		STOP=32
+	fi
+fi
+
 (
 echo -n "#"
 for STARPU_SCHED in $STARPU_SCHEDS ; do
@@ -29,7 +49,7 @@ for STARPU_SCHED in $STARPU_SCHEDS ; do
 done
 echo
 
-for size in `seq 2 2 30` ; do
+for size in `seq 2 $INCR $STOP` ; do
 	echo -n "$((size * 960))"
 	for STARPU_SCHED in $STARPU_SCHEDS
 	do
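
Note on the cholesky.sh hunk above: the effect of the new INCR and STOP variables on the tested sizes can be checked in isolation. The sketch below copies the selection logic into a hypothetical function, `pick_sizes`, which is not part of the script, and prints the values that `seq 2 $INCR $STOP` would produce.

```shell
#!/bin/sh
# Hypothetical helper mirroring the INCR/STOP selection in cholesky.sh.
pick_sizes() {
	incr=2
	stop=32
	if [ -n "$STARPU_SIMGRID" ]; then
		incr=4
		stop=14
		if [ -n "$STARPU_QUICK_CHECK" ]; then
			incr=8
			stop=10
		fi
		if [ -n "$STARPU_LONG_CHECK" ]; then
			incr=4
			stop=32
		fi
	fi
	# Unquoted on purpose: joins the sequence onto one line.
	echo $(seq 2 "$incr" "$stop")
}

pick_sizes
```

Under simgrid without a check flag this yields the sizes 2 6 10 14, and a quick check reduces them to 2 10. Outside simgrid the loop now stops at 32 rather than 30.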

+ 10 - 3
examples/common/blas.h

@@ -88,6 +88,14 @@ void STARPU_DPOTRF(const char*uplo, const int n, double *a, const int lda);
 
 #if defined(STARPU_GOTO) || defined(STARPU_OPENBLAS) || defined(STARPU_SYSTEM_BLAS) || defined(STARPU_MKL) || defined(STARPU_ARMPL)
 
+#ifdef _STARPU_F2C_COMPATIBILITY
+/* for compatibility with F2C, FLOATRET may not be a float but a double in GOTOBLAS */
+/* Don't know how to detect this automatically */
+#define _STARPU_FLOATRET double
+#else
+#define _STARPU_FLOATRET float
+#endif
+
 extern void sgemm_ (const char *transa, const char *transb, const int *m,
                    const int *n, const int *k, const float *alpha, 
                    const float *A, const int *lda, const float *B, 
@@ -118,7 +126,7 @@ extern void dtrsm_ (const char *side, const char *uplo, const char *transa,
                    const char *diag, const int *m, const int *n,
                    const double *alpha, const double *A, const int *lda,
                    double *B, const int *ldb);
-extern double sasum_ (const int *n, const float *x, const int *incx);
+extern _STARPU_FLOATRET sasum_ (const int *n, const float *x, const int *incx);
 extern double dasum_ (const int *n, const double *x, const int *incx);
 extern void sscal_ (const int *n, const float *alpha, float *x,
                    const int *incx);
@@ -150,8 +158,7 @@ extern void daxpy_(const int *n, const double *alpha, const double *X, const int
 		double *Y, const int *incy);
 extern int isamax_(const int *n, const float *X, const int *incX);
 extern int idamax_(const int *n, const double *X, const int *incX);
-/* for some reason, FLOATRET is not a float but a double in GOTOBLAS */
-extern double sdot_(const int *n, const float *x, const int *incx, const float *y, const int *incy);
+extern _STARPU_FLOATRET sdot_(const int *n, const float *x, const int *incx, const float *y, const int *incy);
 extern double ddot_(const int *n, const double *x, const int *incx, const double *y, const int *incy);
 extern void sswap_(const int *n, float *x, const int *incx, float *y, const int *incy);
 extern void dswap_(const int *n, double *x, const int *incx, double *y, const int *incy);

+ 2 - 0
examples/mlr/mlr.c

@@ -110,7 +110,9 @@ static struct starpu_perfmodel cl_model_init =
    template.
  */
 
+/* M^2 * N^1 * K^0 */
 static unsigned combi1 [3]		= {	2,	1,	0 };
+/* M^0 * N^3 * K^1 */
 static unsigned combi2 [3]		= {	0,	3,	1 };
 
 static unsigned *combinations[] = { combi1, combi2 };

+ 1 - 1
examples/mult/sgemm.sh

@@ -32,7 +32,7 @@ if [ -n "$STARPU_MIC_SINK_PROGRAM_PATH" ] ; then
 	[ -x "$STARPU_MIC_SINK_PROGRAM_PATH/.libs/sgemm" ] && STARPU_MIC_SINK_PROGRAM_NAME=$STARPU_MIC_SINK_PROGRAM_PATH/.libs/sgemm
 fi
 
-STARPU_SCHED=dmdas STARPU_FXT_PREFIX=$PREFIX/ $PREFIX/sgemm
+STARPU_SCHED=dmdas STARPU_FXT_PREFIX=$PREFIX/ $PREFIX/sgemm -check
 [ ! -x $PREFIX/../../tools/starpu_perfmodel_display ] || $STARPU_LAUNCH $PREFIX/../../tools/starpu_perfmodel_display -s starpu_sgemm_gemm
 [ ! -x $PREFIX/../../tools/starpu_perfmodel_display ] || $STARPU_LAUNCH $PREFIX/../../tools/starpu_perfmodel_display -x -s starpu_sgemm_gemm
 [ ! -x $PREFIX/../../tools/starpu_perfmodel_recdump ] || $STARPU_LAUNCH $PREFIX/../../tools/starpu_perfmodel_recdump -o perfs.rec

+ 14 - 4
examples/mult/xgemm.c

@@ -66,7 +66,7 @@ static starpu_data_handle_t A_handle, B_handle, C_handle;
 #define FPRINTF(ofile, fmt, ...) do { if (!getenv("STARPU_SSILENT")) {fprintf(ofile, fmt, ## __VA_ARGS__); }} while(0)
 #define PRINTF(fmt, ...) do { if (!getenv("STARPU_SSILENT")) {printf(fmt, ## __VA_ARGS__); fflush(stdout); }} while(0)
 
-static void check_output(void)
+static int check_output(void)
 {
 	/* compute C = C - AB */
 	CPU_GEMM("N", "N", ydim, xdim, zdim, (TYPE)-1.0f, A, ydim, B, zdim, (TYPE)1.0f, C, ydim);
@@ -78,6 +78,7 @@ static void check_output(void)
 	if (err < xdim*ydim*0.001)
 	{
 		FPRINTF(stderr, "Results are OK\n");
+		return 0;
 	}
 	else
 	{
@@ -86,6 +87,7 @@ static void check_output(void)
 
 		FPRINTF(stderr, "There were errors ... err = %f\n", err);
 		FPRINTF(stderr, "Max error : %e\n", C[max]);
+		return 1;
 	}
 }
 
@@ -150,6 +152,11 @@ static void partition_mult_data(void)
 	starpu_data_partition(A_handle, &horiz);
 
 	starpu_data_map_filters(C_handle, 2, &vert, &horiz);
+
+	unsigned x, y;
+	for (x = 0; x < nslicesx; x++)
+	for (y = 0; y < nslicesy; y++)
+		starpu_data_set_coordinates(starpu_data_get_sub_data(C_handle, 2, x, y), 2, x, y);
 }
 
 #ifdef STARPU_USE_CUDA
@@ -236,7 +243,7 @@ static struct starpu_codelet cl =
 #endif
 	.cuda_flags = {STARPU_CUDA_ASYNC},
 	.nbuffers = 3,
-	.modes = {STARPU_R, STARPU_R, STARPU_RW},
+	.modes = {STARPU_R, STARPU_R, STARPU_W},
 	.model = &starpu_gemm_model
 };
 
@@ -334,7 +341,7 @@ static void parse_args(int argc, char **argv)
 		}
 		else
 		{
-			fprintf(stderr,"Unrecognized option %s", argv[i]);
+			fprintf(stderr,"Unrecognized option %s\n", argv[i]);
 			exit(EXIT_FAILURE);
 		}
 	}
@@ -398,6 +405,7 @@ int main(int argc, char **argv)
 				ret = starpu_task_submit(task);
 				if (ret == -ENODEV)
 				{
+				     check = 0;
 				     ret = 77;
 				     goto enodev;
 				}
@@ -448,8 +456,10 @@ enodev:
 	starpu_data_unregister(B_handle);
 	starpu_data_unregister(C_handle);
 
+#ifndef STARPU_SIMGRID
 	if (check)
-		check_output();
+		ret = check_output();
+#endif
 
 	starpu_free_flags(A, zdim*ydim*sizeof(TYPE), STARPU_MALLOC_PINNED|STARPU_MALLOC_SIMULATION_FOLDED);
 	starpu_free_flags(B, xdim*zdim*sizeof(TYPE), STARPU_MALLOC_PINNED|STARPU_MALLOC_SIMULATION_FOLDED);

+ 2 - 4
examples/stencil/Makefile.am

@@ -56,11 +56,8 @@ endif
 # What to install and what to check #
 #####################################
 
-if STARPU_HAVE_WINDOWS
 check_PROGRAMS	=	$(STARPU_EXAMPLES)
-else
-check_PROGRAMS	=	$(LOADER) $(STARPU_EXAMPLES)
-endif
+noinst_PROGRAMS	=
 
 if !STARPU_SIMGRID
 if USE_MPI
@@ -79,6 +76,7 @@ LOADER			=	loader
 loader_CPPFLAGS 	= 	$(AM_CFLAGS) $(AM_CPPFLAGS) -I$(top_builddir)/src/
 LOADER_BIN		=	./$(LOADER)
 loader_SOURCES		=	../../tests/loader.c
+noinst_PROGRAMS		+=	loader
 else
 LOADER			=
 LOADER_BIN		=	$(top_builddir)/examples/stencil/loader-cross.sh

+ 8 - 0
examples/tag_example/tag_example.c

@@ -222,6 +222,14 @@ int main(int argc, char **argv)
 {
 	int ret;
 
+#ifdef STARPU_HAVE_HELGRIND_H
+	if (RUNNING_ON_VALGRIND) {
+		ni /= 2;
+		nj /= 2;
+		nk /= 2;
+	}
+#endif
+
 	ret = starpu_init(NULL);
 	if (ret == -ENODEV)
 		exit(77);

+ 11 - 0
include/starpu.h

@@ -126,6 +126,17 @@ struct starpu_conf
 	void (*sched_policy_init)(unsigned);
 
 	/**
+	   For every parameter of this structure that can also be
+	   set with an environment variable, StarPU by default uses
+	   the value of the environment variable rather than the
+	   value set in starpu_conf. Setting
+	   starpu_conf::precedence_over_environment_variables to 1
+	   gives precedence to the value set in the structure over
+	   the environment variable.
+	 */
+	int precedence_over_environment_variables;
+
+	/**
 	   Number of CPU cores that StarPU can use. This can also be
 	   specified with the environment variable \ref STARPU_NCPU.
 	   (default = -1)
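The precedence rule described in the new comment can be modelled in a few lines. This is a minimal standalone sketch, not StarPU's implementation: `demo_resolve_setting` and its parameters are hypothetical names, and the sketch does not model how StarPU treats fields left at their default value.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not part of StarPU): choose the effective value of an
 * integer setting that may come from a configuration field or from an
 * environment variable. env_value is the raw string the caller obtained,
 * for instance from getenv(), or NULL when the variable is not set. */
static int demo_resolve_setting(int conf_value, const char *env_value,
				int conf_has_precedence)
{
	assert(conf_has_precedence == 0 || conf_has_precedence == 1);

	/* No environment variable: the structure is the only source. */
	if (env_value == NULL)
		return conf_value;

	/* Both sources are present: the flag decides which one wins. */
	if (conf_has_precedence)
		return conf_value;

	return atoi(env_value);
}
```

For example, `demo_resolve_setting(4, getenv("DEMO_NCPU"), 0)` yields the environment value when `DEMO_NCPU` (an assumed variable name) is set, while passing 1 as the last argument keeps the structure value 4.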

+ 246 - 15
include/starpu_bitmap.h

@@ -18,6 +18,12 @@
 #ifndef __STARPU_BITMAP_H__
 #define __STARPU_BITMAP_H__
 
+#include <starpu_util.h>
+#include <starpu_config.h>
+
+#include <string.h>
+#include <stdlib.h>
+
 #ifdef __cplusplus
 extern "C"
 {
@@ -28,43 +34,268 @@ extern "C"
    @brief This is the interface for the bitmap utilities provided by StarPU.
    @{
  */
+#ifndef _STARPU_LONG_BIT
+#define _STARPU_LONG_BIT ((int)(sizeof(unsigned long) * 8))
+#endif
+
+#define _STARPU_BITMAP_SIZE (((STARPU_NMAXWORKERS - 1)/_STARPU_LONG_BIT) + 1)
 
 /** create an empty starpu_bitmap */
-struct starpu_bitmap *starpu_bitmap_create(void) STARPU_ATTRIBUTE_MALLOC;
+static inline struct starpu_bitmap *starpu_bitmap_create(void) STARPU_ATTRIBUTE_MALLOC;
+/** zero a starpu_bitmap */
+static inline void starpu_bitmap_init(struct starpu_bitmap *b);
 /** free \p b */
-void starpu_bitmap_destroy(struct starpu_bitmap *b);
+static inline void starpu_bitmap_destroy(struct starpu_bitmap *b);
 
 /** set bit \p e in \p b */
-void starpu_bitmap_set(struct starpu_bitmap *b, int e);
+static inline void starpu_bitmap_set(struct starpu_bitmap *b, int e);
 /** unset bit \p e in \p b */
-void starpu_bitmap_unset(struct starpu_bitmap *b, int e);
+static inline void starpu_bitmap_unset(struct starpu_bitmap *b, int e);
 /** unset all bits in \p b */
-void starpu_bitmap_unset_all(struct starpu_bitmap *b);
+static inline void starpu_bitmap_unset_all(struct starpu_bitmap *b);
 
 /** return true iff bit \p e is set in \p b */
-int starpu_bitmap_get(struct starpu_bitmap *b, int e);
+static inline int starpu_bitmap_get(struct starpu_bitmap *b, int e);
 /** Basically compute \c starpu_bitmap_unset_all(\p a) ; \p a = \p b & \p c; */
-void starpu_bitmap_unset_and(struct starpu_bitmap *a, struct starpu_bitmap *b, struct starpu_bitmap *c);
+static inline void starpu_bitmap_unset_and(struct starpu_bitmap *a, struct starpu_bitmap *b, struct starpu_bitmap *c);
 /** Basically compute \p a |= \p b */
-void starpu_bitmap_or(struct starpu_bitmap *a, struct starpu_bitmap *b);
+static inline void starpu_bitmap_or(struct starpu_bitmap *a, struct starpu_bitmap *b);
 /** return 1 iff \p e is set in \p b1 AND \p e is set in \p b2 */
-int starpu_bitmap_and_get(struct starpu_bitmap *b1, struct starpu_bitmap *b2, int e);
+static inline int starpu_bitmap_and_get(struct starpu_bitmap *b1, struct starpu_bitmap *b2, int e);
 /** return the number of set bits in \p b */
-int starpu_bitmap_cardinal(struct starpu_bitmap *b);
+static inline int starpu_bitmap_cardinal(struct starpu_bitmap *b);
 
 /** return the index of the first set bit of \p b, -1 if none */
-int starpu_bitmap_first(struct starpu_bitmap *b);
+static inline int starpu_bitmap_first(struct starpu_bitmap *b);
 /** return the position of the last set bit of \p b, -1 if none */
-int starpu_bitmap_last(struct starpu_bitmap *b);
+static inline int starpu_bitmap_last(struct starpu_bitmap *b);
 /** return the position of set bit right after \p e in \p b, -1 if none */
-int starpu_bitmap_next(struct starpu_bitmap *b, int e);
+static inline int starpu_bitmap_next(struct starpu_bitmap *b, int e);
 /** todo */
-int starpu_bitmap_has_next(struct starpu_bitmap *b, int e);
+static inline int starpu_bitmap_has_next(struct starpu_bitmap *b, int e);
 
 /** @} */
 
-#ifdef __cplusplus
+struct starpu_bitmap
+{
+	unsigned long bits[_STARPU_BITMAP_SIZE];
+	int cardinal;
+};
+
+#ifdef _STARPU_DEBUG_BITMAP
+static int _starpu_check_bitmap(struct starpu_bitmap *b)
+{
+	int card = b->cardinal;
+	int i = starpu_bitmap_first(b);
+	int j;
+	for(j = 0; j < card; j++)
+	{
+		if(i == -1)
+			return 0;
+		int tmp = starpu_bitmap_next(b,i);
+		if(tmp == i)
+			return 0;
+		i = tmp;
+	}
+	if(i != -1)
+		return 0;
+	return 1;
 }
+#else
+#define _starpu_check_bitmap(b) 1
 #endif
 
+static int _starpu_count_bit_static(unsigned long e)
+{
+#if (__GNUC__ >= 4) || ((__GNUC__ == 3) && (__GNUC_MINOR__) >= 4)
+	return __builtin_popcountl(e);
+#else
+	int c = 0;
+	while(e)
+	{
+		c += e&1;
+		e >>= 1;
+	}
+	return c;
 #endif
+}
+
+static inline struct starpu_bitmap *starpu_bitmap_create()
+{
+	return (struct starpu_bitmap *) calloc(1, sizeof(struct starpu_bitmap));
+}
+
+static inline void starpu_bitmap_init(struct starpu_bitmap *b)
+{
+	memset(b, 0, sizeof(*b));
+}
+
+static inline void starpu_bitmap_destroy(struct starpu_bitmap * b)
+{
+	free(b);
+}
+
+static inline void starpu_bitmap_set(struct starpu_bitmap * b, int e)
+{
+	if(!starpu_bitmap_get(b, e))
+		b->cardinal++;
+	else
+		return;
+	STARPU_ASSERT(e/_STARPU_LONG_BIT < _STARPU_BITMAP_SIZE);
+	b->bits[e/_STARPU_LONG_BIT] |= (1ul << (e%_STARPU_LONG_BIT));
+	STARPU_ASSERT(_starpu_check_bitmap(b));
+}
+static inline void starpu_bitmap_unset(struct starpu_bitmap *b, int e)
+{
+	if(starpu_bitmap_get(b, e))
+		b->cardinal--;
+	else
+		return;
+	STARPU_ASSERT(e/_STARPU_LONG_BIT < _STARPU_BITMAP_SIZE);
+	if(e / _STARPU_LONG_BIT >= _STARPU_BITMAP_SIZE)
+		return;
+	b->bits[e/_STARPU_LONG_BIT] &= ~(1ul << (e%_STARPU_LONG_BIT));
+	STARPU_ASSERT(_starpu_check_bitmap(b));
+}
+
+static inline void starpu_bitmap_unset_all(struct starpu_bitmap * b)
+{
+	memset(b->bits, 0, _STARPU_BITMAP_SIZE * sizeof(unsigned long));
+}
+
+static inline void starpu_bitmap_unset_and(struct starpu_bitmap * a, struct starpu_bitmap * b, struct starpu_bitmap * c)
+{
+	a->cardinal = 0;
+	int i;
+	for(i = 0; i < _STARPU_BITMAP_SIZE; i++)
+	{
+		a->bits[i] = b->bits[i] & c->bits[i];
+		a->cardinal += _starpu_count_bit_static(a->bits[i]);
+	}
+}
+
+static inline int starpu_bitmap_get(struct starpu_bitmap * b, int e)
+{
+	STARPU_ASSERT(e / _STARPU_LONG_BIT < _STARPU_BITMAP_SIZE);
+	if(e / _STARPU_LONG_BIT >= _STARPU_BITMAP_SIZE)
+		return 0;
+	return (b->bits[e/_STARPU_LONG_BIT] & (1ul << (e%_STARPU_LONG_BIT))) ?
+		1:
+		0;
+}
+
+static inline void starpu_bitmap_or(struct starpu_bitmap * a, struct starpu_bitmap * b)
+{
+	int i;
+	a->cardinal = 0;
+	for(i = 0; i < _STARPU_BITMAP_SIZE; i++)
+	{
+		a->bits[i] |= b->bits[i];
+		a->cardinal += _starpu_count_bit_static(a->bits[i]);
+	}
+}
+
+
+static inline int starpu_bitmap_and_get(struct starpu_bitmap * b1, struct starpu_bitmap * b2, int e)
+{
+	return starpu_bitmap_get(b1,e) && starpu_bitmap_get(b2,e);
+}
+
+static inline int starpu_bitmap_cardinal(struct starpu_bitmap * b)
+{
+	return b->cardinal;
+}
+
+
+static inline int _starpu_get_first_bit_rank(unsigned long ms)
+{
+	STARPU_ASSERT(ms != 0);
+#if (__GNUC__ >= 4) || ((__GNUC__ == 3) && (__GNUC_MINOR__ >= 4))
+	return __builtin_ffsl(ms) - 1;
+#else
+	unsigned long m = 1ul;
+	int i = 0;
+	while(!(m&ms))
+		i++,m<<=1;
+	return i;
+#endif
+}
+
+static inline int _starpu_get_last_bit_rank(unsigned long l)
+{
+	STARPU_ASSERT(l != 0);
+#if (__GNUC__ >= 4) || ((__GNUC__ == 3) && (__GNUC_MINOR__ >= 4))
+	return 8*sizeof(l) - __builtin_clzl(l) - 1;
+#else
+	int ibit = _STARPU_LONG_BIT - 1;
+	while(!((1ul << ibit) & l))
+		ibit--;
+	STARPU_ASSERT(ibit >= 0);
+	return ibit;
+#endif
+}
+
+static inline int starpu_bitmap_first(struct starpu_bitmap * b)
+{
+	int i = 0;
+	while(i < _STARPU_BITMAP_SIZE && !b->bits[i])
+		i++;
+	if( i == _STARPU_BITMAP_SIZE)
+		return -1;
+	int nb_long = i;
+	unsigned long ms = b->bits[i];
+
+	return (nb_long * _STARPU_LONG_BIT) + _starpu_get_first_bit_rank(ms);
+}
+
+static inline int starpu_bitmap_has_next(struct starpu_bitmap * b, int e)
+{
+	int nb_long = (e+1) / _STARPU_LONG_BIT;
+	int nb_bit = (e+1) % _STARPU_LONG_BIT;
+	unsigned long mask = (~0ul) << nb_bit;
+	if(b->bits[nb_long] & mask)
+		return 1;
+	for(nb_long++; nb_long < _STARPU_BITMAP_SIZE; nb_long++)
+		if(b->bits[nb_long])
+			return 1;
+	return 0;
+}
+
+static inline int starpu_bitmap_last(struct starpu_bitmap * b)
+{
+	if(b->cardinal == 0)
+		return -1;
+	int ilong;
+	for(ilong = _STARPU_BITMAP_SIZE - 1; ilong >= 0; ilong--)
+	{
+		if(b->bits[ilong])
+			break;
+	}
+	STARPU_ASSERT(ilong >= 0);
+	unsigned long l = b->bits[ilong];
+	return ilong * _STARPU_LONG_BIT + _starpu_get_last_bit_rank(l);
+}
+
+static inline int starpu_bitmap_next(struct starpu_bitmap *b, int e)
+{
+	int nb_long = e / _STARPU_LONG_BIT;
+	int nb_bit = e % _STARPU_LONG_BIT;
+	unsigned long rest = nb_bit == _STARPU_LONG_BIT - 1 ? 0 : (~0ul << (nb_bit + 1)) & b->bits[nb_long];
+	if(nb_bit != (_STARPU_LONG_BIT - 1) && rest)
+	{
+		int i = _starpu_get_first_bit_rank(rest);
+		STARPU_ASSERT(i >= 0 && i < _STARPU_LONG_BIT);
+		return (nb_long * _STARPU_LONG_BIT) + i;
+	}
+
+	for(nb_long++;nb_long < _STARPU_BITMAP_SIZE; nb_long++)
+		if(b->bits[nb_long])
+			return nb_long * _STARPU_LONG_BIT + _starpu_get_first_bit_rank(b->bits[nb_long]);
+	return -1;
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* __STARPU_BITMAP_H__ */
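The word/bit indexing used throughout this header (`e / LONG_BIT` selects the word, `e % LONG_BIT` selects the bit) can be tried in isolation. The following is a minimal standalone sketch under stated assumptions, not the StarPU code: all `demo_` and `DEMO_` names are hypothetical, the capacity of 128 bits stands in for `STARPU_NMAXWORKERS`, and the search is a plain linear scan rather than the builtin-based version above.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define DEMO_NBITS 128 /* assumed capacity, stands in for STARPU_NMAXWORKERS */
#define DEMO_LONG_BIT ((int)(sizeof(unsigned long) * CHAR_BIT))
#define DEMO_NWORDS (((DEMO_NBITS - 1) / DEMO_LONG_BIT) + 1)

struct demo_bitmap
{
	unsigned long bits[DEMO_NWORDS];
	int cardinal; /* number of set bits, kept in sync by demo_set() */
};

static void demo_init(struct demo_bitmap *b)
{
	memset(b, 0, sizeof(*b));
}

/* Return 1 if bit e is set, 0 otherwise. */
static int demo_get(const struct demo_bitmap *b, int e)
{
	assert(e >= 0 && e < DEMO_NBITS);
	return (int)((b->bits[e / DEMO_LONG_BIT] >> (e % DEMO_LONG_BIT)) & 1ul);
}

/* Set bit e; setting an already-set bit leaves the cardinal unchanged. */
static void demo_set(struct demo_bitmap *b, int e)
{
	if (demo_get(b, e))
		return;
	b->bits[e / DEMO_LONG_BIT] |= 1ul << (e % DEMO_LONG_BIT);
	b->cardinal++;
}

/* Return the lowest set bit with index >= from, or -1 if there is none. */
static int demo_find_from(const struct demo_bitmap *b, int from)
{
	int e;
	for (e = from; e < DEMO_NBITS; e++)
		if (demo_get(b, e))
			return e;
	return -1;
}

static int demo_first(const struct demo_bitmap *b)
{
	return demo_find_from(b, 0);
}

/* Return the set bit right after e, or -1 if there is none. */
static int demo_next(const struct demo_bitmap *b, int e)
{
	return demo_find_from(b, e + 1);
}
```

A typical traversal is `for (e = demo_first(&b); e != -1; e = demo_next(&b, e))`, which visits each set bit once in increasing order.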

+ 3 - 3
include/starpu_sched_component.h

@@ -69,14 +69,14 @@ struct starpu_sched_component
 	/** The tree containing the component*/
 	struct starpu_sched_tree *tree;
 	/** set of underlying workers */
-	struct starpu_bitmap *workers;
+	struct starpu_bitmap workers;
 	/**
 	   subset of starpu_sched_component::workers that is currently available in the context
 	   The push method should take this value into account, it is set with:
 	   component->workers UNION tree->workers UNION
 	   component->child[i]->workers_in_ctx iff exist x such as component->children[i]->parents[x] == component
 	*/
-	struct starpu_bitmap *workers_in_ctx;
+	struct starpu_bitmap workers_in_ctx;
 	/** private data */
 	void *data;
 	char *name;
@@ -188,7 +188,7 @@ struct starpu_sched_tree
 	/**
 	   set of workers available in this context, this value is used to mask workers in modules
 	*/
-	struct starpu_bitmap *workers;
+	struct starpu_bitmap workers;
 	/**
 	   context id of the scheduler
 	*/

+ 3 - 1
include/starpu_task.h

@@ -538,7 +538,9 @@ struct starpu_codelet
 
 	/**
 	   Optional color of the codelet. This can be useful for
-	   debugging purposes.
+	   debugging purposes. A value of 0 behaves as if this field were not specified.
+	   The color is given as a hex triplet (for example, 0xff0000 is red,
+	   0x0000ff is blue, 0xffa500 is orange, ...).
 	*/
 	unsigned color;
 

+ 27 - 0
julia/Makefile.am

@@ -0,0 +1,27 @@
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+include $(top_srcdir)/starpu.mk
+
+SUBDIRS = src examples
+
+EXTRA_DIST = README
+
+recheck:
+	RET=0 ; \
+	for i in $(SUBDIRS) ; do \
+		make -C $$i recheck || RET=1 ; \
+	done ; \
+	exit $$RET

+ 127 - 0
julia/Manifest.toml

@@ -0,0 +1,127 @@
+# This file is machine-generated - editing it directly is not advised
+
+[[Base64]]
+uuid = "2a0f44e3-6c83-55bd-87e4-b1978d98bd5f"
+
+[[CBinding]]
+deps = ["Libdl", "Random", "Test"]
+git-tree-sha1 = "6f457df38ae2ba239d5e43b80493bb907de826b2"
+repo-rev = "655e9862947d17423f2fb91ea1014e1cb73c1be1"
+repo-url = "https://github.com/analytech-solutions/CBinding.jl.git"
+uuid = "d43a6710-96b8-4a2d-833c-c424785e5374"
+version = "0.8.1"
+
+[[CEnum]]
+git-tree-sha1 = "62847acab40e6855a9b5905ccb99c2b5cf6b3ebb"
+uuid = "fa961155-64e5-5f13-b03f-caf6b980ea82"
+version = "0.2.0"
+
+[[Clang]]
+deps = ["CEnum", "DataStructures", "LLVM_jll", "Libdl"]
+git-tree-sha1 = "45013227beea038ecc17e8c07cd7c7b05ed26067"
+repo-rev = "master"
+repo-url = "https://github.com/phuchant/Clang.jl.git"
+uuid = "40e3b903-d033-50b4-a0cc-940c62c95e31"
+version = "0.11.0"
+
+[[DataStructures]]
+deps = ["InteractiveUtils", "OrderedCollections"]
+git-tree-sha1 = "6166ecfaf2b8bbf2b68d791bc1d54501f345d314"
+uuid = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8"
+version = "0.17.15"
+
+[[Dates]]
+deps = ["Printf"]
+uuid = "ade2ca70-3891-5945-98fb-dc099432e06a"
+
+[[Distributed]]
+deps = ["Random", "Serialization", "Sockets"]
+uuid = "8ba89e20-285c-5b6f-9357-94700520ee1b"
+
+[[InteractiveUtils]]
+deps = ["Markdown"]
+uuid = "b77e0a4c-d291-57a0-90e8-8db25a27a240"
+
+[[LLVM_jll]]
+deps = ["Libdl", "Pkg"]
+git-tree-sha1 = "c037c15f36c185c613e5b2589d5833720dab3f76"
+uuid = "86de99a1-58d6-5da7-8064-bd56ce2e322c"
+version = "8.0.1+0"
+
+[[LibGit2]]
+deps = ["Printf"]
+uuid = "76f85450-5226-5b5a-8eaa-529ad045b433"
+
+[[Libdl]]
+uuid = "8f399da3-3557-5675-b5ff-fb832c97cbdb"
+
+[[LinearAlgebra]]
+deps = ["Libdl"]
+uuid = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
+
+[[Logging]]
+uuid = "56ddb016-857b-54e1-b83d-db4d58db5568"
+
+[[Markdown]]
+deps = ["Base64"]
+uuid = "d6f4376e-aef5-505a-96c1-9c027394607a"
+
+[[OrderedCollections]]
+git-tree-sha1 = "12ce190210d278e12644bcadf5b21cbdcf225cd3"
+uuid = "bac558e1-5e72-5ebc-8fee-abe8a469f55d"
+version = "1.2.0"
+
+[[Pkg]]
+deps = ["Dates", "LibGit2", "Libdl", "Logging", "Markdown", "Printf", "REPL", "Random", "SHA", "UUIDs"]
+uuid = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f"
+
+[[Printf]]
+deps = ["Unicode"]
+uuid = "de0858da-6303-5e67-8744-51eddeeeb8d7"
+
+[[REPL]]
+deps = ["InteractiveUtils", "Markdown", "Sockets"]
+uuid = "3fa0cd96-eef1-5676-8a61-b3b8758bbffb"
+
+[[Random]]
+deps = ["Serialization"]
+uuid = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
+
+[[RecipesBase]]
+git-tree-sha1 = "54f8ceb165a0f6d083f0d12cb4996f5367c6edbc"
+uuid = "3cdcf5f2-1ef4-517c-9805-6587b60abb01"
+version = "1.0.1"
+
+[[SHA]]
+uuid = "ea8e919c-243c-51af-8825-aaa63cd721ce"
+
+[[Serialization]]
+uuid = "9e88b42a-f829-5b0c-bbe9-9e923198166b"
+
+[[Sockets]]
+uuid = "6462fe0b-24de-5631-8697-dd941f90decc"
+
+[[SparseArrays]]
+deps = ["LinearAlgebra", "Random"]
+uuid = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"
+
+[[Statistics]]
+deps = ["LinearAlgebra", "SparseArrays"]
+uuid = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
+
+[[Test]]
+deps = ["Distributed", "InteractiveUtils", "Logging", "Random"]
+uuid = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
+
+[[ThreadPools]]
+deps = ["Printf", "RecipesBase", "Statistics"]
+git-tree-sha1 = "48e35097fdc6d1706a9b90c5eee62f54402aa62c"
+uuid = "b189fb0b-2eb5-4ed4-bc0c-d34c51242431"
+version = "1.1.0"
+
+[[UUIDs]]
+deps = ["Random", "SHA"]
+uuid = "cf7118a7-6976-5b1a-9a39-7adc72f591a4"
+
+[[Unicode]]
+uuid = "4ec0a83e-493e-50e2-b9ac-8f72acf5a8f5"

+ 3 - 0
julia/StarPU.jl/Project.toml

@@ -4,4 +4,7 @@ authors = ["barthou "]
 version = "0.1.0"
 
 [deps]
+CBinding = "d43a6710-96b8-4a2d-833c-c424785e5374"
+Clang = "40e3b903-d033-50b4-a0cc-940c62c95e31"
 Libdl = "8f399da3-3557-5675-b5ff-fb832c97cbdb"
+ThreadPools = "b189fb0b-2eb5-4ed4-bc0c-d34c51242431"

+ 53 - 0
julia/README

@@ -0,0 +1,53 @@
+Contents
+========
+
+* Installing Julia
+* Installing StarPU module for Julia
+* Running Examples
+
+Installing Julia
+----------------
+Julia version 1.3+ is required and can be downloaded from
+https://julialang.org/downloads/.
+
+
+Installing StarPU module for Julia
+----------------------------------
+First, build the jlstarpu_c_wrapper library:
+
+$ make
+
+Then, you need to add the lib/ directory to your library path and the julia/
+directory to your Julia load path:
+
+$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$PWD/lib
+$ export JULIA_LOAD_PATH=$JULIA_LOAD_PATH:$PWD
+
+This step can also be done by sourcing the setenv.sh script:
+
+$ . setenv.sh
+
+Running Examples
+----------------
+
+You can find several examples in the examples/ directory.
+
+For each example X, three versions are provided:
+
+- X.c: Original C+StarPU code
+- X_native.jl: Native Julia version (without StarPU)
+- X.jl: Julia version using StarPU
+
+
+To run the original C+StarPU code:
+$ make cstarpu.dat
+
+To run the native Julia version:
+$ make julia_native.dat
+
+To run the Julia version using StarPU:
+$ make julia_generatedc.dat
+
+
+
+

+ 0 - 8
julia/StarPU.jl/Makefile

@@ -1,8 +0,0 @@
-SRCS=src/jlstarpu_task_submit.c src/jlstarpu_simple_functions.c src/jlstarpu_data_handles.c
-CC = gcc
-CFLAGS += $(shell pkg-config --cflags starpu-1.3)
-LDFLAGS += $(shell pkg-config --libs starpu-1.3)
-
-lib/libjlstarpu_c_wrapper.so: ${SRCS}
-	test -d lib || mkdir lib
-	$(CC) -O3 -shared -fPIC $(CFLAGS) $^ -o $@ $(LDFLAGS)

+ 0 - 4
julia/StarPU.jl/Manifest.toml

@@ -1,4 +0,0 @@
-# This file is machine-generated - editing it directly is not advised
-
-[[Libdl]]
-uuid = "8f399da3-3557-5675-b5ff-fb832c97cbdb"

+ 0 - 2
julia/StarPU.jl/REQUIRE

@@ -1,2 +0,0 @@
-julia 1.0
-Libdl

File diff suppressed because it is too large
+ 0 - 1290
julia/StarPU.jl/src/StarPU.jl


+ 0 - 349
julia/StarPU.jl/src/compiler/cuda.jl

@@ -1,349 +0,0 @@
-
-
-function is_indep_for_expr(x :: StarpuExpr)
-    return isa(x, StarpuExprFor) && x.is_independant
-end
-
-
-function extract_init_indep_finish(expr :: StarpuExpr) # TODO : it is not a correct extraction (example : if (cond) {@indep for ...} else {return} would not work)
-                                                            # better use apply() (NOTE :assert_no_indep_for already exists) to find recursively every for loops
-    init = StarpuExpr[]
-    finish = StarpuExpr[]
-
-    if is_indep_for_expr(expr)
-        return init, StarpuIndepFor(expr), finish
-    end
-
-    if !isa(expr, StarpuExprBlock)
-        return [expr], nothing, finish
-    end
-
-    for i in (1 : length(expr.exprs))
-
-        if !is_indep_for_expr(expr.exprs[i])
-            continue
-        end
-
-        init = expr.exprs[1 : i-1]
-        indep = StarpuIndepFor(expr.exprs[i])
-        finish = expr.exprs[i+1 : end]
-
-        if any(is_indep_for_expr, finish)
-            error("Sequence of several independant loops is not allowed") #same it may be tricked by a Block(Indep_for(...))
-        end
-
-        return init, indep, finish
-    end
-
-    return expr.exprs, nothing, finish
-end
-
-
-
-
-function analyse_variable_declarations(expr :: StarpuExpr, already_defined :: Vector{StarpuExprTypedVar} = StarpuExprTypedVar[])
-
-    undefined_variables = Symbol[]
-    defined_variable_names = map((x -> x.name), already_defined)
-    defined_variable_types = map((x -> x.typ), already_defined)
-
-    function func_to_apply(x :: StarpuExpr)
-
-        if isa(x, StarpuExprFunction)
-            error("No function declaration allowed in this section")
-        end
-
-        if isa(x, StarpuExprVar) || isa(x, StarpuExprTypedVar)
-
-            if !(x.name in defined_variable_names) && !(x.name in undefined_variables)
-                push!(undefined_variables, x.name)
-            end
-
-            return x
-        end
-
-        if isa(x, StarpuExprAffect) || isa(x, StarpuExprFor)
-
-            if isa(x, StarpuExprAffect)
-
-                var = x.var
-
-                if !isa(var, StarpuExprTypedVar)
-                    return x
-                end
-
-                name = var.name
-                typ = var.typ
-
-            else
-                name = x.iter
-                typ = Int64
-            end
-
-            if name in defined_variable_names
-                error("Multiple definition of variable $name")
-            end
-
-            filter!((sym -> sym != name), undefined_variables)
-            push!(defined_variable_names, name)
-            push!(defined_variable_types, typ)
-
-            return x
-        end
-
-        return x
-    end
-
-    apply(func_to_apply, expr)
-    defined_variable = map(StarpuExprTypedVar, defined_variable_names, defined_variable_types)
-
-    return defined_variable, undefined_variables
-end
-
-
-
-function find_variable(name :: Symbol, vars :: Vector{StarpuExprTypedVar})
-
-    for x in vars
-        if x.name == name
-            return x
-        end
-    end
-
-    return nothing
-end
-
-
-
-function add_device_to_interval_call(expr :: StarpuExpr)
-
-    function func_to_apply(x :: StarpuExpr)
-
-        if isa(x, StarpuExprCall) && x.func == :jlstarpu_interval_size
-            return StarpuExprCall(:jlstarpu_interval_size__device, x.args)
-        end
-
-        return x
-    end
-
-    return apply(func_to_apply, expr)
-end
-
-
-
-function transform_to_cuda_kernel(func :: StarpuExprFunction)
-
-    cpu_func = transform_to_cpu_kernel(func)
-
-    init, indep, finish = extract_init_indep_finish(cpu_func.body)
-
-    if indep == nothing
-        error("No independant for loop has been found") # TODO can fail because extraction is not correct yet
-    end
-
-    prekernel_instr, kernel_args, kernel_instr = analyse_sets(indep)
-
-    kernel_call = StarpuExprCudaCall(:cudaKernel, (@parse nblocks), (@parse THREADS_PER_BLOCK), StarpuExpr[])
-    prekernel_instr = vcat(init, prekernel_instr)
-    kernel_instr = vcat(kernel_instr, indep.body)
-
-    indep_for_def, indep_for_undef = analyse_variable_declarations(StarpuExprBlock(kernel_instr), kernel_args)
-    prekernel_def, prekernel_undef = analyse_variable_declarations(StarpuExprBlock(prekernel_instr), cpu_func.args)
-
-    for undef_var in indep_for_undef
-
-        found_var = find_variable(undef_var, prekernel_def)
-
-        if found_var == nothing # TODO : error then ?
-            continue
-        end
-
-        push!(kernel_args, found_var)
-    end
-
-    call_args = map((x -> StarpuExprVar(x.name)), kernel_args)
-    kernelname=Symbol("KERNEL_",func.func);
-    cuda_call = StarpuExprCudaCall(kernelname, (@parse nblocks), (@parse THREADS_PER_BLOCK), call_args)
-    push!(prekernel_instr, cuda_call)
-    push!(prekernel_instr, @parse cudaStreamSynchronize(starpu_cuda_get_local_stream()))
-    prekernel_instr = vcat(prekernel_instr, finish)
-
-    prekernel_name = Symbol("CUDA_", func.func)
-    prekernel = StarpuExprFunction(Nothing, prekernel_name, cpu_func.args, StarpuExprBlock(prekernel_instr))
-    prekernel = flatten_blocks(prekernel)
-
-    kernel = StarpuExprFunction(Nothing, kernelname, kernel_args, StarpuExprBlock(kernel_instr))
-    kernel = add_device_to_interval_call(kernel)
-    kernel = flatten_blocks(kernel)
-    
-    return prekernel, kernel
-end
-
-
-struct StarpuIndepFor
-
-    iters :: Vector{Symbol}
-    sets :: Vector{StarpuExprInterval}
-
-    body :: StarpuExpr
-end
-
-
-function assert_no_indep_for(expr :: StarpuExpr)
-
-    function func_to_run(x :: StarpuExpr)
-        if (isa(x, StarpuExprFor) && x.is_independant)
-            error("Invalid usage of intricated @indep for loops")
-        end
-
-        return x
-    end
-
-    return apply(func_to_run, expr)
-end
-
-
-function StarpuIndepFor(expr :: StarpuExprFor)
-
-    if !expr.is_independant
-        error("For expression must be prefixed by @indep")
-    end
-
-    iters = []
-    sets = []
-    for_loop = expr
-
-    while isa(for_loop, StarpuExprFor) && for_loop.is_independant
-
-        push!(iters, for_loop.iter)
-        push!(sets, for_loop.set)
-        for_loop = for_loop.body
-
-        while (isa(for_loop, StarpuExprBlock) && length(for_loop.exprs) == 1)
-            for_loop = for_loop.exprs[1]
-        end
-    end
-
-    return StarpuIndepFor(iters, sets, assert_no_indep_for(for_loop))
-end
-
-
-function translate_index_code(dims :: Vector{StarpuExprVar})
-
-    ndims = length(dims)
-
-    if ndims == 0
-        error("No dimension specified")
-    end
-
-    prod = StarpuExprValue(1)
-    output = StarpuExpr[]
-    reversed_dim = reverse(dims)
-    thread_index_patern = @parse € :: Int64 = (€ / €) % €
-    thread_id = @parse THREAD_ID
-
-    for i in (1 : ndims)
-        index_lvalue = StarpuExprVar(Symbol(:kernel_ids__index_, ndims - i + 1))
-        expr = replace_pattern(thread_index_patern, index_lvalue, thread_id, prod, reversed_dim[i])
-        push!(output, expr)
-
-        prod = StarpuExprCall(:(*), [prod, reversed_dim[i]])
-    end
-
-    thread_id_pattern = @parse begin
-
-        € :: Int64 = blockIdx.x * blockDim.x + threadIdx.x
-
-        if (€ >= €)
-            return
-        end
-    end
-
-    bound_verif = replace_pattern(thread_id_pattern, thread_id, thread_id, prod)
-    push!(output, bound_verif)
-
-    return reverse(output)
-end
-
-
-
-
-
-
-
-function kernel_index_declarations(ind_for :: StarpuIndepFor)
-
-    pre_kernel_instr = StarpuExpr[]
-    kernel_args = StarpuExprTypedVar[]
-    kernel_instr = StarpuExpr[]
-
-    decl_pattern = @parse € :: Int64 = €
-    interv_size_decl_pattern = @parse € :: Int64 = jlstarpu_interval_size(€, €, €)
-    iter_pattern = @parse € :: Int64 = € + € * €
-
-    dims = StarpuExprVar[]
-    ker_instr_to_add_later_on = StarpuExpr[]
-
-    for k in (1 : length(ind_for.sets))
-
-        set = ind_for.sets[k]
-
-        start_var = starpu_parse(Symbol(:kernel_ids__start_, k))
-        start_decl = replace_pattern(decl_pattern, start_var, set.start)
-
-        step_var = starpu_parse(Symbol(:kernel_ids__step_, k))
-        step_decl = replace_pattern(decl_pattern, step_var, set.step)
-
-        dim_var = starpu_parse(Symbol(:kernel_ids__dim_, k))
-        dim_decl = replace_pattern(interv_size_decl_pattern, dim_var, start_var, step_var, set.stop)
-
-        push!(dims, dim_var)
-
-        push!(pre_kernel_instr, start_decl, step_decl, dim_decl)
-        push!(kernel_args, StarpuExprTypedVar(start_var.name, Int64))
-        push!(kernel_args, StarpuExprTypedVar(step_var.name, Int64))
-        push!(kernel_args, StarpuExprTypedVar(dim_var.name, Int64))
-
-        iter_var = starpu_parse(ind_for.iters[k])
-        index_var = starpu_parse(Symbol(:kernel_ids__index_, k))
-        iter_decl = replace_pattern(iter_pattern, iter_var, start_var, index_var, step_var)
-
-        push!(ker_instr_to_add_later_on, iter_decl)
-    end
-
-
-    return dims, ker_instr_to_add_later_on, pre_kernel_instr , kernel_args, kernel_instr
-end
-
-
-
-function analyse_sets(ind_for :: StarpuIndepFor)
-
-
-    decl_pattern = @parse € :: Int64 = €
-    nblocks_decl_pattern = @parse € :: Int64 = (€ + THREADS_PER_BLOCK - 1)/THREADS_PER_BLOCK
-
-    dims, ker_instr_to_add, pre_kernel_instr, kernel_args, kernel_instr  = kernel_index_declarations(ind_for)
-
-    dim_prod = @parse 1
-
-    for d in dims
-        dim_prod = StarpuExprCall(:(*), [dim_prod, d])
-    end
-
-    nthreads_var = @parse nthreads
-    nthreads_decl = replace_pattern(decl_pattern, nthreads_var, dim_prod)
-    push!(pre_kernel_instr, nthreads_decl)
-
-    nblocks_var = @parse nblocks
-    nblocks_decl = replace_pattern(nblocks_decl_pattern, nblocks_var, nthreads_var)
-    push!(pre_kernel_instr, nblocks_decl)
-
-
-    index_decomposition = translate_index_code(dims)
-
-    push!(kernel_instr, index_decomposition...)
-    push!(kernel_instr, ker_instr_to_add...)
-
-    return pre_kernel_instr, kernel_args, kernel_instr
-end

+ 0 - 13
julia/StarPU.jl/src/compiler/include.jl

@@ -1,13 +0,0 @@
-export starpu_new_cpu_kernel_file
-export starpu_new_cuda_kernel_file
-export @codelet
-export @target
-
-include("utils.jl")
-include("expressions.jl")
-include("parsing.jl")
-include("expression_manipulation.jl")
-include("c.jl")
-include("cuda.jl")
-include("file_generation.jl")
-

+ 0 - 38
julia/StarPU.jl/src/compiler/utils.jl

@@ -1,38 +0,0 @@
-import Base.print
-
-function print_newline(io :: IO, indent = 0, n_lines = 1)
-    for i in (1 : n_lines)
-        print(io, "\n")
-    end
-
-    for i in (1 : indent)
-        print(io, " ")
-    end
-end
-
-starpu_indent_size = 4
-
-function rand_char()
-    r = rand(UInt) % 62
-
-    if (0 <= r < 10)
-        return '0' + r
-    elseif (10 <= r < 36)
-        return 'a' + (r - 10)
-    else
-        return 'A' + (r - 36)
-    end
-end
-
-function rand_string(size = 8)
-    output = ""
-
-    for i in (1 : size)
-        output *= string(rand_char())
-    end
-    return output
-end
-
-function system(cmd :: String)
-    ccall((:system, "libc"), Cint, (Cstring,), cmd)
-end

+ 0 - 133
julia/StarPU.jl/src/jlstarpu_data_handles.c

@@ -1,133 +0,0 @@
-/* StarPU --- Runtime system for heterogeneous multicore architectures.
- *
- * Copyright (C) 2018                                     Alexis Juven
- *
- * StarPU is free software; you can redistribute it and/or modify
- * it under the terms of the GNU Lesser General Public License as published by
- * the Free Software Foundation; either version 2.1 of the License, or (at
- * your option) any later version.
- *
- * StarPU is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
- *
- * See the GNU Lesser General Public License in COPYING.LGPL for more details.
- */
-
-#include "jlstarpu.h"
-
-enum jlstarpu_data_filter_func
-{
-	JLSTARPU_MATRIX_FILTER_VERTICAL_BLOCK = 0,
-	JLSTARPU_MATRIX_FILTER_BLOCK,
-	JLSTARPU_VECTOR_FILTER_BLOCK,
-};
-
-struct jlstarpu_data_filter
-{
-	enum jlstarpu_data_filter_func func;
-	unsigned int nchildren;
-
-};
-
-
-void * jlstarpu_translate_data_filter_func(enum jlstarpu_data_filter_func func)
-{
-	switch (func){
-	case JLSTARPU_MATRIX_FILTER_VERTICAL_BLOCK:
-		return starpu_matrix_filter_vertical_block;
-	case JLSTARPU_MATRIX_FILTER_BLOCK:
-		return starpu_matrix_filter_block;
-	case JLSTARPU_VECTOR_FILTER_BLOCK:
-		return starpu_vector_filter_block;
-	default:
-		return NULL;
-	}
-
-}
-
-void jlstarpu_translate_data_filter(const struct jlstarpu_data_filter * const input,struct starpu_data_filter * output)
-{
-	memset(output, 0, sizeof(struct starpu_data_filter));
-	output->filter_func = jlstarpu_translate_data_filter_func(input->func);
-	output->nchildren = input->nchildren;
-}
-
-void jlstarpu_data_partition(starpu_data_handle_t handle,const struct jlstarpu_data_filter * const jl_filter)
-{
-	struct starpu_data_filter filter;
-	jlstarpu_translate_data_filter(jl_filter, &filter);
-	starpu_data_partition(handle, &filter);
-}
-
-
-void jlstarpu_data_map_filters_1_arg(starpu_data_handle_t handle,
-	const struct jlstarpu_data_filter * const jl_filter
-	)
-{
-	struct starpu_data_filter filter;
-	jlstarpu_translate_data_filter(jl_filter, &filter);
-
-	starpu_data_map_filters(handle, 1, &filter);
-
-}
-
-
-void jlstarpu_data_map_filters_2_arg
-(
-	starpu_data_handle_t handle,
-	const struct jlstarpu_data_filter * const jl_filter_1,
-	const struct jlstarpu_data_filter * const jl_filter_2
-	)
-{
-	struct starpu_data_filter filter_1;
-	jlstarpu_translate_data_filter(jl_filter_1, &filter_1);
-
-	struct starpu_data_filter filter_2;
-	jlstarpu_translate_data_filter(jl_filter_2, &filter_2);
-
-
-	starpu_data_map_filters(handle, 2, &filter_1, &filter_2);
-
-}
-
-
-
-
-#define JLSTARPU_GET(interface, field, ret_type)			\
-									\
-	ret_type jlstarpu_##interface##_get_##field(const struct starpu_##interface##_interface * const x) \
-	{								\
-		return (ret_type) x->field;				\
-	}								\
-
-
-
-
-
-JLSTARPU_GET(vector, ptr, void *)
-JLSTARPU_GET(vector, nx, uint32_t)
-JLSTARPU_GET(vector, elemsize, size_t)
-
-
-
-JLSTARPU_GET(matrix, ptr, void *)
-JLSTARPU_GET(matrix, ld, uint32_t)
-JLSTARPU_GET(matrix, nx, uint32_t)
-JLSTARPU_GET(matrix, ny, uint32_t)
-JLSTARPU_GET(matrix, elemsize, size_t)
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-

+ 0 - 73
julia/StarPU.jl/src/jlstarpu_task.h

@@ -1,73 +0,0 @@
-/* StarPU --- Runtime system for heterogeneous multicore architectures.
- *
- * Copyright (C) 2018                                     Alexis Juven
- *
- * StarPU is free software; you can redistribute it and/or modify
- * it under the terms of the GNU Lesser General Public License as published by
- * the Free Software Foundation; either version 2.1 of the License, or (at
- * your option) any later version.
- *
- * StarPU is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
- *
- * See the GNU Lesser General Public License in COPYING.LGPL for more details.
- */
-/*
- * jlstarpu_task.h
- *
- *  Created on: 27 June 2018
- *      Author: ajuven
- */
-
-#ifndef JLSTARPU_TASK_H_
-#define JLSTARPU_TASK_H_
-
-
-#include "jlstarpu.h"
-
-struct jlstarpu_codelet
-{
-	uint32_t where;
-
-	starpu_cpu_func_t cpu_func;
-	char * cpu_func_name;
-
-	starpu_cuda_func_t cuda_func;
-	starpu_opencl_func_t opencl_func;
-
-	int nbuffer;
-	enum starpu_data_access_mode * modes;
-
-	struct starpu_perfmodel * model;
-
-};
-
-
-
-struct jlstarpu_task
-{
-	struct starpu_codelet * cl;
-	starpu_data_handle_t * handles;
-	unsigned int synchronous;
-
-	void * cl_arg;
-	size_t cl_arg_size;
-};
-
-
-#if 0
-
-struct cl_args_decorator
-{
-	struct jlstarpu_function_launcher * launcher;
-	void * cl_args;
-};
-
-#endif
-
-
-
-
-
-#endif /* JLSTARPU_TASK_H_ */

+ 0 - 208
julia/StarPU.jl/src/jlstarpu_task_submit.c

@@ -1,208 +0,0 @@
-/* StarPU --- Runtime system for heterogeneous multicore architectures.
- *
- * Copyright (C) 2018                                     Alexis Juven
- *
- * StarPU is free software; you can redistribute it and/or modify
- * it under the terms of the GNU Lesser General Public License as published by
- * the Free Software Foundation; either version 2.1 of the License, or (at
- * your option) any later version.
- *
- * StarPU is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
- *
- * See the GNU Lesser General Public License in COPYING.LGPL for more details.
- */
-/*
- * jlstarpu_task_submit.c
- *
- *  Created on: 27 June 2018
- *      Author: ajuven
- */
-
-
-#include "jlstarpu.h"
-
-
-struct starpu_codelet * jlstarpu_new_codelet()
-{
-	struct starpu_codelet * output;
-	TYPE_MALLOC(output, 1);
-
-	starpu_codelet_init(output);
-
-	return output;
-}
-
-
-#if 0
-struct starpu_codelet * jlstarpu_translate_codelet(struct jlstarpu_codelet * const input)
-{
-	struct starpu_codelet * output;
-	TYPE_MALLOC(output, 1);
-
-	starpu_codelet_init(output);
-
-	output->where = input->where;
-	output->cpu_funcs[0] = input->cpu_func;
-	output->cpu_funcs_name[0] = input->cpu_func_name;
-
-	output->cuda_funcs[0] = input->cuda_func;
-	output->opencl_funcs[0] = input->opencl_func;
-
-	output->nbuffers = input->nbuffer;
-	memcpy(&(output->modes), input->modes, input->nbuffer * sizeof(enum starpu_data_access_mode));
-
-	output->model = input->model;
-
-	return output;
-}
-#endif
-
-void jlstarpu_codelet_update(const struct jlstarpu_codelet * const input, struct starpu_codelet * const output)
-{
-	output->where = input->where;
-
-	output->cpu_funcs[0] = input->cpu_func;
-	output->cpu_funcs_name[0] = input->cpu_func_name;
-
-	output->cuda_funcs[0] = input->cuda_func;
-	output->opencl_funcs[0] = input->opencl_func;
-
-	output->nbuffers = input->nbuffer;
-	memcpy(&(output->modes), input->modes, input->nbuffer * sizeof(enum starpu_data_access_mode));
-
-	output->model = input->model;
-
-}
-#if 0
-void jlstarpu_free_codelet(struct starpu_codelet * cl)
-{
-	free(cl);
-}
-#endif
-
-void jlstarpu_hello() {
-	fprintf(stderr,"coucou !");
-}
-
-#if 0
-struct starpu_task * jlstarpu_translate_task(const struct jlstarpu_task * const input)
-{
-	struct starpu_task * output = starpu_task_create();
-
-	if (output == NULL){
-		return NULL;
-	}
-
-	output->cl = input->cl;
-	memcpy(&(output->handles), input->handles, input->cl->nbuffers * sizeof(starpu_data_handle_t));
-	output->synchronous = input->synchronous;
-
-
-	return output;
-}
-#endif
-
-char *starpu_find_function(char *name, char *device) {
-	return NULL;
-}
-
-void jlstarpu_task_update(const struct jlstarpu_task * const input, struct starpu_task * const output)
-{
-	output->cl = input->cl;
-	memcpy(&(output->handles), input->handles, input->cl->nbuffers * sizeof(starpu_data_handle_t));
-	output->synchronous = input->synchronous;
-	output->cl_arg = input->cl_arg;
-	output->cl_arg_size = input->cl_arg_size;
-}
-
-/*
-
-void print_perfmodel(struct starpu_perfmodel * p)
-{
-	printf("Perfmodel at address %p:\n");
-	printf("\ttype : %u\n", p->type);
-	printf("\tcost_function : %p\n", p->cost_function);
-	printf("\tarch_cost_function : %p\n", p->arch_cost_function);
-	printf("\tsize_base : %p\n", p->size_base);
-	printf("\tfootprint : %p\n", p->footprint);
-	printf("\tsymbol : %s\n", p->symbol);
-	printf("\tis_loaded : %u\n", p->is_loaded);
-	printf("\tbenchmarking : %u\n", p->benchmarking);
-	printf("\tis_init : %u\n", p->is_init);
-	printf("\tparameters : %p\n", p->parameters);
-	printf("\tparameters_names : %p\n", p->parameters_names);
-	printf("\tnparameters : %u\n", p->nparameters);
-	printf("\tcombinations : %p\n", p->combinations);
-	printf("\tncombinations : %u\n", p->ncombinations);
-	printf("\tstate : %p\n", p->state);
-
-}
-
-
-*/
-
-#if 0
-/*
- * TODO : free memory
- */
-int jlstarpu_task_submit(const struct jlstarpu_task * const jl_task)
-{
-	DEBUG_PRINT("Inside C wrapper");
-
-	struct starpu_task * task;
-	int ret_code;
-
-
-	DEBUG_PRINT("Translating task...");
-	task = jlstarpu_translate_task(jl_task);
-
-	if (task == NULL){
-		fprintf(stderr, "Error while creating the task.\n");
-		return EXIT_FAILURE;
-	}
-
-	DEBUG_PRINT("Task translated");
-	DEBUG_PRINT("Submitting task to StarPU...");
-	ret_code = starpu_task_submit(task);
-	DEBUG_PRINT("starpu_task_submit has returned");
-
-
-	if (ret_code != 0){
-		fprintf(stderr, "Error while submitting task.\n");
-		return ret_code;
-	}
-
-
-	DEBUG_PRINT("Done");
-	DEBUG_PRINT("END OF STARPU FUNCTION");
-
-
-	return ret_code;
-}
-
-#endif
-
-
-
-
-
-
-
-#define JLSTARPU_UPDATE_FUNC(type, field)\
-	\
-	void jlstarpu_##type##_update_##field(const struct jlstarpu_##type * const input, struct starpu_##type * const output)\
-	{\
-		output->field = input->field;\
-	}
-
-
-
-
-
-
-
-
-
-

+ 0 - 67
julia/StarPU.jl/src/jlstarpu_utils.h

@@ -1,67 +0,0 @@
-/* StarPU --- Runtime system for heterogeneous multicore architectures.
- *
- * Copyright (C) 2018                                     Alexis Juven
- *
- * StarPU is free software; you can redistribute it and/or modify
- * it under the terms of the GNU Lesser General Public License as published by
- * the Free Software Foundation; either version 2.1 of the License, or (at
- * your option) any later version.
- *
- * StarPU is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
- *
- * See the GNU Lesser General Public License in COPYING.LGPL for more details.
- */
-/*
- * jlstarpu_utils.h
- *
- *  Created on: 27 June 2018
- *      Author: ajuven
- */
-
-#ifndef JLSTARPU_UTILS_H_
-#define JLSTARPU_UTILS_H_
-
-#include "jlstarpu.h"
-
-
-#define TYPE_MALLOC(ptr, nb_elements) \
-		do {\
-			if ((nb_elements) == 0){ \
-				ptr = NULL; \
-			} else { \
-				ptr = malloc((nb_elements) * sizeof(*(ptr))); \
-				if (ptr == NULL){ \
-					fprintf(stderr, "\033[31mCRITICAL : MALLOC HAS RETURNED NULL\n\033[0m");\
-					fflush(stderr);\
-					exit(1);\
-				} \
-			} \
-		} while(0)
-
-
-
-//#define DEBUG
-#ifdef DEBUG
-
-#define DEBUG_PRINT(...)\
-		do {\
-			fprintf(stderr, "\x1B[34m%s : \x1B[0m", __FUNCTION__);\
-			fprintf(stderr, __VA_ARGS__);\
-			fprintf(stderr, "\n");\
-			fflush(stderr);\
-		} while (0)
-
-
-
-
-#else
-
-#define DEBUG_PRINT(...)
-
-#endif
-
-
-
-#endif /* JLSTARPU_UTILS_H_ */

+ 139 - 0
julia/examples/Makefile.am

@@ -0,0 +1,139 @@
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+include $(top_srcdir)/starpu.mk
+
+noinst_PROGRAMS		=
+
+if STARPU_HAVE_WINDOWS
+LOADER_BIN		=
+else
+loader_CPPFLAGS 	= 	$(AM_CFLAGS) $(AM_CPPFLAGS) -I$(top_builddir)/src/
+if !STARPU_SIMGRID
+LOADER			=	loader
+LOADER_BIN		=	$(abs_top_builddir)/julia/examples/$(LOADER)
+noinst_PROGRAMS		+=	loader
+endif
+loader_SOURCES		=	../../tests/loader.c
+endif
+
+if STARPU_HAVE_AM111
+TESTS_ENVIRONMENT	=	top_builddir="$(abs_top_builddir)" top_srcdir="$(abs_top_srcdir)"
+else
+TESTS_ENVIRONMENT 	=	top_builddir="$(abs_top_builddir)" top_srcdir="$(abs_top_srcdir)" $(LOADER_BIN)
+endif
+
+BUILT_SOURCES =
+
+CLEANFILES = *.gcno *.gcda *.linkinfo starpu_idle_microsec.log
+
+EXTRA_DIST =					\
+	axpy/axpy.jl				\
+	axpy/axpy.sh				\
+	black_scholes/black_scholes.jl		\
+	callback/callback.jl			\
+	callback/callback.sh			\
+	check_deps/check_deps.jl		\
+	check_deps/check_deps.sh		\
+	dependency/end_dep.jl			\
+	dependency/end_dep.sh			\
+	dependency/tag_dep.jl			\
+	dependency/tag_dep.sh			\
+	dependency/task_dep.sh			\
+	dependency/task_dep.jl			\
+	gemm/gemm.jl				\
+	gemm/gemm_native.jl			\
+	gemm/gemm.sh				\
+	mandelbrot/mandelbrot_native.jl		\
+	mandelbrot/mandelbrot.jl		\
+	mandelbrot/mandelbrot.sh		\
+	mult/mult_native.jl			\
+	mult/mult.jl				\
+	mult/perf.sh				\
+	mult/mult_starpu.sh			\
+	task_insert_color/task_insert_color.jl	\
+	task_insert_color/task_insert_color.sh	\
+	variable/variable.jl			\
+	variable/variable_native.jl		\
+	variable/variable.sh			\
+	vector_scal/vector_scal.jl		\
+	vector_scal/vector_scal.sh
+
+examplebindir = $(libdir)/starpu/julia
+
+examplebin_PROGRAMS =
+
+if STARPU_USE_CUDA
+if STARPU_COVERITY
+include $(top_srcdir)/starpu-mynvcc.mk
+else
+NVCCFLAGS += --compiler-options -fno-strict-aliasing  -I$(top_srcdir)/include/ -I$(top_builddir)/include/ $(HWLOC_CFLAGS)
+
+.cu.cubin:
+	$(V_nvcc) $(NVCC) -cubin $< -o $@ $(NVCCFLAGS)
+
+.cu.o:
+	$(V_nvcc) $(NVCC) $< -c -o $@ $(NVCCFLAGS)
+endif
+endif
+
+AM_CFLAGS = -Wall $(STARPU_CUDA_CPPFLAGS) $(STARPU_OPENCL_CPPFLAGS) $(FXT_CFLAGS) $(MAGMA_CFLAGS) $(HWLOC_CFLAGS) $(GLOBAL_AM_CFLAGS) -Wno-unused
+LIBS = $(top_builddir)/src/@LIBSTARPU_LINK@ ../src/libstarpujulia-@STARPU_EFFECTIVE_VERSION@.la -lm @LIBS@ $(FXT_LIBS) $(MAGMA_LIBS)
+AM_CPPFLAGS = -I$(top_srcdir)/include/ -I$(top_srcdir)/examples/ -I$(top_builddir)/include
+AM_LDFLAGS = $(STARPU_OPENCL_LDFLAGS) $(STARPU_CUDA_LDFLAGS) $(FXT_LDFLAGS) $(STARPU_COI_LDFLAGS) $(STARPU_SCIF_LDFLAGS)
+
+check_PROGRAMS = $(LOADER) $(STARPU_JULIA_EXAMPLES)
+SHELL_TESTS	=
+STARPU_JULIA_EXAMPLES	=
+
+if BUILD_EXAMPLES
+examplebin_PROGRAMS 	+=	$(STARPU_JULIA_EXAMPLES)
+
+TESTS			=	$(SHELL_TESTS) $(STARPU_JULIA_EXAMPLES)
+endif
+
+######################
+#      Examples      #
+######################
+
+SHELL_TESTS	+=	check_deps/check_deps.sh
+
+STARPU_JULIA_EXAMPLES	+=	mult/mult
+mult_mult_SOURCES	=	mult/mult.c mult/cpu_mult.c
+SHELL_TESTS		+=	mult/mult_starpu.sh
+
+STARPU_JULIA_EXAMPLES				+=	task_insert_color/task_insert_color
+task_insert_color_task_insert_color_SOURCES	=	task_insert_color/task_insert_color.c
+SHELL_TESTS					+=	task_insert_color/task_insert_color.sh
+
+SHELL_TESTS	+=	variable/variable.sh
+SHELL_TESTS	+=	vector_scal/vector_scal.sh
+
+STARPU_JULIA_EXAMPLES		+=	mandelbrot/mandelbrot
+mandelbrot_mandelbrot_SOURCES	=	mandelbrot/mandelbrot.c mandelbrot/cpu_mandelbrot.c mandelbrot/cpu_mandelbrot.h
+SHELL_TESTS			+=	mandelbrot/mandelbrot.sh
+
+STARPU_JULIA_EXAMPLES		+= 	callback/callback
+callback_callback_SOURCES	=	callback/callback.c
+SHELL_TESTS			+=	callback/callback.sh
+
+SHELL_TESTS			+=	dependency/tag_dep.sh
+SHELL_TESTS			+=	dependency/task_dep.sh
+SHELL_TESTS			+=	dependency/end_dep.sh
+
+if !NO_BLAS_LIB
+SHELL_TESTS			+=	axpy/axpy.sh
+SHELL_TESTS			+=	gemm/gemm.sh
+endif

+ 110 - 0
julia/examples/axpy/axpy.jl

@@ -0,0 +1,110 @@
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+using StarPU
+using Printf
+const EPSILON = 1e-6
+
+function check(alpha, X, Y)
+    for i in 1:length(X)
+        expected_value = alpha * X[i] + 4.0
+        if abs(Y[i] - expected_value) > expected_value * EPSILON
+            error("at ", i, ", ", alpha, "*", X[i], "+4.0=", Y[i], ", expected ", expected_value)
+        end
+    end
+end
+
+@target STARPU_CPU+STARPU_CUDA
+@codelet function axpy(X :: Vector{Float32}, Y :: Vector{Float32}, alpha ::Float32) :: Nothing
+    STARPU_SAXPY(length(X), alpha, X, 1, Y, 1)
+    return
+end
+
+function axpy(N, NBLOCKS, alpha, display = true)
+    X = Array(fill(1.0f0, N))
+    Y = Array(fill(4.0f0, N))
+
+    starpu_memory_pin(X)
+    starpu_memory_pin(Y)
+
+    block_filter = starpu_data_filter(STARPU_VECTOR_FILTER_BLOCK, NBLOCKS)
+
+    perfmodel = starpu_perfmodel(
+        perf_type = starpu_perfmodel_type(STARPU_HISTORY_BASED),
+        symbol = "history_perf"
+    )
+
+    cl = starpu_codelet(
+        cpu_func = CPU_CODELETS["axpy"],
+        cuda_func = CUDA_CODELETS["axpy"],
+        #cuda_func = STARPU_SAXPY,
+        modes = [STARPU_R, STARPU_RW],
+        perfmodel = perfmodel
+    )
+
+    if display
+        println("BEFORE x[0] = ", X[1])
+        println("BEFORE y[0] = ", Y[1])
+    end
+
+    t_start = time_ns()
+
+    @starpu_block let
+        hX,hY = starpu_data_register(X, Y)
+
+        starpu_data_partition(hX, block_filter)
+        starpu_data_partition(hY, block_filter)
+
+        for b in 1:NBLOCKS
+            task = starpu_task(cl = cl, handles = [hX[b],hY[b]], cl_arg=(Float32(alpha),),
+                               tag=starpu_tag_t(b))
+            starpu_task_submit(task)
+        end
+
+        starpu_task_wait_for_all()
+    end
+
+    t_end = time_ns()
+
+    timing = (t_end-t_start)/1000
+
+    if display
+        @printf("timing -> %d us %.2f MB/s\n", timing, 3*N*4/timing)
+        println("AFTER y[0] = ", Y[1], " (ALPHA=", alpha, ")")
+    end
+
+    check(alpha, X, Y)
+
+    starpu_memory_unpin(X)
+    starpu_memory_unpin(Y)
+end
+
+function main()
+    N = 16 * 1024 * 1024
+    NBLOCKS = 8
+    alpha = 3.41
+
+    starpu_init()
+    starpu_cublas_init()
+
+    # warmup
+    axpy(10, 1, alpha, false)
+
+    axpy(N, NBLOCKS, alpha)
+
+    starpu_shutdown()
+end
+
+main()
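
The verification step in `check` above (compare each `Y[i]` against `alpha * X[i] + 4.0` within a relative tolerance) can be illustrated without StarPU or BLAS. The following is a minimal plain-C sketch under stated assumptions: `demo_saxpy`, `demo_check`, `demo_abs` and `run_axpy_demo` are hypothetical names invented here, the saxpy is a simple loop rather than the `STARPU_SAXPY` call, and the vector length is reduced to 16.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_N 16
#define DEMO_EPSILON 1e-6

/* Absolute value helper, used to avoid depending on libm. */
static double demo_abs(double v)
{
	return v < 0 ? -v : v;
}

/* Plain-loop stand-in for a BLAS saxpy call: y <- alpha * x + y. */
static void demo_saxpy(size_t n, float alpha, const float *x, float *y)
{
	size_t i;
	for (i = 0; i < n; i++)
		y[i] = alpha * x[i] + y[i];
}

/* Count the entries of y that differ from alpha * x[i] + y0 by more than
 * a relative tolerance; the reference value is computed in double. */
static size_t demo_check(size_t n, double alpha, const float *x, const float *y, double y0)
{
	size_t i, mismatches = 0;

	assert(x != NULL && y != NULL);
	for (i = 0; i < n; i++)
	{
		double expected = alpha * x[i] + y0;
		if (demo_abs(y[i] - expected) > demo_abs(expected) * DEMO_EPSILON)
			mismatches++;
	}
	return mismatches;
}

/* Returns the number of mismatching entries (0 means the check passed). */
int run_axpy_demo(void)
{
	float x[DEMO_N], y[DEMO_N];
	size_t i;

	for (i = 0; i < DEMO_N; i++)
	{
		x[i] = 1.0f;
		y[i] = 4.0f;
	}
	demo_saxpy(DEMO_N, 3.41f, x, y);
	return (int)demo_check(DEMO_N, 3.41, x, y, 4.0);
}
```

Calling `run_axpy_demo()` should report no mismatches. Unlike the Julia `check`, this sketch takes the absolute value of the expected result, so the tolerance would also hold for negative expected values.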

+ 19 - 0
julia/examples/axpy/axpy.sh

@@ -0,0 +1,19 @@
+#!/bin/bash
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+
+$(dirname $0)/../execute.sh axpy/axpy.jl
+

+ 1 - 0
julia/black_scholes/black_scholes.c

@@ -1,5 +1,6 @@
 /* StarPU --- Runtime system for heterogeneous multicore architectures.
  *
+ * Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
  * Copyright (C) 2019       Mael Keryell
  *
  * StarPU is free software; you can redistribute it and/or modify

+ 15 - 2
julia/black_scholes/black_scholes.jl

@@ -1,3 +1,18 @@
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
 import Libdl
 using StarPU
 
@@ -115,8 +130,6 @@ using StarPU
     return 0
 end
 
-
-@debugprint "starpu_init"
 starpu_init()
 
 function black_scholes_starpu(data ::Matrix{Float64}, res ::Matrix{Float64}, nslices ::Int64)

+ 93 - 0
julia/examples/callback/callback.c

@@ -0,0 +1,93 @@
+/* StarPU --- Runtime system for heterogeneous multicore architectures.
+ *
+ * Copyright (C) 2009-2020  Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+ *
+ * StarPU is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published by
+ * the Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ *
+ * StarPU is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ *
+ * See the GNU Lesser General Public License in COPYING.LGPL for more details.
+ */
+
+/*
+ * This is an example of using a callback. We submit a task, whose callback
+ * submits another task (without any callback).
+ */
+
+#include <starpu.h>
+
+#define FPRINTF(ofile, fmt, ...) do { if (!getenv("STARPU_SSILENT")) {fprintf(ofile, fmt, ## __VA_ARGS__); }} while(0)
+
+starpu_data_handle_t handle;
+
+void cpu_codelet(void *descr[], void *_args)
+{
+	(void)_args;
+	int *val = (int *)STARPU_VARIABLE_GET_PTR(descr[0]);
+
+	*val += 1;
+}
+
+struct starpu_codelet cl =
+{
+	.modes = { STARPU_RW },
+	.cpu_funcs = {cpu_codelet},
+	.cpu_funcs_name = {"cpu_codelet"},
+	.nbuffers = 1,
+	.name = "callback"
+};
+
+void callback_func(void *callback_arg)
+{
+	int ret;
+
+	(void)callback_arg;
+
+	struct starpu_task *task = starpu_task_create();
+	task->cl = &cl;
+	task->handles[0] = handle;
+
+	ret = starpu_task_submit(task);
+	STARPU_CHECK_RETURN_VALUE(ret, "starpu_task_submit");
+}
+
+int main(void)
+{
+	int v=40;
+	int ret;
+
+	ret = starpu_init(NULL);
+	if (ret == -ENODEV)
+		return 77;
+	STARPU_CHECK_RETURN_VALUE(ret, "starpu_init");
+
+	starpu_variable_data_register(&handle, STARPU_MAIN_RAM, (uintptr_t)&v, sizeof(int));
+
+	struct starpu_task *task = starpu_task_create();
+	task->cl = &cl;
+	task->callback_func = callback_func;
+	task->callback_arg = NULL;
+	task->handles[0] = handle;
+
+	ret = starpu_task_submit(task);
+	if (ret == -ENODEV) goto enodev;
+	STARPU_CHECK_RETURN_VALUE(ret, "starpu_task_submit");
+
+	starpu_task_wait_for_all();
+	starpu_data_unregister(handle);
+
+	FPRINTF(stderr, "v -> %d\n", v);
+
+	starpu_shutdown();
+
+	return (v == 42) ? 0 : 1;
+
+enodev:
+	starpu_shutdown();
+	return 77;
+}
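
The control flow of this example (a task whose completion callback submits a second task on the same data, taking `v` from 40 to 42) can be mimicked without StarPU. The following is a minimal plain-C sketch, not StarPU code: `toy_task`, `toy_submit`, `toy_run_all` and `run_callback_demo` are hypothetical names invented for illustration, and the toy queue runs tasks sequentially instead of dispatching them to workers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy task: a function applied to one int, plus an optional
 * completion callback. It only imitates the shape of the example above. */
struct toy_task
{
	void (*func)(int *data);
	int *data;
	void (*callback)(void *arg);
	void *callback_arg;
};

#define TOY_QUEUE_MAX 8
static struct toy_task toy_queue[TOY_QUEUE_MAX];
static size_t toy_head, toy_tail;

/* Append a task to the FIFO; returns 0 on success, -1 if the queue is full. */
static int toy_submit(struct toy_task task)
{
	if (toy_tail == TOY_QUEUE_MAX)
		return -1;
	toy_queue[toy_tail++] = task;
	return 0;
}

/* Run tasks in submission order; a callback may submit further tasks. */
static void toy_run_all(void)
{
	while (toy_head < toy_tail)
	{
		struct toy_task t = toy_queue[toy_head++];
		t.func(t.data);
		if (t.callback)
			t.callback(t.callback_arg);
	}
}

static void increment(int *data)
{
	*data += 1;
}

/* Callback of the first task: submits a second task without any callback. */
static void submit_second(void *arg)
{
	struct toy_task second = { increment, (int *)arg, NULL, NULL };
	int ret = toy_submit(second);
	assert(ret == 0);
	(void)ret;
}

/* Returns the final value; starts from 40 as in the example above. */
int run_callback_demo(void)
{
	int v = 40;
	struct toy_task first = { increment, &v, submit_second, &v };

	toy_head = toy_tail = 0;
	if (toy_submit(first) != 0)
		return -1;
	toy_run_all();
	return v;
}
```

`run_callback_demo()` returns 42: the first task increments 40 to 41, and the task submitted from its callback increments it once more.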

+ 76 - 0
julia/examples/callback/callback.jl

@@ -0,0 +1,76 @@
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+using StarPU
+
+@target STARPU_CPU
+@codelet function variable(val ::Ref{Int32}) :: Nothing
+    val[] = val[] + 1
+
+    return
+end
+
+function callback(args)
+    cl = args[1]
+    handles = args[2]
+
+    task = starpu_task(cl = cl, handles=handles)
+    starpu_task_submit(task)
+end
+
+function variable_with_starpu(val ::Ref{Int32})
+    perfmodel = starpu_perfmodel(
+        perf_type = starpu_perfmodel_type(STARPU_HISTORY_BASED),
+        symbol = "history_perf"
+    )
+
+    cl = starpu_codelet(
+        cpu_func = CPU_CODELETS["variable"],
+        # cuda_func = CUDA_CODELETS["matrix_mult"],
+        #opencl_func="ocl_matrix_mult",
+        modes = [STARPU_RW],
+        perfmodel = perfmodel
+    )
+
+    @starpu_block let
+        hVal = starpu_data_register(val)
+
+        task = starpu_task(cl = cl, handles = [hVal], callback=callback, callback_arg=(cl, [hVal]))
+        starpu_task_submit(task)
+
+        starpu_task_wait_for_all()
+    end
+end
+
+function display()
+    v = Ref(Int32(40))
+
+    variable_with_starpu(v)
+
+    println("variable -> ", v[])
+    if v[] == 42
+        println("result is correct")
+    else
+        println("result is incorrect")
+    end
+end
+
+# Disable garbage collector because of random segfault/hang when using mutex.
+# This issue should be solved with Julia release 1.5.
+GC.enable(false)
+starpu_init()
+display()
+starpu_shutdown()
+GC.enable(true)

+ 19 - 0
julia/examples/callback/callback.sh

@@ -0,0 +1,19 @@
+#!/bin/bash
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+
+$(dirname $0)/../execute.sh callback/callback.jl
+

+ 32 - 0
julia/examples/check_deps/check_deps.jl

@@ -0,0 +1,32 @@
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+import Pkg
+
+try
+    using CBinding
+    using Clang
+    using ThreadPools
+catch
+    Pkg.activate((@__DIR__)*"/../..")
+    Pkg.instantiate()
+    using Clang
+    using CBinding
+    using ThreadPools
+end
+
+using StarPU
+
+starpu_translate_headers()

+ 20 - 0
julia/examples/check_deps/check_deps.sh

@@ -0,0 +1,20 @@
+#!/bin/bash
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+
+$(dirname $0)/../execute.sh check_deps/check_deps.jl
+
+

+ 104 - 0
julia/examples/dependency/end_dep.jl

@@ -0,0 +1,104 @@
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+using StarPU
+
+@target STARPU_CPU
+@codelet function codeletA() :: Nothing
+    # print("[Task A] Value = ", val[]);
+    # do nothing
+end
+
+@target STARPU_CPU
+@codelet function codeletB(val ::Ref{Int32}) :: Nothing
+    # println("[Task B] Value = ", val[]);
+    val[] = val[] *2
+end
+
+function callbackB(task)
+    sleep(1)
+    starpu_task_end_dep_release(task)
+end
+
+@target STARPU_CPU
+@codelet function codeletC(val ::Ref{Int32}) :: Nothing
+    # println("[Task C] Value = ", val[]);
+    val[] = val[] *2
+end
+
+function callbackC(task)
+    starpu_task_end_dep_release(task)
+end
+
+
+function main()
+    value = Ref(Int32(12))
+
+    @starpu_block let
+        perfmodel = starpu_perfmodel(
+            perf_type = starpu_perfmodel_type(STARPU_HISTORY_BASED),
+            symbol = "history_perf"
+        )
+
+        clA = starpu_codelet(
+            cpu_func = CPU_CODELETS["codeletA"],
+            perfmodel = perfmodel
+        )
+        clB = starpu_codelet(
+            cpu_func = CPU_CODELETS["codeletB"],
+            modes = [STARPU_RW],
+            perfmodel = perfmodel
+        )
+        clC = starpu_codelet(
+            cpu_func = CPU_CODELETS["codeletC"],
+            modes = [STARPU_RW],
+            perfmodel = perfmodel
+        )
+
+        handle = starpu_data_register(value)
+
+        starpu_data_set_sequential_consistency_flag(handle, 0)
+
+        taskA = starpu_task(cl = clA, detach=0)
+        taskB = starpu_task(cl = clB, handles = [handle], callback=callbackB, callback_arg=taskA)
+        taskC = starpu_task(cl = clC, handles = [handle], callback=callbackC, callback_arg=taskA)
+
+        starpu_task_end_dep_add(taskA, 2)
+        starpu_task_declare_deps(taskC, taskB)
+
+        starpu_task_submit(taskA)
+        starpu_task_submit(taskB)
+        starpu_task_submit(taskC)
+        starpu_task_wait(taskA)
+
+        starpu_data_acquire_on_node(handle, STARPU_MAIN_RAM, STARPU_R)
+        # Waiting for taskA should have also waited for taskB and taskC
+        if value[] != 48
+            error("Incorrect value $(value[]) (expected 48)")
+        end
+        starpu_data_release_on_node(handle, STARPU_MAIN_RAM)
+    end
+
+
+    println("Value = ", value[])
+end
+
+# Disable garbage collector because of random segfault/hang when using mutex.
+# This issue should be solved with Julia release 1.5.
+GC.enable(false)
+starpu_init()
+main()
+starpu_shutdown()
+GC.enable(true)
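
The end-dependency mechanism used above (taskA is given two end dependencies, and the callbacks of taskB and taskC each release one) can be pictured as a simple counter. The following is a minimal plain-C sketch of that idea, not StarPU's implementation: `toy_end_task`, `toy_end_dep_add`, `toy_end_dep_release`, `toy_is_terminated` and `run_end_dep_demo` are hypothetical names, and everything runs sequentially in one thread.

```c
#include <assert.h>

/* Hypothetical stand-in for a task with end dependencies. */
struct toy_end_task
{
	int executed;      /* set once the task body has run */
	unsigned end_deps; /* releases still required before termination */
};

/* Require nb additional releases before the task counts as terminated. */
static void toy_end_dep_add(struct toy_end_task *t, unsigned nb)
{
	t->end_deps += nb;
}

/* Release one end dependency. */
static void toy_end_dep_release(struct toy_end_task *t)
{
	assert(t->end_deps > 0);
	t->end_deps--;
}

/* A task is terminated once its body ran and no end dependency remains. */
static int toy_is_terminated(const struct toy_end_task *t)
{
	return t->executed && t->end_deps == 0;
}

/* Replays the scenario above and returns the final value. */
int run_end_dep_demo(void)
{
	int value = 12;
	struct toy_end_task a = { 0, 0 };

	toy_end_dep_add(&a, 2);

	a.executed = 1;                 /* taskA: empty body */
	assert(!toy_is_terminated(&a)); /* still held by two end dependencies */

	value *= 2;                     /* taskB doubles the value */
	toy_end_dep_release(&a);        /* callbackB */
	assert(!toy_is_terminated(&a));

	value *= 2;                     /* taskC doubles the value */
	toy_end_dep_release(&a);        /* callbackC */
	assert(toy_is_terminated(&a));

	return value;
}
```

`run_end_dep_demo()` returns 48 (12 doubled twice), which matches the value the Julia example expects after waiting for taskA.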

+ 18 - 0
julia/examples/dependency/end_dep.sh

@@ -0,0 +1,18 @@
+#!/bin/bash
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+
+$(dirname $0)/../execute.sh dependency/end_dep.jl

+ 122 - 0
julia/examples/dependency/tag_dep.jl

@@ -0,0 +1,122 @@
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+using StarPU
+
+@target STARPU_CPU
+@codelet function codeletA(val ::Ref{Int32}) :: Nothing
+    # print("[Task A] Value = ", val[]);
+    val[] = val[] * 2
+end
+
+function callbackA(arg)
+    clB = arg[1]
+    handle = arg[2]
+    tagHoldC = arg[3]
+
+    taskB = starpu_task(cl = clB, handles = [handle],
+                        callback = starpu_tag_notify_from_apps,
+                        callback_arg = tagHoldC,
+                        sequential_consistency=false)
+
+    starpu_task_submit(taskB)
+end
+
+@target STARPU_CPU
+@codelet function codeletB(val ::Ref{Int32}) :: Nothing
+    # println("[Task B] Value = ", val[]);
+    val[] = val[] +1
+end
+
+@target STARPU_CPU
+@codelet function codeletC(val ::Ref{Int32}) :: Nothing
+    # println("[Task C] Value = ", val[]);
+    val[] = val[] *2
+end
+
+
+# Submit taskA and hold it
+# Submit taskC and hold it
+# Release taskA
+# Execute taskA       --> callback: submit taskB
+# Execute taskB       --> callback: release taskC
+#
+# All three tasks access the same data in RW mode. taskB is submitted
+# after taskC, so it would normally execute only after taskC; but since
+# sequential consistency is disabled for (taskB, data), taskB can
+# execute straight away.
+function main()
+    value = Ref(Int32(12))
+
+    @starpu_block let
+        tagHoldA :: starpu_tag_t = 32
+        tagHoldC :: starpu_tag_t = 84
+        tagA :: starpu_tag_t = 421
+        tagC :: starpu_tag_t = 842
+
+        starpu_tag_declare_deps(tagA, tagHoldA)
+        starpu_tag_declare_deps(tagC, tagHoldC)
+
+        perfmodel = starpu_perfmodel(
+            perf_type = starpu_perfmodel_type(STARPU_HISTORY_BASED),
+            symbol = "history_perf"
+        )
+
+        clA = starpu_codelet(
+            cpu_func = CPU_CODELETS["codeletA"],
+            modes = [STARPU_RW],
+            perfmodel = perfmodel
+        )
+        clB = starpu_codelet(
+            cpu_func = CPU_CODELETS["codeletB"],
+            modes = [STARPU_RW],
+            perfmodel = perfmodel
+        )
+        clC = starpu_codelet(
+            cpu_func = CPU_CODELETS["codeletC"],
+            modes = [STARPU_RW],
+            perfmodel = perfmodel
+        )
+
+        handle = starpu_data_register(value)
+
+        taskA = starpu_task(cl = clA, handles = [handle], tag = tagA,
+                            callback = callbackA,
+                            callback_arg=(clB, handle, tagHoldC))
+        starpu_task_submit(taskA)
+
+        taskC = starpu_task(cl = clC, handles = [handle], tag = tagC)
+        starpu_task_submit(taskC)
+
+        # Release taskA (we want to make sure it will execute after taskC has been submitted)
+        starpu_tag_notify_from_apps(tagHoldA)
+
+        starpu_task_wait_for_all()
+    end
+
+    if value[] != 50
+        error("Incorrect value $(value[]) (expected 50)")
+    end
+
+    println("Value = ", value[])
+end
+
+# Disable the garbage collector because of random segfaults/hangs when using mutexes.
+# This issue should be fixed in Julia release 1.5.
+GC.enable(false)
+starpu_init()
+main()
+starpu_shutdown()
+GC.enable(true)

+ 18 - 0
julia/examples/dependency/tag_dep.sh

@@ -0,0 +1,18 @@
+#!/bin/bash
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+
+$(dirname $0)/../execute.sh dependency/tag_dep.jl

+ 88 - 0
julia/examples/dependency/task_dep.jl

@@ -0,0 +1,88 @@
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+using StarPU
+
+@target STARPU_CPU
+@codelet function codeletA(val ::Ref{Int32}) :: Nothing
+    # print("[Task A] Value = ", val[]);
+    val[] = val[] * 2
+end
+
+@target STARPU_CPU
+@codelet function codeletB(val ::Ref{Int32}) :: Nothing
+    # println("[Task B] Value = ", val[]);
+    val[] = val[] +1
+end
+
+@target STARPU_CPU
+@codelet function codeletC(val ::Ref{Int32}) :: Nothing
+    # println("[Task C] Value = ", val[]);
+    val[] = val[] *2
+end
+
+function main()
+    value = Ref(Int32(12))
+
+    @starpu_block let
+        perfmodel = starpu_perfmodel(
+            perf_type = starpu_perfmodel_type(STARPU_HISTORY_BASED),
+            symbol = "history_perf"
+        )
+
+        clA = starpu_codelet(
+            cpu_func = CPU_CODELETS["codeletA"],
+            modes = [STARPU_RW],
+            perfmodel = perfmodel
+        )
+        clB = starpu_codelet(
+            cpu_func = CPU_CODELETS["codeletB"],
+            modes = [STARPU_RW],
+            perfmodel = perfmodel
+        )
+        clC = starpu_codelet(
+            cpu_func = CPU_CODELETS["codeletC"],
+            modes = [STARPU_RW],
+            perfmodel = perfmodel
+        )
+
+        starpu_data_set_default_sequential_consistency_flag(0)
+
+        handle = starpu_data_register(value)
+
+        taskA = starpu_task(cl = clA, handles = [handle])
+        taskB = starpu_task(cl = clB, handles = [handle])
+        taskC = starpu_task(cl = clC, handles = [handle])
+
+        starpu_task_declare_deps(taskA, taskB)
+        starpu_task_declare_deps(taskC, taskA, taskB)
+
+        starpu_task_submit(taskA)
+        starpu_task_submit(taskB)
+        starpu_task_submit(taskC)
+
+        starpu_task_wait_for_all()
+    end
+
+    if value[] != 52
+        error("Incorrect value $(value[]) (expected 52)")
+    end
+
+    println("Value = ", value[])
+end
+
+starpu_init()
+main()
+starpu_shutdown()
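
Note on the expected value in task_dep.jl: with sequential consistency disabled, only the explicit declarations order the tasks. taskA waits for taskB, and taskC waits for both, so the codelets run as B, A, C and 12 becomes ((12 + 1) * 2) * 2 = 52. A minimal C sketch of that resolution, without any StarPU call; the names `deps`, `apply` and `run_in_dependency_order` are hypothetical and only mirror the example:

```c
#include <assert.h>
#include <stdint.h>

enum { TASK_A, TASK_B, TASK_C, NTASKS };

/* deps[t][d] != 0 means task t must wait for task d
 * (A waits for B; C waits for A and B, as declared in the example). */
const int deps[NTASKS][NTASKS] = {
    [TASK_A] = { [TASK_B] = 1 },
    [TASK_C] = { [TASK_A] = 1, [TASK_B] = 1 },
};

/* Stand-ins for codeletA (*2), codeletB (+1) and codeletC (*2). */
int32_t apply(int task, int32_t v)
{
    return task == TASK_B ? v + 1 : v * 2;
}

/* Repeatedly run any task whose dependencies have all completed. */
int32_t run_in_dependency_order(int32_t value)
{
    int done[NTASKS] = { 0 };
    int remaining = NTASKS;

    while (remaining > 0)
    {
        int progressed = 0;
        for (int t = 0; t < NTASKS; t++)
        {
            if (done[t]) continue;
            int ready = 1;
            for (int d = 0; d < NTASKS; d++)
                if (deps[t][d] && !done[d]) ready = 0;
            if (ready)
            {
                value = apply(t, value);
                done[t] = 1;
                remaining--;
                progressed = 1;
            }
        }
        if (!progressed) break; /* dependency cycle: give up */
    }
    return value;
}
```

Calling `run_in_dependency_order(12)` reproduces the value the Julia script checks for. The real runtime schedules ready tasks concurrently; the sequential loop here only illustrates the ordering constraint.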

+ 18 - 0
julia/examples/dependency/task_dep.sh

@@ -0,0 +1,18 @@
+#!/bin/bash
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+
+$(dirname $0)/../execute.sh dependency/task_dep.jl

+ 47 - 0
julia/examples/execute.sh.in

@@ -0,0 +1,47 @@
+#!@REALBASH@
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+
+set -x
+export JULIA_LOAD_PATH=@STARPU_SRC_DIR@/julia:$JULIA_LOAD_PATH
+export STARPU_BUILD_DIR=@STARPU_BUILD_DIR@
+export STARPU_SRC_DIR=@STARPU_SRC_DIR@
+export STARPU_JULIA_LIB=@STARPU_BUILD_DIR@/julia/src/.libs/libstarpujulia-1.3.so
+export STARPU_JULIA_BUILD=@STARPU_BUILD_DIR@/julia
+export JULIA_NUM_THREADS=8
+srcdir=@STARPU_SRC_DIR@/julia/examples
+
+if test "$1" == "-calllib"
+then
+    shift
+    pwd
+    rm -f extern_tasks.so
+    make -f @STARPU_BUILD_DIR@/julia/src/dynamic_compiler/Makefile extern_tasks.so SOURCES_CPU=$srcdir/$1
+    shift
+    export JULIA_TASK_LIB=$PWD/extern_tasks.so
+fi
+
+srcfile=$1
+if test ! -f "$srcdir/$srcfile"
+then
+    echo "Error. File $srcdir/$srcfile not found"
+    exit 1
+fi
+shift
+#cd $srcdir/$(dirname $srcfile)
+#@JULIA@ $(basename $srcfile) $*
+@JULIA@ "$srcdir/$srcfile" "$@"
+

+ 130 - 0
julia/examples/gemm/gemm.jl

@@ -0,0 +1,130 @@
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+using StarPU
+
+@target STARPU_CPU+STARPU_CUDA
+@codelet function gemm(A :: Matrix{Float32}, B :: Matrix{Float32}, C :: Matrix{Float32}, alpha :: Float32, beta :: Float32) :: Nothing
+
+    M :: Int32 = height(A)
+    N :: Int32 = width(B)
+    K :: Int32 = width(A)
+    lda :: Int32 = height(A)
+    ldb :: Int32 = height(B)
+    ldc :: Int32 = height(C)
+    STARPU_SGEMM("N", "N", M, N, K, alpha, A, lda, B, ldb, beta, C, ldc)
+
+    return
+end
+
+function multiply_with_starpu(A :: Matrix{Float32}, B :: Matrix{Float32}, C :: Matrix{Float32}, alpha :: Float32, beta :: Float32, nslicesx, nslicesy)
+    scale= 3
+    tmin=0
+    vert = starpu_data_filter(STARPU_MATRIX_FILTER_VERTICAL_BLOCK, nslicesx)
+    horiz = starpu_data_filter(STARPU_MATRIX_FILTER_BLOCK, nslicesy)
+    @starpu_block let
+        hA,hB,hC = starpu_data_register(A, B, C)
+        starpu_data_partition(hB, vert)
+        starpu_data_partition(hA, horiz)
+        starpu_data_map_filters(hC, vert, horiz)
+        tmin=0
+        perfmodel = starpu_perfmodel(
+            perf_type = starpu_perfmodel_type(STARPU_HISTORY_BASED),
+            symbol = "history_perf"
+        )
+        cl = starpu_codelet(
+            cpu_func = CPU_CODELETS["gemm"],
+            cuda_func = CUDA_CODELETS["gemm"],
+            modes = [STARPU_R, STARPU_R, STARPU_RW],
+            perfmodel = perfmodel
+        )
+
+        for i in (1 : 10 )
+            t=time_ns()
+            @starpu_sync_tasks begin
+                for taskx in (1 : nslicesx)
+                    for tasky in (1 : nslicesy)
+                        handles = [hA[tasky], hB[taskx], hC[taskx, tasky]]
+                        task = starpu_task(cl = cl, handles = handles, cl_arg=(alpha, beta))
+                        starpu_task_submit(task)
+                        #@starpu_async_cl matrix_mult(hA[tasky], hB[taskx], hC[taskx, tasky])
+                    end
+                end
+            end
+            t=time_ns()-t
+            if (tmin==0 || tmin>t)
+                tmin=t
+            end
+        end
+    end
+    return tmin
+end
+
+
+function approximately_equals(
+    A :: Matrix{Cfloat},
+    B :: Matrix{Cfloat},
+    eps = 1e-2
+)
+    (height, width) = size(A)
+
+    for j in (1 : width)
+        for i in (1 : height)
+            if (abs(A[i,j] - B[i,j]) > eps * max(abs(B[i,j]), abs(A[i,j])))
+                println("A[$i,$j] : $(A[i,j]), B[$i,$j] : $(B[i,j])")
+                return false
+            end
+        end
+    end
+
+    return true
+end
+
+function compute_times(io,start_dim, step_dim, stop_dim, nslicesx, nslicesy)
+    for dim in (start_dim : step_dim : stop_dim)
+        A = Array(rand(Cfloat, dim, dim))
+        B = Array(rand(Cfloat, dim, dim))
+        C = zeros(Float32, dim, dim)
+        starpu_memory_pin(A)
+        starpu_memory_pin(B)
+        starpu_memory_pin(C)
+        alpha = 4.0f0
+        beta = 2.0f0
+        mt =  multiply_with_starpu(A, B, C, alpha, beta, nslicesx, nslicesy)
+        gflop = 2 * dim * dim * dim * 1.e-9
+        gflops = gflop / (mt * 1.e-9)
+        size=dim*dim*dim*4*3/1024/1024
+        println(io,"$dim $gflops")
+        println("$dim $gflops")
+        starpu_memory_unpin(A)
+        starpu_memory_unpin(B)
+        starpu_memory_unpin(C)
+    end
+end
+
+if size(ARGS, 1) < 1
+    filename="x.dat"
+else
+    filename=ARGS[1]
+end
+
+starpu_init()
+
+io=open(filename,"w")
+compute_times(io,64,512,4096,2,2)
+close(io)
+
+starpu_shutdown()
+
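
Note on the throughput figure in gemm.jl: the value written to the output file comes from the fastest of the ten timed runs. A dim x dim matrix product costs about 2 * dim^3 floating-point operations, and `time_ns()` reports nanoseconds, hence the two 1.e-9 factors. A small C sketch of the same conversion; the function name `gflops_from_ns` is hypothetical and not part of StarPU:

```c
#include <assert.h>

/* Convert the best time (in nanoseconds) of a dim x dim x dim GEMM into GFLOP/s. */
double gflops_from_ns(long dim, double t_ns)
{
    double gflop = 2.0 * dim * dim * dim * 1.e-9; /* operation count, in units of 1e9 */
    double seconds = t_ns * 1.e-9;                /* nanoseconds to seconds */
    return gflop / seconds;
}
```

For instance, a 1000 x 1000 product that takes two seconds corresponds to roughly 1 GFLOP/s.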

+ 21 - 0
julia/examples/gemm/gemm.sh

@@ -0,0 +1,21 @@
+#!/bin/bash
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+
+$(dirname $0)/../execute.sh gemm/gemm.jl
+$(dirname $0)/../execute.sh gemm/gemm_native.jl
+
+

+ 56 - 0
julia/examples/gemm/gemm_native.jl

@@ -0,0 +1,56 @@
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+using LinearAlgebra.BLAS
+
+function gemm_without_starpu(A :: Matrix{Float32}, B :: Matrix{Float32}, C :: Matrix{Float32}, alpha :: Float32, beta :: Float32)
+    tmin = 0
+    for i in (1 : 10 )
+        t=time_ns()
+        gemm!('N', 'N', alpha, A, B, beta, C)
+        t=time_ns() - t
+        if (tmin==0 || tmin>t)
+            tmin=t
+        end
+    end
+    return tmin
+end
+
+
+function compute_times(io,start_dim, step_dim, stop_dim)
+    for dim in (start_dim : step_dim : stop_dim)
+        A = Array(rand(Cfloat, dim, dim))
+        B = Array(rand(Cfloat, dim, dim))
+        C = zeros(Float32, dim, dim)
+        alpha = 4.0f0
+        beta = 2.0f0
+        mt =  gemm_without_starpu(A, B, C, alpha, beta)
+        gflop = 2 * dim * dim * dim * 1.e-9
+        gflops = gflop / (mt * 1.e-9)
+        size=dim*dim*dim*4*3/1024/1024
+        println(io,"$dim $gflops")
+        println("$dim $gflops")
+    end
+end
+
+if size(ARGS, 1) < 1
+    filename="x.dat"
+else
+    filename=ARGS[1]
+end
+io=open(filename,"w")
+compute_times(io,64,512,4096)
+close(io)
+

+ 5 - 7
julia/mandelbrot/Makefile

@@ -21,12 +21,10 @@ ifneq ($(ENABLE_CUDA),yes)
 	CUDA_OBJECTS:=
 endif
 
-LIBPATH=${PWD}/../StarPU.jl/lib
-
 all: ${EXTERNLIB}
 
 mandelbrot: mandelbrot.c cpu_mandelbrot.o #gpu_mandelbrot.o
-	$(CC) $(CPU_CFLAGS) $^ -o $@ $(LDFLAGS)
+	$(CC) $(CPU_CFLAGS) $^ -o $@ $(LDFLAGS) -lm
 
 %.o: %.c
 	$(CC) -c -fPIC $(CPU_CFLAGS) $^ -o $@
@@ -47,12 +45,12 @@ clean:
 
 # Performance Tests
 cstarpu.dat: mandelbrot
-	STARPU_NOPENCL=0 STARPU_SCHED=dmda STARPU_CALIBRATE=1 ./mandelbrot -0.800671 -0.158392 32 32 4096 4 > $@
+	STARPU_NOPENCL=0 STARPU_SCHED=dmda STARPU_CALIBRATE=1 ./mandelbrot > $@
 julia_generatedc.dat:
-	LD_LIBRARY_PATH+=${LIBPATH} STARPU_NOPENCL=0 STARPU_SCHED=dmda STARPU_CALIBRATE=1 julia mandelbrot.jl $@
+	STARPU_NOPENCL=0 STARPU_SCHED=dmda STARPU_CALIBRATE=1 julia mandelbrot.jl $@
 julia_native.dat:
-	LD_LIBRARY_PATH+=${LIBPATH} STARPU_NOPENCL=0 STARPU_SCHED=dmda STARPU_CALIBRATE=1 julia mandelbrot_native.jl $@
+	STARPU_NOPENCL=0 STARPU_SCHED=dmda STARPU_CALIBRATE=1 julia mandelbrot_native.jl $@
 julia_calllib.dat: ${EXTERNLIB}
-	LD_LIBRARY_PATH+=${LIBPATH} JULIA_TASK_LIB="${EXTERNLIB}" STARPU_NOPENCL=0 STARPU_SCHED=dmda STARPU_CALIBRATE=1 julia mandelbrot.jl julia_calllib.dat
+	JULIA_TASK_LIB="${EXTERNLIB}" STARPU_NOPENCL=0 STARPU_SCHED=dmda STARPU_CALIBRATE=1 julia mandelbrot.jl julia_calllib.dat
 
 test: cstarpu.dat julia_generatedc.dat julia_native.dat julia_calllib.dat

+ 79 - 0
julia/examples/mandelbrot/cpu_mandelbrot.c

@@ -0,0 +1,79 @@
+/* StarPU --- Runtime system for heterogeneous multicore architectures.
+ *
+ * Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+ *
+ * StarPU is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published by
+ * the Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ *
+ * StarPU is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ *
+ * See the GNU Lesser General Public License in COPYING.LGPL for more details.
+ */
+#include <string.h>
+#include <starpu.h>
+#include <math.h>
+#include "cpu_mandelbrot.h"
+
+void cpu_mandelbrot(void *descr[], void *cl_arg)
+{
+        long long *pixels;
+
+        pixels = (long long int *)STARPU_MATRIX_GET_PTR(descr[0]);
+        struct params *params = (struct params *) cl_arg;
+
+        long width = STARPU_MATRIX_GET_NY(descr[0]);
+        long height = STARPU_MATRIX_GET_NX(descr[0]);
+        double zoom = width * 0.25296875;
+        double iz = 1. / zoom;
+        float diverge = 4.0;
+        float max_iterations = (width/2) * 0.049715909 * log10(zoom);
+        float imi = 1. / max_iterations;
+        double centerr = params->centerr;
+        double centeri = params->centeri;
+        long offset = params->offset;
+        long dim = params->dim;
+        double cr = 0;
+        double zr = 0;
+        double ci = 0;
+        double zi = 0;
+        long n = 0;
+        double tmp = 0;
+        int ldP = STARPU_MATRIX_GET_LD(descr[0]);
+
+        long long x,y;
+
+        for (y = 0; y < height; y++)
+        {
+                for (x = 0; x < width; x++)
+                {
+                        cr = centerr + (x - (dim/2)) * iz;
+                        zr = cr;
+                        ci = centeri + (y+offset - (dim/2)) * iz;
+                        zi = ci;
+
+                        for (n = 0; n <= max_iterations; n++)
+                        {
+                                if (zr*zr + zi*zi>diverge) break;
+                                tmp = zr*zr - zi*zi + cr;
+                                zi = 2*zr*zi + ci;
+                                zr = tmp;
+                        }
+                        if (n<max_iterations)
+                                pixels[y +x*ldP] = round(15.*n*imi);
+                        else
+                                pixels[y +x*ldP] = 0;
+                }
+        }
+}
+
+char* CPU = "cpu_mandelbrot";
+char* GPU = "gpu_mandelbrot";
+extern char *starpu_find_function(char *name, char *device)
+{
+	if (!strcmp(device,"gpu")) return GPU;
+	return CPU;
+}
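
Note on the kernel in cpu_mandelbrot.c: the inner loop is the usual escape-time iteration z <- z^2 + c, started at z = c and stopped once |z|^2 exceeds 4. A standalone C sketch for a single point, without any StarPU dependency; the function name `escape_time` and its `max_iterations` parameter are illustrative assumptions rather than part of the example:

```c
#include <assert.h>

/* Iterate z <- z*z + c from z = c. Returns the index of the first iteration
 * at which |z|^2 > 4, or max_iterations + 1 if the point never escapes. */
long escape_time(double cr, double ci, long max_iterations)
{
    double zr = cr, zi = ci;
    long n;

    for (n = 0; n <= max_iterations; n++)
    {
        if (zr * zr + zi * zi > 4.0)
            break;
        double tmp = zr * zr - zi * zi + cr; /* real part of z*z + c */
        zi = 2 * zr * zi + ci;               /* imaginary part of z*z + c */
        zr = tmp;
    }
    return n;
}
```

The kernel then maps the iteration count to one of 16 colour indices with `round(15.*n*imi)`, and writes 0 for points that did not escape.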

+ 8 - 10
julia/StarPU.jl/src/jlstarpu_simple_functions.c

@@ -1,6 +1,6 @@
 /* StarPU --- Runtime system for heterogeneous multicore architectures.
  *
- * Copyright (C) 2018                                     Alexis Juven
+ * Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
  *
  * StarPU is free software; you can redistribute it and/or modify
  * it under the terms of the GNU Lesser General Public License as published by
@@ -13,14 +13,12 @@
  *
  * See the GNU Lesser General Public License in COPYING.LGPL for more details.
  */
-#include "jlstarpu.h"
-
-int jlstarpu_init(void)
+struct params
 {
-	return starpu_init(NULL);
-}
+        double centerr;
+        double centeri;
+        long offset;
+        long dim;
+};
+
 
-void jlstarpu_set_to_zero(void * ptr, unsigned int size)
-{
-	memset(ptr, 0, size);
-}

+ 78 - 56
julia/mandelbrot/mandelbrot.c

@@ -1,5 +1,6 @@
 /* StarPU --- Runtime system for heterogeneous multicore architectures.
  *
+ * Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
  * Copyright (C) 2019       Mael Keryell
  *
  * StarPU is free software; you can redistribute it and/or modify
@@ -13,36 +14,35 @@
  *
  * See the GNU Lesser General Public License in COPYING.LGPL for more details.
  */
+
 #include <stdio.h>
 #include <stdlib.h>
 #include <starpu.h>
+#include "cpu_mandelbrot.h"
 
 void cpu_mandelbrot(void **, void *);
 void gpu_mandelbrot(void **, void *);
 
 static struct starpu_perfmodel model =
 {
-		.type = STARPU_HISTORY_BASED,
-		.symbol = "history_perf"
+	.type = STARPU_HISTORY_BASED,
+	.symbol = "history_perf"
 };
 
 static struct starpu_codelet cl =
 {
-	.cpu_funcs = {cpu_mandelbrot},
+	.cpu_funcs = {cpu_mandelbrot},
 	//.cuda_funcs = {gpu_mandelbrot},
-	.nbuffers = 2,
-	.modes = {STARPU_W, STARPU_R},
+	.nbuffers = 1,
+	.modes = {STARPU_W},
 	.model = &model
 };
 
-
-void mandelbrot_with_starpu(long long *pixels, float *params, long long dim, long long nslicesx)
+void mandelbrot_with_starpu(long long *pixels, struct params *p, long long dim, long long nslicesx)
 {
 	starpu_data_handle_t pixels_handle;
-	starpu_data_handle_t params_handle;
 
 	starpu_matrix_data_register(&pixels_handle, STARPU_MAIN_RAM, (uintptr_t)pixels, dim, dim, dim, sizeof(long long));
-	starpu_matrix_data_register(&params_handle, STARPU_MAIN_RAM, (uintptr_t)params, 4*nslicesx, 4*nslicesx, 1, sizeof(float));
 
 	struct starpu_data_filter horiz =
 	{
@@ -51,90 +51,95 @@ void mandelbrot_with_starpu(long long *pixels, float *params, long long dim, lon
 	};
 
 	starpu_data_partition(pixels_handle, &horiz);
-	starpu_data_partition(params_handle, &horiz);
 
 	long long taskx;
 
-	for (taskx = 0; taskx < nslicesx; taskx++){
+	for (taskx = 0; taskx < nslicesx; taskx++)
+	{
 		struct starpu_task *task = starpu_task_create();
 
 		task->cl = &cl;
 		task->handles[0] = starpu_data_get_child(pixels_handle, taskx);
-		task->handles[1] = starpu_data_get_child(params_handle, taskx);
+		task->cl_arg = &p[taskx];
+		task->cl_arg_size = sizeof(*p);
 		if (starpu_task_submit(task)!=0) fprintf(stderr,"submit task error\n");
 	}
 
 	starpu_task_wait_for_all();
 
 	starpu_data_unpartition(pixels_handle, STARPU_MAIN_RAM);
-	starpu_data_unpartition(params_handle, STARPU_MAIN_RAM);
-
 	starpu_data_unregister(pixels_handle);
-	starpu_data_unregister(params_handle);
 }
 
 void pixels2img(long long *pixels, long long width, long long height, const char *filename)
 {
-  FILE *fp = fopen(filename, "w");
-  if (!fp)
-    return;
+	FILE *fp = fopen(filename, "w");
+	if (!fp)
+		return;
 
-  int MAPPING[16][3] = {{66,30,15},{25,7,26},{9,1,47},{4,4,73},{0,7,100},{12,44,138},{24,82,177},{57,125,209},{134,181,229},{211,236,248},{241,233,191},{248,201,95},{255,170,0},{204,128,0},{153,87,0},{106,52,3}};
+	int MAPPING[16][3] = {{66,30,15},{25,7,26},{9,1,47},{4,4,73},{0,7,100},{12,44,138},{24,82,177},{57,125,209},{134,181,229},{211,236,248},{241,233,191},{248,201,95},{255,170,0},{204,128,0},{153,87,0},{106,52,3}};
 
-  fprintf(fp, "P3\n%lld %lld\n255\n", width, height);
-  long long i, j;
-  for (i = 0; i < height; ++i) {
-    for (j = 0; j < width; ++j) {
-      fprintf(fp, "%d %d %d ", MAPPING[pixels[j*width+i]][0], MAPPING[pixels[j*width+i]][1], MAPPING[pixels[j*width+i]][2]);
-    }
-  }
+	fprintf(fp, "P3\n%lld %lld\n255\n", width, height);
+	long long i, j;
+	for (i = 0; i < height; ++i)
+	{
+		for (j = 0; j < width; ++j)
+		{
+			fprintf(fp, "%d %d %d ", MAPPING[pixels[j*width+i]][0], MAPPING[pixels[j*width+i]][1], MAPPING[pixels[j*width+i]][2]);
+		}
+	}
 
-  fclose(fp);
+	fclose(fp);
 }
 
-double min_times(double cr, double ci, long long dim, long long nslices)
+double min_times(double cr, double ci, long long dim, long long nslices, int gen_images)
 {
 	long long *pixels = calloc(dim*dim, sizeof(long long));
-	float *params = calloc(4*nslices, sizeof(float));
+	struct params *p = calloc(nslices, sizeof(struct params));
 
 	double t_min = 0;
 	long long i;
 
-	for (i=0; i<nslices; i++) {
-		params[4*i+0] = cr;
-		params[4*i+1] = ci;
-		params[4*i+2] = i*dim/nslices;
-		params[4*i+3] = dim;
+	for (i=0; i<nslices; i++)
+	{
+		p[i].centerr = cr;
+		p[i].centeri = ci;
+		p[i].offset = i*dim/nslices;
+		p[i].dim = dim;
 	}
 
 	double start, stop, exec_t;
-	for (i = 0; i < 10; i++){
+	for (i = 0; i < 10; i++)
+	{
 		start = starpu_timing_now(); // starpu_timing_now() gives the time in microseconds.
-		mandelbrot_with_starpu(pixels, params, dim, nslices);
+		mandelbrot_with_starpu(pixels, p, dim, nslices);
 		stop = starpu_timing_now();
 		exec_t = (stop-start)*1.e3;
 		if (t_min==0 || t_min>exec_t)
 		  t_min = exec_t;
 	}
 
-	char filename[64];
-	snprintf(filename, 64, "out%lld.ppm", dim);
-	pixels2img(pixels,dim,dim,filename);
+	if (gen_images == 1)
+	{
+		char filename[64];
+		snprintf(filename, 64, "out%lld.ppm", dim);
+		pixels2img(pixels,dim,dim,filename);
+	}
 
 	free(pixels);
-	free(params);
+	free(p);
 
 	return t_min;
 }
 
-void display_times(double cr, double ci, long long start_dim, long long step_dim, long long stop_dim, long long nslices)
+void display_times(double cr, double ci, long long start_dim, long long step_dim, long long stop_dim, long long nslices, int gen_images)
 {
-
 	long long dim;
 
-	for (dim = start_dim; dim <= stop_dim; dim += step_dim) {
+	for (dim = start_dim; dim <= stop_dim; dim += step_dim)
+	{
 		printf("Dimension: %lld...\n", dim);
-		double res = min_times(cr, ci, dim, nslices);
+		double res = min_times(cr, ci, dim, nslices, gen_images);
 		res = res / dim / dim; // time per pixel
 		printf("%lld %lf\n", dim, res);
 	}
@@ -142,23 +147,40 @@ void display_times(double cr, double ci, long long start_dim, long long step_dim
 
 int main(int argc, char **argv)
 {
-	if (argc != 7){
-		printf("Usage: %s cr ci start_dim step_dim stop_dim nslices(must divide dims)\n", argv[0]);
-		return 1;
+	double cr, ci;
+	long long start_dim, step_dim, stop_dim, nslices;
+	int gen_images;
+
+	if (argc != 8)
+	{
+		printf("Usage: %s cr ci start_dim step_dim stop_dim nslices(must divide dims) gen_images. Using default parameters\n", argv[0]);
+
+		cr = -0.800671;
+		ci = -0.158392;
+		start_dim = 32;
+		step_dim = 32;
+		stop_dim = 512;
+		nslices = 4;
+		gen_images = 0;
 	}
-	if (starpu_init(NULL) != EXIT_SUCCESS){
+	else
+	{
+		cr = (float) atof(argv[1]);
+		ci = (float) atof(argv[2]);
+		start_dim = atoll(argv[3]);
+		step_dim = atoll(argv[4]);
+		stop_dim = atoll(argv[5]);
+		nslices = atoll(argv[6]);
+		gen_images = atoi(argv[7]);
+	}
+
+	if (starpu_init(NULL) != EXIT_SUCCESS)
+	{
 		fprintf(stderr, "ERROR\n");
 		return 77;
 	}
 
-	double cr = (float) atof(argv[1]);
-	double ci = (float) atof(argv[2]);
-	long long start_dim = atoll(argv[3]);
-	long long step_dim = atoll(argv[4]);
-	long long stop_dim = atoll(argv[5]);
-	long long nslices = atoll(argv[6]);
-
-	display_times(cr, ci, start_dim, step_dim, stop_dim, nslices);
+	display_times(cr, ci, start_dim, step_dim, stop_dim, nslices, gen_images);
 
 	starpu_shutdown();
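
Note on the per-slice parameters in mandelbrot.c: each task renders one horizontal band of the image, so it needs its own `offset` into the full picture while sharing the centre and the overall dimension. A minimal C sketch of filling one `struct params` per slice, as the first loop in `min_times` does; the helper name `fill_slice_params` is hypothetical, and the struct is repeated here only to keep the sketch self-contained:

```c
#include <assert.h>

struct params
{
    double centerr;
    double centeri;
    long offset;
    long dim;
};

/* Fill one entry per slice; slice i starts at row i*dim/nslices. */
void fill_slice_params(struct params *p, long nslices, double cr, double ci, long dim)
{
    for (long i = 0; i < nslices; i++)
    {
        p[i].centerr = cr;
        p[i].centeri = ci;
        p[i].offset = i * dim / nslices;
        p[i].dim = dim;
    }
}
```

With the default parameters (dim = 512, nslices = 4) the offsets are 0, 128, 256 and 384, which is why `nslices` is expected to divide the dimension.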
 

+ 26 - 12
julia/mandelbrot/mandelbrot.jl

@@ -1,3 +1,18 @@
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
 import Libdl
 using StarPU
 using LinearAlgebra
@@ -34,7 +49,7 @@ using LinearAlgebra
                 zi = 2*zr*zi + ci
                 zr = tmp
             end
-            
+
             if (n < max_iterations)
                 pixels[y,x] = round(15 * n * imi)
             else
@@ -46,17 +61,16 @@ using LinearAlgebra
     return
 end
 
-@debugprint "starpu_init"
 starpu_init()
 
 function mandelbrot_with_starpu(A ::Matrix{Int64}, cr ::Float64, ci ::Float64, dim ::Int64, nslicesx ::Int64)
-    horiz = StarpuDataFilter(STARPU_MATRIX_FILTER_BLOCK, nslicesx)
+    horiz = starpu_data_filter(STARPU_MATRIX_FILTER_BLOCK, nslicesx)
     @starpu_block let
 	hA = starpu_data_register(A)
 	starpu_data_partition(hA,horiz)
 
 	@starpu_sync_tasks for taskx in (1 : nslicesx)
-                @starpu_async_cl mandelbrot(hA[taskx]) [STARPU_W] [cr, ci, (taskx-1)*dim/nslicesx, dim]
+                @starpu_async_cl mandelbrot(hA[taskx]) [STARPU_W] (cr, ci, Int64((taskx-1)*dim/nslicesx), dim)
 	end
     end
 end
@@ -74,9 +88,9 @@ function pixels2img(pixels ::Matrix{Int64}, width ::Int64, height ::Int64, filen
     end
 end
 
-function min_times(cr ::Float64, ci ::Float64, dim ::Int64, nslices ::Int64)
+function min_times(cr ::Float64, ci ::Float64, dim ::Int64, nslices ::Int64, gen_images)
     tmin=0;
-    
+
     pixels ::Matrix{Int64} = zeros(dim, dim)
     for i = 1:10
         t = time_ns();
@@ -86,21 +100,21 @@ function min_times(cr ::Float64, ci ::Float64, dim ::Int64, nslices ::Int64)
             tmin=t
         end
     end
-    pixels2img(pixels,dim,dim,"out$(dim).ppm")
+    if (gen_images == 1)
+        pixels2img(pixels,dim,dim,"out$(dim).ppm")
+    end
     return tmin
 end
 
-function display_time(cr ::Float64, ci ::Float64, start_dim ::Int64, step_dim ::Int64, stop_dim ::Int64, nslices ::Int64)
+function display_time(cr ::Float64, ci ::Float64, start_dim ::Int64, step_dim ::Int64, stop_dim ::Int64, nslices ::Int64, gen_images)
     for dim in (start_dim : step_dim : stop_dim)
-        res = min_times(cr, ci, dim, nslices)
+        res = min_times(cr, ci, dim, nslices, gen_images)
         res=res/dim/dim; # time per pixel
         println("$(dim) $(res)")
     end
 end
 
 
-display_time(-0.800671,-0.158392,32,32,4096,4)
+display_time(-0.800671,-0.158392,32,32,512,4, 0)
 
-@debugprint "starpu_shutdown"
 starpu_shutdown()
-

+ 21 - 0
julia/examples/mandelbrot/mandelbrot.sh

@@ -0,0 +1,21 @@
+#!/bin/bash
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+
+$(dirname $0)/../execute.sh mandelbrot/mandelbrot.jl
+$(dirname $0)/../execute.sh mandelbrot/mandelbrot_native.jl
+$(dirname $0)/../execute.sh -calllib mandelbrot/cpu_mandelbrot.c mandelbrot/mandelbrot.jl
+

+ 22 - 5
julia/mandelbrot/mandelbrot_native.jl

@@ -1,3 +1,18 @@
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
 using LinearAlgebra
 
 function mandelbrot(pixels, centerr ::Float64, centeri ::Float64, offset ::Int64, dim ::Int64) :: Nothing
@@ -68,7 +83,7 @@ function pixels2img(pixels ::Matrix{Int64}, width ::Int64, height ::Int64, filen
     end
 end
 
-function min_times(cr ::Float64, ci ::Float64, dim ::Int64, nslices ::Int64)
+function min_times(cr ::Float64, ci ::Float64, dim ::Int64, nslices ::Int64, gen_images)
     tmin=0;
 
     pixels ::Matrix{Int64} = zeros(dim, dim)
@@ -80,17 +95,19 @@ function min_times(cr ::Float64, ci ::Float64, dim ::Int64, nslices ::Int64)
             tmin=t
         end
     end
-    pixels2img(pixels,dim,dim,"out$(dim).ppm")
+    if (gen_images == 1)
+        pixels2img(pixels,dim,dim,"out$(dim).ppm")
+    end
     return tmin
 end
 
-function display_time(cr ::Float64, ci ::Float64, start_dim ::Int64, step_dim ::Int64, stop_dim ::Int64, nslices ::Int64)
+function display_time(cr ::Float64, ci ::Float64, start_dim ::Int64, step_dim ::Int64, stop_dim ::Int64, nslices ::Int64, gen_images)
     for dim in (start_dim : step_dim : stop_dim)
-        res = min_times(cr, ci, dim, nslices)
+        res = min_times(cr, ci, dim, nslices, gen_images)
         res=res/dim/dim; # time per pixel
         println("$(dim) $(res)")
     end
 end
 
 
-display_time(-0.800671,-0.158392,32,32,4096,4)
+display_time(-0.800671,-0.158392,32,32,512,4, 0)

+ 11 - 15
julia/mult/Makefile

@@ -1,9 +1,6 @@
-# tile size. Should be changed in mult.jl as well
-STRIDE=72
-
 # ICC compiler
 #CC =icc
-#CFLAGS=-restrict -unroll4 -ipo -falign-loops=256 -O3 -DSTRIDE=${STRIDE} -march=native $(shell pkg-config --cflags starpu-1.3)
+#CFLAGS=-restrict -unroll4 -ipo -falign-loops=256 -O3 -march=native $(shell pkg-config --cflags starpu-1.3)
 # GCC compiler
 CC=gcc
 NVCC=nvcc
@@ -14,7 +11,7 @@ ifeq ($(ENABLE_CUDA),yes)
         LD := ${NVCC}
 endif
 
-CFLAGS = -O3 -g -DSTRIDE=${STRIDE} $(shell pkg-config --cflags starpu-1.3)
+CFLAGS = -O3 -g $(shell pkg-config --cflags starpu-1.3)
 CPU_CFLAGS = ${CFLAGS} -Wall -mavx -fomit-frame-pointer -march=native -ffast-math
 CUDA_CFLAGS = ${CFLAGS}
 LDFLAGS +=$(shell pkg-config --libs starpu-1.3)
@@ -28,9 +25,6 @@ ifneq ($(ENABLE_CUDA),yes)
 	CUDA_OBJECTS:=
 endif
 
-
-LIBPATH=${PWD}/../StarPU.jl/lib
-
 all: ${EXTERNLIB}
 
 mult: mult.c cpu_mult.o #gpu_mult.o
@@ -53,14 +47,16 @@ ${GENERATEDLIB}: $(C_OBJECTS) $(CUDA_OBJECTS)
 clean:
 	rm -f mult *.so *.o genc_*.c gencuda_*.cu *.dat
 
+tjulia: julia_generatedc.dat
 # Performance Tests
+STRIDE=72
 cstarpu.dat: mult
-	STARPU_NOPENCL=0 STARPU_SCHED=dmda STARPU_CALIBRATE=1 ./mult > $@
-julia_generatedc.dat:
-	LD_LIBRARY_PATH+=${LIBPATH} STARPU_NOPENCL=0 STARPU_SCHED=dmda STARPU_CALIBRATE=1 julia mult.jl $@
-julia_native.dat:
-	LD_LIBRARY_PATH+=${LIBPATH} STARPU_NOPENCL=0 STARPU_SCHED=dmda STARPU_CALIBRATE=1 julia mult_native.jl $@
-julia_calllib.dat: ${EXTERNLIB}
-	LD_LIBRARY_PATH+=${LIBPATH} JULIA_TASK_LIB="${EXTERNLIB}" STARPU_NOPENCL=0 STARPU_SCHED=dmda STARPU_CALIBRATE=1 julia mult.jl julia_calllib.dat
+	STARPU_NOPENCL=0 STARPU_SCHED=dmda STARPU_CALIBRATE=1 ./mult $(STRIDE) > $@
+julia_generatedc.dat: mult.jl
+	STARPU_NOPENCL=0 STARPU_SCHED=dmda STARPU_CALIBRATE=1 julia mult.jl $(STRIDE) $@
+julia_native.dat: mult_native.jl
+	STARPU_NOPENCL=0 STARPU_SCHED=dmda STARPU_CALIBRATE=1 julia mult_native.jl $(STRIDE) $@
+julia_calllib.dat: ${EXTERNLIB} mult.jl
+	JULIA_TASK_LIB="${EXTERNLIB}" STARPU_NOPENCL=0 STARPU_SCHED=dmda STARPU_CALIBRATE=1 julia mult.jl $(STRIDE) julia_calllib.dat
 
 test: cstarpu.dat julia_generatedc.dat julia_native.dat julia_calllib.dat

julia/mult/README → julia/examples/mult/README


+ 24 - 13
julia/mult/cpu_mult.c

@@ -1,6 +1,7 @@
 /* StarPU --- Runtime system for heterogeneous multicore architectures.
  *
- * Copyright (C) 2018                                     Alexis Juven
+ * Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+ * Copyright (C) 2018       Alexis Juven
  *
  * StarPU is free software; you can redistribute it and/or modify
  * it under the terms of the GNU Lesser General Public License as published by
@@ -13,26 +14,30 @@
  *
  * See the GNU Lesser General Public License in COPYING.LGPL for more details.
  */
+
 #include <stdint.h>
 #include <stdio.h>
 #include <string.h>
 #include <starpu.h>
+
 /*
  * The codelet is passed 3 matrices, the "descr" union-type field gives a
  * description of the layout of those 3 matrices in the local memory (ie. RAM
  * in the case of CPU, GPU frame buffer in the case of GPU etc.). Since we have
  * registered data with the "matrix" data interface, we use the matrix macros.
  */
-void cpu_mult(void *descr[], void *arg)
+void cpu_mult(void *descr[], void *cl_arg)
 {
-	(void)arg;
+	int stride;
 	float *subA, *subB, *subC;
+
+	stride = *((int *)cl_arg);
+
 	/* .blas.ptr gives a pointer to the first element of the local copy */
 	subA = (float *)STARPU_MATRIX_GET_PTR(descr[0]);
 	subB = (float *)STARPU_MATRIX_GET_PTR(descr[1]);
 	subC = (float *)STARPU_MATRIX_GET_PTR(descr[2]);
 
-
 	/* .blas.nx is the number of rows (consecutive elements) and .blas.ny
 	 * is the number of lines that are separated by .blas.ld elements (ld
 	 * stands for leading dimension).
@@ -50,14 +55,18 @@ void cpu_mult(void *descr[], void *arg)
 	int i,j,k,ii,jj,kk;
 	for (i = 0; i < nyC*nxC; i++) subC[i] = 0;
 	//fprintf(stderr,"inside cpu_mult %dx%dx%d %d/%d on %d\n",nyC,nyA,nxC,starpu_worker_get_id(),STARPU_NMAXWORKERS,starpu_worker_get_devid(starpu_worker_get_id()));
-	for (i=0;i<nyC;i+=STRIDE) {
-		for (k=0;k<nyA;k+=STRIDE) {
-			for (j=0;j<nxC;j+=STRIDE) {
-				
-				for (ii = i; ii < i+STRIDE; ii+=2) {
+	for (i=0;i<nyC;i+=stride)
+	{
+		for (k=0;k<nyA;k+=stride)
+		{
+			for (j=0;j<nxC;j+=stride)
+			{
+				for (ii = i; ii < i+stride; ii+=2)
+				{
 					float *sC0=subC+ii*ldC+j;
 					float *sC1=subC+ii*ldC+ldC+j;
-					for (kk = k; kk < k+STRIDE; kk+=4) {
+					for (kk = k; kk < k+stride; kk+=4)
+					{
 						float alpha00=subB[kk +  ii*ldB];
 						float alpha01=subB[kk+1+ii*ldB];
 						float alpha10=subB[kk+  ii*ldB+ldB];
@@ -70,7 +79,8 @@ void cpu_mult(void *descr[], void *arg)
 						float *sA1=subA+kk*ldA+ldA+j;
 						float *sA2=subA+kk*ldA+2*ldA+j;
 						float *sA3=subA+kk*ldA+3*ldA+j;
-						for (jj = 0; jj < STRIDE; jj+=1) {
+						for (jj = 0; jj < stride; jj+=1)
+						{
 							sC0[jj] += alpha00*sA0[jj]+alpha01*sA1[jj]+alpha02*sA2[jj]+alpha03*sA3[jj];
 							sC1[jj] += alpha10*sA0[jj]+alpha11*sA1[jj]+alpha12*sA2[jj]+alpha13*sA3[jj];
 						}
@@ -80,11 +90,12 @@ void cpu_mult(void *descr[], void *arg)
 		}
 	}
 	//fprintf(stderr,"inside cpu_mult %dx%dx%d\n",nyC,nyA,nxC);
-
 }
+
 char* CPU = "cpu_mult";
 char* GPU = "gpu_mult";
-extern char *starpu_find_function(char *name, char *device) {
+extern char *starpu_find_function(char *name, char *device)
+{
 	if (!strcmp(device,"gpu")) return GPU;
 	return CPU;
 }

+ 2 - 1
julia/mult/gpu_mult.cu

@@ -1,6 +1,7 @@
 /* StarPU --- Runtime system for heterogeneous multicore architectures.
  *
- * Copyright (C) 2018                                     Alexis Juven
+ * Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+ * Copyright (C) 2018       Alexis Juven
  *
  * StarPU is free software; you can redistribute it and/or modify
  * it under the terms of the GNU Lesser General Public License as published by

+ 50 - 59
julia/mult/mult.c

@@ -1,10 +1,7 @@
 /* StarPU --- Runtime system for heterogeneous multicore architectures.
  *
- * Copyright (C) 2018                                     Alexis Juven
- * Copyright (C) 2012,2013                                Inria
- * Copyright (C) 2009-2011,2013-2015                      Université de Bordeaux
- * Copyright (C) 2010                                     Mehdi Juhoor
- * Copyright (C) 2010-2013,2015,2017                      CNRS
+ * Copyright (C) 2018       Alexis Juven
+ * Copyright (C) 2010-2020  Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
  *
  * StarPU is free software; you can redistribute it and/or modify
  * it under the terms of the GNU Lesser General Public License as published by
@@ -40,8 +37,6 @@
 
 #include <starpu.h>
 
-
-
 /*
  * That program should compute C = A * B
  *
@@ -63,43 +58,32 @@
 
  */
 
-
-
-
-
 //void gpu_mult(void **, void *);
 void cpu_mult(void **, void *);
 
-
 static struct starpu_perfmodel model =
 {
-		.type = STARPU_HISTORY_BASED,
-		.symbol = "history_perf"
+	.type = STARPU_HISTORY_BASED,
+	.symbol = "history_perf"
 };
 
 static struct starpu_codelet cl =
 {
-		.cpu_funcs = {cpu_mult},
-		.cpu_funcs_name = {"cpu_mult"},
-		//.cuda_funcs = {gpu_mult},
-		.nbuffers = 3,
-		.modes = {STARPU_R, STARPU_R, STARPU_W},
-		.model = &model
+	.cpu_funcs = {cpu_mult},
+	.cpu_funcs_name = {"cpu_mult"},
+	//.cuda_funcs = {gpu_mult},
+	.nbuffers = 3,
+	.modes = {STARPU_R, STARPU_R, STARPU_W},
+	.model = &model
 };
 
-
-void multiply_with_starpu(float *A, float *B, float *C,  unsigned xdim,  unsigned ydim,  unsigned zdim, unsigned nslicesx, unsigned nslicesy)
+void multiply_with_starpu(float *A, float *B, float *C,  unsigned xdim,  unsigned ydim,  unsigned zdim, unsigned nslicesx, unsigned nslicesy, int stride)
 {
 	starpu_data_handle_t A_handle, B_handle, C_handle;
 
-
-	starpu_matrix_data_register(&A_handle, STARPU_MAIN_RAM, (uintptr_t)A,
-			ydim, ydim, zdim, sizeof(float));
-	starpu_matrix_data_register(&B_handle, STARPU_MAIN_RAM, (uintptr_t)B,
-			zdim, zdim, xdim, sizeof(float));
-	starpu_matrix_data_register(&C_handle, STARPU_MAIN_RAM, (uintptr_t)C,
-			ydim, ydim, xdim, sizeof(float));
-
+	starpu_matrix_data_register(&A_handle, STARPU_MAIN_RAM, (uintptr_t)A, ydim, ydim, zdim, sizeof(float));
+	starpu_matrix_data_register(&B_handle, STARPU_MAIN_RAM, (uintptr_t)B, zdim, zdim, xdim, sizeof(float));
+	starpu_matrix_data_register(&C_handle, STARPU_MAIN_RAM, (uintptr_t)C, ydim, ydim, xdim, sizeof(float));
 
 	struct starpu_data_filter vert =
 	{
@@ -113,31 +97,32 @@ void multiply_with_starpu(float *A, float *B, float *C,  unsigned xdim,  unsigne
 			.nchildren = nslicesy
 	};
 
-
 	starpu_data_partition(B_handle, &vert);
 	starpu_data_partition(A_handle, &horiz);
 	starpu_data_map_filters(C_handle, 2, &vert, &horiz);
 
 	unsigned taskx, tasky;
 
-	for (taskx = 0; taskx < nslicesx; taskx++){
-		for (tasky = 0; tasky < nslicesy; tasky++){
-
+	for (taskx = 0; taskx < nslicesx; taskx++)
+	{
+		for (tasky = 0; tasky < nslicesy; tasky++)
+		{
 			struct starpu_task *task = starpu_task_create();
 
 			task->cl = &cl;
 			task->handles[0] = starpu_data_get_sub_data(A_handle, 1, tasky);
 			task->handles[1] = starpu_data_get_sub_data(B_handle, 1, taskx);
 			task->handles[2] = starpu_data_get_sub_data(C_handle, 2, taskx, tasky);
+			task->cl_arg = &stride;
+			task->cl_arg_size = sizeof(stride);
 
-			if (starpu_task_submit(task)!=0) fprintf(stderr,"submit task error\n");
-
+			int ret = starpu_task_submit(task);
+			STARPU_CHECK_RETURN_VALUE(ret, "starpu_task_submit");
 		}
 	}
 
 	starpu_task_wait_for_all();
 
-
 	starpu_data_unpartition(A_handle, STARPU_MAIN_RAM);
 	starpu_data_unpartition(B_handle, STARPU_MAIN_RAM);
 	starpu_data_unpartition(C_handle, STARPU_MAIN_RAM);
@@ -145,31 +130,27 @@ void multiply_with_starpu(float *A, float *B, float *C,  unsigned xdim,  unsigne
 	starpu_data_unregister(A_handle);
 	starpu_data_unregister(B_handle);
 	starpu_data_unregister(C_handle);
-
 }
 
-
-
 void init_rand(float * m, unsigned width, unsigned height)
 {
 	unsigned i,j;
 
-	for (j = 0 ; j < height ; j++){
-		for (i = 0 ; i < width ; i++){
+	for (j = 0 ; j < height ; j++)
+	{
+		for (i = 0 ; i < width ; i++)
+		{
 			m[j+i*height] = (float)(starpu_drand48());
 		}
 	}
 }
 
-
 void init_zero(float * m, unsigned width, unsigned height)
 {
 	memset(m, 0, sizeof(float) * width * height);
 }
 
-
-
-double min_time(unsigned nb_test, unsigned xdim, unsigned ydim, unsigned zdim, unsigned nsclicesx, unsigned nsclicesy)
+double min_time(unsigned nb_test, unsigned xdim, unsigned ydim, unsigned zdim, unsigned nsclicesx, unsigned nsclicesy, int stride)
 {
 	unsigned i;
 
@@ -179,8 +160,8 @@ double min_time(unsigned nb_test, unsigned xdim, unsigned ydim, unsigned zdim, u
 
 	double exec_times=-1;
 
-	for (i = 0 ; i < nb_test ; i++){
-
+	for (i = 0 ; i < nb_test ; i++)
+	{
 		double start, stop, exec_t;
 
 		init_rand(A, zdim, ydim);
@@ -188,7 +169,7 @@ double min_time(unsigned nb_test, unsigned xdim, unsigned ydim, unsigned zdim, u
 		init_zero(C, xdim, ydim);
 
 		start = starpu_timing_now();
-		multiply_with_starpu(A, B, C, xdim, ydim, zdim, nsclicesx, nsclicesy);
+		multiply_with_starpu(A, B, C, xdim, ydim, zdim, nsclicesx, nsclicesy, stride);
 		stop = starpu_timing_now();
 
 		exec_t = (stop - start)*1.e3; // Put in ns instead of us
@@ -201,34 +182,44 @@ double min_time(unsigned nb_test, unsigned xdim, unsigned ydim, unsigned zdim, u
 	return exec_times;
 }
 
-
-void display_times(unsigned start_dim, unsigned step_dim, unsigned stop_dim, unsigned nb_tests, unsigned nsclicesx, unsigned nsclicesy)
+void display_times(unsigned start_dim, unsigned step_dim, unsigned stop_dim, unsigned nb_tests, unsigned nsclicesx, unsigned nsclicesy, int stride)
 {
 	unsigned dim;
 
-	for (dim = start_dim ; dim <= stop_dim ; dim += step_dim){
-		double t = min_time(nb_tests, dim, dim, dim, nsclicesx, nsclicesy);
+	for (dim = start_dim ; dim <= stop_dim ; dim += step_dim)
+	{
+		double t = min_time(nb_tests, dim, dim, dim, nsclicesx, nsclicesy, stride);
 		printf("%f %f\n", dim*dim*4.*3./1024./1024, (2.*dim-1.)*dim*dim/t);
 	}
-
 }
 
+#define STRIDE_DEFAULT 8
 
 int main(int argc, char * argv[])
 {
-	if (starpu_init(NULL) != EXIT_SUCCESS){
+	int stride=STRIDE_DEFAULT;
+	if (argc >= 2)
+		stride = atoi(argv[1]);
+	if (stride % 4 != 0)
+	{
+		fprintf(stderr, "STRIDE must be a multiple of 4 (%d)\n", stride);
+		return -1;
+	}
+
+	if (starpu_init(NULL) != EXIT_SUCCESS)
+	{
 		fprintf(stderr, "ERROR\n");
 		return 77;
 	}
 
-	unsigned start_dim = 16*STRIDE;
-	unsigned step_dim = 4*STRIDE;
-	unsigned stop_dim = 4096;
+	unsigned start_dim = 16*stride;
+	unsigned step_dim = 4*stride;
+	unsigned stop_dim = 128*stride;
 	unsigned nb_tests = 10;
 	unsigned nsclicesx = 2;
 	unsigned nsclicesy = 2;
 
-	display_times(start_dim, step_dim, stop_dim, nb_tests, nsclicesx, nsclicesy);
+	display_times(start_dim, step_dim, stop_dim, nb_tests, nsclicesx, nsclicesy, stride);
 
 	starpu_shutdown();
 

+ 35 - 18
julia/mult/mult.jl

@@ -1,12 +1,24 @@
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
 import Libdl
 using StarPU
 using LinearAlgebra
 
-#shoud be the same as in the makefile
-const STRIDE = 72
-
 @target STARPU_CPU+STARPU_CUDA
-@codelet function matrix_mult(m1 :: Matrix{Float32}, m2 :: Matrix{Float32}, m3 :: Matrix{Float32}) :: Nothing
+@codelet function matrix_mult(m1 :: Matrix{Float32}, m2 :: Matrix{Float32}, m3 :: Matrix{Float32}, stride ::Int32) :: Nothing
 
     width_m2 :: Int32 = width(m2)
     height_m1 :: Int32 = height(m1)
@@ -57,25 +69,24 @@ const STRIDE = 72
 end
 
 
-@debugprint "starpu_init"
 starpu_init()
 
-function multiply_with_starpu(A :: Matrix{Float32}, B :: Matrix{Float32}, C :: Matrix{Float32}, nslicesx, nslicesy)
+function multiply_with_starpu(A :: Matrix{Float32}, B :: Matrix{Float32}, C :: Matrix{Float32}, nslicesx, nslicesy, stride)
     scale= 3
     tmin=0
-    vert = StarpuDataFilter(STARPU_MATRIX_FILTER_VERTICAL_BLOCK, nslicesx)
-    horiz = StarpuDataFilter(STARPU_MATRIX_FILTER_BLOCK, nslicesy)
+    vert = starpu_data_filter(STARPU_MATRIX_FILTER_VERTICAL_BLOCK, nslicesx)
+    horiz = starpu_data_filter(STARPU_MATRIX_FILTER_BLOCK, nslicesy)
     @starpu_block let
         hA,hB,hC = starpu_data_register(A, B, C)
         starpu_data_partition(hB, vert)
         starpu_data_partition(hA, horiz)
         starpu_data_map_filters(hC, vert, horiz)
         tmin=0
-        perfmodel = StarpuPerfmodel(
-            perf_type = STARPU_HISTORY_BASED,
+        perfmodel = starpu_perfmodel(
+            perf_type = starpu_perfmodel_type(STARPU_HISTORY_BASED),
             symbol = "history_perf"
         )
-        cl = StarpuCodelet(
+        cl = starpu_codelet(
             cpu_func = CPU_CODELETS["matrix_mult"],
             # cuda_func = CUDA_CODELETS["matrix_mult"],
             #opencl_func="ocl_matrix_mult",
@@ -89,7 +100,7 @@ function multiply_with_starpu(A :: Matrix{Float32}, B :: Matrix{Float32}, C :: M
                 for taskx in (1 : nslicesx)
                     for tasky in (1 : nslicesy)
                         handles = [hA[tasky], hB[taskx], hC[taskx, tasky]]
-                        task = StarpuTask(cl = cl, handles = handles)
+                        task = starpu_task(cl = cl, handles = handles, cl_arg=(Int32(stride),))
                         starpu_task_submit(task)
                         #@starpu_async_cl matrix_mult(hA[tasky], hB[taskx], hC[taskx, tasky])
                     end
@@ -124,12 +135,12 @@ function approximately_equals(
     return true
 end
 
-function compute_times(io,start_dim, step_dim, stop_dim, nslicesx, nslicesy)
+function compute_times(io,start_dim, step_dim, stop_dim, nslicesx, nslicesy, stride)
     for dim in (start_dim : step_dim : stop_dim)
         A = Array(rand(Cfloat, dim, dim))
         B = Array(rand(Cfloat, dim, dim))
         C = zeros(Float32, dim, dim)
-        mt =  multiply_with_starpu(A, B, C, nslicesx, nslicesy)
+        mt =  multiply_with_starpu(A, B, C, nslicesx, nslicesy, stride)
         flops = (2*dim-1)*dim*dim/mt
         size=dim*dim*4*3/1024/1024
         println(io,"$size $flops")
@@ -137,10 +148,16 @@ function compute_times(io,start_dim, step_dim, stop_dim, nslicesx, nslicesy)
     end
 end
 
-
-io=open(ARGS[1],"w")
-compute_times(io,16*STRIDE,4*STRIDE,4096,2,2)
+if size(ARGS, 1) < 2
+    stride=4
+    filename="x.dat"
+else
+    stride=parse(Int, ARGS[1])
+    filename=ARGS[2]
+end
+io=open(filename,"w")
+compute_times(io,16*stride,4*stride,128*stride,2,2,stride)
 close(io)
-@debugprint "starpu_shutdown"
+
 starpu_shutdown()
 

julia/mult/mult.plot → julia/examples/mult/mult.plot


+ 57 - 0
julia/examples/mult/mult_native.jl

@@ -0,0 +1,57 @@
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+import Libdl
+using StarPU
+using LinearAlgebra
+
+function multiply_without_starpu(A :: Matrix{Float32}, B :: Matrix{Float32}, C :: Matrix{Float32}, nslicesx, nslicesy, stride)
+    tmin = 0
+    for i in (1 : 10 )
+        t=time_ns()
+        C = A * B;
+        t=time_ns() - t
+        if (tmin==0 || tmin>t)
+            tmin=t
+        end
+    end
+    return tmin
+end
+
+
+function compute_times(io,start_dim, step_dim, stop_dim, nslicesx, nslicesy, stride)
+    for dim in (start_dim : step_dim : stop_dim)
+        A = Array(rand(Cfloat, dim, dim))
+        B = Array(rand(Cfloat, dim, dim))
+        C = zeros(Float32, dim, dim)
+        mt =  multiply_without_starpu(A, B, C, nslicesx, nslicesy, stride)
+        flops = (2*dim-1)*dim*dim/mt
+        size=dim*dim*4*3/1024/1024
+        println(io,"$size $flops")
+        println("$size $flops")
+    end
+end
+
+if size(ARGS, 1) < 2
+    stride=4
+    filename="x.dat"
+else
+    stride=parse(Int, ARGS[1])
+    filename=ARGS[2]
+end
+io=open(filename,"w")
+compute_times(io,16*stride,4*stride,128*stride,2,2,stride)
+close(io)
+

+ 22 - 0
julia/examples/mult/mult_starpu.sh

@@ -0,0 +1,22 @@
+#!/bin/bash
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+
+$(dirname $0)/../execute.sh mult/mult.jl
+$(dirname $0)/../execute.sh mult/mult_native.jl
+$(dirname $0)/../execute.sh -calllib mult/cpu_mult.c mult/mult.jl
+
+

+ 38 - 0
julia/examples/mult/perf.sh

@@ -0,0 +1,38 @@
+#!/bin/bash
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+
+stride=72
+#stride=4
+
+export STARPU_NOPENCL=0
+export STARPU_SCHED=dmda
+export STARPU_CALIBRATE=1
+
+rm -f ./cstarpu.dat julia_generatedc.dat julia_native.dat julia_calllib.dat
+
+$(dirname $0)/mult $stride > ./cstarpu.dat
+$(dirname $0)/../execute.sh mult/mult.jl $stride julia_generatedc.dat
+$(dirname $0)/../execute.sh mult/mult_native.jl $stride julia_native.dat
+$(dirname $0)/../execute.sh -calllib mult/cpu_mult.c mult/mult.jl $stride julia_calllib.dat
+
+(
+    cat <<EOF
+set output "comparison.pdf"
+set term pdf
+plot "julia_native.dat" w l,"cstarpu.dat" w l,"julia_generatedc.dat" w l,"julia_calllib.dat" w l
+EOF
+) | gnuplot

julia/mult/res/mult_cstarpu_gcc9_s72_2x2_b4x2.dat → julia/examples/mult/res/mult_cstarpu_gcc9_s72_2x2_b4x2.dat


julia/mult/res/mult_gen_gcc9_1x4.dat → julia/examples/mult/res/mult_gen_gcc9_1x4.dat


julia/mult/res/mult_gen_gcc9_4x1.dat → julia/examples/mult/res/mult_gen_gcc9_4x1.dat


julia/mult/res/mult_gen_gcc9_s100_4x1.dat → julia/examples/mult/res/mult_gen_gcc9_s100_4x1.dat


julia/mult/res/mult_gen_gcc9_s50_4x1.dat → julia/examples/mult/res/mult_gen_gcc9_s50_4x1.dat


julia/mult/res/mult_gen_gcc9_s64_16x16_b4x2.dat → julia/examples/mult/res/mult_gen_gcc9_s64_16x16_b4x2.dat


julia/mult/res/mult_gen_gcc9_s64_4x4_b4x2.dat → julia/examples/mult/res/mult_gen_gcc9_s64_4x4_b4x2.dat


julia/mult/res/mult_gen_gcc9_s64_8x1_b4x2.dat → julia/examples/mult/res/mult_gen_gcc9_s64_8x1_b4x2.dat


julia/mult/res/mult_gen_gcc9_s64_8x8_b4x2.dat → julia/examples/mult/res/mult_gen_gcc9_s64_8x8_b4x2.dat


julia/mult/res/mult_gen_gcc9_s72_16x18_b4x2.dat → julia/examples/mult/res/mult_gen_gcc9_s72_16x18_b4x2.dat


julia/mult/res/mult_gen_gcc9_s72_16x8_b4x2.dat → julia/examples/mult/res/mult_gen_gcc9_s72_16x8_b4x2.dat


julia/mult/res/mult_gen_gcc9_s72_2x2.dat → julia/examples/mult/res/mult_gen_gcc9_s72_2x2.dat


julia/mult/res/mult_gen_gcc9_s72_2x2_b4x2.dat → julia/examples/mult/res/mult_gen_gcc9_s72_2x2_b4x2.dat


julia/mult/res/mult_gen_gcc9_s72_2x2_b4x4.dat → julia/examples/mult/res/mult_gen_gcc9_s72_2x2_b4x4.dat


julia/mult/res/mult_gen_gcc9_s72_2x2_b8x2.dat → julia/examples/mult/res/mult_gen_gcc9_s72_2x2_b8x2.dat


julia/mult/res/mult_gen_gcc9_s72_4x1.dat → julia/examples/mult/res/mult_gen_gcc9_s72_4x1.dat


julia/mult/res/mult_gen_gcc9_s72_4x4_b4x2.dat → julia/examples/mult/res/mult_gen_gcc9_s72_4x4_b4x2.dat


julia/mult/res/mult_gen_gcc9_s72_8x8_b4x2.dat → julia/examples/mult/res/mult_gen_gcc9_s72_8x8_b4x2.dat


julia/mult/res/mult_gen_gcc9_s80_4x1.dat → julia/examples/mult/res/mult_gen_gcc9_s80_4x1.dat


julia/mult/res/mult_gen_icc_s72_2x1_b4x2.dat → julia/examples/mult/res/mult_gen_icc_s72_2x1_b4x2.dat


julia/mult/res/mult_gen_icc_s72_4x4_b4x2.dat → julia/examples/mult/res/mult_gen_icc_s72_4x4_b4x2.dat


julia/mult/res/mult_native.dat → julia/examples/mult/res/mult_native.dat


julia/mult/res/mult_nogen_gcc9_s72_2x2_b2x2.dat → julia/examples/mult/res/mult_nogen_gcc9_s72_2x2_b2x2.dat


julia/mult/res/mult_nogen_gcc9_s72_2x2_b4x2.dat → julia/examples/mult/res/mult_nogen_gcc9_s72_2x2_b4x2.dat


julia/mult/res/mult_nogen_icc_s72-36_2x2_b4x2.dat → julia/examples/mult/res/mult_nogen_icc_s72-36_2x2_b4x2.dat


julia/mult/res/mult_nogen_icc_s72_2x2_b4x2.dat → julia/examples/mult/res/mult_nogen_icc_s72_2x2_b4x2.dat


+ 0 - 0
julia/mult/res/mult_nogen_icc_s72x2_2x2_b4x2.dat


Some files were not shown because too many files changed in this diff