Merge branch 'master' of https://scm.gforge.inria.fr/anonscm/git/starpu/starpu

root, 5 years ago
parent
commit
879cdfc927
100 files changed, with 2336 additions and 2465 deletions
  1. 4 0
      ChangeLog
  2. 8 4
      Makefile.am
  3. 4 0
      STARPU-VERSION
  4. 77 34
      configure.ac
  5. 3 2
      contrib/ci.inria.fr/job-1-check.sh
  6. 4 0
      doc/doxygen/Makefile.am
  7. 11 4
      doc/doxygen/chapters/470_simgrid.doxy
  8. 32 0
      doc/doxygen/chapters/501_environment_variables.doxy
  9. 34 0
      doc/doxygen/dev/starpu_check_include.sh
  10. 4 6
      examples/Makefile.am
  11. 21 1
      examples/cholesky/cholesky.sh
  12. 10 3
      examples/common/blas.h
  13. 2 0
      examples/mlr/mlr.c
  14. 1 1
      examples/mult/sgemm.sh
  15. 14 4
      examples/mult/xgemm.c
  16. 2 4
      examples/stencil/Makefile.am
  17. 8 0
      examples/tag_example/tag_example.c
  18. 11 0
      include/starpu.h
  19. 246 15
      include/starpu_bitmap.h
  20. 3 3
      include/starpu_sched_component.h
  21. 3 1
      include/starpu_task.h
  22. 27 0
      julia/Makefile.am
  23. 127 0
      julia/Manifest.toml
  24. 3 0
      julia/StarPU.jl/Project.toml
  25. 53 0
      julia/README
  26. 0 8
      julia/StarPU.jl/Makefile
  27. 0 4
      julia/StarPU.jl/Manifest.toml
  28. 0 2
      julia/StarPU.jl/REQUIRE
  29. 0 1290
      julia/StarPU.jl/src/StarPU.jl
  30. 0 349
      julia/StarPU.jl/src/compiler/cuda.jl
  31. 0 13
      julia/StarPU.jl/src/compiler/include.jl
  32. 0 38
      julia/StarPU.jl/src/compiler/utils.jl
  33. 0 133
      julia/StarPU.jl/src/jlstarpu_data_handles.c
  34. 0 73
      julia/StarPU.jl/src/jlstarpu_task.h
  35. 0 208
      julia/StarPU.jl/src/jlstarpu_task_submit.c
  36. 0 67
      julia/StarPU.jl/src/jlstarpu_utils.h
  37. 139 0
      julia/examples/Makefile.am
  38. 110 0
      julia/examples/axpy/axpy.jl
  39. 19 0
      julia/examples/axpy/axpy.sh
  40. 1 0
      julia/black_scholes/black_scholes.c
  41. 15 2
      julia/black_scholes/black_scholes.jl
  42. 93 0
      julia/examples/callback/callback.c
  43. 76 0
      julia/examples/callback/callback.jl
  44. 19 0
      julia/examples/callback/callback.sh
  45. 32 0
      julia/examples/check_deps/check_deps.jl
  46. 20 0
      julia/examples/check_deps/check_deps.sh
  47. 104 0
      julia/examples/dependency/end_dep.jl
  48. 18 0
      julia/examples/dependency/end_dep.sh
  49. 122 0
      julia/examples/dependency/tag_dep.jl
  50. 18 0
      julia/examples/dependency/tag_dep.sh
  51. 88 0
      julia/examples/dependency/task_dep.jl
  52. 18 0
      julia/examples/dependency/task_dep.sh
  53. 47 0
      julia/examples/execute.sh.in
  54. 130 0
      julia/examples/gemm/gemm.jl
  55. 21 0
      julia/examples/gemm/gemm.sh
  56. 56 0
      julia/examples/gemm/gemm_native.jl
  57. 5 7
      julia/mandelbrot/Makefile
  58. 79 0
      julia/examples/mandelbrot/cpu_mandelbrot.c
  59. 8 10
      julia/StarPU.jl/src/jlstarpu_simple_functions.c
  60. 78 56
      julia/mandelbrot/mandelbrot.c
  61. 26 12
      julia/mandelbrot/mandelbrot.jl
  62. 21 0
      julia/examples/mandelbrot/mandelbrot.sh
  63. 22 5
      julia/mandelbrot/mandelbrot_native.jl
  64. 11 15
      julia/mult/Makefile
  65. 0 0
      julia/examples/mult/README
  66. 24 13
      julia/mult/cpu_mult.c
  67. 2 1
      julia/mult/gpu_mult.cu
  68. 50 59
      julia/mult/mult.c
  69. 35 18
      julia/mult/mult.jl
  70. 0 0
      julia/examples/mult/mult.plot
  71. 57 0
      julia/examples/mult/mult_native.jl
  72. 22 0
      julia/examples/mult/mult_starpu.sh
  73. 38 0
      julia/examples/mult/perf.sh
  74. 0 0
      julia/examples/mult/res/mult_cstarpu_gcc9_s72_2x2_b4x2.dat
  75. 0 0
      julia/examples/mult/res/mult_gen_gcc9_1x4.dat
  76. 0 0
      julia/examples/mult/res/mult_gen_gcc9_4x1.dat
  77. 0 0
      julia/examples/mult/res/mult_gen_gcc9_s100_4x1.dat
  78. 0 0
      julia/examples/mult/res/mult_gen_gcc9_s50_4x1.dat
  79. 0 0
      julia/examples/mult/res/mult_gen_gcc9_s64_16x16_b4x2.dat
  80. 0 0
      julia/examples/mult/res/mult_gen_gcc9_s64_4x4_b4x2.dat
  81. 0 0
      julia/examples/mult/res/mult_gen_gcc9_s64_8x1_b4x2.dat
  82. 0 0
      julia/examples/mult/res/mult_gen_gcc9_s64_8x8_b4x2.dat
  83. 0 0
      julia/examples/mult/res/mult_gen_gcc9_s72_16x18_b4x2.dat
  84. 0 0
      julia/examples/mult/res/mult_gen_gcc9_s72_16x8_b4x2.dat
  85. 0 0
      julia/examples/mult/res/mult_gen_gcc9_s72_2x2.dat
  86. 0 0
      julia/examples/mult/res/mult_gen_gcc9_s72_2x2_b4x2.dat
  87. 0 0
      julia/examples/mult/res/mult_gen_gcc9_s72_2x2_b4x4.dat
  88. 0 0
      julia/examples/mult/res/mult_gen_gcc9_s72_2x2_b8x2.dat
  89. 0 0
      julia/examples/mult/res/mult_gen_gcc9_s72_4x1.dat
  90. 0 0
      julia/examples/mult/res/mult_gen_gcc9_s72_4x4_b4x2.dat
  91. 0 0
      julia/examples/mult/res/mult_gen_gcc9_s72_8x8_b4x2.dat
  92. 0 0
      julia/examples/mult/res/mult_gen_gcc9_s80_4x1.dat
  93. 0 0
      julia/examples/mult/res/mult_gen_icc_s72_2x1_b4x2.dat
  94. 0 0
      julia/examples/mult/res/mult_gen_icc_s72_4x4_b4x2.dat
  95. 0 0
      julia/examples/mult/res/mult_native.dat
  96. 0 0
      julia/examples/mult/res/mult_nogen_gcc9_s72_2x2_b2x2.dat
  97. 0 0
      julia/examples/mult/res/mult_nogen_gcc9_s72_2x2_b4x2.dat
  98. 0 0
      julia/examples/mult/res/mult_nogen_icc_s72-36_2x2_b4x2.dat
  99. 0 0
      julia/examples/mult/res/mult_nogen_icc_s72_2x2_b4x2.dat
  100. 0 0
      julia/mult/res/mult_nogen_icc_s72x2_2x2_b4x2.dat

+ 4 - 0
ChangeLog

@@ -18,6 +18,7 @@ StarPU 1.4.0 (git revision xxxx)
 ==============================================
 New features:
   * Fault tolerance support with starpu_task_ft_failed().
+  * Julia programming interface.
   * Add get_max_size method to data interfaces for applications using data with
     variable size to express their maximal potential size.
   * New offline tool to draw graph showing elapsed time between sent
@@ -52,6 +53,9 @@ Small features:
   * Add STARPU_LIMIT_CPU_NUMA_MEM environment variable.
   * Add STARPU_WORKERS_GETBIND environment variable.
   * Add STARPU_SCHED_SIMPLE_DECIDE_ALWAYS modular scheduler flag.
+  * Add STARPU_LIMIT_BANDWIDTH environment variable.
+  * Add field starpu_conf::precedence_over_environment_variables to ignore
+    environment variables when parameters are set directly in starpu_conf
 
 StarPU 1.3.3 (git revision 11afc5b007fe1ab1c729b55b47a5a98ef7f3cfad)
 ====================================================================

+ 8 - 4
Makefile.am

@@ -57,6 +57,10 @@ if STARPU_BUILD_SC_HYPERVISOR
 SUBDIRS += sc_hypervisor
 endif
 
+if STARPU_USE_JULIA
+SUBDIRS += julia
+endif
+
 pkgconfigdir = $(libdir)/pkgconfig
 pkgconfig_DATA = libstarpu.pc starpu-1.0.pc starpu-1.1.pc starpu-1.2.pc starpu-1.3.pc
 
@@ -159,28 +163,28 @@ DISTCLEANFILES = STARPU-REVISION
 recheck:
 	RET=0 ; \
 	for i in $(SUBDIRS) ; do \
-		make -C $$i recheck || RET=1 ; \
+		$(MAKE) -C $$i recheck || RET=1 ; \
 	done ; \
 	exit $$RET
 
 showfailed:
 	@RET=0 ; \
 	for i in $(SUBDIRS) ; do \
-		make -s -C $$i showfailed || RET=1 ; \
+		$(MAKE) -s -C $$i showfailed || RET=1 ; \
 	done ; \
 	exit $$RET
 
 showcheck:
 	RET=0 ; \
 	for i in $(SUBDIRS) ; do \
-		make -C $$i showcheck || RET=1 ; \
+		$(MAKE) -C $$i showcheck || RET=1 ; \
 	done ; \
 	exit $$RET
 
 showsuite:
 	RET=0 ; \
 	for i in $(SUBDIRS) ; do \
-		make -C $$i showsuite || RET=1 ; \
+		$(MAKE) -C $$i showsuite || RET=1 ; \
 	done ; \
 	exit $$RET
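
Note on the recipes above: each one visits every subdirectory, records a failure in RET, and keeps going so that one failing directory does not hide results from the others. Switching from a literal `make` to `$(MAKE)` is the form GNU make recommends for recursive calls, since it lets options and the job server reach the sub-make. A minimal shell sketch of the same "continue on error, fail at the end" loop follows; `run_in_each` and `dir_exists` are hypothetical names used only for illustration, with `dir_exists` standing in for the `$(MAKE) -C dir target` call.

```shell
#!/bin/sh
# run_in_each CMD DIR...
# Run CMD once per directory; remember any failure but keep iterating.
# CMD is a hypothetical stand-in for "$(MAKE) -C DIR target".
run_in_each() {
    cmd=$1
    shift
    ret=0
    for dir in "$@"; do
        "$cmd" "$dir" || ret=1
    done
    return "$ret"
}

# Hypothetical per-directory check: succeeds only if the directory exists.
dir_exists() {
    test -d "$1"
}
```

Calling `run_in_each dir_exists src tools` would return non-zero if either directory is missing, but only after both have been checked.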
 
 

+ 4 - 0
STARPU-VERSION

@@ -60,3 +60,7 @@ LIBSOCL_INTERFACE_AGE=0		# set to CURRENT - PREVIOUS interface
 LIBSTARPURM_INTERFACE_CURRENT=0	# increment upon ABI change
 LIBSTARPURM_INTERFACE_REVISION=0	# increment upon implementation change
 LIBSTARPURM_INTERFACE_AGE=0	# set to CURRENT - PREVIOUS interface
+
+LIBSTARPUJULIA_INTERFACE_CURRENT=0	# increment upon ABI change
+LIBSTARPUJULIA_INTERFACE_REVISION=0	# increment upon implementation change
+LIBSTARPUJULIA_INTERFACE_AGE=0		# set to CURRENT - PREVIOUS interface
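
Note on the three values above: CURRENT, REVISION and AGE follow the libtool versioning scheme. Assuming they are passed to libtool as `-version-info CURRENT:REVISION:AGE` (the diff shown here does not include that part), libtool on Linux names the shared object with the suffix `(CURRENT - AGE).AGE.REVISION`. A small sketch of that arithmetic; the function name is hypothetical.

```shell
#!/bin/sh
# libtool_linux_suffix CURRENT REVISION AGE
# Print the numeric suffix libtool gives a shared library on Linux
# for -version-info CURRENT:REVISION:AGE, i.e. (CURRENT-AGE).AGE.REVISION.
libtool_linux_suffix() {
    current=$1
    revision=$2
    age=$3
    echo "$((current - age)).$age.$revision"
}
```

With the initial values in this hunk, `libtool_linux_suffix 0 0 0` prints `0.0.0`.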

+ 77 - 34
configure.ac

@@ -59,6 +59,9 @@ AC_SUBST([LIBSTARPURM_INTERFACE_AGE])
 AC_SUBST([LIBSOCL_INTERFACE_CURRENT])
 AC_SUBST([LIBSOCL_INTERFACE_REVISION])
 AC_SUBST([LIBSOCL_INTERFACE_AGE])
+AC_SUBST([LIBSTARPUJULIA_INTERFACE_CURRENT])
+AC_SUBST([LIBSTARPUJULIA_INTERFACE_REVISION])
+AC_SUBST([LIBSTARPUJULIA_INTERFACE_AGE])
 
 AC_CANONICAL_SYSTEM
 
@@ -88,11 +91,21 @@ AC_CHECK_PROGS(PROG_DATE,gdate date)
 dnl locate pkg-config
 PKG_PROG_PKG_CONFIG
 
+AC_ARG_ENABLE(simgrid, [AS_HELP_STRING([--enable-simgrid],
+			[Enable simulating execution in simgrid])],
+			enable_simgrid=$enableval, enable_simgrid=no)
+
 if test x$enable_perf_debug = xyes; then
     enable_shared=no
 fi
+
 default_enable_mpi_check=maybe
-default_enable_mpi=maybe
+
+if test x$enable_simgrid = xyes ; then
+	default_enable_mpi=no
+else
+	default_enable_mpi=maybe
+fi
 
 ###############################################################################
 #                                                                             #
@@ -135,9 +148,6 @@ AC_ARG_WITH(simgrid-lib-dir,
 		enable_simgrid=yes
 	], [simgrid_lib_dir=no])
 
-AC_ARG_ENABLE(simgrid, [AS_HELP_STRING([--enable-simgrid],
-			[Enable simulating execution in simgrid])],
-			enable_simgrid=$enableval, enable_simgrid=no)
 if test x$enable_simgrid = xyes ; then
    	if test -n "$SIMGRID_CFLAGS" ; then
 	   	CFLAGS="$SIMGRID_CFLAGS $CFLAGS"
@@ -189,8 +199,8 @@ if test x$enable_simgrid = xyes ; then
 	AC_CHECK_TYPES([smx_actor_t], [AC_DEFINE([STARPU_HAVE_SMX_ACTOR_T], [1], [Define to 1 if you have the smx_actor_t type.])], [], [[#include <simgrid/simix.h>]])
 
 	# Latest functions
-	AC_CHECK_FUNCS([MSG_process_attach sg_actor_attach sg_actor_init sg_actor_set_stacksize MSG_zone_get_hosts sg_zone_get_hosts MSG_process_self_name MSG_process_userdata_init sg_actor_data])
-	AC_CHECK_FUNCS([xbt_mutex_try_acquire smpi_process_set_user_data SMPI_thread_create sg_zone_get_by_name sg_link_name sg_host_route sg_host_self sg_host_list sg_host_speed simcall_process_create sg_config_continue_after_help])
+	AC_CHECK_FUNCS([MSG_process_attach sg_actor_attach sg_actor_init sg_actor_set_stacksize sg_actor_on_exit MSG_zone_get_hosts sg_zone_get_hosts MSG_process_self_name MSG_process_userdata_init sg_actor_data])
+	AC_CHECK_FUNCS([xbt_mutex_try_acquire smpi_process_set_user_data SMPI_thread_create sg_zone_get_by_name sg_link_name sg_link_bandwidth_set sg_host_route sg_host_self sg_host_list sg_host_speed simcall_process_create sg_config_continue_after_help])
 	AC_CHECK_FUNCS([simgrid_init], [AC_DEFINE([STARPU_SIMGRID_HAVE_SIMGRID_INIT], [1], [Define to 1 if you have the `simgrid_init' function.])])
 	AC_CHECK_FUNCS([xbt_barrier_init], [AC_DEFINE([STARPU_SIMGRID_HAVE_XBT_BARRIER_INIT], [1], [Define to 1 if you have the `xbt_barrier_init' function.])])
 	AC_CHECK_FUNCS([sg_actor_sleep_for sg_actor_self sg_actor_ref sg_host_get_properties sg_host_send_to sg_host_sendto sg_cfg_set_int sg_actor_self_execute sg_actor_execute simgrid_get_clock])
@@ -372,6 +382,30 @@ AC_MSG_CHECKING(whether mpicxx is available)
 AC_MSG_RESULT($mpicxx_path)
 AC_SUBST(MPICXX, $mpicxx_path)
 
+# Check if mpiexec is available
+if test x$enable_simgrid = xyes ; then
+    DEFAULT_MPIEXEC=smpirun
+    AC_ARG_WITH(smpirun, [AS_HELP_STRING([--with-smpirun[=<name of smpirun or path to smpirun>]], [Name or path of the smpirun helper])], [DEFAULT_MPIEXEC=$withval])
+else
+    DEFAULT_MPIEXEC=mpiexec
+    AC_ARG_WITH(mpiexec, [AS_HELP_STRING([--with-mpiexec=<name of mpiexec or path to mpiexec>], [Name or path of mpiexec])], [DEFAULT_MPIEXEC=$withval])
+fi
+
+case $DEFAULT_MPIEXEC in
+    /*) mpiexec_path="$DEFAULT_MPIEXEC" ;;
+    *)  AC_PATH_PROG(mpiexec_path, $DEFAULT_MPIEXEC, [no], [$MPIPATH])
+esac
+AC_MSG_CHECKING(whether mpiexec is available)
+AC_MSG_RESULT($mpiexec_path)
+
+# We test if MPIEXEC exists
+if test ! -x $mpiexec_path; then
+    AC_MSG_RESULT(The mpiexec script '$mpiexec_path' is not valid)
+    default_enable_mpi_check=no
+    mpiexec_path=""
+fi
+AC_SUBST(MPIEXEC,$mpiexec_path)
+
 ###############################################################################
 #                                                                             #
 #                                    MPI                                      #
@@ -504,32 +538,6 @@ if test x$enable_mpi = xno ; then
     running_mpi_check=no
 fi
 
-if test x$enable_mpi = xyes -a x$running_mpi_check = xyes ; then
-    # Check if mpiexec is available
-    if test x$enable_simgrid = xyes ; then
-	DEFAULT_MPIEXEC=smpirun
-        AC_ARG_WITH(smpirun, [AS_HELP_STRING([--with-smpirun[=<name of smpirun or path to smpirun>]], [Name or path of the smpirun helper])], [DEFAULT_MPIEXEC=$withval])
-    else
-	DEFAULT_MPIEXEC=mpiexec
-	AC_ARG_WITH(mpiexec, [AS_HELP_STRING([--with-mpiexec=<name of mpiexec or path to mpiexec>], [Name or path of mpiexec])], [DEFAULT_MPIEXEC=$withval])
-    fi
-
-    case $DEFAULT_MPIEXEC in
-	/*) mpiexec_path="$DEFAULT_MPIEXEC" ;;
-	*)  AC_PATH_PROG(mpiexec_path, $DEFAULT_MPIEXEC, [no], [$MPIPATH])
-    esac
-    AC_MSG_CHECKING(whether mpiexec is available)
-    AC_MSG_RESULT($mpiexec_path)
-
-    # We test if MPIEXEC exists
-    if test ! -x $mpiexec_path; then
-        AC_MSG_RESULT(The mpiexec script '$mpiexec_path' is not valid)
-        running_mpi_check=no
-        mpiexec_path=""
-    fi
-    AC_SUBST(MPIEXEC,$mpiexec_path)
-fi
-
 AM_CONDITIONAL(STARPU_MPI_CHECK, test x$running_mpi_check = xyes)
 AC_MSG_CHECKING(whether MPI tests should be run)
 AC_MSG_RESULT($running_mpi_check)
@@ -552,7 +560,7 @@ fi
 if test x$enable_mpi = xyes ; then
     if test x$enable_simgrid = xyes ; then
         if test x$enable_shared = xyes ; then
-	    AC_MSG_ERROR([MPI with simgrid can not work with shared libraries, if you need the MPI support, theb use --disable-shared to fix this, else disable MPI with --disable-mpi])
+	    AC_MSG_ERROR([MPI with simgrid can not work with shared libraries, if you need the MPI support, then use --disable-shared to fix this, else disable MPI with --disable-mpi])
         else
 	    CFLAGS="$CFLAGS -fPIC"
 	    CXXFLAGS="$CXXFLAGS -fPIC"
@@ -1273,7 +1281,9 @@ AC_MSG_CHECKING(whether CUDA should be used)
 AC_MSG_RESULT($enable_cuda)
 AC_SUBST(STARPU_USE_CUDA, $enable_cuda)
 AM_CONDITIONAL(STARPU_USE_CUDA, test x$enable_cuda = xyes)
+cc_or_nvcc=$CC
 if test x$enable_cuda = xyes; then
+   	cc_or_nvcc=$NVCC
 	AC_DEFINE(STARPU_USE_CUDA, [1], [CUDA support is activated])
 
 	# On Darwin, the libstdc++ dependency is not automatically added by nvcc
@@ -1361,6 +1371,8 @@ if test x$enable_cuda = xyes; then
 	LIBS="${SAVED_LIBS}"
 fi
 
+AC_SUBST(CC_OR_NVCC, $cc_or_nvcc)
+
 have_magma=no
 if test x$enable_cuda = xyes; then
 	PKG_CHECK_MODULES([MAGMA],  [magma], [
@@ -3408,6 +3420,27 @@ AM_CONDITIONAL(AVAILABLE_DOC, [test x$available_doc != xno])
 
 ###############################################################################
 #                                                                             #
+#                                Julia                                        #
+#                                                                             #
+###############################################################################
+AC_ARG_ENABLE(julia, [AS_HELP_STRING([--enable-julia],
+			[enable the Julia extension])],
+			enable_julia=$enableval, enable_julia=no)
+if test "$enable_julia" = "yes" ; then
+   # Check whether the julia compiler is available
+   AC_PATH_PROG(juliapath, julia)
+   AC_MSG_CHECKING(whether julia is available)
+   AC_MSG_RESULT($juliapath)
+   if test ! -x "$juliapath" ; then
+      AC_MSG_ERROR(Julia compiler '$juliapath' is not valid)
+      enable_julia=no
+   fi
+fi
+AM_CONDITIONAL([STARPU_USE_JULIA], [test "x$enable_julia" = "xyes"])
+AC_SUBST(JULIA, $juliapath)
+
+###############################################################################
+#                                                                             #
 #                                Final settings                               #
 #                                                                             #
 ###############################################################################
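
Note on the Julia check above: it relies on `test ! -x` applied to a path variable. When such a variable is empty and unquoted, the shell drops it, `test` sees only `! -x`, and the result is "false", so the error branch is skipped. The sketch below shows a more defensive form; `is_usable_program` is a hypothetical helper name, not something defined by configure.ac.

```shell
#!/bin/sh
# is_usable_program PATH
# True only when PATH is a non-empty string naming an executable file.
is_usable_program() {
    test -n "$1" && test -x "$1"
}

# Demonstration of the pitfall: with an empty, unquoted variable the
# command below collapses to `test ! -x`, which tests whether the
# string "-x" is empty and therefore reports failure (no error raised).
empty=""
if test ! -x $empty; then
    unquoted_check_fired=yes
else
    unquoted_check_fired=no
fi
```

After running the block, `unquoted_check_fired` is `no`, whereas `is_usable_program "$empty"` correctly reports that nothing usable was found.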
@@ -3486,6 +3519,10 @@ AC_CONFIG_COMMANDS([executable-scripts], [
   test -e tools/starpu_paje_state_stats.R || ln -sf $ac_abs_top_srcdir/tools/starpu_paje_state_stats.R tools/starpu_paje_state_stats.R
   test -e tools/starpu_trace_state_stats.py || ln -sf $ac_abs_top_srcdir/tools/starpu_trace_state_stats.py tools/starpu_trace_state_stats.py
   chmod +x tools/starpu_trace_state_stats.py
+  chmod +x julia/examples/execute.sh
+  for x in julia/examples/check_deps/check_deps.sh julia/examples/mult/mult_starpu.sh julia/examples/mult/perf.sh julia/examples/variable/variable.sh julia/examples/task_insert_color/task_insert_color.sh julia/examples/vector_scal/vector_scal.sh julia/examples/mandelbrot/mandelbrot.sh julia/examples/callback/callback.sh julia/examples/dependency/task_dep.sh julia/examples/dependency/tag_dep.sh julia/examples/dependency/end_dep.sh julia/examples/axpy/axpy.sh julia/examples/gemm/gemm.sh; do
+      test -e $x || mkdir -p $(dirname $x) && ln -sf $ac_abs_top_srcdir/$x $(dirname $x)
+  done
 ])
 
 # Create links to ICD files in build/socl/vendors directory. SOCL will use this
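
Note on the `test -e ... || mkdir -p ... && ln -sf ...` line in the loop above: in POSIX shell, `||` and `&&` have equal precedence and associate to the left, so `A || B && C` is read as `(A || B) && C`. The `ln -sf` step therefore runs whether or not the file already exists, which is harmless with `-f` but may not be what a reader expects. A sketch with explicit grouping, for comparison; `ensure_link` is a hypothetical helper, and unlike the original it links to the destination path itself rather than into its directory.

```shell
#!/bin/sh
# ensure_link SRC DEST
# Create DEST's parent directory and a symlink DEST -> SRC,
# but only when DEST does not exist yet.
ensure_link() {
    src=$1
    dest=$2
    test -e "$dest" || { mkdir -p "$(dirname "$dest")" && ln -sf "$src" "$dest"; }
}

# Left-to-right grouping in action: the first command succeeds, the
# `||` branch is skipped, and the `&&` branch still runs.
chain_output=$(true || echo mkdir && echo link)
```

Here `chain_output` ends up holding `link`, which mirrors how the ungrouped line behaves when the file is already present.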
@@ -3512,7 +3549,6 @@ AC_OUTPUT([
 	Makefile
 	src/Makefile
 	tools/Makefile
-	tools/replay-mpi/Makefile
 	tools/starpu_env
 	tools/starpu_codelet_profile
 	tools/starpu_codelet_histo_profile
@@ -3563,6 +3599,7 @@ AC_OUTPUT([
 	mpi/src/Makefile
 	mpi/tests/Makefile
 	mpi/examples/Makefile
+	mpi/tools/Makefile
 	sc_hypervisor/Makefile
 	sc_hypervisor/src/Makefile
 	sc_hypervisor/examples/Makefile
@@ -3575,6 +3612,11 @@ AC_OUTPUT([
 	doc/doxygen_dev/doxygen_filter.sh
 	tools/msvc/starpu_var.bat
 	min-dgels/Makefile
+	julia/Makefile
+	julia/src/Makefile
+	julia/src/dynamic_compiler/Makefile
+	julia/examples/Makefile
+	julia/examples/execute.sh
 ])
 
 AC_MSG_NOTICE([
@@ -3627,6 +3669,7 @@ AC_MSG_NOTICE([
 	       Native fortran support:                        $enable_build_fortran
 	       Native MPI fortran support:                    $use_mpi_fort
 	       Support for multiple linear regression models: $support_mlr
+	       JULIA enabled:                                 $enable_julia
 ])
 
 if test "$build_socl" = "yes" -a "$run_socl_check" = "no" ; then

+ 3 - 2
contrib/ci.inria.fr/job-1-check.sh

@@ -63,12 +63,13 @@ fi
 export CC=gcc
 
 CONFIGURE_OPTIONS="--enable-debug --enable-verbose --enable-mpi-check --disable-build-doc"
+CONFIGURE_CHECK=""
 day=$(date +%u)
 if test $day -le 5
 then
     CONFIGURE_CHECK="--enable-quick-check"
-else
-    CONFIGURE_CHECK="--enable-long-check"
+#else
+    # we do a normal check, a long check takes too long on VM nodes
 fi
 ../configure $CONFIGURE_OPTIONS $CONFIGURE_CHECK  $STARPU_CONFIGURE_OPTIONS
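
Note on the weekday test above: `date +%u` prints the ISO weekday number (1 for Monday through 7 for Sunday), so `-le 5` selects Monday to Friday. After this change, weekends get no extra flag instead of `--enable-long-check`. The selection can be sketched as a small function; `check_flag_for_day` is a hypothetical name used only here.

```shell
#!/bin/sh
# check_flag_for_day DAY
# DAY is the ISO weekday number as printed by `date +%u`.
# Weekdays get the quick check; weekends get no extra flag (normal check).
check_flag_for_day() {
    if test "$1" -le 5; then
        echo "--enable-quick-check"
    else
        echo ""
    fi
}
```

A CI script could then call `CONFIGURE_CHECK=$(check_flag_for_day "$(date +%u)")`.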
 
 

+ 4 - 0
doc/doxygen/Makefile.am

@@ -200,7 +200,9 @@ dox_inputs = $(DOX_CONFIG) 				\
 	$(top_srcdir)/include/starpu_expert.h		\
 	$(top_srcdir)/include/starpu_fxt.h		\
 	$(top_srcdir)/include/starpu_hash.h		\
+	$(top_srcdir)/include/starpu_helper.h		\
 	$(top_srcdir)/include/starpu_mic.h		\
+	$(top_srcdir)/include/starpu_mpi_ms.h		\
 	$(top_srcdir)/include/starpu_mod.f90		\
 	$(top_srcdir)/include/starpu_opencl.h		\
 	$(top_srcdir)/include/starpu_openmp.h		\
@@ -227,6 +229,8 @@ dox_inputs = $(DOX_CONFIG) 				\
 	$(top_srcdir)/include/starpu_util.h		\
 	$(top_srcdir)/include/starpu_worker.h		\
 	$(top_srcdir)/include/fstarpu_mod.f90		\
+	$(top_srcdir)/include/schedulers/starpu_heteroprio.h	\
+	$(top_srcdir)/starpufft/include/starpufft.h 	\
 	$(top_srcdir)/mpi/include/starpu_mpi.h 		\
 	$(top_srcdir)/mpi/include/starpu_mpi_lb.h	\
 	$(top_srcdir)/mpi/include/fstarpu_mpi_mod.f90		\

+ 11 - 4
doc/doxygen/chapters/470_simgrid.doxy

@@ -1,6 +1,7 @@
 /* StarPU --- Runtime system for heterogeneous multicore architectures.
  *
  * Copyright (C) 2009-2020  Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+ * Copyright (C) 2020       Federal University of Rio Grande do Sul (UFRGS)
  *
  * StarPU is free software; you can redistribute it and/or modify
  * it under the terms of the GNU Lesser General Public License as published by
@@ -132,7 +133,10 @@ machine (the <c>$STARPU_HOME/.starpu</c> directory). One can then perform the
 Simulation step on the desktop machine, by setting the environment
 variable \ref STARPU_HOSTNAME to the name of the actual machine, to
 make StarPU use the performance models of the simulated machine even
-on the desktop machine.
+on the desktop machine. To use multiple performance models in different ranks,
+in case of smpi executions in a heterogeneous platform, it is possible to use the
+option <c>-hostfile-platform</c> in <c>starpu_smpirun</c>, that will define
+\ref STARPU_MPI_HOSTNAMES with the hostnames of your hostfile.
 
 If the desktop machine does not have CUDA or OpenCL, StarPU is still able to
 use SimGrid to simulate execution with CUDA/OpenCL devices, but the application
@@ -171,9 +175,12 @@ $ STARPU_SCHED=dmda starpu_smpirun -platform cluster.xml -hostfile hostfile ./mp
 \endverbatim
 
 Where \c cluster.xml is a SimGrid-MPI platform description, and \c hostfile the
-list of MPI nodes to be used. StarPU currently only supports homogeneous MPI
-clusters: for each MPI node it will just replicate the architecture referred by
-\ref STARPU_HOSTNAME.
+list of MPI nodes to be used. In homogeneous MPI clusters: for each MPI node it
+will just replicate the architecture referred by
+\ref STARPU_HOSTNAME. To use multiple performance models in different ranks,
+in case of a heterogeneous platform, it is possible to use the
+option <c>-hostfile-platform</c> in <c>starpu_smpirun</c>, that will define
+\ref STARPU_MPI_HOSTNAMES with the hostnames of your hostfile.
 
 \section SimulationDebuggingApplications Debugging Applications
 

+ 32 - 0
doc/doxygen/chapters/501_environment_variables.doxy

@@ -2,6 +2,7 @@
  *
  * Copyright (C) 2009-2020  Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
  * Copyright (C) 2016       Uppsala University
+ * Copyright (C) 2020       Federal University of Rio Grande do Sul (UFRGS)
  *
  * StarPU is free software; you can redistribute it and/or modify
  * it under the terms of the GNU Lesser General Public License as published by
@@ -866,6 +867,20 @@ a homogenenous cluster, it is possible to share the models between
 machines by setting <c>export STARPU_HOSTNAME=some_global_name</c>.
 </dd>
 
+<dt>STARPU_MPI_HOSTNAMES</dt>
+<dd>
+\anchor STARPU_MPI_HOSTNAMES
+\addindex __env__STARPU_MPI_HOSTNAMES
+Similar to \ref STARPU_HOSTNAME but to define multiple nodes on a
+heterogeneous cluster. The variable is a list of hostnames that will be assigned
+to each StarPU-MPI rank considering their position and the value of
+\ref starpu_mpi_world_rank on each rank. When running, for example, on a
+heterogeneous cluster, it is possible to set individual models for each machine
+by setting <c>export STARPU_MPI_HOSTNAMES="name0 name1 name2"</c>. Where rank 0
+will receive name0, rank1 will receive name1, and so on.
+This variable has precedence over \ref STARPU_HOSTNAME.
+</dd>
+
 <dt>STARPU_OPENCL_PROGRAM_DIR</dt>
 <dd>
 \anchor STARPU_OPENCL_PROGRAM_DIR
@@ -986,6 +1001,23 @@ NUMA nodes used by StarPU. Any \ref STARPU_LIMIT_CPU_NUMA_devid_MEM additionally
 specified will take over STARPU_LIMIT_CPU_NUMA_MEM.
 </dd>
 
+<dt>STARPU_LIMIT_BANDWIDTH</dt>
+<dd>
+\anchor STARPU_LIMIT_BANDWIDTH
+\addindex __env__STARPU_LIMIT_BANDWIDTH
+Specify the maximum available PCI bandwidth of the system in MB/s. This can only
+be effective with simgrid simulation. This allows to easily override the
+bandwidths stored in the platform file generated from measurements on the native
+system. This can be used e.g. for convenient
+
+Specify the maximum number of megabytes that should be available to the
+application on each NUMA node. This is the same as specifying that same amount
+with \ref STARPU_LIMIT_CPU_NUMA_devid_MEM for each NUMA node number. The total
+memory available to StarPU will thus be this amount multiplied by the number of
+NUMA nodes used by StarPU. Any \ref STARPU_LIMIT_CPU_NUMA_devid_MEM additionally
+specified will take over STARPU_LIMIT_BANDWIDTH.
+</dd>
+
 <dt>STARPU_MINIMUM_AVAILABLE_MEM</dt>
 <dd>
 \anchor STARPU_MINIMUM_AVAILABLE_MEM
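
Note on STARPU_MPI_HOSTNAMES as documented above: the variable is a space-separated list in which rank 0 takes the first name, rank 1 the second, and so on. The sketch below only illustrates that positional mapping in shell; it is not StarPU's implementation, and `hostname_for_rank` is a hypothetical helper.

```shell
#!/bin/sh
# hostname_for_rank RANK "name0 name1 name2 ..."
# Print the RANK-th (0-based) name of a space-separated list.
# Returns non-zero when RANK is outside the list.
hostname_for_rank() {
    rank=$1
    # Intentional word splitting: the list is space-separated and the
    # names are assumed to contain no glob characters.
    set -- $2
    test "$rank" -ge 0 && test "$rank" -lt "$#" || return 1
    shift "$rank"
    echo "$1"
}
```

For example, `hostname_for_rank 1 "name0 name1 name2"` prints `name1`, matching the rule stated in the documentation.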

+ 34 - 0
doc/doxygen/dev/starpu_check_include.sh

@@ -0,0 +1,34 @@
+#!/bin/bash
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+
+# Resolve to an absolute path so that $dir stays valid after the cd below
+dir=$(cd "$(dirname "$0")" && pwd)
+
+cd "$dir/../../../"
+for d in $(find . -name include -not -wholename "*/build/*")
+do
+    for f in $(find $d -name "*.h")
+    do
+	for i in doxygen-config.cfg.in Makefile.am
+	do
+	    x=`grep $f $dir/../$i`
+	    if test -z "$x"
+	    then
+		echo $f missing in $i
+	    fi
+	done
+    done
+done

+ 4 - 6
examples/Makefile.am

@@ -158,11 +158,8 @@ SHELL_TESTS			+=	mult/sgemm.sh
 endif
 endif
 
-if STARPU_HAVE_WINDOWS
 check_PROGRAMS		=	$(STARPU_EXAMPLES)
-else
-check_PROGRAMS		=	$(LOADER) $(STARPU_EXAMPLES)
-endif
+noinst_PROGRAMS		=
 
 if !STARPU_HAVE_WINDOWS
 ## test loader program
@@ -171,6 +168,7 @@ LOADER			=	loader
 loader_CPPFLAGS 	=	$(AM_CFLAGS) $(AM_CPPFLAGS) -I$(top_builddir)/src/
 LOADER_BIN		=	$(abs_top_builddir)/examples/$(LOADER)
 loader_SOURCES		=	../tests/loader.c
+noinst_PROGRAMS		+=	loader
 else
 LOADER			=
 LOADER_BIN		=	$(top_builddir)/examples/loader-cross.sh
@@ -1118,10 +1116,10 @@ endif
 # - link over source file to build our own object
 fortran90/starpu_mod.f90:
 	@$(MKDIR_P) $(dir $@)
-	$(LN_S) $(abs_top_srcdir)/include/$(notdir $@) $@
+	$(V_ln) $(LN_S) $(abs_top_srcdir)/include/$(notdir $@) $@
 native_fortran/fstarpu_mod.f90:
 	@$(MKDIR_P) $(dir $@)
-	$(LN_S) $(abs_top_srcdir)/include/$(notdir $@) $@
+	$(V_ln) $(LN_S) $(abs_top_srcdir)/include/$(notdir $@) $@
 
 if STARPU_HAVE_FC
 # Fortran90 example

+ 21 - 1
examples/cholesky/cholesky.sh

@@ -22,6 +22,26 @@ ROOT=${0%.sh}
 [ -n "$STARPU_HOSTNAME" ] || export STARPU_HOSTNAME=mirage
 unset MALLOC_PERTURB_
 
+INCR=2
+STOP=32
+
+if [ -n "$STARPU_SIMGRID" ]
+then
+	INCR=4
+	STOP=14
+	# These use the thread factory, and are thus much longer
+	if [ -n "$STARPU_QUICK_CHECK" ]
+	then
+		INCR=8
+		STOP=10
+	fi
+	if [ -n "$STARPU_LONG_CHECK" ]
+	then
+		INCR=4
+		STOP=32
+	fi
+fi
+
 (
 echo -n "#"
 for STARPU_SCHED in $STARPU_SCHEDS ; do
@@ -29,7 +49,7 @@ for STARPU_SCHED in $STARPU_SCHEDS ; do
 done
 echo
 
-for size in `seq 2 2 30` ; do
+for size in `seq 2 $INCR $STOP` ; do
 	echo -n "$((size * 960))"
 	for STARPU_SCHED in $STARPU_SCHEDS
 	do

+ 10 - 3
examples/common/blas.h

@@ -88,6 +88,14 @@ void STARPU_DPOTRF(const char*uplo, const int n, double *a, const int lda);
 
 #if defined(STARPU_GOTO) || defined(STARPU_OPENBLAS) || defined(STARPU_SYSTEM_BLAS) || defined(STARPU_MKL) || defined(STARPU_ARMPL)
 
+#ifdef _STARPU_F2C_COMPATIBILITY
+/* for compatibility with F2C, FLOATRET may not be a float but a double in GOTOBLAS */
+/* Don't know how to detect this automatically */
+#define _STARPU_FLOATRET double
+#else
+#define _STARPU_FLOATRET float
+#endif
+
 extern void sgemm_ (const char *transa, const char *transb, const int *m,
                    const int *n, const int *k, const float *alpha, 
                    const float *A, const int *lda, const float *B, 
@@ -118,7 +126,7 @@ extern void dtrsm_ (const char *side, const char *uplo, const char *transa,
                    const char *diag, const int *m, const int *n,
                    const double *alpha, const double *A, const int *lda,
                    double *B, const int *ldb);
-extern double sasum_ (const int *n, const float *x, const int *incx);
+extern _STARPU_FLOATRET sasum_ (const int *n, const float *x, const int *incx);
 extern double dasum_ (const int *n, const double *x, const int *incx);
 extern void sscal_ (const int *n, const float *alpha, float *x,
                    const int *incx);
@@ -150,8 +158,7 @@ extern void daxpy_(const int *n, const double *alpha, const double *X, const int
 		double *Y, const int *incy);
 extern int isamax_(const int *n, const float *X, const int *incX);
 extern int idamax_(const int *n, const double *X, const int *incX);
-/* for some reason, FLOATRET is not a float but a double in GOTOBLAS */
-extern double sdot_(const int *n, const float *x, const int *incx, const float *y, const int *incy);
+extern _STARPU_FLOATRET sdot_(const int *n, const float *x, const int *incx, const float *y, const int *incy);
 extern double ddot_(const int *n, const double *x, const int *incx, const double *y, const int *incy);
 extern void sswap_(const int *n, float *x, const int *incx, float *y, const int *incy);
 extern void dswap_(const int *n, double *x, const int *incx, double *y, const int *incy);
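
The blas.h hunks above select the return type of the single-precision reductions with a macro, so that callers are compiled against whatever type the linked BLAS actually returns. A minimal sketch of that pattern follows. `MY_F2C_COMPATIBILITY`, `MY_FLOATRET`, `my_sdot_` and `dot_product_f32` are hypothetical stand-ins; the real `sdot_` comes from the BLAS library, not from this code.

```c
#include <assert.h>

/* Hypothetical switch standing in for _STARPU_F2C_COMPATIBILITY */
#ifdef MY_F2C_COMPATIBILITY
#define MY_FLOATRET double   /* f2c-style ABI: float functions return double */
#else
#define MY_FLOATRET float
#endif

/* Stand-in for a Fortran-style sdot_: every argument is passed by pointer */
MY_FLOATRET my_sdot_(const int *n, const float *x, const int *incx,
		     const float *y, const int *incy)
{
	MY_FLOATRET acc = 0;
	int i;

	for (i = 0; i < *n; i++)
		acc += x[i * *incx] * y[i * *incy];
	return acc;
}

/* The caller converts explicitly, so it is correct with either return type */
float dot_product_f32(int n, const float *x, const float *y)
{
	const int one = 1;

	return (float) my_sdot_(&n, x, &one, y, &one);
}
```

If the declared return type did not match the library's ABI, the caller would read the result from the wrong register width, which is why the declaration has to follow the build configuration.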

+ 2 - 0
examples/mlr/mlr.c

@@ -110,7 +110,9 @@ static struct starpu_perfmodel cl_model_init =
    template.
  */
 
+/* M^2 * N^1 * K^0 */
 static unsigned combi1 [3]		= {	2,	1,	0 };
+/* M^0 * N^3 * K^1 */
 static unsigned combi2 [3]		= {	0,	3,	1 };
 
 static unsigned *combinations[] = { combi1, combi2 };
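
The comments added in mlr.c read each combination as a vector of exponents applied to the parameters (M, N, K). A small sketch of how one such term could be evaluated is given below. `combination_term` is a hypothetical helper written for illustration and is not part of the StarPU API.

```c
#include <assert.h>

/*
 * Hypothetical helper: evaluate params[0]^exponents[0] *
 * params[1]^exponents[1] * params[2]^exponents[2] by repeated
 * multiplication.
 */
double combination_term(const unsigned exponents[3], const double params[3])
{
	double term = 1.0;
	int p;
	unsigned e;

	for (p = 0; p < 3; p++)
		for (e = 0; e < exponents[p]; e++)
			term *= params[p];
	return term;
}
```

With (M, N, K) = (3, 5, 7), the exponents {2, 1, 0} give 3*3*5 = 45 and the exponents {0, 3, 1} give 5*5*5*7 = 875.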

+ 1 - 1
examples/mult/sgemm.sh

@@ -32,7 +32,7 @@ if [ -n "$STARPU_MIC_SINK_PROGRAM_PATH" ] ; then
 	[ -x "$STARPU_MIC_SINK_PROGRAM_PATH/.libs/sgemm" ] && STARPU_MIC_SINK_PROGRAM_NAME=$STARPU_MIC_SINK_PROGRAM_PATH/.libs/sgemm
 fi
 
-STARPU_SCHED=dmdas STARPU_FXT_PREFIX=$PREFIX/ $PREFIX/sgemm
+STARPU_SCHED=dmdas STARPU_FXT_PREFIX=$PREFIX/ $PREFIX/sgemm -check
 [ ! -x $PREFIX/../../tools/starpu_perfmodel_display ] || $STARPU_LAUNCH $PREFIX/../../tools/starpu_perfmodel_display -s starpu_sgemm_gemm
 [ ! -x $PREFIX/../../tools/starpu_perfmodel_display ] || $STARPU_LAUNCH $PREFIX/../../tools/starpu_perfmodel_display -x -s starpu_sgemm_gemm
 [ ! -x $PREFIX/../../tools/starpu_perfmodel_recdump ] || $STARPU_LAUNCH $PREFIX/../../tools/starpu_perfmodel_recdump -o perfs.rec

+ 14 - 4
examples/mult/xgemm.c

@@ -66,7 +66,7 @@ static starpu_data_handle_t A_handle, B_handle, C_handle;
 #define FPRINTF(ofile, fmt, ...) do { if (!getenv("STARPU_SSILENT")) {fprintf(ofile, fmt, ## __VA_ARGS__); }} while(0)
 #define PRINTF(fmt, ...) do { if (!getenv("STARPU_SSILENT")) {printf(fmt, ## __VA_ARGS__); fflush(stdout); }} while(0)
 
-static void check_output(void)
+static int check_output(void)
 {
 	/* compute C = C - AB */
 	CPU_GEMM("N", "N", ydim, xdim, zdim, (TYPE)-1.0f, A, ydim, B, zdim, (TYPE)1.0f, C, ydim);
@@ -78,6 +78,7 @@ static void check_output(void)
 	if (err < xdim*ydim*0.001)
 	{
 		FPRINTF(stderr, "Results are OK\n");
+		return 0;
 	}
 	else
 	{
@@ -86,6 +87,7 @@ static void check_output(void)
 
 		FPRINTF(stderr, "There were errors ... err = %f\n", err);
 		FPRINTF(stderr, "Max error : %e\n", C[max]);
+		return 1;
 	}
 }
 
@@ -150,6 +152,11 @@ static void partition_mult_data(void)
 	starpu_data_partition(A_handle, &horiz);
 
 	starpu_data_map_filters(C_handle, 2, &vert, &horiz);
+
+	unsigned x, y;
+	for (x = 0; x < nslicesx; x++)
+	for (y = 0; y < nslicesy; y++)
+		starpu_data_set_coordinates(starpu_data_get_sub_data(C_handle, 2, x, y), 2, x, y);
 }
 
 #ifdef STARPU_USE_CUDA
@@ -236,7 +243,7 @@ static struct starpu_codelet cl =
 #endif
 	.cuda_flags = {STARPU_CUDA_ASYNC},
 	.nbuffers = 3,
-	.modes = {STARPU_R, STARPU_R, STARPU_RW},
+	.modes = {STARPU_R, STARPU_R, STARPU_W},
 	.model = &starpu_gemm_model
 };
 
@@ -334,7 +341,7 @@ static void parse_args(int argc, char **argv)
 		}
 		else
 		{
-			fprintf(stderr,"Unrecognized option %s", argv[i]);
+			fprintf(stderr,"Unrecognized option %s\n", argv[i]);
 			exit(EXIT_FAILURE);
 		}
 	}
@@ -398,6 +405,7 @@ int main(int argc, char **argv)
 				ret = starpu_task_submit(task);
 				if (ret == -ENODEV)
 				{
+				     check = 0;
 				     ret = 77;
 				     goto enodev;
 				}
@@ -448,8 +456,10 @@ enodev:
 	starpu_data_unregister(B_handle);
 	starpu_data_unregister(C_handle);
 
+#ifndef STARPU_SIMGRID
 	if (check)
-		check_output();
+		ret = check_output();
+#endif
 
 	starpu_free_flags(A, zdim*ydim*sizeof(TYPE), STARPU_MALLOC_PINNED|STARPU_MALLOC_SIMULATION_FOLDED);
 	starpu_free_flags(B, xdim*zdim*sizeof(TYPE), STARPU_MALLOC_PINNED|STARPU_MALLOC_SIMULATION_FOLDED);
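
The xgemm.c change makes the result check return a status that `main` forwards as the exit code, so that `sgemm.sh -check` can fail the test run. A minimal sketch of such a check is shown below. It assumes the error is the sum of absolute values of the residual C - AB and reuses the tolerance form visible in the diff (element count times 0.001); `check_residual` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical check: return 0 when the accumulated absolute residual
 * stays below n * 0.001, and 1 otherwise, so that the value can be used
 * directly as a process exit status.
 */
int check_residual(const float *residual, size_t n)
{
	double err = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
	{
		double v = residual[i];

		err += (v < 0.0) ? -v : v;   /* absolute value without math.h */
	}
	return (err < (double) n * 0.001) ? 0 : 1;
}
```

Returning the status instead of only printing it is what allows a wrapper script or test harness to detect a numerically wrong result.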

+ 2 - 4
examples/stencil/Makefile.am

@@ -56,11 +56,8 @@ endif
 # What to install and what to check #
 #####################################
 
-if STARPU_HAVE_WINDOWS
 check_PROGRAMS	=	$(STARPU_EXAMPLES)
-else
-check_PROGRAMS	=	$(LOADER) $(STARPU_EXAMPLES)
-endif
+noinst_PROGRAMS	=
 
 if !STARPU_SIMGRID
 if USE_MPI
@@ -79,6 +76,7 @@ LOADER			=	loader
 loader_CPPFLAGS 	= 	$(AM_CFLAGS) $(AM_CPPFLAGS) -I$(top_builddir)/src/
 LOADER_BIN		=	./$(LOADER)
 loader_SOURCES		=	../../tests/loader.c
+noinst_PROGRAMS		+=	loader
 else
 LOADER			=
 LOADER_BIN		=	$(top_builddir)/examples/stencil/loader-cross.sh

+ 8 - 0
examples/tag_example/tag_example.c

@@ -222,6 +222,14 @@ int main(int argc, char **argv)
 {
 	int ret;
 
+#ifdef STARPU_HAVE_HELGRIND_H
+	if (RUNNING_ON_VALGRIND) {
+		ni /= 2;
+		nj /= 2;
+		nk /= 2;
+	}
+#endif
+
 	ret = starpu_init(NULL);
 	if (ret == -ENODEV)
 		exit(77);

+ 11 - 0
include/starpu.h

@@ -126,6 +126,17 @@ struct starpu_conf
 	void (*sched_policy_init)(unsigned);
 
 	/**
+	   For all parameters specified in this structure that can
+	   also be set with environment variables, StarPU by default
+	   gives precedence to the value of the environment variable
+	   over the value set in starpu_conf. Setting
+	   starpu_conf::precedence_over_environment_variables to 1 gives
+	   precedence to the value set in the structure instead.
+	 */
+	int precedence_over_environment_variables;
+
+	/**
 	   Number of CPU cores that StarPU can use. This can also be
 	   specified with the environment variable \ref STARPU_NCPU.
 	   (default = -1)
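
The comment added to starpu_conf describes which source wins when a parameter is set both in the structure and in the environment. One plausible reading of that rule is sketched below. `resolve_int_param` is a hypothetical helper, and treating -1 as "not set in the structure" is an assumption borrowed from the `(default = -1)` convention; neither is confirmed by the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical resolution rule.
 * env_value:           result of getenv() for the variable, or NULL
 * conf_value:          value stored in the configuration structure
 * conf_has_precedence: the precedence_over_environment_variables flag
 */
int resolve_int_param(const char *env_value, int conf_value, int conf_has_precedence)
{
	/* Assumption: -1 means the structure field was left at its default */
	if (conf_has_precedence && conf_value != -1)
		return conf_value;
	if (env_value != NULL && *env_value != '\0')
		return atoi(env_value);      /* environment variable wins by default */
	return conf_value;
}
```

A caller would invoke it as `resolve_int_param(getenv("STARPU_NCPU"), conf->ncpus, conf->precedence_over_environment_variables)`, assuming a `conf` pointer to the structure.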

+ 246 - 15
include/starpu_bitmap.h

@@ -18,6 +18,12 @@
 #ifndef __STARPU_BITMAP_H__
 #define __STARPU_BITMAP_H__
 
+#include <starpu_util.h>
+#include <starpu_config.h>
+
+#include <string.h>
+#include <stdlib.h>
+
 #ifdef __cplusplus
 extern "C"
 {
@@ -28,43 +34,268 @@ extern "C"
    @brief This is the interface for the bitmap utilities provided by StarPU.
    @{
  */
+#ifndef _STARPU_LONG_BIT
+#define _STARPU_LONG_BIT ((int)(sizeof(unsigned long) * 8))
+#endif
+
+#define _STARPU_BITMAP_SIZE (((STARPU_NMAXWORKERS - 1)/_STARPU_LONG_BIT) + 1)
 
 /** create an empty starpu_bitmap */
-struct starpu_bitmap *starpu_bitmap_create(void) STARPU_ATTRIBUTE_MALLOC;
+static inline struct starpu_bitmap *starpu_bitmap_create(void) STARPU_ATTRIBUTE_MALLOC;
+/** zero a starpu_bitmap */
+static inline void starpu_bitmap_init(struct starpu_bitmap *b);
 /** free \p b */
-void starpu_bitmap_destroy(struct starpu_bitmap *b);
+static inline void starpu_bitmap_destroy(struct starpu_bitmap *b);
 
 /** set bit \p e in \p b */
-void starpu_bitmap_set(struct starpu_bitmap *b, int e);
+static inline void starpu_bitmap_set(struct starpu_bitmap *b, int e);
 /** unset bit \p e in \p b */
-void starpu_bitmap_unset(struct starpu_bitmap *b, int e);
+static inline void starpu_bitmap_unset(struct starpu_bitmap *b, int e);
 /** unset all bits in \p b */
-void starpu_bitmap_unset_all(struct starpu_bitmap *b);
+static inline void starpu_bitmap_unset_all(struct starpu_bitmap *b);
 
 /** return true iff bit \p e is set in \p b */
-int starpu_bitmap_get(struct starpu_bitmap *b, int e);
+static inline int starpu_bitmap_get(struct starpu_bitmap *b, int e);
 /** Basically compute \c starpu_bitmap_unset_all(\p a) ; \p a = \p b & \p c; */
-void starpu_bitmap_unset_and(struct starpu_bitmap *a, struct starpu_bitmap *b, struct starpu_bitmap *c);
+static inline void starpu_bitmap_unset_and(struct starpu_bitmap *a, struct starpu_bitmap *b, struct starpu_bitmap *c);
 /** Basically compute \p a |= \p b */
-void starpu_bitmap_or(struct starpu_bitmap *a, struct starpu_bitmap *b);
+static inline void starpu_bitmap_or(struct starpu_bitmap *a, struct starpu_bitmap *b);
 /** return 1 iff \p e is set in \p b1 AND \p e is set in \p b2 */
-int starpu_bitmap_and_get(struct starpu_bitmap *b1, struct starpu_bitmap *b2, int e);
+static inline int starpu_bitmap_and_get(struct starpu_bitmap *b1, struct starpu_bitmap *b2, int e);
 /** return the number of set bits in \p b */
-int starpu_bitmap_cardinal(struct starpu_bitmap *b);
+static inline int starpu_bitmap_cardinal(struct starpu_bitmap *b);
 
 /** return the index of the first set bit of \p b, -1 if none */
-int starpu_bitmap_first(struct starpu_bitmap *b);
+static inline int starpu_bitmap_first(struct starpu_bitmap *b);
 /** return the position of the last set bit of \p b, -1 if none */
-int starpu_bitmap_last(struct starpu_bitmap *b);
+static inline int starpu_bitmap_last(struct starpu_bitmap *b);
 /** return the position of set bit right after \p e in \p b, -1 if none */
-int starpu_bitmap_next(struct starpu_bitmap *b, int e);
+static inline int starpu_bitmap_next(struct starpu_bitmap *b, int e);
 /** todo */
-int starpu_bitmap_has_next(struct starpu_bitmap *b, int e);
+static inline int starpu_bitmap_has_next(struct starpu_bitmap *b, int e);
 
 /** @} */
 
-#ifdef __cplusplus
+struct starpu_bitmap
+{
+	unsigned long bits[_STARPU_BITMAP_SIZE];
+	int cardinal;
+};
+
+#ifdef _STARPU_DEBUG_BITMAP
+static int _starpu_check_bitmap(struct starpu_bitmap *b)
+{
+	int card = b->cardinal;
+	int i = starpu_bitmap_first(b);
+	int j;
+	for(j = 0; j < card; j++)
+	{
+		if(i == -1)
+			return 0;
+		int tmp = starpu_bitmap_next(b,i);
+		if(tmp == i)
+			return 0;
+		i = tmp;
+	}
+	if(i != -1)
+		return 0;
+	return 1;
 }
+#else
+#define _starpu_check_bitmap(b) 1
 #endif
 
+static int _starpu_count_bit_static(unsigned long e)
+{
+#if (__GNUC__ >= 4) || ((__GNUC__ == 3) && (__GNUC_MINOR__) >= 4)
+	return __builtin_popcountl(e);
+#else
+	int c = 0;
+	while(e)
+	{
+		c += e&1;
+		e >>= 1;
+	}
+	return c;
 #endif
+}
+
+static inline struct starpu_bitmap *starpu_bitmap_create()
+{
+	return (struct starpu_bitmap *) calloc(1, sizeof(struct starpu_bitmap));
+}
+
+static inline void starpu_bitmap_init(struct starpu_bitmap *b)
+{
+	memset(b, 0, sizeof(*b));
+}
+
+static inline void starpu_bitmap_destroy(struct starpu_bitmap * b)
+{
+	free(b);
+}
+
+static inline void starpu_bitmap_set(struct starpu_bitmap * b, int e)
+{
+	if(!starpu_bitmap_get(b, e))
+		b->cardinal++;
+	else
+		return;
+	STARPU_ASSERT(e/_STARPU_LONG_BIT < _STARPU_BITMAP_SIZE);
+	b->bits[e/_STARPU_LONG_BIT] |= (1ul << (e%_STARPU_LONG_BIT));
+	STARPU_ASSERT(_starpu_check_bitmap(b));
+}
+static inline void starpu_bitmap_unset(struct starpu_bitmap *b, int e)
+{
+	if(starpu_bitmap_get(b, e))
+		b->cardinal--;
+	else
+		return;
+	STARPU_ASSERT(e/_STARPU_LONG_BIT < _STARPU_BITMAP_SIZE);
+	if(e / _STARPU_LONG_BIT >= _STARPU_BITMAP_SIZE)
+		return;
+	b->bits[e/_STARPU_LONG_BIT] &= ~(1ul << (e%_STARPU_LONG_BIT));
+	STARPU_ASSERT(_starpu_check_bitmap(b));
+}
+
+static inline void starpu_bitmap_unset_all(struct starpu_bitmap * b)
+{
+	memset(b->bits, 0, (_STARPU_BITMAP_SIZE) * sizeof(unsigned long));
+	b->cardinal = 0;
+}
+
+static inline void starpu_bitmap_unset_and(struct starpu_bitmap * a, struct starpu_bitmap * b, struct starpu_bitmap * c)
+{
+	a->cardinal = 0;
+	int i;
+	for(i = 0; i < _STARPU_BITMAP_SIZE; i++)
+	{
+		a->bits[i] = b->bits[i] & c->bits[i];
+		a->cardinal += _starpu_count_bit_static(a->bits[i]);
+	}
+}
+
+static inline int starpu_bitmap_get(struct starpu_bitmap * b, int e)
+{
+	STARPU_ASSERT(e / _STARPU_LONG_BIT < _STARPU_BITMAP_SIZE);
+	if(e / _STARPU_LONG_BIT >= _STARPU_BITMAP_SIZE)
+		return 0;
+	return (b->bits[e/_STARPU_LONG_BIT] & (1ul << (e%_STARPU_LONG_BIT))) ?
+		1:
+		0;
+}
+
+static inline void starpu_bitmap_or(struct starpu_bitmap * a, struct starpu_bitmap * b)
+{
+	int i;
+	a->cardinal = 0;
+	for(i = 0; i < _STARPU_BITMAP_SIZE; i++)
+	{
+		a->bits[i] |= b->bits[i];
+		a->cardinal += _starpu_count_bit_static(a->bits[i]);
+	}
+}
+
+
+static inline int starpu_bitmap_and_get(struct starpu_bitmap * b1, struct starpu_bitmap * b2, int e)
+{
+	return starpu_bitmap_get(b1,e) && starpu_bitmap_get(b2,e);
+}
+
+static inline int starpu_bitmap_cardinal(struct starpu_bitmap * b)
+{
+	return b->cardinal;
+}
+
+
+static inline int _starpu_get_first_bit_rank(unsigned long ms)
+{
+	STARPU_ASSERT(ms != 0);
+#if (__GNUC__ >= 4) || ((__GNUC__ == 3) && (__GNUC_MINOR__ >= 4))
+	return __builtin_ffsl(ms) - 1;
+#else
+	unsigned long m = 1ul;
+	int i = 0;
+	while(!(m&ms))
+		i++,m<<=1;
+	return i;
+#endif
+}
+
+static inline int _starpu_get_last_bit_rank(unsigned long l)
+{
+	STARPU_ASSERT(l != 0);
+#if (__GNUC__ >= 4) || ((__GNUC__ == 3) && (__GNUC_MINOR__ >= 4))
+	return (int)(8*sizeof(l)) - 1 - __builtin_clzl(l);
+#else
+	int ibit = _STARPU_LONG_BIT - 1;
+	while(!((1ul << ibit) & l))
+		ibit--;
+	STARPU_ASSERT(ibit >= 0);
+	return ibit;
+#endif
+}
+
+static inline int starpu_bitmap_first(struct starpu_bitmap * b)
+{
+	int i = 0;
+	while(i < _STARPU_BITMAP_SIZE && !b->bits[i])
+		i++;
+	if( i == _STARPU_BITMAP_SIZE)
+		return -1;
+	int nb_long = i;
+	unsigned long ms = b->bits[i];
+
+	return (nb_long * _STARPU_LONG_BIT) + _starpu_get_first_bit_rank(ms);
+}
+
+static inline int starpu_bitmap_has_next(struct starpu_bitmap * b, int e)
+{
+	int nb_long = (e+1) / _STARPU_LONG_BIT;
+	int nb_bit = (e+1) % _STARPU_LONG_BIT;
+	unsigned long mask = (~0ul) << nb_bit;
+	if(b->bits[nb_long] & mask)
+		return 1;
+	for(nb_long++; nb_long < _STARPU_BITMAP_SIZE; nb_long++)
+		if(b->bits[nb_long])
+			return 1;
+	return 0;
+}
+
+static inline int starpu_bitmap_last(struct starpu_bitmap * b)
+{
+	if(b->cardinal == 0)
+		return -1;
+	int ilong;
+	for(ilong = _STARPU_BITMAP_SIZE - 1; ilong >= 0; ilong--)
+	{
+		if(b->bits[ilong])
+			break;
+	}
+	STARPU_ASSERT(ilong >= 0);
+	unsigned long l = b->bits[ilong];
+	return ilong * _STARPU_LONG_BIT + _starpu_get_last_bit_rank(l);
+}
+
+static inline int starpu_bitmap_next(struct starpu_bitmap *b, int e)
+{
+	int nb_long = e / _STARPU_LONG_BIT;
+	int nb_bit = e % _STARPU_LONG_BIT;
+	unsigned long rest = nb_bit == _STARPU_LONG_BIT - 1 ? 0 : (~0ul << (nb_bit + 1)) & b->bits[nb_long];
+	if(nb_bit != (_STARPU_LONG_BIT - 1) && rest)
+	{
+		int i = _starpu_get_first_bit_rank(rest);
+		STARPU_ASSERT(i >= 0 && i < _STARPU_LONG_BIT);
+		return (nb_long * _STARPU_LONG_BIT) + i;
+	}
+
+	for(nb_long++;nb_long < _STARPU_BITMAP_SIZE; nb_long++)
+		if(b->bits[nb_long])
+			return nb_long * _STARPU_LONG_BIT + _starpu_get_first_bit_rank(b->bits[nb_long]);
+	return -1;
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* __STARPU_BITMAP_H__ */
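
The header above stores the bitmap as a fixed array of `unsigned long` words plus a cached cardinal. The sketch below reproduces that layout in a reduced, self-contained form. All `demo_`/`DEMO_` names are hypothetical, the capacity of 130 bits is an arbitrary stand-in for STARPU_NMAXWORKERS, and `demo_bitmap_next` uses a plain bit-by-bit scan rather than the word-at-a-time search of the header.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define DEMO_NBITS 130   /* hypothetical capacity */
#define DEMO_LONG_BIT ((int)(sizeof(unsigned long) * CHAR_BIT))
/* Fully parenthesized, so DEMO_WORDS * sizeof(...) expands as intended */
#define DEMO_WORDS (((DEMO_NBITS - 1) / DEMO_LONG_BIT) + 1)

struct demo_bitmap
{
	unsigned long bits[DEMO_WORDS];
	int cardinal;   /* cached number of set bits */
};

void demo_bitmap_init(struct demo_bitmap *b)
{
	memset(b, 0, sizeof(*b));
}

/* Return 1 if bit e is set, 0 otherwise (including out-of-range e) */
int demo_bitmap_get(const struct demo_bitmap *b, int e)
{
	if (e < 0 || e / DEMO_LONG_BIT >= DEMO_WORDS)
		return 0;
	return (int)((b->bits[e / DEMO_LONG_BIT] >> (e % DEMO_LONG_BIT)) & 1ul);
}

/* Set bit e and keep the cached cardinal consistent */
void demo_bitmap_set(struct demo_bitmap *b, int e)
{
	assert(e >= 0 && e / DEMO_LONG_BIT < DEMO_WORDS);
	if (demo_bitmap_get(b, e))
		return;
	b->bits[e / DEMO_LONG_BIT] |= 1ul << (e % DEMO_LONG_BIT);
	b->cardinal++;
}

/* Index of the first set bit strictly after e, or -1 if there is none */
int demo_bitmap_next(const struct demo_bitmap *b, int e)
{
	int i;

	for (i = e + 1; i < DEMO_WORDS * DEMO_LONG_BIT; i++)
		if (demo_bitmap_get(b, i))
			return i;
	return -1;
}
```

Calling `demo_bitmap_next(&b, -1)` returns the first set bit, so a full iteration is `for (i = demo_bitmap_next(&b, -1); i != -1; i = demo_bitmap_next(&b, i))`.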

+ 3 - 3
include/starpu_sched_component.h

@@ -69,14 +69,14 @@ struct starpu_sched_component
 	/** The tree containing the component*/
 	struct starpu_sched_tree *tree;
 	/** set of underlying workers */
-	struct starpu_bitmap *workers;
+	struct starpu_bitmap workers;
 	/**
 	   subset of starpu_sched_component::workers that is currently available in the context
 	   The push method should take this value into account, it is set with:
 	   component->workers UNION tree->workers UNION
 	   component->child[i]->workers_in_ctx iff exist x such as component->children[i]->parents[x] == component
 	*/
-	struct starpu_bitmap *workers_in_ctx;
+	struct starpu_bitmap workers_in_ctx;
 	/** private data */
 	void *data;
 	char *name;
@@ -188,7 +188,7 @@ struct starpu_sched_tree
 	/**
 	   set of workers available in this context, this value is used to mask workers in modules
 	*/
-	struct starpu_bitmap *workers;
+	struct starpu_bitmap workers;
 	/**
 	   context id of the scheduler
 	*/

+ 3 - 1
include/starpu_task.h

@@ -538,7 +538,9 @@ struct starpu_codelet
 
 	/**
 	   Optional color of the codelet. This can be useful for
-	   debugging purposes.
+	   debugging purposes. A value of 0 acts as if this field were not specified.
+	   The color is represented as a hex triplet (for example: 0xff0000 is red,
+	   0x0000ff is blue, 0xffa500 is orange, ...).
 	*/
 	unsigned color;
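
The comment on `color` describes a hex triplet packed into an `unsigned`. A short sketch of how such a value splits into its red, green and blue bytes follows. `struct rgb` and `rgb_from_hex_triplet` are hypothetical names used only for illustration.

```c
#include <assert.h>

struct rgb
{
	unsigned char r, g, b;
};

/* Split a 0xRRGGBB value into its three 8-bit components */
struct rgb rgb_from_hex_triplet(unsigned color)
{
	struct rgb c;

	c.r = (unsigned char)((color >> 16) & 0xff);
	c.g = (unsigned char)((color >> 8) & 0xff);
	c.b = (unsigned char)(color & 0xff);
	return c;
}
```

For 0xffa500 (orange) this gives r = 255, g = 165, b = 0.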
 
 

+ 27 - 0
julia/Makefile.am

@@ -0,0 +1,27 @@
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+include $(top_srcdir)/starpu.mk
+
+SUBDIRS = src examples
+
+EXTRA_DIST = README
+
+recheck:
+	RET=0 ; \
+	for i in $(SUBDIRS) ; do \
+		make -C $$i recheck || RET=1 ; \
+	done ; \
+	exit $$RET

+ 127 - 0
julia/Manifest.toml

@@ -0,0 +1,127 @@
+# This file is machine-generated - editing it directly is not advised
+
+[[Base64]]
+uuid = "2a0f44e3-6c83-55bd-87e4-b1978d98bd5f"
+
+[[CBinding]]
+deps = ["Libdl", "Random", "Test"]
+git-tree-sha1 = "6f457df38ae2ba239d5e43b80493bb907de826b2"
+repo-rev = "655e9862947d17423f2fb91ea1014e1cb73c1be1"
+repo-url = "https://github.com/analytech-solutions/CBinding.jl.git"
+uuid = "d43a6710-96b8-4a2d-833c-c424785e5374"
+version = "0.8.1"
+
+[[CEnum]]
+git-tree-sha1 = "62847acab40e6855a9b5905ccb99c2b5cf6b3ebb"
+uuid = "fa961155-64e5-5f13-b03f-caf6b980ea82"
+version = "0.2.0"
+
+[[Clang]]
+deps = ["CEnum", "DataStructures", "LLVM_jll", "Libdl"]
+git-tree-sha1 = "45013227beea038ecc17e8c07cd7c7b05ed26067"
+repo-rev = "master"
+repo-url = "https://github.com/phuchant/Clang.jl.git"
+uuid = "40e3b903-d033-50b4-a0cc-940c62c95e31"
+version = "0.11.0"
+
+[[DataStructures]]
+deps = ["InteractiveUtils", "OrderedCollections"]
+git-tree-sha1 = "6166ecfaf2b8bbf2b68d791bc1d54501f345d314"
+uuid = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8"
+version = "0.17.15"
+
+[[Dates]]
+deps = ["Printf"]
+uuid = "ade2ca70-3891-5945-98fb-dc099432e06a"
+
+[[Distributed]]
+deps = ["Random", "Serialization", "Sockets"]
+uuid = "8ba89e20-285c-5b6f-9357-94700520ee1b"
+
+[[InteractiveUtils]]
+deps = ["Markdown"]
+uuid = "b77e0a4c-d291-57a0-90e8-8db25a27a240"
+
+[[LLVM_jll]]
+deps = ["Libdl", "Pkg"]
+git-tree-sha1 = "c037c15f36c185c613e5b2589d5833720dab3f76"
+uuid = "86de99a1-58d6-5da7-8064-bd56ce2e322c"
+version = "8.0.1+0"
+
+[[LibGit2]]
+deps = ["Printf"]
+uuid = "76f85450-5226-5b5a-8eaa-529ad045b433"
+
+[[Libdl]]
+uuid = "8f399da3-3557-5675-b5ff-fb832c97cbdb"
+
+[[LinearAlgebra]]
+deps = ["Libdl"]
+uuid = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
+
+[[Logging]]
+uuid = "56ddb016-857b-54e1-b83d-db4d58db5568"
+
+[[Markdown]]
+deps = ["Base64"]
+uuid = "d6f4376e-aef5-505a-96c1-9c027394607a"
+
+[[OrderedCollections]]
+git-tree-sha1 = "12ce190210d278e12644bcadf5b21cbdcf225cd3"
+uuid = "bac558e1-5e72-5ebc-8fee-abe8a469f55d"
+version = "1.2.0"
+
+[[Pkg]]
+deps = ["Dates", "LibGit2", "Libdl", "Logging", "Markdown", "Printf", "REPL", "Random", "SHA", "UUIDs"]
+uuid = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f"
+
+[[Printf]]
+deps = ["Unicode"]
+uuid = "de0858da-6303-5e67-8744-51eddeeeb8d7"
+
+[[REPL]]
+deps = ["InteractiveUtils", "Markdown", "Sockets"]
+uuid = "3fa0cd96-eef1-5676-8a61-b3b8758bbffb"
+
+[[Random]]
+deps = ["Serialization"]
+uuid = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
+
+[[RecipesBase]]
+git-tree-sha1 = "54f8ceb165a0f6d083f0d12cb4996f5367c6edbc"
+uuid = "3cdcf5f2-1ef4-517c-9805-6587b60abb01"
+version = "1.0.1"
+
+[[SHA]]
+uuid = "ea8e919c-243c-51af-8825-aaa63cd721ce"
+
+[[Serialization]]
+uuid = "9e88b42a-f829-5b0c-bbe9-9e923198166b"
+
+[[Sockets]]
+uuid = "6462fe0b-24de-5631-8697-dd941f90decc"
+
+[[SparseArrays]]
+deps = ["LinearAlgebra", "Random"]
+uuid = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"
+
+[[Statistics]]
+deps = ["LinearAlgebra", "SparseArrays"]
+uuid = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
+
+[[Test]]
+deps = ["Distributed", "InteractiveUtils", "Logging", "Random"]
+uuid = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
+
+[[ThreadPools]]
+deps = ["Printf", "RecipesBase", "Statistics"]
+git-tree-sha1 = "48e35097fdc6d1706a9b90c5eee62f54402aa62c"
+uuid = "b189fb0b-2eb5-4ed4-bc0c-d34c51242431"
+version = "1.1.0"
+
+[[UUIDs]]
+deps = ["Random", "SHA"]
+uuid = "cf7118a7-6976-5b1a-9a39-7adc72f591a4"
+
+[[Unicode]]
+uuid = "4ec0a83e-493e-50e2-b9ac-8f72acf5a8f5"

+ 3 - 0
julia/StarPU.jl/Project.toml

@@ -4,4 +4,7 @@ authors = ["barthou "]
 version = "0.1.0"
 
 [deps]
+CBinding = "d43a6710-96b8-4a2d-833c-c424785e5374"
+Clang = "40e3b903-d033-50b4-a0cc-940c62c95e31"
 Libdl = "8f399da3-3557-5675-b5ff-fb832c97cbdb"
+ThreadPools = "b189fb0b-2eb5-4ed4-bc0c-d34c51242431"

+ 53 - 0
julia/README

@@ -0,0 +1,53 @@
+Contents
+========
+
+* Installing Julia
+* Installing StarPU module for Julia
+* Running Examples
+
+Installing Julia
+----------------
+Julia version 1.3+ is required and can be downloaded from
+https://julialang.org/downloads/.
+
+
+Installing StarPU module for Julia
+----------------------------------
+First, build the jlstarpu_c_wrapper library:
+
+$ make
+
+Then, you need to add the lib/ directory to your library path and the julia/
+directory to your Julia load path:
+
+$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$PWD/lib
+$ export JULIA_LOAD_PATH=$JULIA_LOAD_PATH:$PWD
+
+This step can also be done by sourcing the setenv.sh script:
+
+$ . setenv.sh
+
+Running Examples
+----------------
+
+You can find several examples in the examples/ directory.
+
+For each example X, three versions are provided:
+
+- X.c: Original C+StarPU code
+- X_native.jl: Native Julia version (without StarPU)
+- X.jl: Julia version using StarPU
+
+
+To run the original C+StarPU code:
+$ make cstarpu.dat
+
+To run the native Julia version:
+$ make julia_native.dat
+
+To run the Julia version using StarPU:
+$ make julia_generatedc.dat
+
+
+
+

+ 0 - 8
julia/StarPU.jl/Makefile

@@ -1,8 +0,0 @@
-SRCS=src/jlstarpu_task_submit.c src/jlstarpu_simple_functions.c src/jlstarpu_data_handles.c
-CC = gcc
-CFLAGS += $(shell pkg-config --cflags starpu-1.3)
-LDFLAGS += $(shell pkg-config --libs starpu-1.3)
-
-lib/libjlstarpu_c_wrapper.so: ${SRCS}
-	test -d lib || mkdir lib
-	$(CC) -O3 -shared -fPIC $(CFLAGS) $^ -o $@ $(LDFLAGS)

+ 0 - 4
julia/StarPU.jl/Manifest.toml

@@ -1,4 +0,0 @@
-# This file is machine-generated - editing it directly is not advised
-
-[[Libdl]]
-uuid = "8f399da3-3557-5675-b5ff-fb832c97cbdb"

+ 0 - 2
julia/StarPU.jl/REQUIRE

@@ -1,2 +0,0 @@
-julia 1.0
-Libdl

File diff suppressed because it is too large
+ 0 - 1290
julia/StarPU.jl/src/StarPU.jl


+ 0 - 349
julia/StarPU.jl/src/compiler/cuda.jl

@@ -1,349 +0,0 @@
-
-
-function is_indep_for_expr(x :: StarpuExpr)
-    return isa(x, StarpuExprFor) && x.is_independant
-end
-
-
-function extract_init_indep_finish(expr :: StarpuExpr) # TODO : it is not a correct extraction (example : if (cond) {@indep for ...} else {return} would not work)
-                                                            # better use apply() (NOTE :assert_no_indep_for already exists) to find recursively every for loops
-    init = StarpuExpr[]
-    finish = StarpuExpr[]
-
-    if is_indep_for_expr(expr)
-        return init, StarpuIndepFor(expr), finish
-    end
-
-    if !isa(expr, StarpuExprBlock)
-        return [expr], nothing, finish
-    end
-
-    for i in (1 : length(expr.exprs))
-
-        if !is_indep_for_expr(expr.exprs[i])
-            continue
-        end
-
-        init = expr.exprs[1 : i-1]
-        indep = StarpuIndepFor(expr.exprs[i])
-        finish = expr.exprs[i+1 : end]
-
-        if any(is_indep_for_expr, finish)
-            error("Sequence of several independant loops is not allowed") #same it may be tricked by a Block(Indep_for(...))
-        end
-
-        return init, indep, finish
-    end
-
-    return expr.exprs, nothing, finish
-end
-
-
-
-
-function analyse_variable_declarations(expr :: StarpuExpr, already_defined :: Vector{StarpuExprTypedVar} = StarpuExprTypedVar[])
-
-    undefined_variables = Symbol[]
-    defined_variable_names = map((x -> x.name), already_defined)
-    defined_variable_types = map((x -> x.typ), already_defined)
-
-    function func_to_apply(x :: StarpuExpr)
-
-        if isa(x, StarpuExprFunction)
-            error("No function declaration allowed in this section")
-        end
-
-        if isa(x, StarpuExprVar) || isa(x, StarpuExprTypedVar)
-
-            if !(x.name in defined_variable_names) && !(x.name in undefined_variables)
-                push!(undefined_variables, x.name)
-            end
-
-            return x
-        end
-
-        if isa(x, StarpuExprAffect) || isa(x, StarpuExprFor)
-
-            if isa(x, StarpuExprAffect)
-
-                var = x.var
-
-                if !isa(var, StarpuExprTypedVar)
-                    return x
-                end
-
-                name = var.name
-                typ = var.typ
-
-            else
-                name = x.iter
-                typ = Int64
-            end
-
-            if name in defined_variable_names
-                error("Multiple definition of variable $name")
-            end
-
-            filter!((sym -> sym != name), undefined_variables)
-            push!(defined_variable_names, name)
-            push!(defined_variable_types, typ)
-
-            return x
-        end
-
-        return x
-    end
-
-    apply(func_to_apply, expr)
-    defined_variable = map(StarpuExprTypedVar, defined_variable_names, defined_variable_types)
-
-    return defined_variable, undefined_variables
-end
-
-
-
-function find_variable(name :: Symbol, vars :: Vector{StarpuExprTypedVar})
-
-    for x in vars
-        if x.name == name
-            return x
-        end
-    end
-
-    return nothing
-end
-
-
-
-function add_device_to_interval_call(expr :: StarpuExpr)
-
-    function func_to_apply(x :: StarpuExpr)
-
-        if isa(x, StarpuExprCall) && x.func == :jlstarpu_interval_size
-            return StarpuExprCall(:jlstarpu_interval_size__device, x.args)
-        end
-
-        return x
-    end
-
-    return apply(func_to_apply, expr)
-end
-
-
-
-function transform_to_cuda_kernel(func :: StarpuExprFunction)
-
-    cpu_func = transform_to_cpu_kernel(func)
-
-    init, indep, finish = extract_init_indep_finish(cpu_func.body)
-
-    if indep == nothing
-        error("No independant for loop has been found") # TODO can fail because extraction is not correct yet
-    end
-
-    prekernel_instr, kernel_args, kernel_instr = analyse_sets(indep)
-
-    kernel_call = StarpuExprCudaCall(:cudaKernel, (@parse nblocks), (@parse THREADS_PER_BLOCK), StarpuExpr[])
-    prekernel_instr = vcat(init, prekernel_instr)
-    kernel_instr = vcat(kernel_instr, indep.body)
-
-    indep_for_def, indep_for_undef = analyse_variable_declarations(StarpuExprBlock(kernel_instr), kernel_args)
-    prekernel_def, prekernel_undef = analyse_variable_declarations(StarpuExprBlock(prekernel_instr), cpu_func.args)
-
-    for undef_var in indep_for_undef
-
-        found_var = find_variable(undef_var, prekernel_def)
-
-        if found_var == nothing # TODO : error then ?
-            continue
-        end
-
-        push!(kernel_args, found_var)
-    end
-
-    call_args = map((x -> StarpuExprVar(x.name)), kernel_args)
-    kernelname=Symbol("KERNEL_",func.func);
-    cuda_call = StarpuExprCudaCall(kernelname, (@parse nblocks), (@parse THREADS_PER_BLOCK), call_args)
-    push!(prekernel_instr, cuda_call)
-    push!(prekernel_instr, @parse cudaStreamSynchronize(starpu_cuda_get_local_stream()))
-    prekernel_instr = vcat(prekernel_instr, finish)
-
-    prekernel_name = Symbol("CUDA_", func.func)
-    prekernel = StarpuExprFunction(Nothing, prekernel_name, cpu_func.args, StarpuExprBlock(prekernel_instr))
-    prekernel = flatten_blocks(prekernel)
-
-    kernel = StarpuExprFunction(Nothing, kernelname, kernel_args, StarpuExprBlock(kernel_instr))
-    kernel = add_device_to_interval_call(kernel)
-    kernel = flatten_blocks(kernel)
-    
-    return prekernel, kernel
-end
-
-
-struct StarpuIndepFor
-
-    iters :: Vector{Symbol}
-    sets :: Vector{StarpuExprInterval}
-
-    body :: StarpuExpr
-end
-
-
-function assert_no_indep_for(expr :: StarpuExpr)
-
-    function func_to_run(x :: StarpuExpr)
-        if (isa(x, StarpuExprFor) && x.is_independant)
-            error("Invalid usage of intricated @indep for loops")
-        end
-
-        return x
-    end
-
-    return apply(func_to_run, expr)
-end
-
-
-function StarpuIndepFor(expr :: StarpuExprFor)
-
-    if !expr.is_independant
-        error("For expression must be prefixed by @indep")
-    end
-
-    iters = []
-    sets = []
-    for_loop = expr
-
-    while isa(for_loop, StarpuExprFor) && for_loop.is_independant
-
-        push!(iters, for_loop.iter)
-        push!(sets, for_loop.set)
-        for_loop = for_loop.body
-
-        while (isa(for_loop, StarpuExprBlock) && length(for_loop.exprs) == 1)
-            for_loop = for_loop.exprs[1]
-        end
-    end
-
-    return StarpuIndepFor(iters, sets, assert_no_indep_for(for_loop))
-end
-
-
-function translate_index_code(dims :: Vector{StarpuExprVar})
-
-    ndims = length(dims)
-
-    if ndims == 0
-        error("No dimension specified")
-    end
-
-    prod = StarpuExprValue(1)
-    output = StarpuExpr[]
-    reversed_dim = reverse(dims)
-    thread_index_patern = @parse € :: Int64 = (€ / €) % €
-    thread_id = @parse THREAD_ID
-
-    for i in (1 : ndims)
-        index_lvalue = StarpuExprVar(Symbol(:kernel_ids__index_, ndims - i + 1))
-        expr = replace_pattern(thread_index_patern, index_lvalue, thread_id, prod, reversed_dim[i])
-        push!(output, expr)
-
-        prod = StarpuExprCall(:(*), [prod, reversed_dim[i]])
-    end
-
-    thread_id_pattern = @parse begin
-
-        € :: Int64 = blockIdx.x * blockDim.x + threadIdx.x
-
-        if (€ >= €)
-            return
-        end
-    end
-
-    bound_verif = replace_pattern(thread_id_pattern, thread_id, thread_id, prod)
-    push!(output, bound_verif)
-
-    return reverse(output)
-end
-
-
-
-
-
-
-
-function kernel_index_declarations(ind_for :: StarpuIndepFor)
-
-    pre_kernel_instr = StarpuExpr[]
-    kernel_args = StarpuExprTypedVar[]
-    kernel_instr = StarpuExpr[]
-
-    decl_pattern = @parse € :: Int64 = €
-    interv_size_decl_pattern = @parse € :: Int64 = jlstarpu_interval_size(€, €, €)
-    iter_pattern = @parse € :: Int64 = € + € * €
-
-    dims = StarpuExprVar[]
-    ker_instr_to_add_later_on = StarpuExpr[]
-
-    for k in (1 : length(ind_for.sets))
-
-        set = ind_for.sets[k]
-
-        start_var = starpu_parse(Symbol(:kernel_ids__start_, k))
-        start_decl = replace_pattern(decl_pattern, start_var, set.start)
-
-        step_var = starpu_parse(Symbol(:kernel_ids__step_, k))
-        step_decl = replace_pattern(decl_pattern, step_var, set.step)
-
-        dim_var = starpu_parse(Symbol(:kernel_ids__dim_, k))
-        dim_decl = replace_pattern(interv_size_decl_pattern, dim_var, start_var, step_var, set.stop)
-
-        push!(dims, dim_var)
-
-        push!(pre_kernel_instr, start_decl, step_decl, dim_decl)
-        push!(kernel_args, StarpuExprTypedVar(start_var.name, Int64))
-        push!(kernel_args, StarpuExprTypedVar(step_var.name, Int64))
-        push!(kernel_args, StarpuExprTypedVar(dim_var.name, Int64))
-
-        iter_var = starpu_parse(ind_for.iters[k])
-        index_var = starpu_parse(Symbol(:kernel_ids__index_, k))
-        iter_decl = replace_pattern(iter_pattern, iter_var, start_var, index_var, step_var)
-
-        push!(ker_instr_to_add_later_on, iter_decl)
-    end
-
-
-    return dims, ker_instr_to_add_later_on, pre_kernel_instr , kernel_args, kernel_instr
-end
-
-
-
-function analyse_sets(ind_for :: StarpuIndepFor)
-
-
-    decl_pattern = @parse € :: Int64 = €
-    nblocks_decl_pattern = @parse € :: Int64 = (€ + THREADS_PER_BLOCK - 1)/THREADS_PER_BLOCK
-
-    dims, ker_instr_to_add, pre_kernel_instr, kernel_args, kernel_instr  = kernel_index_declarations(ind_for)
-
-    dim_prod = @parse 1
-
-    for d in dims
-        dim_prod = StarpuExprCall(:(*), [dim_prod, d])
-    end
-
-    nthreads_var = @parse nthreads
-    nthreads_decl = replace_pattern(decl_pattern, nthreads_var, dim_prod)
-    push!(pre_kernel_instr, nthreads_decl)
-
-    nblocks_var = @parse nblocks
-    nblocks_decl = replace_pattern(nblocks_decl_pattern, nblocks_var, nthreads_var)
-    push!(pre_kernel_instr, nblocks_decl)
-
-
-    index_decomposition = translate_index_code(dims)
-
-    push!(kernel_instr, index_decomposition...)
-    push!(kernel_instr, ker_instr_to_add...)
-
-    return pre_kernel_instr, kernel_args, kernel_instr
-end

+ 0 - 13
julia/StarPU.jl/src/compiler/include.jl

@@ -1,13 +0,0 @@
-export starpu_new_cpu_kernel_file
-export starpu_new_cuda_kernel_file
-export @codelet
-export @target
-
-include("utils.jl")
-include("expressions.jl")
-include("parsing.jl")
-include("expression_manipulation.jl")
-include("c.jl")
-include("cuda.jl")
-include("file_generation.jl")
-

+ 0 - 38
julia/StarPU.jl/src/compiler/utils.jl

@@ -1,38 +0,0 @@
-import Base.print
-
-function print_newline(io :: IO, indent = 0, n_lines = 1)
-    for i in (1 : n_lines)
-        print(io, "\n")
-    end
-
-    for i in (1 : indent)
-        print(io, " ")
-    end
-end
-
-starpu_indent_size = 4
-
-function rand_char()
-    r = rand(UInt) % 62
-
-    if (0 <= r < 10)
-        return '0' + r
-    elseif (10 <= r < 36)
-        return 'a' + (r - 10)
-    else
-        return 'A' + (r - 36)
-    end
-end
-
-function rand_string(size = 8)
-    output = ""
-
-    for i in (1 : size)
-        output *= string(rand_char())
-    end
-    return output
-end
-
-function system(cmd :: String)
-    ccall((:system, "libc"), Cint, (Cstring,), cmd)
-end

+ 0 - 133
julia/StarPU.jl/src/jlstarpu_data_handles.c

@@ -1,133 +0,0 @@
-/* StarPU --- Runtime system for heterogeneous multicore architectures.
- *
- * Copyright (C) 2018                                     Alexis Juven
- *
- * StarPU is free software; you can redistribute it and/or modify
- * it under the terms of the GNU Lesser General Public License as published by
- * the Free Software Foundation; either version 2.1 of the License, or (at
- * your option) any later version.
- *
- * StarPU is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
- *
- * See the GNU Lesser General Public License in COPYING.LGPL for more details.
- */
-
-#include "jlstarpu.h"
-
-enum jlstarpu_data_filter_func
-{
-	JLSTARPU_MATRIX_FILTER_VERTICAL_BLOCK = 0,
-	JLSTARPU_MATRIX_FILTER_BLOCK,
-	JLSTARPU_VECTOR_FILTER_BLOCK,
-};
-
-struct jlstarpu_data_filter
-{
-	enum jlstarpu_data_filter_func func;
-	unsigned int nchildren;
-
-};
-
-
-void * jlstarpu_translate_data_filter_func(enum jlstarpu_data_filter_func func)
-{
-	switch (func){
-	case JLSTARPU_MATRIX_FILTER_VERTICAL_BLOCK:
-		return starpu_matrix_filter_vertical_block;
-	case JLSTARPU_MATRIX_FILTER_BLOCK:
-		return starpu_matrix_filter_block;
-	case JLSTARPU_VECTOR_FILTER_BLOCK:
-		return starpu_vector_filter_block;
-	default:
-		return NULL;
-	}
-
-}
-
-void jlstarpu_translate_data_filter(const struct jlstarpu_data_filter * const input,struct starpu_data_filter * output)
-{
-	memset(output, 0, sizeof(struct starpu_data_filter));
-	output->filter_func = jlstarpu_translate_data_filter_func(input->func);
-	output->nchildren = input->nchildren;
-}
-
-void jlstarpu_data_partition(starpu_data_handle_t handle,const struct jlstarpu_data_filter * const jl_filter)
-{
-	struct starpu_data_filter filter;
-	jlstarpu_translate_data_filter(jl_filter, &filter);
-	starpu_data_partition(handle, &filter);
-}
-
-
-void jlstarpu_data_map_filters_1_arg(starpu_data_handle_t handle,
-	const struct jlstarpu_data_filter * const jl_filter
-	)
-{
-	struct starpu_data_filter filter;
-	jlstarpu_translate_data_filter(jl_filter, &filter);
-
-	starpu_data_map_filters(handle, 1, &filter);
-
-}
-
-
-void jlstarpu_data_map_filters_2_arg
-(
-	starpu_data_handle_t handle,
-	const struct jlstarpu_data_filter * const jl_filter_1,
-	const struct jlstarpu_data_filter * const jl_filter_2
-	)
-{
-	struct starpu_data_filter filter_1;
-	jlstarpu_translate_data_filter(jl_filter_1, &filter_1);
-
-	struct starpu_data_filter filter_2;
-	jlstarpu_translate_data_filter(jl_filter_2, &filter_2);
-
-
-	starpu_data_map_filters(handle, 2, &filter_1, &filter_2);
-
-}
-
-
-
-
-#define JLSTARPU_GET(interface, field, ret_type)			\
-									\
-	ret_type jlstarpu_##interface##_get_##field(const struct starpu_##interface##_interface * const x) \
-	{								\
-		return (ret_type) x->field;				\
-	}								\
-
-
-
-
-
-JLSTARPU_GET(vector, ptr, void *)
-JLSTARPU_GET(vector, nx, uint32_t)
-JLSTARPU_GET(vector, elemsize, size_t)
-
-
-
-JLSTARPU_GET(matrix, ptr, void *)
-JLSTARPU_GET(matrix, ld, uint32_t)
-JLSTARPU_GET(matrix, nx, uint32_t)
-JLSTARPU_GET(matrix, ny, uint32_t)
-JLSTARPU_GET(matrix, elemsize, size_t)
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-

+ 0 - 73
julia/StarPU.jl/src/jlstarpu_task.h

@@ -1,73 +0,0 @@
-/* StarPU --- Runtime system for heterogeneous multicore architectures.
- *
- * Copyright (C) 2018                                     Alexis Juven
- *
- * StarPU is free software; you can redistribute it and/or modify
- * it under the terms of the GNU Lesser General Public License as published by
- * the Free Software Foundation; either version 2.1 of the License, or (at
- * your option) any later version.
- *
- * StarPU is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
- *
- * See the GNU Lesser General Public License in COPYING.LGPL for more details.
- */
-/*
- * jlstarpu_task.h
- *
- *  Created on: 27 juin 2018
- *      Author: ajuven
- */
-
-#ifndef JLSTARPU_TASK_H_
-#define JLSTARPU_TASK_H_
-
-
-#include "jlstarpu.h"
-
-struct jlstarpu_codelet
-{
-	uint32_t where;
-
-	starpu_cpu_func_t cpu_func;
-	char * cpu_func_name;
-
-	starpu_cuda_func_t cuda_func;
-	starpu_opencl_func_t opencl_func;
-
-	int nbuffer;
-	enum starpu_data_access_mode * modes;
-
-	struct starpu_perfmodel * model;
-
-};
-
-
-
-struct jlstarpu_task
-{
-	struct starpu_codelet * cl;
-	starpu_data_handle_t * handles;
-	unsigned int synchronous;
-
-	void * cl_arg;
-	size_t cl_arg_size;
-};
-
-
-#if 0
-
-struct cl_args_decorator
-{
-	struct jlstarpu_function_launcher * launcher;
-	void * cl_args;
-};
-
-#endif
-
-
-
-
-
-#endif /* JLSTARPU_TASK_H_ */

+ 0 - 208
julia/StarPU.jl/src/jlstarpu_task_submit.c

@@ -1,208 +0,0 @@
-/* StarPU --- Runtime system for heterogeneous multicore architectures.
- *
- * Copyright (C) 2018                                     Alexis Juven
- *
- * StarPU is free software; you can redistribute it and/or modify
- * it under the terms of the GNU Lesser General Public License as published by
- * the Free Software Foundation; either version 2.1 of the License, or (at
- * your option) any later version.
- *
- * StarPU is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
- *
- * See the GNU Lesser General Public License in COPYING.LGPL for more details.
- */
-/*
- * jlstarpu_task_submit.c
- *
- *  Created on: 27 juin 2018
- *      Author: ajuven
- */
-
-
-#include "jlstarpu.h"
-
-
-struct starpu_codelet * jlstarpu_new_codelet()
-{
-	struct starpu_codelet * output;
-	TYPE_MALLOC(output, 1);
-
-	starpu_codelet_init(output);
-
-	return output;
-}
-
-
-#if 0
-struct starpu_codelet * jlstarpu_translate_codelet(struct jlstarpu_codelet * const input)
-{
-	struct starpu_codelet * output;
-	TYPE_MALLOC(output, 1);
-
-	starpu_codelet_init(output);
-
-	output->where = input->where;
-	output->cpu_funcs[0] = input->cpu_func;
-	output->cpu_funcs_name[0] = input->cpu_func_name;
-
-	output->cuda_funcs[0] = input->cuda_func;
-	output->opencl_funcs[0] = input->opencl_func;
-
-	output->nbuffers = input->nbuffer;
-	memcpy(&(output->modes), input->modes, input->nbuffer * sizeof(enum starpu_data_access_mode));
-
-	output->model = input->model;
-
-	return output;
-}
-#endif
-
-void jlstarpu_codelet_update(const struct jlstarpu_codelet * const input, struct starpu_codelet * const output)
-{
-	output->where = input->where;
-
-	output->cpu_funcs[0] = input->cpu_func;
-	output->cpu_funcs_name[0] = input->cpu_func_name;
-
-	output->cuda_funcs[0] = input->cuda_func;
-	output->opencl_funcs[0] = input->opencl_func;
-
-	output->nbuffers = input->nbuffer;
-	memcpy(&(output->modes), input->modes, input->nbuffer * sizeof(enum starpu_data_access_mode));
-
-	output->model = input->model;
-
-}
-#if 0
-void jlstarpu_free_codelet(struct starpu_codelet * cl)
-{
-	free(cl);
-}
-#endif
-
-void jlstarpu_hello() {
-	fprintf(stderr,"coucou !");
-}
-
-#if 0
-struct starpu_task * jlstarpu_translate_task(const struct jlstarpu_task * const input)
-{
-	struct starpu_task * output = starpu_task_create();
-
-	if (output == NULL){
-		return NULL;
-	}
-
-	output->cl = input->cl;
-	memcpy(&(output->handles), input->handles, input->cl->nbuffers * sizeof(starpu_data_handle_t));
-	output->synchronous = input->synchronous;
-
-
-	return output;
-}
-#endif
-
-char *starpu_find_function(char *name, char *device) {
-	return NULL;
-}
-
-void jlstarpu_task_update(const struct jlstarpu_task * const input, struct starpu_task * const output)
-{
-	output->cl = input->cl;
-	memcpy(&(output->handles), input->handles, input->cl->nbuffers * sizeof(starpu_data_handle_t));
-	output->synchronous = input->synchronous;
-	output->cl_arg = input->cl_arg;
-	output->cl_arg_size = input->cl_arg_size;
-}
-
-/*
-
-void print_perfmodel(struct starpu_perfmodel * p)
-{
-	printf("Perfmodel at address %p:\n");
-	printf("\ttype : %u\n", p->type);
-	printf("\tcost_function : %p\n", p->cost_function);
-	printf("\tarch_cost_function : %p\n", p->arch_cost_function);
-	printf("\tsize_base : %p\n", p->size_base);
-	printf("\tfootprint : %p\n", p->footprint);
-	printf("\tsymbol : %s\n", p->symbol);
-	printf("\tis_loaded : %u\n", p->is_loaded);
-	printf("\tbenchmarking : %u\n", p->benchmarking);
-	printf("\tis_init : %u\n", p->is_init);
-	printf("\tparameters : %p\n", p->parameters);
-	printf("\tparameters_names : %p\n", p->parameters_names);
-	printf("\tnparameters : %u\n", p->nparameters);
-	printf("\tcombinations : %p\n", p->combinations);
-	printf("\tncombinations : %u\n", p->ncombinations);
-	printf("\tstate : %p\n", p->state);
-
-}
-
-
-*/
-
-#if 0
-/*
- * TODO : free memory
- */
-int jlstarpu_task_submit(const struct jlstarpu_task * const jl_task)
-{
-	DEBUG_PRINT("Inside C wrapper");
-
-	struct starpu_task * task;
-	int ret_code;
-
-
-	DEBUG_PRINT("Translating task...");
-	task = jlstarpu_translate_task(jl_task);
-
-	if (task == NULL){
-		fprintf(stderr, "Error while creating the task.\n");
-		return EXIT_FAILURE;
-	}
-
-	DEBUG_PRINT("Task translated");
-	DEBUG_PRINT("Submitting task to StarPU...");
-	ret_code = starpu_task_submit(task);
-	DEBUG_PRINT("starpu_task_submit has returned");
-
-
-	if (ret_code != 0){
-		fprintf(stderr, "Error while submitting task.\n");
-		return ret_code;
-	}
-
-
-	DEBUG_PRINT("Done");
-	DEBUG_PRINT("END OF STARPU FUNCTION");
-
-
-	return ret_code;
-}
-
-#endif
-
-
-
-
-
-
-
-#define JLSTARPU_UPDATE_FUNC(type, field)\
-	\
-	void jlstarpu_##type##_update_##field(const struct jlstarpu_##type * const input, struct starpu_##type * const output)\
-	{\
-		output->field = input->field;\
-	}
-
-
-
-
-
-
-
-
-
-

+ 0 - 67
julia/StarPU.jl/src/jlstarpu_utils.h

@@ -1,67 +0,0 @@
-/* StarPU --- Runtime system for heterogeneous multicore architectures.
- *
- * Copyright (C) 2018                                     Alexis Juven
- *
- * StarPU is free software; you can redistribute it and/or modify
- * it under the terms of the GNU Lesser General Public License as published by
- * the Free Software Foundation; either version 2.1 of the License, or (at
- * your option) any later version.
- *
- * StarPU is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
- *
- * See the GNU Lesser General Public License in COPYING.LGPL for more details.
- */
-/*
- * jlstarpu_utils.h
- *
- *  Created on: 27 juin 2018
- *      Author: ajuven
- */
-
-#ifndef JLSTARPU_UTILS_H_
-#define JLSTARPU_UTILS_H_
-
-#include "jlstarpu.h"
-
-
-#define TYPE_MALLOC(ptr, nb_elements) \
-		do {\
-			if ((nb_elements) == 0){ \
-				ptr = NULL; \
-			} else { \
-				ptr = malloc((nb_elements) * sizeof(*(ptr))); \
-				if (ptr == NULL){ \
-					fprintf(stderr, "\033[31mCRITICAL : MALLOC HAS RETURNED NULL\n\033[0m");\
-					fflush(stderr);\
-					exit(1);\
-				} \
-			} \
-		} while(0)
-
-
-
-//#define DEBUG
-#ifdef DEBUG
-
-#define DEBUG_PRINT(...)\
-		do {\
-			fprintf(stderr, "\x1B[34m%s : \x1B[0m", __FUNCTION__);\
-			fprintf(stderr, __VA_ARGS__);\
-			fprintf(stderr, "\n");\
-			fflush(stderr);\
-		} while (0)
-
-
-
-
-#else
-
-#define DEBUG_PRINT(...)
-
-#endif
-
-
-
-#endif /* JLSTARPU_UTILS_H_ */

+ 139 - 0
julia/examples/Makefile.am

@@ -0,0 +1,139 @@
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+include $(top_srcdir)/starpu.mk
+
+noinst_PROGRAMS		=
+
+if STARPU_HAVE_WINDOWS
+LOADER_BIN		=
+else
+loader_CPPFLAGS 	= 	$(AM_CFLAGS) $(AM_CPPFLAGS) -I$(top_builddir)/src/
+if !STARPU_SIMGRID
+LOADER			=	loader
+LOADER_BIN		=	$(abs_top_builddir)/julia/examples/$(LOADER)
+noinst_PROGRAMS		+=	loader
+endif
+loader_SOURCES		=	../../tests/loader.c
+endif
+
+if STARPU_HAVE_AM111
+TESTS_ENVIRONMENT	=	top_builddir="$(abs_top_builddir)" top_srcdir="$(abs_top_srcdir)"
+else
+TESTS_ENVIRONMENT 	=	top_builddir="$(abs_top_builddir)" top_srcdir="$(abs_top_srcdir)" $(LOADER_BIN)
+endif
+
+BUILT_SOURCES =
+
+CLEANFILES = *.gcno *.gcda *.linkinfo starpu_idle_microsec.log
+
+EXTRA_DIST =					\
+	axpy/axpy.jl				\
+	axpy/axpy.sh				\
+	black_scholes/black_scholes.jl		\
+	callback/callback.jl			\
+	callback/callback.sh			\
+	check_deps/check_deps.jl		\
+	check_deps/check_deps.sh		\
+	dependency/end_dep.jl			\
+	dependency/end_dep.sh			\
+	dependency/tag_dep.jl			\
+	dependency/tag_dep.sh			\
+	dependency/task_dep.sh			\
+	dependency/task_dep.jl			\
+	gemm/gemm.jl				\
+	gemm/gemm_native.jl			\
+	gemm/gemm.sh				\
+	mandelbrot/mandelbrot_native.jl		\
+	mandelbrot/mandelbrot.jl		\
+	mandelbrot/mandelbrot.sh		\
+	mult/mult_native.jl			\
+	mult/mult.jl				\
+	mult/perf.sh				\
+	mult/mult_starpu.sh			\
+	task_insert_color/task_insert_color.jl	\
+	task_insert_color/task_insert_color.sh	\
+	variable/variable.jl			\
+	variable/variable_native.jl		\
+	variable/variable.sh			\
+	vector_scal/vector_scal.jl		\
+	vector_scal/vector_scal.sh
+
+examplebindir = $(libdir)/starpu/julia
+
+examplebin_PROGRAMS =
+
+if STARPU_USE_CUDA
+if STARPU_COVERITY
+include $(top_srcdir)/starpu-mynvcc.mk
+else
+NVCCFLAGS += --compiler-options -fno-strict-aliasing  -I$(top_srcdir)/include/ -I$(top_builddir)/include/ $(HWLOC_CFLAGS)
+
+.cu.cubin:
+	$(V_nvcc) $(NVCC) -cubin $< -o $@ $(NVCCFLAGS)
+
+.cu.o:
+	$(V_nvcc) $(NVCC) $< -c -o $@ $(NVCCFLAGS)
+endif
+endif
+
+AM_CFLAGS = -Wall $(STARPU_CUDA_CPPFLAGS) $(STARPU_OPENCL_CPPFLAGS) $(FXT_CFLAGS) $(MAGMA_CFLAGS) $(HWLOC_CFLAGS) $(GLOBAL_AM_CFLAGS) -Wno-unused
+LIBS = $(top_builddir)/src/@LIBSTARPU_LINK@ ../src/libstarpujulia-@STARPU_EFFECTIVE_VERSION@.la -lm @LIBS@ $(FXT_LIBS) $(MAGMA_LIBS)
+AM_CPPFLAGS = -I$(top_srcdir)/include/ -I$(top_srcdir)/examples/ -I$(top_builddir)/include
+AM_LDFLAGS = $(STARPU_OPENCL_LDFLAGS) $(STARPU_CUDA_LDFLAGS) $(FXT_LDFLAGS) $(STARPU_COI_LDFLAGS) $(STARPU_SCIF_LDFLAGS)
+
+check_PROGRAMS = $(LOADER) $(STARPU_JULIA_EXAMPLES)
+SHELL_TESTS	=
+STARPU_JULIA_EXAMPLES	=
+
+if BUILD_EXAMPLES
+examplebin_PROGRAMS 	+=	$(STARPU_JULIA_EXAMPLES)
+
+TESTS			=	$(SHELL_TESTS) $(STARPU_JULIA_EXAMPLES)
+endif
+
+######################
+#      Examples      #
+######################
+
+SHELL_TESTS	+=	check_deps/check_deps.sh
+
+STARPU_JULIA_EXAMPLES	+=	mult/mult
+mult_mult_SOURCES	=	mult/mult.c mult/cpu_mult.c
+SHELL_TESTS		+=	mult/mult_starpu.sh
+
+STARPU_JULIA_EXAMPLES				+=	task_insert_color/task_insert_color
+task_insert_color_task_insert_color_SOURCES	=	task_insert_color/task_insert_color.c
+SHELL_TESTS					+=	task_insert_color/task_insert_color.sh
+
+SHELL_TESTS	+=	variable/variable.sh
+SHELL_TESTS	+=	vector_scal/vector_scal.sh
+
+STARPU_JULIA_EXAMPLES		+=	mandelbrot/mandelbrot
+mandelbrot_mandelbrot_SOURCES	=	mandelbrot/mandelbrot.c mandelbrot/cpu_mandelbrot.c mandelbrot/cpu_mandelbrot.h
+SHELL_TESTS			+=	mandelbrot/mandelbrot.sh
+
+STARPU_JULIA_EXAMPLES		+= 	callback/callback
+callback_callback_SOURCES	=	callback/callback.c
+SHELL_TESTS			+=	callback/callback.sh
+
+SHELL_TESTS			+=	dependency/tag_dep.sh
+SHELL_TESTS			+=	dependency/task_dep.sh
+SHELL_TESTS			+=	dependency/end_dep.sh
+
+if !NO_BLAS_LIB
+SHELL_TESTS			+=	axpy/axpy.sh
+SHELL_TESTS			+=	gemm/gemm.sh
+endif

+ 110 - 0
julia/examples/axpy/axpy.jl

@@ -0,0 +1,110 @@
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+using StarPU
+using Printf
+const EPSILON = 1e-6
+
+function check(alpha, X, Y)
+    for i in 1:length(X)
+        expected_value = alpha * X[i] + 4.0
+        if abs(Y[i] - expected_value) > expected_value * EPSILON
+            error("at ", i, ", ", alpha, "*", X[i], "+4.0=", Y[i], ", expected ", expected_value)
+        end
+    end
+end
+
+@target STARPU_CPU+STARPU_CUDA
+@codelet function axpy(X :: Vector{Float32}, Y :: Vector{Float32}, alpha ::Float32) :: Nothing
+    STARPU_SAXPY(length(X), alpha, X, 1, Y, 1)
+    return
+end
+
+function axpy(N, NBLOCKS, alpha, display = true)
+    X = Array(fill(1.0f0, N))
+    Y = Array(fill(4.0f0, N))
+
+    starpu_memory_pin(X)
+    starpu_memory_pin(Y)
+
+    block_filter = starpu_data_filter(STARPU_VECTOR_FILTER_BLOCK, NBLOCKS)
+
+    perfmodel = starpu_perfmodel(
+        perf_type = starpu_perfmodel_type(STARPU_HISTORY_BASED),
+        symbol = "history_perf"
+    )
+
+    cl = starpu_codelet(
+        cpu_func = CPU_CODELETS["axpy"],
+        cuda_func = CUDA_CODELETS["axpy"],
+        #cuda_func = STARPU_SAXPY,
+        modes = [STARPU_R, STARPU_RW],
+        perfmodel = perfmodel
+    )
+
+    if display
+        println("BEFORE x[0] = ", X[1])
+        println("BEFORE y[0] = ", Y[1])
+    end
+
+    t_start = time_ns()
+
+    @starpu_block let
+        hX,hY = starpu_data_register(X, Y)
+
+        starpu_data_partition(hX, block_filter)
+        starpu_data_partition(hY, block_filter)
+
+        for b in 1:NBLOCKS
+            task = starpu_task(cl = cl, handles = [hX[b],hY[b]], cl_arg=(Float32(alpha),),
+                               tag=starpu_tag_t(b))
+            starpu_task_submit(task)
+        end
+
+        starpu_task_wait_for_all()
+    end
+
+    t_end = time_ns()
+
+    timing = (t_end-t_start)/1000
+
+    if display
+        @printf("timing -> %d us %.2f MB/s\n", timing, 3*N*4/timing)
+        println("AFTER y[0] = ", Y[1], " (ALPHA=", alpha, ")")
+    end
+
+    check(alpha, X, Y)
+
+    starpu_memory_unpin(X)
+    starpu_memory_unpin(Y)
+end
+
+function main()
+    N = 16 * 1024 * 1024
+    NBLOCKS = 8
+    alpha = 3.41
+
+    starpu_init()
+    starpu_cublas_init()
+
+    # warmup
+    axpy(10, 1, alpha, false)
+
+    axpy(N, NBLOCKS, alpha)
+
+    starpu_shutdown()
+end
+
+main()

+ 19 - 0
julia/examples/axpy/axpy.sh

@@ -0,0 +1,19 @@
+#!/bin/bash
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+
+$(dirname $0)/../execute.sh axpy/axpy.jl
+

+ 1 - 0
julia/black_scholes/black_scholes.c

@@ -1,5 +1,6 @@
 /* StarPU --- Runtime system for heterogeneous multicore architectures.
  *
+ * Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
  * Copyright (C) 2019       Mael Keryell
  * Copyright (C) 2019       Mael Keryell
  *
  *
  * StarPU is free software; you can redistribute it and/or modify
  * StarPU is free software; you can redistribute it and/or modify

+ 15 - 2
julia/black_scholes/black_scholes.jl

@@ -1,3 +1,18 @@
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
 import Libdl
 using StarPU
 
@@ -115,8 +130,6 @@ using StarPU
     return 0
 end
 
-
-@debugprint "starpu_init"
 starpu_init()
 
 function black_scholes_starpu(data ::Matrix{Float64}, res ::Matrix{Float64}, nslices ::Int64)

+ 93 - 0
julia/examples/callback/callback.c

@@ -0,0 +1,93 @@
+/* StarPU --- Runtime system for heterogeneous multicore architectures.
+ *
+ * Copyright (C) 2009-2020  Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+ *
+ * StarPU is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published by
+ * the Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ *
+ * StarPU is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ *
+ * See the GNU Lesser General Public License in COPYING.LGPL for more details.
+ */
+
+/*
+ * This is an example of using a callback. We submit a task, whose callback
+ * submits another task (without any callback).
+ */
+
+#include <starpu.h>
+
+#define FPRINTF(ofile, fmt, ...) do { if (!getenv("STARPU_SSILENT")) {fprintf(ofile, fmt, ## __VA_ARGS__); }} while(0)
+
+starpu_data_handle_t handle;
+
+void cpu_codelet(void *descr[], void *_args)
+{
+	(void)_args;
+	int *val = (int *)STARPU_VARIABLE_GET_PTR(descr[0]);
+
+	*val += 1;
+}
+
+struct starpu_codelet cl =
+{
+	.modes = { STARPU_RW },
+	.cpu_funcs = {cpu_codelet},
+	.cpu_funcs_name = {"cpu_codelet"},
+	.nbuffers = 1,
+	.name = "callback"
+};
+
+void callback_func(void *callback_arg)
+{
+	int ret;
+
+	(void)callback_arg;
+
+	struct starpu_task *task = starpu_task_create();
+	task->cl = &cl;
+	task->handles[0] = handle;
+
+	ret = starpu_task_submit(task);
+	STARPU_CHECK_RETURN_VALUE(ret, "starpu_task_submit");
+}
+
+int main(void)
+{
+	int v=40;
+	int ret;
+
+	ret = starpu_init(NULL);
+	if (ret == -ENODEV)
+		return 77;
+	STARPU_CHECK_RETURN_VALUE(ret, "starpu_init");
+
+	starpu_variable_data_register(&handle, STARPU_MAIN_RAM, (uintptr_t)&v, sizeof(int));
+
+	struct starpu_task *task = starpu_task_create();
+	task->cl = &cl;
+	task->callback_func = callback_func;
+	task->callback_arg = NULL;
+	task->handles[0] = handle;
+
+	ret = starpu_task_submit(task);
+	if (ret == -ENODEV) goto enodev;
+	STARPU_CHECK_RETURN_VALUE(ret, "starpu_task_submit");
+
+	starpu_task_wait_for_all();
+	starpu_data_unregister(handle);
+
+	FPRINTF(stderr, "v -> %d\n", v);
+
+	starpu_shutdown();
+
+	return (v == 42) ? 0 : 1;
+
+enodev:
+	starpu_shutdown();
+	return 77;
+}
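
Review note: the exit status above depends on both increments having run. The first task takes `v` from 40 to 41, and the task submitted from its callback takes it to 42. The following plain C sketch mimics that chain without StarPU. `mini_task`, `run_task`, `resubmit`, and `run_callback_chain` are hypothetical names, and tasks run synchronously here, which the real runtime does not guarantee.

```c
#include <stddef.h>

/* Hypothetical stand-in for a runtime task: a kernel plus an optional callback. */
struct mini_task
{
	void (*kernel)(int *val);
	void (*callback)(void *arg);
	void *callback_arg;
	int *data;
};

static void increment(int *val)
{
	*val += 1;
}

/* Runs the kernel, then fires the callback if one is set. */
static void run_task(struct mini_task *task)
{
	task->kernel(task->data);
	if (task->callback)
		task->callback(task->callback_arg);
}

/* Callback: submits a second task on the same data, without any callback. */
static void resubmit(void *arg)
{
	struct mini_task second = { increment, NULL, NULL, (int *)arg };
	run_task(&second);
}

/* Returns the final value after the two-task chain. */
int run_callback_chain(int start)
{
	int v = start;
	struct mini_task first = { increment, resubmit, &v, &v };
	run_task(&first);
	return v;
}
```

Because the second task carries no callback, the chain stops after two increments, which is why the C example compares against 42.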

+ 76 - 0
julia/examples/callback/callback.jl

@@ -0,0 +1,76 @@
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+using StarPU
+
+@target STARPU_CPU
+@codelet function variable(val ::Ref{Int32}) :: Nothing
+    val[] = val[] + 1
+
+    return
+end
+
+function callback(args)
+    cl = args[1]
+    handles = args[2]
+
+    task = starpu_task(cl = cl, handles=handles)
+    starpu_task_submit(task)
+end
+
+function variable_with_starpu(val ::Ref{Int32})
+    perfmodel = starpu_perfmodel(
+        perf_type = starpu_perfmodel_type(STARPU_HISTORY_BASED),
+        symbol = "history_perf"
+    )
+
+    cl = starpu_codelet(
+        cpu_func = CPU_CODELETS["variable"],
+        # cuda_func = CUDA_CODELETS["matrix_mult"],
+        #opencl_func="ocl_matrix_mult",
+        modes = [STARPU_RW],
+        perfmodel = perfmodel
+    )
+
+    @starpu_block let
+        hVal = starpu_data_register(val)
+
+        task = starpu_task(cl = cl, handles = [hVal], callback=callback, callback_arg=(cl, [hVal]))
+        starpu_task_submit(task)
+
+        starpu_task_wait_for_all()
+    end
+end
+
+function display()
+    v = Ref(Int32(40))
+
+    variable_with_starpu(v)
+
+    println("variable -> ", v[])
+    if v[] == 42
+        println("result is correct")
+    else
+        println("result is incorrect")
+    end
+end
+
+# Disable garbage collector because of random segfault/hang when using mutex.
+# This issue should be solved with Julia release 1.5.
+GC.enable(false)
+starpu_init()
+display()
+starpu_shutdown()
+GC.enable(true)

+ 19 - 0
julia/examples/callback/callback.sh

@@ -0,0 +1,19 @@
+#!/bin/bash
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+
+$(dirname $0)/../execute.sh callback/callback.jl
+

+ 32 - 0
julia/examples/check_deps/check_deps.jl

@@ -0,0 +1,32 @@
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+import Pkg
+
+try
+    using CBinding
+    using Clang
+    using ThreadPools
+catch
+    Pkg.activate((@__DIR__)*"/../..")
+    Pkg.instantiate()
+    using Clang
+    using CBinding
+    using ThreadPools
+end
+
+using StarPU
+
+starpu_translate_headers()

+ 20 - 0
julia/examples/check_deps/check_deps.sh

@@ -0,0 +1,20 @@
+#!/bin/bash
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+
+$(dirname $0)/../execute.sh check_deps/check_deps.jl
+
+

+ 104 - 0
julia/examples/dependency/end_dep.jl

@@ -0,0 +1,104 @@
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+using StarPU
+
+@target STARPU_CPU
+@codelet function codeletA() :: Nothing
+    # print("[Task A] Value = ", val[]);
+    # do nothing
+end
+
+@target STARPU_CPU
+@codelet function codeletB(val ::Ref{Int32}) :: Nothing
+    # println("[Task B] Value = ", val[]);
+    val[] = val[] *2
+end
+
+function callbackB(task)
+    sleep(1)
+    starpu_task_end_dep_release(task)
+end
+
+@target STARPU_CPU
+@codelet function codeletC(val ::Ref{Int32}) :: Nothing
+    # println("[Task C] Value = ", val[]);
+    val[] = val[] *2
+end
+
+function callbackC(task)
+    starpu_task_end_dep_release(task)
+end
+
+
+function main()
+    value = Ref(Int32(12))
+
+    @starpu_block let
+        perfmodel = starpu_perfmodel(
+            perf_type = starpu_perfmodel_type(STARPU_HISTORY_BASED),
+            symbol = "history_perf"
+        )
+
+        clA = starpu_codelet(
+            cpu_func = CPU_CODELETS["codeletA"],
+            perfmodel = perfmodel
+        )
+        clB = starpu_codelet(
+            cpu_func = CPU_CODELETS["codeletB"],
+            modes = [STARPU_RW],
+            perfmodel = perfmodel
+        )
+        clC = starpu_codelet(
+            cpu_func = CPU_CODELETS["codeletC"],
+            modes = [STARPU_RW],
+            perfmodel = perfmodel
+        )
+
+        handle = starpu_data_register(value)
+
+        starpu_data_set_sequential_consistency_flag(handle, 0)
+
+        taskA = starpu_task(cl = clA, detach=0)
+        taskB = starpu_task(cl = clB, handles = [handle], callback=callbackB, callback_arg=taskA)
+        taskC = starpu_task(cl = clC, handles = [handle], callback=callbackC, callback_arg=taskA)
+
+        starpu_task_end_dep_add(taskA, 2)
+        starpu_task_declare_deps(taskC, taskB)
+
+        starpu_task_submit(taskA)
+        starpu_task_submit(taskB)
+        starpu_task_submit(taskC)
+        starpu_task_wait(taskA)
+
+        starpu_data_acquire_on_node(handle, STARPU_MAIN_RAM, STARPU_R)
+        # Waiting for taskA should have also waited for taskB and taskC
+        if value[] != 48
+            error("Incorrect value $(value[]) (expected 48)")
+        end
+        starpu_data_release_on_node(handle, STARPU_MAIN_RAM)
+    end
+
+
+    println("Value = ", value[])
+end
+
+# Disable garbage collector because of random segfault/hang when using mutex.
+# This issue should be solved with Julia release 1.5.
+GC.enable(false)
+starpu_init()
+main()
+starpu_shutdown()
+GC.enable(true)
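
Review note: the example above expects 48 because waiting for taskA only returns once both end dependencies have been released, that is, after taskB and taskC have each doubled the value (12 * 2 * 2). A minimal plain C model of that counter follows. The struct and function names are hypothetical and only imitate the idea; they are not the StarPU API.

```c
#include <stdbool.h>

/* Hypothetical model of a task whose termination is held back by extra
 * "end dependencies" until each one has been released. */
struct end_dep_task
{
	bool body_done;
	int pending_end_deps;
};

void end_dep_add(struct end_dep_task *t, int n)
{
	t->pending_end_deps += n;
}

void end_dep_release(struct end_dep_task *t)
{
	t->pending_end_deps -= 1;
}

bool is_terminated(const struct end_dep_task *t)
{
	return t->body_done && t->pending_end_deps == 0;
}

/* Replays the example: taskA does nothing, taskB and taskC each double the
 * value and release one end dependency of taskA from their callback.
 * Returns -1 if taskA would not count as terminated at the end. */
int end_dep_expected_value(int start)
{
	struct end_dep_task a = { false, 0 };
	int value = start;

	end_dep_add(&a, 2);
	a.body_done = true;      /* taskA's codelet has no work */

	value *= 2;              /* taskB */
	end_dep_release(&a);     /* callbackB */
	value *= 2;              /* taskC */
	end_dep_release(&a);     /* callbackC */

	return is_terminated(&a) ? value : -1;
}
```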

+ 18 - 0
julia/examples/dependency/end_dep.sh

@@ -0,0 +1,18 @@
+#!/bin/bash
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+
+$(dirname $0)/../execute.sh dependency/end_dep.jl

+ 122 - 0
julia/examples/dependency/tag_dep.jl

@@ -0,0 +1,122 @@
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+using StarPU
+
+@target STARPU_CPU
+@codelet function codeletA(val ::Ref{Int32}) :: Nothing
+    # print("[Task A] Value = ", val[]);
+    val[] = val[] * 2
+end
+
+function callbackA(arg)
+    clB = arg[1]
+    handle = arg[2]
+    tagHoldC = arg[3]
+
+    taskB = starpu_task(cl = clB, handles = [handle],
+                        callback = starpu_tag_notify_from_apps,
+                        callback_arg = tagHoldC,
+                        sequential_consistency=false)
+
+    starpu_task_submit(taskB)
+end
+
+@target STARPU_CPU
+@codelet function codeletB(val ::Ref{Int32}) :: Nothing
+    # println("[Task B] Value = ", val[]);
+    val[] = val[] +1
+end
+
+@target STARPU_CPU
+@codelet function codeletC(val ::Ref{Int32}) :: Nothing
+    # println("[Task C] Value = ", val[]);
+    val[] = val[] *2
+end
+
+
+# Submit taskA and hold it
+# Submit taskC and hold it
+# Release taskA
+# Execute taskA       --> callback: submit taskB
+# Execute taskB       --> callback: release taskC
+#
+# All three tasks use the same data in RW, taskB is submitted after
+# taskC, so taskB should normally only execute after taskC but as the
+# sequential consistency for (taskB, data) is unset, taskB can
+# execute straightaway
+function main()
+    value = Ref(Int32(12))
+
+    @starpu_block let
+        tagHoldA :: starpu_tag_t = 32
+        tagHoldC :: starpu_tag_t = 84
+        tagA :: starpu_tag_t = 421
+        tagC :: starpu_tag_t = 842
+
+        starpu_tag_declare_deps(tagA, tagHoldA)
+        starpu_tag_declare_deps(tagC, tagHoldC)
+
+        perfmodel = starpu_perfmodel(
+            perf_type = starpu_perfmodel_type(STARPU_HISTORY_BASED),
+            symbol = "history_perf"
+        )
+
+        clA = starpu_codelet(
+            cpu_func = CPU_CODELETS["codeletA"],
+            modes = [STARPU_RW],
+            perfmodel = perfmodel
+        )
+        clB = starpu_codelet(
+            cpu_func = CPU_CODELETS["codeletB"],
+            modes = [STARPU_RW],
+            perfmodel = perfmodel
+        )
+        clC = starpu_codelet(
+            cpu_func = CPU_CODELETS["codeletC"],
+            modes = [STARPU_RW],
+            perfmodel = perfmodel
+        )
+
+        handle = starpu_data_register(value)
+
+        taskA = starpu_task(cl = clA, handles = [handle], tag = tagA,
+                            callback = callbackA,
+                            callback_arg=(clB, handle, tagHoldC))
+        starpu_task_submit(taskA)
+
+        taskC = starpu_task(cl = clC, handles = [handle], tag = tagC)
+        starpu_task_submit(taskC)
+
+        # Release taskA (we want to make sure it will execute after taskC has been submitted)
+        starpu_tag_notify_from_apps(tagHoldA)
+
+        starpu_task_wait_for_all()
+    end
+
+    if value[] != 50
+        error("Incorrect value $(value[]) (expected 50)")
+    end
+
+    println("Value = ", value[])
+end
+
+# Disable garbage collector because of random segfault/hang when using mutex.
+# This issue should be solved with Julia release 1.5.
+GC.enable(false)
+starpu_init()
+main()
+starpu_shutdown()
+GC.enable(true)
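
Review note: the expected value of 50 in the example above follows from the execution order A, B, C that the tags enforce: (12 * 2 + 1) * 2. The plain C sketch below works through that arithmetic and, for contrast, a hypothetical order in which B ran after C. The function names are illustrative only.

```c
/* The three kernels from the example, as plain functions. */
static int kernel_a(int v) { return v * 2; }
static int kernel_b(int v) { return v + 1; }
static int kernel_c(int v) { return v * 2; }

/* Order enforced by the tags: A, then B (submitted from A's callback),
 * then C (released from B's callback). */
int tag_dep_order(int start)
{
	return kernel_c(kernel_b(kernel_a(start)));
}

/* Hypothetical order if B had to wait for C: A, C, then B. */
int b_after_c_order(int start)
{
	return kernel_b(kernel_c(kernel_a(start)));
}
```

Starting from 12, the first order yields 50 and the second yields 49, so the final check in the example distinguishes the two.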

+ 18 - 0
julia/examples/dependency/tag_dep.sh

@@ -0,0 +1,18 @@
+#!/bin/bash
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+
+$(dirname $0)/../execute.sh dependency/tag_dep.jl

+ 88 - 0
julia/examples/dependency/task_dep.jl

@@ -0,0 +1,88 @@
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+using StarPU
+
+@target STARPU_CPU
+@codelet function codeletA(val ::Ref{Int32}) :: Nothing
+    # print("[Task A] Value = ", val[]);
+    val[] = val[] * 2
+end
+
+@target STARPU_CPU
+@codelet function codeletB(val ::Ref{Int32}) :: Nothing
+    # println("[Task B] Value = ", val[]);
+    val[] = val[] +1
+end
+
+@target STARPU_CPU
+@codelet function codeletC(val ::Ref{Int32}) :: Nothing
+    # println("[Task C] Value = ", val[]);
+    val[] = val[] *2
+end
+
+function main()
+    value = Ref(Int32(12))
+
+    @starpu_block let
+        perfmodel = starpu_perfmodel(
+            perf_type = starpu_perfmodel_type(STARPU_HISTORY_BASED),
+            symbol = "history_perf"
+        )
+
+        clA = starpu_codelet(
+            cpu_func = CPU_CODELETS["codeletA"],
+            modes = [STARPU_RW],
+            perfmodel = perfmodel
+        )
+        clB = starpu_codelet(
+            cpu_func = CPU_CODELETS["codeletB"],
+            modes = [STARPU_RW],
+            perfmodel = perfmodel
+        )
+        clC = starpu_codelet(
+            cpu_func = CPU_CODELETS["codeletC"],
+            modes = [STARPU_RW],
+            perfmodel = perfmodel
+        )
+
+        starpu_data_set_default_sequential_consistency_flag(0)
+
+        handle = starpu_data_register(value)
+
+        taskA = starpu_task(cl = clA, handles = [handle])
+        taskB = starpu_task(cl = clB, handles = [handle])
+        taskC = starpu_task(cl = clC, handles = [handle])
+
+        starpu_task_declare_deps(taskA, taskB)
+        starpu_task_declare_deps(taskC, taskA, taskB)
+
+        starpu_task_submit(taskA)
+        starpu_task_submit(taskB)
+        starpu_task_submit(taskC)
+
+        starpu_task_wait_for_all()
+    end
+
+    if value[] != 52
+        error("Incorrect value $(value[]) (expected 52)")
+    end
+
+    println("Value = ", value[])
+end
+
+starpu_init()
+main()
+starpu_shutdown()
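
Review note: with sequential consistency disabled, only the explicitly declared dependencies order the three tasks above: A waits for B, and C waits for A and B. That gives the order B, A, C and the value (12 + 1) * 2 * 2 = 52. The plain C sketch below resolves the same dependency table by repeatedly running whichever task is ready. All names are hypothetical, and the real scheduler works differently.

```c
#include <stdbool.h>

#define NTASKS 3

enum { TASK_A, TASK_B, TASK_C };

/* Applies the kernel of the given task to the value. */
static int apply(int task, int v)
{
	switch (task)
	{
	case TASK_A: return v * 2;
	case TASK_B: return v + 1;
	default:     return v * 2; /* TASK_C */
	}
}

/* Runs the three tasks in an order compatible with the declared
 * dependencies and returns the final value, or -1 on a cycle. */
int run_task_dep(int start)
{
	/* deps[i][j] is true when task i must wait for task j. */
	bool deps[NTASKS][NTASKS] = { { false } };
	bool done[NTASKS] = { false };
	int value = start;
	int finished = 0;

	deps[TASK_A][TASK_B] = true;
	deps[TASK_C][TASK_A] = true;
	deps[TASK_C][TASK_B] = true;

	while (finished < NTASKS)
	{
		bool progressed = false;
		for (int i = 0; i < NTASKS; i++)
		{
			if (done[i])
				continue;
			bool ready = true;
			for (int j = 0; j < NTASKS; j++)
				if (deps[i][j] && !done[j])
					ready = false;
			if (!ready)
				continue;
			value = apply(i, value);
			done[i] = true;
			finished++;
			progressed = true;
		}
		if (!progressed)
			return -1; /* cyclic dependencies */
	}
	return value;
}
```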

+ 18 - 0
julia/examples/dependency/task_dep.sh

@@ -0,0 +1,18 @@
+#!/bin/bash
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+
+$(dirname $0)/../execute.sh dependency/task_dep.jl

+ 47 - 0
julia/examples/execute.sh.in

@@ -0,0 +1,47 @@
+#!@REALBASH@
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+
+set -x
+export JULIA_LOAD_PATH=@STARPU_SRC_DIR@/julia:$JULIA_LOAD_PATH
+export STARPU_BUILD_DIR=@STARPU_BUILD_DIR@
+export STARPU_SRC_DIR=@STARPU_SRC_DIR@
+export STARPU_JULIA_LIB=@STARPU_BUILD_DIR@/julia/src/.libs/libstarpujulia-1.3.so
+export STARPU_JULIA_BUILD=@STARPU_BUILD_DIR@/julia
+export JULIA_NUM_THREADS=8
+srcdir=@STARPU_SRC_DIR@/julia/examples
+
+if test "$1" == "-calllib"
+then
+    shift
+    pwd
+    rm -f extern_tasks.so
+    make -f @STARPU_BUILD_DIR@/julia/src/dynamic_compiler/Makefile extern_tasks.so SOURCES_CPU=$srcdir/$1
+    shift
+    export JULIA_TASK_LIB=$PWD/extern_tasks.so
+fi
+
+srcfile=$1
+if test ! -f $srcdir/$srcfile
+then
+    echo "Error. File $srcdir/$srcfile not found"
+    exit 1
+fi
+shift
+#cd $srcdir/$(dirname $srcfile)
+#@JULIA@ $(basename $srcfile) $*
+@JULIA@ $srcdir/$srcfile $*
+

+ 130 - 0
julia/examples/gemm/gemm.jl

@@ -0,0 +1,130 @@
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+using StarPU
+
+@target STARPU_CPU+STARPU_CUDA
+@codelet function gemm(A :: Matrix{Float32}, B :: Matrix{Float32}, C :: Matrix{Float32}, alpha :: Float32, beta :: Float32) :: Nothing
+
+    M :: Int32 = height(A)
+    N :: Int32 = width(B)
+    K :: Int32 = width(A)
+    lda :: Int32 = height(A)
+    ldb :: Int32 = height(B)
+    ldc :: Int32 = height(C)
+    STARPU_SGEMM("N", "N", M, N, K, alpha, A, lda, B, ldb, beta, C, ldc)
+
+    return
+end
+
+function multiply_with_starpu(A :: Matrix{Float32}, B :: Matrix{Float32}, C :: Matrix{Float32}, alpha :: Float32, beta :: Float32, nslicesx, nslicesy)
+    scale= 3
+    tmin=0
+    vert = starpu_data_filter(STARPU_MATRIX_FILTER_VERTICAL_BLOCK, nslicesx)
+    horiz = starpu_data_filter(STARPU_MATRIX_FILTER_BLOCK, nslicesy)
+    @starpu_block let
+        hA,hB,hC = starpu_data_register(A, B, C)
+        starpu_data_partition(hB, vert)
+        starpu_data_partition(hA, horiz)
+        starpu_data_map_filters(hC, vert, horiz)
+        tmin=0
+        perfmodel = starpu_perfmodel(
+            perf_type = starpu_perfmodel_type(STARPU_HISTORY_BASED),
+            symbol = "history_perf"
+        )
+        cl = starpu_codelet(
+            cpu_func = CPU_CODELETS["gemm"],
+            cuda_func = CUDA_CODELETS["gemm"],
+            modes = [STARPU_R, STARPU_R, STARPU_RW],
+            perfmodel = perfmodel
+        )
+
+        for i in (1 : 10 )
+            t=time_ns()
+            @starpu_sync_tasks begin
+                for taskx in (1 : nslicesx)
+                    for tasky in (1 : nslicesy)
+                        handles = [hA[tasky], hB[taskx], hC[taskx, tasky]]
+                        task = starpu_task(cl = cl, handles = handles, cl_arg=(alpha, beta))
+                        starpu_task_submit(task)
+                        #@starpu_async_cl matrix_mult(hA[tasky], hB[taskx], hC[taskx, tasky])
+                    end
+                end
+            end
+            t=time_ns()-t
+            if (tmin==0 || tmin>t)
+                tmin=t
+            end
+        end
+    end
+    return tmin
+end
+
+
+function approximately_equals(
+    A :: Matrix{Cfloat},
+    B :: Matrix{Cfloat},
+    eps = 1e-2
+)
+    (height, width) = size(A)
+
+    for j in (1 : width)
+        for i in (1 : height)
+            if (abs(A[i,j] - B[i,j]) > eps * max(abs(B[i,j]), abs(A[i,j])))
+                println("A[$i,$j] : $(A[i,j]), B[$i,$j] : $(B[i,j])")
+                return false
+            end
+        end
+    end
+
+    return true
+end
+
+function compute_times(io,start_dim, step_dim, stop_dim, nslicesx, nslicesy)
+    for dim in (start_dim : step_dim : stop_dim)
+        A = Array(rand(Cfloat, dim, dim))
+        B = Array(rand(Cfloat, dim, dim))
+        C = zeros(Float32, dim, dim)
+        starpu_memory_pin(A)
+        starpu_memory_pin(B)
+        starpu_memory_pin(C)
+        alpha = 4.0f0
+        beta = 2.0f0
+        mt =  multiply_with_starpu(A, B, C, alpha, beta, nslicesx, nslicesy)
+        gflop = 2 * dim * dim * dim * 1.e-9
+        gflops = gflop / (mt * 1.e-9)
+        size=dim*dim*dim*4*3/1024/1024
+        println(io,"$dim $gflops")
+        println("$dim $gflops")
+        starpu_memory_unpin(A)
+        starpu_memory_unpin(B)
+        starpu_memory_unpin(C)
+    end
+end
+
+if size(ARGS, 1) < 1
+    filename="x.dat"
+else
+    filename=ARGS[1]
+end
+
+starpu_init()
+
+io=open(filename,"w")
+compute_times(io,64,512,4096,2,2)
+close(io)
+
+starpu_shutdown()
+
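
Review note: `compute_times` above derives GFLOP/s from the best of ten timings, counting 2 * dim^3 floating-point operations per product and converting the nanosecond timing to seconds. The same arithmetic as a small plain C helper, with the hypothetical name `gemm_gflops`:

```c
/* Returns GFLOP/s for a dim x dim x dim matrix product whose best run
 * took elapsed_ns nanoseconds. */
double gemm_gflops(long dim, double elapsed_ns)
{
	double gflop = 2.0 * dim * dim * dim * 1e-9; /* operations, in units of 1e9 */
	double seconds = elapsed_ns * 1e-9;
	return gflop / seconds;
}
```

For instance, a 1000 x 1000 product that takes one second corresponds to about 2 GFLOP/s.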

+ 21 - 0
julia/examples/gemm/gemm.sh

@@ -0,0 +1,21 @@
+#!/bin/bash
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+
+$(dirname $0)/../execute.sh gemm/gemm.jl
+$(dirname $0)/../execute.sh gemm/gemm_native.jl
+
+

+ 56 - 0
julia/examples/gemm/gemm_native.jl

@@ -0,0 +1,56 @@
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+using LinearAlgebra.BLAS
+
+function gemm_without_starpu(A :: Matrix{Float32}, B :: Matrix{Float32}, C :: Matrix{Float32}, alpha :: Float32, beta :: Float32)
+    tmin = 0
+    for i in (1 : 10 )
+        t=time_ns()
+        gemm!('N', 'N', alpha, A, B, beta, C)
+        t=time_ns() - t
+        if (tmin==0 || tmin>t)
+            tmin=t
+        end
+    end
+    return tmin
+end
+
+
+function compute_times(io,start_dim, step_dim, stop_dim)
+    for dim in (start_dim : step_dim : stop_dim)
+        A = Array(rand(Cfloat, dim, dim))
+        B = Array(rand(Cfloat, dim, dim))
+        C = zeros(Float32, dim, dim)
+        alpha = 4.0f0
+        beta = 2.0f0
+        mt =  gemm_without_starpu(A, B, C, alpha, beta)
+        gflop = 2 * dim * dim * dim * 1.e-9
+        gflops = gflop / (mt * 1.e-9)
+        size=dim*dim*dim*4*3/1024/1024
+        println(io,"$dim $gflops")
+        println("$dim $gflops")
+    end
+end
+
+if size(ARGS, 1) < 1
+    filename="x.dat"
+else
+    filename=ARGS[1]
+end
+io=open(filename,"w")
+compute_times(io,64,512,4096)
+close(io)
+

+ 5 - 7
julia/mandelbrot/Makefile

@@ -21,12 +21,10 @@ ifneq ($(ENABLE_CUDA),yes)
 	CUDA_OBJECTS:=
 endif
 
-LIBPATH=${PWD}/../StarPU.jl/lib
-
 all: ${EXTERNLIB}
 
 mandelbrot: mandelbrot.c cpu_mandelbrot.o #gpu_mandelbrot.o
-	$(CC) $(CPU_CFLAGS) $^ -o $@ $(LDFLAGS)
+	$(CC) $(CPU_CFLAGS) $^ -o $@ $(LDFLAGS) -lm
 
 %.o: %.c
 	$(CC) -c -fPIC $(CPU_CFLAGS) $^ -o $@
@@ -47,12 +45,12 @@ clean:
 
 
 # Performance Tests
 cstarpu.dat: mandelbrot
-	STARPU_NOPENCL=0 STARPU_SCHED=dmda STARPU_CALIBRATE=1 ./mandelbrot -0.800671 -0.158392 32 32 4096 4 > $@
+	STARPU_NOPENCL=0 STARPU_SCHED=dmda STARPU_CALIBRATE=1 ./mandelbrot > $@
 julia_generatedc.dat:
-	LD_LIBRARY_PATH+=${LIBPATH} STARPU_NOPENCL=0 STARPU_SCHED=dmda STARPU_CALIBRATE=1 julia mandelbrot.jl $@
+	STARPU_NOPENCL=0 STARPU_SCHED=dmda STARPU_CALIBRATE=1 julia mandelbrot.jl $@
 julia_native.dat:
-	LD_LIBRARY_PATH+=${LIBPATH} STARPU_NOPENCL=0 STARPU_SCHED=dmda STARPU_CALIBRATE=1 julia mandelbrot_native.jl $@
+	STARPU_NOPENCL=0 STARPU_SCHED=dmda STARPU_CALIBRATE=1 julia mandelbrot_native.jl $@
 julia_calllib.dat: ${EXTERNLIB}
-	LD_LIBRARY_PATH+=${LIBPATH} JULIA_TASK_LIB="${EXTERNLIB}" STARPU_NOPENCL=0 STARPU_SCHED=dmda STARPU_CALIBRATE=1 julia mandelbrot.jl julia_calllib.dat
+	JULIA_TASK_LIB="${EXTERNLIB}" STARPU_NOPENCL=0 STARPU_SCHED=dmda STARPU_CALIBRATE=1 julia mandelbrot.jl julia_calllib.dat
 
 test: cstarpu.dat julia_generatedc.dat julia_native.dat julia_calllib.dat

+ 79 - 0
julia/examples/mandelbrot/cpu_mandelbrot.c

@@ -0,0 +1,79 @@
+/* StarPU --- Runtime system for heterogeneous multicore architectures.
+ *
+ * Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+ *
+ * StarPU is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published by
+ * the Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ *
+ * StarPU is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ *
+ * See the GNU Lesser General Public License in COPYING.LGPL for more details.
+ */
+#include <stdio.h>
+#include <starpu.h>
+#include <math.h>
+#include "cpu_mandelbrot.h"
+
+void cpu_mandelbrot(void *descr[], void *cl_arg)
+{
+        long long *pixels;
+
+        pixels = (long long int *)STARPU_MATRIX_GET_PTR(descr[0]);
+        struct params *params = (struct params *) cl_arg;
+
+        long width = STARPU_MATRIX_GET_NY(descr[0]);
+        long height = STARPU_MATRIX_GET_NX(descr[0]);
+        double zoom = width * 0.25296875;
+        double iz = 1. / zoom;
+        float diverge = 4.0;
+        float max_iterations = (width/2) * 0.049715909 * log10(zoom);
+        float imi = 1. / max_iterations;
+        double centerr = params->centerr;
+        double centeri = params->centeri;
+        long offset = params->offset;
+        long dim = params->dim;
+        double cr = 0;
+        double zr = 0;
+        double ci = 0;
+        double zi = 0;
+        long n = 0;
+        double tmp = 0;
+        int ldP = STARPU_MATRIX_GET_LD(descr[0]);
+
+        long long x,y;
+
+        for (y = 0; y < height; y++)
+	{
+                for (x = 0; x < width; x++)
+		{
+                        cr = centerr + (x - (dim/2)) * iz;
+			zr = cr;
+                        ci = centeri + (y+offset - (dim/2)) * iz;
+                        zi = ci;
+
+                        for (n = 0; n <= max_iterations; n++)
+			{
+				if (zr*zr + zi*zi>diverge) break;
+                                tmp = zr*zr - zi*zi + cr;
+                                zi = 2*zr*zi + ci;
+                                zr = tmp;
+                        }
+			if (n<max_iterations)
+				pixels[y +x*ldP] = round(15.*n*imi);
+			else
+				pixels[y +x*ldP] = 0;
+		}
+	}
+}
+
+char* CPU = "cpu_mandelbrot";
+char* GPU = "gpu_mandelbrot";
+extern char *starpu_find_function(char *name, char *device)
+{
+	if (!strcmp(device,"gpu")) return GPU;
+	return CPU;
+}
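
The inner loop of `cpu_mandelbrot` above is the usual escape-time iteration. As a minimal sketch that does not depend on StarPU, the per-pixel computation can be isolated and checked on a few known points. The helper name `escape_iterations` and its integer iteration cap are hypothetical and not part of this commit; the kernel itself uses a floating-point cap and a `<=` loop bound.

```c
#include <assert.h>

/* Hypothetical helper, not part of the commit: counts how many iterations of
 * z <- z*z + c (starting from z = c, as the kernel above does) run before
 * |z|^2 exceeds 4, capped at max_iter. */
long escape_iterations(double cr, double ci, long max_iter)
{
	assert(max_iter > 0);

	double zr = cr;
	double zi = ci;
	long n;

	for (n = 0; n < max_iter; n++)
	{
		/* Stop as soon as the orbit leaves the disc of radius 2. */
		if (zr*zr + zi*zi > 4.0) break;
		double tmp = zr*zr - zi*zi + cr;
		zi = 2*zr*zi + ci;
		zr = tmp;
	}
	return n;
}
```

A point for which the function returns `max_iter` did not escape within the cap; the kernel above stores 0 for such points and a scaled iteration count otherwise.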

+ 8 - 10
julia/StarPU.jl/src/jlstarpu_simple_functions.c

@@ -1,6 +1,6 @@
 /* StarPU --- Runtime system for heterogeneous multicore architectures.
  *
- * Copyright (C) 2018                                     Alexis Juven
+ * Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
  *
  * StarPU is free software; you can redistribute it and/or modify
  * it under the terms of the GNU Lesser General Public License as published by
@@ -13,14 +13,12 @@
  *
  * See the GNU Lesser General Public License in COPYING.LGPL for more details.
  */
-#include "jlstarpu.h"
-
-int jlstarpu_init(void)
+struct params
 {
-	return starpu_init(NULL);
-}
+        double centerr;
+        double centeri;
+        long offset;
+        long dim;
+};
+
 
-void jlstarpu_set_to_zero(void * ptr, unsigned int size)
-{
-	memset(ptr, 0, size);
-}

+ 78 - 56
julia/mandelbrot/mandelbrot.c

@@ -1,5 +1,6 @@
 /* StarPU --- Runtime system for heterogeneous multicore architectures.
  *
+ * Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
  * Copyright (C) 2019       Mael Keryell
  *
  * StarPU is free software; you can redistribute it and/or modify
@@ -13,36 +14,35 @@
  *
  * See the GNU Lesser General Public License in COPYING.LGPL for more details.
  */
+
 #include <stdio.h>
 #include <stdlib.h>
 #include <starpu.h>
+#include "cpu_mandelbrot.h"
 
 
 void cpu_mandelbrot(void **, void *);
 void gpu_mandelbrot(void **, void *);
 
 static struct starpu_perfmodel model =
 {
-		.type = STARPU_HISTORY_BASED,
-		.symbol = "history_perf"
+	.type = STARPU_HISTORY_BASED,
+	.symbol = "history_perf"
 };
 
 static struct starpu_codelet cl =
 {
-	.cpu_funcs = {cpu_mandelbrot},
+ 	.cpu_funcs = {cpu_mandelbrot},
 	//.cuda_funcs = {gpu_mandelbrot},
-	.nbuffers = 2,
-	.modes = {STARPU_W, STARPU_R},
+	.nbuffers = 1,
+	.modes = {STARPU_W},
 	.model = &model
 };
 
-
-void mandelbrot_with_starpu(long long *pixels, float *params, long long dim, long long nslicesx)
+void mandelbrot_with_starpu(long long *pixels, struct params *p, long long dim, long long nslicesx)
 {
 	starpu_data_handle_t pixels_handle;
-	starpu_data_handle_t params_handle;
 
 
 	starpu_matrix_data_register(&pixels_handle, STARPU_MAIN_RAM, (uintptr_t)pixels, dim, dim, dim, sizeof(long long));
-	starpu_matrix_data_register(&params_handle, STARPU_MAIN_RAM, (uintptr_t)params, 4*nslicesx, 4*nslicesx, 1, sizeof(float));
 
 
 	struct starpu_data_filter horiz =
 	{
@@ -51,90 +51,95 @@ void mandelbrot_with_starpu(long long *pixels, float *params, long long dim, lon
 	};
 
 	starpu_data_partition(pixels_handle, &horiz);
-	starpu_data_partition(params_handle, &horiz);
 
 
 	long long taskx;
 
-	for (taskx = 0; taskx < nslicesx; taskx++){
+	for (taskx = 0; taskx < nslicesx; taskx++)
+	{
 		struct starpu_task *task = starpu_task_create();
 
 		task->cl = &cl;
 		task->handles[0] = starpu_data_get_child(pixels_handle, taskx);
-		task->handles[1] = starpu_data_get_child(params_handle, taskx);
+		task->cl_arg = p;
+		task->cl_arg_size = sizeof(*p);
 		if (starpu_task_submit(task)!=0) fprintf(stderr,"submit task error\n");
 	}
 
 	starpu_task_wait_for_all();
 
 	starpu_data_unpartition(pixels_handle, STARPU_MAIN_RAM);
-	starpu_data_unpartition(params_handle, STARPU_MAIN_RAM);
-
 	starpu_data_unregister(pixels_handle);
-	starpu_data_unregister(params_handle);
 }
 
 void pixels2img(long long *pixels, long long width, long long height, const char *filename)
 {
-  FILE *fp = fopen(filename, "w");
-  if (!fp)
-    return;
+	FILE *fp = fopen(filename, "w");
+	if (!fp)
+		return;
 
 
-  int MAPPING[16][3] = {{66,30,15},{25,7,26},{9,1,47},{4,4,73},{0,7,100},{12,44,138},{24,82,177},{57,125,209},{134,181,229},{211,236,248},{241,233,191},{248,201,95},{255,170,0},{204,128,0},{153,87,0},{106,52,3}};
+	int MAPPING[16][3] = {{66,30,15},{25,7,26},{9,1,47},{4,4,73},{0,7,100},{12,44,138},{24,82,177},{57,125,209},{134,181,229},{211,236,248},{241,233,191},{248,201,95},{255,170,0},{204,128,0},{153,87,0},{106,52,3}};
 
 
-  fprintf(fp, "P3\n%lld %lld\n255\n", width, height);
-  long long i, j;
-  for (i = 0; i < height; ++i) {
-    for (j = 0; j < width; ++j) {
-      fprintf(fp, "%d %d %d ", MAPPING[pixels[j*width+i]][0], MAPPING[pixels[j*width+i]][1], MAPPING[pixels[j*width+i]][2]);
-    }
-  }
+	fprintf(fp, "P3\n%lld %lld\n255\n", width, height);
+	long long i, j;
+	for (i = 0; i < height; ++i)
+	{
+		for (j = 0; j < width; ++j)
+		{
+			fprintf(fp, "%d %d %d ", MAPPING[pixels[j*width+i]][0], MAPPING[pixels[j*width+i]][1], MAPPING[pixels[j*width+i]][2]);
+		}
+	}
 
 
-  fclose(fp);
+	fclose(fp);
 }
 
-double min_times(double cr, double ci, long long dim, long long nslices)
+double min_times(double cr, double ci, long long dim, long long nslices, int gen_images)
 {
 	long long *pixels = calloc(dim*dim, sizeof(long long));
-	float *params = calloc(4*nslices, sizeof(float));
+	struct params *p = calloc(nslices, sizeof(struct params));
 
 
 	double t_min = 0;
 	long long i;
 
-	for (i=0; i<nslices; i++) {
-		params[4*i+0] = cr;
-		params[4*i+1] = ci;
-		params[4*i+2] = i*dim/nslices;
-		params[4*i+3] = dim;
+	for (i=0; i<nslices; i++)
+	{
+		p[i].centerr = cr;
+		p[i].centeri = ci;
+		p[i].offset = i*dim/nslices;
+		p[i].dim = dim;
 	}
 
 	double start, stop, exec_t;
-	for (i = 0; i < 10; i++){
+	for (i = 0; i < 10; i++)
+	{
 		start = starpu_timing_now(); // starpu_timing_now() gives the time in microseconds.
-		mandelbrot_with_starpu(pixels, params, dim, nslices);
+		mandelbrot_with_starpu(pixels, &p[i], dim, nslices);
 		stop = starpu_timing_now();
 		exec_t = (stop-start)*1.e3;
 		if (t_min==0 || t_min>exec_t)
 		  t_min = exec_t;
 	}
 
-	char filename[64];
-	snprintf(filename, 64, "out%lld.ppm", dim);
-	pixels2img(pixels,dim,dim,filename);
+	if (gen_images == 1)
+	{
+		char filename[64];
+		snprintf(filename, 64, "out%lld.ppm", dim);
+		pixels2img(pixels,dim,dim,filename);
+	}
 
 
 	free(pixels);
-	free(params);
+	free(p);
 
 
 	return t_min;
 }
 
-void display_times(double cr, double ci, long long start_dim, long long step_dim, long long stop_dim, long long nslices)
+void display_times(double cr, double ci, long long start_dim, long long step_dim, long long stop_dim, long long nslices, int gen_images)
 {
-
 	long long dim;
 
-	for (dim = start_dim; dim <= stop_dim; dim += step_dim) {
+	for (dim = start_dim; dim <= stop_dim; dim += step_dim)
+	{
 		printf("Dimension: %lld...\n", dim);
-		double res = min_times(cr, ci, dim, nslices);
+		double res = min_times(cr, ci, dim, nslices, gen_images);
 		res = res / dim / dim; // time per pixel
 		printf("%lld %lf\n", dim, res);
 	}
@@ -142,23 +147,40 @@ void display_times(double cr, double ci, long long start_dim, long long step_dim
 
 
 int main(int argc, char **argv)
 {
-	if (argc != 7){
-		printf("Usage: %s cr ci start_dim step_dim stop_dim nslices(must divide dims)\n", argv[0]);
-		return 1;
+	double cr, ci;
+	long long start_dim, step_dim, stop_dim, nslices;
+	int gen_images;
+
+	if (argc != 8)
+	{
+		printf("Usage: %s cr ci start_dim step_dim stop_dim nslices(must divide dims) gen_images. Using default parameters\n", argv[0]);
+
+		cr = -0.800671;
+		ci = -0.158392;
+		start_dim = 32;
+		step_dim = 32;
+		stop_dim = 512;
+		nslices = 4;
+		gen_images = 0;
 	}
-	if (starpu_init(NULL) != EXIT_SUCCESS){
+	else
+	{
+		cr = (float) atof(argv[1]);
+		ci = (float) atof(argv[2]);
+		start_dim = atoll(argv[3]);
+		step_dim = atoll(argv[4]);
+		stop_dim = atoll(argv[5]);
+		nslices = atoll(argv[6]);
+		gen_images = atoi(argv[7]);
+	}
+
+	if (starpu_init(NULL) != EXIT_SUCCESS)
+	{
 		fprintf(stderr, "ERROR\n");
 		return 77;
 	}
 
-	double cr = (float) atof(argv[1]);
-	double ci = (float) atof(argv[2]);
-	long long start_dim = atoll(argv[3]);
-	long long step_dim = atoll(argv[4]);
-	long long stop_dim = atoll(argv[5]);
-	long long nslices = atoll(argv[6]);
-
-	display_times(cr, ci, start_dim, step_dim, stop_dim, nslices);
+	display_times(cr, ci, start_dim, step_dim, stop_dim, nslices, gen_images);
 
 
 	starpu_shutdown();
 

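`min_times` above fills one `struct params` per slice, with `offset = i*dim/nslices`. A minimal sketch of that initialisation in isolation follows, assuming `nslices` divides `dim` as the usage message requires. The helper `make_slice_params` is hypothetical; the struct layout is copied from the header added in this commit.

```c
#include <assert.h>
#include <stdlib.h>

/* Same fields as the struct params introduced by this commit. */
struct params
{
	double centerr;
	double centeri;
	long offset;
	long dim;
};

/* Hypothetical helper, not part of the commit: allocates one entry per slice
 * and gives each slice the row offset at which it starts.
 * The caller owns the returned array and must free() it. */
struct params *make_slice_params(double cr, double ci, long dim, long nslices)
{
	assert(nslices > 0 && dim % nslices == 0);

	struct params *p = calloc((size_t)nslices, sizeof(*p));
	if (!p)
		return NULL;

	for (long i = 0; i < nslices; i++)
	{
		p[i].centerr = cr;
		p[i].centeri = ci;
		p[i].offset = i * dim / nslices;
		p[i].dim = dim;
	}
	return p;
}
```

With `dim = 32` and `nslices = 4`, the offsets come out as 0, 8, 16 and 24. One plausible arrangement is for the task working on slice `taskx` to receive `&p[taskx]` as its `cl_arg`; the diff does not say whether that is the intent.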
+ 26 - 12
julia/mandelbrot/mandelbrot.jl

@@ -1,3 +1,18 @@
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
 import Libdl
 using StarPU
 using LinearAlgebra
@@ -34,7 +49,7 @@ using LinearAlgebra
                 zi = 2*zr*zi + ci
                 zr = tmp
             end
-            
+
             if (n < max_iterations)
                 pixels[y,x] = round(15 * n * imi)
             else
@@ -46,17 +61,16 @@ using LinearAlgebra
     return
 end
 
-@debugprint "starpu_init"
 starpu_init()
 
 function mandelbrot_with_starpu(A ::Matrix{Int64}, cr ::Float64, ci ::Float64, dim ::Int64, nslicesx ::Int64)
-    horiz = StarpuDataFilter(STARPU_MATRIX_FILTER_BLOCK, nslicesx)
+    horiz = starpu_data_filter(STARPU_MATRIX_FILTER_BLOCK, nslicesx)
     @starpu_block let
 	hA = starpu_data_register(A)
 	starpu_data_partition(hA,horiz)
 
 	@starpu_sync_tasks for taskx in (1 : nslicesx)
-                @starpu_async_cl mandelbrot(hA[taskx]) [STARPU_W] [cr, ci, (taskx-1)*dim/nslicesx, dim]
+                @starpu_async_cl mandelbrot(hA[taskx]) [STARPU_W] (cr, ci, Int64((taskx-1)*dim/nslicesx), dim)
 	end
     end
 end
@@ -74,9 +88,9 @@ function pixels2img(pixels ::Matrix{Int64}, width ::Int64, height ::Int64, filen
     end
 end
 
-function min_times(cr ::Float64, ci ::Float64, dim ::Int64, nslices ::Int64)
+function min_times(cr ::Float64, ci ::Float64, dim ::Int64, nslices ::Int64, gen_images)
     tmin=0;
-    
+
     pixels ::Matrix{Int64} = zeros(dim, dim)
     for i = 1:10
         t = time_ns();
@@ -86,21 +100,21 @@ function min_times(cr ::Float64, ci ::Float64, dim ::Int64, nslices ::Int64)
             tmin=t
         end
     end
-    pixels2img(pixels,dim,dim,"out$(dim).ppm")
+    if (gen_images == 1)
+        pixels2img(pixels,dim,dim,"out$(dim).ppm")
+    end
     return tmin
 end
 
-function display_time(cr ::Float64, ci ::Float64, start_dim ::Int64, step_dim ::Int64, stop_dim ::Int64, nslices ::Int64)
+function display_time(cr ::Float64, ci ::Float64, start_dim ::Int64, step_dim ::Int64, stop_dim ::Int64, nslices ::Int64, gen_images)
     for dim in (start_dim : step_dim : stop_dim)
-        res = min_times(cr, ci, dim, nslices)
+        res = min_times(cr, ci, dim, nslices, gen_images)
         res=res/dim/dim; # time per pixel
         println("$(dim) $(res)")
     end
 end
 
 
-display_time(-0.800671,-0.158392,32,32,4096,4)
+display_time(-0.800671,-0.158392,32,32,512,4, 0)
 
 
-@debugprint "starpu_shutdown"
 starpu_shutdown()
-

+ 21 - 0
julia/examples/mandelbrot/mandelbrot.sh

@@ -0,0 +1,21 @@
+#!/bin/bash
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+
+$(dirname $0)/../execute.sh mandelbrot/mandelbrot.jl
+$(dirname $0)/../execute.sh mandelbrot/mandelbrot_native.jl
+$(dirname $0)/../execute.sh -calllib mandelbrot/cpu_mandelbrot.c mandelbrot/mandelbrot.jl
+

+ 22 - 5
julia/mandelbrot/mandelbrot_native.jl

@@ -1,3 +1,18 @@
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
 using LinearAlgebra
 
 function mandelbrot(pixels, centerr ::Float64, centeri ::Float64, offset ::Int64, dim ::Int64) :: Nothing
@@ -68,7 +83,7 @@ function pixels2img(pixels ::Matrix{Int64}, width ::Int64, height ::Int64, filen
     end
 end
 
-function min_times(cr ::Float64, ci ::Float64, dim ::Int64, nslices ::Int64)
+function min_times(cr ::Float64, ci ::Float64, dim ::Int64, nslices ::Int64, gen_images)
     tmin=0;
 
     pixels ::Matrix{Int64} = zeros(dim, dim)
@@ -80,17 +95,19 @@ function min_times(cr ::Float64, ci ::Float64, dim ::Int64, nslices ::Int64)
             tmin=t
         end
     end
-    pixels2img(pixels,dim,dim,"out$(dim).ppm")
+    if (gen_images == 1)
+        pixels2img(pixels,dim,dim,"out$(dim).ppm")
+    end
     return tmin
 end
 
-function display_time(cr ::Float64, ci ::Float64, start_dim ::Int64, step_dim ::Int64, stop_dim ::Int64, nslices ::Int64)
+function display_time(cr ::Float64, ci ::Float64, start_dim ::Int64, step_dim ::Int64, stop_dim ::Int64, nslices ::Int64, gen_images)
     for dim in (start_dim : step_dim : stop_dim)
-        res = min_times(cr, ci, dim, nslices)
+        res = min_times(cr, ci, dim, nslices, gen_images)
         res=res/dim/dim; # time per pixel
         println("$(dim) $(res)")
     end
 end
 
 
-display_time(-0.800671,-0.158392,32,32,4096,4)
+display_time(-0.800671,-0.158392,32,32,512,4, 0)

+ 11 - 15
julia/mult/Makefile

@@ -1,9 +1,6 @@
-# tile size. Should be changed in mult.jl as well
-STRIDE=72
-
 # ICC compiler
 #CC =icc
-#CFLAGS=-restrict -unroll4 -ipo -falign-loops=256 -O3 -DSTRIDE=${STRIDE} -march=native $(shell pkg-config --cflags starpu-1.3)
+#CFLAGS=-restrict -unroll4 -ipo -falign-loops=256 -O3 -march=native $(shell pkg-config --cflags starpu-1.3)
 # GCC compiler
 CC=gcc
 NVCC=nvcc
@@ -14,7 +11,7 @@ ifeq ($(ENABLE_CUDA),yes)
         LD := ${NVCC}
 endif
 
-CFLAGS = -O3 -g -DSTRIDE=${STRIDE} $(shell pkg-config --cflags starpu-1.3)
+CFLAGS = -O3 -g $(shell pkg-config --cflags starpu-1.3)
 CPU_CFLAGS = ${CFLAGS} -Wall -mavx -fomit-frame-pointer -march=native -ffast-math
 CUDA_CFLAGS = ${CFLAGS}
 LDFLAGS +=$(shell pkg-config --libs starpu-1.3)
@@ -28,9 +25,6 @@ ifneq ($(ENABLE_CUDA),yes)
 	CUDA_OBJECTS:=
 endif
 
-
-LIBPATH=${PWD}/../StarPU.jl/lib
-
 all: ${EXTERNLIB}
 
 mult: mult.c cpu_mult.o #gpu_mult.o
@@ -53,14 +47,16 @@ ${GENERATEDLIB}: $(C_OBJECTS) $(CUDA_OBJECTS)
 clean:
 	rm -f mult *.so *.o genc_*.c gencuda_*.cu *.dat
 
+tjulia: julia_generatedc.dat
 # Performance Tests
+STRIDE=72
 cstarpu.dat: mult
-	STARPU_NOPENCL=0 STARPU_SCHED=dmda STARPU_CALIBRATE=1 ./mult > $@
-julia_generatedc.dat:
-	LD_LIBRARY_PATH+=${LIBPATH} STARPU_NOPENCL=0 STARPU_SCHED=dmda STARPU_CALIBRATE=1 julia mult.jl $@
-julia_native.dat:
-	LD_LIBRARY_PATH+=${LIBPATH} STARPU_NOPENCL=0 STARPU_SCHED=dmda STARPU_CALIBRATE=1 julia mult_native.jl $@
-julia_calllib.dat: ${EXTERNLIB}
-	LD_LIBRARY_PATH+=${LIBPATH} JULIA_TASK_LIB="${EXTERNLIB}" STARPU_NOPENCL=0 STARPU_SCHED=dmda STARPU_CALIBRATE=1 julia mult.jl julia_calllib.dat
+	STARPU_NOPENCL=0 STARPU_SCHED=dmda STARPU_CALIBRATE=1 ./mult $(STRIDE) > $@
+julia_generatedc.dat: mult.jl
+	STARPU_NOPENCL=0 STARPU_SCHED=dmda STARPU_CALIBRATE=1 julia mult.jl $(STRIDE) $@
+julia_native.dat: mult_native.jl
+	STARPU_NOPENCL=0 STARPU_SCHED=dmda STARPU_CALIBRATE=1 julia mult_native.jl $(STRIDE) $@
+julia_calllib.dat: ${EXTERNLIB} mult.jl
+	JULIA_TASK_LIB="${EXTERNLIB}" STARPU_NOPENCL=0 STARPU_SCHED=dmda STARPU_CALIBRATE=1 julia mult.jl $(STRIDE) julia_calllib.dat
 
 
 test: cstarpu.dat julia_generatedc.dat julia_native.dat julia_calllib.dat

julia/mult/README → julia/examples/mult/README


+ 24 - 13
julia/mult/cpu_mult.c

@@ -1,6 +1,7 @@
 /* StarPU --- Runtime system for heterogeneous multicore architectures.
  *
- * Copyright (C) 2018                                     Alexis Juven
+ * Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+ * Copyright (C) 2018       Alexis Juven
  *
  * StarPU is free software; you can redistribute it and/or modify
  * it under the terms of the GNU Lesser General Public License as published by
@@ -13,26 +14,30 @@
  *
  * See the GNU Lesser General Public License in COPYING.LGPL for more details.
  */
+
 #include <stdint.h>
 #include <stdio.h>
 #include <string.h>
 #include <starpu.h>
+
 /*
  * The codelet is passed 3 matrices, the "descr" union-type field gives a
  * description of the layout of those 3 matrices in the local memory (ie. RAM
  * in the case of CPU, GPU frame buffer in the case of GPU etc.). Since we have
  * registered data with the "matrix" data interface, we use the matrix macros.
  */
-void cpu_mult(void *descr[], void *arg)
+void cpu_mult(void *descr[], void *cl_arg)
 {
-	(void)arg;
+	int stride;
 	float *subA, *subB, *subC;
+
+	stride = *((int *)cl_arg);
+
 	/* .blas.ptr gives a pointer to the first element of the local copy */
 	subA = (float *)STARPU_MATRIX_GET_PTR(descr[0]);
 	subB = (float *)STARPU_MATRIX_GET_PTR(descr[1]);
 	subC = (float *)STARPU_MATRIX_GET_PTR(descr[2]);
 
-
 	/* .blas.nx is the number of rows (consecutive elements) and .blas.ny
 	 * is the number of lines that are separated by .blas.ld elements (ld
 	 * stands for leading dimension).
@@ -50,14 +55,18 @@ void cpu_mult(void *descr[], void *arg)
 	int i,j,k,ii,jj,kk;
 	for (i = 0; i < nyC*nxC; i++) subC[i] = 0;
 	//fprintf(stderr,"inside cpu_mult %dx%dx%d %d/%d on %d\n",nyC,nyA,nxC,starpu_worker_get_id(),STARPU_NMAXWORKERS,starpu_worker_get_devid(starpu_worker_get_id()));
-	for (i=0;i<nyC;i+=STRIDE) {
-		for (k=0;k<nyA;k+=STRIDE) {
-			for (j=0;j<nxC;j+=STRIDE) {
-				
-				for (ii = i; ii < i+STRIDE; ii+=2) {
+	for (i=0;i<nyC;i+=stride)
+	{
+		for (k=0;k<nyA;k+=stride)
+		{
+			for (j=0;j<nxC;j+=stride)
+			{
+				for (ii = i; ii < i+stride; ii+=2)
+				{
 					float *sC0=subC+ii*ldC+j;
 					float *sC1=subC+ii*ldC+ldC+j;
-					for (kk = k; kk < k+STRIDE; kk+=4) {
+					for (kk = k; kk < k+stride; kk+=4)
+					{
 						float alpha00=subB[kk +  ii*ldB];
 						float alpha01=subB[kk+1+ii*ldB];
 						float alpha10=subB[kk+  ii*ldB+ldB];
@@ -70,7 +79,8 @@ void cpu_mult(void *descr[], void *arg)
 						float *sA1=subA+kk*ldA+ldA+j;
 						float *sA2=subA+kk*ldA+2*ldA+j;
 						float *sA3=subA+kk*ldA+3*ldA+j;
-						for (jj = 0; jj < STRIDE; jj+=1) {
+						for (jj = 0; jj < stride; jj+=1)
+						{
 							sC0[jj] += alpha00*sA0[jj]+alpha01*sA1[jj]+alpha02*sA2[jj]+alpha03*sA3[jj];
 							sC1[jj] += alpha10*sA0[jj]+alpha11*sA1[jj]+alpha12*sA2[jj]+alpha13*sA3[jj];
 						}
@@ -80,11 +90,12 @@ void cpu_mult(void *descr[], void *arg)
 		}
 	}
 	//fprintf(stderr,"inside cpu_mult %dx%dx%d\n",nyC,nyA,nxC);
-
 }
+
 char* CPU = "cpu_mult";
 char* GPU = "gpu_mult";
-extern char *starpu_find_function(char *name, char *device) {
+extern char *starpu_find_function(char *name, char *device)
+{
 	if (!strcmp(device,"gpu")) return GPU;
 	return CPU;
 }
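
The change above turns the tile size from a compile-time `STRIDE` macro into a runtime `stride` read from `cl_arg`. A minimal sketch of the same idea follows, assuming square row-major matrices whose size is a multiple of the stride. The function `tiled_matmul` is hypothetical and deliberately simpler than `cpu_mult`: it does no unrolling and does not use StarPU matrix descriptors or leading dimensions.

```c
#include <assert.h>

/* Hypothetical sketch, not the commit's kernel: computes C = A * B for
 * n x n row-major matrices, walking the iteration space in
 * stride x stride tiles. The tile size is an ordinary runtime argument. */
void tiled_matmul(const float *A, const float *B, float *C, int n, int stride)
{
	assert(n > 0 && stride > 0 && n % stride == 0);

	int i, j, k, ii, jj, kk;

	/* The result is accumulated tile by tile, so start from zero. */
	for (i = 0; i < n * n; i++)
		C[i] = 0.0f;

	for (i = 0; i < n; i += stride)
	{
		for (k = 0; k < n; k += stride)
		{
			for (j = 0; j < n; j += stride)
			{
				for (ii = i; ii < i + stride; ii++)
				{
					for (kk = k; kk < k + stride; kk++)
					{
						float a = A[ii * n + kk];
						for (jj = j; jj < j + stride; jj++)
							C[ii * n + jj] += a * B[kk * n + jj];
					}
				}
			}
		}
	}
}
```

Because the stride is a plain argument, the same binary can be run with several tile sizes, which is what passing `$(STRIDE)` on the command line in the Makefile relies on.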

+ 2 - 1
julia/mult/gpu_mult.cu

@@ -1,6 +1,7 @@
 /* StarPU --- Runtime system for heterogeneous multicore architectures.
  *
- * Copyright (C) 2018                                     Alexis Juven
+ * Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+ * Copyright (C) 2018       Alexis Juven
  *
  * StarPU is free software; you can redistribute it and/or modify
  * it under the terms of the GNU Lesser General Public License as published by

+ 50 - 59
julia/mult/mult.c

@@ -1,10 +1,7 @@
 /* StarPU --- Runtime system for heterogeneous multicore architectures.
  *
- * Copyright (C) 2018                                     Alexis Juven
- * Copyright (C) 2012,2013                                Inria
- * Copyright (C) 2009-2011,2013-2015                      Université de Bordeaux
- * Copyright (C) 2010                                     Mehdi Juhoor
- * Copyright (C) 2010-2013,2015,2017                      CNRS
+ * Copyright (C) 2018       Alexis Juven
+ * Copyright (C) 2010-2020  Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
  *
  * StarPU is free software; you can redistribute it and/or modify
  * it under the terms of the GNU Lesser General Public License as published by
@@ -40,8 +37,6 @@
 
 
 #include <starpu.h>
 
-
-
 /*
  * That program should compute C = A * B
  *
@@ -63,43 +58,32 @@
 
 
  */
 
-
-
-
-
 //void gpu_mult(void **, void *);
 void cpu_mult(void **, void *);
 
-
 static struct starpu_perfmodel model =
 {
-		.type = STARPU_HISTORY_BASED,
-		.symbol = "history_perf"
+	.type = STARPU_HISTORY_BASED,
+	.symbol = "history_perf"
 };
 
 static struct starpu_codelet cl =
 {
-		.cpu_funcs = {cpu_mult},
-		.cpu_funcs_name = {"cpu_mult"},
-		//.cuda_funcs = {gpu_mult},
-		.nbuffers = 3,
-		.modes = {STARPU_R, STARPU_R, STARPU_W},
-		.model = &model
+	.cpu_funcs = {cpu_mult},
+	.cpu_funcs_name = {"cpu_mult"},
+	//.cuda_funcs = {gpu_mult},
+	.nbuffers = 3,
+	.modes = {STARPU_R, STARPU_R, STARPU_W},
+	.model = &model
 };
 
-
-void multiply_with_starpu(float *A, float *B, float *C,  unsigned xdim,  unsigned ydim,  unsigned zdim, unsigned nslicesx, unsigned nslicesy)
+void multiply_with_starpu(float *A, float *B, float *C,  unsigned xdim,  unsigned ydim,  unsigned zdim, unsigned nslicesx, unsigned nslicesy, int stride)
 {
 	starpu_data_handle_t A_handle, B_handle, C_handle;
 
-
-	starpu_matrix_data_register(&A_handle, STARPU_MAIN_RAM, (uintptr_t)A,
-			ydim, ydim, zdim, sizeof(float));
-	starpu_matrix_data_register(&B_handle, STARPU_MAIN_RAM, (uintptr_t)B,
-			zdim, zdim, xdim, sizeof(float));
-	starpu_matrix_data_register(&C_handle, STARPU_MAIN_RAM, (uintptr_t)C,
-			ydim, ydim, xdim, sizeof(float));
-
+	starpu_matrix_data_register(&A_handle, STARPU_MAIN_RAM, (uintptr_t)A, ydim, ydim, zdim, sizeof(float));
+	starpu_matrix_data_register(&B_handle, STARPU_MAIN_RAM, (uintptr_t)B, zdim, zdim, xdim, sizeof(float));
+	starpu_matrix_data_register(&C_handle, STARPU_MAIN_RAM, (uintptr_t)C, ydim, ydim, xdim, sizeof(float));
 
 
 	struct starpu_data_filter vert =
 	{
@@ -113,31 +97,32 @@ void multiply_with_starpu(float *A, float *B, float *C,  unsigned xdim,  unsigne
 			.nchildren = nslicesy
 	};
 
-
 	starpu_data_partition(B_handle, &vert);
 	starpu_data_partition(A_handle, &horiz);
 	starpu_data_map_filters(C_handle, 2, &vert, &horiz);
 
 	unsigned taskx, tasky;
 
-	for (taskx = 0; taskx < nslicesx; taskx++){
-		for (tasky = 0; tasky < nslicesy; tasky++){
-
+	for (taskx = 0; taskx < nslicesx; taskx++)
+	{
+		for (tasky = 0; tasky < nslicesy; tasky++)
+		{
 			struct starpu_task *task = starpu_task_create();
 
 			task->cl = &cl;
 			task->handles[0] = starpu_data_get_sub_data(A_handle, 1, tasky);
 			task->handles[1] = starpu_data_get_sub_data(B_handle, 1, taskx);
 			task->handles[2] = starpu_data_get_sub_data(C_handle, 2, taskx, tasky);
+			task->cl_arg = &stride;
+			task->cl_arg_size = sizeof(stride);
 
-			if (starpu_task_submit(task)!=0) fprintf(stderr,"submit task error\n");
-
+			int ret = starpu_task_submit(task);
+			STARPU_CHECK_RETURN_VALUE(ret, "starpu_task_submit");
 		}
 	}
 
 	starpu_task_wait_for_all();
 
-
 	starpu_data_unpartition(A_handle, STARPU_MAIN_RAM);
 	starpu_data_unpartition(B_handle, STARPU_MAIN_RAM);
 	starpu_data_unpartition(C_handle, STARPU_MAIN_RAM);
@@ -145,31 +130,27 @@ void multiply_with_starpu(float *A, float *B, float *C,  unsigned xdim,  unsigne
 	starpu_data_unregister(A_handle);
 	starpu_data_unregister(B_handle);
 	starpu_data_unregister(C_handle);
-
 }
 
-
-
 void init_rand(float * m, unsigned width, unsigned height)
 {
 	unsigned i,j;
 
-	for (j = 0 ; j < height ; j++){
-		for (i = 0 ; i < width ; i++){
+	for (j = 0 ; j < height ; j++)
+	{
+		for (i = 0 ; i < width ; i++)
+		{
 			m[j+i*height] = (float)(starpu_drand48());
 		}
 	}
 }
 
-
 void init_zero(float * m, unsigned width, unsigned height)
 {
 	memset(m, 0, sizeof(float) * width * height);
 }
 
-
-
-double min_time(unsigned nb_test, unsigned xdim, unsigned ydim, unsigned zdim, unsigned nsclicesx, unsigned nsclicesy)
+double min_time(unsigned nb_test, unsigned xdim, unsigned ydim, unsigned zdim, unsigned nsclicesx, unsigned nsclicesy, int stride)
 {
 	unsigned i;
 
@@ -179,8 +160,8 @@ double min_time(unsigned nb_test, unsigned xdim, unsigned ydim, unsigned zdim, u
 
 	double exec_times=-1;
 
-	for (i = 0 ; i < nb_test ; i++){
-
+	for (i = 0 ; i < nb_test ; i++)
+	{
 		double start, stop, exec_t;
 
 		init_rand(A, zdim, ydim);
@@ -188,7 +169,7 @@ double min_time(unsigned nb_test, unsigned xdim, unsigned ydim, unsigned zdim, u
 		init_zero(C, xdim, ydim);
 
 		start = starpu_timing_now();
-		multiply_with_starpu(A, B, C, xdim, ydim, zdim, nsclicesx, nsclicesy);
+		multiply_with_starpu(A, B, C, xdim, ydim, zdim, nsclicesx, nsclicesy, stride);
 		stop = starpu_timing_now();
 
 		exec_t = (stop - start)*1.e3; // Put in ns instead of us
@@ -201,34 +182,44 @@ double min_time(unsigned nb_test, unsigned xdim, unsigned ydim, unsigned zdim, u
 	return exec_times;
 }
 
-
-void display_times(unsigned start_dim, unsigned step_dim, unsigned stop_dim, unsigned nb_tests, unsigned nsclicesx, unsigned nsclicesy)
+void display_times(unsigned start_dim, unsigned step_dim, unsigned stop_dim, unsigned nb_tests, unsigned nsclicesx, unsigned nsclicesy, int stride)
 {
 	unsigned dim;
 
-	for (dim = start_dim ; dim <= stop_dim ; dim += step_dim){
-		double t = min_time(nb_tests, dim, dim, dim, nsclicesx, nsclicesy);
+	for (dim = start_dim ; dim <= stop_dim ; dim += step_dim)
+	{
+		double t = min_time(nb_tests, dim, dim, dim, nsclicesx, nsclicesy, stride);
 		printf("%f %f\n", dim*dim*4.*3./1024./1024, (2.*dim-1.)*dim*dim/t);
 	}
-
 }
 
+#define STRIDE_DEFAULT 8
 
 int main(int argc, char * argv[])
 {
-	if (starpu_init(NULL) != EXIT_SUCCESS){
+	int stride=STRIDE_DEFAULT;
+	if (argc >= 2)
+		stride = atoi(argv[1]);
+	if (stride % 4 != 0)
+	{
+		fprintf(stderr, "STRIDE must be a multiple of 4 (%d)\n", stride);
+		return -1;
+	}
+
+	if (starpu_init(NULL) != EXIT_SUCCESS)
+	{
 		fprintf(stderr, "ERROR\n");
 		return 77;
 	}
 
-	unsigned start_dim = 16*STRIDE;
-	unsigned step_dim = 4*STRIDE;
-	unsigned stop_dim = 4096;
+	unsigned start_dim = 16*stride;
+	unsigned step_dim = 4*stride;
+	unsigned stop_dim = 128*stride;
 	unsigned nb_tests = 10;
 	unsigned nsclicesx = 2;
 	unsigned nsclicesy = 2;
 
-	display_times(start_dim, step_dim, stop_dim, nb_tests, nsclicesx, nsclicesy);
+	display_times(start_dim, step_dim, stop_dim, nb_tests, nsclicesx, nsclicesy, stride);
 
 	starpu_shutdown();
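
Reviewer note: the hunks above replace the compile-time `STRIDE` with a runtime `stride`, derive the sweep bounds from it (`16*stride`, `4*stride`, `128*stride`) and print a footprint and a throughput figure per matrix size. The following is a minimal, StarPU-free sketch of that arithmetic only, assuming the constants visible in the diff. The names `struct sweep`, `sweep_from_stride`, `sweep_count`, `footprint_mib` and `flops_per_ns` are hypothetical and do not exist in the StarPU tree. One deliberate difference, labeled as an assumption: the sketch also rejects non-positive strides, which the patched `main()` does not (`atoi` returns 0 for non-numeric input, and 0 passes the multiple-of-4 check, giving a zero step).

```c
/* Sweep bounds derived from the stride, mirroring the patched main():
 * start = 16*stride, step = 4*stride, stop = 128*stride. */
struct sweep
{
	unsigned start;
	unsigned step;
	unsigned stop;
};

/* Returns 0 on success, -1 if stride is not a positive multiple of 4.
 * The "positive" part is an extra check that is NOT in the patch. */
int sweep_from_stride(int stride, struct sweep *out)
{
	if (stride <= 0 || stride % 4 != 0)
		return -1;
	out->start = 16u * (unsigned)stride;
	out->step = 4u * (unsigned)stride;
	out->stop = 128u * (unsigned)stride;
	return 0;
}

/* Number of sizes visited by:
 *   for (dim = start; dim <= stop; dim += step) */
unsigned sweep_count(const struct sweep *s)
{
	return (s->stop - s->start) / s->step + 1;
}

/* Footprint in MiB of three dim x dim float matrices (A, B and C),
 * i.e. the first column printed by display_times(). */
double footprint_mib(unsigned dim)
{
	return (double)dim * dim * sizeof(float) * 3.0 / 1024.0 / 1024.0;
}

/* Second printed column: (2*dim - 1)*dim*dim floating-point operations
 * divided by a time in nanoseconds. One operation per nanosecond is
 * numerically one GFLOP/s. */
double flops_per_ns(unsigned dim, double t_ns)
{
	return (2.0 * dim - 1.0) * dim * dim / t_ns;
}
```

Under these assumptions every valid stride yields the same number of data points, (128 - 16) / 4 + 1 = 29; only the sizes change. The default stride of 8 sweeps 128 to 1024, and a stride of 72 sweeps 1152 to 9216, whereas the old code always stopped at 4096.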
 
 

+ 35 - 18
julia/mult/mult.jl

@@ -1,12 +1,24 @@
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
 import Libdl
 using StarPU
 using LinearAlgebra
 
-#shoud be the same as in the makefile
-const STRIDE = 72
-
 @target STARPU_CPU+STARPU_CUDA
-@codelet function matrix_mult(m1 :: Matrix{Float32}, m2 :: Matrix{Float32}, m3 :: Matrix{Float32}) :: Nothing
+@codelet function matrix_mult(m1 :: Matrix{Float32}, m2 :: Matrix{Float32}, m3 :: Matrix{Float32}, stride ::Int32) :: Nothing
 
     width_m2 :: Int32 = width(m2)
     height_m1 :: Int32 = height(m1)
@@ -57,25 +69,24 @@ const STRIDE = 72
 end
 
 
-@debugprint "starpu_init"
 starpu_init()
 
-function multiply_with_starpu(A :: Matrix{Float32}, B :: Matrix{Float32}, C :: Matrix{Float32}, nslicesx, nslicesy)
+function multiply_with_starpu(A :: Matrix{Float32}, B :: Matrix{Float32}, C :: Matrix{Float32}, nslicesx, nslicesy, stride)
     scale= 3
     tmin=0
-    vert = StarpuDataFilter(STARPU_MATRIX_FILTER_VERTICAL_BLOCK, nslicesx)
-    horiz = StarpuDataFilter(STARPU_MATRIX_FILTER_BLOCK, nslicesy)
+    vert = starpu_data_filter(STARPU_MATRIX_FILTER_VERTICAL_BLOCK, nslicesx)
+    horiz = starpu_data_filter(STARPU_MATRIX_FILTER_BLOCK, nslicesy)
     @starpu_block let
         hA,hB,hC = starpu_data_register(A, B, C)
         starpu_data_partition(hB, vert)
         starpu_data_partition(hA, horiz)
         starpu_data_map_filters(hC, vert, horiz)
         tmin=0
-        perfmodel = StarpuPerfmodel(
-            perf_type = STARPU_HISTORY_BASED,
+        perfmodel = starpu_perfmodel(
+            perf_type = starpu_perfmodel_type(STARPU_HISTORY_BASED),
             symbol = "history_perf"
         )
-        cl = StarpuCodelet(
+        cl = starpu_codelet(
             cpu_func = CPU_CODELETS["matrix_mult"],
             # cuda_func = CUDA_CODELETS["matrix_mult"],
             #opencl_func="ocl_matrix_mult",
@@ -89,7 +100,7 @@ function multiply_with_starpu(A :: Matrix{Float32}, B :: Matrix{Float32}, C :: M
                 for taskx in (1 : nslicesx)
                     for tasky in (1 : nslicesy)
                         handles = [hA[tasky], hB[taskx], hC[taskx, tasky]]
-                        task = StarpuTask(cl = cl, handles = handles)
+                        task = starpu_task(cl = cl, handles = handles, cl_arg=(Int32(stride),))
                         starpu_task_submit(task)
                         #@starpu_async_cl matrix_mult(hA[tasky], hB[taskx], hC[taskx, tasky])
                     end
@@ -124,12 +135,12 @@ function approximately_equals(
     return true
 end
 
-function compute_times(io,start_dim, step_dim, stop_dim, nslicesx, nslicesy)
+function compute_times(io,start_dim, step_dim, stop_dim, nslicesx, nslicesy, stride)
     for dim in (start_dim : step_dim : stop_dim)
         A = Array(rand(Cfloat, dim, dim))
         B = Array(rand(Cfloat, dim, dim))
         C = zeros(Float32, dim, dim)
-        mt =  multiply_with_starpu(A, B, C, nslicesx, nslicesy)
+        mt =  multiply_with_starpu(A, B, C, nslicesx, nslicesy, stride)
         flops = (2*dim-1)*dim*dim/mt
         size=dim*dim*4*3/1024/1024
         println(io,"$size $flops")
@@ -137,10 +148,16 @@ function compute_times(io,start_dim, step_dim, stop_dim, nslicesx, nslicesy)
     end
 end
 
-
-io=open(ARGS[1],"w")
-compute_times(io,16*STRIDE,4*STRIDE,4096,2,2)
+if size(ARGS, 1) < 2
+    stride=4
+    filename="x.dat"
+else
+    stride=parse(Int, ARGS[1])
+    filename=ARGS[2]
+end
+io=open(filename,"w")
+compute_times(io,16*stride,4*stride,128*stride,2,2,stride)
 close(io)
-@debugprint "starpu_shutdown"
+
 starpu_shutdown()
 
 

julia/mult/mult.plot → julia/examples/mult/mult.plot


+ 57 - 0
julia/examples/mult/mult_native.jl

@@ -0,0 +1,57 @@
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+import Libdl
+using StarPU
+using LinearAlgebra
+
+function multiply_without_starpu(A :: Matrix{Float32}, B :: Matrix{Float32}, C :: Matrix{Float32}, nslicesx, nslicesy, stride)
+    tmin = 0
+    for i in (1 : 10 )
+        t=time_ns()
+        C = A * B;
+        t=time_ns() - t
+        if (tmin==0 || tmin>t)
+            tmin=t
+        end
+    end
+    return tmin
+end
+
+
+function compute_times(io,start_dim, step_dim, stop_dim, nslicesx, nslicesy, stride)
+    for dim in (start_dim : step_dim : stop_dim)
+        A = Array(rand(Cfloat, dim, dim))
+        B = Array(rand(Cfloat, dim, dim))
+        C = zeros(Float32, dim, dim)
+        mt =  multiply_without_starpu(A, B, C, nslicesx, nslicesy, stride)
+        flops = (2*dim-1)*dim*dim/mt
+        size=dim*dim*4*3/1024/1024
+        println(io,"$size $flops")
+        println("$size $flops")
+    end
+end
+
+if size(ARGS, 1) < 2
+    stride=4
+    filename="x.dat"
+else
+    stride=parse(Int, ARGS[1])
+    filename=ARGS[2]
+end
+io=open(filename,"w")
+compute_times(io,16*stride,4*stride,128*stride,2,2,stride)
+close(io)
+

+ 22 - 0
julia/examples/mult/mult_starpu.sh

@@ -0,0 +1,22 @@
+#!/bin/bash
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+
+$(dirname $0)/../execute.sh mult/mult.jl
+$(dirname $0)/../execute.sh mult/mult_native.jl
+$(dirname $0)/../execute.sh -calllib mult/cpu_mult.c mult/mult.jl
+
+

+ 38 - 0
julia/examples/mult/perf.sh

@@ -0,0 +1,38 @@
+#!/bin/bash
+# StarPU --- Runtime system for heterogeneous multicore architectures.
+#
+# Copyright (C) 2020       Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
+#
+# StarPU is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or (at
+# your option) any later version.
+#
+# StarPU is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See the GNU Lesser General Public License in COPYING.LGPL for more details.
+#
+
+stride=72
+#stride=4
+
+export STARPU_NOPENCL=0
+export STARPU_SCHED=dmda
+export STARPU_CALIBRATE=1
+
+rm -f ./cstarpu.dat julia_generatedc.dat julia_native.dat julia_calllib.dat
+
+$(dirname $0)/mult $stride > ./cstarpu.dat
+$(dirname $0)/../execute.sh mult/mult.jl $stride julia_generatedc.dat
+$(dirname $0)/../execute.sh mult/mult_native.jl $stride julia_native.dat
+$(dirname $0)/../execute.sh -calllib mult/cpu_mult.c mult/mult.jl $stride julia_calllib.dat
+
+(
+    cat <<EOF
+set output "comparison.pdf"
+set term pdf
+plot "julia_native.dat" w l,"cstarpu.dat" w l,"julia_generatedc.dat" w l,"julia_calllib.dat" w l
+EOF
+) | gnuplot

julia/mult/res/mult_cstarpu_gcc9_s72_2x2_b4x2.dat → julia/examples/mult/res/mult_cstarpu_gcc9_s72_2x2_b4x2.dat


julia/mult/res/mult_gen_gcc9_1x4.dat → julia/examples/mult/res/mult_gen_gcc9_1x4.dat


julia/mult/res/mult_gen_gcc9_4x1.dat → julia/examples/mult/res/mult_gen_gcc9_4x1.dat


julia/mult/res/mult_gen_gcc9_s100_4x1.dat → julia/examples/mult/res/mult_gen_gcc9_s100_4x1.dat


julia/mult/res/mult_gen_gcc9_s50_4x1.dat → julia/examples/mult/res/mult_gen_gcc9_s50_4x1.dat


julia/mult/res/mult_gen_gcc9_s64_16x16_b4x2.dat → julia/examples/mult/res/mult_gen_gcc9_s64_16x16_b4x2.dat


julia/mult/res/mult_gen_gcc9_s64_4x4_b4x2.dat → julia/examples/mult/res/mult_gen_gcc9_s64_4x4_b4x2.dat


julia/mult/res/mult_gen_gcc9_s64_8x1_b4x2.dat → julia/examples/mult/res/mult_gen_gcc9_s64_8x1_b4x2.dat


julia/mult/res/mult_gen_gcc9_s64_8x8_b4x2.dat → julia/examples/mult/res/mult_gen_gcc9_s64_8x8_b4x2.dat


julia/mult/res/mult_gen_gcc9_s72_16x18_b4x2.dat → julia/examples/mult/res/mult_gen_gcc9_s72_16x18_b4x2.dat


julia/mult/res/mult_gen_gcc9_s72_16x8_b4x2.dat → julia/examples/mult/res/mult_gen_gcc9_s72_16x8_b4x2.dat


julia/mult/res/mult_gen_gcc9_s72_2x2.dat → julia/examples/mult/res/mult_gen_gcc9_s72_2x2.dat


julia/mult/res/mult_gen_gcc9_s72_2x2_b4x2.dat → julia/examples/mult/res/mult_gen_gcc9_s72_2x2_b4x2.dat


julia/mult/res/mult_gen_gcc9_s72_2x2_b4x4.dat → julia/examples/mult/res/mult_gen_gcc9_s72_2x2_b4x4.dat


julia/mult/res/mult_gen_gcc9_s72_2x2_b8x2.dat → julia/examples/mult/res/mult_gen_gcc9_s72_2x2_b8x2.dat


julia/mult/res/mult_gen_gcc9_s72_4x1.dat → julia/examples/mult/res/mult_gen_gcc9_s72_4x1.dat


julia/mult/res/mult_gen_gcc9_s72_4x4_b4x2.dat → julia/examples/mult/res/mult_gen_gcc9_s72_4x4_b4x2.dat


julia/mult/res/mult_gen_gcc9_s72_8x8_b4x2.dat → julia/examples/mult/res/mult_gen_gcc9_s72_8x8_b4x2.dat


julia/mult/res/mult_gen_gcc9_s80_4x1.dat → julia/examples/mult/res/mult_gen_gcc9_s80_4x1.dat


julia/mult/res/mult_gen_icc_s72_2x1_b4x2.dat → julia/examples/mult/res/mult_gen_icc_s72_2x1_b4x2.dat


julia/mult/res/mult_gen_icc_s72_4x4_b4x2.dat → julia/examples/mult/res/mult_gen_icc_s72_4x4_b4x2.dat


julia/mult/res/mult_native.dat → julia/examples/mult/res/mult_native.dat


julia/mult/res/mult_nogen_gcc9_s72_2x2_b2x2.dat → julia/examples/mult/res/mult_nogen_gcc9_s72_2x2_b2x2.dat


julia/mult/res/mult_nogen_gcc9_s72_2x2_b4x2.dat → julia/examples/mult/res/mult_nogen_gcc9_s72_2x2_b4x2.dat


julia/mult/res/mult_nogen_icc_s72-36_2x2_b4x2.dat → julia/examples/mult/res/mult_nogen_icc_s72-36_2x2_b4x2.dat


julia/mult/res/mult_nogen_icc_s72_2x2_b4x2.dat → julia/examples/mult/res/mult_nogen_icc_s72_2x2_b4x2.dat


+ 0 - 0
julia/mult/res/mult_nogen_icc_s72x2_2x2_b4x2.dat


Some files were not shown because too many files changed in this diff