
Make _starpu_predict_transfer_time take into account that each activated GPU eats bandwidth.

Samuel Thibault 14 years ago
commit 9b7c782713
2 changed files with 7 additions and 11 deletions
  1. doc/starpu.texi (+4 -9)
  2. src/core/perfmodel/perfmodel_bus.c (+3 -2)

+ 4 - 9
doc/starpu.texi

@@ -1358,15 +1358,10 @@ tries to minimize is @code{alpha * T_execution + beta * T_data_transfer}, where
 accurate), and @code{T_data_transfer} is the estimated data transfer time. The
 latter is however estimated based on bus calibration before execution start,
 i.e. with an idle machine. You can force bus re-calibration by running
-@code{starpu_calibrate_bus}. When StarPU manages several GPUs, such estimation
-is not accurate any more. Beta can then be used to correct this by hand. For
-instance, you can use @code{export STARPU_BETA=2} to double the transfer
-time estimation, e.g. because there are two GPUs in the machine. This is of
-course imprecise, but in practice, a rough estimation already gives the good
-results that a precise estimation would give.
-
-Measuring the actual data transfer time is however on our TODO-list to
-accurately estimate data transfer penalty without the need of a hand-tuned beta parameter.
+@code{starpu_calibrate_bus}. The beta parameter defaults to 1, but it can be
+worth trying to tweak it by using @code{export STARPU_BETA=2} for instance.
+This is of course imprecise, but in practice, a rough estimation already gives
+the good results that a precise estimation would give.
 
 @node Power-based scheduling
 @section Power-based scheduling
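
The documentation hunk above describes the scheduler as minimizing @code{alpha * T_execution + beta * T_data_transfer}. The following is a minimal sketch of that selection rule, not StarPU's actual scheduler code: the `worker_estimate` structure and `pick_worker` function are hypothetical names introduced for illustration, and the time estimates are assumed to be supplied by the caller in a common unit.

```c
#include <stddef.h>

/* Hypothetical per-worker estimates; illustrative names, not StarPU API. */
struct worker_estimate {
	double t_execution;     /* predicted task execution time on this worker */
	double t_data_transfer; /* predicted time to move the data to this worker */
};

/* Return the index of the worker with the smallest
 * alpha * T_execution + beta * T_data_transfer.
 * Assumes n >= 1; ties keep the lowest index. */
static size_t pick_worker(const struct worker_estimate *w, size_t n,
                          double alpha, double beta)
{
	size_t best = 0;
	double best_cost = alpha * w[0].t_execution + beta * w[0].t_data_transfer;

	for (size_t i = 1; i < n; i++) {
		double cost = alpha * w[i].t_execution + beta * w[i].t_data_transfer;
		if (cost < best_cost) {
			best = i;
			best_cost = cost;
		}
	}
	return best;
}
```

With one worker that needs no transfer but runs the task slowly and another that runs it quickly after a transfer, raising beta shifts the choice toward the worker that already holds the data, which is the effect @code{STARPU_BETA} is meant to expose.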

+ 3 - 2
src/core/perfmodel/perfmodel_bus.c

@@ -1,6 +1,6 @@
 /* StarPU --- Runtime system for heterogeneous multicore architectures.
  *
- * Copyright (C) 2009, 2010  Université de Bordeaux 1
+ * Copyright (C) 2009, 2010-2011  Université de Bordeaux 1
  * Copyright (C) 2010  Centre National de la Recherche Scientifique
  *
  * StarPU is free software; you can redistribute it and/or modify
@@ -1028,6 +1028,7 @@ double _starpu_predict_transfer_time(unsigned src_node, unsigned dst_node, size_
 {
 	double bandwidth = bandwidth_matrix[src_node][dst_node];
 	double latency = latency_matrix[src_node][dst_node];
+	struct starpu_machine_topology_s *topology = &_starpu_get_machine_config()->topology;
 
-	return latency + size/bandwidth;
+	return latency + (size/bandwidth)*2*(topology->ncudagpus+topology->nopenclgpus);
 }
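
To see what the new return expression does numerically, here is a standalone sketch of the patched formula. It is an illustration only: `predict_transfer_time` and its `ngpus` parameter are hypothetical stand-ins for the real function and for `topology->ncudagpus + topology->nopenclgpus`, and the calibrated matrix entries are passed in as plain arguments.

```c
#include <stddef.h>

/* Standalone version of the patched expression:
 *   latency + (size / bandwidth) * 2 * ngpus
 * `ngpus` stands in for topology->ncudagpus + topology->nopenclgpus.
 * Units are whatever the caller uses consistently for latency and bandwidth. */
static double predict_transfer_time(double latency, double bandwidth,
                                    size_t size, unsigned ngpus)
{
	return latency + ((double)size / bandwidth) * 2 * ngpus;
}
```

For a latency of 1, a bandwidth of 100 and a size of 1000, the estimate is 21 with one GPU and 41 with two. As written, the bandwidth term vanishes when no GPU is activated, so the estimate then reduces to the latency alone.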