More prominently print calibration issues. Add an environment variable for controlling the calibration threshold

Samuel Thibault 11 years ago
Parent
Current commit
15531a027e
2 files changed, with 17 insertions and 3 deletions
  1. doc/doxygen/chapters/40environment_variables.doxy (+10 -0)
  2. src/core/perfmodel/perfmodel_history.c (+7 -3)

+ 10 - 0
doc/doxygen/chapters/40environment_variables.doxy

@@ -602,6 +602,16 @@ functions, thus allowing to quickly check that the task scheme is working
 properly, without performing the actual application-provided computation.
 </dd>
 
+<dt>STARPU_HISTORY_MAX_ERROR</dt>
+<dd>
+\anchor STARPU_HISTORY_MAX_ERROR
+\addindex __env__STARPU_HISTORY_MAX_ERROR
+History-based performance models will drop measurements which are really far
+from the measured average. This specifies the allowed variation in percent. The
+default is 10, i.e. the measurement is allowed to be up to 10% slower or faster
+than the average (a ratio of at most 1.1 either way).
+</dd>
+
 </dl>
 
 \section ConfiguringTheHypervisor Configuring The Hypervisor
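
The acceptance rule described for STARPU_HISTORY_MAX_ERROR can be sketched as a standalone C function that mirrors the ratio test in src/core/perfmodel/perfmodel_history.c. The name `exceeds_max_error` and its parameters are hypothetical, chosen for illustration only; this is not StarPU's API, and the sketch assumes both timings are strictly positive.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of StarPU: returns true when `measured`
 * differs from `mean` by more than `max_error` percent, using the same
 * symmetric ratio test as the diff (measured/mean or mean/measured). */
bool exceeds_max_error(double measured, double mean, int max_error)
{
	/* Assumption: both timings are strictly positive. */
	assert(measured > 0.0 && mean > 0.0);

	double ratio = measured / mean;
	return 100.0 * ratio > 100.0 + max_error   /* too slow */
	    || 100.0 / ratio > 100.0 + max_error;  /* too fast */
}
```

With a threshold of 10 and a mean of 100, a measurement of 111 is rejected (ratio 1.11), while 91 is accepted because 100/0.91 is about 109.9, which stays below 110. The accepted band is therefore roughly 90.9 to 110, not 90 to 110, because the "too fast" side is tested on the inverse ratio.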

+ 7 - 3
src/core/perfmodel/perfmodel_history.c

@@ -1322,11 +1322,15 @@ void _starpu_update_perfmodel_history(struct _starpu_job *j, struct starpu_perfm
 				/* There is already an entry with the same footprint */
 
 				double local_deviation = measured/entry->mean;
+				int historymaxerror = starpu_get_env_number_default("STARPU_HISTORY_MAX_ERROR", STARPU_HISTORYMAXERROR);
 				
 				if (entry->nsample &&
-					(100 * local_deviation > (100 + STARPU_HISTORYMAXERROR)
-					 || (100 / local_deviation > (100 + STARPU_HISTORYMAXERROR))))
+					(100 * local_deviation > (100 + historymaxerror)
+					 || (100 / local_deviation > (100 + historymaxerror))))
 				{
+					/* TODO: add aging, otherwise with
+					 * millions of tasks we're sure to
+					 * flush at least once... */
 					entry->nerror++;
 
 					/* Too many errors: we flush out all the entries */
@@ -1338,7 +1342,7 @@ void _starpu_update_perfmodel_history(struct _starpu_job *j, struct starpu_perfm
 						entry->nerror = 0;
 						entry->mean = 0.0;
 						entry->deviation = 0.0;
-						_STARPU_DEBUG("Too many errors for model %s\n", model->symbol);
+						_STARPU_DISP("Too big deviation for model %s: %f vs average %f (%+f%%), flushing the performance model. Use the STARPU_HISTORY_MAX_ERROR environment variable to control the threshold (currently %d%%)\n", model->symbol, measured, entry->mean, measured * 100. / entry->mean - 100, historymaxerror);
 					}
 				}
 				else
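
The two things this hunk adds, reading the threshold from the environment with a fallback and reporting the signed deviation in percent, can be sketched in isolation. `env_number_default` and `signed_percent_deviation` are hypothetical stand-ins written for this example. StarPU's own `starpu_get_env_number_default` may validate its input differently.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical stand-in for an environment lookup with a default value:
 * returns `defval` when `name` is unset, empty, or not a clean integer. */
int env_number_default(const char *name, int defval)
{
	assert(name != NULL);

	const char *s = getenv(name);
	if (s == NULL || *s == '\0')
		return defval;

	char *end;
	errno = 0;
	long v = strtol(s, &end, 10);
	if (errno != 0 || *end != '\0' || v < INT_MIN || v > INT_MAX)
		return defval;
	return (int) v;
}

/* Same expression as in the new message: +50 means the measurement took
 * 50% longer than the mean, -50 means it took half the mean time. */
double signed_percent_deviation(double measured, double mean)
{
	return measured * 100. / mean - 100;
}
```

Under these assumptions, running a program with `STARPU_HISTORY_MAX_ERROR=25` in its environment would make the lookup return 25, and an unset or malformed value would fall back to the compiled-in default passed as `defval`.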