
energy consumption can be measured on V100, or provided by cost models

(cherry picked from commit 0a8972d4b9f5d4f0586ec98fc0711b9837ea749d)
Samuel Thibault, 6 years ago
commit 5a9d6ecb4e
1 changed file with 26 additions and 12 deletions:
    doc/doxygen/chapters/320_scheduling.doxy


@@ -166,20 +166,34 @@ be obtained from the machine power supplier.
 The energy actually consumed by the total execution can be displayed by setting
 <c>export STARPU_PROFILING=1 STARPU_WORKER_STATS=1</c> .
 
-On-line task consumption measurement is currently only supported through the
+For OpenCL devices, on-line task consumption measurement is currently supported through the
 <c>CL_PROFILING_POWER_CONSUMED</c> OpenCL extension, implemented in the MoviSim
-simulator. Applications can however provide explicit measurements by
+simulator.
-using the function starpu_perfmodel_update_history() (examplified in \ref PerformanceModelExample
+
-with the <c>energy_model</c> performance model). Fine-grain
+For CUDA devices, on-line task consumption measurement is supported on V100
-measurement is often not feasible with the feedback provided by the hardware, so
+cards and beyond. This, however, only works for quite long tasks, since the
-the user can for instance run a given task a thousand times, measure the global
+measurement granularity is about 10ms.
+
+Applications can however provide explicit measurements by using the function
+starpu_perfmodel_update_history() (exemplified in \ref PerformanceModelExample
+with the <c>energy_model</c> performance model). Fine-grain measurement
+is often not feasible with the feedback provided by the hardware, so the
+user can for instance run a given task a thousand times, measure the global
 consumption for that series of tasks, divide it by a thousand, repeat for
-varying kinds of tasks and task sizes, and eventually feed StarPU
+varying kinds of tasks and task sizes, and eventually feed StarPU with these
-with these manual measurements through starpu_perfmodel_update_history().
+manual measurements through starpu_perfmodel_update_history().  For instance,
-For instance, for CUDA devices, <c>nvidia-smi -q -d POWER</c> can be used to get
+for CUDA devices, <c>nvidia-smi -q -d POWER</c> can be used to get the current
-the current consumption in Watt. Multiplying this value by the average duration
+consumption in Watt. Multiplying this value by the average duration of a
-of a single task gives the consumption of the task in Joules, which can be given
+single task gives the consumption of the task in Joules, which can be given to
-to starpu_perfmodel_update_history().
+starpu_perfmodel_update_history().
+
+Another way to provide the energy performance is to define a
+perfmodel with starpu_perfmodel::type ::STARPU_PER_ARCH, and set the
+starpu_perfmodel::arch_cost_function field to a function which shall return the
+estimated consumption of the task in Joules. Such a function can for instance
+use starpu_task_expected_length() on the task (in µs), multiplied by the
+typical power consumption of the device, e.g. in W, and divided by 1000000 to
+get Joules.
 
 
\section ExistingModularizedSchedulers Modularized Schedulers