document STARPU_WORKER_STATS and STARPU_BUS_STATS

Samuel Thibault, 9 years ago
commit a014963b6d

+ 2 - 0
doc/doxygen/chapters/07data_management.doxy

@@ -516,6 +516,8 @@ codelet is needed).

 \subsection ScratchData Scratch Data

+TODO: say that the data is registered only once, and is thus allocated only once per worker
+
 Some kernels sometimes need temporary data to achieve the computations, i.e. a
 workspace. The application could allocate it at the start of the codelet
 function, and free it at the end, but that would be costly. It could also

+ 51 - 2
doc/doxygen/chapters/12online_performance_tools.doxy

@@ -82,6 +82,36 @@ wasted in pure StarPU overhead.
 Calling starpu_profiling_worker_get_info() resets the profiling
 information associated to a worker.

+To easily display all this information, the environment variable \ref
+STARPU_WORKER_STATS can be set to 1 (in addition to setting \ref
+STARPU_PROFILING to 1). A summary will then be displayed at program termination:
+
+\verbatim
+Worker stats:
+CUDA 0.0 (4.7 GiB)      
+	480 task(s)
+	total: 1574.82 ms executing: 1510.72 ms sleeping: 0.00 ms overhead 64.10 ms
+	325.217970 GFlop/s
+
+CPU 0                           
+	22 task(s)
+	total: 1574.82 ms executing: 1364.81 ms sleeping: 0.00 ms overhead 210.01 ms
+	7.512057 GFlop/s
+
+CPU 1                           
+	14 task(s)
+	total: 1574.82 ms executing: 1500.13 ms sleeping: 0.00 ms overhead 74.69 ms
+	6.675853 GFlop/s
+
+CPU 2                           
+	14 task(s)
+	total: 1574.82 ms executing: 1553.12 ms sleeping: 0.00 ms overhead 21.70 ms
+	7.152886 GFlop/s
+\endverbatim
+
+The GFlop/s rate is available because the starpu_task::flops field of the
+tasks was filled in.
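+
+For instance, assuming a hypothetical GEMM codelet <c>gemm_cl</c> and data
+handles defined elsewhere by the application, the field could be filled along
+the following lines before the task is submitted:
+
+\code{.c}
+#include <starpu.h>
+
+/* Illustrative sketch only: gemm_cl, A, B, C and n come from the application. */
+void submit_gemm_task(struct starpu_codelet *gemm_cl,
+		      starpu_data_handle_t A, starpu_data_handle_t B,
+		      starpu_data_handle_t C, unsigned n)
+{
+	struct starpu_task *task = starpu_task_create();
+	task->cl = gemm_cl;
+	task->handles[0] = A;
+	task->handles[1] = B;
+	task->handles[2] = C;
+	/* An n x n matrix-matrix multiplication performs about 2*n^3 flops;
+	 * recording it here lets the summary report a GFlop/s rate. */
+	task->flops = 2.0 * n * n * n;
+	int ret = starpu_task_submit(task);
+	STARPU_CHECK_RETURN_VALUE(ret, "starpu_task_submit");
+}
+\endcode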
+
 When an FxT trace is generated (see \ref GeneratingTracesWithFxT), it is also
 possible to use the tool <c>starpu_workers_activity</c> (see \ref
 MonitoringActivity) to generate a graphic showing the evolution of
@@ -89,8 +119,6 @@ these values during the time, for the different workers.

 \subsection Bus-relatedFeedback Bus-related Feedback

-TODO: add \ref STARPU_BUS_STATS
-
 // how to enable/disable performance monitoring
 // what kind of information do we get ?

@@ -110,6 +138,27 @@ CUDA 1  4523.718152     2414.078822     0.000000        2417.375119
 CUDA 2  4534.229519     2417.069025     2417.060863     0.000000
 \endverbatim

+Statistics about the data transfers which were performed, as well as the
+temporal average of the bandwidth usage, can be obtained by setting the
+environment variable \ref STARPU_BUS_STATS to 1; a summary will then be
+displayed at program termination:
+
+\verbatim
+Data transfer stats:
+	RAM 0 -> CUDA 0	319.92 MB	213.10 MB/s	(transfers : 91 - avg 3.52 MB)
+	CUDA 0 -> RAM 0	214.45 MB	142.85 MB/s	(transfers : 61 - avg 3.52 MB)
+	RAM 0 -> CUDA 1	302.34 MB	201.39 MB/s	(transfers : 86 - avg 3.52 MB)
+	CUDA 1 -> RAM 0	133.59 MB	88.99 MB/s	(transfers : 38 - avg 3.52 MB)
+	CUDA 0 -> CUDA 1	144.14 MB	96.01 MB/s	(transfers : 41 - avg 3.52 MB)
+	CUDA 1 -> CUDA 0	130.08 MB	86.64 MB/s	(transfers : 37 - avg 3.52 MB)
+	RAM 0 -> CUDA 2	312.89 MB	208.42 MB/s	(transfers : 89 - avg 3.52 MB)
+	CUDA 2 -> RAM 0	133.59 MB	88.99 MB/s	(transfers : 38 - avg 3.52 MB)
+	CUDA 0 -> CUDA 2	151.17 MB	100.69 MB/s	(transfers : 43 - avg 3.52 MB)
+	CUDA 2 -> CUDA 0	105.47 MB	70.25 MB/s	(transfers : 30 - avg 3.52 MB)
+	CUDA 1 -> CUDA 2	175.78 MB	117.09 MB/s	(transfers : 50 - avg 3.52 MB)
+	CUDA 2 -> CUDA 1	203.91 MB	135.82 MB/s	(transfers : 58 - avg 3.52 MB)
+Total transfers: 2.27 GB
+\endverbatim
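+
+The same counters can also be queried programmatically through the profiling
+API. The sketch below is only an illustration (check starpu_profiling.h for the
+exact fields of starpu_profiling_bus_info); it prints, for each bus, the amount
+of data transferred and an approximate temporal average of the bandwidth:
+
+\code{.c}
+#include <stdio.h>
+#include <starpu.h>
+#include <starpu_profiling.h>
+
+/* To be called before starpu_shutdown(), with profiling enabled. */
+void print_bus_stats(void)
+{
+	int busid, nbuses = starpu_bus_get_count();
+	for (busid = 0; busid < nbuses; busid++)
+	{
+		struct starpu_profiling_bus_info info;
+		starpu_bus_get_profiling_info(busid, &info);
+		/* Elapsed time since the profiling counters were last reset */
+		double elapsed_us = starpu_timing_timespec_to_us(&info.total_time);
+		double mb = info.transferred_bytes / (1024.0 * 1024.0);
+		printf("node %d -> node %d: %.2f MB in %d transfer(s), %.2f MB/s\n",
+		       starpu_bus_get_src(busid), starpu_bus_get_dst(busid),
+		       mb, info.transfer_count,
+		       elapsed_us > 0. ? mb / (elapsed_us / 1e6) : 0.);
+	}
+}
+\endcode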
+
 \subsection StarPU-TopInterface StarPU-Top Interface

 StarPU-Top is an interface which remotely displays the on-line state of a StarPU