|
							- @c -*-texinfo-*-
 
- @c This file is part of the StarPU Handbook.
 
- @c Copyright (C) 2009--2011  Universit@'e de Bordeaux 1
 
- @c Copyright (C) 2010, 2011, 2012, 2013  Centre National de la Recherche Scientifique
 
- @c Copyright (C) 2011, 2012 Institut National de Recherche en Informatique et Automatique
 
- @c See the file starpu.texi for copying conditions.
 
- @menu
 
- * Task debugger::               Using the Temanejo task debugger
 
- * On-line::                     On-line performance feedback
 
- * Off-line::                    Off-line performance feedback
 
- * Codelet performance::         Performance of codelets
 
- * Theoretical lower bound on execution time API::
 
- * Memory feedback::
 
- * Data statistics::
 
- @end menu
 
- @node Task debugger
 
- @section Using the Temanejo task debugger
 
- StarPU can connect to Temanejo (see
 
- @url{http://www.hlrs.de/temanejo}) to allow
 
- visual task debugging. To do so, build Temanejo's @code{libayudame.so},
 
- install @code{Ayudame.h} to e.g. @code{/usr/local/include}, apply the
 
- @code{tools/patch-ayudame} patch to it to fix the C build, re-run @code{./configure}, make
 
- sure that it detected Ayudame, and rebuild StarPU.  Run the Temanejo GUI and give it the path
 
- to your application, any options you want to pass it, and the path to @code{libayudame.so}.
 
- Make sure to specify at least as many CPUs in the dialog box as your
 
- machine has, otherwise an error will occur during execution. Future versions
 
- of Temanejo should be able to tell StarPU the number of CPUs to use.
 
- Tag numbers have to be below @code{4000000000000000000ULL} to be usable for
 
- Temanejo (so as to distinguish them from tasks).
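
- The tag limit above can be checked before tags are declared. The following is a

- minimal sketch; @code{tag_usable_with_temanejo} and @code{TEMANEJO_TAG_LIMIT} are

- hypothetical names used for illustration and are not part of StarPU or Temanejo.

```c
#include <stdbool.h>
#include <stdint.h>

/* Limit quoted in the text: tags must stay strictly below this value. */
#define TEMANEJO_TAG_LIMIT 4000000000000000000ULL

/* Hypothetical helper (not a StarPU function): returns true when the
 * tag number is small enough to be distinguished from tasks. */
static bool tag_usable_with_temanejo(uint64_t tag)
{
    return tag < TEMANEJO_TAG_LIMIT;
}
```

- For instance, @code{tag_usable_with_temanejo(42)} holds, while the limit value itself is rejected.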
 
- @node On-line
 
- @section On-line performance feedback
 
- @menu
 
- * Enabling on-line performance monitoring::
 
- * Task feedback::               Per-task feedback
 
- * Codelet feedback::            Per-codelet feedback
 
- * Worker feedback::             Per-worker feedback
 
- * Bus feedback::                Bus-related feedback
 
- * StarPU-Top::                  StarPU-Top interface
 
- @end menu
 
- @node Enabling on-line performance monitoring
 
- @subsection Enabling on-line performance monitoring
 
- In order to enable online performance monitoring, the application can call
 
- @code{starpu_profiling_status_set(STARPU_PROFILING_ENABLE)}. It is possible to
 
- detect whether monitoring is already enabled or not by calling
 
- @code{starpu_profiling_status_get()}. Enabling monitoring also reinitializes all
 
- previously collected feedback. The @code{STARPU_PROFILING} environment variable
 
- can also be set to 1 to achieve the same effect.
 
- Likewise, performance monitoring is stopped by calling
 
- @code{starpu_profiling_status_set(STARPU_PROFILING_DISABLE)}. Note that this
 
- does not reset the performance counters so that the application may consult
 
- them later on.
 
- More details about the performance monitoring API are available in section
 
- @ref{Profiling API}.
 
- @node Task feedback
 
- @subsection Per-task feedback
 
- If profiling is enabled, a pointer to a @code{starpu_task_profiling_info}
 
- structure is put in the @code{.profiling_info} field of the @code{starpu_task}
 
- structure when a task terminates.
 
- This structure is automatically destroyed when the task structure is destroyed,
 
- either automatically or by calling @code{starpu_task_destroy}.
 
- The @code{starpu_task_profiling_info} structure indicates the date when the
 
- task was submitted (@code{submit_time}), started (@code{start_time}), and
 
- terminated (@code{end_time}), relative to the initialization of
 
- StarPU with @code{starpu_init}. It also specifies the identifier of the worker
 
- that has executed the task (@code{workerid}).
 
- These dates are stored as @code{timespec} structures which the user may convert
 
- into micro-seconds using the @code{starpu_timing_timespec_to_us} helper
 
- function.
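
- As an illustration of how these dates can be used, the sketch below derives the

- time a task waited before starting and the time it spent executing. It is

- standalone C: @code{timespec_to_us} is a hypothetical stand-in assumed to behave

- like @code{starpu_timing_timespec_to_us}, and the two other helpers are likewise

- illustrative names, not StarPU functions.

```c
#include <time.h>

/* Hypothetical stand-in for starpu_timing_timespec_to_us:
 * converts a timespec into microseconds. */
static double timespec_to_us(const struct timespec *ts)
{
    return 1000000.0 * (double)ts->tv_sec + (double)ts->tv_nsec / 1000.0;
}

/* Time between submission and actual start (submit_time, start_time). */
static double queue_delay_us(const struct timespec *submit,
                             const struct timespec *start)
{
    return timespec_to_us(start) - timespec_to_us(submit);
}

/* Time between start and termination (start_time, end_time). */
static double execution_time_us(const struct timespec *start,
                                const struct timespec *end)
{
    return timespec_to_us(end) - timespec_to_us(start);
}
```

- In an application, the arguments would be the @code{submit_time}, @code{start_time}

- and @code{end_time} fields of the task's profiling structure.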
 
- It is worth noting that the application may directly access this structure from
 
- the callback executed at the end of the task. The @code{starpu_task} structure
 
- associated to the callback currently being executed is indeed accessible with
 
- the @code{starpu_task_get_current()} function.
 
- @node Codelet feedback
 
- @subsection Per-codelet feedback
 
- The @code{per_worker_stats} field of the @code{struct starpu_codelet} structure is
 
- an array of counters. The i-th entry of the array is incremented every time a
 
- task implementing the codelet is executed on the i-th worker.
 
- This array is not reinitialized when profiling is enabled or disabled.
 
- @node Worker feedback
 
- @subsection Per-worker feedback
 
- The second argument returned by the @code{starpu_worker_get_profiling_info}
 
- function is a @code{starpu_worker_profiling_info} structure that gives
 
- statistics about the specified worker. This structure specifies when StarPU
 
- started collecting profiling information for that worker (@code{start_time}),
 
- the duration of the profiling measurement interval (@code{total_time}), the
 
- time spent executing kernels (@code{executing_time}), the time spent sleeping
 
- because there is no task to execute at all (@code{sleeping_time}), and the
 
- number of tasks that were executed while profiling was enabled.
 
- These values give an estimation of the proportion of time spent doing real work,
 
- and the time spent either sleeping because there are not enough executable
 
- tasks or simply wasted in pure StarPU overhead.
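
- The proportions mentioned above can be computed as in the following sketch. It

- assumes the three durations have already been converted to microseconds, and it

- assumes that overhead is whatever remains once executing and sleeping time are

- subtracted from the total. @code{struct worker_shares} and

- @code{worker_time_shares} are hypothetical names, not part of StarPU.

```c
/* Hypothetical summary type: fractions of the measurement interval. */
struct worker_shares {
    double executing; /* fraction spent executing kernels */
    double sleeping;  /* fraction spent sleeping, no task to execute */
    double overhead;  /* remainder, assumed to be runtime overhead */
};

/* Inputs mirror total_time, executing_time and sleeping_time,
 * already converted to microseconds. */
static struct worker_shares worker_time_shares(double total_us,
                                               double executing_us,
                                               double sleeping_us)
{
    struct worker_shares s = {0.0, 0.0, 0.0};
    if (total_us <= 0.0)
        return s; /* nothing was measured */
    s.executing = executing_us / total_us;
    s.sleeping = sleeping_us / total_us;
    s.overhead = 1.0 - s.executing - s.sleeping;
    return s;
}
```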
 
- Calling @code{starpu_worker_get_profiling_info} resets the profiling
 
- information associated to a worker.
 
- When an FxT trace is generated (see @ref{Generating traces}), it is also
 
- possible to use the @code{starpu_workers_activity} script (described in @ref{starpu-workers-activity}) to
 
- generate a graphic showing the evolution of these values over time, for
 
- the different workers.
 
- @node Bus feedback
 
- @subsection Bus-related feedback
 
- TODO: add STARPU_BUS_STATS
 
- @c how to enable/disable performance monitoring
 
- @c what kind of information do we get ?
 
- The bus speed measured by StarPU can be displayed by using the
 
- @code{starpu_machine_display} tool, for instance:
 
- @example
 
- StarPU has found:
 
-         3 CUDA devices
 
-                 CUDA 0 (Tesla C2050 02:00.0)
 
-                 CUDA 1 (Tesla C2050 03:00.0)
 
-                 CUDA 2 (Tesla C2050 84:00.0)
 
- from    to RAM          to CUDA 0       to CUDA 1       to CUDA 2
 
- RAM     0.000000        5176.530428     5176.492994     5191.710722
 
- CUDA 0  4523.732446     0.000000        2414.074751     2417.379201
 
- CUDA 1  4523.718152     2414.078822     0.000000        2417.375119
 
- CUDA 2  4534.229519     2417.069025     2417.060863     0.000000
 
- @end example
 
- @node StarPU-Top
 
- @subsection StarPU-Top interface
 
- StarPU-Top is an interface which remotely displays the on-line state of a StarPU
 
- application and permits the user to change parameters on the fly.
 
- Variables to be monitored can be registered by calling the
 
- @code{starpu_top_add_data_boolean}, @code{starpu_top_add_data_integer},
 
- @code{starpu_top_add_data_float} functions, e.g.:
 
- @cartouche
 
- @smallexample
 
- starpu_top_data *data = starpu_top_add_data_integer("mynum", 0, 100, 1);
 
- @end smallexample
 
- @end cartouche
 
- The application should then call @code{starpu_top_init_and_wait} to give its name
 
- and wait for StarPU-Top to get a start request from the user. The name is used
 
- by StarPU-Top to quickly reload a previously-saved layout of parameter display.
 
- @cartouche
 
- @smallexample
 
- starpu_top_init_and_wait("the application");
 
- @end smallexample
 
- @end cartouche
 
- The new values can then be provided thanks to
 
- @code{starpu_top_update_data_boolean}, @code{starpu_top_update_data_integer},
 
- @code{starpu_top_update_data_float}, e.g.:
 
- @cartouche
 
- @smallexample
 
- starpu_top_update_data_integer(data, mynum);
 
- @end smallexample
 
- @end cartouche
 
- Updateable parameters can be registered thanks to @code{starpu_top_register_parameter_boolean}, @code{starpu_top_register_parameter_integer}, @code{starpu_top_register_parameter_float}, e.g.:
 
- @cartouche
 
- @smallexample
 
- float alpha;
 
- starpu_top_register_parameter_float("alpha", &alpha, 0, 10, modif_hook);
 
- @end smallexample
 
- @end cartouche
 
- @code{modif_hook} is a function which will be called when the parameter is being modified; it can for instance print the new value:
 
- @cartouche
 
- @smallexample
 
- void modif_hook(struct starpu_top_param *d) @{
 
-     fprintf(stderr,"%s has been modified: %f\n", d->name, alpha);
 
- @}
 
- @end smallexample
 
- @end cartouche
 
- Task schedulers should notify StarPU-Top when they have decided when a task will be
 
- scheduled, so that it can show it in its Gantt chart, for instance:
 
- @cartouche
 
- @smallexample
 
- starpu_top_task_prevision(task, workerid, begin, end);
 
- @end smallexample
 
- @end cartouche
 
- Starting StarPU-Top@footnote{StarPU-Top is started via the binary
 
- @code{starpu_top}.} and the application can be done two ways:
 
- @itemize
 
- @item The application is started by hand on some machine (and thus already
 
- waiting for the start event). In the Preference dialog of StarPU-Top, the SSH
 
- checkbox should be unchecked, and the hostname and port (default is 2011) on
 
- which the application is already running should be specified. Clicking on the
 
- connection button will thus connect to the already-running application.
 
- @item StarPU-Top is started first, and clicking on the connection button will
 
- start the application itself (possibly on a remote machine). The SSH checkbox
 
- should be checked, and a command line provided, e.g.:
 
- @example
 
- $ ssh myserver STARPU_SCHED=dmda ./application
 
- @end example
 
- If port 2011 of the remote machine cannot be accessed directly, an SSH port bridge should be added:
 
- @example
 
- $ ssh -L 2011:localhost:2011 myserver STARPU_SCHED=dmda ./application
 
- @end example
 
- and "localhost" should be used as IP Address to connect to.
 
- @end itemize
 
- @node Off-line
 
- @section Off-line performance feedback
 
- @menu
 
- * Generating traces::           Generating traces with FxT
 
- * Gantt diagram::               Creating a Gantt Diagram
 
- * DAG::                         Creating a DAG with graphviz
 
- * starpu-workers-activity::     Monitoring activity
 
- @end menu
 
- @node Generating traces
 
- @subsection Generating traces with FxT
 
- StarPU can use the FxT library (see
 
- @url{https://savannah.nongnu.org/projects/fkt/}) to generate traces
 
- with a limited runtime overhead.
 
- You can either get a tarball:
 
- @example
 
- $ wget http://download.savannah.gnu.org/releases/fkt/fxt-0.2.11.tar.gz
 
- @end example
 
- or use the FxT library from CVS (autotools are required):
 
- @example
 
- $ cvs -d :pserver:anonymous@@cvs.sv.gnu.org:/sources/fkt co FxT
 
- $ ./bootstrap
 
- @end example
 
- Compiling and installing the FxT library in the @code{$FXTDIR} path is
 
- done following the standard procedure:
 
- @example
 
- $ ./configure --prefix=$FXTDIR
 
- $ make
 
- $ make install
 
- @end example
 
- In order to have StarPU generate traces, StarPU should be configured with
 
- the @code{--with-fxt} option:
 
- @example
 
- $ ./configure --with-fxt=$FXTDIR
 
- @end example
 
- Or you can simply point the @code{PKG_CONFIG_PATH} to
 
- @code{$FXTDIR/lib/pkgconfig} and pass @code{--with-fxt} to @code{./configure}.
 
- When FxT is enabled, a trace is generated when StarPU is terminated by calling
 
- @code{starpu_shutdown()}. The trace is a binary file whose name has the form
 
- @code{prof_file_XXX_YYY} where @code{XXX} is the user name, and
 
- @code{YYY} is the pid of the process that used StarPU. This file is saved in the
 
- @code{/tmp/} directory by default, or in the directory specified by
 
- the @code{STARPU_FXT_PREFIX} environment variable.
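
- A script or tool that needs to locate the trace can rebuild its name from the

- pieces described above. The sketch below is an illustration only:

- @code{fxt_trace_dir} and @code{fxt_trace_path} are hypothetical helpers, and

- appending the file name directly to the prefix (so that the prefix is expected to

- end with a slash, as @code{/tmp/} does) is an assumption.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: directory in which the trace is expected,
 * STARPU_FXT_PREFIX if set, "/tmp/" otherwise. */
static const char *fxt_trace_dir(void)
{
    const char *dir = getenv("STARPU_FXT_PREFIX");
    return dir != NULL ? dir : "/tmp/";
}

/* Hypothetical helper: writes "<dir>prof_file_<user>_<pid>" into buf.
 * Returns 0 on success, -1 if the name does not fit in the buffer. */
static int fxt_trace_path(char *buf, size_t size, const char *dir,
                          const char *user, long pid)
{
    int n = snprintf(buf, size, "%sprof_file_%s_%ld", dir, user, pid);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

- For user @code{foo} and pid 0 with the default directory, this yields

- @code{/tmp/prof_file_foo_0}.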
 
- @node Gantt diagram
 
- @subsection Creating a Gantt Diagram
 
- When the FxT trace file @code{filename} has been generated, it is possible to
 
- generate a trace in the Paje format by calling:
 
- @example
 
- $ starpu_fxt_tool -i filename
 
- @end example
 
- Or alternatively, setting the @code{STARPU_GENERATE_TRACE} environment variable
 
- to @code{1} before application execution will make StarPU do it automatically at
 
- application shutdown.
 
- This will create a @code{paje.trace} file in the current directory that
 
- can be inspected with the @url{http://vite.gforge.inria.fr/, ViTE trace
 
- visualizing open-source tool}.  It is possible to open the
 
- @code{paje.trace} file with ViTE by using the following command:
 
- @example
 
- $ vite paje.trace
 
- @end example
 
- To get names of tasks instead of "unknown", fill the optional @code{name} field
 
- of the codelets, or use a performance model for them.
 
- In the MPI execution case, collect the trace files from the MPI nodes, and
 
- specify them all on the @code{starpu_fxt_tool} command, for instance:
 
- @smallexample
 
- $ starpu_fxt_tool -i filename1 -i filename2
 
- @end smallexample
 
- By default, all tasks are displayed using a green color. To display tasks with
 
- varying colors, pass option @code{-c} to @code{starpu_fxt_tool}.
 
- @node DAG
 
- @subsection Creating a DAG with graphviz
 
- When the FxT trace file @code{filename} has been generated, it is possible to
 
- generate a task graph in the DOT format by calling:
 
- @example
 
- $ starpu_fxt_tool -i filename
 
- @end example
 
- This will create a @code{dag.dot} file in the current directory. This file is a
 
- task graph described using the DOT language. It is possible to get a
 
- graphical output of the graph by using the graphviz library:
 
- @example
 
- $ dot -Tpdf dag.dot -o output.pdf
 
- @end example
 
- @node starpu-workers-activity
 
- @subsection Monitoring activity
 
- When the FxT trace file @code{filename} has been generated, it is possible to
 
- generate an activity trace by calling:
 
- @example
 
- $ starpu_fxt_tool -i filename
 
- @end example
 
- This will create an @code{activity.data} file in the current
 
- directory. A profile of the application showing the activity of StarPU
 
- during the execution of the program can be generated:
 
- @example
 
- $ starpu_workers_activity activity.data
 
- @end example
 
- This will create a file named @code{activity.eps} in the current directory.
 
- This picture is composed of two parts.
 
- The first part shows the activity of the different workers. The green sections
 
- indicate which proportion of the time was spent executing kernels on the
 
- processing unit. The red sections indicate the proportion of time spent in
 
- StarPU: a significant overhead may indicate that the granularity may be too
 
- low, and that bigger tasks may be appropriate to use the processing unit more
 
- efficiently. The black sections indicate that the processing unit was blocked
 
- because there was no task to process: this may indicate a lack of parallelism
 
- which may be alleviated by creating more tasks when it is possible.
 
- The second part of the @code{activity.eps} picture is a graph showing the
 
- evolution of the number of tasks available in the system during the execution.
 
- Ready tasks are shown in black, and tasks that are submitted but not
 
- schedulable yet are shown in grey.
 
- @node Codelet performance
 
- @section Performance of codelets
 
- The performance model of codelets (described in @ref{Performance model example}) can be examined by using the
 
- @code{starpu_perfmodel_display} tool:
 
- @example
 
- $ starpu_perfmodel_display -l
 
- file: <malloc_pinned.hannibal>
 
- file: <starpu_slu_lu_model_21.hannibal>
 
- file: <starpu_slu_lu_model_11.hannibal>
 
- file: <starpu_slu_lu_model_22.hannibal>
 
- file: <starpu_slu_lu_model_12.hannibal>
 
- @end example
 
- Here, the codelets of the lu example are available. We can examine the
 
- performance of the 22 kernel (in micro-seconds), which is history-based:
 
- @example
 
- $ starpu_perfmodel_display -s starpu_slu_lu_model_22
 
- performance model for cpu
 
- # hash      size       mean          dev           n
 
- 57618ab0    19660800   2.851069e+05  1.829369e+04  109
 
- performance model for cuda_0
 
- # hash      size       mean          dev           n
 
- 57618ab0    19660800   1.164144e+04  1.556094e+01  315
 
- performance model for cuda_1
 
- # hash      size       mean          dev           n
 
- 57618ab0    19660800   1.164271e+04  1.330628e+01  360
 
- performance model for cuda_2
 
- # hash      size       mean          dev           n
 
- 57618ab0    19660800   1.166730e+04  3.390395e+02  456
 
- @end example
 
- We can see that for the given size, over a sample of a few hundred
 
- executions, the GPUs are about 20 times faster than the CPUs (numbers are in
 
- us). The standard deviation is extremely low for the GPUs, and less than 10% for
 
- CPUs.
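
- The comparison above is simple arithmetic on the @code{mean} and @code{dev}

- columns. The sketch below shows it; @code{speedup} and @code{relative_deviation}

- are hypothetical helper names, not StarPU functions.

```c
/* Ratio between the mean execution times of two architectures. */
static double speedup(double slow_mean_us, double fast_mean_us)
{
    return slow_mean_us / fast_mean_us;
}

/* Standard deviation expressed as a fraction of the mean. */
static double relative_deviation(double mean_us, double dev_us)
{
    return dev_us / mean_us;
}
```

- With the @code{cpu} and @code{cuda_0} rows shown above,

- @code{speedup(2.851069e+05, 1.164144e+04)} is roughly 24, and

- @code{relative_deviation(2.851069e+05, 1.829369e+04)} is roughly 0.06.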
 
- This tool can also be used for regression-based performance models. It will then
 
- display the regression formula, and in the case of non-linear regression, the
 
- same performance log as for history-based performance models:
 
- @example
 
- $ starpu_perfmodel_display -s non_linear_memset_regression_based.type
 
- performance model for cpu_impl_0
 
- 	Regression : #sample = 1400
 
- 	Linear: y = alpha size ^ beta
 
- 		alpha = 1.335973e-03
 
- 		beta = 8.024020e-01
 
- 	Non-Linear: y = a size ^b + c
 
- 		a = 5.429195e-04
 
- 		b = 8.654899e-01
 
- 		c = 9.009313e-01
 
- # hash		size		mean		stddev		n
 
- a3d3725e	4096           	4.763200e+00   	7.650928e-01   	100
 
- 870a30aa	8192           	1.827970e+00   	2.037181e-01   	100
 
- 48e988e9	16384          	2.652800e+00   	1.876459e-01   	100
 
- 961e65d2	32768          	4.255530e+00   	3.518025e-01   	100
 
- ...
 
- @end example
 
- The @code{starpu_perfmodel_plot} tool can be used to draw performance models.
 
- It writes a @code{.gp} file in the current directory, to be run in the
 
- @code{gnuplot} tool, which shows the corresponding curve.
 
- The same can also be achieved by using StarPU's library API, see
 
- @ref{Performance Model API} and notably the @code{starpu_perfmodel_load_symbol}
 
- function. The source code of the @code{starpu_perfmodel_display} tool can be a
 
- useful example.
 
- When the FxT trace file @code{filename} has been generated, it is possible to
 
- get a profiling of each codelet by calling:
 
- @example
 
- $ starpu_fxt_tool -i filename
 
- $ starpu_codelet_profile distrib.data codelet_name
 
- @end example
 
- This will create profiling data files, and a @code{.gp} file in the current
 
- directory, which draws the distribution of codelet time over the application
 
- execution, according to data input size.
 
- This is also available in the @code{starpu_perfmodel_plot} tool, by passing it
 
- the fxt trace:
 
- @example
 
- $ starpu_perfmodel_plot -s non_linear_memset_regression_based.type -i /tmp/prof_file_foo_0
 
- @end example
 
- It will produce a @code{.gp} file which contains both the performance model
 
- curves, and the profiling measurements.
 
- If you have the R statistical tool installed, you can additionally use
 
- @example
 
- $ starpu_codelet_histo_profile distrib.data
 
- @end example
 
- which will create one PDF file per codelet and per input size, showing a
 
- histogram of the codelet execution time distribution.
 
- @node Theoretical lower bound on execution time API
 
- @section Theoretical lower bound on execution time
 
- See @ref{Theoretical lower bound on execution time} for an example on how to use
 
- this API. It makes it possible to record a trace of the tasks needed to complete the
 
- application, and then, by using a linear system, to provide a theoretical lower
 
- bound of the execution time (i.e. with an ideal scheduling).
 
- The computed bound is not really correct when not taking into account
 
- dependencies, but for an application which has enough parallelism, it is very
 
- close to the bound computed with dependencies enabled (which takes much
 
- more time to compute), and thus provides a good-enough estimation of the ideal
 
- execution time.
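
- To give an intuition of what a bound without dependencies looks like, the sketch

- below handles a deliberately simplified case: @math{n} identical, independent

- tasks, where worker @math{w} needs @math{t_w} microseconds per task. With an

- ideal (fractional) distribution, no schedule can finish before

- @math{n / \sum_w (1/t_w)}. This is not the linear system solved by StarPU, which

- covers several kinds of tasks; @code{area_bound_us} is a hypothetical name.

```c
/* Simplified bound for ntasks identical, independent tasks.
 * task_time_us[w] is the time worker w needs for one task.
 * Returns 0 if a duration is not positive, i.e. not calibrated. */
static double area_bound_us(unsigned long ntasks,
                            const double *task_time_us, int nworkers)
{
    double rate = 0.0; /* tasks completed per microsecond, all workers */
    for (int w = 0; w < nworkers; w++) {
        if (task_time_us[w] <= 0.0)
            return 0.0;
        rate += 1.0 / task_time_us[w];
    }
    return rate > 0.0 ? (double)ntasks / rate : 0.0;
}
```

- For example, 4 tasks on two workers that each need 2 microseconds per task cannot

- complete in less than 4 microseconds.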
 
- @deftypefun void starpu_bound_start (int @var{deps}, int @var{prio})
 
- Start recording tasks (resets stats).  @var{deps} tells whether
 
- dependencies should be recorded too (this is quite expensive).
 
- @end deftypefun
 
- @deftypefun void starpu_bound_stop (void)
 
- Stop recording tasks
 
- @end deftypefun
 
- @deftypefun void starpu_bound_print_dot ({FILE *}@var{output})
 
- Print the DAG that was recorded
 
- @end deftypefun
 
- @deftypefun void starpu_bound_compute ({double *}@var{res}, {double *}@var{integer_res}, int @var{integer})
 
- Get the theoretical lower bound on execution time (in ms) (needs glpk support detected by the @code{configure} script). It returns 0 if some performance models are not calibrated.
 
- @end deftypefun
 
- @deftypefun void starpu_bound_print_lp ({FILE *}@var{output})
 
- Emit the Linear Programming system on @var{output} for the recorded tasks, in
 
- the lp format
 
- @end deftypefun
 
- @deftypefun void starpu_bound_print_mps ({FILE *}@var{output})
 
- Emit the Linear Programming system on @var{output} for the recorded tasks, in
 
- the mps format
 
- @end deftypefun
 
- @deftypefun void starpu_bound_print ({FILE *}@var{output}, int @var{integer})
 
- Emit statistics of actual execution vs. the theoretical bound. @var{integer}
 
- selects between integer solving (which takes a long time but is
 
- correct), and relaxed solving (which provides an approximate solution).
 
- @end deftypefun
 
- @node Memory feedback
 
- @section Memory feedback
 
- It is possible to enable memory statistics. To do so, you need to pass the option
 
- @code{--enable-memory-stats} when running configure. It is then
 
- possible to call the function @code{starpu_display_memory_stats()} to
 
- display statistics about the current data handles registered within StarPU.
 
- Moreover, statistics will be displayed at the end of the execution on
 
- data handles which have not been cleared out. This can be disabled by
 
- setting the environment variable @code{STARPU_MEMORY_STATS} to 0.
 
- For example, if you do not unregister data at the end of the complex
 
- example, you will get something similar to:
 
- @example
 
- $ STARPU_MEMORY_STATS=0 ./examples/interface/complex
 
- Complex[0] = 45.00 + 12.00 i
 
- Complex[0] = 78.00 + 78.00 i
 
- Complex[0] = 45.00 + 12.00 i
 
- Complex[0] = 45.00 + 12.00 i
 
- @end example
 
- @example
 
- $ STARPU_MEMORY_STATS=1 ./examples/interface/complex
 
- Complex[0] = 45.00 + 12.00 i
 
- Complex[0] = 78.00 + 78.00 i
 
- Complex[0] = 45.00 + 12.00 i
 
- Complex[0] = 45.00 + 12.00 i
 
- #---------------------
 
- Memory stats:
 
- #-------
 
- Data on Node #3
 
- #-----
 
- Data : 0x553ff40
 
- Size : 16
 
- #--
 
- Data access stats
 
- /!\ Work Underway
 
- Node #0
 
- 	Direct access : 4
 
- 	Loaded (Owner) : 0
 
- 	Loaded (Shared) : 0
 
- 	Invalidated (was Owner) : 0
 
- Node #3
 
- 	Direct access : 0
 
- 	Loaded (Owner) : 0
 
- 	Loaded (Shared) : 1
 
- 	Invalidated (was Owner) : 0
 
- #-----
 
- Data : 0x5544710
 
- Size : 16
 
- #--
 
- Data access stats
 
- /!\ Work Underway
 
- Node #0
 
- 	Direct access : 2
 
- 	Loaded (Owner) : 0
 
- 	Loaded (Shared) : 1
 
- 	Invalidated (was Owner) : 1
 
- Node #3
 
- 	Direct access : 0
 
- 	Loaded (Owner) : 1
 
- 	Loaded (Shared) : 0
 
- 	Invalidated (was Owner) : 0
 
- @end example
 
- @node Data statistics
 
- @section Data statistics
 
- Different data statistics can be displayed at the end of the execution
 
- of the application. To enable them, you need to pass the option
 
- @code{--enable-stats} when calling @code{configure}. When calling
 
- @code{starpu_shutdown()}, various statistics will be displayed:
 
- execution, MSI cache statistics, allocation cache statistics, and data
 
- transfer statistics. The display can be disabled by setting the
 
- environment variable @code{STARPU_STATS} to 0.
 
- @example
 
- $ ./examples/cholesky/cholesky_tag
 
- Computation took (in ms)
 
- 518.16
 
- Synthetic GFlops : 44.21
 
- #---------------------
 
- MSI cache stats :
 
- TOTAL MSI stats	hit 1622 (66.23 %)	miss 827 (33.77 %)
 
- ...
 
- @end example
 
- @example
 
- $ STARPU_STATS=0 ./examples/cholesky/cholesky_tag
 
- Computation took (in ms)
 
- 518.16
 
- Synthetic GFlops : 44.21
 
- @end example
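
- The percentages in the MSI cache line are derived from the hit and miss counts.

- The sketch below shows the arithmetic; @code{hit_rate_percent} is a hypothetical

- helper, not a StarPU function.

```c
/* Share of cache hits among all accesses, as a percentage.
 * Returns 0 when no access was recorded. */
static double hit_rate_percent(unsigned long hits, unsigned long misses)
{
    unsigned long total = hits + misses;
    if (total == 0)
        return 0.0;
    return 100.0 * (double)hits / (double)total;
}
```

- With the counts shown above (1622 hits, 827 misses), the result is about 66.23.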
 
- @c TODO: data transfer stats are similar to the ones displayed when
 
- @c setting STARPU_BUS_STATS
 
 
  |