/* StarPU --- Runtime system for heterogeneous multicore architectures.
 *
 * Copyright (C) 2009-2021  Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
 *
 * StarPU is free software; you can redistribute it and/or modify
 * it under the terms of the GNU Lesser General Public License as published by
 * the Free Software Foundation; either version 2.1 of the License, or (at
 * your option) any later version.
 *
 * StarPU is distributed in the hope that it will be useful, but
 * WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 *
 * See the GNU Lesser General Public License in COPYING.LGPL for more details.
 */

/*! \page BuildingAndInstallingStarPU Building and Installing StarPU

\section InstallingABinaryPackage Installing a Binary Package

Since one of the StarPU developers is a Debian Developer, the packages
are well integrated and kept up to date. To see which packages are
available, simply type:

\verbatim
$ apt-cache search starpu
\endverbatim

To install what you need, type for example:

\verbatim
$ sudo apt-get install libstarpu-1.3 libstarpu-dev
\endverbatim

\section InstallingFromSource Installing from Source

StarPU can be built and installed by the standard means of the GNU
autotools. The following chapter briefly recalls how these tools
can be used to install StarPU.

\subsection OptionalDependencies Optional Dependencies

The <c>hwloc</c> (http://www.open-mpi.org/software/hwloc) topology
discovery library is not mandatory to use StarPU, but it is strongly
recommended.  It allows for topology-aware scheduling, which improves
performance. <c>hwloc</c> is available in major free operating system
distributions, and for most operating systems.
Make sure to install not only
a <c>hwloc</c> or <c>libhwloc</c> package, but also <c>hwloc-devel</c> or
<c>libhwloc-dev</c> so as to have \c hwloc headers etc.

If <c>libhwloc</c> is installed in a standard
location, no option is required: it will be detected automatically.
Otherwise, \ref with-hwloc "--with-hwloc=<directory>" should be used to specify its
location.

If <c>libhwloc</c> is not available on your system, the option
\ref without-hwloc "--without-hwloc" should be explicitly given when calling the
script <c>configure</c>.

\subsection GettingSources Getting Sources

StarPU's sources can be obtained from the download page of
the StarPU website (https://starpu.gitlabpages.inria.fr/files/).

All releases and the development tree of StarPU are freely available
on the StarPU SCM server under the LGPL license. Some releases are available
under the BSD license.

The latest release can be downloaded from the StarPU download page (https://starpu.gitlabpages.inria.fr/files/).

The latest nightly snapshot can be downloaded from the StarPU website (https://starpu.gitlabpages.inria.fr/files/testing/).

Finally, the current development version is also accessible via git.
It should only be used if you need the very latest changes (i.e. less
than a day old!).

\verbatim
$ git clone git@gitlab.inria.fr:starpu/starpu.git
\endverbatim

\subsection ConfiguringStarPU Configuring StarPU

Running <c>autogen.sh</c> is not necessary when using the tarball
releases of StarPU.  However, when using the source code from the git
repository, you first need to generate the script <c>configure</c> and the
different Makefiles. This requires the availability of <c>autoconf</c> and
<c>automake</c> >= 2.60.

\verbatim
$ ./autogen.sh
\endverbatim

You then need to configure StarPU.
Details about options that are
useful to give to <c>configure</c> are given in \ref CompilationConfiguration.

\verbatim
$ ./configure
\endverbatim

If <c>configure</c> does not detect some software or produces errors, please
make sure to post the contents of the file <c>config.log</c> when
reporting the issue.

By default, the files produced during the compilation are placed in
the source directory. As the compilation generates a lot of files, it
is advised to put them all in a separate directory. It is then
easier to clean up, and this makes it possible to compile several configurations
out of the same source tree. To do so, simply enter the directory
where you want the compilation to produce its files, and invoke the
script <c>configure</c> located in the StarPU source directory.

\verbatim
$ mkdir build
$ cd build
$ ../configure
\endverbatim

By default, StarPU will be installed in <c>/usr/local/bin</c>,
<c>/usr/local/lib</c>, etc. You can specify an installation prefix
other than <c>/usr/local</c> using the option <c>--prefix</c>, for
instance:

\verbatim
$ ../configure --prefix=$HOME/starpu
\endverbatim

\subsection BuildingStarPU Building StarPU

\verbatim
$ make
\endverbatim

Once everything is built, you may want to test the result. An
extensive set of regression tests is provided with StarPU. Running the
tests is done by calling <c>make check</c>.
These tests are run every night
and the result from the main profile is publicly available (https://starpu.gitlabpages.inria.fr/files/testing/master/).

\verbatim
$ make check
\endverbatim

\subsection InstallingStarPU Installing StarPU

In order to install StarPU at the location which was specified during
configuration:

\verbatim
$ make install
\endverbatim

If you have let StarPU install in <c>/usr/local/</c>, you additionally need to run

\verbatim
$ sudo ldconfig
\endverbatim

so the libraries can be found by the system.

Libtool interface versioning information is included in
library names (<c>libstarpu-1.3.so</c>, <c>libstarpumpi-1.3.so</c> and
<c>libstarpufft-1.3.so</c>).

\section SettingUpYourOwnCode Setting up Your Own Code

\subsection SettingFlagsForCompilingLinkingAndRunningApplications Setting Flags for Compiling, Linking and Running Applications

StarPU provides <c>pkg-config</c> support to obtain the relevant compiler
and linker flags, as compiling and linking an application against
StarPU may require specific flags or libraries (for instance
<c>CUDA</c> or <c>libspe2</c>).

If StarPU was not installed at some standard location, the path of StarPU's
library must be specified in the environment variable
<c>PKG_CONFIG_PATH</c> to allow <c>pkg-config</c> to find it.
For
example, if StarPU was installed in
<c>$STARPU_PATH</c>:

\verbatim
$ export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:$STARPU_PATH/lib/pkgconfig
\endverbatim

The flags required to compile or link against StarPU are then
accessible with the following commands:

\verbatim
$ pkg-config --cflags starpu-1.3  # options for the compiler
$ pkg-config --libs starpu-1.3    # options for the linker
\endverbatim

Note that it is still possible to use the API provided in version
1.0 of StarPU by calling <c>pkg-config</c> with the <c>starpu-1.0</c> package.
Similar packages are provided for <c>starpumpi-1.0</c> and <c>starpufft-1.0</c>.
It is also possible to use the API provided in version
0.9 of StarPU by calling <c>pkg-config</c> with the <c>libstarpu</c> package.
Similar packages are provided for <c>libstarpumpi</c> and <c>libstarpufft</c>.

Make sure that <c>pkg-config --libs starpu-1.3</c> actually produces some output
before going further: <c>PKG_CONFIG_PATH</c> has to point to the place where
<c>starpu-1.3.pc</c> was installed during <c>make install</c>.

Also pass the option <c>--static</c> if the application is to be
linked statically.

It is also necessary to set the environment variable <c>LD_LIBRARY_PATH</c> to
locate dynamic libraries at runtime.

\verbatim
$ export LD_LIBRARY_PATH=$STARPU_PATH/lib:$LD_LIBRARY_PATH
\endverbatim

It is also useful to get access to the StarPU tools:

\verbatim
$ export PATH=$PATH:$STARPU_PATH/bin
\endverbatim

It is then useful to check that StarPU executes correctly and finds your hardware:

\verbatim
$ starpu_machine_display
\endverbatim

If it does not, please check the output of \c lstopo from \c hwloc and report
the issue to the \c hwloc project, since this is what StarPU uses to detect the hardware.

<br>
A tool is provided to help set all the environment variables
needed by StarPU.
Once StarPU is installed in a specific directory,
calling the script <c>bin/starpu_env</c> will set in your current
environment the variables <c>STARPU_PATH</c>, <c>LD_LIBRARY_PATH</c>,
<c>PKG_CONFIG_PATH</c>, <c>PATH</c> and <c>MANPATH</c>.

\verbatim
$ source $STARPU_PATH/bin/starpu_env
\endverbatim

\subsection IntegratingStarPUInABuildSystem Integrating StarPU in a Build System

\subsubsection StarPUInMake Integrating StarPU in a Make Build System

When using a Makefile, the following lines can be added to set the
options for the compiler and the linker:

\verbatim
CFLAGS          +=      $$(pkg-config --cflags starpu-1.3)
LDLIBS          +=      $$(pkg-config --libs starpu-1.3)
\endverbatim

If you have a \c test-starpu.c file containing for instance:

\code{.c}
#include <starpu.h>
#include <stdio.h>

int main(void)
{
    int ret;
    ret = starpu_init(NULL);
    if (ret != 0)
    {
        return 1;
    }
    printf("%d CPU cores\n", starpu_worker_get_count_by_type(STARPU_CPU_WORKER));
    printf("%d CUDA GPUs\n", starpu_worker_get_count_by_type(STARPU_CUDA_WORKER));
    printf("%d OpenCL GPUs\n", starpu_worker_get_count_by_type(STARPU_OPENCL_WORKER));
    starpu_shutdown();

    return 0;
}
\endcode

You can build it with <code>make test-starpu</code> and run it with <code>./test-starpu</code>.

\subsubsection StarPUInCMake Integrating StarPU in a CMake Build System

This section shows a minimal example integrating StarPU in an existing application's CMake build system.

Let's assume we want to build an executable from the following source code using CMake:

\code{.c}
#include <starpu.h>
#include <stdio.h>

int main(void)
{
    int ret;
    ret = starpu_init(NULL);
    if (ret != 0)
    {
        return 1;
    }
    printf("%d CPU cores\n", starpu_worker_get_count_by_type(STARPU_CPU_WORKER));
    printf("%d CUDA GPUs\n", starpu_worker_get_count_by_type(STARPU_CUDA_WORKER));
    printf("%d OpenCL GPUs\n", starpu_worker_get_count_by_type(STARPU_OPENCL_WORKER));
    starpu_shutdown();

    return 0;
}
\endcode

The \c CMakeLists.txt
file below uses the Pkg-Config support from CMake to
autodetect the StarPU installation and library dependencies (such as
<c>libhwloc</c>) provided that the <c>PKG_CONFIG_PATH</c> variable is set, and
is sufficient to build a statically-linked executable. This example has been
successfully tested with CMake 3.2, though it may work with earlier CMake 3.x
versions.

\code{File CMakeLists.txt}
cmake_minimum_required (VERSION 3.2)
project (hello_starpu)

find_package(PkgConfig)
pkg_check_modules(STARPU REQUIRED starpu-1.3)
if (STARPU_FOUND)
    include_directories (${STARPU_INCLUDE_DIRS})
    link_directories    (${STARPU_STATIC_LIBRARY_DIRS})
    link_libraries      (${STARPU_STATIC_LIBRARIES})
else (STARPU_FOUND)
    message(FATAL_ERROR "StarPU not found")
endif()

add_executable(hello_starpu hello_starpu.c)
\endcode

The following \c CMakeLists.txt implements an alternative, more complex
strategy, still relying on Pkg-Config, but also taking into account additional
flags. While more complete, this approach makes CMake's build types (Debug,
Release, ...) unavailable because of the direct assignment to the variable
<c>CMAKE_C_FLAGS</c>. If both the full flags support and the build types
support are needed, the \c CMakeLists.txt below may be altered to work with
<c>CMAKE_C_FLAGS_RELEASE</c>, <c>CMAKE_C_FLAGS_DEBUG</c>, and others as needed.
This example has been successfully tested with CMake 3.2, though it may work
with earlier CMake 3.x versions.

\code{File CMakeLists.txt}
cmake_minimum_required (VERSION 3.2)
project (hello_starpu)

find_package(PkgConfig)
pkg_check_modules(STARPU REQUIRED starpu-1.3)

# This section must appear before 'add_executable'
if (STARPU_FOUND)
    # CFLAGS other than -I
    foreach(CFLAG ${STARPU_CFLAGS_OTHER})
        set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${CFLAG}")
    endforeach()

    # Static LDFLAGS other than -L
    foreach(LDFLAG ${STARPU_STATIC_LDFLAGS_OTHER})
        set (CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${LDFLAG}")
    endforeach()

    # -L directories
    link_directories(${STARPU_STATIC_LIBRARY_DIRS})
else (STARPU_FOUND)
    message(FATAL_ERROR "StarPU not found")
endif()

add_executable(hello_starpu hello_starpu.c)

# This section must appear after 'add_executable'
if (STARPU_FOUND)
    # -I directories
    target_include_directories(hello_starpu PRIVATE ${STARPU_INCLUDE_DIRS})

    # Static -l libs
    target_link_libraries(hello_starpu PRIVATE ${STARPU_STATIC_LIBRARIES})
endif()
\endcode

\subsection RunningABasicStarPUApplication Running a Basic StarPU Application

Basic examples using StarPU are built in the directory
<c>examples/basic_examples/</c> (and installed in
<c>$STARPU_PATH/lib/starpu/examples/</c>). You can for example run the example
<c>vector_scal</c>.

\verbatim
$ ./examples/basic_examples/vector_scal
BEFORE: First element was 1.000000
AFTER: First element is 3.140000
\endverbatim

When StarPU is used for the first time, the directory
<c>$STARPU_HOME/.starpu/</c> is created; performance models will be stored in
this directory (\ref STARPU_HOME).

Please note that buses are benchmarked when StarPU is launched for the
first time. This may take a few minutes, or less if <c>libhwloc</c> is
installed. This step is done only once per user and per machine.

\subsection RunningABasicStarPUApplicationOnMicrosoft Running a Basic StarPU Application on Microsoft Visual C

Batch files are provided to run StarPU applications under Microsoft
Visual C.
They are installed in <c>$STARPU_PATH/bin/msvc</c>.

To execute a StarPU application, you first need to set the environment
variable \ref STARPU_PATH.

\verbatim
c:\....> cd c:\cygwin\home\ci\starpu\
c:\....> set STARPU_PATH=c:\cygwin\home\ci\starpu\
c:\....> cd bin\msvc
c:\....> starpu_open.bat starpu_simple.c
\endverbatim

The batch script will run Microsoft Visual C with a basic project file
to run the given application.

The batch script <c>starpu_clean.bat</c> can be used to delete all
compilation generated files.

The batch script <c>starpu_exec.bat</c> can be used to compile and execute a
StarPU application from the command prompt.

\verbatim
c:\....> cd c:\cygwin\home\ci\starpu\
c:\....> set STARPU_PATH=c:\cygwin\home\ci\starpu\
c:\....> cd bin\msvc
c:\....> starpu_exec.bat ..\..\..\..\examples\basic_examples\hello_world.c
\endverbatim

\verbatim
MSVC StarPU Execution
...
/out:hello_world.exe
...
Hello world (params = {1, 2.00000})
Callback function got argument 0000042
c:\....>
\endverbatim

\subsection KernelThreadsStartedByStarPU Kernel Threads Started by StarPU

StarPU automatically binds one thread per CPU core.
It does not use
SMT/hyperthreading because kernels are usually already optimized for using a
full core, and using hyperthreading would make kernel calibration rather random.

Since driving GPUs is a CPU-consuming task, StarPU dedicates one core
per GPU.

While StarPU tasks are executing, the application is not supposed to do
computations in the threads it starts itself; tasks should be used instead.

If the application needs to reserve some cores for its own computations, it
can do so with the field starpu_conf::reserve_ncpus, get the core IDs with
starpu_get_next_bindid(), and bind to them with starpu_bind_thread_on().

Another option is for the application to pause StarPU by calling
starpu_pause(), then to perform its own computations, and then to
resume StarPU by calling starpu_resume() so that StarPU can execute
tasks.

\subsection EnablingOpenCL Enabling OpenCL

When both CUDA and OpenCL drivers are enabled, StarPU will launch an
OpenCL worker for NVIDIA GPUs only if CUDA is not already running on them.
This design choice was necessary as OpenCL and CUDA cannot run at the
same time on the same NVIDIA GPU, as there is currently no interoperability
between them.

To enable OpenCL, you need either to disable CUDA when configuring StarPU:

\verbatim
$ ./configure --disable-cuda
\endverbatim

or when running applications:

\verbatim
$ STARPU_NCUDA=0 ./application
\endverbatim

OpenCL will automatically be started on any device not yet used by
CUDA. On a machine running 4 GPUs, it is therefore possible to
enable CUDA on 2 devices, and OpenCL on the 2 other devices, by doing
so:

\verbatim
$ STARPU_NCUDA=2 ./application
\endverbatim

\section BenchmarkingStarPU Benchmarking StarPU

Some interesting benchmarks are installed among examples in
<c>$STARPU_PATH/lib/starpu/examples/</c>.
Make sure to try various
schedulers, for instance <c>STARPU_SCHED=dmda</c>.

\subsection TaskSizeOverhead Task Size Overhead

This benchmark gives a glimpse into how long a task should be (in µs) for StarPU overhead
to be low enough to keep efficiency.  Running
<c>tasks_size_overhead.sh</c> generates a plot
of the speedup of tasks of various sizes, depending on the number of CPUs being
used.

\image html tasks_size_overhead.png
\image latex tasks_size_overhead.eps "" width=\textwidth

\subsection DataTransferLatency Data Transfer Latency

<c>local_pingpong</c> performs a ping-pong between the first two CUDA nodes, and
prints the measured latency.

\subsection MatrixMatrixMultiplication Matrix-Matrix Multiplication

<c>sgemm</c> and <c>dgemm</c> perform a blocked matrix-matrix
multiplication using BLAS and cuBLAS. They output the obtained GFlops.

\subsection CholeskyFactorization Cholesky Factorization

<c>cholesky_*</c> perform a Cholesky factorization (single precision). They use different dependency primitives.

\subsection LUFactorization LU Factorization

<c>lu_*</c> perform an LU factorization. They use different dependency primitives.

\subsection SimulatedBenchmarks Simulated Benchmarks

It can also be convenient to try simulated benchmarks, if you want to try
CPU-GPU scheduling without actually having a GPU at hand. This can be done by
using the SimGrid version of StarPU: first install the SimGrid simulator from
http://simgrid.gforge.inria.fr/ (we tested with SimGrid from 3.11 to 3.16, and
3.18 to 3.25. SimGrid versions 3.25 and above need to be configured with \c -Denable_msg=ON.
Other versions may have compatibility issues, 3.17 notably does
not build at all.
MPI simulation does not work with version 3.22).
Then configure StarPU with \ref enable-simgrid
"--enable-simgrid" and rebuild and install it, and then you can simulate the performance for a
few virtualized systems shipped along StarPU: attila, mirage, idgraf, and sirocco.

For instance:

\verbatim
$ export STARPU_PERF_MODEL_DIR=$STARPU_PATH/share/starpu/perfmodels/sampling
$ export STARPU_HOSTNAME=attila
$ $STARPU_PATH/lib/starpu/examples/cholesky_implicit -size $((960*20)) -nblocks 20
\endverbatim

This will show the performance of the Cholesky factorization on the attila
system. It will be interesting to try with different matrix sizes and
schedulers.

Performance models are available for <c>cholesky_*</c>, <c>lu_*</c>, <c>*gemm</c>, with block sizes
320, 640, or 960 (plus 1440 for sirocco), and for <c>stencil</c> with block size 128x128x128, 192x192x192, and
256x256x256.

Read the chapter \ref SimGridSupport for more information on the SimGrid support.

*/