123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543 |
- /* StarPU --- Runtime system for heterogeneous multicore architectures.
- *
- * Copyright (C) 2009-2020 Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
- *
- * StarPU is free software; you can redistribute it and/or modify
- * it under the terms of the GNU Lesser General Public License as published by
- * the Free Software Foundation; either version 2.1 of the License, or (at
- * your option) any later version.
- *
- * StarPU is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
- *
- * See the GNU Lesser General Public License in COPYING.LGPL for more details.
- */
- /*! \page BuildingAndInstallingStarPU Building and Installing StarPU
- \section InstallingABinaryPackage Installing a Binary Package
- One of the StarPU developers being a Debian Developer, the packages
- are well integrated and very uptodate. To see which packages are
- available, simply type:
- \verbatim
- $ apt-cache search starpu
- \endverbatim
- To install what you need, type for example:
- \verbatim
- $ sudo apt-get install libstarpu-1.3 libstarpu-dev
- \endverbatim
- \section InstallingFromSource Installing from Source
- StarPU can be built and installed by the standard means of the GNU
- autotools. The following chapter is intended to briefly remind how these tools
- can be used to install StarPU.
- \subsection OptionalDependencies Optional Dependencies
- The <c>hwloc</c> (http://www.open-mpi.org/software/hwloc) topology
- discovery library is not mandatory to use StarPU but strongly
- recommended. It allows for topology aware scheduling, which improves
- performance. <c>libhwloc</c> is available in major free operating system
- distributions, and for most operating systems.
- If <c>libhwloc</c> is installed in a standard
- location, no option is required, it will be detected automatically,
- otherwise \ref with-hwloc "--with-hwloc=<directory>" should be used to specify its
- location.
- If <c>libhwloc</c> is not available on your system, the option
- \ref without-hwloc "--without-hwloc" should be explicitely given when calling the
- script <c>configure</c>.
- \subsection GettingSources Getting Sources
- StarPU's sources can be obtained from the download page of
- the StarPU website (http://starpu.gforge.inria.fr/files/).
- All releases and the development tree of StarPU are freely available
- on Inria's gforge under the LGPL license. Some releases are available
- under the BSD license.
- The latest release can be downloaded from the Inria's gforge (http://gforge.inria.fr/frs/?group_id=1570) or
- directly from the StarPU download page (http://starpu.gforge.inria.fr/files/).
- The latest nightly snapshot can be downloaded from the StarPU gforge website (http://starpu.gforge.inria.fr/testing/).
- \verbatim
- $ wget http://starpu.gforge.inria.fr/testing/starpu-nightly-latest.tar.gz
- \endverbatim
- And finally, current development version is also accessible via git.
- It should only be used if you need the very latest changes (i.e. less
- than a day old!).
- \verbatim
- $ git clone https://scm.gforge.inria.fr/anonscm/git/starpu/starpu.git
- \endverbatim
- \subsection ConfiguringStarPU Configuring StarPU
- Running <c>autogen.sh</c> is not necessary when using the tarball
- releases of StarPU. However when using the source code from the git
- repository, you first need to generate the script <c>configure</c> and the
- different Makefiles. This requires the availability of <c>autoconf</c> and
- <c>automake</c> >= 2.60.
- \verbatim
- $ ./autogen.sh
- \endverbatim
- You then need to configure StarPU. Details about options that are
- useful to give to <c>configure</c> are given in \ref CompilationConfiguration.
- \verbatim
- $ ./configure
- \endverbatim
- If <c>configure</c> does not detect some software or produces errors, please
- make sure to post the contents of the file <c>config.log</c> when
- reporting the issue.
- By default, the files produced during the compilation are placed in
- the source directory. As the compilation generates a lot of files, it
- is advised to put them all in a separate directory. It is then
- easier to cleanup, and this allows to compile several configurations
- out of the same source tree. To do so, simply enter the directory
- where you want the compilation to produce its files, and invoke the
- script <c>configure</c> located in the StarPU source directory.
- \verbatim
- $ mkdir build
- $ cd build
- $ ../configure
- \endverbatim
- By default, StarPU will be installed in <c>/usr/local/bin</c>,
- <c>/usr/local/lib</c>, etc. You can specify an installation prefix
- other than <c>/usr/local</c> using the option <c>--prefix</c>, for
- instance:
- \verbatim
- $ ../configure --prefix=$HOME/starpu
- \endverbatim
- \subsection BuildingStarPU Building StarPU
- \verbatim
- $ make
- \endverbatim
- Once everything is built, you may want to test the result. An
- extensive set of regression tests is provided with StarPU. Running the
- tests is done by calling <c>make check</c>. These tests are run every night
- and the result from the main profile is publicly available (http://starpu.gforge.inria.fr/testing/master/).
- \verbatim
- $ make check
- \endverbatim
- \subsection InstallingStarPU Installing StarPU
- In order to install StarPU at the location which was specified during
- configuration:
- \verbatim
- $ make install
- \endverbatim
- Libtool interface versioning information are included in
- libraries names (<c>libstarpu-1.3.so</c>, <c>libstarpumpi-1.3.so</c> and
- <c>libstarpufft-1.3.so</c>).
- \section SettingUpYourOwnCode Setting up Your Own Code
- \subsection SettingFlagsForCompilingLinkingAndRunningApplications Setting Flags for Compiling, Linking and Running Applications
- StarPU provides a <c>pkg-config</c> executable to obtain relevant compiler
- and linker flags. As compiling and linking an application against
- StarPU may require to use specific flags or libraries (for instance
- <c>CUDA</c> or <c>libspe2</c>).
- If StarPU was not installed at some standard location, the path of StarPU's
- library must be specified in the environment variable
- <c>PKG_CONFIG_PATH</c> to allow <c>pkg-config</c> to find it. For
- example if StarPU was installed in
- <c>$STARPU_PATH</c>:
- \verbatim
- $ export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:$STARPU_PATH/lib/pkgconfig
- \endverbatim
- The flags required to compile or link against StarPU are then
- accessible with the following commands:
- \verbatim
- $ pkg-config --cflags starpu-1.3 # options for the compiler
- $ pkg-config --libs starpu-1.3 # options for the linker
- \endverbatim
- Note that it is still possible to use the API provided in the version
- 1.0 of StarPU by calling <c>pkg-config</c> with the <c>starpu-1.0</c> package.
- Similar packages are provided for <c>starpumpi-1.0</c> and <c>starpufft-1.0</c>.
- It is also possible to use the API provided in the version
- 0.9 of StarPU by calling <c>pkg-config</c> with the <c>libstarpu</c> package.
- Similar packages are provided for <c>libstarpumpi</c> and <c>libstarpufft</c>.
- Make sure that <c>pkg-config --libs starpu-1.3</c> actually produces some output
- before going further: <c>PKG_CONFIG_PATH</c> has to point to the place where
- <c>starpu-1.3.pc</c> was installed during <c>make install</c>.
- Also pass the option <c>--static</c> if the application is to be
- linked statically.
- It is also necessary to set the environment variable <c>LD_LIBRARY_PATH</c> to
- locate dynamic libraries at runtime.
- \verbatim
- $ export LD_LIBRARY_PATH=$STARPU_PATH/lib:$LD_LIBRARY_PATH
- \endverbatim
- And it is useful to get access to the StarPU tools:
- \verbatim
- $ export PATH=$PATH:$STARPU_PATH/bin
- \endverbatim
- It is then useful to check that StarPU executes correctly and finds your hardware:
- \verbatim
- $ starpu_machine_display
- \endverbatim
- If it does not, please check the output of \c lstopo from \c hwloc and report
- the issue to the \c hwloc project, since this is what StarPU uses to detect the hardware.
- <br>
- A tool is provided to help setting all the environment variables
- needed by StarPU. Once StarPU is installed in a specific directory,
- calling the script <c>bin/starpu_env</c> will set in your current
- environment the variables <c>STARPU_PATH</c>, <c>LD_LIBRARY_PATH</c>,
- <c>PKG_CONFIG_PATH</c>, <c>PATH</c> and <c>MANPATH</c>.
- \verbatim
- $ source $STARPU_PATH/bin/starpu_env
- \endverbatim
- \subsection IntegratingStarPUInABuildSystem Integrating StarPU in a Build System
- \subsubsection StarPUInMake Integrating StarPU in a Make Build System
- When using a Makefile, the following lines can be added to set the
- options for the compiler and the linker:
- \verbatim
- CFLAGS += $$(pkg-config --cflags starpu-1.3)
- LDLIBS += $$(pkg-config --libs starpu-1.3)
- \endverbatim
- If you have a \c test-starpu.c file containing for instance:
- \code{.c}
- #include <starpu.h>
- #include <stdio.h>
- int main(void)
- {
- int ret;
- ret = starpu_init(NULL);
- if (ret != 0)
- {
- return 1;
- }
- printf("%d CPU cores\n", starpu_worker_get_count_by_type(STARPU_CPU_WORKER));
- printf("%d CUDA GPUs\n", starpu_worker_get_count_by_type(STARPU_CUDA_WORKER));
- printf("%d OpenCL GPUs\n", starpu_worker_get_count_by_type(STARPU_OPENCL_WORKER));
- starpu_shutdown();
- return 0;
- }
- \endcode
- You can build it with <code>make test-starpu</code> and run it with <code>./test-starpu</code>
- \subsubsection StarPUInCMake Integrating StarPU in a CMake Build System
- This section shows a minimal example integrating StarPU in an existing application's CMake build system.
- Let's assume we want to build an executable from the following source code using CMake:
- \code{.c}
- #include <starpu.h>
- #include <stdio.h>
- int main(void)
- {
- int ret;
- ret = starpu_init(NULL);
- if (ret != 0)
- {
- return 1;
- }
- printf("%d CPU cores\n", starpu_worker_get_count_by_type(STARPU_CPU_WORKER));
- printf("%d CUDA GPUs\n", starpu_worker_get_count_by_type(STARPU_CUDA_WORKER));
- printf("%d OpenCL GPUs\n", starpu_worker_get_count_by_type(STARPU_OPENCL_WORKER));
- starpu_shutdown();
- return 0;
- }
- \endcode
- The \c CMakeLists.txt file below uses the Pkg-Config support from CMake to
- autodetect the StarPU installation and library dependences (such as
- <c>libhwloc</c>) provided that the <c>PKG_CONFIG_PATH</c> variable is set, and
- is sufficient to build a statically-linked executable. This example has been
- successfully tested with CMake 3.2, though it may work with earlier CMake 3.x
- versions.
- \code{File CMakeLists.txt}
- cmake_minimum_required (VERSION 3.2)
- project (hello_starpu)
- find_package(PkgConfig)
- pkg_check_modules(STARPU REQUIRED starpu-1.3)
- if (STARPU_FOUND)
- include_directories (${STARPU_INCLUDE_DIRS})
- link_directories (${STARPU_STATIC_LIBRARY_DIRS})
- link_libraries (${STARPU_STATIC_LIBRARIES})
- else (STARPU_FOUND)
- message(FATAL_ERROR "StarPU not found")
- endif()
- add_executable(hello_starpu hello_starpu.c)
- \endcode
- The following \c CMakeLists.txt implements an alternative, more complex
- strategy, still relying on Pkg-Config, but also taking into account additional
- flags. While more complete, this approach makes CMake's build types (Debug,
- Release, ...) unavailable because of the direct affectation to variable
- <c>CMAKE_C_FLAGS</c>. If both the full flags support and the build types
- support are needed, the \c CMakeLists.txt below may be altered to work with
- <c>CMAKE_C_FLAGS_RELEASE</c>, <c>CMAKE_C_FLAGS_DEBUG</c>, and others as needed.
- This example has been successfully tested with CMake 3.2, though it may work
- with earlier CMake 3.x versions.
- \code{File CMakeLists.txt}
- cmake_minimum_required (VERSION 3.2)
- project (hello_starpu)
- find_package(PkgConfig)
- pkg_check_modules(STARPU REQUIRED starpu-1.3)
- # This section must appear before 'add_executable'
- if (STARPU_FOUND)
- # CFLAGS other than -I
- foreach(CFLAG ${STARPU_CFLAGS_OTHER})
- set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${CFLAG}")
- endforeach()
- # Static LDFLAGS other than -L
- foreach(LDFLAG ${STARPU_STATIC_LDFLAGS_OTHER})
- set (CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${LDFLAG}")
- endforeach()
- # -L directories
- link_directories(${STARPU_STATIC_LIBRARY_DIRS})
- else (STARPU_FOUND)
- message(FATAL_ERROR "StarPU not found")
- endif()
- add_executable(hello_starpu hello_starpu.c)
- # This section must appear after 'add_executable'
- if (STARPU_FOUND)
- # -I directories
- target_include_directories(hello_starpu PRIVATE ${STARPU_INCLUDE_DIRS})
- # Static -l libs
- target_link_libraries(hello_starpu PRIVATE ${STARPU_STATIC_LIBRARIES})
- endif()
- \endcode
- \subsection RunningABasicStarPUApplication Running a Basic StarPU Application
- Basic examples using StarPU are built in the directory
- <c>examples/basic_examples/</c> (and installed in
- <c>$STARPU_PATH/lib/starpu/examples/</c>). You can for example run the example
- <c>vector_scal</c>.
- \verbatim
- $ ./examples/basic_examples/vector_scal
- BEFORE: First element was 1.000000
- AFTER: First element is 3.140000
- \endverbatim
- When StarPU is used for the first time, the directory
- <c>$STARPU_HOME/.starpu/</c> is created, performance models will be stored in
- this directory (\ref STARPU_HOME).
- Please note that buses are benchmarked when StarPU is launched for the
- first time. This may take a few minutes, or less if <c>libhwloc</c> is
- installed. This step is done only once per user and per machine.
- \subsection RunningABasicStarPUApplicationOnMicrosoft Running a Basic StarPU Application on Microsoft Visual C
- Batch files are provided to run StarPU applications under Microsoft
- Visual C. They are installed in <c>$STARPU_PATH/bin/msvc</c>.
- To execute a StarPU application, you first need to set the environment
- variable \ref STARPU_PATH.
- \verbatim
- c:\....> cd c:\cygwin\home\ci\starpu\
- c:\....> set STARPU_PATH=c:\cygwin\home\ci\starpu\
- c:\....> cd bin\msvc
- c:\....> starpu_open.bat starpu_simple.c
- \endverbatim
- The batch script will run Microsoft Visual C with a basic project file
- to run the given application.
- The batch script <c>starpu_clean.bat</c> can be used to delete all
- compilation generated files.
- The batch script <c>starpu_exec.bat</c> can be used to compile and execute a
- StarPU application from the command prompt.
- \verbatim
- c:\....> cd c:\cygwin\home\ci\starpu\
- c:\....> set STARPU_PATH=c:\cygwin\home\ci\starpu\
- c:\....> cd bin\msvc
- c:\....> starpu_exec.bat ..\..\..\..\examples\basic_examples\hello_world.c
- \endverbatim
- \verbatim
- MSVC StarPU Execution
- ...
- /out:hello_world.exe
- ...
- Hello world (params = {1, 2.00000})
- Callback function got argument 0000042
- c:\....>
- \endverbatim
- \subsection KernelThreadsStartedByStarPU Kernel Threads Started by StarPU
- StarPU automatically binds one thread per CPU core. It does not use
- SMT/hyperthreading because kernels are usually already optimized for using a
- full core, and using hyperthreading would make kernel calibration rather random.
- Since driving GPUs is a CPU-consuming task, StarPU dedicates one core
- per GPU.
- While StarPU tasks are executing, the application is not supposed to do
- computations in the threads it starts itself, tasks should be used instead.
- If the application needs to reserve some cores for its own computations, it
- can do so with the field starpu_conf::reserve_ncpus, get the core IDs with
- starpu_get_next_bindid(), and bind to them with starpu_bind_thread_on().
- Another option is for the application to pause StarPU by calling
- starpu_pause(), then to perform its own computations, and then to
- resume StarPU by calling starpu_resume() so that StarPU can execute
- tasks.
- \subsection EnablingOpenCL Enabling OpenCL
- When both CUDA and OpenCL drivers are enabled, StarPU will launch an
- OpenCL worker for NVIDIA GPUs only if CUDA is not already running on them.
- This design choice was necessary as OpenCL and CUDA can not run at the
- same time on the same NVIDIA GPU, as there is currently no interoperability
- between them.
- To enable OpenCL, you need either to disable CUDA when configuring StarPU:
- \verbatim
- $ ./configure --disable-cuda
- \endverbatim
- or when running applications:
- \verbatim
- $ STARPU_NCUDA=0 ./application
- \endverbatim
- OpenCL will automatically be started on any device not yet used by
- CUDA. So on a machine running 4 GPUS, it is therefore possible to
- enable CUDA on 2 devices, and OpenCL on the 2 other devices by doing
- so:
- \verbatim
- $ STARPU_NCUDA=2 ./application
- \endverbatim
- \section BenchmarkingStarPU Benchmarking StarPU
- Some interesting benchmarks are installed among examples in
- <c>$STARPU_PATH/lib/starpu/examples/</c>. Make sure to try various
- schedulers, for instance <c>STARPU_SCHED=dmda</c>.
- \subsection TaskSizeOverhead Task Size Overhead
- This benchmark gives a glimpse into how long a task should be (in µs) for StarPU overhead
- to be low enough to keep efficiency. Running
- <c>tasks_size_overhead.sh</c> generates a plot
- of the speedup of tasks of various sizes, depending on the number of CPUs being
- used.
- \image html tasks_size_overhead.png
- \image latex tasks_size_overhead.eps "" width=\textwidth
- \subsection DataTransferLatency Data Transfer Latency
- <c>local_pingpong</c> performs a ping-pong between the first two CUDA nodes, and
- prints the measured latency.
- \subsection MatrixMatrixMultiplication Matrix-Matrix Multiplication
- <c>sgemm</c> and <c>dgemm</c> perform a blocked matrix-matrix
- multiplication using BLAS and cuBLAS. They output the obtained GFlops.
- \subsection CholeskyFactorization Cholesky Factorization
- <c>cholesky_*</c> perform a Cholesky factorization (single precision). They use different dependency primitives.
- \subsection LUFactorization LU Factorization
- <c>lu_*</c> perform an LU factorization. They use different dependency primitives.
- \subsection SimulatedBenchmarks Simulated Benchmarks
- It can also be convenient to try simulated benchmarks, if you want to give a try
- at CPU-GPU scheduling without actually having a GPU at hand. This can be done by
- using the SimGrid version of StarPU: first install the SimGrid simulator from
- http://simgrid.gforge.inria.fr/ (we tested with SimGrid from 3.11 to 3.16, and
- 3.18 to 3.22, other versions may have compatibility issues, 3.17 notably does
- not build at all. MPI simulation does not work with version 3.22),
- then configure StarPU with \ref enable-simgrid
- "--enable-simgrid" and rebuild and install it, and then you can simulate the performance for a
- few virtualized systems shipped along StarPU: attila, mirage, idgraf, and sirocco.
- For instance:
- \verbatim
- $ export STARPU_PERF_MODEL_DIR=$STARPU_PATH/share/starpu/perfmodels/sampling
- $ export STARPU_HOSTNAME=attila
- $ $STARPU_PATH/lib/starpu/examples/cholesky_implicit -size $((960*20)) -nblocks 20
- \endverbatim
- Will show the performance of the cholesky factorization with the attila
- system. It will be interesting to try with different matrix sizes and
- schedulers.
- Performance models are available for <c>cholesky_*</c>, <c>lu_*</c>, <c>*gemm</c>, with block sizes
- 320, 640, or 960 (plus 1440 for sirocco), and for <c>stencil</c> with block size 128x128x128, 192x192x192, and
- 256x256x256.
- Read the chapter \ref SimGridSupport for more information on the SimGrid support.
- */
|