123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499 |
- /* StarPU --- Runtime system for heterogeneous multicore architectures.
- *
- * Copyright (C) 2009-2021 Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
- *
- * StarPU is free software; you can redistribute it and/or modify
- * it under the terms of the GNU Lesser General Public License as published by
- * the Free Software Foundation; either version 2.1 of the License, or (at
- * your option) any later version.
- *
- * StarPU is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
- *
- * See the GNU Lesser General Public License in COPYING.LGPL for more details.
- */
- /*! \page BasicExamples Basic Examples
- \section HelloWorldUsingStarPUAPI Hello World
- This section shows how to implement a simple program that submits a task
- to StarPU.
- \subsection RequiredHeaders Required Headers
- The header starpu.h should be included in any code using StarPU.
- \code{.c}
- #include <starpu.h>
- \endcode
- \subsection DefiningACodelet Defining A Codelet
- A codelet is a structure that represents a computational kernel. Such a codelet
- may contain an implementation of the same kernel on different architectures
- (e.g. CUDA, x86, ...). For compatibility, make sure that the whole
- structure is properly initialized to zero, either by using the
- function starpu_codelet_init(), or by letting the
- compiler implicitly do it as examplified below.
- The field starpu_codelet::nbuffers specifies the number of data buffers that are
- manipulated by the codelet: here the codelet does not access or modify any data
- that is controlled by our data management library.
- We create a codelet which may only be executed on CPUs. When a CPU
- core will execute a codelet, it will call the function
- <c>cpu_func</c>, which \em must have the following prototype:
- \code{.c}
- void (*cpu_func)(void *buffers[], void *cl_arg);
- \endcode
- In this example, we can ignore the first argument of this function which gives a
- description of the input and output buffers (e.g. the size and the location of
- the matrices) since there is none. We also ignore the second argument
- which is a pointer to optional arguments for the codelet.
- \code{.c}
- void cpu_func(void *buffers[], void *cl_arg)
- {
- printf("Hello world\n");
- }
- struct starpu_codelet cl =
- {
- .cpu_funcs = { cpu_func },
- .nbuffers = 0
- };
- \endcode
- \subsection SubmittingATask Submitting A Task
- Before submitting any tasks to StarPU, starpu_init() must be called. The
- <c>NULL</c> argument specifies that we use the default configuration.
- Tasks can then be submitted until the termination of StarPU -- done by a
- call to starpu_shutdown().
- In the example below, a task structure is allocated by a call to
- starpu_task_create(). This function allocates and fills the
- task structure with its default settings, it does not
- submit the task to StarPU.
- The field starpu_task::cl is a pointer to the codelet which the task will
- execute: in other words, the codelet structure describes which computational
- kernel should be offloaded on the different architectures, and the task
- structure is a wrapper containing a codelet and the piece of data on which the
- codelet should operate.
- If the field starpu_task::synchronous is non-zero, task submission
- will be synchronous: the function starpu_task_submit() will not return
- until the task has been executed. Note that the function starpu_shutdown()
- does not guarantee that asynchronous tasks have been executed before
- it returns, starpu_task_wait_for_all() can be used to this effect, or
- data can be unregistered (starpu_data_unregister()), which will
- implicitly wait for all the tasks scheduled to work on it, unless
- explicitly disabled thanks to
- starpu_data_set_default_sequential_consistency_flag() or
- starpu_data_set_sequential_consistency_flag().
- \code{.c}
- int main(int argc, char **argv)
- {
- /* initialize StarPU */
- starpu_init(NULL);
- struct starpu_task *task = starpu_task_create();
- task->cl = &cl; /* Pointer to the codelet defined above */
- /* starpu_task_submit will be a blocking call. If unset,
- starpu_task_wait() needs to be called after submitting the task. */
- task->synchronous = 1;
- /* submit the task to StarPU */
- starpu_task_submit(task);
- /* terminate StarPU */
- starpu_shutdown();
- return 0;
- }
- \endcode
- \subsection ExecutionOfHelloWorld Execution Of Hello World
- \verbatim
- $ make hello_world
- cc $(pkg-config --cflags starpu-1.3) hello_world.c -o hello_world $(pkg-config --libs starpu-1.3)
- $ ./hello_world
- Hello world
- \endverbatim
- \subsection PassingArgumentsToTheCodelet Passing Arguments To The Codelet
- The optional field starpu_task::cl_arg field is a pointer to a buffer
- (of size starpu_task::cl_arg_size) with some parameters for the kernel
- described by the codelet. For instance, if a codelet implements a
- computational kernel that multiplies its input vector by a constant,
- the constant could be specified by the means of this buffer, instead
- of registering it as a StarPU data. It must however be noted that
- StarPU avoids making copy whenever possible and rather passes the
- pointer as such, so the buffer which is pointed at must be kept allocated
- until the task terminates, and if several tasks are submitted with
- various parameters, each of them must be given a pointer to their
- own buffer.
- \code{.c}
- struct params
- {
- int i;
- float f;
- };
- void cpu_func(void *buffers[], void *cl_arg)
- {
- struct params *params = cl_arg;
- printf("Hello world (params = {%i, %f} )\n", params->i, params->f);
- }
- \endcode
- As said before, the field starpu_codelet::nbuffers specifies the
- number of data buffers which are manipulated by the codelet. It does
- not count the argument --- the parameter <c>cl_arg</c> of the function
- <c>cpu_func</c> --- since it is not managed by our data management
- library, but just contains trivial parameters.
- // TODO rewrite so that it is a little clearer ?
- Be aware that this may be a pointer to a
- \em copy of the actual buffer, and not the pointer given by the programmer:
- if the codelet modifies this buffer, there is no guarantee that the initial
- buffer will be modified as well: this for instance implies that the buffer
- cannot be used as a synchronization medium. If synchronization is needed, data
- has to be registered to StarPU, see \ref VectorScalingUsingStarPUAPI.
- \code{.c}
- int main(int argc, char **argv)
- {
- /* initialize StarPU */
- starpu_init(NULL);
- struct starpu_task *task = starpu_task_create();
- task->cl = &cl; /* Pointer to the codelet defined above */
- struct params params = { 1, 2.0f };
- task->cl_arg = ¶ms;
- task->cl_arg_size = sizeof(params);
- /* starpu_task_submit will be a blocking call */
- task->synchronous = 1;
- /* submit the task to StarPU */
- starpu_task_submit(task);
- /* terminate StarPU */
- starpu_shutdown();
- return 0;
- }
- \endcode
- \verbatim
- $ make hello_world
- cc $(pkg-config --cflags starpu-1.3) hello_world.c -o hello_world $(pkg-config --libs starpu-1.3)
- $ ./hello_world
- Hello world (params = {1, 2.000000} )
- \endverbatim
- \subsection DefiningACallback Defining A Callback
- Once a task has been executed, an optional callback function
- starpu_task::callback_func is called when defined.
- While the computational kernel could be offloaded on various architectures, the
- callback function is always executed on a CPU. The pointer
- starpu_task::callback_arg is passed as an argument to the callback
- function. The prototype of a callback function must be:
- \code{.c}
- void (*callback_function)(void *);
- \endcode
- \code{.c}
- void callback_func(void *callback_arg)
- {
- printf("Callback function (arg %x)\n", callback_arg);
- }
- int main(int argc, char **argv)
- {
- /* initialize StarPU */
- starpu_init(NULL);
- struct starpu_task *task = starpu_task_create();
- task->cl = &cl; /* Pointer to the codelet defined above */
- task->callback_func = callback_func;
- task->callback_arg = 0x42;
- /* starpu_task_submit will be a blocking call */
- task->synchronous = 1;
- /* submit the task to StarPU */
- starpu_task_submit(task);
- /* terminate StarPU */
- starpu_shutdown();
- return 0;
- }
- \endcode
- \verbatim
- $ make hello_world
- cc $(pkg-config --cflags starpu-1.3) hello_world.c -o hello_world $(pkg-config --libs starpu-1.3)
- $ ./hello_world
- Hello world
- Callback function (arg 42)
- \endverbatim
- \subsection WhereToExecuteACodelet Where To Execute A Codelet
- \code{.c}
- struct starpu_codelet cl =
- {
- .where = STARPU_CPU,
- .cpu_funcs = { cpu_func },
- .cpu_funcs_name = { "cpu_func" },
- .nbuffers = 0
- };
- \endcode
- We create a codelet which may only be executed on the CPUs. The
- optional field starpu_codelet::where is a bitmask which defines where
- the codelet may be executed. Here, the value ::STARPU_CPU means that
- only CPUs can execute this codelet. When the optional field
- starpu_codelet::where is unset, its value is automatically set based
- on the availability of the different fields <c>XXX_funcs</c>.
- TODO: explain starpu_codelet::cpu_funcs_name
- \section VectorScalingUsingStarPUAPI Vector Scaling
- The previous example has shown how to submit tasks. In this section,
- we show how StarPU tasks can manipulate data.
- The full source code for
- this example is given in \ref FullSourceCodeVectorScal.
- \subsection SourceCodeOfVectorScaling Source Code of Vector Scaling
- Programmers can describe the data layout of their application so that StarPU is
- responsible for enforcing data coherency and availability across the machine.
- Instead of handling complex (and non-portable) mechanisms to perform data
- movements, programmers only declare which piece of data is accessed and/or
- modified by a task, and StarPU makes sure that when a computational kernel
- starts somewhere (e.g. on a GPU), its data are available locally.
- Before submitting those tasks, the programmer first needs to declare the
- different pieces of data to StarPU using the functions
- <c>starpu_*_data_register</c>. To ease the development of applications
- for StarPU, it is possible to describe multiple types of data layout.
- A type of data layout is called an <b>interface</b>. There are
- different predefined interfaces available in StarPU: here we will
- consider the <b>vector interface</b>.
- The following lines show how to declare an array of <c>NX</c> elements of type
- <c>float</c> using the vector interface:
- \code{.c}
- float vector[NX];
- starpu_data_handle_t vector_handle;
- starpu_vector_data_register(&vector_handle, STARPU_MAIN_RAM, (uintptr_t)vector, NX, sizeof(vector[0]));
- \endcode
- The first argument, called the <b>data handle</b>, is an opaque pointer which
- designates the array within StarPU. This is also the structure which is used to
- describe which data is used by a task. The second argument is the node number
- where the data originally resides. Here it is ::STARPU_MAIN_RAM since the array <c>vector</c> is in
- the main memory. Then comes the pointer <c>vector</c> where the data can be found in main memory,
- the number of elements in the vector and the size of each element.
- The following shows how to construct a StarPU task that will manipulate the
- vector and a constant factor.
- \code{.c}
- float factor = 3.14;
- struct starpu_task *task = starpu_task_create();
- task->cl = &cl; /* Pointer to the codelet defined below */
- task->handles[0] = vector_handle; /* First parameter of the codelet */
- task->cl_arg = &factor;
- task->cl_arg_size = sizeof(factor);
- task->synchronous = 1;
- starpu_task_submit(task);
- \endcode
- Since the factor is a mere constant float value parameter,
- it does not need a preliminary registration, and
- can just be passed through the pointer starpu_task::cl_arg like in the previous
- example. The vector parameter is described by its handle.
- starpu_task::handles should be set with the handles of the data, the
- access modes for the data are defined in the field
- starpu_codelet::modes (::STARPU_R for read-only, ::STARPU_W for
- write-only and ::STARPU_RW for read and write access).
- The definition of the codelet can be written as follows:
- \code{.c}
- void scal_cpu_func(void *buffers[], void *cl_arg)
- {
- unsigned i;
- float *factor = cl_arg;
- /* length of the vector */
- unsigned n = STARPU_VECTOR_GET_NX(buffers[0]);
- /* CPU copy of the vector pointer */
- float *val = (float *)STARPU_VECTOR_GET_PTR(buffers[0]);
- for (i = 0; i < n; i++)
- val[i] *= *factor;
- }
- struct starpu_codelet cl =
- {
- .cpu_funcs = { scal_cpu_func },
- .cpu_funcs_name = { "scal_cpu_func" },
- .nbuffers = 1,
- .modes = { STARPU_RW }
- };
- \endcode
- The first argument is an array that gives
- a description of all the buffers passed in the array starpu_task::handles. The
- size of this array is given by the field starpu_codelet::nbuffers. For
- the sake of genericity, this array contains pointers to the different
- interfaces describing each buffer. In the case of the <b>vector
- interface</b>, the location of the vector (resp. its length) is
- accessible in the starpu_vector_interface::ptr (resp.
- starpu_vector_interface::nx) of this interface. Since the vector is
- accessed in a read-write fashion, any modification will automatically
- affect future accesses to this vector made by other tasks.
- The second argument of the function <c>scal_cpu_func</c> contains a
- pointer to the parameters of the codelet (given in
- starpu_task::cl_arg), so that we read the constant factor from this
- pointer.
- \subsection ExecutionOfVectorScaling Execution of Vector Scaling
- \verbatim
- $ make vector_scal
- cc $(pkg-config --cflags starpu-1.3) vector_scal.c -o vector_scal $(pkg-config --libs starpu-1.3)
- $ ./vector_scal
- 0.000000 3.000000 6.000000 9.000000 12.000000
- \endverbatim
- \section VectorScalingOnAnHybridCPUGPUMachine Vector Scaling on an Hybrid CPU/GPU Machine
- Contrary to the previous examples, the task submitted in this example may not
- only be executed by the CPUs, but also by a CUDA device.
- \subsection DefinitionOfTheCUDAKernel Definition of the CUDA Kernel
- The CUDA implementation can be written as follows. It needs to be compiled with
- a CUDA compiler such as nvcc, the NVIDIA CUDA compiler driver. It must be noted
- that the vector pointer returned by ::STARPU_VECTOR_GET_PTR is here a
- pointer in GPU memory, so that it can be passed as such to the
- kernel call <c>vector_mult_cuda</c>.
- \snippet vector_scal_cuda.c To be included. You should update doxygen if you see this text.
- \subsection DefinitionOfTheOpenCLKernel Definition of the OpenCL Kernel
- The OpenCL implementation can be written as follows. StarPU provides
- tools to compile a OpenCL kernel stored in a file.
- \code{.c}
- __kernel void vector_mult_opencl(int nx, __global float* val, float factor)
- {
- const int i = get_global_id(0);
- if (i < nx)
- {
- val[i] *= factor;
- }
- }
- \endcode
- Contrary to CUDA and CPU, ::STARPU_VECTOR_GET_DEV_HANDLE has to be used,
- which returns a <c>cl_mem</c> (which is not a device pointer, but an OpenCL
- handle), which can be passed as such to the OpenCL kernel. The difference is
- important when using partitioning, see \ref PartitioningData.
- \snippet vector_scal_opencl.c To be included. You should update doxygen if you see this text.
- \subsection DefinitionOfTheMainCode Definition of the Main Code
- The CPU implementation is the same as in the previous section.
- Here is the source of the main application. You can notice that the fields
- starpu_codelet::cuda_funcs and starpu_codelet::opencl_funcs are set to
- define the pointers to the CUDA and OpenCL implementations of the
- task.
- \snippet vector_scal_c.c To be included. You should update doxygen if you see this text.
- \subsection ExecutionOfHybridVectorScaling Execution of Hybrid Vector Scaling
- The Makefile given at the beginning of the section must be extended to
- give the rules to compile the CUDA source code. Note that the source
- file of the OpenCL kernel does not need to be compiled now, it will
- be compiled at run-time when calling the function
- starpu_opencl_load_opencl_from_file().
- \verbatim
- CFLAGS += $(shell pkg-config --cflags starpu-1.3)
- LDLIBS += $(shell pkg-config --libs starpu-1.3)
- CC = gcc
- vector_scal: vector_scal.o vector_scal_cpu.o vector_scal_cuda.o vector_scal_opencl.o
- %.o: %.cu
- nvcc $(CFLAGS) $< -c $@
- clean:
- rm -f vector_scal *.o
- \endverbatim
- \verbatim
- $ make
- \endverbatim
- and to execute it, with the default configuration:
- \verbatim
- $ ./vector_scal
- 0.000000 3.000000 6.000000 9.000000 12.000000
- \endverbatim
- or for example, by disabling CPU devices:
- \verbatim
- $ STARPU_NCPU=0 ./vector_scal
- 0.000000 3.000000 6.000000 9.000000 12.000000
- \endverbatim
- or by disabling CUDA devices (which may permit to enable the use of OpenCL,
- see \ref EnablingOpenCL) :
- \verbatim
- $ STARPU_NCUDA=0 ./vector_scal
- 0.000000 3.000000 6.000000 9.000000 12.000000
- \endverbatim
- */
|