/*
 * This file is part of the StarPU Handbook.
 * Copyright (C) 2009--2011  Université de Bordeaux
 * Copyright (C) 2010, 2011, 2012, 2013, 2015, 2017  CNRS
 * Copyright (C) 2011, 2012 INRIA
 * See the file version.doxy for copying conditions.
 */

/*! \defgroup API_CUDA_Extensions CUDA Extensions

\def STARPU_USE_CUDA
\ingroup API_CUDA_Extensions
This macro is defined when StarPU has been installed with CUDA
support. It should be used in your code to detect the availability of
CUDA as shown in \ref FullSourceCodeVectorScal.

\def STARPU_MAXCUDADEVS
\ingroup API_CUDA_Extensions
This macro defines the maximum number of CUDA devices that are
supported by StarPU.

\fn cudaStream_t starpu_cuda_get_local_stream(void)
\ingroup API_CUDA_Extensions
Return the current worker’s CUDA stream. StarPU provides a stream for
every CUDA device it controls. This function is provided for
convenience, so that programmers can use asynchronous operations
within codelets without having to create a stream by hand. The
application is not required to use the stream returned by
starpu_cuda_get_local_stream() and may create its own streams.
Synchronizing with cudaThreadSynchronize() is allowed, but it reduces
the likelihood of having all transfers overlapped.

\fn const struct cudaDeviceProp *starpu_cuda_get_device_properties(unsigned workerid)
\ingroup API_CUDA_Extensions
Return a pointer to the device properties for worker \p workerid
(assumed to be a CUDA worker).

\fn void starpu_cuda_report_error(const char *func, const char *file, int line, cudaError_t status)
\ingroup API_CUDA_Extensions
Report a CUDA error.

\def STARPU_CUDA_REPORT_ERROR(status)
\ingroup API_CUDA_Extensions
Call starpu_cuda_report_error(), passing the current function, file
and line position.

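As an illustration of starpu_cuda_get_local_stream() and
STARPU_CUDA_REPORT_ERROR() described above, the following is a minimal
sketch of a CUDA codelet implementation, not a definitive one. The
kernel scale_kernel, the function scale_cuda_func, the use of a single
vector handle holding floats, and the convention that \c cl_arg points
to a float factor are all assumptions made for this example.

```cuda
#include <starpu.h>

// Illustrative kernel (not part of StarPU): multiply each element by factor.
static __global__ void scale_kernel(float *v, unsigned n, float factor)
{
	unsigned i = blockIdx.x * blockDim.x + threadIdx.x;
	if (i < n)
		v[i] *= factor;
}

// Hypothetical CUDA implementation of a codelet working on one vector handle.
// Assumption: cl_arg points to a float holding the scaling factor.
extern "C" void scale_cuda_func(void *buffers[], void *cl_arg)
{
	float *v = (float *) STARPU_VECTOR_GET_PTR(buffers[0]);
	unsigned n = STARPU_VECTOR_GET_NX(buffers[0]);
	float factor = *(float *) cl_arg;

	// Nothing to do for an empty vector (a zero-block launch is invalid).
	if (n == 0)
		return;

	unsigned threads = 64;
	unsigned blocks = (n + threads - 1) / threads;
	cudaStream_t stream = starpu_cuda_get_local_stream();

	// Queue the kernel on the worker's stream rather than the default stream.
	scale_kernel<<<blocks, threads, 0, stream>>>(v, n, factor);

	// Launch errors are retrieved with cudaGetLastError().
	cudaError_t status = cudaGetLastError();
	if (status != cudaSuccess)
		STARPU_CUDA_REPORT_ERROR(status);

	// Wait for this stream only, not for the whole device.
	status = cudaStreamSynchronize(stream);
	if (status != cudaSuccess)
		STARPU_CUDA_REPORT_ERROR(status);
}
```

Synchronizing on the worker's stream, instead of on the whole device,
leaves work queued on other streams free to proceed.
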
\fn int starpu_cuda_copy_async_sync(void *src_ptr, unsigned src_node, void *dst_ptr, unsigned dst_node, size_t ssize, cudaStream_t stream, enum cudaMemcpyKind kind)
\ingroup API_CUDA_Extensions
Copy \p ssize bytes from the pointer \p src_ptr on \p src_node to the
pointer \p dst_ptr on \p dst_node. The function first tries to copy
the data asynchronously (unless \p stream is <c>NULL</c>). If the
asynchronous copy fails or if \p stream is <c>NULL</c>, it copies the
data synchronously. The function returns <c>-EAGAIN</c> if the
asynchronous launch was successful. It returns 0 if the synchronous
copy was successful, or fails otherwise.

\fn void starpu_cuda_set_device(unsigned devid)
\ingroup API_CUDA_Extensions
Call cudaSetDevice(\p devid) or cudaGLSetGLDevice(\p devid), depending
on whether \p devid is listed in the field
starpu_conf::cuda_opengl_interoperability.

\fn void starpu_cublas_init(void)
\ingroup API_CUDA_Extensions
Initialize CUBLAS on every CUDA device controlled by StarPU. The
CUBLAS library must be initialized prior to any CUBLAS call. This call
blocks until CUBLAS has been properly initialized on every device.

\fn void starpu_cublas_set_stream(void)
\ingroup API_CUDA_Extensions
Set the proper CUBLAS stream for CUBLAS v1. This must be called from
the CUDA codelet before calling CUBLAS v1 kernels, so that they are
queued on the proper CUDA stream. When using one thread per CUDA
worker, this function does nothing, since the CUBLAS stream does not
change and is set once by starpu_cublas_init().

\fn cublasHandle_t starpu_cublas_get_local_handle(void)
\ingroup API_CUDA_Extensions
Return the CUBLAS v2 handle to be used to queue CUBLAS v2 kernels. It
is properly initialized and configured for multistream by
starpu_cublas_init().

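As an illustration of starpu_cublas_get_local_handle(), the following
is a minimal sketch of a codelet implementation that queues a CUBLAS v2
call on the local handle. The function name scal_cublas_func, the single
float vector handle and the meaning of \c cl_arg are assumptions made
for this example. The header starpu_cublas_v2.h is assumed to be the one
declaring starpu_cublas_get_local_handle() in the installed StarPU
version. The sketch also assumes that the application called
starpu_cublas_init() after starpu_init().

```cuda
#include <starpu.h>
#include <starpu_cublas_v2.h> // assumed location of starpu_cublas_get_local_handle()

// Hypothetical codelet implementation: scales a float vector in place.
// Assumption: cl_arg points to a float holding the scaling factor.
extern "C" void scal_cublas_func(void *buffers[], void *cl_arg)
{
	float *v = (float *) STARPU_VECTOR_GET_PTR(buffers[0]);
	int n = (int) STARPU_VECTOR_GET_NX(buffers[0]);
	float alpha = *(float *) cl_arg;

	// Handle prepared by starpu_cublas_init() for the current worker.
	cublasHandle_t handle = starpu_cublas_get_local_handle();

	// alpha is read from host memory (default CUBLAS pointer mode).
	cublasStatus_t status = cublasSscal(handle, n, &alpha, v, 1);
	if (status != CUBLAS_STATUS_SUCCESS)
		STARPU_CUBLAS_REPORT_ERROR(status);

	// Query the stream attached to the handle and wait for it, so that
	// the result is complete when the codelet returns.
	cudaStream_t stream;
	status = cublasGetStream(handle, &stream);
	if (status != CUBLAS_STATUS_SUCCESS)
		STARPU_CUBLAS_REPORT_ERROR(status);

	cudaError_t cures = cudaStreamSynchronize(stream);
	if (cures != cudaSuccess)
		STARPU_CUDA_REPORT_ERROR(cures);
}
```

Querying the stream with cublasGetStream() avoids assuming which stream
the handle is bound to.
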
\fn void starpu_cublas_shutdown(void)
\ingroup API_CUDA_Extensions
Synchronously deinitialize the CUBLAS library on every CUDA device.

\fn void starpu_cublas_report_error(const char *func, const char *file, int line, int status)
\ingroup API_CUDA_Extensions
Report a CUBLAS error.

\def STARPU_CUBLAS_REPORT_ERROR(status)
\ingroup API_CUDA_Extensions
Call starpu_cublas_report_error(), passing the current function, file
and line position.

\fn void starpu_cusparse_init(void)
\ingroup API_CUDA_Extensions
Initialize CUSPARSE on every CUDA device controlled by StarPU. This
call blocks until CUSPARSE has been properly initialized on every
device.

\fn cusparseHandle_t starpu_cusparse_get_local_handle(void)
\ingroup API_CUDA_Extensions
Return the CUSPARSE handle to be used to queue CUSPARSE kernels. It is
properly initialized and configured for multistream by
starpu_cusparse_init().

\fn void starpu_cusparse_shutdown(void)
\ingroup API_CUDA_Extensions
Synchronously deinitialize the CUSPARSE library on every CUDA device.

*/