/* StarPU --- Runtime system for heterogeneous multicore architectures.
 *
 * Copyright (C) 2010-2013,2015,2017 CNRS
 * Copyright (C) 2009-2011,2014,2017 Université de Bordeaux
 * Copyright (C) 2011-2012 Inria
 *
 * StarPU is free software; you can redistribute it and/or modify
 * it under the terms of the GNU Lesser General Public License as published by
 * the Free Software Foundation; either version 2.1 of the License, or (at
 * your option) any later version.
 *
 * StarPU is distributed in the hope that it will be useful, but
 * WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 *
 * See the GNU Lesser General Public License in COPYING.LGPL for more details.
 */
/*! \defgroup API_CUDA_Extensions CUDA Extensions

\def STARPU_USE_CUDA
\ingroup API_CUDA_Extensions
This macro is defined when StarPU has been installed with CUDA
support. It should be used in your code to detect the availability of
CUDA as shown in \ref FullSourceCodeVectorScal.
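
For instance, the CUDA implementation of a codelet can be registered only when
CUDA support is available (a minimal sketch; <c>scal_cpu_func</c> and
<c>scal_cuda_func</c> are illustrative names, not part of the StarPU API):

\code{.c}
extern void scal_cpu_func(void *buffers[], void *cl_arg);
#ifdef STARPU_USE_CUDA
extern void scal_cuda_func(void *buffers[], void *cl_arg);
#endif

static struct starpu_codelet scal_cl =
{
	.cpu_funcs = { scal_cpu_func },
#ifdef STARPU_USE_CUDA
	/* Only referenced when StarPU was built with CUDA support */
	.cuda_funcs = { scal_cuda_func },
#endif
	.nbuffers = 1,
	.modes = { STARPU_RW },
};
\endcode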

\def STARPU_MAXCUDADEVS
\ingroup API_CUDA_Extensions
This macro defines the maximum number of CUDA devices that are
supported by StarPU.

\fn cudaStream_t starpu_cuda_get_local_stream(void)
\ingroup API_CUDA_Extensions
Return the current worker’s CUDA stream. StarPU provides a stream for
every CUDA device controlled by StarPU. This function is only provided
for convenience, so that programmers can easily use asynchronous
operations within codelets without having to create a stream by hand.
Note that the application is not forced to use the stream provided by
starpu_cuda_get_local_stream() and may also create its own streams.
Synchronizing with <c>cudaThreadSynchronize()</c> is allowed, but will
reduce the likelihood of having all transfers overlapped.
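
For instance, a CUDA codelet can queue its kernel launch on this stream (a
minimal sketch in the spirit of \ref FullSourceCodeVectorScal; the kernel
<c>vector_scal_cuda</c> and its launch parameters are illustrative):

\code{.c}
static __global__ void vector_scal_cuda(unsigned n, float *val, float factor)
{
	unsigned i = blockIdx.x * blockDim.x + threadIdx.x;
	if (i < n)
		val[i] *= factor;
}

extern "C" void scal_cuda_func(void *buffers[], void *cl_arg)
{
	unsigned n = STARPU_VECTOR_GET_NX(buffers[0]);
	float *val = (float *)STARPU_VECTOR_GET_PTR(buffers[0]);
	float factor = *(float *)cl_arg;

	unsigned threads_per_block = 64;
	unsigned nblocks = (n + threads_per_block - 1) / threads_per_block;

	/* Launch on the per-worker stream so StarPU can overlap it with transfers */
	vector_scal_cuda<<<nblocks, threads_per_block, 0, starpu_cuda_get_local_stream()>>>(n, val, factor);

	/* Wait for completion; this can be omitted if the codelet is declared
	 * asynchronous with STARPU_CUDA_ASYNC */
	cudaStreamSynchronize(starpu_cuda_get_local_stream());
}
\endcode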

\fn const struct cudaDeviceProp *starpu_cuda_get_device_properties(unsigned workerid)
\ingroup API_CUDA_Extensions
Return a pointer to device properties for worker \p workerid (assumed to be a CUDA worker).
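
For example, a codelet can inspect the properties of the device it is running
on (a minimal sketch; obtaining the current worker with starpu_worker_get_id()
is an assumption of this example):

\code{.c}
void cuda_func(void *buffers[], void *cl_arg)
{
	int workerid = starpu_worker_get_id();
	const struct cudaDeviceProp *props = starpu_cuda_get_device_properties(workerid);

	/* The properties can be used to adapt the launch configuration */
	fprintf(stderr, "running on %s (%d multiprocessors)\n",
		props->name, props->multiProcessorCount);
}
\endcode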

\fn void starpu_cuda_report_error(const char *func, const char *file, int line, cudaError_t status)
\ingroup API_CUDA_Extensions
Report a CUDA error.

\def STARPU_CUDA_REPORT_ERROR(status)
\ingroup API_CUDA_Extensions
Calls starpu_cuda_report_error(), passing the current function, file and line position.
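
A typical pattern is to check the return status of CUDA calls made inside a
codelet (a minimal sketch):

\code{.c}
void cuda_func(void *buffers[], void *cl_arg)
{
	float *val = (float *)STARPU_VECTOR_GET_PTR(buffers[0]);
	size_t size = STARPU_VECTOR_GET_NX(buffers[0]) * sizeof(float);

	/* Abort with a readable message if the asynchronous memset fails */
	cudaError_t status = cudaMemsetAsync(val, 0, size, starpu_cuda_get_local_stream());
	if (status != cudaSuccess)
		STARPU_CUDA_REPORT_ERROR(status);
}
\endcode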

\fn int starpu_cuda_copy_async_sync(void *src_ptr, unsigned src_node, void *dst_ptr, unsigned dst_node, size_t ssize, cudaStream_t stream, enum cudaMemcpyKind kind)
\ingroup API_CUDA_Extensions
Copy \p ssize bytes from the pointer \p src_ptr on \p src_node to the
pointer \p dst_ptr on \p dst_node. The function first tries to copy the
data asynchronously (unless \p stream is <c>NULL</c>). If the
asynchronous copy fails or if \p stream is <c>NULL</c>, it copies the
data synchronously. The function returns <c>-EAGAIN</c> if the
asynchronous launch was successful. It returns 0 if the synchronous
copy was successful, and fails otherwise.
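
The sketch below wraps a device-to-host copy on the per-worker stream; the
wrapper itself (name and arguments) is hypothetical, only the
starpu_cuda_copy_async_sync() call reflects the actual API:

\code{.c}
static int copy_cuda_to_ram(void *src_ptr, unsigned src_node,
			    void *dst_ptr, unsigned dst_node, size_t size)
{
	/* Returns -EAGAIN if the copy was queued asynchronously,
	 * 0 if it had to be performed synchronously */
	return starpu_cuda_copy_async_sync(src_ptr, src_node, dst_ptr, dst_node,
					   size, starpu_cuda_get_local_stream(),
					   cudaMemcpyDeviceToHost);
}
\endcode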

\fn void starpu_cuda_set_device(unsigned devid)
\ingroup API_CUDA_Extensions
Calls <c>cudaSetDevice(\p devid)</c> or <c>cudaGLSetGLDevice(\p devid)</c>,
according to whether \p devid appears in the
starpu_conf::cuda_opengl_interoperability field.
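
For instance, an application using OpenGL interoperability can declare the
CUDA devices concerned before initializing StarPU (a minimal sketch, assuming
device 0 is the one shared with the OpenGL context):

\code{.c}
#include <starpu.h>

int main(void)
{
	static unsigned interop_devices[] = { 0 };
	struct starpu_conf conf;

	starpu_conf_init(&conf);
	/* Declare which CUDA devices are shared with an OpenGL context */
	conf.cuda_opengl_interoperability = interop_devices;
	conf.n_cuda_opengl_interoperability = 1;

	if (starpu_init(&conf) != 0)
		return 1;
	/* ... */
	starpu_shutdown();
	return 0;
}
\endcode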

\fn void starpu_cublas_init(void)
\ingroup API_CUDA_Extensions
This function initializes CUBLAS on every CUDA device controlled by
StarPU. The CUBLAS library must be initialized prior to any CUBLAS
call; this call blocks until CUBLAS has been properly initialized on
every device.
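
In a typical application, CUBLAS is set up right after StarPU itself and torn
down before it (a minimal sketch):

\code{.c}
#include <starpu.h>

int main(void)
{
	if (starpu_init(NULL) != 0)
		return 1;

	/* Make CUBLAS usable from every CUDA worker */
	starpu_cublas_init();

	/* ... submit tasks whose CUDA codelets call CUBLAS ... */
	starpu_task_wait_for_all();

	starpu_cublas_shutdown();
	starpu_shutdown();
	return 0;
}
\endcode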

\fn void starpu_cublas_set_stream(void)
\ingroup API_CUDA_Extensions
This function sets the proper CUBLAS stream for CUBLAS v1. It must be
called from the CUDA codelet before calling CUBLAS v1 kernels, so that
they are queued on the proper CUDA stream. When using one thread per
CUDA worker, this function does not do anything, since the CUBLAS
stream does not change and is set once by starpu_cublas_init().
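
For example, a codelet using the legacy CUBLAS v1 API first selects the
per-worker stream (a minimal sketch; the codelet name and its single vector
buffer are illustrative):

\code{.c}
void scal_cublas_func(void *buffers[], void *cl_arg)
{
	unsigned n = STARPU_VECTOR_GET_NX(buffers[0]);
	float *val = (float *)STARPU_VECTOR_GET_PTR(buffers[0]);
	float factor = *(float *)cl_arg;

	/* Make the legacy CUBLAS API queue its work on the worker's stream */
	starpu_cublas_set_stream();
	cublasSscal(n, factor, val, 1);
}
\endcode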

\fn cublasHandle_t starpu_cublas_get_local_handle(void)
\ingroup API_CUDA_Extensions
This function returns the CUBLAS v2 handle to be used to queue CUBLAS v2
kernels. It is properly initialized and configured for multistream by
starpu_cublas_init().
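
With the CUBLAS v2 API, the handle is passed explicitly to each call (a
minimal sketch; the codelet name and its two vector buffers are illustrative):

\code{.c}
void axpy_cublas_func(void *buffers[], void *cl_arg)
{
	unsigned n = STARPU_VECTOR_GET_NX(buffers[0]);
	float *x = (float *)STARPU_VECTOR_GET_PTR(buffers[0]);
	float *y = (float *)STARPU_VECTOR_GET_PTR(buffers[1]);
	float alpha = *(float *)cl_arg;

	/* The handle returned here is already bound to the worker's stream */
	cublasStatus_t status = cublasSaxpy(starpu_cublas_get_local_handle(),
					    n, &alpha, x, 1, y, 1);
	if (status != CUBLAS_STATUS_SUCCESS)
		STARPU_CUBLAS_REPORT_ERROR(status);
}
\endcode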

\fn void starpu_cublas_shutdown(void)
\ingroup API_CUDA_Extensions
This function synchronously deinitializes the CUBLAS library on
every CUDA device.

\fn void starpu_cublas_report_error(const char *func, const char *file, int line, int status)
\ingroup API_CUDA_Extensions
Report a CUBLAS error.

\def STARPU_CUBLAS_REPORT_ERROR(status)
\ingroup API_CUDA_Extensions
Calls starpu_cublas_report_error(), passing the current
function, file and line position.

\fn void starpu_cusparse_init(void)
\ingroup API_CUDA_Extensions
Calling starpu_cusparse_init() will initialize CUSPARSE on every CUDA device
controlled by StarPU. This call blocks until CUSPARSE has been properly
initialized on every device.

\fn cusparseHandle_t starpu_cusparse_get_local_handle(void)
\ingroup API_CUDA_Extensions
This function returns the CUSPARSE handle to be used to queue CUSPARSE
kernels. It is properly initialized and configured for multistream by
starpu_cusparse_init().
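
The handle follows the same pattern as the CUBLAS v2 one (a minimal sketch;
<c>my_spmv_kernel</c> stands for whatever CUSPARSE routine the codelet needs
and is not part of StarPU or CUSPARSE):

\code{.c}
void spmv_cuda_func(void *buffers[], void *cl_arg)
{
	/* The handle is already bound to the current worker's stream */
	cusparseHandle_t handle = starpu_cusparse_get_local_handle();

	my_spmv_kernel(handle, buffers, cl_arg);
}
\endcode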

\fn void starpu_cusparse_shutdown(void)
\ingroup API_CUDA_Extensions
This function synchronously deinitializes the CUSPARSE library on
every CUDA device.

*/