/* StarPU --- Runtime system for heterogeneous multicore architectures.
 *
 * Copyright (C) 2020 Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
 *
 * StarPU is free software; you can redistribute it and/or modify
 * it under the terms of the GNU Lesser General Public License as published by
 * the Free Software Foundation; either version 2.1 of the License, or (at
 * your option) any later version.
 *
 * StarPU is distributed in the hope that it will be useful, but
 * WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 *
 * See the GNU Lesser General Public License in COPYING.LGPL for more details.
 */
/*! \page PythonInterface Python Interface

StarPU also provides a Python interface. It is intended for users who are accustomed to Python and want a more concise and easier-to-use interface to StarPU.

The need to exploit the computing power of the available CPUs and GPUs, while relieving programmers from the need to specially adapt their programs to the target machine and processing units, is present in most programs regardless of the programming language. Providing a Python interface in addition to the existing C interface extends the use of StarPU to Python users and supports them in this perpetual quest for optimization.

The Python interface exposes some of the main functionalities of StarPU. Not all functionalities of the C API are provided; however, new functions specifically adapted to Python have been added to this interface.

You can simply import the StarPU module and use the functions it provides in your own Python library.

\section ImplementingStarPUInPython Implementing StarPU in Python

The StarPU module must be imported in any Python code that uses the StarPU Python interface.

\code{.py}
>>> import starpu
\endcode

\subsection SubmittingTasks Submitting Tasks

One of the most important functionalities of StarPU is task submission. Unlike the original C interface, the Python interface simplifies its use: Python users can call the function directly, without further preparation. This simplification does not affect the final implementation.

The function task_submit(options)(func, *args, **kwargs) is used to submit tasks to StarPU in the Python interface. The task you submit is a function: in the second pair of parentheses, pass the function name followed by its arguments. To let StarPU optimize your program, submit all your tasks and StarPU will schedule them. Submitted tasks are not executed immediately, and the return value is only available once the task has been executed.

In the first pair of parentheses, you can set options, which must be specified as keyword arguments. If you set no options, you still need to keep the parentheses, and the options take their default values. The options are as follows:

\subsubsection name name
(string, default: None)

Optional name of the task. This can be useful for debugging purposes.

\subsubsection synchronous synchronous
(unsigned, default: 0)

If this flag is set, the function task_submit is blocking and returns only when the task has been executed (or if no worker is able to process the task). Otherwise, task_submit returns immediately.

\subsubsection priority priority
(int, default: 0)

This field indicates a level of priority for the task. It is an integer that must lie between the return value of the function starpu.sched_get_min_priority() for the least important tasks and that of the function starpu.sched_get_max_priority() for the most important tasks (inclusive). The default priority is always defined as 0 in order to allow static task initialization. Scheduling strategies that take priorities into account can use this parameter to make better scheduling decisions, but the scheduling policy may also ignore it.
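
The range check described above can be sketched in plain Python. The bounds below are placeholders chosen for illustration; in a real program they would come from starpu.sched_get_min_priority() and starpu.sched_get_max_priority(). The helper name <c>clamp_priority</c> is hypothetical and not part of the StarPU module.

```python
def clamp_priority(requested, lowest, highest):
    """Return `requested` limited to the inclusive range [lowest, highest]."""
    if lowest > highest:
        raise ValueError("lowest must not exceed highest")
    return max(lowest, min(requested, highest))

# Placeholder bounds, assumed for illustration only; query the scheduler
# for the actual values.
LOWEST, HIGHEST = -5, 5

clamped = [clamp_priority(p, LOWEST, HIGHEST) for p in (-10, 0, 3, 99)]
print(clamped)
```

Values already inside the range pass through unchanged; only out-of-range requests are pulled back to the nearest bound.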

\subsubsection color color
(unsigned, default: None)

Color of the task, to be used in dag.dot.

\subsubsection flops flops
(double, default: None)

This can be set to the number of floating-point operations that the task will perform. This is useful for easily getting GFlops curves from the function starpu.perfmodel_plot, and for the hypervisor load balancing.

\subsubsection perfmodel perfmodel
(string, default: None)

Symbol associated with the task's function. The performance model of the function is saved in a file named after this symbol. Ideally, the same function should always use the same symbol. After the task has been executed, call the function perfmodel_plot with this symbol to view the performance curve.
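
The two-step call form used by task_submit, options first and then the function with its arguments, can be illustrated without StarPU. The sketch below is only a stand-in: <c>task_submit_sketch</c> is a hypothetical name, a standard-library thread pool plays the role of the runtime, and the <c>name</c> and <c>priority</c> options are accepted but ignored.

```python
from concurrent.futures import Future, ThreadPoolExecutor

# Hypothetical stand-in for the runtime: a small thread pool.
_pool = ThreadPoolExecutor(max_workers=2)

def task_submit_sketch(*, name=None, synchronous=0, priority=0):
    """First call: collect keyword-only options and return the submitter."""
    def submit(func, *args, **kwargs) -> Future:
        # Second call: hand the function and its arguments to the pool.
        fut = _pool.submit(func, *args, **kwargs)
        if synchronous:
            fut.result()  # block until the task has been executed
        return fut
    return submit

def add(a, b):
    return a + b

# With no options, the empty first pair of parentheses is still required.
fut_default = task_submit_sketch()(add, 1, 2)
fut_named = task_submit_sketch(name="add", synchronous=1)(add, 3, 4)

results = (fut_default.result(), fut_named.result())
print(results)
_pool.shutdown()
```

Making the options keyword-only mirrors the rule that options must be given by keyword, and it keeps them clearly separated from the positional arguments of the submitted function.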

\subsection ReturningFutureObject Returning Future Object

To support asynchronous programming, the <c>task_submit</c> function returns a Future object. This is an extension specific to the Python interface. A Future represents the eventual result of an asynchronous operation. It is an awaitable object: coroutines can await Future objects until they either have a result or an exception set, or until they are canceled.

The asyncio module should be imported in this case.

\code{.py}
>>> import asyncio
\endcode

When a task is submitted to StarPU, it is not executed immediately. Thanks to the Future object, you do not have to wait for the eventual result: you can perform other operations while the task is executing. When the return value is ready, await the Future object to get it.
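
The behaviour described above, getting a Future back immediately and awaiting it later, can be reproduced with the standard asyncio module alone. In this sketch a thread executor stands in for a StarPU worker, and <c>slow_add</c> is a hypothetical task; no StarPU function is involved.

```python
import asyncio
import time

def slow_add(a, b):
    time.sleep(0.1)  # stands in for a task that takes a while
    return a + b

async def main():
    loop = asyncio.get_running_loop()
    # Returns at once with an awaitable Future; the work runs in a thread.
    fut = loop.run_in_executor(None, slow_add, 1, 2)
    ready_before = fut.done()  # other operations can be performed here
    result = await fut         # suspend until the result has been set
    return ready_before, result

ready_before, result = asyncio.run(main())
print(ready_before, result)
```

The Future is not done right after submission, which is why the caller is free to do other work before awaiting it.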

Here is an example showing how to submit a task in the most basic way.

Suppose that there is a function:

\code{.py}
>>> def add(a, b):
...     print("The result is ready!")
...     return a+b
\endcode

Then submit this function as a task to StarPU. Calling task_submit creates a Future object <c>fut</c>. We wait until we receive the signal that the result is ready, then await the Future to get the eventual result.

\code{.py}
>>> fut = starpu.task_submit(perfmodel="add")(add, 1, 2)
The result is ready!
>>> res = await fut
>>> res
3
\endcode

Note that the above example is run with the argument <c>-m asyncio</c> (available in Python 3.8 and later), which allows <c>await</c> to be used directly instead of <c>asyncio.run()</c>. This argument only applies to interactive use from the command line. Therefore, if you want to write your program in a Python script file, or you only have an older version of Python, you need to await the Future inside an async function and use <c>asyncio.run()</c> to execute that function, like this:

\code{.py}
import starpu
import asyncio

def add(a, b):
    return a+b

async def main():
    fut = starpu.task_submit(perfmodel="add")(add, 1, 2)
    res = await fut
    print("The result of function is", res)

asyncio.run(main())
\endcode

Execution:

\verbatim
The result of function is 3
\endverbatim

You can also use the decorator starpu.delayed to wrap your own function. The effect is the same as in the previous example, except that you call your function directly: it is automatically submitted to StarPU as a task and returns a Future object. Once the result is ready, you can await the Future to get it.

\code{.py}
>>> @starpu.delayed
... def add_deco(a, b):
...     print("The result is ready!")
...     return a+b
...
>>> fut = add_deco(1, 2)
The result is ready!
>>> res = await fut
>>> res
3
\endcode

If you want to set options when using the decorator, pass them as parameters to starpu.delayed, like this:

\code{.py}
>>> @starpu.delayed(name="add", color=2, perfmodel="add_deco")
... def add_deco(a, b):
...     print("The result is ready!")
...     return a+b
...
>>> fut = add_deco(1, 2)
The result is ready!
>>> res = await fut
>>> res
3
\endcode
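
A decorator that works both bare and with keyword options, as starpu.delayed is used above, follows a common Python pattern. The sketch below shows that pattern only: <c>delayed_sketch</c> is a hypothetical name, and instead of submitting a task it calls the function directly and records the options it received.

```python
import functools

def delayed_sketch(func=None, **options):
    """Usable as @delayed_sketch or as @delayed_sketch(name=..., ...)."""
    def wrap(f):
        @functools.wraps(f)
        def submit(*args, **kwargs):
            # A real wrapper would submit a task and return a Future;
            # this stand-in records the options and calls f directly.
            return {"options": options, "value": f(*args, **kwargs)}
        return submit
    if func is not None:
        # Bare use: the decorated function arrives as `func`.
        return wrap(func)
    # Use with options: return the actual decorator.
    return wrap

@delayed_sketch
def add_plain(a, b):
    return a + b

@delayed_sketch(name="add", color=2)
def add_named(a, b):
    return a + b

plain = add_plain(1, 2)
named = add_named(1, 2)
print(plain)
print(named)
```

The check on <c>func</c> is what distinguishes the two spellings: with options, Python first calls the decorator factory with keywords only, then applies the returned decorator to the function.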

A Future object can also be used as input to the next computation step, even before the task result is available. The eventual result is awaited until the Future has a result.

In this example, submitting the first task creates a Future object <c>fut1</c>, which is passed as one of the arguments of the second task. While the first task is executing, the second task is submitted even though the first return value is not yet available. The signal that the second result is ready then arrives right after the signal that the first result is ready, and we can await the second Future to get the eventual result.

\code{.py}
>>> import asyncio
>>> import starpu
>>> import time
>>> def add(a, b):
...     time.sleep(10)
...     print("The first result is ready!")
...     return a+b
...
>>> def sub(x, a):
...     print("The second result is ready!")
...     return x-a
...
>>> fut1 = starpu.task_submit(perfmodel="add")(add, 1, 2)
>>> fut2 = starpu.task_submit(perfmodel="sub")(sub, fut1, 1)
>>> The first result is ready!
The second result is ready!
>>> res = await fut2
>>> res
2
\endcode
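
The idea of passing a Future as an argument of a later task can be sketched with asyncio alone. The helper <c>submit_sketch</c> below is hypothetical: it awaits any argument that is a Future before calling the function, which is one possible way to obtain the chaining shown above and is not claimed to be how StarPU implements it.

```python
import asyncio

async def submit_sketch(func, *args):
    """Hypothetical stand-in: resolve Future arguments, then call func."""
    resolved = [(await a) if asyncio.isfuture(a) else a for a in args]
    return func(*resolved)

def add(a, b):
    return a + b

def sub(x, a):
    return x - a

async def main():
    # ensure_future schedules each coroutine and returns a Task (a Future).
    fut1 = asyncio.ensure_future(submit_sketch(add, 1, 2))
    # fut1 has not run yet, but it can already be passed as an argument.
    fut2 = asyncio.ensure_future(submit_sketch(sub, fut1, 1))
    return await fut2

result = asyncio.run(main())
print(result)  # (1 + 2) - 1
```

Only the last Future needs to be awaited by the caller; the dependency on the first result is resolved inside the second task.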

\section ImitatingJoblibLibrary Imitating Joblib Library

The StarPU Python interface also provides parallel computing for loops using multiprocessing. The main idea is to imitate the <a href="https://joblib.readthedocs.io/en/latest/index.html">Joblib Library</a>: the code to be executed is written as a generator expression and submitted to StarPU as parallel tasks. Consider the following sequential computation:

\code{.py}
>>> from math import log10
>>> [log10(10 ** i) for i in range(10)]
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
\endcode

To spread it over several CPUs, you can use the starpu.joblib module and call the Parallel class:

\code{.py}
>>> from math import log10
>>> starpu.joblib.Parallel(mode="normal", n_jobs=2)(starpu.joblib.delayed(log10)(10**i) for i in range(10))
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
\endcode

Alternatively, you can create an object of the Parallel class and then call it with starpu.joblib.delayed to execute the function.

\code{.py}
>>> from math import log10
>>> parallel=starpu.joblib.Parallel(mode="normal", n_jobs=2)
>>> parallel(starpu.joblib.delayed(log10)(10**i) for i in range(10))
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
\endcode

Instead of a generator expression, you can also build a list of functions and submit it to StarPU as parallel tasks.

\code{.py}
import starpu

# list used to store the functions
g_func=[]

# function with no input and no output: prints hello world
def hello():
    print("Example 1: Hello, world!")
g_func.append(starpu.joblib.delayed(hello)())

# function with 2 int inputs and 1 int output
def multi(a, b):
    res_multi = a*b
    print("Example 2: The result of",a,"*",b,"is",res_multi)
    return res_multi
g_func.append(starpu.joblib.delayed(multi)(2, 3))

# function with 4 float inputs and 1 float output
def add(a, b, c, d):
    res_add = a+b+c+d
    print("Example 3: The result of",a,"+",b,"+",c,"+",d,"is",res_add)
    return res_add
g_func.append(starpu.joblib.delayed(add)(1.2, 2.5, 3.6, 4.9))

# function with 2 int inputs and 1 float input, 1 float output and 1 int output
def sub(a, b, c):
    res_sub1 = a-b-c
    res_sub2 = a-b
    print("Example 4: The result of",a,"-",b,"-",c,"is",res_sub1,"and the result of",a,"-",b,"is",res_sub2)
    return res_sub1, res_sub2
g_func.append(starpu.joblib.delayed(sub)(6, 2, 5.9))

# the input is an iterable list of functions
starpu.joblib.Parallel(mode="normal", n_jobs=2)(g_func)
\endcode

Execution:

\verbatim
Example 1: Hello, world!
Example 2: The result of 2 * 3 is 6
Example 3: The result of 1.2 + 2.5 + 3.6 + 4.9 is 12.200000000000001
Example 4: The result of 6 - 2 - 5.9 is -1.9000000000000004 and the result of 6 - 2 is 4
\endverbatim

When you want to apply parallel computing to a function that operates on arrays, for example:

\code{.py}
>>> def multi_array(a, b):
...     for i in range(len(a)):
...         a[i]=a[i]*b[i]
...     return a
\endcode

you should provide either a NumPy array or a generator for each argument passed through starpu.joblib.delayed.

\code{.py}
>>> import numpy as np
>>> A=np.arange(10)
>>> B=np.arange(10, 20, 1)
>>> starpu.joblib.Parallel(mode="normal", n_jobs=2)(starpu.joblib.delayed(multi_array)((i for i in A), (j for j in B)))
[0, 11, 24, 39, 56, 75, 96, 119, 144, 171]
>>> starpu.joblib.Parallel(mode="normal", n_jobs=2)(starpu.joblib.delayed(multi_array)(A, B))
[0, 11, 24, 39, 56, 75, 96, 119, 144, 171]
>>> starpu.joblib.Parallel(mode="normal", n_jobs=2)(starpu.joblib.delayed(multi_array)(A, (j for j in B)))
[0, 11, 24, 39, 56, 75, 96, 119, 144, 171]
\endcode

The three forms above are equivalent, and their execution times are very close.

You can also provide a scalar argument, though not as a generator expression, for example:

\code{.py}
>>> import numpy as np
>>> def scal(a, t):
...     for i in range(len(t)):
...         t[i]=t[i]*a
...     return t
>>> A=np.arange(10)
>>> starpu.joblib.Parallel(mode="normal", n_jobs=2)(starpu.joblib.delayed(scal)(2,A))
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
>>> starpu.joblib.Parallel(mode="normal", n_jobs=2)(starpu.joblib.delayed(scal)(2, (i for i in A)))
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
\endcode

\subsection ParallelParameters Parallel Parameters

In addition to the options of the function task_submit, starpu.joblib.Parallel provides some parameters of its own:

\subsubsection mode mode
(string, default: "normal")

The mode is either <c>normal</c> or <c>future</c>. As in the previous examples, in <c>normal</c> mode you can call starpu.joblib.Parallel directly, without using the asyncio module, and you get the result once the tasks have been executed. In <c>future</c> mode, calling starpu.joblib.Parallel returns a Future object. If you also set the parameter <c>end_msg</c>, you receive this message as a signal that the result is ready, and you can then await the Future to get the eventual result. The asyncio module should be imported in this case.

\code{.py}
>>> import starpu
>>> import asyncio
>>> from math import log10
>>> fut = starpu.joblib.Parallel(mode="future", n_jobs=3, end_msg="The result is ready!")(starpu.joblib.delayed(log10)(10**i) for i in range(10))
>>> The result is ready! <_GatheringFuture finished result=[[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]>
>>> await fut
[[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
\endcode
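
The nested result shown above groups the ten values into three lists, one per job. One way to obtain such a grouping is an even split in which the first chunks receive the leftover items, sketched below in plain Python. The helper <c>split_evenly</c> is hypothetical and is not claimed to be how starpu.joblib partitions the work.

```python
def split_evenly(items, n_chunks):
    """Split items into n_chunks consecutive chunks of near-equal size."""
    items = list(items)
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for k in range(n_chunks):
        # The first `extra` chunks take one additional item each.
        size = base + (1 if k < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks

chunks = split_evenly(range(10), 3)
print(chunks)
```

With ten items and three chunks, the sizes are 4, 3 and 3, which matches the shape of the result displayed by the Future.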

\subsubsection end_msg end_msg
(string, default: None)

As introduced in the previous section, this parameter can be set to a message that is displayed once the tasks have been executed and the result is ready, so that you know when to await the Future and get the eventual result. If you do not set this parameter, its value is None and no message is displayed, but you can still await the Future and get the eventual result.

\subsubsection n_jobs n_jobs
(int, default: None)

The number of CPUs used for parallel computing. For n_jobs=2, 2 CPUs are used. If 1 is given, no parallel computing is performed. For n_jobs below 0, (n_cpus+1+n_jobs) CPUs are used. Thus for n_jobs=-2, all CPUs but one are used.
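
The rule for negative values can be checked with a few lines of Python. The function name <c>effective_jobs</c> is hypothetical, the machine size of 8 CPUs is an assumed example value, and rejecting 0 is an assumption of this sketch since the text does not cover that case.

```python
def effective_jobs(n_jobs, n_cpus):
    """Number of CPUs used for a given n_jobs on a machine with n_cpus."""
    if n_jobs == 0:
        # Not covered by the text above; rejected here as an assumption.
        raise ValueError("n_jobs == 0 has no meaning in this sketch")
    if n_jobs < 0:
        return n_cpus + 1 + n_jobs
    return n_jobs

N_CPUS = 8  # assumed machine size, for illustration only
used = {n: effective_jobs(n, N_CPUS) for n in (2, 1, -1, -2)}
print(used)
```

On the assumed 8-CPU machine, n_jobs=-1 selects all 8 CPUs and n_jobs=-2 selects 7, that is, all CPUs but one.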

In the following example, for the function log10(i+1) for i in range(N), we set the perfmodel symbol to "log" and submit the tasks in turn for N=10, 20, ..., 100, 200, ..., 1000, 2000, ..., 10000, 20000, ..., 100000, 200000, ..., 1000000, 2000000, ..., 9000000.

\code{.py}
>>> from math import log10
>>> for x in [10, 100, 1000, 10000, 100000, 1000000]:
...     for N in range(x, x*10, x):
...         starpu.joblib.Parallel(mode="normal", n_jobs=2, perfmodel="log")(starpu.joblib.delayed(log10)(i+1) for i in range(N))
\endcode
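
The sequence of problem sizes produced by the two nested loops above can be listed on its own, without StarPU, to see exactly which values of N are submitted:

```python
# Same loop structure as in the example, collecting N instead of submitting.
sizes = [N
         for x in [10, 100, 1000, 10000, 100000, 1000000]
         for N in range(x, x * 10, x)]

print(len(sizes), sizes[0], sizes[-1])
print(sizes[:12])
```

Each decade contributes nine sizes, since range(x, x*10, x) stops before x*10; the value x*10 is then produced as the first size of the next decade.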

You can call the function perfmodel_plot with the perfmodel symbol to view the performance curve.

\code{.py}
starpu.perfmodel_plot(perfmodel="log")
\endcode

The performance curve of this example is shown below:

\image html starpu_log.png
\image latex starpu_log.eps "" width=\textwidth

*/