Nathalie Furmento 4 years ago
parent
commit
000797f0e3

+ 118 - 52
doc/doxygen/chapters/440_fpga_support.doxy

@@ -17,15 +17,34 @@
 /*! \page FPGASupport FPGA Support
 
 \section Introduction Introduction
-Maxeler provides hardware and software solutions for accelerating computing applications on dataflow engines (DFEs). DFEs are in-house designed accelerators that encapsulate reconfigurable high-end FPGAs at their core and are equipped with large amounts of DDR memory.
-We extend the StarPU task programming library that initially targets heterogeneous architectures to support Field Programmable Gate Array (FPGA). 
-To create <c>StarPU/FPGA</c> applications exploiting DFE configurations, MaxCompiler allows an application to be split into three parts:
 
-- <c>Kernel</c>, which implements the computational components of the application in hardware.
-- <c>Manager configuration</c>, which connects Kernels to the CPU, engine RAM, other Kernels and other DFEs via MaxRing.
-- <c>CPU application</c>, which interacts with the DFEs to read and write data to the Kernels and engine RAM.
-
-The Simple Live CPU interface (SLiC) is Maxeler’s application programming interface for seamless CPU-DFE integration. SLiC allows CPU applications to configure and load a number of DFEs as well as to subsequently schedule and run actions on those DFEs using simple function calls. In StarPU/FPGA applications, we use <c>Dynamic SLiC Interface</c> to exchange data streams between the CPU (Main Memory) and DFE (Local Memory).
+Maxeler provides hardware and software solutions for accelerating
+computing applications on dataflow engines (DFEs). DFEs are in-house
+designed accelerators that encapsulate reconfigurable high-end FPGAs
+at their core and are equipped with large amounts of DDR memory.
+
+We extend the StarPU task programming library, which initially targeted
+heterogeneous architectures, to support Field Programmable Gate Arrays
+(FPGAs).
+
+To create <c>StarPU/FPGA</c> applications exploiting DFE
+configurations, MaxCompiler allows an application to be split into
+three parts:
+
+- <c>Kernel</c>, which implements the computational components of the
+  application in hardware.
+- <c>Manager configuration</c>, which connects Kernels to the CPU,
+  engine RAM, other Kernels and other DFEs via MaxRing.
+- <c>CPU application</c>, which interacts with the DFEs to read and
+  write data to the Kernels and engine RAM.
+
+The Simple Live CPU interface (SLiC) is Maxeler’s application
+programming interface for seamless CPU-DFE integration. SLiC allows
+CPU applications to configure and load a number of DFEs as well as to
+subsequently schedule and run actions on those DFEs using simple
+function calls. In StarPU/FPGA applications, we use <em>Dynamic SLiC
+Interface</em> to exchange data streams between the CPU (Main Memory)
+and DFE (Local Memory).
 
 \section PortingApplicationsToFPGA Porting Applications to FPGA
 
@@ -43,12 +62,22 @@ struct starpu_codelet cl =
 
 \subsection FPGAExample StarPU/FPGA Application
 
-To give you an idea of the interface that we used to exchange data between <c>host</c> (CPU) and <c>FPGA</c> (DFE), here is an example, based on one of the examples of Maxeler (https://trac.version.fz-juelich.de/reconfigurable/wiki/Public). 
-<c>StreamFMAKernel.maxj</c> represents the Java kernel code; it implements a very simple kernel (c=a+b), and <c>Test.c</c> starts it from the <c>fpga_add</c> function; it first sets streaming up from the CPU pointers, triggers execution and waits for the result. The API to interact with DFEs is called <c>SLiC</c> which then also involves the <c> MaxelerOS</c> runtime.
+To give you an idea of the interface that we used to exchange data
+between <c>host</c> (CPU) and <c>FPGA</c> (DFE), here is an example,
+based on one of the examples of Maxeler
+(https://trac.version.fz-juelich.de/reconfigurable/wiki/Public).
 
+<c>StreamFMAKernel.maxj</c> represents the Java kernel code; it
+implements a very simple kernel (<c>c=a+b</c>), and <c>Test.c</c> starts it
+from the <c>fpga_add</c> function; it first sets streaming up from the
+CPU pointers, triggers execution and waits for the result. The API to
+interact with DFEs is called <em>SLiC</em> which then also involves the
+<c>MaxelerOS</c> runtime.
 
-- <c>StreamFMAKernel.maxj</c>: the DFE part is described in the MaxJ programming language which is a Java-based metaprogramming approach.
-\code{.c}
+- <c>StreamFMAKernel.maxj</c>: the DFE part is described in the MaxJ
+  programming language which is a Java-based metaprogramming approach.
+
+\code{.java}
 package tests;
 
 import com.maxeler.maxcompiler.v2.kernelcompiler.Kernel;
@@ -56,11 +85,13 @@ import com.maxeler.maxcompiler.v2.kernelcompiler.KernelParameters;
 import com.maxeler.maxcompiler.v2.kernelcompiler.types.base.DFEType;
 import com.maxeler.maxcompiler.v2.kernelcompiler.types.base.DFEVar;
 
-class StreamFMAKernel extends Kernel {
+class StreamFMAKernel extends Kernel
+{
 
    private static final DFEType type = dfeInt(32);
 
-   protected StreamFMAKernel(KernelParameters parameters) {
+   protected StreamFMAKernel(KernelParameters parameters)
+   {
              super(parameters);
 
 	     DFEVar a = io.input("a", type);
@@ -70,25 +101,27 @@ class StreamFMAKernel extends Kernel {
 	     c = a+b;
 
 	     io.output("output", c, type);
-	}
-
+   }
 }
-
 \endcode
 
-- <c>StreamFMAManager.maxj</c>: is also described in the MaxJ programming language and orchestrates data movement between the host and the DFE.
-\code{.c}
+- <c>StreamFMAManager.maxj</c>: is also described in the MaxJ
+  programming language and orchestrates data movement between the host
+  and the DFE.
+
+\code{.java}
 package tests;
 
 import com.maxeler.maxcompiler.v2.build.EngineParameters;
 import com.maxeler.maxcompiler.v2.managers.custom.blocks.KernelBlock;
 import com.maxeler.platform.max5.manager.Max5LimaManager;
 
-class StreamFMAManager extends Max5LimaManager {
-
+class StreamFMAManager extends Max5LimaManager
+{
 	private static final String kernel_name = "StreamFMAKernel";
 
-	public StreamFMAManager(EngineParameters arg0) {
+	public StreamFMAManager(EngineParameters arg0)
+	{
 		super(arg0);
 		KernelBlock kernel = addKernel(new StreamFMAKernel(makeKernelParameters(kernel_name)));
 		kernel.getInput("a") <== addStreamFromCPU("a");
@@ -96,41 +129,54 @@ class StreamFMAManager extends Max5LimaManager {
 		addStreamToCPU("output") <== kernel.getOutput("output");
 	}
 
-	public static void main(String[] args) {
+	public static void main(String[] args)
+	{
 		StreamFMAManager manager = new StreamFMAManager(new EngineParameters(args));
 		manager.build();
 	}
 }
 \endcode
 
-Once <c>StreamFMAKernel.maxj</c> and <c>StreamFMAManager.maxj</c> are written, there are other steps to do:
+Once <c>StreamFMAKernel.maxj</c> and <c>StreamFMAManager.maxj</c> are
+written, there are other steps to do:
 
 - Building the JAVA program: (for Kernel and Manager (.maxj))
 \verbatim
 $ maxjc -1.7 -cp $MAXCLASSPATH streamfma/
 \endverbatim
 
-- Running the Java program to generate a DFE implementation (a .max file) that can be called from a StarPU/FPGA application and slic headers (.h) for simulation:
+- Running the Java program to generate a DFE implementation (a .max
+  file) that can be called from a StarPU/FPGA application and slic
+  headers (.h) for simulation:
+
 \verbatim
 $ java -XX:+UseSerialGC -Xmx2048m -cp $MAXCLASSPATH:. streamfma.StreamFMAManager DFEModel=MAIA maxFileName=StreamFMA target=DFE_SIM
 \endverbatim
 
-- Build the slic object file (simulation): 
+- Build the slic object file (simulation):
+
 \verbatim
 $ sliccompile StreamFMA.max
 \endverbatim
 
 - <c>Test.c </c>:
-to interface StarPU task-based runtime system with Maxeler's DFE devices, we use the advanced dynamic interface of <c>SLiC</c> in <b>non_blocking</b> mode.  
-Test code must include <c>MaxSLiCInterface.h</c> and <c>MaxFile.h</c>. The .max file contains the bitstream. The StarPU/FPGA application can be written in C, C++, etc.
+
+To interface the StarPU task-based runtime system with Maxeler's DFE
+devices, we use the advanced dynamic interface of <em>SLiC</em> in
+<b>non_blocking</b> mode.
+
+Test code must include <c>MaxSLiCInterface.h</c> and <c>MaxFile.h</c>.
+The .max file contains the bitstream. The StarPU/FPGA application can
+be written in C, C++, etc.
+
 \code{.c}
 #include "StreamFMA.h"
 #include "MaxSLiCInterface.h"
 
 void fpga_add(void *buffers[], void *cl_arg)
-{   
+{
     (void)cl_arg;
-    
+
     int *a = (int*) STARPU_VECTOR_GET_PTR(buffers[0]);
     int *b = (int*) STARPU_VECTOR_GET_PTR(buffers[1]);
     int *c = (int*) STARPU_VECTOR_GET_PTR(buffers[2]);
@@ -142,11 +188,11 @@ void fpga_add(void *buffers[], void *cl_arg)
 
     /* set the number of ticks for a kernel */
     max_set_ticks  (act, "StreamFMAKernel", size);
-    
+
     /* send input streams */
-    max_queue_input(act, "a", a, size *sizeof(a[0])); 
+    max_queue_input(act, "a", a, size *sizeof(a[0]));
     max_queue_input(act, "b", b, size*sizeof(b[0]));
-    
+
     /* store output stream */
     max_queue_output(act,"output", c, size*sizeof(c[0]));
 
@@ -158,7 +204,6 @@ void fpga_add(void *buffers[], void *cl_arg)
 
     printf("*** wait for the actions on DFE to complete *** \n");
     max_wait(run0);
-     
   }
 
   static struct starpu_codelet cl =
@@ -172,14 +217,13 @@ void fpga_add(void *buffers[], void *cl_arg)
 
 int main(int argc, char **argv)
 {
- 
     ...
 
     /* Implementation of a maxfile */
     max_file_t *maxfile = StreamFMA_init();
 
     /* Implementation of an engine */
-    max_engine_t *engine = max_load(maxfile, "*"); 
+    max_engine_t *engine = max_load(maxfile, "*");
 
     starpu_init(NULL);
 
@@ -192,19 +236,26 @@ int main(int argc, char **argv)
 
     /* unload and deallocate an engine obtained by way of max_load */
     max_unload(engine);
-    
+
     return 0;
 }
 \endcode
 
-To write the StarPU/FPGA application: first, the programmer must describe the codelet using StarPU’s C API. This codelet provides both a CPU implementation and an FPGA one. It also specifies that the task has two inputs and one output through the <c>nbuffers</c> and <c>modes</c> attributes.
+To write the StarPU/FPGA application: first, the programmer must
+describe the codelet using StarPU’s C API. This codelet provides both
+a CPU implementation and an FPGA one. It also specifies that the task
+has two inputs and one output through the starpu_codelet::nbuffers and
+starpu_codelet::modes attributes.
 
-<c>fpga_add</c> function is the name of the FPGA implementation and is mainly divided in four steps:
+The <c>fpga_add</c> function is the FPGA implementation and is
+mainly divided into five steps:
 
 - Init actions to be run on DFE.
 - Add data to an input stream for an action.
 - Add data storage space for an output stream.
-- Run actions on DFE in <b>non_blocking</b> mode; a non-blocking call returns immediately, allowing the calling code to do more CPU work in parallel while the actions are run.
+- Run actions on DFE in <b>non_blocking</b> mode; a non-blocking call
+  returns immediately, allowing the calling code to do more CPU work
+  in parallel while the actions are run.
 - Wait for the actions to complete.
 
 In the <c>main</c> function, there are four important steps:
@@ -214,31 +265,46 @@ In the <c>main</c> function, there are four important steps:
 - Free actions.
 - Unload and deallocate the DFE.
 
-The rest of the application (data registration, task submission, etc.) is as usual with StarPU
+The rest of the application (data registration, task submission, etc.)
+is as usual with StarPU.
 
 \subsection FPGADataTransfers Data Transfers in StarPU/FPGA Applications
 
-The communication between the host and the DFE is done through the <c>Dynamic advance interface</c> to exchange data between the main memory and the local memory of the DFE.
-For instant, we use \ref STARPU_MAIN_RAM to send and store data to/from DFE's local memory. However, we aim to use a multiplexer to choose which memory node we will use to read/write data. So, the user can tell that the computational kernel will take data from the main memory or DFE's local memory for example.
+The communication between the host and the DFE is done through the
+<em>advanced dynamic interface</em> to exchange data between the main
+memory and the local memory of the DFE.
+
+For the moment, we use \ref STARPU_MAIN_RAM to send and store data
+to/from the DFE's local memory. However, we aim to use a multiplexer to
+choose which memory node is used to read/write data. This would let the
+user specify whether the computational kernel takes its data from the
+main memory or from the DFE's local memory, for example.
 
-In starPU applications, When \ref starpu_codelet::specific_nodes is 1, this specifies the memory nodes where each data should be sent to for task execution.
-  
+In StarPU applications, setting \ref starpu_codelet::specific_nodes
+to 1 lets the codelet specify the memory node to which each piece of
+data should be sent for task execution.
 
 \subsection FPGAConfiguration FPGA Configuration
 
-To configure StarPU with FPGA accelerators, we can enable <c>FPGA</c> through the \c configure option <b>"--with-fpga"</b>.
+To configure StarPU with FPGA accelerators, we can enable <c>FPGA</c>
+through the \c configure option \ref with-fpga "--with-fpga".
+
+Compiling and installing a StarPU/FPGA application is done following the
+standard procedure:
 
-Compiling and installing StarPU/FPGA application is done following the standard procedure:
 \verbatim
 $ make
 $ make install
 \endverbatim
 
-
 \subsection FPGALaunchingprograms  Launching Programs: Simulation
 
-Maxeler provides a simple tutorial to use MaxCompiler (https://trac.version.fz-juelich.de/reconfigurable/wiki/Public). Running the Java program to generate maxfile and slic headers (hardware) on Maxeler's DFE device, takes a VERY long time, approx. 2 hours even for this very small example. That's why we use the simulation.  
-
+Maxeler provides a simple tutorial to use MaxCompiler
+(https://trac.version.fz-juelich.de/reconfigurable/wiki/Public).
+Running the Java program to generate the maxfile and slic headers
+(hardware) for Maxeler's DFE device takes a very long time, approx. 2
+hours even for this very small example. That's why we use
+simulation.
 
 - To start the simulation on Maxeler's DFE device:
 \verbatim
@@ -256,8 +322,8 @@ cores by setting the \ref STARPU_NCPU environment variable to 0.
 \verbatim
 $ STARPU_NCPU=0 ./StreamFMA
 \endverbatim
- 
-- To stop the simulation 
+
+- To stop the simulation
 \verbatim
 $ maxcompilersim -c LIMA -n StreamFMA stop
 \endverbatim

+ 8 - 0
doc/doxygen/chapters/510_configure_options.doxy

@@ -370,6 +370,14 @@ the macro ::STARPU_MAXNODES. Reducing it allows to considerably reduce memory
 used by StarPU data structures.
 </dd>
 
+<dt>--with-fpga=<c>dir</c></dt>
+<dd>
+\anchor with-fpga
+\addindex __configure__--with-fpga
+Enable the FPGA driver support, and optionally specify the location of
+the FPGA library.
+</dd>
+
 </dl>
 
 \section ExtensionConfiguration Extension Configuration
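
Note on the `fpga_add` example in `440_fpga_support.doxy`: since the documented kernel only computes `c=a+b` on 32-bit integers, a plain CPU reference is a cheap way to check what comes back from the simulated DFE. The sketch below is not part of StarPU or SLiC. The helper names `cpu_add_reference` and `first_mismatch` are hypothetical, and the code assumes the inputs are small enough that the sum does not overflow an `int`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a StarPU or SLiC function): element-wise
 * c = a + b on the CPU, mirroring what StreamFMAKernel is documented
 * to compute. Assumes a[i] + b[i] does not overflow an int. */
static void cpu_add_reference(const int *a, const int *b, int *c, size_t size)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < size; i++)
        c[i] = a[i] + b[i];
}

/* Hypothetical helper: returns the index of the first element where
 * the two buffers differ, or size when they are identical. */
static size_t first_mismatch(const int *expected, const int *actual, size_t size)
{
    assert(expected != NULL && actual != NULL);
    for (size_t i = 0; i < size; i++)
        if (expected[i] != actual[i])
            return i;
    return size;
}
```

One possible use is to call `cpu_add_reference` on the same input vectors after the task has completed and the output handle has been unregistered, then compare the result with the buffer filled by `fpga_add` using `first_mismatch`. A return value equal to `size` means the two results agree.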