FSCL Kernel Programming
With the compiler and the object model provided by the FSCL.Compiler project, you can write OpenCL kernels as F# functions, static/instance methods and lambdas. This page gives an overview of kernel programming in FSCL.
Basic example: Vector Addition
The simplest example of kernel programming is probably parallel vector addition, where each thread (called a work-item in OpenCL) sums the elements of the two input vectors at the index given by its thread id.
[Code listing not preserved: VectorAdd]
Every FSCL kernel takes an additional parameter of type WorkItemInfo (not necessarily the last one), which the kernel body uses to retrieve all the information about the work-item space (global/local id of the thread, global/local thread count, rank of the work-item space, etc.).
In addition, every kernel must be marked with the [<ReflectedDefinition>] attribute so that the compiler can inspect the AST of its body.
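To make the work-item model concrete, the following plain F# sketch runs the vector addition on the CPU without FSCL. `SimWorkItem` and `launch1D` are hypothetical helpers invented for this illustration: `SimWorkItem` stands in for `WorkItemInfo` and carries only a global id and size, and `launch1D` invokes the kernel body once per work-item, sequentially, where an OpenCL device would run the work-items in parallel.

```fsharp
// Plain F# sketch, no FSCL dependency: simulates the work-item model on the CPU.
// SimWorkItem is a hypothetical stand-in for FSCL's WorkItemInfo.
type SimWorkItem = { GlobalId: int; GlobalSize: int }

// Kernel body: one work-item sums the elements at its own global index.
let vectorAddKernel (a: float32[]) (b: float32[]) (c: float32[]) (wi: SimWorkItem) =
    let gid = wi.GlobalId
    c.[gid] <- a.[gid] + b.[gid]

// Hypothetical "launch": run the kernel once per work-item in a 1D space of size n.
let launch1D (n: int) (kernel: SimWorkItem -> unit) =
    for gid in 0 .. n - 1 do
        kernel { GlobalId = gid; GlobalSize = n }

let a = [| 1.0f; 2.0f; 3.0f; 4.0f |]
let b = [| 10.0f; 20.0f; 30.0f; 40.0f |]
let c = Array.zeroCreate<float32> a.Length
launch1D a.Length (vectorAddKernel a b c)
// c now holds 11, 22, 33, 44
```

Because each work-item touches only the index it owns, the order in which `launch1D` visits the ids does not affect the result, which is what makes the kernel safe to run in parallel.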
A more complex example: Sobel Filter
The FSCL compiler library exposes an object model that allows you to write any OpenCL kernel in F#. In particular, all the OpenCL built-in math, vector-data and geometric functions can be used inside kernels, as well as vector data types (e.g. float4, int3) and parameter qualifiers (e.g. __local, __constant). The Image subset of OpenCL has not been ported to FSCL yet, but support is planned for the near future.
The following example shows some of these features applied to the Sobel filter algorithm optimised for GPU execution.
In particular, we use vector data types (float4, uchar4), we perform vector-type conversions (ToFloat4(), ToUChar4()) and we use built-in OpenCL math functions, such as float4.hypot().
An important aspect to note is that the function inputs are 2D arrays.
While OpenCL C restricts kernel inputs to flat 1D arrays, FSCL lets you work with data of type Array2D and Array3D. The compiler automatically flattens every instance of those types and rewrites the indexes used to access them accordingly.
[Code listing not preserved: SobelFilter2D]
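The index rewriting can be pictured with a small sketch. It assumes a row-major layout (the layout .NET uses for `'T[,]` arrays); the tutorial does not state which layout the FSCL compiler emits, and `flatten` and `flatIndex` are hypothetical names, not compiler APIs.

```fsharp
// Sketch of row-major flattening; not FSCL's actual code generator.
let flatten (m: 'T[,]) : 'T[] =
    let rows, cols = Array2D.length1 m, Array2D.length2 m
    Array.init (rows * cols) (fun i -> m.[i / cols, i % cols])

// The index rewrite: an access m.[r, c] becomes flat.[r * cols + c].
let flatIndex (cols: int) (r: int) (c: int) = r * cols + c

let m = Array2D.init 3 4 (fun r c -> 10 * r + c)
let flat = flatten m
let cols = Array2D.length2 m

// Every 2D access agrees with the rewritten 1D access.
let allMatch =
    Seq.forall (fun (r, c) -> m.[r, c] = flat.[flatIndex cols r c])
        [ for r in 0 .. 2 do for c in 0 .. 3 -> (r, c) ]
```

Under the same assumption, a 3D access `v.[z, y, x]` on a volume with `rows` rows and `cols` columns would map to `(z * rows + y) * cols + x`.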
Another complex example: Matrix Multiplication
Matrix multiplication optimised for GPU execution is another example that shows how OpenCL programming constructs and built-in functions are mapped into F#.
Here, the kernel uses a global property BLOCK_SIZE marked with [<ReflectedDefinition>]. Whenever a kernel references a reflected property, the compiler produces an appropriate #define in the OpenCL source (in this case, #define BLOCK_SIZE 16).
The example also shows how to declare local variables (i.e. data shared among the threads in a work-group) inside kernels: by wrapping the array-creation calls (Array2D.zeroCreate in this example) in the function local().
[Code listing not preserved: BLOCK_SIZE, MatrixMult]
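The technique behind a GPU-optimised matrix multiplication is usually tiling: each work-group copies a `BLOCK_SIZE` x `BLOCK_SIZE` block of each input into local memory and reuses it for several partial dot products. The sketch below simulates that idea sequentially in plain F#; it is not the tutorial's `MatrixMult` kernel, the loops over `ty`/`tx` stand in for work-items, and it assumes every dimension is a multiple of `BLOCK_SIZE`.

```fsharp
// CPU simulation of tiled matrix multiplication; not FSCL code.
// BLOCK_SIZE is kept tiny here; the tutorial's kernel uses 16.
let BLOCK_SIZE = 2

// C = A * B with A of size n x k and B of size k x m.
// Assumes n, k and m are all multiples of BLOCK_SIZE.
let tiledMatMul (a: float32[,]) (b: float32[,]) : float32[,] =
    let n, k, m = Array2D.length1 a, Array2D.length2 a, Array2D.length2 b
    let c = Array2D.zeroCreate<float32> n m
    // Each (blockRow, blockCol) pair plays the role of one work-group.
    for blockRow in 0 .. n / BLOCK_SIZE - 1 do
        for blockCol in 0 .. m / BLOCK_SIZE - 1 do
            // Scratch tiles shared by the work-group (OpenCL local memory).
            let tileA = Array2D.zeroCreate<float32> BLOCK_SIZE BLOCK_SIZE
            let tileB = Array2D.zeroCreate<float32> BLOCK_SIZE BLOCK_SIZE
            // Per-work-item accumulators (private memory on a GPU).
            let acc = Array2D.zeroCreate<float32> BLOCK_SIZE BLOCK_SIZE
            for t in 0 .. k / BLOCK_SIZE - 1 do
                // Phase 1: work-item (ty, tx) loads one element of each tile.
                for ty in 0 .. BLOCK_SIZE - 1 do
                    for tx in 0 .. BLOCK_SIZE - 1 do
                        tileA.[ty, tx] <- a.[blockRow * BLOCK_SIZE + ty, t * BLOCK_SIZE + tx]
                        tileB.[ty, tx] <- b.[t * BLOCK_SIZE + ty, blockCol * BLOCK_SIZE + tx]
                // Phase 2: each work-item adds the partial dot product for this tile.
                for ty in 0 .. BLOCK_SIZE - 1 do
                    for tx in 0 .. BLOCK_SIZE - 1 do
                        for e in 0 .. BLOCK_SIZE - 1 do
                            acc.[ty, tx] <- acc.[ty, tx] + tileA.[ty, e] * tileB.[e, tx]
            // Each work-item writes its element of the output block.
            for ty in 0 .. BLOCK_SIZE - 1 do
                for tx in 0 .. BLOCK_SIZE - 1 do
                    c.[blockRow * BLOCK_SIZE + ty, blockCol * BLOCK_SIZE + tx] <- acc.[ty, tx]
    c

// Straightforward reference implementation for comparison.
let naiveMatMul (a: float32[,]) (b: float32[,]) : float32[,] =
    Array2D.init (Array2D.length1 a) (Array2D.length2 b) (fun i j ->
        let mutable s = 0.0f
        for e in 0 .. Array2D.length2 a - 1 do
            s <- s + a.[i, e] * b.[e, j]
        s)

let a = Array2D.init 4 4 (fun i j -> float32 (i + j))
let b = Array2D.init 4 4 (fun i j -> float32 ((i * j) % 3))
let c = tiledMatMul a b
```

On a GPU, the load phase and the compute phase inside the `t` loop must be separated by a local-memory barrier so that no work-item reads a tile element before it has been written; the sequential simulation does not need one.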
In OpenCL there are two ways to share data among the threads in a work-group. The first is to declare local variables inside the kernel body, as shown in the previous example; the second is to use parameters qualified with local. The latter allows the size of the local data to be set dynamically.
In FSCL, parameter qualifiers are expressed as .NET custom attributes. We can therefore rewrite the example above by lifting the local declarations out of the kernel body and adding two local parameters, as follows:
[Code listing not preserved: MatrixMultWithLocalParam]
The AddressSpace attribute is only one of the many attributes FSCL provides to add meta-information to kernels and kernel parameters.
For additional information about them, see the Dynamic Metadata Tutorial.
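Because qualifiers are ordinary .NET custom attributes, they can be read back through reflection, which is presumably how a compiler such as FSCL discovers them. The sketch below uses a `MemorySpace` enum and a `HypotheticalSpaceAttribute` defined on the spot; they are not FSCL's AddressSpace attribute, whose exact definition this page does not show.

```fsharp
open System
open System.Reflection

// Hypothetical qualifier enum and attribute, for illustration only;
// these are NOT FSCL's real AddressSpace definitions.
type MemorySpace =
    | Global = 0
    | Local = 1
    | Constant = 2

[<AttributeUsage(AttributeTargets.Parameter)>]
type HypotheticalSpaceAttribute(space: MemorySpace) =
    inherit Attribute()
    member _.Space = space

type Kernels =
    // The second parameter carries a "local" qualifier as metadata.
    static member TiledKernel(input: float32[], [<HypotheticalSpace(MemorySpace.Local)>] tile: float32[]) =
        ()

// Recover each parameter's qualifier by reflecting over its attributes;
// in this sketch, parameters without the attribute default to Global.
let qualifiers =
    typeof<Kernels>.GetMethod("TiledKernel").GetParameters()
    |> Array.map (fun p ->
        match box (p.GetCustomAttribute<HypotheticalSpaceAttribute>()) with
        | :? HypotheticalSpaceAttribute as attr -> p.Name, attr.Space
        | _ -> p.Name, MemorySpace.Global)
// input -> Global (default), tile -> Local
```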
High-level constructs example: Vector Addition with return type
Besides letting you write "classic" OpenCL kernels in F#, FSCL lets you use additional .NET/F# programming constructs and data types that are generally not supported in OpenCL.
The most important, especially from the kernel-composition point of view, is the ability of an FSCL kernel to return a value. In OpenCL, kernels must return void, a constraint respected in all the previous examples. Nevertheless, the FSCL compiler is able to transform kernels returning non-void values into legal OpenCL kernels by replacing the returned variable (it must be a variable) with an additional kernel parameter that serves as a container for the output data.
We can exploit this feature to rewrite our first example. The following code shows two semantically equivalent versions of the vector addition kernel, the second of which uses the FSCL kernel-return-type feature.
[Code listing not preserved: VectorAddNoReturn, VectorAddWithReturn]
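The two kernel styles can be compared with a plain F# sketch on the CPU. `vectorAddNoReturn` and `vectorAddWithReturn` are hypothetical CPU analogues of the tutorial's kernels (a loop over `gid` replaces the work-item space); the point is only that the returned variable plays the same role as the extra output parameter, which is the substitution the FSCL compiler performs.

```fsharp
// Style 1 (OpenCL-like): the caller passes the output buffer; returns unit.
let vectorAddNoReturn (a: float32[]) (b: float32[]) (c: float32[]) =
    for gid in 0 .. a.Length - 1 do
        c.[gid] <- a.[gid] + b.[gid]

// Style 2 (FSCL-like): the kernel declares and returns its output variable.
let vectorAddWithReturn (a: float32[]) (b: float32[]) : float32[] =
    let c = Array.zeroCreate<float32> a.Length
    for gid in 0 .. a.Length - 1 do
        c.[gid] <- a.[gid] + b.[gid]
    c

let a = [| 1.0f; 2.0f; 3.0f |]
let b = [| 4.0f; 5.0f; 6.0f |]

// Style 1 caller: allocate the output container explicitly.
let c1 = Array.zeroCreate<float32> a.Length
vectorAddNoReturn a b c1

// Style 2 caller: simply use the return value.
let c2 = vectorAddWithReturn a b
```

The returning style composes more naturally, since the output of one kernel can be passed straight into another call without the caller managing intermediate buffers.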
FSCL supports other high-level constructs and data types, such as structs, records and reference cells. Reference cells are particularly interesting (and expressive) whenever a kernel output is a singleton (1-element) array. For example, consider a kernel that performs a computation and eventually produces a single scalar value as output. Typically, a specific thread (often the first one) writes this value to the output buffer, so the kernel code looks like the following.
[Code listing not preserved: MyKernelWithArray]
In such a case, you can use a reference cell in place of the output array, as shown below.
[Code listing not preserved: MyKernelWithRefVar]
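Here is a CPU sketch of the two output styles, assuming a work-item space of four items simulated by a loop. The function names are hypothetical, and the computation each kernel performs is reduced to a sequential `Array.sum` for brevity; what matters is that work-item 0 writes `output.[0]` in one version and `output.Value` in the other.

```fsharp
// Singleton-array style: only work-item 0 writes the scalar result.
let sumWithArray (input: float32[]) (output: float32[]) (gid: int) =
    if gid = 0 then
        output.[0] <- Array.sum input

// Reference-cell style: same logic, but the output is a float32 ref.
let sumWithRef (input: float32[]) (output: float32 ref) (gid: int) =
    if gid = 0 then
        output.Value <- Array.sum input

let input = [| 1.0f; 2.0f; 3.0f; 4.0f |]
let outArr = [| 0.0f |]
let outRef = ref 0.0f

// Simulate four work-items; only gid = 0 performs the write.
for gid in 0 .. 3 do
    sumWithArray input outArr gid
    sumWithRef input outRef gid
```

The reference cell states in its type that exactly one value is produced, whereas the singleton array relies on the convention that only index 0 is meaningful.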
Utility functions
When programming a kernel, you are not forced to put all the code in a single kernel function. Kernels can rely on utility functions to perform computations or well-defined tasks. For example, in the vector addition sample we may move the operation applied to the matching elements of the two input arrays into a separate function. The OpenCL source produced will then contain both the kernel and the utility function definitions.
[Code listing not preserved: op, VectorAddWithUtilityFunction]
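A plain F# sketch of the same decomposition follows. The utility function is called `op` as in the tutorial, but its body (an addition) is an assumption, and the work-item id is passed as a plain `int` instead of a WorkItemInfo. `[<ReflectedDefinition>]` is the standard F# attribute; we assume FSCL needs it on the utility function as well as on the kernel so that it can read both bodies, which this page does not state explicitly.

```fsharp
// Utility function: the name comes from the tutorial, the body is assumed.
[<ReflectedDefinition>]
let op (x: float32) (y: float32) = x + y

// Kernel body for one work-item, delegating the per-element work to op.
[<ReflectedDefinition>]
let vectorAddWithUtilityFunction (a: float32[]) (b: float32[]) (c: float32[]) (gid: int) =
    c.[gid] <- op a.[gid] b.[gid]

let a = [| 1.0f; 2.0f; 3.0f |]
let b = [| 0.5f; 0.5f; 0.5f |]
let c = Array.zeroCreate<float32> 3

// Simulate one work-item per element.
for gid in 0 .. 2 do
    vectorAddWithUtilityFunction a b c gid
```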
FSCL kernels as lambdas
FSCL kernels can also be expressed as lambdas. For example, instead of defining a function, we may write the vector addition kernel as follows. The FSCL compiler is still able to produce the appropriate kernel code in this case, but the name of the kernel in the generated OpenCL source is now generated automatically (it is no longer VectorAdd).
[Code listing not preserved: VectorAdd written as a lambda]
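To see why a lambda kernel has no natural name, one can quote the lambda and inspect its AST with the standard `Microsoft.FSharp.Quotations` API. The sketch does not show how FSCL is invoked on a lambda; it only shows that the quotation contains parameters and a body but no function name, which is why the compiler has to generate one.

```fsharp
open Microsoft.FSharp.Quotations
open Microsoft.FSharp.Quotations.Patterns

// The vector-add kernel body written as a quoted lambda (work-item id as an int).
let vectorAddLambda =
    <@ fun (a: float32[]) (b: float32[]) (c: float32[]) (gid: int) ->
        c.[gid] <- a.[gid] + b.[gid] @>

// Walk the nested Lambda nodes and collect the parameter names.
let rec lambdaParameters (e: Expr) =
    match e with
    | Lambda (v, body) -> v.Name :: lambdaParameters body
    | _ -> []

printfn "%A" (lambdaParameters vectorAddLambda.Raw) // ["a"; "b"; "c"; "gid"]
```

A `[<ReflectedDefinition>]` function, by contrast, is reached through a method whose name the compiler can reuse for the OpenCL kernel.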
Using collection functions
In addition to custom FSCL kernels, programmers can compile to OpenCL references and calls to Array collection functions, such as Array.sum, Array.map2 and Array.reduce.
In such a case, the kernel code is not written by the programmer but generated automatically by the compiler from the intrinsic semantics of those functions.
For more information about the kernel source produced from collection functions, see Compiler Interface Tutorial.
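For reference, this is the sequential semantics of the three collection functions mentioned above. The `treeReduce` helper is a hypothetical illustration of one common way a reduction is parallelised on GPUs (pairwise combination in about log2(n) steps); it is not necessarily the kernel FSCL generates, and it assumes the operator is associative and commutative, as `+` and `*` are.

```fsharp
let a = [| 1; 2; 3; 4; 5; 6; 7; 8 |]
let b = [| 8; 7; 6; 5; 4; 3; 2; 1 |]

// Sequential semantics of the collection functions named in the text.
let total = Array.sum a               // 36
let pairwise = Array.map2 (+) a b     // every element is 9
let product = Array.reduce ( * ) a    // 40320

// Hypothetical tree reduction: at each step, "work-item" i combines
// element i with element i + half. Requires an associative and
// commutative operator, because operands are regrouped and reordered.
let treeReduce (f: int -> int -> int) (input: int[]) =
    let data = Array.copy input
    let mutable count = data.Length
    while count > 1 do
        let half = (count + 1) / 2
        // Each iteration of this loop models one work-item of the step.
        for i in 0 .. count / 2 - 1 do
            data.[i] <- f data.[i] data.[i + half]
        count <- half
    data.[0]
```

For an odd count, the middle element has no partner in that step and is simply carried over to the next one.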