Justin's Code Haus
Ramblings of a compiler engineer

Category: Programming

UnrealScript: Brace Placement Matters!

I was playing around with the Unreal Development Kit this evening, and discovered a rather interesting quirk in the handling of braces within UnrealScript.  All of the sample code I read uses a syntax style that places opening braces on the following line:

event PostBeginPlay()
{
  // Do something
}

However, my typical style places the opening brace on the current line:

event PostBeginPlay() {
  // Do something
}

Unfortunately, this does not seem to work for defaultproperties blocks. If you place the brace on the same line, the compiler will not give you any warnings or errors, but the entire defaultproperties block is just ignored!

So this code works:

defaultproperties
{
  PlayerControllerClass=class'MyPlayerController'
}

while the following code compiles but silently ignores all of the contained settings:

defaultproperties {
  PlayerControllerClass=class'MyPlayerController'
}

I was banging my head on the wall for at least an hour figuring this one out!

I hope this can help prevent someone else from repeating my mistake.

Posted Sat 17 March 2012 by Justin Holewinski in Programming (Unreal)

Space Hogs Binary Release

I've converted my old Space Hogs game project to XNA 4.0 (it was originally written in XNA 1.0). There were enough API changes to make it a pain, but I think I have everything working now.

This game was developed by me, Jason Kim, Joseph Ahn, Vjekoslav Kovacevic, and Daniel Guinn for a computer animation class during Winter Quarter 2007.

Spacehogs Screenshot

You can find a zip file here. This requires XNA 4.0 and the February 2010 DX packages to be installed on your machine. For convenience, I've included both of the redistributable packages in the zip file.

The source can be found on BitBucket.

Enjoy!

Posted Fri 17 February 2012 by Justin Holewinski in Programming (Windows, Games)

Direct3D 11 with Qt 4

(If you're in a hurry, the full source can be found on my BitBucket account)

When it comes to GUI frameworks for C++, it's very hard to beat Qt.  It's modular, easy to use, and available on practically any desktop system (and even a few mobile systems).  The MOC'ing can get a bit annoying, but IDE and command-line support is very mature at this point.  However, only OpenGL is currently supported for real-time 3D rendering. If you want to render to a Qt widget from a Direct3D 11 device, you end up having to do a lot of setup yourself.

Unfortunately, there is not a lot of information out on the internet about setting up Direct3D to play nice with Qt.  Most of the information is either outdated or only applies to Direct3D 9.  Lately, I've been playing around with this and I want to share my method for combining Direct3D 11 and Qt.

Screenshot

Creating a Widget

To start, we define a new widget sub-class specifically for Direct3D 11 rendering. On the Qt side, the key to eliminating flickering or UI artifacts is the paintEngine() method.  We need a way to tell Qt that we want complete control over drawing for our widget, so we can override paintEngine() in our widget definition:

class D3DRenderWidget : public QWidget {
  Q_OBJECT
  Q_DISABLE_COPY(D3DRenderWidget)
public:
  D3DRenderWidget(QWidget* parent = NULL);
  virtual ~D3DRenderWidget();
  virtual QPaintEngine* paintEngine() const { return NULL; }
protected:
  virtual void resizeEvent(QResizeEvent* evt);
  virtual void paintEvent(QPaintEvent* evt);
};

(Note that for ease of viewing, all of the fields have been removed from this code snippet)

We also need to set a few attributes on our widget, as shown in the constructor:

D3DRenderWidget::D3DRenderWidget(QWidget* parent)
  : QWidget(parent) {
  setAttribute(Qt::WA_PaintOnScreen, true);
  setAttribute(Qt::WA_NativeWindow, true);

  // Create Device
  createDevice();
}

First, we tell Qt that we do not want it to do any draw buffering for us. Second, we require a native window handle for our widget. Otherwise, Qt may re-use the same native handle for multiple widgets and cause problems for our Direct3D rendering. You may have also noticed the createDevice() method call; this will be explained in a bit.

Creating the Direct3D 11 Device

Now that we have a basic widget that can support Direct3D rendering, we can initialize the Direct3D 11 device we want. This procedure is mostly identical to setting up Direct3D in a raw window. The only difference is that we use the widget's width() and height() methods for the back-buffer dimensions and winId() for the native window handle:

swapChainDesc_.BufferCount = 1;
swapChainDesc_.BufferDesc.Width = width();
swapChainDesc_.BufferDesc.Height = height();
swapChainDesc_.BufferDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
swapChainDesc_.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT;
swapChainDesc_.SampleDesc.Count = 4;
swapChainDesc_.SampleDesc.Quality = 0;
swapChainDesc_.Windowed = true;
swapChainDesc_.OutputWindow = winId();
swapChainDesc_.BufferDesc.RefreshRate.Numerator = 60;
swapChainDesc_.BufferDesc.RefreshRate.Denominator = 1;

Everything else remains the same... pretty easy, huh? :)
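
For completeness, here is a rough sketch of what the createDevice() call mentioned earlier might look like. This is only a minimal outline under my own assumptions about the member names (device_, context_, and featureLevel_ are placeholders I've made up; swapChain_ and swapChainDesc_ appear above); the real initialization code lives in the BitBucket sample:

void D3DRenderWidget::createDevice() {
  ZeroMemory(&swapChainDesc_, sizeof(swapChainDesc_));
  // ... fill in swapChainDesc_ as shown above ...

  // Create the device, immediate context, and swap chain in one call,
  // targeting the widget's native window handle.
  D3D11CreateDeviceAndSwapChain(
    NULL,                       // default adapter
    D3D_DRIVER_TYPE_HARDWARE,
    NULL,                       // no software rasterizer module
    0,                          // no creation flags
    NULL, 0,                    // default feature levels
    D3D11_SDK_VERSION,
    &swapChainDesc_,
    &swapChain_,
    &device_,
    &featureLevel_,
    &context_);

  // Back-buffer views, depth buffer, and viewport setup would follow here
  // (see the createBuffers() sketch later in this post).
}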

Handling Paint Events

Remember the paintEvent override from the widget class definition? We can simply implement it with a call to some rendering function:

void D3DRenderWidget::paintEvent(QPaintEvent* evt) {
  render();
}

Here, render() is just some arbitrary method that uses the Direct3D 11 device to render something to the primary swap chain.
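
As a point of reference, a minimal render() might do nothing more than clear the back buffer and present it. This is just a sketch; renderTargetView_ and context_ are assumed member names, not part of the sample as shown here:

void D3DRenderWidget::render() {
  // Clear the back buffer to a solid color.
  float clearColor[4] = { 0.0f, 0.125f, 0.3f, 1.0f };
  context_->ClearRenderTargetView(renderTargetView_, clearColor);

  // ... issue draw calls here ...

  // Present the back buffer to the widget's window.
  swapChain_->Present(0, 0);
}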

Handling Resize Events

Resize events are perhaps the hardest events to handle when integrating Direct3D 11 and Qt. To resize our swap chain, we need to release all device-allocated resources, and reallocate them. The procedure I follow is:

void D3DRenderWidget::resizeEvent(QResizeEvent* evt) {
  releaseBuffers();
  swapChain_->ResizeBuffers(1, width(), height(), swapChainDesc_.BufferDesc.Format, 0);
  swapChain_->GetDesc(&swapChainDesc_);
  viewport_.Width = width();
  viewport_.Height = height();
  createBuffers();
}

We start by releasing all of the buffers we had allocated (vertex buffers, index buffers, shaders, textures, etc.). We then issue a resize request to the swap chain, resize our rendering viewport, and then recreate all of our needed buffers. In this snippet, releaseBuffers() will call Release() on all buffers, and createBuffers() will create all of the needed resources (again).
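
To give a rough idea of what these two helpers do, here is a minimal sketch that assumes the only device-allocated resource is the render target view (renderTargetView_, device_, and context_ are assumed member names); the real sample releases and recreates more than this:

void D3DRenderWidget::releaseBuffers() {
  // Release anything that still references the swap chain's back buffer.
  if (renderTargetView_) {
    renderTargetView_->Release();
    renderTargetView_ = NULL;
  }
}

void D3DRenderWidget::createBuffers() {
  // Re-acquire the (resized) back buffer and rebuild the render target view.
  ID3D11Texture2D* backBuffer = NULL;
  swapChain_->GetBuffer(0, __uuidof(ID3D11Texture2D), (void**)&backBuffer);
  device_->CreateRenderTargetView(backBuffer, NULL, &renderTargetView_);
  backBuffer->Release();

  // Bind the new render target and the updated viewport.
  context_->OMSetRenderTargets(1, &renderTargetView_, NULL);
  context_->RSSetViewports(1, &viewport_);
}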

It would probably be easier to let the swap chain grow and simply adjust the viewport when the widget shrinks, but this method shows how to keep the swap chain exactly the same size as the widget.

Conclusion

At this point, you should have a functional Direct3D 11 rendering context for a Qt widget. For brevity, I have omitted most of the Direct3D initialization code (this can be found in many places on the web).

If you want to check out the complete sample program, it is located on my BitBucket account. To build it, you need a relatively recent Qt release, the DirectX SDK, and the Qt Visual Studio Add-in.

Posted Thu 16 February 2012 by Justin Holewinski in Programming (Direct3D, Qt, Windows, C++)

AMD APP: Getting Device Assembly

Sometimes it is useful to look at the intermediate and assembly code for GPU programs.  This can lead to some interesting performance insights, especially for compiler writers.  Unfortunately, the AMD APP SDK is a bit limited on Linux: the AMD APP KernelAnalyzer, which conveniently dumps the AMDIL and Device ISA for an OpenCL kernel, is not available there.  However, digging through the AMD APP OpenCL Programming Guide, one finds an environment variable that can be used for the same purpose: GPU_DUMP_DEVICE_KERNEL.

According to the programming guide, this environment variable can take one of three values:

1: Save intermediate IL files in the local directory.
2: Disassemble the ISA file and save it in the local directory.
3: Save both the IL and ISA files in the local directory.

Therefore, if you run your OpenCL program with:

$ GPU_DUMP_DEVICE_KERNEL=3 ./my-program

you will get two files in your local directory: [kernel-name]_[device-name].il and [kernel-name]_[device-name].isa, containing the AMDIL code and the disassembled Device ISA, respectively.

Posted Thu 09 February 2012 by Justin Holewinski in Programming (GPU, OpenCL, AMD)

LLVM 3.0: PTX Backend

NOTE: The information in this article only applies to LLVM 3.0 and 3.1. As of LLVM 3.2, the PTX back-end has been replaced with the NVPTX back-end.

With the release of LLVM 3.0, the PTX back-end is now in a fairly usable state.  It even integrates with the Clang OpenCL front-end to produce correct PTX code usable by the nVidia OpenCL run-time.  However, please note that the back-end is still experimental and there are unimplemented features.  As always, please post any questions to the llvm-dev mailing list.

In this post, I aim to give a quick overview of how to use the back-end to compile OpenCL kernels.

As an example, consider the following matrix multiplication routine written in OpenCL:

#define BLOCK_SIZE 16

__kernel
void matmul(__global float* A, __global float* B, __global float* C) {

  __local float scratchA[BLOCK_SIZE][BLOCK_SIZE];
  __local float scratchB[BLOCK_SIZE][BLOCK_SIZE];

  int globalX = get_global_id(0);
  int globalY = get_global_id(1);
  int size = get_global_size(0);
  int k;
  float sum = 0.0f;
  int numBlocks = size / BLOCK_SIZE;
  int b;

  int tidX = get_local_id(0);
  int tidY = get_local_id(1);

  for(b = 0; b < numBlocks; ++b) {
    // Populate a cache for A/B
    int x;
    int y;

    x = b * BLOCK_SIZE + tidX;
    y = globalY;

    scratchA[tidY][tidX] = A[y * size + x];

    x = globalX;
    y = b * BLOCK_SIZE + tidY;

    scratchB[tidY][tidX] = B[y * size + x];

    barrier(CLK_LOCAL_MEM_FENCE);

    for(k = 0; k < BLOCK_SIZE; ++k) {
      float myA;
      float myB;

      myA = scratchA[tidY][k];
      myB = scratchB[k][tidX];

      sum += myA * myB;
    }

    barrier(CLK_LOCAL_MEM_FENCE);
  }

  C[globalY * size + globalX] = sum;
}

We can use the libclc library, written by Peter Collingbourne, to provide the OpenCL built-in functions for Clang.  This library will map OpenCL built-in functions to target-specific functions in the LLVM IR that the PTX back-end knows how to handle.  If $LIBCLC points to the download of libclc, then you can invoke Clang with:

clang -ccc-host-triple ptx32 \
  -Xclang -target-feature -Xclang +ptx23 \
  -Xclang -target-feature -Xclang +sm20 \
  -I$LIBCLC/include/generic -I$LIBCLC/include/ptx \
  -include clc/clc.h -Dcl_clang_storage_class_specifiers \
  -O3 matmul_kernel.cl -S -o matmul_kernel.ptx

The options can be a bit verbose at the moment, but practically all of them can be placed in a wrapper script.  Clang will compile the kernel and emit the generated PTX code to matmul_kernel.ptx.  This code can then be loaded as a binary OpenCL kernel with the nVidia OpenCL SDK using the clCreateProgramWithBinary function.  As an added bonus, the performance is about the same as if the kernel had been compiled using the nVidia OpenCL compiler!
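
For reference, loading the generated PTX on the host side might look roughly like the following. This is only a sketch: the cl_context and cl_device_id are assumed to have been created elsewhere, the loadPTXProgram helper is a name I made up, and error checking is omitted:

#include <CL/cl.h>
#include <fstream>
#include <vector>

cl_program loadPTXProgram(cl_context context, cl_device_id device,
                          const char* path) {
  // Read the PTX file produced by Clang into memory.
  std::ifstream file(path, std::ios::binary);
  std::vector<unsigned char> ptx((std::istreambuf_iterator<char>(file)),
                                 std::istreambuf_iterator<char>());

  const unsigned char* image = &ptx[0];
  size_t length = ptx.size();
  cl_int binaryStatus, status;

  // Hand the PTX to the nVidia OpenCL run-time as a "binary" program.
  cl_program program = clCreateProgramWithBinary(context, 1, &device,
                                                 &length, &image,
                                                 &binaryStatus, &status);

  // The binary still has to be built before kernels can be created from it.
  clBuildProgram(program, 1, &device, NULL, NULL, NULL);
  return program;
}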

Posted Fri 02 December 2011 by Justin Holewinski in Programming (GPU, LLVM, OpenCL)