Burde View

Wednesday, 20 February 2013

A journey from FORTRAN to C and OpenCL

Posted on 19:18 by Unknown
You may have noticed a reduction in blog posts of late. The cause is a GPU porting project I've undertaken for theSkyNet POGS. I guess I should post something about it, in case some of you are interested.

As you may or may not know, the guts of the main POGS application is MAGPHYS. Basically, this science application loops through library files to find the best fit of different attributes for a given image pixel. The problem with the existing client is that the main section of code works through the library in a brute-force, sequential fashion. Hence, the application is not very "parallel" by default and needs re-work to parallelise it and produce a successful GPU port.
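To give a feel for what "brute-force sequential" means here, below is a minimal C sketch of that kind of loop. It is not MAGPHYS code: the flat `models` layout, the chi-squared score and the names `chi_squared` and `best_fit_model` are all assumptions made for illustration.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical score: sum of squared, error-weighted differences between
   one library model and the observed values for a pixel. */
double chi_squared(const double *model, const double *observed,
                   const double *sigma, size_t n_bands)
{
    double chi2 = 0.0;
    for (size_t b = 0; b < n_bands; b++) {
        double d = (observed[b] - model[b]) / sigma[b];
        chi2 += d * d;
    }
    return chi2;
}

/* Visit every model in order and remember the lowest score seen so far.
   Assumed layout: model m occupies models[m * n_bands .. m * n_bands + n_bands - 1].
   Returns the index of the best model; its score goes to *best_chi2 if non-NULL. */
size_t best_fit_model(const double *models, size_t n_models,
                      const double *observed, const double *sigma,
                      size_t n_bands, double *best_chi2)
{
    assert(n_models > 0 && n_bands > 0);
    size_t best = 0;
    double best_val = DBL_MAX;
    for (size_t m = 0; m < n_models; m++) {
        double chi2 = chi_squared(models + m * n_bands, observed, sigma, n_bands);
        if (chi2 < best_val) {
            best_val = chi2;
            best = m;
        }
    }
    if (best_chi2)
        *best_chi2 = best_val;
    return best;
}
```

Each pass of the outer loop only reads its own model, so the scoring can be spread across GPU threads. The running "best so far" is the part that ties the iterations together and needs re-working.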

We decided that converting from FORTRAN (F77) to C/C++ first would be the best option, as most of the GPU frameworks use some derivative of C99. This is not to say that frameworks such as OpenCL and CUDA don't support other languages; it was just cleaner this way. Our choice of GPU framework was OpenCL.

The actual port from FORTRAN to C was fairly straightforward; however, there were quite a few little "gotchas". These included array indexing differences (FORTRAN arrays start at 1 by default, C arrays at 0), floating point problems (FORTRAN does not do nearest-even rounding), and overall output "weirdness" attributed to FORTRAN and C differences. None of these kept me bogged down for too long. They just meant a lot of debugging and some customised functions to get around them. Those functions are temporary, since the goal is to eventually move over to the C client only.
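As a rough illustration of the first two gotchas, here is a small C sketch. One place a rounding mismatch can show up is tie-breaking: FORTRAN's NINT rounds halves away from zero, while C's rint() and lrint() round halves to the nearest even value in the default rounding mode. The helpers below (`nint_like`, `round_half_even`, `at1`) are hypothetical names written for this post's sake, not the customised functions from the actual port, and both rounding rules are emulated by hand so the result does not depend on the current rounding mode.

```c
#include <assert.h>

/* Round half away from zero, the tie-breaking rule of FORTRAN's NINT.
   The cast truncates toward zero, mirroring INT(a + 0.5) / INT(a - 0.5). */
long nint_like(double x)
{
    return (long)(x >= 0.0 ? x + 0.5 : x - 0.5);
}

/* Round half to even, the tie-breaking rule C's rint() applies by default.
   Starts from the away-from-zero result and steps back on an odd tie. */
long round_half_even(double x)
{
    long r = nint_like(x);
    double frac = x - (double)(long)x;   /* fractional part, sign follows x */
    if ((frac == 0.5 || frac == -0.5) && r % 2 != 0)
        r -= (x > 0.0) ? 1 : -1;
    return r;
}

/* Read element i of a C array using a 1-based FORTRAN-style index. */
double at1(const double *a, int n, int i)
{
    assert(i >= 1 && i <= n);
    return a[i - 1];
}
```

With these, `nint_like(2.5)` gives 3 while `round_half_even(2.5)` gives 2, which is exactly the sort of one-off difference that makes the two clients disagree in the last digit of their output.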

For the GPU port, there was a bit of a learning curve, as this was my first time writing OpenCL kernel code. I've worked with things like threading before; however, parallelising code for processing on a GPU was new to me. After hours of reading, I dove right in and started coding.

Writing the C code to set up the OpenCL program, kernel, devices, etc. and run the kernel was fairly easy. The tricky part was re-working some of the data that was going to be buffered into device memory and read back later. I decided that batching up sets of library models for the kernel threads to crunch was the best option. This essentially meant passing the library model arrays to device memory once and allocating space in device memory for the kernel threads to write their output. At the end of each batch, I read the memory back from the device and have a C-based loop consolidate the results.
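The batching pattern can be sketched in plain C, with an ordinary function standing in for the kernel and malloc'd arrays standing in for device buffers. This is only an outline under those assumptions: `score_model`, `run_batches`, the squared-difference score and the data layout are hypothetical. In a real OpenCL host program the copies would be calls such as clEnqueueWriteBuffer and clEnqueueReadBuffer, and the inner loop would be a clEnqueueNDRangeKernel launch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the kernel body: one call per work-item id (gid).
   Each work-item scores one model and writes only to its own output slot. */
void score_model(size_t gid, const float *models, const float *observed,
                 size_t n_bands, float *out)
{
    float score = 0.0f;
    for (size_t b = 0; b < n_bands; b++) {
        float d = observed[b] - models[gid * n_bands + b];
        score += d * d;
    }
    out[gid] = score;
}

/* Crunch the library in batches and return the index of the lowest-scoring
   model, or n_models if an allocation fails. */
size_t run_batches(const float *models, size_t n_models,
                   const float *observed, size_t n_bands, size_t batch_size)
{
    assert(n_models > 0 && n_bands > 0 && batch_size > 0);
    float *dev_models = malloc(n_models * n_bands * sizeof *dev_models);
    float *dev_out = malloc(batch_size * sizeof *dev_out);
    float *host_out = malloc(batch_size * sizeof *host_out);
    size_t best = n_models;
    float best_val = 0.0f;

    if (dev_models && dev_out && host_out) {
        /* Upload the whole library once, before any batch runs. */
        memcpy(dev_models, models, n_models * n_bands * sizeof *dev_models);

        for (size_t start = 0; start < n_models; start += batch_size) {
            size_t left = n_models - start;
            size_t count = left < batch_size ? left : batch_size;

            /* "Launch" one work-item per model in this batch. */
            for (size_t gid = 0; gid < count; gid++)
                score_model(gid, dev_models + start * n_bands,
                            observed, n_bands, dev_out);

            /* Read the batch output back, then consolidate on the host. */
            memcpy(host_out, dev_out, count * sizeof *host_out);
            for (size_t i = 0; i < count; i++) {
                if (best == n_models || host_out[i] < best_val) {
                    best_val = host_out[i];
                    best = start + i;
                }
            }
        }
    }
    free(dev_models);
    free(dev_out);
    free(host_out);
    return best;
}
```

The point of the layout is that the large input crosses to the device once, while only `batch_size` output values come back per batch.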

At present, we get a 3-4x speed increase using the OpenCL implementation on modern machines with modern cards. There is a bit more work to do to increase that slightly by parallelising the post-batch part that accumulates the output results. I find the biggest bottleneck to be reading large quantities of memory back from the device. If I can do the post-batch work on the device, I can reduce the amount of memory to be read back before moving on to the next batch of models.
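One common way to shrink the read-back is to have each work-group reduce its own results to a single value on the device, so the host reads one value per group rather than one per work-item. The sketch below simulates that in plain C. It assumes the accumulation is a simple minimum, which may well not match what the real post-batch step does, and `GROUP_SIZE`, `reduce_group_min` and `reduce_batch` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define GROUP_SIZE 4   /* hypothetical work-group size, must be a power of two */

/* Stand-in for an in-group tree reduction over local memory.
   Reduces up to GROUP_SIZE scores to their minimum. */
float reduce_group_min(const float *scores, size_t count)
{
    assert(count > 0 && count <= GROUP_SIZE);
    float local[GROUP_SIZE];

    /* Pad a short final group with its first score so padding never wins. */
    for (size_t i = 0; i < GROUP_SIZE; i++)
        local[i] = i < count ? scores[i] : scores[0];

    /* Halve the active range each step; slot i absorbs slot i + stride. */
    for (size_t stride = GROUP_SIZE / 2; stride > 0; stride /= 2)
        for (size_t i = 0; i < stride; i++)
            if (local[i + stride] < local[i])
                local[i] = local[i + stride];

    return local[0];
}

/* Reduce a batch of n scores to one partial result per group.
   partial must have room for (n + GROUP_SIZE - 1) / GROUP_SIZE values.
   Returns the number of partial results, i.e. what the host would read back. */
size_t reduce_batch(const float *scores, size_t n, float *partial)
{
    size_t groups = 0;
    for (size_t start = 0; start < n; start += GROUP_SIZE) {
        size_t left = n - start;
        size_t count = left < GROUP_SIZE ? left : GROUP_SIZE;
        partial[groups++] = reduce_group_min(scores + start, count);
    }
    return groups;
}
```

For a batch of 10 scores and a group size of 4, the host would read back 3 values instead of 10. With realistic batch and group sizes the saving is proportionally the same, a factor of roughly the group size.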

Overall, the porting process has been a great learning exercise. I'm hoping that in the next month the application can be stabilised and released to the public to crunch POGS on their GPUs. This needs approval from the project leader of course. I know there are a few more hurdles to overcome, but I'm optimistic.

Regarding my little cruncher projects (Raspberry Pi and ODROID), I will try and get back into them. I have 4 subjects coming up in Semester 1 so I'm guessing that this will drown most of my “play time”.