When optimizing and debugging OpenCL kernels, it often helps to look at the underlying assembly. This article shows you the tools available in the Intel® SDK for OpenCL™ Applications that allow you to view the assembly generated by the offline compiler for individual kernels, highlight the regions of the assembly code that correspond to OpenCL C code, as well … [Read more...]
Learn to Make Windows 10 Apps with Free Microsoft Course Then Add GPU Acceleration!
Free Windows courses are not newsworthy by themselves, but developers who want to create Windows 10 apps for the Windows Marketplace and exploit the power of CUDA and OpenCL computing via C# should find the free Microsoft course, in combination with the TechEnablement tutorial "Combine C-Sharp With CUDA and OpenCL On Linux, iOS, Android and Windows", an enabling pair of … [Read more...]
OpenCL SPIR Tutorial Teaches Portability Without Shipping Kernel Source
Intel has released an OpenCL tutorial showing how developers can use SPIR (Standard Portable Intermediate Representation) to preserve vendor and device portability without having to ship OpenCL kernel source code. For more information about how SPIR enables commercial OpenCL applications, see our article, "Commercial OpenCL! SPIR 2.0 Protects IP Yet Allows Powerful, Portable, … [Read more...]
Tutorial on the OpenCL 2.0 Generic Address Space
Adam Lake and Robert Ioffe posted a nice tutorial on the Intel website about the new OpenCL 2.0 generic address space. The OpenCL 2.0 generic address space makes writing OpenCL programs easier by removing the requirement to decorate every pointer with the address space it points to. Instead, OpenCL programmers just use pointers as they would in standard C. Utilizing this new … [Read more...]
Intel Posts OpenCL 2.0 QuickSort Tutorial (Compare to TE CUDA Version)
Intel engineer Robert Ioffe has posted an OpenCL QuickSort tutorial that uses nested parallelism and work-group scan functions. In particular, the tutorial shows how to use the OpenCL™ 2.0 enqueue_kernel functions, which queue kernels from the device without host intervention (much like dynamic parallelism), plus work_group_scan_exclusive_add and … [Read more...]
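To make the role of an exclusive add-scan in a QuickSort partition step concrete, here is a minimal serial sketch in plain C. It is not code from the Intel tutorial and does not call any OpenCL API: the function names `exclusive_scan_add` and `partition_by_scan`, the caller-provided scratch buffers, and the "less than pivot" rule are all hypothetical choices made for illustration. On a device, a work-group scan would compute the same offsets cooperatively across work-items rather than in a loop.

```c
#include <assert.h>
#include <stddef.h>

/* Serial reference for an exclusive add-scan:
 * out[i] holds the sum of in[0] .. in[i-1], so out[0] is always 0. */
void exclusive_scan_add(const unsigned *in, unsigned *out, size_t n)
{
    unsigned running = 0;
    for (size_t i = 0; i < n; ++i) {
        out[i] = running;
        running += in[i];
    }
}

/* Hypothetical helper: stable partition of src into dst around pivot.
 * flags and offsets are caller-provided scratch arrays of length n.
 * Returns how many elements are less than pivot. */
size_t partition_by_scan(const int *src, int *dst,
                         unsigned *flags, unsigned *offsets,
                         size_t n, int pivot)
{
    assert(src != dst); /* the scatter below is not in-place */

    /* Step 1: flag every element that belongs in the "less" half. */
    for (size_t i = 0; i < n; ++i)
        flags[i] = (src[i] < pivot) ? 1u : 0u;

    /* Step 2: the scan turns flags into destination indices. */
    exclusive_scan_add(flags, offsets, n);

    /* Total flagged = last offset plus last flag. */
    size_t less = (n == 0) ? 0 : (size_t)offsets[n - 1] + flags[n - 1];

    /* Step 3: scatter. Flagged elements go to offsets[i]; the rest go
     * after the "less" block, ordered by how many unflagged elements
     * precede them (i - offsets[i]). */
    for (size_t i = 0; i < n; ++i) {
        size_t pos = flags[i] ? offsets[i] : less + (i - offsets[i]);
        dst[pos] = src[i];
    }
    return less;
}
```

Because every element's destination depends only on its own flag and scan result, the scatter step has no ordering dependency between elements, which is what makes this formulation attractive for parallel hardware.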
Combine C-Sharp With CUDA and OpenCL On Linux, iOS, Android and Windows
Google Protobufs (via protobuf-net), in combination with the click-together framework taught in my CUDA and OpenCL tutorials, allow C# and .NET programmers to include Linux and Windows GPU and Intel Xeon Phi codes in their workflows. Mono: The freely available open-source Mono project creates C# executables that can run unchanged on both Linux and Windows - just copy the … [Read more...]
MultiOS Gaming CUDA & OpenCL Via a Virtual Machine
Update 12/1/14: Intel now offers full GPU virtualization for Intel 4th generation devices through the Xen project. Operating system virtualization is a convenient way to run multiple operating systems at the same time, on the same hardware, without rebooting. There are several technologies that allow sharing of the GPU by both the host (native) and guest … [Read more...]
Part 1: OpenCL™ – Portable Parallelism
This first article in a series on portable multithreaded programming using OpenCL™ briefly discusses the thought behind the standard and demonstrates how to download and use the ATI Stream software development kit (SDK) to build and run an OpenCL program. View at The Code Project (http://www.codeproject.com/Articles/110685/Part-OpenCL-Portable-Parallelism). The thought … [Read more...]