This short chapter introduces the Intel COI library, discusses the pros and cons of its different data buffer types, and provides benchmarks of transfer latency and bandwidth between the host and the coprocessor. Any non-trivial application is likely to need to share data between the host and the coprocessor, and this information is essential for choosing the proper method to communicate data efficiently. An application example is also provided to give real-world context for why it is necessary to optimize communication and avoid a potential bottleneck.
The COI library is built on top of the Symmetric Communications Interface (SCIF), which provides low-level, optimized communication between host processes and the coprocessor within the Intel® Manycore Platform Software Stack (Intel® MPSS). In contrast to the compiler-assisted offload mechanism (which also uses the COI library), using the COI library directly gives the programmer explicit control over how data is transferred onto and off of the coprocessor. In this chapter, we introduce how to use COI buffers to transfer data, evaluate the effectiveness of the COI library in real-world applications, and discuss the characteristics of the different types of COI buffers through benchmarks.
Louis Feng is a software engineer at Intel working on high performance graphics in collaboration with DreamWorks Animation. He previously worked at Disney ImageMovers Digital and Pixar on movie production rendering. Louis received his PhD in Computer Science from the University of California, Davis, for his research on tensor field visualization. His current research interests include ray tracing, photorealistic image synthesis on highly parallel architectures, and parallel programming models.
Click to see the overview article “Teaching The World About Intel Xeon Phi” that contains a list of TechEnablement links about why each chapter is considered a “Parallelism Pearl” plus information about James Reinders and Jim Jeffers, the editors of High Performance Parallelism Pearls.