Jim Dempsey bests expert Intel programmers by 40% to 50% with a little ingenuity and a slightly different programming technique. He notes that “a substantial portion of previously lost thread barrier wait time” can be recovered by using loosely synchronous (plesiochronous) barriers instead of strictly synchronous barriers. Jim points out that “those [Intel] programmers are likely much better that I am at program optimization, I merely saw an opportunity they missed.” You too have the opportunity to increase application performance with plesiochronous phasing barriers. The technique is not just for Intel Xeon Phi: Jim notes that the optimizations in his High Performance Parallelism Pearls chapter “are equally applicable to programming processors”.
The numbers in the preceding graphs, “diffusion code speedups”, represent the ratio of each identified program's results to the single-threaded ‘base’ program results. All results are averaged to remove timing artifacts. Jim also notes that the figure is not a scaling chart where the number of cores or threads changes, but a comparison of the performance benefits as the implementation is tuned to take advantage of the full computational capability of the coprocessor.
The optimizations used to take full advantage of the coprocessor are discussed in detail in his chapter and include:
- base: single thread version of the program
- omp: simplified conversion to parallel program
- ompvect: adds simd vectorization directives
- peel: removes unneeded code from the inner loop
- tiled: partitions work to improve cache hit ratios
Author
Mr. Dempsey began programming in 1967-1968 with a Digital Equipment Corporation PDP8/L (4K word, 10cps paper tape for storage). Worked at DEC 1972-1974 in support of operating systems (OS/8, COS300, RT11). Joined Educomp Corp. in 1974 and wrote the ETOS operating system for PDP8-E. Formed first corporation, Network-Systems Design, Inc. in 1977 and wrote OMNI-8 operating system (8-way cluster and networked O/S). Formed several privately owned companies (Fox Valley Data Services Inc., TapeDisk Corporation, eNoMonie Inc., and QuickThread Programming, LLC.) between then and now in the software development and services area, as well as providing consulting services. Now serving as a consultant, specializing in high performance computing as well as embedded systems. Position: President of QuickThread Programming, LLC. Extensive programming experience in operating systems, device drivers, utilities, and compute-intensive applications. Strong skills with assembler, C/C++, Fortran. Some experience with C# and Java. Comfortable working on SMP systems running Windows or Linux. Highly efficient at optimization on Xeon and Xeon Phi processors. Available for consulting: jim@quickthreadprogramming.com
Click to see the overview article “Teaching The World About Intel Xeon Phi” that contains a list of TechEnablement links about why each chapter is considered a “Parallelism Pearl” plus information about James Reinders and Jim Jeffers, the editors of High Performance Parallelism Pearls.