Who would have thought that a mere two hundred lines of code could offer so many optimization opportunities! The chapter authors (Iosif Meyerov, Alexander Sysoyev, Nikita Astafiev, and Ilya Burylov) apply their optimization expertise on Intel Xeon processors and Intel Xeon Phi coprocessors to calculate the fair prices of a set of European options.

They chose the Black-Scholes calculation for the following reasons:

- European option pricing is traditionally used as a benchmark for evaluating the capabilities of new architectures.
- Black-Scholes option pricing is of great practical interest and is one of the basic tasks of financial market analysis.
- The code is succinct and easy to understand, and the Black-Scholes formula is described in almost any textbook on financial mathematics, so implementing it requires no special knowledge.
- The apparent simplicity of the code and algorithm is deceptive, and it gives the student a wonderful opportunity to experiment.

That said, it is difficult to tell from the algorithm description what pitfalls and subtleties the implementation hides and which optimization techniques will be most appropriate. Through a step-by-step case study, the authors walk the student through the optimization. In their opinion, the same techniques can be used to optimize other software for Intel Xeon and Intel Xeon Phi as well.

The chapter is organized as follows:

- A short description of a financial market model is given, and the basic concepts are discussed to ensure an understanding of the key elements of the algorithm.
- The baseline implementation is described and its performance is analyzed.
- Step-by-step optimizations are applied, including elimination of unnecessary type conversions, loop-invariant code hoisting, equivalent transformations that replace “heavy” mathematical functions with “lighter” ones, vectorization, parallelization, “warming up” thread creation to avoid skewing the results with one-time overhead, reducing the accuracy of floating-point calculations, and memory optimization (via streaming stores).
- The effect of each optimization is shown for both the Intel Xeon processor and the Intel Xeon Phi coprocessor, starting with the most generic optimizations, which help any parallel program, and illustrating their effect with processor run times.
- Toward the end of the chapter, the authors highlight performance differences between the coprocessor and the processor, which reinforces the need to focus first on general parallelism and then on architecture-specific fine-tuning.

### Chapter Authors

*Dr. Iosif Meyerov is the vice-head of the Software Department at Lobachevsky State University of Nizhni Novgorod (UNN) and a principal investigator in several R&D projects. He received a Ph.D. degree in Technical Sciences from UNN (2005) and an M.S. degree in Applied Mathematics from UNN (1999). His research interests include high performance computing, scientific computing, performance analysis and optimization, system programming, and applied mathematics.*

*Dr. Alexander Sysoyev is an associate professor in the Software Department at Lobachevsky State University of Nizhni Novgorod (UNN) and a principal investigator in several R&D projects. He received a Ph.D. degree in Technical Sciences from UNN (2012) and an M.S. degree in Applied Mathematics from UNN (1999). His research interests include high performance computing, global optimization, performance analysis and optimization, system programming, and applied mathematics.*

*Nikita Astafiev is a senior software engineer on the Numerics team at Intel Corporation. He has worked on highly optimized math functions for Intel software products since 2003. He received an M.S. degree in Mathematics from Moscow State University. His interests include automated floating-point error analysis and low-level optimizations.*

*Ilya Burylov is a senior software engineer on the Numerics team at Intel Corporation. He has worked at Intel since 2006. He received an M.S. degree in Applied Mathematics from Perm State Technical University (2005). His current focus is on optimizing computation-intensive analytics algorithms and data manipulation steps for Big Data workflows in distributed systems. His work experience includes optimizing computations in statistical, financial, and transcendental math function algorithms.*

See the overview article “Teaching The World About Intel Xeon Phi”, which contains a list of TechEnablement links explaining why each chapter is considered a “Parallelism Pearl”, plus information about James Reinders and Jim Jeffers, the editors of *High Performance Parallelism Pearls.*
