The problem of jointly allocating computations and data is known to be NP-hard. A heuristic proposed by MIT researchers Nathan Beckmann, Po-An Tsai, and Daniel Sanchez recently received the best-paper award at the IEEE Symposium on High-Performance Computer Architecture for a place-and-route algorithm that runs in milliseconds yet, on a 64-core chip, finds solutions more than 99 percent as efficient as those produced by standard place-and-route algorithms that take hours. The paper, “Scaling Distributed Cache Hierarchies through Computation and Data Co-Scheduling,” reports that the technique increased computational speed in a simulated 64-core chip by 46 percent and reduced power consumption by 36 percent.
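To illustrate the flavor of the problem, the sketch below is a toy greedy co-placement heuristic, not the authors' actual algorithm: threads are pinned to tiles of a 2-D mesh, and each data block is then placed on the tile that minimizes its access-weighted hop count to the threads that use it. All names (`greedy_coplace`, the access-count format) are invented for this example.

```python
def manhattan(a, b):
    """Hop distance between two tiles (row, col) on a 2-D mesh network."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def greedy_coplace(access_counts, mesh_dim):
    """Toy greedy co-scheduling of threads and data blocks on a mesh.

    access_counts: dict mapping (thread_id, block_id) -> access count.
    Returns (thread_tile, block_tile), each mapping an id to a mesh tile.
    Threads are assigned to tiles in order; each block then goes to the
    tile minimizing total access-weighted distance to its users.
    """
    tiles = [(r, c) for r in range(mesh_dim) for c in range(mesh_dim)]
    threads = sorted({t for t, _ in access_counts})
    blocks = sorted({b for _, b in access_counts})
    # Naive thread placement: one thread per tile, in id order.
    thread_tile = {t: tiles[i] for i, t in enumerate(threads)}
    block_tile = {}
    for b in blocks:
        users = [(t, n) for (t, blk), n in access_counts.items() if blk == b]
        # Place the block where the weighted hop count to its users is lowest.
        block_tile[b] = min(
            tiles,
            key=lambda tile: sum(n * manhattan(tile, thread_tile[t])
                                 for t, n in users),
        )
    return thread_tile, block_tile
```

A real co-scheduler must also respect per-tile cache capacity and iterate between thread and data placement; this sketch only shows why placing data near its heaviest users shrinks on-chip communication.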
While the paper proposes a hardware solution, the MIT algorithm could conceivably be adapted to help software developers place threads. Previous reports in the literature note that other thread-placement algorithms achieve similarly significant power savings, potentially saving megawatts on supercomputers powered by Intel Xeon Phi processors.
“There was a big National Academy study and a DARPA-sponsored [information science and technology] study on the importance of communication dominating computation,” says David Wood, a professor of computer science at the University of Wisconsin at Madison. “What you can see in some of these studies is that there is an order of magnitude more energy consumed moving operands around to the computation than in the actual computation itself. In some cases, it’s two orders of magnitude. What that means is that you need to not do that.”
The MIT researchers “have a proposal that appears to work on practical problems and can get some pretty spectacular results,” Wood says. “It’s an important problem, and the results look very promising.”