We are studying the foundations of efficient Big Data management, which leads to new fundamental results regarding the complexity of various ad hoc data transformations in modern massive-scale parallel systems.
Our model of computation is an extension of the Massively Parallel (MPC) model, where computation is performed by p severs that alternate between local computation and re-shuffling steps, and each server is limited to O(n/p) space, where n is the size of the data. We study the time/space complexity of data transformations and extensions to cope with failures.