Parallel Multiway Methods for Compression of Massive Data and Other Applications
Session: Invited Talks - Sterling and Kolda
Session Chair: David E. Keyes
Presenter
Event Type: Invited Talk
Introductory
Location: Ballroom-EFGHIJ
Description: Scientists are drowning in data. The scientific data produced by high-fidelity simulations and high-precision experiments are far too massive to store. For instance, a modest simulation on a 3D grid with 500 grid points per dimension, tracking 100 variables for 100 time steps, yields 5 TB of data. Working with data at this scale is unwieldy, and the data may not be retained for future analysis or comparison. Data compression is a necessity, but there are surprisingly few options available for scientific data. We propose to exploit the 5-way structure (3D spatial grid x time x variable) of the data by applying the Tucker tensor decomposition to reveal a latent low-dimensional representation. By taking advantage of this multiway structure, we are able to compress combustion science data by a factor of 10 to 1000 with negligible loss in accuracy. Additionally, we need not reconstruct the entire data set to extract subparts or down-sampled versions. However, compressing such massive data requires a parallel implementation of the Tucker tensor decomposition. We explain the data distribution, the algorithm, and the accompanying analysis. We apply the algorithm to real-world data sets to demonstrate the speed, compression performance, and accuracy of the method. We also consider extensions of this work to functional representations (useful for hierarchical/irregular grids and reduced-order models) as well as acceleration via randomized computations. This talk will highlight work by collaborators Woody Austin, Grey Ballard, Alicia Klinvex, Hemanth Kolla, and others.
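
To make the idea of Tucker compression concrete, here is a minimal serial sketch in Python with NumPy. It is not the parallel algorithm presented in the talk: it uses a plain truncated higher-order SVD (HOSVD) on a small synthetic 5-way tensor, and the helper names (`unfold`, `mode_product`, `hosvd`, `reconstruct`), the tensor sizes, and the ranks are all assumptions chosen for illustration. The last lines also illustrate how a subpart (here, a single time step) might be extracted from the compressed form without rebuilding the full data set.

```python
import numpy as np


def unfold(tensor, mode):
    """Matricize: rows index `mode`, columns index all remaining modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `tensor` along `mode` by `matrix` (shape: new_dim x old_dim)."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)


def hosvd(tensor, ranks):
    """Truncated HOSVD: one factor matrix per mode plus a small core tensor."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Leading left singular vectors of the mode-wise unfolding.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)
    return core, factors


def reconstruct(core, factors):
    """Expand the core with the factor matrices (all rows or a subset of rows)."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out


# Hypothetical synthetic data: 3 spatial modes x time x variable,
# built to have exact multilinear rank (3, 3, 3, 2, 2).
rng = np.random.default_rng(0)
shape = (12, 12, 12, 8, 6)
ranks = (3, 3, 3, 2, 2)
data = rng.standard_normal(ranks)
for mode, (n, r) in enumerate(zip(shape, ranks)):
    data = mode_product(data, rng.standard_normal((n, r)), mode)

core, factors = hosvd(data, ranks)
approx = reconstruct(core, factors)

rel_error = np.linalg.norm(approx - data) / np.linalg.norm(data)
stored = core.size + sum(u.size for u in factors)
ratio = data.size / stored
print(f"relative error: {rel_error:.2e}, compression ratio: {ratio:.1f}")

# Extract only the first time step by keeping one row of the time factor.
partial_factors = list(factors)
partial_factors[3] = factors[3][:1, :]
first_step = reconstruct(core, partial_factors)
```

Because the synthetic tensor has exactly the requested multilinear rank, the reconstruction error here is at the level of floating-point rounding; real simulation data would instead trade rank against a chosen error tolerance.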









