The Prototype of a Cloud Store for Distributed Acoustic Sensing Data

Project Lead: Yiyu Ni, Earth and Space Sciences, UW College of the Environment

Data Science Leads: Naomi Alterman and Rob Fatland

Distributed Acoustic Sensing (DAS) is an emerging seismic observation method that has recently enabled entirely new types of geophysical observation. DAS sends repeated laser pulses along optical fibers up to ~100 km in length and measures phase changes in the backscattered light, which result from the strain rate of the fiber at frequencies of 0.01 Hz to 100 kHz. DAS dramatically expands the capability of dense seismic observation and has been used for buried tectonic fault detection and near-surface imaging. However, the large data volumes generated by DAS strain data I/O on both local servers and data centers, which limits the processing required for seismological research. Commercial cloud environments offer promising computing architectures for such large-scale data processing, yet the current standard DAS data formats are not optimized for cloud-native object storage.

In this incubator project, we propose a data platform for hosting DAS data that deploys an object storage service with cloud-optimized data formats (Zarr and TileDB) on local servers. This high-performance framework can host data locally for individual research groups or institutions that own mid-scale Linux servers and are connected to high-speed internet (e.g. Internet2 member institutions). The outcome of this project is intended to be the storage backend of the Photonic Sensing Facility (PSF), which will host 1 PB of DAS data by the end of 2023. The platform also exposes the same API as the AWS object store, Amazon Simple Storage Service (S3), which allows for seamless access from cloud computing clusters.
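To illustrate why chunked formats such as Zarr suit object storage, the minimal sketch below lists the objects a reader would need to fetch for one window of a two-dimensional (time, channel) array. It follows the Zarr version 2 convention of naming each chunk by its dot-separated grid indices. The chunk shape (one minute of 100 Hz samples by 100 channels), the "das/" prefix, and the helper name chunk_keys are all hypothetical choices made for illustration, not the layout used by the project.

```python
from itertools import product


def chunk_keys(start, stop, chunks, prefix=""):
    """Return Zarr v2-style chunk keys covering the half-open window [start, stop).

    start, stop and chunks are tuples with one entry per array dimension.
    """
    index_ranges = []
    for lo, hi, size in zip(start, stop, chunks):
        if hi <= lo:
            return []  # empty window: nothing to fetch
        first = lo // size         # index of the chunk holding the first sample
        last = (hi - 1) // size    # index of the chunk holding the last sample
        index_ranges.append(range(first, last + 1))
    # One key per combination of per-dimension chunk indices, e.g. "das/0.2"
    return [
        prefix + ".".join(str(i) for i in idx)
        for idx in product(*index_ranges)
    ]


# Hypothetical layout: axis 0 is time (100 Hz samples), axis 1 is channel.
CHUNKS = (6000, 100)  # 1 minute of samples x 100 channels per stored object

# One hour of data (360,000 samples) for channels 250 to 349.
keys = chunk_keys(start=(0, 250), stop=(360_000, 350), chunks=CHUNKS, prefix="das/")
print(len(keys))   # 60 time chunks x 2 channel chunks = 120
print(keys[:3])    # ['das/0.2', 'das/0.3', 'das/1.2']
```

Because every chunk is an independent object, many workers can each request only the keys they need from an S3-compatible endpoint in parallel, without opening or scanning a single large file.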

To test the performance of this framework, we implemented an ambient noise cross-correlation workflow on 1 month of urban DAS data with 600 channels sampled at 100 Hz. The hourly cross-correlation functions of 180k channel pairs are calculated and saved directly to the cloud storage on the fly. With AWS Batch and auto-scaling, 8 billion cross-correlation functions were computed in parallel. The computation finished in 4 hours using 67 virtual machines, each equipped with 4 CPUs and 6 GB of RAM. With AWS Spot Instances enabled, the total cost of the cross-correlation workflow was less than $11. This study shows that our data platform can support massive DAS cross-correlation workloads in the cloud. It also natively supports data processing and product distribution.
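The figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in the text and makes two assumptions that the text does not state: that every one of the 67 machines ran for the full 4 hours (auto-scaling makes this unlikely to hold exactly), and that the quoted 8 billion and $11 values are rounded, so the derived quantities are approximate.

```python
from math import comb

# Figures quoted in the text
N_CHANNELS = 600
N_FUNCTIONS = 8e9          # "8 billion", a rounded value
N_VMS = 67
RUNTIME_HOURS = 4
CPUS_PER_VM = 4
COST_CEILING_USD = 11.0    # the total cost was reported as less than this

# Unique channel pairs, excluding each channel paired with itself
cross_pairs = comb(N_CHANNELS, 2)
# The same count with the 600 autocorrelations included
all_pairs = cross_pairs + N_CHANNELS

# ASSUMPTION: all machines ran for the entire 4-hour job
vm_hours = N_VMS * RUNTIME_HOURS
cpu_hours = vm_hours * CPUS_PER_VM
cost_per_vm_hour = COST_CEILING_USD / vm_hours

# Approximate number of correlation functions produced per channel pair
functions_per_pair = N_FUNCTIONS / cross_pairs

print(cross_pairs)                  # 179700, i.e. about 180k
print(all_pairs)                    # 180300
print(vm_hours, cpu_hours)          # 268 1072
print(f"{cost_per_vm_hour:.3f}")    # 0.041 USD per VM-hour at the $11 ceiling
print(round(functions_per_pair))
```

Either pair count rounds to the 180k quoted in the text, so the arithmetic alone does not show whether autocorrelations were included.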

View the project GitHub here.