Checkpointing Preemptible VMs: Get $3.33 of cloud compute for $1.00
Join Zoom Meeting: https://washington.zoom.us/j/
Meeting ID: 931 3641 1861
This Cloud Clinic is intended for researchers who use cloud platforms for data science. We will focus on checkpointing: storing partial results from a cloud compute task so that, if the task is interrupted, it can be restarted roughly where it left off.
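As a rough illustration of the idea, here is a minimal sketch of a resumable loop in Python. The file name, the state layout, and the `stop_after` argument (which simulates a preemption) are all hypothetical choices made for this example; a real job would write its checkpoint to persistent storage rather than to local disk.

```python
import json
import os
import tempfile

CHECKPOINT_PATH = "checkpoint.json"  # hypothetical location; use persistent storage in practice


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_step": 0, "running_total": 0}


def save_checkpoint(path, state):
    """Write the state to a temp file, then rename it, so a crash mid-write
    cannot leave a half-written checkpoint behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def run(path, total_steps, checkpoint_every=10, stop_after=None):
    """Sum the squares of 0..total_steps-1, resuming from the last checkpoint.

    stop_after simulates a preemption after that many steps in this run;
    the function then returns None without saving.
    """
    state = load_checkpoint(path)
    steps_this_run = 0
    for step in range(state["next_step"], total_steps):
        if stop_after is not None and steps_this_run >= stop_after:
            return None  # simulated interruption: progress since last save is lost
        state["running_total"] += step * step
        state["next_step"] = step + 1
        steps_this_run += 1
        if state["next_step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["running_total"]
```

Calling `run(CHECKPOINT_PATH, 100, stop_after=35)` and then `run(CHECKPOINT_PATH, 100)` repeats only the steps since the last saved checkpoint, which is the "roughly where it left off" behavior described above.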
Emphasis: cloud efficiency, terminology, use cases, and best practices
- Including GPU access, persistent (object) storage, distinguishing preemptible VM types (e.g. “one time” versus “persistent” on AWS), and useful details such as the userdata option on AWS
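To make the AWS terms above concrete, here is a hedged sketch that only builds the keyword arguments one might pass to boto3's EC2 `run_instances` call for a Spot instance. The helper name `spot_launch_params`, the AMI ID, the instance type, and the startup script are placeholders invented for this example, not values from the clinic. The sketch assumes that a "persistent" request is paired with the "stop" interruption behavior and a "one-time" request with "terminate".

```python
def spot_launch_params(ami_id, instance_type, user_data, persistent=True):
    """Build keyword arguments for an EC2 Spot launch (hypothetical helper).

    persistent=True  -> request type "persistent", instance is stopped on interruption
    persistent=False -> request type "one-time", instance is terminated on interruption
    """
    spot_options = {
        "SpotInstanceType": "persistent" if persistent else "one-time",
        "InstanceInterruptionBehavior": "stop" if persistent else "terminate",
    }
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        # Startup script handed to the instance via the userdata option.
        "UserData": user_data,
        "InstanceMarketOptions": {
            "MarketType": "spot",
            "SpotOptions": spot_options,
        },
    }


# Placeholder values for illustration only.
STARTUP_SCRIPT = "#!/bin/bash\npython3 /opt/job/train.py --resume\n"
params = spot_launch_params("ami-PLACEHOLDER", "g4dn.xlarge", STARTUP_SCRIPT)

# With boto3 installed and credentials configured, the launch would look like:
#   import boto3
#   boto3.client("ec2").run_instances(**params)
```

One detail worth checking for your own setup: user data scripts commonly run only on an instance's first boot, so a stopped-and-restarted instance may need a separate mechanism to relaunch the job.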
Abstract: In today’s busy world we can lose track of small details that have a big impact. Suppose you have a cloud budget of $10,000, but your computations could produce better results if they were scaled up beyond that limit. What you need is access to persistent storage (easy), access to cheap preemptible cloud VM instances (easy), and a reliable method of checkpointing your progress (easy? hard?). This one-two-three punch means you can purchase $33,333 worth of cloud computing for a mere $10,000 and get better research results as a consequence. This cloud clinic will catch you up on the how-tos and other small details behind such a substantial gain in compute power. We use a convolutional neural network (CNN) as our example implementation of a compute-intensive research task.
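The arithmetic behind the headline figure can be sketched as follows. The 0.30 price ratio is an inference from the numbers in this announcement ($1.00 buying $3.33 of compute), not a quoted cloud price, and the calculation ignores any work that has to be repeated after an interruption.

```python
def effective_compute(budget, price_ratio):
    """On-demand-equivalent compute bought with `budget` when preemptible
    instances cost `price_ratio` times the on-demand price."""
    return budget / price_ratio


# Assumed ratio: preemptible price is 30% of the on-demand price.
print(round(effective_compute(10_000, 0.30)))  # prints 33333
```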