Computing and Storage Platforms
Where should you go for information and help about eScience resources at UW and beyond? The eScience Institute was founded to address the specialized computing needs of domain scientists as research shifts toward information- and data-based discovery. In addition, UW Technology provides excellent local infrastructure for data-center, networking, storage, and data protection services. Another resource is the eScience SIG (Special Interest Group).
The eScience Institute can help you navigate the maze of options, and help steer you toward the best fit - whether it's local or remote resources. Among the resources we can provide pointers to are:
Storage
eScience is all about the data. Lots and lots of data. This is the place to share techniques for handling data at that scale and experiences with approaches you've tried in the past. Broadly speaking, the storage problem breaks down into three components: performance, capacity, and protection. The technologies for addressing these three areas often differ substantially.
We can provide guidance in the deployment of a variety of storage technologies, from Fibre Channel Storage Area Networks, to cluster filesystems, to Hierarchical Storage Management data protection schemes. We invite you to share and contribute your experiences, too!
Cloud storage options are now starting to become available in a variety of forms. For instance, Amazon is creating a repository of public datasets (such as the mapping of the Human Genome and the US Census data) in their cloud storage service (Amazon S3), accessible from their cloud compute service (Amazon EC2), to save you the cost and trouble of loading these common datasets on your own.
Computing
Computing problems are not all equal, nor are computing resources. Some problems are amenable to large-scale distributed approaches, such as BOINC or commercial cloud computing services, while others require large shared memory architectures typically only found at the national supercomputer centers. And some, in the middle ground of eScience problems, are well served by the modern successor to the Beowulf cluster, operated locally.
There are several commercial cloud computing platforms available for scientific use:
- Amazon EC2 (Amazon offers grants for research in their AWS in Education Program);
- Microsoft's Azure Platform (read about Microsoft Research's commitment to science partnerships);
- Google/IBM's Cloud Computing University Initiative; and
- NSF's Cluster Exploratory Program (CluE) (see press release).
Local compute clusters come in many flavors, from a few PCs on a shelf connected via 100 Mb/s Ethernet, to large deployments of blades sharing low-latency interconnects such as InfiniBand. We can discuss approaches for deploying your own HPC cluster, with an emphasis on the hands-on details: how to place an order with a vendor so that the nodes PXE boot automatically and you avoid tedious manual configuration of every node; why you might choose netbooting, or ROCKS; and when InfiniBand is worth the extra cost and when you can do without it. Look in the tools section for details on schedulers and systems management.
In early 2010, the UW will have a shared high-performance compute cluster, dedicated to research computing at UW, named "Hyak". Hyak will have nearly 1,500 nodes, a high-performance interconnect network, well-connected high-speed scratch storage, and an associated high-capacity archive (protected) storage system. Individual nodes are configured according to each user's requirements for CPU architecture, RAM, and local disk. Users deploying nodes in the system pay a one-time hardware cost for a three-year deployment. All infrastructure (racks, networking, blade chassis, etc.) and all operational costs (electricity, data center co-location, system administration, etc.) are covered through central funding, significantly reducing the overhead burden on individual researchers. The system will be located in a new, professionally staffed data center being built on the edge of the Seattle campus (UW Tower). Power, cooling, and networking to the campus backbone and to external high-performance research networks are excellent. Hyak will be operated in what is sometimes described as a "condominium" style organization. For more information, see the Hyak Overview (pdf). If you are interested in using this system, or would like more information, please contact info@escience.washington.edu.
In 2009, Microsoft donated a 10-node cluster to the eScience Institute to provide UW scientists with a development platform for Dryad. Dryad is a programming model for writing parallel and distributed programs to scale from a small cluster to a large data-center - similar to MapReduce, but with support for multi-step computations, a set of operators derived from relational algebra, and algebraic cost-based optimization. Additional information, including access to the eScience Dryad cluster, is available on the Computational Methods page.
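To make the contrast with a single map-and-reduce pass concrete, here is a minimal sketch of a multi-step computation built from relational-style operators (select, join, group-by). It is plain, single-process Python, not Dryad's API, and the table contents, function names, and field layout are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy tables: (sample_id, instrument) and (sample_id, reading).
samples = [(1, "ctd"), (2, "adcp"), (3, "ctd")]
readings = [(1, 4.0), (1, 6.0), (2, 10.0), (3, 2.0), (3, 4.0)]


def select(rows, predicate):
    """Relational selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def join(left, right):
    """Equi-join two tables of (key, value) pairs on the key."""
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    return [(key, lval, rval) for key, lval in left for rval in index[key]]


def group_mean(rows, key_fn, value_fn):
    """Group rows by key_fn and average value_fn within each group."""
    totals = defaultdict(lambda: [0.0, 0])
    for row in rows:
        acc = totals[key_fn(row)]
        acc[0] += value_fn(row)
        acc[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def pipeline():
    """Three chained steps: select, then join, then group-and-aggregate."""
    ctd_samples = select(samples, lambda row: row[1] == "ctd")
    joined = join(ctd_samples, readings)
    return group_mean(joined, lambda row: row[0], lambda row: row[2])


if __name__ == "__main__":
    print(pipeline())
```

In a distributed system each of these steps would run in parallel across many machines, and an optimizer could reorder them (for example, filtering before joining) to reduce the data moved between steps.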
Finally, some computationally intensive problems are best suited to the NSF TeraGrid. Contact Jeff Gardner (gardnerj@phys.washington.edu) for information about free time on the TeraGrid.
Networking
You're producing heaps of data, either from instruments and sensors or from simulations. Or, you need to draw upon external data sets to analyze or visualize at UW. In any case, you have to move it from here to there, or there to here. We can help sort out the issues related to bandwidth-limited applications (NFS, etc. to hundreds or thousands of nodes), latency-limited applications (fine-grained parallel codes), and surprising combinations (the bandwidth-delay product in long-haul transfers).
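The bandwidth-delay product is easy to work out by hand, and doing so shows why long-haul transfers can surprise you. The sketch below uses a hypothetical 10 Gb/s path with a 70 ms round-trip time; both numbers are assumptions for illustration, not measurements of any UW link. It also uses the simplified model in which a single stream can have at most one window of data in flight per round trip.

```python
def bandwidth_delay_product_bytes(bandwidth_bits_per_s: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep a path of this speed and delay full."""
    return bandwidth_bits_per_s * rtt_s / 8.0


def window_limited_throughput_bits_per_s(window_bytes: float, rtt_s: float) -> float:
    """Upper bound on one stream's throughput when it sends one window per round trip."""
    return window_bytes * 8.0 / rtt_s


if __name__ == "__main__":
    link_bits_per_s = 10e9  # hypothetical 10 Gb/s long-haul path
    rtt_s = 0.070           # hypothetical 70 ms round-trip time

    bdp = bandwidth_delay_product_bytes(link_bits_per_s, rtt_s)
    print(f"Bandwidth-delay product: {bdp / 1e6:.1f} MB")

    # A 64 KiB window caps the stream far below the link speed.
    capped = window_limited_throughput_bits_per_s(64 * 1024, rtt_s)
    print(f"Throughput with a 64 KiB window: {capped / 1e6:.1f} Mb/s")
```

Under these assumptions the path holds about 87.5 MB in flight, while a single stream with a 64 KiB window tops out near 7.5 Mb/s, which is why window sizes and parallel streams matter more than raw link speed on long paths.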
In August 2009, the eScience Institute submitted a proposal to the National Science Foundation to upgrade the campus research network, so that we can be better positioned to handle massive amounts of data movement to, from, and within the campus. The proposal is titled "A Campus Network for the Emerging World of Data-Intensive Science".

