The Cornell University Center for Advanced Computing, with partners University at Buffalo and University of California, Santa Barbara, deployed data infrastructure building blocks (DIBBs) for multi-institutional cyberinfrastructure (CI). Our approach combined data analytics and flexible workflow management in a federated cloud model capable of supporting large-scale, shared, and collaborative data analysis. The project was metrics-driven, using system analytics provided by Open XDMoD and DrAFTS to ensure effective resource utilization and optimal time to science.
This project was multi-institutional in two important ways. First, it supported seven science use cases involving researchers and their collaborators at each partner site, and it extended to collaborators located at other institutions. This gave extended research groups with common data interests and requirements access to critical data assets without each group having to replicate them.
Second, our cloud federation model enabled the three partner institutions to share the project’s storage assets, giving their researchers and collaborators elasticity that might not have been financially or logistically possible at any individual site, particularly as the number of researchers requiring large-scale data analysis grew.
An important goal of this project was to demonstrate a new model of cloud federation that is a sustainable and effective way for institutions to augment campus CI for collaborative data analysis. This work included investigating a common allocation mechanism and exploring accounting mechanisms that would provide a transparent resource exchange between partner institutions.