Distributed data access consists of a set of components that provide the wide range of data-related services required by the SIMDAT application scenarios.
The Data infrastructure is based on using and extending existing third-party components wherever possible. SIMDAT focuses on hardening the software components, ensuring that they interoperate with each other and integrate with the GRIA-based SIMDAT Grid infrastructure, and extending functionality or improving performance where required by the application scenarios.
The central elements of the Data infrastructure are based on the OGSA-DAI package. OGSA-DAI provides the framework for accessing file repositories and databases through a Web Service interface regardless of their location.
Automatic distribution, replication and synchronization of data are performed through the IGOR-FS distributed filesystem. IGOR-FS partitions files (and directories) into blocks, each of which is uniquely characterized by its hash value. Blocks are looked up by hash value, and chains of blocks are likewise assembled by referencing hash values. In a Grid, a network of IGOR daemons provides access to file blocks: the daemons can uniquely identify and verify each block regardless of its location (since blocks are indexed by their content, not their location), and they manage adaptive, local caches of blocks. Synchronization of changes is fully automatic. IGOR-FS is designed for the case of one or a few writers and many readers, and changes to a file are automatically propagated, since they amount to creating a new sequence of blocks rather than modifying existing blocks. This scheme also delivers powerful version control functionality.
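The content-addressing idea described above can be illustrated with a minimal Python sketch. It is not IGOR-FS code: the class and function names, the use of SHA-256, and the tiny block size are all assumptions chosen for illustration, and the "store" is just an in-memory dictionary.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical, deliberately tiny so the example is easy to follow


class BlockStore:
    """Toy content-addressed store: each block is keyed by the hash of its content."""

    def __init__(self):
        self.blocks = {}

    def put(self, data):
        key = hashlib.sha256(data).hexdigest()  # SHA-256 is an assumption
        self.blocks[key] = data  # identical content always maps to the same key
        return key

    def get(self, key):
        data = self.blocks[key]
        # A block can be verified wherever it came from: re-hash and compare.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("block content does not match its hash")
        return data


def write_file(store, content):
    """Split content into blocks and return the ordered list of block hashes."""
    return [
        store.put(content[i:i + BLOCK_SIZE])
        for i in range(0, len(content), BLOCK_SIZE)
    ]


def read_file(store, chain):
    """Reassemble a file version from its chain of block hashes."""
    return b"".join(store.get(key) for key in chain)


store = BlockStore()
v1 = write_file(store, b"AAAABBBBCCCC")
# An update produces a new chain; blocks of the old version are never modified.
v2 = write_file(store, b"AAAAXXXXCCCC")
```

In this sketch both chains stay readable after the update, and the unchanged first and last blocks are stored only once, which is the property that makes propagation of changes and access to earlier versions cheap.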
OGSA-DAI is used by SIMDAT as a common interface to local data repositories (for instance, abstracting the large variety of archive systems used by weather centers in the Meteorology application area), and as the standard interface for accessing and manipulating data across a Grid (in the Automotive and Aerospace application scenarios).
IGOR-FS is used in the Pharmaceutical scenario to distribute large gene and protein databases among partners. It is well suited to this task, since only the blocks actually used by an application are transferred, and changes and updates are managed transparently.
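The on-demand transfer behaviour can be sketched as follows. This is a hypothetical illustration, not the IGOR daemon protocol: `RemoteStore`, `CachingReader`, the sample records, and the choice of SHA-256 are all assumptions, and the "remote" side is simulated in memory.

```python
import hashlib


def block_id(data):
    """Identify a block by the hash of its content (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


class RemoteStore:
    """Stands in for a remote partner site; counts how many blocks it serves."""

    def __init__(self, blocks):
        self.blocks = {block_id(b): b for b in blocks}
        self.transfers = 0

    def fetch(self, key):
        self.transfers += 1
        return self.blocks[key]


class CachingReader:
    """Reads a file version block by block, fetching only what is requested."""

    def __init__(self, remote, chain):
        self.remote = remote
        self.chain = chain  # ordered block hashes describing one file version
        self.cache = {}     # local cache of blocks already transferred

    def read_block(self, index):
        key = self.chain[index]
        if key not in self.cache:
            data = self.remote.fetch(key)
            if block_id(data) != key:  # verify content against its hash
                raise ValueError("corrupt block")
            self.cache[key] = data
        return self.cache[key]


# Hypothetical database made of four records, one per block.
records = [b"gene-0001", b"gene-0002", b"gene-0003", b"gene-0004"]
remote = RemoteStore(records)
chain = [block_id(r) for r in records]
reader = CachingReader(remote, chain)

first = reader.read_block(2)  # triggers exactly one transfer
again = reader.read_block(2)  # served from the local cache
```

Only the one block the application touched crosses the (simulated) network; the other three records are never transferred.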
The results of this technology can be found under Grid Solution Portfolio at Data.