ADAPT - A Digital Approach to Preservation Technology

Main
Main.LPE r1.4 - 11 May 2005 - 21:01 - MikeSmorul


Lightweight Preservation Environment (LPE)

Historically, the LPE has been a complete data grid with preservation features. For the purposes of ADAPT, the LPE needs to run as a supplement to other data grid technologies rather than as an independent system.

Proposed System

We propose the following implementation for the next version of the LPE.

1. Built entirely on SRB

Integration with SRB is a requirement at this point, since we must work with SDSC. For this reason, I think we should start there, and leave integration with Globus components until later.

2. Lightweight

We let the SRB do most of the heavy lifting: the SRB protocol moves data, and the SRB MCAT stores location information and locates masters. The LPE would exist as a data manager on a per-master basis, giving a one-to-one correspondence between replica sites and SRB masters. The data manager stores per-replica policy information and enforces it.

3. De-centralized policy

Policy is managed on a replica-by-replica basis. For this version, policy consists of a positive and bounded number of required replicas, a lower-bounded frequency for checking these replicas, and an expiration date (which we don't have to act on). When a file is replicated from one replica site to the next, its policy is replicated as well and enforced independently.
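As a minimal sketch of what such a per-replica policy record could look like, the following assumes Python and invents every name in it (`ReplicaPolicy`, `MAX_REPLICAS`, `MAX_CHECK_INTERVAL`); none of these come from the SRB or the existing LPE. It reads "lower-bounded frequency" as "checks happen at least this often", which is modeled here as an upper bound on the interval between checks. The concrete bounds are placeholders.

```python
import dataclasses
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical bounds; the proposal only says the values are bounded.
MAX_REPLICAS = 16
MAX_CHECK_INTERVAL = timedelta(days=365)


@dataclass(frozen=True)
class ReplicaPolicy:
    required_replicas: int          # positive and bounded
    check_interval: timedelta       # replicas are checked at least this often
    expires: Optional[date] = None  # recorded only; nothing acts on it yet

    def __post_init__(self) -> None:
        if not 1 <= self.required_replicas <= MAX_REPLICAS:
            raise ValueError("required_replicas out of bounds")
        if not timedelta(0) < self.check_interval <= MAX_CHECK_INTERVAL:
            raise ValueError("check_interval out of bounds")

    def copy_for_new_replica(self) -> "ReplicaPolicy":
        # The new replica site gets its own copy and enforces it independently.
        return dataclasses.replace(self)
```

Making the record immutable means a replica site cannot silently drift from the policy it was handed; a policy change would have to be an explicit new record.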

4. Simple interface

By keeping the interface simple and abstract, we can extend this system later beyond the SRB.

PAWN uploads and registers files with the LPE in a single step. It may then optionally ask for proof that the file was stored. An initial push might also initiate a copy into the deep archive.

The inter-manager interface is the same, with a store-and-register operation to create a new replica and a proof operation to verify that the remote replica is correct.

For retrieval, the LPE provides functionality to find replicas and to retrieve a single replica, and it also makes the aforementioned proof operation available to retrieval clients.
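The operations described above could be sketched as a small abstract interface. Everything here is hypothetical: the class and method names, the in-memory stand-in for an SRB-backed manager, the shared dictionary standing in for the MCAT, and in particular the form of the proof (a SHA-256 digest over a caller-supplied nonce plus the stored bytes), which is only one possible design and not something this proposal specifies.

```python
import hashlib
from abc import ABC, abstractmethod
from typing import Dict, List


class DataManager(ABC):
    """Hypothetical interface shared by PAWN, peer managers, and retrieval clients."""

    @abstractmethod
    def store_and_register(self, name: str, data: bytes) -> None: ...

    @abstractmethod
    def prove(self, name: str, nonce: bytes) -> str: ...

    @abstractmethod
    def find_replicas(self, name: str) -> List[str]: ...

    @abstractmethod
    def retrieve(self, name: str) -> bytes: ...


class InMemoryManager(DataManager):
    """Toy implementation; a real one would sit on top of an SRB master."""

    def __init__(self, site: str, catalog: Dict[str, List[str]]) -> None:
        self.site = site
        self.catalog = catalog  # shared dict standing in for the MCAT
        self._files: Dict[str, bytes] = {}

    def store_and_register(self, name: str, data: bytes) -> None:
        # Storing and registering happen together, as a single step.
        self._files[name] = data
        sites = self.catalog.setdefault(name, [])
        if self.site not in sites:
            sites.append(self.site)

    def prove(self, name: str, nonce: bytes) -> str:
        # The manager must read the stored bytes to answer a fresh nonce.
        return hashlib.sha256(nonce + self._files[name]).hexdigest()

    def find_replicas(self, name: str) -> List[str]:
        return list(self.catalog.get(name, []))

    def retrieve(self, name: str) -> bytes:
        return self._files[name]


def replicate(name: str, source: DataManager, target: DataManager,
              nonce: bytes = b"challenge") -> bool:
    """Create a replica on target, then check it with the proof operation."""
    target.store_and_register(name, source.retrieve(name))
    return target.prove(name, nonce) == source.prove(name, nonce)
```

Because the inter-manager path uses the same four operations as the client path, a non-SRB backend would only need to supply another `DataManager` implementation.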

For this to work, I assume self-identifying names. That is, the file name should be a cryptographic digest of the file itself, so that an error in file replication can be immediately detected. It also helps prevent two different files with the same name from showing up on different replica sites at the same time.
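A minimal sketch of self-identifying names, assuming SHA-256 as the digest (the proposal does not name an algorithm) and using hypothetical function names:

```python
import hashlib


def digest_name(data: bytes) -> str:
    """Derive the file name from the content itself (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def verify_replica(name: str, data: bytes) -> bool:
    """A replica is intact only if its content still hashes to its own name."""
    return digest_name(data) == name
```

With this scheme a receiving site can check a freshly copied replica without consulting the sender, and two files with different content cannot share a name unless the digest collides.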

Implementation Issues

The following must be feasible for this to work. The easier these are, the easier it will be to implement the LPE.

1. Master location through federated MCATs

2. Inter-zone replication

3. Finding the actual local filename for a given replica from the master, for local I/O operations

To ensure inter-zone replication, we could code the manager to pick replica sites in other zones before picking more local replica sites.
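One way to express this preference is a stable sort that puts candidates in other zones ahead of local ones. This is only a sketch: `Site`, `pick_replica_sites`, and the idea that each candidate carries a zone label are assumptions for illustration, not part of the SRB API.

```python
from typing import List, NamedTuple


class Site(NamedTuple):
    name: str
    zone: str


def pick_replica_sites(local_zone: str, candidates: List[Site],
                       needed: int) -> List[Site]:
    """Prefer sites in other zones; fall back to local sites only if needed."""
    # sorted() is stable, and False (other zone) sorts before True (local zone),
    # so the original order is kept within each group.
    ordered = sorted(candidates, key=lambda site: site.zone == local_zone)
    return ordered[:needed]
```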

If we wanted to keep things really simple, we could associate the data manager with the MCAT in a federated MCAT system instead of on a master-by-master basis, but this might make verification operations prohibitively expensive.

I think this should get us started with a system that meets our needs in a relatively short amount of time.

-- GaryJackson - 04 May 2005

Comments, etc.

What would having the master location stored in the MCAT cost us later in terms of running the LPE on non-SRB or mixed systems?

Within the SRB, they use DCE-based GUID generation for unique IDs. PAWN already tracks these upon ingest and updates its manifest in the SRB. Switching to a digest is doable.

How would any non-file information be preserved, i.e., hierarchy information in the SRB, as we replicate between zones? Or do we not care, and leave this up to front-end interfaces to present the user with a nice view?

-- MikeSmorul - 11 May 2005



Copyright © 1999-2008 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.