Ideas on the ADAPT middle layer
Feel free to make changes to any part of this.
The goal of the middle layer is to store objects in a form that is recoverable while providing quick access to content. To do this, the middle layer should be focused on creating static objects and populating various levels of disposable access and maintenance systems from those static objects.
Overall Design
The middle layer is divided into two layers: bit-level preservation and content preservation. The two components should be able to function separately (i.e., I can toss arbitrary files into bit preservation and retrieve them, and I can store content on a disk with no thought given to bit-level preservation). This should allow the two components to work with various parts of our persistent archive and also allow for distribution of a 'complete' solution. Two scenarios for separating the two components could be:
- use the SRB to hold content objects
- have DSPACE use the bit storage as its backend archive
Items should be stored in a storage vault in a way that allows object recovery even when all supporting software has been destroyed. The first release will assume that file-system recovery is still possible, but nothing else. Objects will be encoded and split into segments using erasure codes. A segment should be able to identify itself and also how it combines with other segments to form the original object. Management software will be built around ensuring that segments are distributed, and around caching information to allow for rapid reassembly of segments.
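As a rough illustration of what a self-identifying segment might look like, here is a minimal sketch. It assumes the simplest possible erasure code (k data segments plus one XOR parity segment, which survives the loss of any single segment) and a one-line JSON header; the header fields and the function names `encode` and `decode` are hypothetical, not a settled format. A real vault would more likely use a stronger code such as Reed-Solomon.

```python
import hashlib
import json
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int = 4) -> list[bytes]:
    """Split data into k data segments plus one XOR parity segment.

    Each segment is a one-line JSON header, a newline, then the payload.
    The header alone says which object the segment belongs to and where
    it fits, so no external index is needed for reassembly.
    """
    size = -(-len(data) // k) or 1          # ceiling division, minimum 1 byte
    padded = data.ljust(size * k, b"\0")
    chunks = [padded[i * size:(i + 1) * size] for i in range(k)]
    chunks.append(reduce(xor_bytes, chunks))  # parity segment, index k
    object_id = hashlib.sha256(data).hexdigest()
    segments = []
    for index, payload in enumerate(chunks):
        header = {
            "object_id": object_id,          # digest of the whole object
            "index": index,                  # position; index k is parity
            "k": k,                          # number of data segments
            "length": len(data),             # original length, strips padding
            "payload_sha256": hashlib.sha256(payload).hexdigest(),
        }
        segments.append(json.dumps(header, sort_keys=True).encode() + b"\n" + payload)
    return segments


def parse(segment: bytes) -> tuple[dict, bytes]:
    """Split a segment into header and payload, verifying the payload digest."""
    header_line, payload = segment.split(b"\n", 1)
    header = json.loads(header_line)
    if hashlib.sha256(payload).hexdigest() != header["payload_sha256"]:
        raise ValueError("corrupt segment payload")
    return header, payload


def decode(segments: list[bytes]) -> bytes:
    """Rebuild the object from its segments, tolerating one missing segment."""
    parsed = [parse(s) for s in segments]
    first = parsed[0][0]
    k, length = first["k"], first["length"]
    by_index = {h["index"]: p for h, p in parsed
                if h["object_id"] == first["object_id"]}
    missing = [i for i in range(k) if i not in by_index]
    if len(missing) > 1 or (missing and k not in by_index):
        raise ValueError("too few segments to rebuild the object")
    if missing:
        # XOR of all surviving segments (data and parity) yields the lost one.
        others = [by_index[i] for i in range(k + 1) if i != missing[0]]
        by_index[missing[0]] = reduce(xor_bytes, others)
    data = b"".join(by_index[i] for i in range(k))[:length]
    if hashlib.sha256(data).hexdigest() != first["object_id"]:
        raise ValueError("rebuilt object does not match its identifier")
    return data
```

Calling `encode(data, k=4)` yields five segments, and `decode` accepts any four of them. Because the header is plain text at the front of each file, a person with nothing but a recovered file system could still sort the segments by hand.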
The AIP (Archival Information Package) should be a static object in the storage vault or filesystem. It should contain information that allows AIPs to link together, along with any structure/metadata mapping of the objects contained within it.
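To make the idea of a static AIP concrete, here is a minimal sketch of a manifest that records the members of an AIP, which metadata describes which content, and links to other AIPs. The format is still an open question, so everything here is an assumption for illustration: the JSON serialization, the class names `Member`, `Link` and `AipManifest`, and all field names are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Member:
    path: str            # name of the file inside the AIP
    vault_object: str    # identifier of the vault object holding its bytes
    role: str            # "content" or "metadata"
    describes: str = ""  # for metadata: path of the content it describes


@dataclass(frozen=True)
class Link:
    relation: str        # e.g. "part-of" or "supersedes"
    target_aip: str      # identifier of the other AIP


@dataclass(frozen=True)
class AipManifest:
    aip_id: str
    members: tuple
    links: tuple

    def to_json(self) -> str:
        """Serialize to stable, human-readable JSON."""
        return json.dumps(asdict(self), sort_keys=True, indent=2)


def from_json(text: str) -> AipManifest:
    """Rebuild a manifest from the JSON produced by to_json."""
    raw = json.loads(text)
    return AipManifest(
        aip_id=raw["aip_id"],
        members=tuple(Member(**m) for m in raw["members"]),
        links=tuple(Link(**l) for l in raw["links"]),
    )


# Hypothetical example: one image, its descriptive metadata, one link.
manifest = AipManifest(
    aip_id="aip-0001",
    members=(
        Member("page1.tif", "vault-obj-a", "content"),
        Member("page1.xml", "vault-obj-b", "metadata", describes="page1.tif"),
    ),
    links=(Link("part-of", "aip-0000"),),
)
```

The classes are frozen to mirror the intent that an AIP never changes once written; an update would produce a new AIP that links back to the old one.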
Components
-
- Storage Vault
- Simple object storage and replication using forward error correction (FEC). Similar to OceanStore, but much simpler. Caching of entire objects should be handled here.
-
-
- digital object
- Using erasure codes, objects are broken up and encoded into redundant, self-identifying segments. Objects and segments must have appropriate headers to allow for reassembly. The goal is for objects to be as immutable and recoverable as possible. This differs from OceanStore/Freenet, where object confidentiality is a large concern.
-
-
- _Replica manager / tracker_
- Make sure enough segments exist and are separated widely enough to prevent failure. Act as a front end to buckets of objects. Should also be the ingest point for items going into the vault, and handle any access/caching that may be necessary. This component should be able to discover complete objects given a pile of segments.
-
-
- Should objects be versioned?
- Is an audit trail necessary at this level, or does encoding provide robust enough assurance?
- How dispersed should objects be? What level of redundancy is really necessary?
- OceanStore has a nice self-identifying object name that may be useful. This involves identifying pieces by running a digest across all segments and tracking neighboring digests.
- Can we work this into a deep archive, where the deep archive is no more than objects on more static media?
-
- AIP
- Higher-level format that contains object linking information. All AIPs should be pushed into the backend vault. Should we have multiple types of AIPs? Should AIPs allow direct access to data in storage vault objects, or should unpacking of vault objects be required?
-
-
- AIP format
- How do we statically record relations between objects, metadata, and other AIPs? If any encryption or other security is required, it should be applied at the AIP level.
-
-
- AIP Cache
- Cache of AIP structures and linking information that allows for rapid retrieval.
-
-
- Should AIPs exist in a tightly coupled file (jar, tar, etc.), or should they be allowed to reference external identifiers? (If external, how do you handle updates?)
- How do AIPs span physical volumes/objects?
- What does AIP linking metadata look like? METS, XFDU, other XML, or something else?
-
- Display
- Database and quick-access items, such as thumbnails, derived from AIPs in the vault. This will likely be tightly coupled to the AIP format that we choose.
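The replica manager's job of discovering complete objects from a pile of segments, and checking how widely they are spread, could be sketched roughly as follows. This assumes each segment header carries `object_id`, `index`, `k` (segments needed) and `n` (segments written), that the scanner adds a `site` label for where the segment was found, and that the code is one where any k distinct segments out of n suffice (true of Reed-Solomon style codes). All of these names, and the function `survey`, are hypothetical.

```python
from collections import defaultdict


def survey(headers: list[dict]) -> dict[str, dict]:
    """Group segment headers by object and report what can be rebuilt.

    Duplicate copies of the same segment index count once toward
    recoverability but still count toward the number of sites.
    """
    indexes = defaultdict(set)   # object_id -> distinct segment indexes seen
    sites = defaultdict(set)     # object_id -> distinct sites holding segments
    params = {}                  # object_id -> (k, n) from the first header seen
    for h in headers:
        oid = h["object_id"]
        indexes[oid].add(h["index"])
        sites[oid].add(h["site"])
        params.setdefault(oid, (h["k"], h["n"]))

    report = {}
    for oid, found in indexes.items():
        k, n = params[oid]
        report[oid] = {
            "present": len(found),
            "recoverable": len(found) >= k,
            "missing": sorted(set(range(n)) - found),  # segments to regenerate
            "sites": len(sites[oid]),
        }
    return report


# Hypothetical pile of headers gathered by scanning three sites.
pile = [
    {"object_id": "a", "index": 0, "k": 2, "n": 3, "site": "s1"},
    {"object_id": "a", "index": 2, "k": 2, "n": 3, "site": "s2"},
    {"object_id": "a", "index": 2, "k": 2, "n": 3, "site": "s3"},
    {"object_id": "b", "index": 1, "k": 2, "n": 3, "site": "s1"},
]
report = survey(pile)
```

The `missing` list tells the manager which segments to regenerate, and a low `sites` count flags objects whose segments are not separated widely enough. Because the survey reads only headers, it needs no database and can be rerun from scratch after a failure.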
Comments, revision notes