ADAPT - A Digital Approach to Preservation Technology

Main.MalachCollection r1.9 - 02 Nov 2006 - 15:14 - MikeSmorul


Ingesting Malach Interviews

Overview of Collection

The collection contains technical reports and other papers, plus a set of interviews and their descriptions. There are ~52,000 interviews totalling ~116,000 hours. The interviews have been indexed under one of two indexing schemes, referred to here as old and new.

There are three levels of description: collection level, interview level, and segment level. The segment level is indexed either by 1-minute segments (new) or by variable-length, content-based segments (old).

The collection contains a thesaurus with ~40,000 terms. The thesaurus is stored in a set of database tables.

Interview Level

The interview should be treated as the base unit for archiving. Packages should focus on creating 'interview' bundles. An interview comprises the following:
  • processed paper questionnaire: original TIFF scan, re-keyed complete form, and a keyed short form of the questionnaire (name, etc.)
  • interview summary (only in old), free-form text

Segment Level

Each interview is broken up into a set of segments. A segment contains the following:

  • segment boundaries: fixed length (new) or variable length, content based (old)
  • segment summary
  • ASR extracted keywords
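
The interview and segment levels described above could be modelled roughly as follows. This is a minimal sketch, not the project's actual schema: the class names, field names, and the `fixed_length_segments` helper are hypothetical, chosen only to mirror the bullets above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    # Field names are illustrative assumptions, not taken from the collection.
    start_seconds: float
    duration_seconds: float
    summary: Optional[str] = None
    asr_keywords: List[str] = field(default_factory=list)


@dataclass
class Interview:
    interview_id: str
    indexing: str                      # "old" or "new"
    segments: List[Segment] = field(default_factory=list)
    summary: Optional[str] = None      # free-form text, only present for "old"


def fixed_length_segments(total_seconds: float,
                          length_seconds: float = 60.0) -> List[Segment]:
    """Cut an interview into fixed-length segments (the 'new' indexing).

    The last segment is shortened so it does not run past the end.
    """
    segments = []
    start = 0.0
    while start < total_seconds:
        duration = min(length_seconds, total_seconds - start)
        segments.append(Segment(start_seconds=start, duration_seconds=duration))
        start += length_seconds
    return segments
```

For the old indexing, the segment boundaries would come from the existing content-based index rather than from a helper like this one.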

PAWN Integration

Integration will consist of the following:

1. PAWN will need to be made aware of MPEG-7 metadata
Initially, this can be a simple viewer/text editor.

2. Software to encode Malach metadata as MPEG-7
Software is needed to encode the necessary Malach metadata as MPEG-7. The information encoded in the MPEG-7 files will be extracted from the collection-level thesaurus, and metadata that carries time coding will also be included. So far this has been done with a Perl script that parses the Segment.xml file and a segment-time map file.
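
The core of that encoding step is joining per-segment metadata with the segment-time map and expressing the times in MPEG-7 form. The sketch below is written in Python rather than Perl and is not the script mentioned above. The layouts of Segment.xml and the map file are not given here, so it starts from already-parsed dictionaries, and every name in it is hypothetical. It handles whole seconds only.

```python
def media_time_point(seconds: int) -> str:
    """Format an offset in whole seconds as an MPEG-7 style time point."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"T{hours:02d}:{minutes:02d}:{secs:02d}"


def media_duration(seconds: int) -> str:
    """Format a length in whole seconds as an MPEG-7 style duration."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"PT{hours}H{minutes}M{secs}S"


def join_segments(keywords_by_segment, time_map):
    """Combine segment keywords with their time ranges.

    keywords_by_segment: {segment_id: [keyword, ...]}   (assumed shape)
    time_map:            {segment_id: (start_s, end_s)} (assumed shape)
    Returns one record per timed segment, ordered by start time.
    """
    records = []
    for seg_id, (start, end) in sorted(time_map.items(),
                                       key=lambda item: item[1][0]):
        records.append({
            "id": seg_id,
            "start": media_time_point(start),
            "duration": media_duration(end - start),
            "keywords": keywords_by_segment.get(seg_id, []),
        })
    return records
```

Segments that appear in the keyword data but not in the time map are dropped silently in this sketch. A real script would probably want to report them.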

3. Packaging of Malach data
Packages will contain two folders. The first holds the media (mpg, mp2, etc. files), with the MPEG-7 files attached at the folder level. The second holds scanned documents and other supporting information.

Package

  • Root of Interview
    • Audio / Video
      • mpg, mp2, etc... files
      • mpeg7 metadata
    • Supporting Documentation
      • interview summary / questionnaire
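
As an illustration of the package layout above, the following sketch assigns each file of one interview to one of the two folders. The folder names `media` and `documentation`, the `.mpeg7.xml` naming convention, and the `plan_package` function are assumptions made for this example; it only computes target paths and does not touch the file system.

```python
from pathlib import PurePosixPath

# Assumed media extensions; the source lists "mpg, mp2, etc." without a full set.
MEDIA_SUFFIXES = {".mpg", ".mp2"}


def plan_package(interview_id, filenames):
    """Map each file name to its path inside the interview package."""
    root = PurePosixPath(interview_id)
    layout = {}
    for name in filenames:
        suffix = PurePosixPath(name).suffix.lower()
        is_mpeg7 = name.lower().endswith(".mpeg7.xml")  # hypothetical convention
        if suffix in MEDIA_SUFFIXES or is_mpeg7:
            folder = "media"
        else:
            folder = "documentation"
        layout[name] = str(root / folder / name)
    return layout
```

For example, `plan_package("interview-00001", ["tape1.mpg", "interview.mpeg7.xml", "questionnaire.tif"])` places the first two files under `interview-00001/media/` and the scan under `interview-00001/documentation/`.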

  • MPEG-7 Usage: We are following the DAVP MPEG-7 profile. This profile is designed to support audio-visual data, with each MPEG-7 document describing one video or audio item. We will generate one MPEG-7 file per interview, using the Temporal Decomposition tools (11.6.2) to index segment-level information. Specifically, there will be a set of Temporal Decomposition elements, each specifying a time segment using Media Time (6.4.10) and its keywords using 'Text Annotation' elements within the decomposition.
  • MPEG-7 Layout
    • AudioVisual
      • Temporal Decomposition
        • MediaTime - duration of segment
        • TextAnnotation / KeywordAnnotation - up to 3 sections: automatically generated keywords (2) (AUTOKEYWORD2004A1, AUTOKEYWORD2004A2) and manually entered keywords (1) (MANUALKEYWORD)
        • TextAnnotation / FreeText - up to 3 sections: segment summary (1) (SUMMARY) and ASR text (2) (ASRTEXT2003A, ASRTEXT2004A)
      • Temporal Decomposition
      • ...
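
The layout above can be sketched with Python's standard `xml.etree.ElementTree`. This is an illustration under several assumptions, not output of the actual encoder. It wraps each segment in an `AudioVisualSegment` inside its own `TemporalDecomposition`, carries the section labels (SUMMARY, MANUALKEYWORD, and so on) in the `type` attribute of `TextAnnotation`, and uses the MPEG-7 element name `FreeTextAnnotation` for the free-text entries. It also emits the annotations before `MediaTime`, which I believe matches the schema's element order. All of this should be checked against the DAVP profile before use. The input dictionary shape and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

MPEG7_NS = "urn:mpeg:mpeg7:schema:2001"
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"


def _tag(name):
    """Qualify an element name with the MPEG-7 namespace."""
    return f"{{{MPEG7_NS}}}{name}"


def build_interview_mpeg7(segments):
    """Build one MPEG-7 document for an interview from segment records.

    Each record is a dict with 'start' and 'duration' strings and optional
    'keywords' {label: [word, ...]} and 'free_text' {label: text} mappings.
    """
    ET.register_namespace("", MPEG7_NS)  # serialize MPEG-7 as the default namespace
    root = ET.Element(_tag("Mpeg7"))
    desc = ET.SubElement(root, _tag("Description"),
                         {XSI_TYPE: "ContentEntityType"})
    content = ET.SubElement(desc, _tag("MultimediaContent"),
                            {XSI_TYPE: "AudioVisualType"})
    av = ET.SubElement(content, _tag("AudioVisual"))

    for seg in segments:
        decomp = ET.SubElement(av, _tag("TemporalDecomposition"))
        av_seg = ET.SubElement(decomp, _tag("AudioVisualSegment"))

        # Keyword sections, e.g. MANUALKEYWORD or AUTOKEYWORD2004A1.
        for label, words in seg.get("keywords", {}).items():
            ann = ET.SubElement(av_seg, _tag("TextAnnotation"), {"type": label})
            kw = ET.SubElement(ann, _tag("KeywordAnnotation"))
            for word in words:
                ET.SubElement(kw, _tag("Keyword")).text = word

        # Free-text sections, e.g. SUMMARY or ASRTEXT2004A.
        for label, text in seg.get("free_text", {}).items():
            ann = ET.SubElement(av_seg, _tag("TextAnnotation"), {"type": label})
            ET.SubElement(ann, _tag("FreeTextAnnotation")).text = text

        media_time = ET.SubElement(av_seg, _tag("MediaTime"))
        ET.SubElement(media_time, _tag("MediaTimePoint")).text = seg["start"]
        ET.SubElement(media_time, _tag("MediaDuration")).text = seg["duration"]

    return ET.tostring(root, encoding="unicode")
```

A one-minute segment from the new indexing would be passed as, for example, `{"start": "T00:00:00", "duration": "PT1M", "keywords": {"MANUALKEYWORD": ["..."]}}`.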



Copyright © 1999-2008 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.