DocuteamDublinCore1.0 SIP format

General information

Definition

  • A Docuteam DublinCore SIP is a .ZIP file containing a folder named sip, which is a bagit container.
  • The bagit must be created with at least sha256 checksums (other checksum algorithms supported by bagit are optional).
  • Inside the bagit container, a hierarchy of folders holds the data objects, each described by XML DublinCore metadata.
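
A quick check of this top-level layout can be sketched in Python. This is a minimal sketch under assumptions: the file names bagit.txt and manifest-sha256.txt come from the BagIt specification rather than from this page, the rule that nothing may live outside sip/ is an interpretation of the definition, and check_sip_zip is a hypothetical helper name. It does not verify the checksums themselves.

```python
import zipfile


def check_sip_zip(zip_path):
    """Return a list of problems found in the top-level layout of a SIP ZIP."""
    problems = []
    with zipfile.ZipFile(zip_path) as zf:
        names = set(zf.namelist())
    # Assumption: every entry lives below the top-level folder "sip".
    if any(not name.startswith("sip/") for name in names):
        problems.append("entries found outside the 'sip' folder")
    # Assumption: standard BagIt tag file, sha256 payload manifest,
    # and the root metadata file of the payload.
    for required in ("sip/bagit.txt", "sip/manifest-sha256.txt", "sip/data/dc.xml"):
        if required not in names:
            problems.append("missing " + required)
    return problems
```

An empty list means the archive has the expected skeleton; a full validation would still need a BagIt library to verify the manifests.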

References

bagit container structure specification

Within the zipped bagit, a SIP is organized as follows:

  • the bagit contains at least sha256 checksums
  • the root folder, corresponding to the root object within the SIP, is named "data" (this is handled automatically by bagit libraries)
  • subfolders may be named freely
  • subfolders may be nested recursively
  • each folder (at all levels) contains a mandatory metadata file, always named "dc.xml"
  • in addition, each folder (at all levels) may contain either (but not both!):
    • one or more subfolders
    • one data file, which may be named freely (except "dc.xml")

A more formal structure definition:

<rootfolder>     ::= <metadata file> <children>?
<metadata file>  ::= dc.xml
<children>       ::= <folder>+ | <file>
<folder>         ::= <metadata file> <children>?
<file>           ::= filename.ext
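
The folder rules above lend themselves to a recursive check. The following is a minimal sketch, not an official validator; check_folder is a hypothetical helper name, and it expects the path of the unpacked "data" folder.

```python
import os


def check_folder(path):
    """Recursively check one payload folder; return a list of problems."""
    problems = []
    entries = sorted(os.listdir(path))
    # Every folder, at every level, needs its own dc.xml.
    if "dc.xml" not in entries:
        problems.append(path + ": missing dc.xml")
    others = [e for e in entries if e != "dc.xml"]
    folders = [e for e in others if os.path.isdir(os.path.join(path, e))]
    files = [e for e in others if e not in folders]
    # A folder holds either subfolders or exactly one data file, never both.
    if folders and files:
        problems.append(path + ": mixes subfolders and data files")
    elif len(files) > 1:
        problems.append(path + ": more than one data file")
    # Descend into each subfolder and collect its problems as well.
    for name in folders:
        problems.extend(check_folder(os.path.join(path, name)))
    return problems
```

Calling check_folder on the "data" folder of an unpacked SIP returns an empty list when the structure follows the rules.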

container structure examples

example 1: container structure with only one file

data/
├── dc.xml
└── filename1.ext

example 2: container structure with several files

data/
├── dc.xml
├── folder1
│   ├── dc.xml
│   └── fileA.ext
├── folder2
│   ├── dc.xml
│   └── fileB.ext
└── folder3
    ├── dc.xml
    └── fileC.ext

example 3: complex structure with several files

data/
├── dc.xml
├── folder1
│   ├── dc.xml
│   ├── folder2
│   │   ├── dc.xml
│   │   └── file3.ext
│   └── folder4
│       ├── dc.xml
│       └── folder5
│           ├── dc.xml
│           └── file5.ext
├── folder6
│   ├── dc.xml
│   └── file6.ext
└── folder7
    ├── dc.xml
    └── folder8
        ├── dc.xml
        └── folder9
            ├── dc.xml
            └── file8.ext

metadata specification

Metadata is restricted to the 15 elements of the Dublin Core Metadata Element Set, version 1.1 (see http://dublincore.org/documents/dcmi-terms/#section-3).

In addition, the following constraints apply:

  1. The "Identifier" field is mandatory at each level in "dc.xml". It must contain:
    • At each level: the client application's identifier of the object, prefixed with "clientid:", e.g. "clientid:1234567" or "clientid:d4FTw3v6T"
    • At root level: additionally, a mandatory identifier with the customer namespace in the repository (this is often the ISIL code), prefixed with "namespace:", e.g. "namespace:CH-1234-1"
  2. The "Title" field is mandatory at each level in the "dc.xml" file. It is not repeatable.
  3. The other 13 fields are optional and repeatable. They are:
    • Creator (e.g. the authors, one per field repetition, that can be persons or institutions)
    • Subject (typically keywords, one per field repetition)
    • Description (a textual description of the object or folder)
    • Publisher
    • Contributor
    • Date (use ISO-8601, e.g. 2018-11-30)
    • Type
    • Format
    • Source
    • Language
    • Relation
    • Coverage
    • Rights
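
The mandatory-field constraints can be checked with the Python standard library. This is a minimal sketch, assuming the Dublin Core elements are direct children of the document's root element; check_dc and its is_root flag are hypothetical names, and the optional fields are not inspected.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core elements in ElementTree's {uri}tag notation.
DC = "{http://purl.org/dc/elements/1.1/}"


def check_dc(xml_text, is_root=False):
    """Return a list of constraint violations for one dc.xml document."""
    root = ET.fromstring(xml_text)
    problems = []
    # Title: mandatory and not repeatable.
    titles = root.findall(DC + "title")
    if len(titles) != 1:
        problems.append("expected exactly one dc:title, found %d" % len(titles))
    # Identifier: a clientid: value at every level.
    ids = [(e.text or "").strip() for e in root.findall(DC + "identifier")]
    if not any(i.startswith("clientid:") for i in ids):
        problems.append("missing clientid: identifier")
    # Identifier: a namespace: value at root level only.
    if is_root and not any(i.startswith("namespace:") for i in ids):
        problems.append("missing namespace: identifier at root level")
    return problems
```

Pass is_root=True for the dc.xml directly inside "data" and leave the default for every other level.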

metadata examples

example 1: minimal metadata at root level

<?xml version="1.0" encoding="UTF-8"?>

<metadata
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:dc="http://purl.org/dc/elements/1.1/">

<dc:title>Minimalist Example</dc:title>
<dc:identifier>namespace:CH-123456-12</dc:identifier>
<dc:identifier>clientid:12345</dc:identifier>

</metadata>

example 2: full metadata at root level

<?xml version="1.0" encoding="UTF-8"?>

<metadata
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  
<dc:title>All fields are set</dc:title>
<dc:creator>Atreid, Leto</dc:creator>
<dc:creator>docuteam</dc:creator>
<dc:subject>dublincore</dc:subject>
<dc:subject>package</dc:subject>
<dc:subject>format</dc:subject>
<dc:description>Description of the docuteam dublin core package format, version 1.0.</dc:description>
<dc:publisher>docuteam</dc:publisher>
<dc:contributor>Smith, John</dc:contributor>
<dc:contributor>Jaquard, Paul</dc:contributor>
<dc:date>2018-11-05</dc:date>
<dc:type>Text</dc:type>
<dc:format>application/pdf</dc:format>
<dc:identifier>namespace:CH-123456-12</dc:identifier>
<dc:identifier>clientid:999full</dc:identifier>
<dc:source>Dublin Core Package Structure (https://docs.google.com/document/d/1lxqiqkmlNYVWlwJSsIe4b5DwJxN6DZqNvpo0MouAFIA/edit)</dc:source>
<dc:language>en</dc:language>
<dc:relation>docuteam bridge api for client applications (https://docs.google.com/document/d/1GTHuk0lme_fLlZZ-An8lEy2f2joMkjasHHAt0Asri_0/edit)</dc:relation>
<dc:coverage>2018-2022</dc:coverage>
<dc:coverage>Baden</dc:coverage>
<dc:rights>CreativeCommons CC-By</dc:rights>

</metadata>
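
Files like the examples above can also be generated instead of written by hand. The following is a minimal sketch using Python's ElementTree; build_dc and its parameters are hypothetical names, and the output omits both the XML declaration and the xsi namespace declaration, which the examples declare but do not use.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# Serialize Dublin Core elements with the conventional "dc" prefix.
ET.register_namespace("dc", DC_NS)


def build_dc(title, client_id, namespace=None, **optional):
    """Build the text of a dc.xml document.

    namespace is only needed at root level; optional maps a Dublin Core
    element name (e.g. creator) to a list of values, one per repetition.
    """
    root = ET.Element("metadata")
    ET.SubElement(root, "{%s}title" % DC_NS).text = title
    if namespace is not None:
        ET.SubElement(root, "{%s}identifier" % DC_NS).text = "namespace:" + namespace
    ET.SubElement(root, "{%s}identifier" % DC_NS).text = "clientid:" + client_id
    for field, values in optional.items():
        for value in values:
            ET.SubElement(root, "{%s}%s" % (DC_NS, field)).text = value
    return ET.tostring(root, encoding="unicode")
```

For instance, build_dc("Minimalist Example", "12345", namespace="CH-123456-12") yields the content of example 1; the caller still has to prepend the XML declaration when writing the file.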

docuteam/sip_dc_public_documentation.txt · Last modified: 2019/11/05 08:26 by Frédéric Noyer