Benutzer-Werkzeuge

Webseiten-Werkzeuge


docuteam:feeder_steps

Unterschiede

Hier werden die Unterschiede zwischen zwei Versionen angezeigt.

Link zu dieser Vergleichsansicht

docuteam:feeder_steps [2015/04/20 16:36]
christian
docuteam:feeder_steps [2019/01/07 11:24]
Zeile 1: Zeile 1:
-====== Allgemein ======= 
- 
-Die Liste der Operationen kann über targets im build.xml erstellt werden: ''​listOperationUsagePublic''​ und ''​listOperationUsage''​ 
- 
-==== Hilfe-Output ==== 
- 
-Um die Parameter einer Operation zu sehen, kann die entsprechende Klasse ohne Parameter auf der Shell aus dem Verzeichnis ''​%FEEDER_JAVA%''​ aufgerufen werden, bspw. SIP-Extractor:​ 
-<​code>​java ch.docuteam.feeder.qualityassurance.SIPExtractor</​code>​ 
-Ergibt folgenden Output: 
-<​code>​ERROR 2014-01-02T10:​54:​25.140 (SIPExtractor) A wrong number of parameters was passed to the command executor 
-INFO 2014-01-02T10:​54:​25.140 (SIPExtractor) Usage: java ch.docuteam.documill.qualityassurance.SIPExtractor [path/​to/​]SIP 
-Parameters: 
-        [path/​to/​]SIP:​ name of the SIP; if not path is given, it will be expected to be in the location defined by the docudarc.workbench.workdir ... 
-</​code>​ 
- 
-===== Admin ===== 
- 
-==== Version ==== 
- 
-The Version class returns the current version number of the docuteam feeder library and the version numbers of the docuteam libraries it depends on. 
- 
-<​code>​Usage:​ java ch.docuteam.feeder.admin.Version -v 
-docuteam feeder: 2.4.9 (17.04.2015) 
-docuteam darc: 2.14.2 (10.03.2015) 
-docuteam converter: 1.0.4 (10.03.2015) 
-docuteam tools: 1.10.6 (25.02.2015) 
-AIPCreatorETH:​ 1.0.0i (17.02.2015)</​code>​ 
- 
-===== Ingest ===== 
- 
-==== BARSIPConverter ==== 
- 
-The BARSIPConverter converts a BARSIP SIP to a SIP, that  the Matterhorn Profile. 
- 
-<​code>​Usage:​ java ch.docuteam.feeder.ingest.BARSIPConverter [path/​to/​]BAR-SIP [targetFolder] 
-Parameters: 
- [path/​to/​]BAR-SIP:​ name of the BAR-SIP folder or ZIP-file; if no path is given, it will be expected to be in the location defined by the '​feeder.workbench.dropbox'​ property 
- [targetFolder]:​ directory where to move the created SIP to; if omitted, the SIP will be moved to the location defined by the '​feeder.workbench.workdir'​ property</​code>​ 
- 
-==== CheckWorkbenchSpace ==== 
- 
-The CheckWorkbenchSpace checks whether the workbench has the necessary space available for the processing of the given SIP. By default, it calculates with three potential copies for a given SIP on the workbench at the same time. 
- 
-<​code>​Usage:​ java ch.docuteam.feeder.ingest.BARSIPConverter [path/​to/​]BAR-SIP [targetFolder] 
-Parameters: 
- [path/​to/​]BAR-SIP:​ name of the BAR-SIP folder or ZIP-file; if no path is given, it will be expected to be in the location defined by the '​feeder.workbench.dropbox'​ property 
- [targetFolder]:​ directory where to move the created SIP to; if omitted, the SIP will be moved to the location defined by the '​feeder.workbench.workdir'​ property</​code>​ 
- 
-==== Cleanup ==== 
- 
-The Cleanup is used to delete a given SIP from the work folder in the workbench. It will use the workbench defined in the ''​docuteamFeeder.properties''​ file. Optionally, also the preparation directory of the workbench will be cleaned from any SIP version of the same name. 
- 
-<​code>​Usage:​ java ch.docuteam.feeder.ingest.Cleanup [path/​to/​]SIP [prep] 
-Parameters: 
- [path/​to/​]SIP:​ name of the SIP. If not path is given, it will be expected to be in the location defined by the feeder.workbench.workdir property 
- [prep]: if '​true',​ SIPs of the same name in the preparation folder will be removed as well; defaults to '​false'</​code>​ 
- 
-==== CreateEADFile ==== 
- 
-The CreateEADFile operation creates a single EAD file from the EAD chunks of each node of a given SIP and puts it into the ''​finished''​ directory 
- 
-<​code>​Usage:​ java ch.docuteam.feeder.ingest.CreateEADFile [path/​to/​]SIP 
-Parameters: 
- [path/​to/​]SIP:​ name of the SIP; if not path is given, it will be expected to be in the location defined by the feeder.workbench.workdir property</​code>​ 
- 
-==== ExtentCalculator ==== 
- 
-The ExtentCalculator will calculate the amount of files within each folder node and assign its value to the respectives folder element '​extent'​. '​material'​ is set to '​Datei(en)'​. 
- 
-<​code>​Usage:​ java ch.docuteam.feeder.ingest.ExtentCalculator [path/​to/​]SIP 
-Parameters: 
- [path/​to/​]SIP:​ name of the SIP; if not path is given, it will be expected to be in the location defined by the feeder.workbench.workdir property</​code>​ 
- 
-==== SIPFileMigrator ==== 
- 
-The SIPFileMigrator compares the files of a SIP with the settings of a configuration file (migration-config.xml) and converts the files according to the definitions in that file. 
- 
-<​code>​Usage:​ java ch.docuteam.feeder.ingest.SIPFileMigrator [path/​to/​]SIP keepOriginals 
-Parameters: 
- [path/​to/​]SIP:​ name of the SIP; if not path is given, it will be expected to be in the location defined by the feeder.workbench.workdir property 
- keepOriginals:​ { true | false }, indicating whether to keep the original files after the migration process</​code>​ 
- 
-==== SIPRemoveFromDropbox ==== 
- 
-The SIPRemoveFromDropbox operation will remove a given SIP from the dropbox to an indicated folder or deletes it if no target folder is given. 
- 
-<​code>​Usage:​ java ch.docuteam.feeder.ingest.SIPRemoveFromDropbox [path/​to/​]SIP [targetFolder] 
-Parameters: 
- [path/​to/​]SIP:​ path of the SIP; if only the name is given, it will be expected to be in the location defined by the feeder.workbench.dropbox property 
- [targetFolder]:​ directory where to move the SIP to; if omitted, the SIP will be deleted</​code>​ 
- 
-===== Quality Assurance ===== 
- 
-==== FilePathLengthCheck ==== 
- 
-Check if any canonical filepath within a given folder exceeds a given number ​ 
-of allowed characters. \\ 
-Any file or folder exceeding the maximal allowed path lengh will be logged. 
- 
-<​code>​Usage:​ java ch.docuteam.feeder.qualityassurance.FilePathLengthCheck /​absolute/​path/​to/​folder maxAllowedFilePathLength 
-Parameters: 
- /​absolute/​path/​to/​folder:​ absolute path of the folder that should be checked 
- maxAllowedFilePathLength:​ the max allowed number of characters of the canonical file path</​code>​ 
- 
-==== SIPConfirmation ==== 
- 
-Connect to the Fedora repository and get a single PID to identify the SIP. \\ 
-In the sequence, this PID will be used as the main entry point in the 
-repository for the submission. The value will be stored in the <​mets:​OBJID>​ 
-element. 
- 
-<​code>​Usage:​ java ch.docuteam.feeder.qualityassurance.SIPConfirmation [path/​to/​]SIP [PIDNamespace[:###​]] 
-Parameters: 
- [path/​to/​]SIP:​ name of the SIP. If not path is given, it will be expected to be in the location defined by the feeder.workbench.workdir property 
- [PID namespace[:###​]]:​ namespace for new PID or complete PID to use for the object; if omitted, the standard namespace from the submission agreement will be used; if the submission agreement cannot be found, the default namespace of the Fedora repository will be used.</​code>​ 
- 
-==== SIPConvertToSafeFileNames ==== 
- 
-Rename files containing special characters. \\ 
-Safe filenames contain only the characters A-Z, a-z, 0-9, and "​_.-"​. 
- 
-<​code>​Usage:​ java ch.docuteam.feeder.qualityassurance.SIPConvertToSafeFileNames [path/​to/​]SIP 
-Parameters: 
- [path/​to/​]SIP:​ name of the SIP; if not path is given, it will be expected to be in the location defined by the feeder.workbench.workdir property</​code>​ 
- 
-==== SIPDeleteBackupFiles ==== 
- 
-Delete backup files of a SIP. \\ 
-A list of filename patterns must be supplied to specify which files to delete. 
- 
-<​code>​Usage:​ java ch.docuteam.feeder.qualityassurance.SIPDeleteBackupFiles [path/​to/​]SIP ​ [filenamePattern filenamePattern ...] 
-Parameters: 
- [path/​to/​]SIP:​ name of the SIP; if not path is given, it will be expected to be in the location defined by the feeder.workbench.workdir property 
- [filenamePattern filenamePattern ...]: a list of filename patterns (NOT case-sensitive,​ '​*'​ is wildcard, but is only allowed at the beginning or end of the pattern). Files matching any one of this patterns will be deleted</​code>​ 
- 
-==== SIPExtractor ==== 
- 
-Extract a zipped SIP into the work folder of the workbench. \\ 
-The optional second argument can be used to indicte a different target folder. 
- 
-<​code>​Usage:​ java ch.docuteam.feeder.qualityassurance.SIPExtractor [path/​to/​]SIP [targetdir] 
-Parameters: 
- [path/​to/​]SIP:​ name of the SIP; if not path is given, it will be expected to be in the location defined by the feeder.workbench.workdir property 
- [targetdir]:​ target directory; absolute path of the directory where to unzip the SIP to. Optional, default to the workdir defined in the docuteamFeeder.properties</​code>​ 
- 
-==== SIPFixityCheck ==== 
- 
-Check the files contained for conformance with the checksums in the METS 
-file. \\ 
-The results of the check will be written in the form of PREMIS events as 
-inline xml code into the METS file. 
- 
-<​code>​Usage:​ java ch.docuteam.feeder.qualityassurance.SIPFixityCheck [path/​to/​]SIP 
-Parameters: 
- [path/​to/​]SIP:​ name of the SIP; if not path is given, it will be expected to be in the location defined by the feeder.workbench.workdir property</​code>​ 
- 
-==== SIPPathLengthCheck ==== 
- 
-Check if any canonical file path within the SIP exceeds the maximal allowed path length. \\ 
-Any file or folder exceeding the maximal allowed path lengh will be logged. 
- 
-<​code>​Usage:​ java ch.docuteam.feeder.qualityassurance.SIPPathLengthCheck [path/​to/​]SIP ​ maxAllowedFilePathLength 
-Parameters: 
- [path/​to/​]SIP:​ name of the SIP; if not path is given, it will be expected to be in the location defined by the feeder.workbench.workdir property 
- maxAllowedFilePathLength:​ the max allowed number of characters of the canonical file path</​code>​ 
- 
-==== SIPSubmissionAgreementCheck ==== 
- 
-Check the files formats contained for compliance with the submission agreement. \\ 
-There are two modes: In the first mode (removeBadFiles = false), any files not 
-complying with the submission agreement will be listed (using WARN log entries) and an 
-error code will be returned. \\ 
-In the second mode (removeBadFiles = true), any files not complying with the 
-submission agreement will be deleted from the SIP, and the modified METS.xml 
-will be saved (the original SIP remains untouched as a backup). 
- 
-<​code>​Usage:​ JAVA ch.docuteam.feeder.qualityassurance.SIPSubmissionAgreementCheck [path/​to/​]SIP [removeBadFiles] 
-Parameters: 
- [path/​to/​]SIP:​ name of the SIP; if not path is given, it will be expected to be in the location defined by the feeder.workbench.workdir property 
- [removeBadFiles]:​ optional, { true | false }; indicating whether to automatically remove files that are not valid according to the submission agreement</​code>​ 
- 
-==== SIPVirusCheck ==== 
- 
-Check each file of the SIP for viruses using a ClamAV (www.clamav.net). \\ 
-A running clamav daemon is required. Depending on the second argument, it 
-will either throw exceptions or automatically delete infected files. 
- 
-<​ocde>​Usage:​ java ch.docuteam.feeder.qualityassurance.SIPVirusCheck [path/​to/​]SIP deleteInfected 
-Parameters: 
- [path/​to/​]SIP:​ name of the SIP; if not path is given, it will be expected to be in the location defined by the feeder.workbench.workdir property 
- deleteInfected:​ if '​true',​ the operation automatically removes infected files from the SIP</​code>​ 
- 
-===== Storage ===== 
- 
-==== ChecksumChecker ==== 
- 
-The ChecksumChecker will check the objects with ORIGINAL datastreams in fedora against the generated checksum. \\ 
-It will use the FEEDER_JAVA system variable to locate configuration files 
- 
-<​code>​Usage:​ java ch.docuteam.feeder.storage.ChecksumChecker recipient [namespace] [namespace] ... 
-Parameters: 
- mailto:​recipient@email.com 
-Optional Parameters: 
- [namespace] [namespace] ...: Fedora namespaces separated by space. If no namespace is given, all datastreams are checked.</​code>​ 
- 
-==== DIPDeliverer ==== 
- 
-The DIPDeliverer gets the datastream(s) for a provided fedora PID or file format PUID, repackages each as a DIP  
-and stores them in a given directory. 
- 
-<​code>​Usage:​ java ch.docuteam.feeder.storage.DIPDeliverer['​pid'​|'​puid'​] [PID|PUID] [targetLocation] 
-Parameters: 
- ['​pid'​ | '​puid'​] (if '​pid'​ then provide a fedora PID, if '​puid'​ then provide a pronom PUID) 
- [PID|PUID] (PID = fedora persistent unique identifier, PUID = pronom persistent unique identifier) 
-Optional Parameters: 
- [targetLocation]:​ Location where to save DIP.</​code>​ 
- 
-==== FedoraObjectUpdater ==== 
- 
-This operation uploads new versions of object to the fedora server. 
- 
-<​code>​Usage:​ java ch.docuteam.feeder.storage.FedoraObjectUpdater [path/​to/​]SIP 
-Parameters: 
- [path/​to/​]SIP:​ name of the SIP. If not path is given, it will be expected to be in the location defined by the feeder.workbench.workdir property</​code>​ 
- 
-==== FOXMLCreator ==== 
- 
-The FOXMLCreator converts a given METS package into separate FOXML 
-(Fedora Object) files. \\ 
-It will use the current working directory or - if not available - the FEEDER_JAVA system variable to locate configuration files 
-and the workbench defined in the docuteamFeeder.properties file. \\ 
-The code makes use of the Directory Ingest tool which is available from the 
-Fedora website. Modifications were done to support distinctions between root 
-folders and folders through conversion rules (crules.xml) and to be able to 
-handle already gotten PIDs during the ingest process. 
- 
-<​code>​Usage:​ java ch.docuteam.feeder.storage.FOXMLCreator [path/​to/​]SIP 
-Parameters: 
- [path/​to/​]SIP:​ name of the SIP. If not path is given, it will be expected to be in the location defined by the feeder.workbench.workdir property</​code>​ 
- 
-==== FOXMLIngester ==== 
- 
-The FOXMLIngester will transfer a given list of FOXML (Fedora Object) files to a Fedora repository for storage. \\ 
-It will use the FEEDER_JAVA system variable to locate configuration files 
-and the workbench defined in the docuteamFeeder.properties file. 
- 
-<​code>​Usage:​ java ch.docuteam.feeder.storage.FOXMLIngester [path/​to/​]SIP [keepFOXML] 
-Parameters: 
- [path/​to/​]SIP:​ name of the SIP. If no path is given, it will be expected to be in the location defined by the feeder.workbench.finished property 
- keepFOXML: One of { true | false }, indicating whether to keep the FOXML files after a successful ingest; defaults to '​true'</​code>​ 
- 
-==== METSValidator ==== 
- 
-The METSValidator validates the mets xml file with the linked schema 
-definitions and places the namespace declarations from the root element to 
-the respective elements. \\ 
-This is a necessary preparation for the mets xml when it has to be split up 
-into different parts, as is the case when several foxml files are being 
-created out of the SIP (FOXMLCreator). 
- 
-<​code>​Usage:​ java ch.docuteam.feeder.storage.METSValidator [path/​to/​]SIP [withEAD] 
-Parameters: 
- [path/​to/​]SIP:​ path of the SIP; if not path is given, it will be expected to be in the location defined by the feeder.workbench.workdir property 
- [withEAD]: whether to include EAD as descriptive metadata and create a datastream in the fedora objects; defaults to false</​code>​ 
- 
-==== PIDAssigner ==== 
- 
-The PIDAssigner will obtain PIDs from fedora and assign them to the nodes in the sip. \\ 
-It will use the FEEDER_JAVA system variable to locate configuration files and 
-the workbench defined in the docuteamFeeder.properties file. 
- 
-<​code>​Usage:​ java ch.docuteam.feeder.storage.PIDAssigner [path/​to/​]SIP 
-Parameters: 
- [path/​to/​]SIP:​ name of the SIP. If no path is given, it will be expected to be in the location defined by the feeder.workbench.workdir property</​code>​ 
- 
-==== PIDListPublisher ==== 
- 
-The PIDListPublisher saves/sends the file '​PIDs.txt'​ which results from the FOXMLIngester class to a given URL. 
- 
-<​code>​Usage:​ java ch.docuteam.feeder.storage.PIDListPublisher [path/​to/​]SIP receiverURL 
-Parameters: 
- [path/​to/​]SIP:​ path of the SIP; if no path is given, it will be expected to be in the location defined by the feeder.workbench.finished property 
- receiverURL:​ An URL in the style of { file: | mailto: }, indicating whether to put/send the list with PIDs.</​code>​ 
- 
-==== RenameSIPasAIPforIaas ==== 
- 
-The operation RenameSIPasAIPforIaas renames an SIP using the pid of the root element of the mets file as prefix. 
- 
-<​code>​Usage:​ java ch.docuteam.feeder.storage.RenameSIPasAIPforIaaS [path/​to/​]SIP [targetFolder] 
-Parameters: 
- [path/​to/​]SIP:​ path of the SIP; if only the name is given, it will be expected to be in the location defined by the feeder.workbench.dropbox property 
- [targetFolder]:​ directory where to put the AIP to; if omitted, the AIP will be copied to standard output directory '​4_output'</​code>​ 
- 
-==== UpdateExcelWithPID ==== 
- 
-This class will write PIDs from an SIP's nodes into excel sheet(s). The excel 
-sheet(s) must have a column with a label of either '​identifier'​ or '​id'​ in 
-the first row. PIDs will be written into the column with the header string 
-'​PID'​ or – if such a column is not available – into the next free column. 
- 
-<​code>​Usage:​ java ch.docuteam.feeder.storage.UpdateExcelWithPID [path/​to/​]SIP path/​to/​folder/​with/​excel 
-Parameters: 
- [path/​to/​]SIP:​ path to the SIP; if relative path is given, try to find it in the workbench'​s working directory 
- path/​to/​folder/​with/​excel:​ path to the excel files to be updated</​code>​ 
- 
-===== Submission ===== 
- 
-==== AgreementsOverviewGenerator ==== 
- 
-Creates a simple overview of submission agreements located in a given folder. 
-This is done by XSL transformations,​ for which the class looks for any 
-submission agreement files in the given directory and lists them in a simple 
-xml structure: 
-<​code>​ 
-<​safilelist>​ \\ 
-  <sa_1 /> \\ 
-  <sa_2 /> \\ 
-  ... \\ 
-  <sa_x /> \\ 
-</​safilelist></​code>​ 
- 
-<​code>​Usage:​ java ch.docuteam.feeder.submission.AgreementsOverviewGenerator agreements_directory type output_directory 
-Parameters: 
- agreements_directory:​ location where the collection of submission agreements can be found 
- type: one of { Hierarchy | Flat | CSV }, defining the structure of the resulting overview file 
- output_directory:​ target location for the created overview file, defaults to the directory, where the agreements are located (args[0])</​code>​ 
- 
-==== CheckFolder ==== 
- 
-Check if sip size and size of each file in folder and file paths within the SIP exceeds the maximal allowed provided value. \\ 
-Any file or folder exceeding the maximal allowed size will be logged. 
- 
-<​code>​Usage:​ java ch.docuteam.feeder.submission.CheckFolder [/​path/​to/​]folder maxTotalSize maxSingleFileSize maxFilePathLength 
-Parameters: 
- [/​path/​to/​]folder:​ path of the folder to check; if not path is given, it will be expected to be in the location defined by the feeder.workbench.workdir property 
- maxTotalSize:​ the max allowed size the folder may have 
- maxSingleFileSize:​ the max allowed size any of the files contained by the sip 
- maxFilePathLength:​ the max allowed length of file paths within the folder</​code>​ 
- 
-==== CreateSIPsFromFileOrFolder ==== 
- 
-The CreateSIPsFromFileOrFolder operation will create SIPs from a given file 
-or folder. If the source is a folder, a parameter will define whether a 
-single SIP or separate SIPs for each child should be created. 
- 
-<​code>​Usage:​ java ch.docuteam.feeder.submission.CreateSIPsFromFileOrFolder source recursive saID dssID author [dropbox] 
-Parameters: 
- source: file or folder for which an SIP should be generated 
- split: if '​true',​ a separate SIP will be created for each file/folder within the source (assuming the source is a folder) 
- saID: value to use for referencing a submission agreement in the SIP 
- dssID: value to use for referencing a data submission session of the respective submission agreement 
- author: value to use as the creator for the SIP 
- [dropbox]: optional location where to put the SIPs; if omitted the property '​feeder.workbench.dropbox'​ defined in the docuteamFeeder.properties will be used</​code>​ 
- 
-==== SubmitSIPsFromFolder ==== 
- 
-The SubmitSIPsFromFolder will use the given arguments for selecting SIPs in a 
-folder and submitting them to a number of workflows using the feeder 
-REST-interface. 
- 
-<​code>​Usage:​ java ch.docuteam.feeder.submission.SubmitSIPsFromFolder dropbox filter feeder_url workflows user password useAbsolutePaths checkEmptyQueue [maxNumberSIPs] 
-Parameters: 
- dropbox: path to the folder containing the SIPs 
- errorbox: path to the folder where to put unsuccessful SIPs 
- filter: regex filter string for the SIPs within the dropbox; put the regex expression into quotation marks! 
- feeder_url:​ URL pointing to the feeder main page, f.ex. http://​localhost/​feeder 
- workflows: comma separated list of workflows to execute on each SIP 
- user: username for feeder 
- password: password for feeder 
- useAbsolutePaths:​ true/false, indicating whether to submit SIPs by absolute paths or just their filenames 
- checkEmptyQueue:​ true/false, indicating whether to check if the queue is empty before submitting new SIPs 
- [maxNumberSIPs] (optional): maximum number of SIPs to send to feeder; if omitted, all SIPs matching the filter string will be submitted</​code>​ 
- 
-==== WebjaxeAgreementCollector ==== 
- 
-The WebjaxeAgreementCollector will look for the any submission agreements created within the 
-webjaxe editor and copy them to a given directory. 
- 
-<​code>​Usage:​ java ch.docuteam.feeder.submission.WebjaxeAgreementCollector target_directory [webjaxe_home] 
-Parameters: 
- target_directory:​ location where to store the submission agreements xml files 
- [webjaxe_home]:​ optional location of the webjaxe installation directory, usually in the webservers htdocs directory. If omitted, the environment variable $WEBJAXE_HOME will be used.</​code>​ 
- 
-===== Util ===== 
- 
-==== MailSender ==== 
- 
-This MailSender operation sends an email to the given recipient with optional attachments. 
- 
-<​code>​Usage:​ java ch.docuteam.feeder.util.MailSender receiver subject text [attachment1 [attachment2 [...] ] ] 
-Parameters: 
- receiver: the receiver'​s e-mail address(es),​ comma separated if several 
- subject: the mail subject 
- text: the message text 
- attachments:​ filepaths to attachments;​ if the first attachment is '​zip',​ all attachments will be zipped into a single attachment</​code>​ 
  
docuteam/feeder_steps.txt · Zuletzt geändert: 2019/01/07 11:24 (Externe Bearbeitung)