iridium – dissemination at Jisc MRD02 Achievements, Challenges and Recommendations Programme Workshop March 2013

The iridium project (@iridium_mrd) is attending the Jisc MRD02 Achievements, Challenges and Recommendations Programme Workshop March 25-26 2013.

Ben Allen will be presenting on observations of the technical landscape.

We will also present a poster on key project outputs.




iridium – workshop talk and dissemination at JISC Progress Meeting, Nottingham

The iridium project presented at the JISC MRD02 Progress Meeting in Nottingham. The two day schedule from the event is here, together with the Programme introductory/close slides.

Workshop topics were:

  • Institutional RDM policies; developing an institutional strategy and an ‘EPSRC’ roadmap
  • Managing active data: storage, access, academic dropbox services
  • Data management planning: developing good practice and providing effective support
  • Data repositories and storage: options for repository service solutions
  • Training & guidance
  • Triage and handover: what to keep and where to entrust it? Selection and appraisal, deposit and handover
  • Business case: covering roles, responsibility, costing, sustainability, advocacy etc
  • Data catalogues: metadata profiles, identifiers

Individual projects were encouraged to contextualise presentations around the following themes:

[1] “what has worked/is working”
[2] “what lessons you have learned and how generalisable these may be”
[3] “what challenges remain”
[4] “how such challenges may be approached and what your institution/project intends to do”
[5] “what DCC / MRD activity you think may help make the challenge more tractable”

iridium ‘support’ presentation within ‘Training & Guidance’ session:

iridium presentation

iridium_JISC_Progress_25_10_2012_v4_web_sml_LW [.pdf]

We also presented two posters, one on the research data catalogue proof-of-concept and the second on our thematic analysis requirements gathering.

Other project presentations from the Programme are available here.

iridium – summary of online RDM requirements gathering survey findings

Quantitative online survey –  summary

Key Findings:

  • One hundred and twenty eight projects completed the online survey and over half of the projects are from the Faculty of Medical Sciences (nearly 52%).
  • Over 97% of projects’ data is in digital / physical format.
  • Generally projects have many files e.g. 23.4% of projects have between 100- 1000 files, 28.1% between 10-100 files and 19.5% have between 1000 – 10000 files.
  • Thirty-one percent of projects’ files take up to 4GB of space. Just over 11% of projects’ files take up between 64GB and 0.5TB. One project had more than 100TB, but no project required more than 1PB.
  • For nearly 54% of projects space required at collection is greatly different from space required after processing / analysis. However for nearly 29% of projects the space was not greatly different.
  • File formats varied widely between projects, and some projects used many different formats. However, Excel was the most common file format.
  • About 30% of projects said they store their data on ISS managed systems; about 18% of projects used academic unit managed systems and about 21% used personal systems. A small number of projects used external systems / services such as cloud and SurveyMonkey.
  • The largest group of projects (nearly 43%) intend to keep data for 5 to 10 years. Just over 29% intend to keep data for 10 – 25 years and about 17% intend to keep data for more than 25 years. Eleven percent intend to keep data for 1 – 5 years.
  • 93% of projects have multiple copies / partial back up of data.
  • Nearly 49% of the projects have tested how successful it will be to retrieve backed up data.
  • Just over 73% of projects share their data with others within the University.
  • Just over 50% of projects share data externally.
  • Of 64 projects (50%) who share data external to the University nearly 30% don’t have any agreement in place while nearly 33% have other types of agreement not specified.
  • About 56% of projects have a data management plan or partial / informal plan.
  • Nearly 65% of projects don’t have any specific tool for RDM.
  • The largest group of projects (46%) said that they do not have any deletion policy, and just under 17% have a deletion policy.
  • For over 75% of projects data have to be quite secure or very secure.
  • 84% of projects said they store their data quite securely or very securely.
  • Over 90% of projects used password, anonymisation or physical measures for data security.
  • Not many projects are aware of the policies and legislation that apply to their data e.g. only 63% of projects are aware of DPA and only 40% are aware of FOI.
  • 64% of projects said that the PI should have the primary responsibility for RDM support. Next in line was the Research Associate with just 11%.
  • 55% of projects believe that going forward the PI should still have the primary responsibility for RDM support. All the other officers got less than 10% of the vote except for computing support officer where just over 17% of projects think they should have the primary responsibility for RDM support.
  • Only 5 projects said they are aware of training sessions and materials on RDM.
  • 60% of projects gave a positive response to make their research data publicly available at the end of the project.
  • 73% of projects have not deposited any of their data in a data repository.
  • Nearly 60% of projects are willing to submit data to a data repository.
  • An overwhelming majority of projects (nearly 80%) are happy to submit data to a repository at the publication stage.
  • Nearly 41% of projects are willing to make data supporting any publication available immediately.
  • Nearly 73% of projects are willing to share data if they have control over who can access the data.
  • There is no clear consensus from projects on intellectual property rights (IPR), just over 30% of projects believe that it is owned by the University and about 17% of projects do not know; just over 19% think it belongs to their research group and 10.4% said other. For about 7% and 15% of projects it belonged to the funder and the researcher respectively.
  • The majority of projects were funded by either charity or research council 35.5% and 31.4% respectively, that is, a total of nearly 67%.

For more details, see full survey report:

iridium – poster dissemination at JISC Progress Meeting, Nottingham

Janet Wheeler and I will be attending the JISC Progress Meeting/DCC event in Nottingham and are presenting a poster on developing an RDM tool to support implementation of policy principles.

JISC MRD Progress Meeting poster


iridium – poster presentation of RDM thematic analysis at Digital Research 2012, Oxford

iridium project team members from the Digital Institute presented a poster on the JISC-funded thematic analysis work carried out with local research and related staff to gather and understand RDM requirements. The work was presented at the recent Digital Research 2012 conference in Oxford.

iridium DI Digital Research 2012 poster


SWORD v2 – From clueless to claymore

What follows is a summary of my steps along the path from investigating what the sword technology is to being able to actually start coding something useful. I should probably point out that the beginning of this post can be read by less technical readers as a quick overview, but the later sections assume that you…

  • Have some knowledge of coding Java
  • Have worked with Java server containers (e.g. Tomcat) before
  • Can place the libraries in an IDE like NetBeans/Eclipse to do “something” with them

(Since my investigations centred around sword in conjunction with Sakai and e-science central, my language of choice is Java.)

I should also declare that I still don’t fully understand all of the implementation but this should help you along your way if you’re just starting out!

Taking the Sword course

My first port of call was the SWORD website itself, which will point you to some useful videos and slides to give you insight into what the technology is and what it can be used for. In short, this is what the “Sword Course” will teach you…

An Introduction To SWORD (Video/Slides)

What it is:

The “Simple Webservice Offering Repository Deposit” technology (or SWORD for short) intentionally only deals with how to get data into a repository, nothing else. It is complementary to something like Dublin Core, which is used to describe items in a repository. It also does not deal with packaging, metadata, authentication or authorisation.

Existing implementations can be found in:


SWORD Use Cases (Video/Slides)

Use cases sword is trying to enable:

  • Deposit from a desktop machine
  • Deposit to multiple repositories (For example to allow depositing once and ending up in an institution’s repository, funder’s repository and a subject specific repository)
  • Deposit from a piece of lab equipment (non-human depositing data)
  • Deposit from one repository to another (for example, from an institutional repository to a national repository, which may have differing visibility of the data. “Dark” repositories are private and can’t be seen; “light” repositories are public and can be seen)
  • Deposit from external publisher/publishing system to long term storage (For example from OJS to your own institution’s repository)

How SWORD Works (Video/Slides)

  • SWORD is in the form of an “XML REST webservice to put things in a repository”
  • It has built on the resource creation aspects of the ATOM Pub standard which is for publishing content to the web
  • SWORD is an extension, or “profile”, of the ATOM Pub spec and existing ATOM Pub clients can be made to work if the relevant extensions are added
  • SWORD version 2 now includes full CRUD

When you use a sword client, this is basically what happens…

  • The client asks a repository to describe itself
  • The server returns a “Service document” that describes what you can do with the repository and how to do it. The service document is typically protected by HTTP Basic authentication (AM: I think this is crying out for an OAuth implementation!), but once you are authenticated the web service will tailor the service document to what you are allowed to do with the system. The server can also describe what data formats it will accept, where the data will go and how long it will be stored etc… this is the “collection policy”
  • The client then uses the service document to format your data and deposit it
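
As a rough illustration of the client side of this exchange, the sketch below parses a service document and lists the collections a client could deposit to. The sample XML, the example.org URL and the ServiceDocumentReader class are made up for illustration; only the ATOM Pub namespace and the standard Java XML parser are taken as given. A real client would first fetch the document over HTTP with the appropriate credentials.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class ServiceDocumentReader {
    // Namespace used by ATOM Pub service documents
    static final String APP_NS = "http://www.w3.org/2007/app";

    // Hypothetical sample of what a server might return; real documents carry more detail
    static final String SAMPLE =
        "<service xmlns=\"http://www.w3.org/2007/app\" xmlns:atom=\"http://www.w3.org/2005/Atom\">"
      + "<workspace><atom:title>Test Workspace</atom:title>"
      + "<collection href=\"http://example.org/sword/collection/TestCollection\">"
      + "<atom:title>Test Collection</atom:title><accept>*/*</accept>"
      + "</collection></workspace></service>";

    // Returns the href of every collection advertised in the service document
    static List<String> collectionHrefs(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagNameNS(APP_NS, "collection");
            List<String> hrefs = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                hrefs.add(((Element) nodes.item(i)).getAttribute("href"));
            }
            return hrefs;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse service document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(collectionHrefs(SAMPLE));
    }
}
```

The hrefs that come back are the locations a client would then deposit to.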

What sword adds to ATOM Pub:

  • Accept Packaging – Tells the client what types of data the server accepts
  • Mediated Deposit – Allows you to deposit “as” / “on behalf of” someone else, a repository can say whether it allows this or not
  • Developer features – You can state that you want verbose output to say what happened (v1.3 featured a dry run feature called “no-op” that does not actually deposit or do anything. NOTE: this does not appear to be in v2 anymore)
  • Nested Service document – Where there may be many repositories for the service, the top level document provides links to sub documents, instead of repeating the same or similar definitions in one enormous file.
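
To make the first two extensions a little more concrete, here is a minimal sketch of how a client might pick a packaging format the server accepts and add a mediated-deposit header. The class and method names are hypothetical, the packaging identifiers are only illustrative values, and I am assuming the header names “Packaging” and “On-Behalf-Of”, so check them against the profile before relying on them.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class DepositHeaderSketch {
    // Pick the first packaging format the client can produce that the server says it accepts
    static Optional<String> choosePackaging(List<String> clientFormats, List<String> serverAccepts) {
        return clientFormats.stream().filter(serverAccepts::contains).findFirst();
    }

    // Build the extra deposit headers; onBehalfOf may be null for a non-mediated deposit
    static Map<String, String> depositHeaders(String packaging, String onBehalfOf, boolean serverAllowsMediation) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Packaging", packaging); // assumed header name
        if (onBehalfOf != null) {
            if (!serverAllowsMediation) {
                throw new IllegalArgumentException("Repository does not allow mediated deposit");
            }
            headers.put("On-Behalf-Of", onBehalfOf);
        }
        return headers;
    }

    public static void main(String[] args) {
        // Illustrative values only; a real client reads the accepted formats from the service document
        List<String> server = List.of("http://purl.org/net/sword/package/SimpleZip");
        List<String> client = List.of(
                "http://purl.org/net/sword/package/METSDSpaceSIP",
                "http://purl.org/net/sword/package/SimpleZip");
        String packaging = choosePackaging(client, server)
                .orElseThrow(() -> new IllegalStateException("No common packaging format"));
        System.out.println(depositHeaders(packaging, "another-user", true));
    }
}
```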

SWORD clients (Video/Slides)

There are generally three sorts of client:

  • Machine to machine – for very specific automated deposits (lab equipment)
  • General – human would use, talks to any repository
  • Specific – for depositing certain data into certain repositories in a given way, with extra context that a general client lacks, e.g. depositing specific journals or depositing data for a particular project

Interesting possibilities for deposit scenarios:

Writing something useful

My next stop on the journey through sword looked at actual code, how it’s laid out and what you need to do in order to start doing something useful.

As mentioned previously, I have been basing my investigations around the Java client and server libraries but I’d strongly recommend you also get a good grounding in the workings of ATOM Pub (HTML Version) and the Sword Profile specifications themselves. If you’ve ever read specification documents before you’ll know they can make quite dry reading. However, since ATOM Pub and sword are relatively straightforward technologies and the specs only run to 30-50 odd pages, it’s really worth a browse through.

How SWORD works in java

Firstly, it’s probably best to understand a few basic concepts you will need to deal with. The main concepts/objects are:

  • IRIs – unique identifiers for a resource
  • Entry – A deposit, has IRIs/metadata
  • Media – An entry for media (word docs/pdfs/images), can be linked to in an Entry
  • Collection Entries – ATOM Pub collection of entries (member resources)
  • Collections – A set of deposited entries represented by an Atom Feed document, you can distinguish a collection feed from “a.n.other feed” by the presence of a collection IRI in service document
  • Workspaces – A compartmentalisation concept for repositories, has a name but doesn’t have IRIs or processing models
  • Service Documents – Groups “Collections” in “Workspaces”, can indicate accepted media types and categories of collections
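
The containment described above (a service document groups collections into workspaces, and each collection has an IRI and accepted media types) can be pictured with a few plain Java classes. This is only a sketch of the shape of the data; the Sketch* class names are my own and are not the classes shipped in the SWORD libraries.

```java
import java.util.ArrayList;
import java.util.List;

// Top of the hierarchy: a service document holds workspaces
final class SketchServiceDocument {
    final List<SketchWorkspace> workspaces = new ArrayList<>();

    // Walk workspaces -> collections and gather every collection IRI
    List<String> collectionIris() {
        List<String> iris = new ArrayList<>();
        for (SketchWorkspace workspace : workspaces) {
            for (SketchCollection collection : workspace.collections) {
                iris.add(collection.iri);
            }
        }
        return iris;
    }

    public static void main(String[] args) {
        SketchCollection collection =
                new SketchCollection("http://example.org/collection/TestCollection", "Test Collection");
        collection.accepts.add("application/zip");
        SketchWorkspace workspace = new SketchWorkspace("Test Workspace");
        workspace.collections.add(collection);
        SketchServiceDocument serviceDocument = new SketchServiceDocument();
        serviceDocument.workspaces.add(workspace);
        System.out.println(serviceDocument.collectionIris());
    }
}

// A workspace has a name but no IRI of its own
final class SketchWorkspace {
    final String title;
    final List<SketchCollection> collections = new ArrayList<>();

    SketchWorkspace(String title) {
        this.title = title;
    }
}

// A collection is identified by an IRI and lists the media types it accepts
final class SketchCollection {
    final String iri;
    final String title;
    final List<String> accepts = new ArrayList<>();

    SketchCollection(String iri, String title) {
        this.iri = iri;
        this.title = title;
    }
}
```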

Next, I found you learn the most by studying the server libraries. What you get is a bundle of Java and some setup files for your container. We’ll first look at the setup in web.xml.

Setup and Servlet mappings (web.xml)

The main servlets (i.e. ultimately your REST endpoints) that are defined are…



  • Class: org.swordapp.server.servlets.CollectionServletDefault
  • URL: http://<yourServer>/<yourWebapp>/collection/*
  • Purpose: Retrieving from and depositing to collections/feeds and entries




The code makes heavy use of interfaces to give the implementer more freedom to build functionality on the server library in the way they want. To tell the server libraries which code to instantiate and which auth. mechanism to use, you must set some context parameters that define the settings and implementations used at runtime:


  • param-name: authentication-method
  • param-value: Basic or None (default: “Basic”)

You can set an Authorization header by base64 encoding user:password (e.g. try going to a basic auth encoding website and encoding “user:password” in the plain text box) and sending it as a header… “Authorization” “Basic dXNlcjpwYXNzd29yZA==”, or, if you prefer, set the param-value to “None” for no authorisation. I found (at the time of writing) that the default code actually has a bug which means turning the auth off doesn’t work correctly. The best way I found of correcting this was, in my own war project (which includes the server libraries as a dependency), to create an org.swordapp.server package (where I was implementing the objects needed for the interfaces) and drop in a copy of the class that contains getAuthCredentials, overriding the implementation in the library. I then changed getAuthCredentials to…

protected AuthCredentials getAuthCredentials(HttpServletRequest request, boolean allowUnauthenticated) throws SwordAuthException
{
   AuthCredentials auth = null;
   String authType = this.config.getAuthType();
   String obo = "";
   log.debug("Auth type = " + authType);
   // If we are insisting on "a" form of authentication that is not of type "none"
   if (!allowUnauthenticated && !authType.equalsIgnoreCase("none"))
   {
      // Has the user passed authentication details
      String authHeader = request.getHeader("Authorization");
      // Is there an On-Behalf-Of header?
      obo = request.getHeader("On-Behalf-Of");
      // Which authentication scheme do we recognise (should only be Basic)
      boolean isBasic = authType.equalsIgnoreCase("basic");

      if (isBasic && (authHeader == null || authHeader.equals("")))
      {
         throw new SwordAuthException(true);
      }
      // decode the auth header and populate the authcredentials object for return
      String[] userPass = this.decodeAuthHeader(authHeader);
      auth = new AuthCredentials(userPass[0], userPass[1], obo);
   }
   else
   {
      // Authentication is switched off (or not required), so return empty credentials
      log.debug("No Authentication Credentials supplied/required");
      auth = new AuthCredentials(null, null, obo);
   }
   return auth;
}
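
If you would rather not use an encoding website, the Authorization header can be built and unpicked with the JDK alone. The following is a minimal sketch using java.util.Base64; BasicAuthSketch and its methods are hypothetical names of my own and are not the library’s decodeAuthHeader implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class BasicAuthSketch {
    // Build the value of the Authorization header for HTTP Basic auth
    static String encode(String user, String password) {
        String raw = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    // Split a "Basic ..." header value back into { user, password }
    static String[] decode(String header) {
        if (header == null || !header.startsWith("Basic ")) {
            throw new IllegalArgumentException("Not a Basic auth header");
        }
        byte[] bytes = Base64.getDecoder().decode(header.substring("Basic ".length()).trim());
        String raw = new String(bytes, StandardCharsets.UTF_8);
        int colon = raw.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Missing ':' separator");
        }
        // Only the first colon separates the two parts; the password may contain colons
        return new String[] { raw.substring(0, colon), raw.substring(colon + 1) };
    }

    public static void main(String[] args) {
        String header = encode("user", "password");
        System.out.println(header);
        String[] userPass = decode(header);
        System.out.println(userPass[0] + " / " + userPass[1]);
    }
}
```

Encoding “user” and “password” this way gives the same “Basic dXNlcjpwYXNzd29yZA==” value quoted above.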

The following context parameters set the implementations of interfaces used to implement functionality in the endpoints. However, you are not given default implementations for each of these, so in your war you need to create a new class that implements the respective interface and fill in your functionality….


  • param-name: collection-list-impl
  • param-value: org.swordapp.server.CollectionListManagerImpl
  • Interface it implements: org.swordapp.server.CollectionListManager


  • param-name: service-document-impl
  • param-value: org.swordapp.server.ServiceDocumentManagerImpl
  • Interface it implements: org.swordapp.server.ServiceDocumentManager


  • param-name: collection-deposit-impl
  • param-value: org.swordapp.server.CollectionDepositManagerImpl
  • Interface it implements: org.swordapp.server.CollectionDepositManager


  • param-name: media-resource-impl
  • param-value: org.swordapp.server.MediaResourceManagerImpl
  • Interface it implements: org.swordapp.server.MediaResourceManager


  • param-name: container-impl
  • param-value: org.swordapp.server.ContainerManagerImpl
  • Interface it implements: org.swordapp.server.ContainerManager


  • param-name: statement-impl
  • param-value: org.swordapp.server.StatementManagerImpl
  • Interface it implements: org.swordapp.server.StatementManager


  • param-name: config-impl
  • param-value: org.swordapp.server.SwordConfigurationDefault (Yes, this one does have a default implementation in the library you can use)
  • Interface it implements: org.swordapp.server.SwordConfiguration

Endpoint Servlet classes (org.swordapp.server.servlets.*)

Let’s now have a look at the servlets themselves. Each servlet holds interfaces whose implementations are loaded from the classes specified in the web.xml (see above).

All servlets used in the server library extend the “SwordServlet” which (obviously) extends the HttpServlet. Since the SwordServlet contains the server configuration object, all servlets (through inheritance) also hold an implementation of the server configuration (i.e. SwordConfiguration) and a method to allow servlets to load classes from the configuration….

SwordServlet encapsulates:

  • SwordConfiguration interface, instantiated using config-impl
  • loadImplClass() method used for loading implementing classes from tomcat context params

CollectionServletDefault extends SwordServlet and encapsulates:

  • CollectionListManager interface, instantiated using collection-list-impl
  • CollectionDepositManager interface, instantiated using collection-deposit-impl
  • CollectionAPI object

ServiceDocumentServletDefault extends SwordServlet and encapsulates:

  • ServiceDocumentManager interface, instantiated using service-document-impl
  • ServiceDocumentAPI object

MediaResourceServletDefault extends SwordServlet and encapsulates:

  • MediaResourceManager interface, instantiated using media-resource-impl
  • MediaResourceAPI object

ContainerServletDefault extends SwordServlet and encapsulates:

  • ContainerManager interface, instantiated using container-impl
  • StatementManager interface, instantiated using statement-impl
  • ContainerAPI object

StatementServletDefault extends SwordServlet and encapsulates:

  • StatementManager interface, instantiated using statement-impl
  • StatementAPI object

Endpoint Servlet dependent classes (org.swordapp.server.*)

Those with a keen eye will have noticed that each servlet also holds an “API” object. These objects fill out the standard Get/Post/Put/Delete HttpServlet methods that the servlets override, taking the configuration object and any interfaces that have been implemented and combining them to do something useful. Like the servlets, they all extend a Sword API super class called “SwordAPIEndpoint”, which holds a SwordConfiguration implementation. The hierarchy (and the interfaces they encapsulate) looks like this…


SwordAPIEndpoint

  • SwordConfiguration interface

CollectionAPI extends SwordAPIEndpoint

  • CollectionListManager interface
  • CollectionDepositManager interface

ServiceDocumentAPI extends SwordAPIEndpoint

  • ServiceDocumentManager interface

MediaResourceAPI extends SwordAPIEndpoint

  • MediaResourceManager interface

ContainerAPI extends SwordAPIEndpoint

  • ContainerManager interface
  • StatementManager interface

StatementAPI extends SwordAPIEndpoint

  • StatementManager interface
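
Stripped of the servlet container, the layering described above (servlet delegates to an API object, which combines the configuration with your interface implementation) looks roughly like the sketch below. All of the Sketch* names are hypothetical stand-ins chosen for illustration, and the real classes take HTTP request/response objects rather than plain strings.

```java
final class DelegationSketch {
    public static void main(String[] args) {
        SketchConfig config = new SketchConfig("Basic");
        // Your own implementation of the manager interface is what gets plugged in
        SketchDocumentManager manager = uri -> "service document for " + uri;
        SketchServlet servlet = new SketchServlet(config, manager);
        System.out.println(servlet.doGet("/servicedocument/"));
    }
}

// Stand-in for the configuration object shared by every endpoint
final class SketchConfig {
    final String authType;

    SketchConfig(String authType) {
        this.authType = authType;
    }
}

// Stand-in for one of the manager interfaces you implement yourself
interface SketchDocumentManager {
    String getServiceDocument(String uri);
}

// Stand-in for the common super class that holds the configuration
class SketchEndpoint {
    protected final SketchConfig config;

    SketchEndpoint(SketchConfig config) {
        this.config = config;
    }
}

// The API object combines the configuration with the manager implementation
final class SketchDocumentApi extends SketchEndpoint {
    private final SketchDocumentManager manager;

    SketchDocumentApi(SketchConfig config, SketchDocumentManager manager) {
        super(config);
        this.manager = manager;
    }

    String get(String uri) {
        return manager.getServiceDocument(uri);
    }
}

// The servlet does nothing itself except hand the request to its API object
final class SketchServlet {
    private final SketchDocumentApi api;

    SketchServlet(SketchConfig config, SketchDocumentManager manager) {
        this.api = new SketchDocumentApi(config, manager);
    }

    String doGet(String uri) {
        return api.get(uri);
    }
}
```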

Implementations of interfaces (org.swordapp.server.*)

I keep mentioning the objects that implement the interfaces, so I thought it might be useful to go through in “slightly” more detail what those objects are intended for. Apologies once again, this is not exhaustive as I have not worked my way through what all the methods are intended for:

SwordConfigurationDefault implements org.swordapp.server.SwordConfiguration

  • This is the default object which holds the configuration for the server

CollectionListManagerImpl implements org.swordapp.server.CollectionListManager

CollectionDepositManagerImpl implements org.swordapp.server.CollectionDepositManager

ServiceDocumentManagerImpl implements org.swordapp.server.ServiceDocumentManager

  • Method: getServiceDocument()
  • Accessed via: GET http://<yourServer>/<yourWebapp>/servicedocument/
  • Returns: org.swordapp.server.ServiceDocument
  • Purpose: Serves service documents (XML that explains the contents and deposit policies for the repository/ies)

MediaResourceManagerImpl implements org.swordapp.server.MediaResourceManager

  • Method: replaceMediaResource()
  • Accessed via: PUT http://<yourServer>/<yourWebapp>/edit-media/
  • Returns: org.swordapp.server.DepositReceipt
  • Purpose: Swap a media resource (pdf/doc etc….) in the repository with the one being “PUT’ed”

ContainerManagerImpl implements org.swordapp.server.ContainerManager

  • Method: replaceMetadataAndMediaResource()
  • Accessed via: PUT http://<yourServer>/<yourWebapp>/edit/
  • Returns: org.swordapp.server.DepositReceipt
  • Purpose: Replaces metadata and media associated with an entry

  • Method: addMetadataAndResources()
  • Accessed via: Does not appear to be “directly” accessible via any specific HTTP request
  • Returns: org.swordapp.server.DepositReceipt
  • Purpose: Not used by the ContainerAPI yet, but presumably it would be for adding a series of entries and associated metadata

  • Method: addResources()
  • Accessed via: Does not appear to be “directly” accessible
  • Returns: org.swordapp.server.DepositReceipt
  • Purpose: Not used by the ContainerAPI yet, but presumably it would be for adding a series of entries

  • Method: useHeaders()
  • Accessed via: POST http://<yourServer>/<yourWebapp>/edit/
  • Returns: org.swordapp.server.DepositReceipt
  • Purpose: Used when depositing only information specified in the HTTP headers, no entry/ies (i.e. no content body to the POST) will have been specified

StatementManagerImpl implements org.swordapp.server.StatementManager

And finally…

Once you have all that set up (and I’d recommend just creating skeleton override methods for the objects implementing interfaces for the time being whilst you figure the code out), you can then start coding the abdera / sword code and try to make the client do something. The client itself comes with a handy CLI-driven interface (SwordCLI) that you can point at your newly created server instance to test the various example methods. I would recommend, though, that you comment out the entire list of method references in the main method and go through the list iteratively to slowly make each part of your server work…

As a brief example, if we were to try and get a basic service document to work, try adding this code to your….

    public ServiceDocument getServiceDocument(String sdUri, AuthCredentials auth, SwordConfiguration config) throws SwordError, SwordServerException, SwordAuthException
    {
        //Our test service document
        ServiceDocument sd = new ServiceDocument();

        //Our test workspace
        SwordWorkspace sw = new SwordWorkspace();

        //Our test collection
        SwordCollection sc = new SwordCollection();
        sc.setTreatment("A human-readable statement describing treatment the deposited resource has received or a URI that dereferences to such a description.");
        List<IRI> iris = new ArrayList<IRI>();
        iris.add(new IRI("http://<yourServer>/<yourWebapp>/collection/TestCollection/TestSubService"));

        //Add collection to workspace (method name assumed, check it against your version of the server library)
        sw.addCollection(sc);
        //Add workspace to service document (method name assumed, as above)
        sd.addWorkspace(sw);

        return sd;
    }

Browsing to http://<yourServer>/<yourWebapp>/servicedocument/ should return something, and removing your comment in your client for the line…


…should now yield some results (If you just get errors try temporarily turning off the auth. requirement on the server for the purposes of testing).

And that’s the basic principle. You then take the specs and implement the returning and unpackaging of ATOM using the abdera/SWORD objects, and link what’s passed/returned to the content found in the system you are trying to integrate.

Further reading

Sword Specs
Brief history of Sword
More useful Sword docs
Another set of slides explaining sword

Andrew Martin
Research and Collaborative Services
Newcastle University

iridium – update on upcoming national research data management events

Upcoming national RDM events:

Call for Papers for International Digital Curation Conference
Deadline: extended to 30th (20th August 2012)
The call for papers is very broad and inclusive, so take the opportunity to share your good practice:

DataCite Technical Workshop
11:00-15:00 10th September 2012, The British Library Conference Centre, St. Pancras, London, NW1 2DB
This practical workshop is aimed at those who are considering incorporating DataCite services into their repository and would like to learn more about how to work with the technology.
The event will be limited to 15 people. We will consider running additional sessions if it is oversubscribed.
Please bring your own laptop.
If you would like to register for the event, please reply to this address with the following details:
Project (e.g. repository name or JISC-MRD involvement):
Caroline Wilkinson
Data Management Project Officer
Science, Technology and Medicine
The British Library, 96 Euston Road, London NW1 2DB
Tel: 020 7412 7250
Mendeley Group:

‘Managing the Material: Tackling Visual Arts as Research Data’ Workshop
09:45-16:00 14th September 2012, HEFCE, Centre Point, London WC1A 1DD
Further details and registration are available from:

SPRUCE Mashup London
18th-20th September 2012, The Hubworking Centre, 5 Wormwood Street, EC2M 1RQ, London, United Kingdom
If you’d like to attend the SPRUCE Mashup London, please sign up here:
SPRUCE is a JISC funded Project. For more information see:

Call for Papers/Tools for 4th Annual European DDI User Conference (EDDI12) DDI – The Basis of Managing the Data Life Cycle
Deadline: 3rd October 2012
Seeking both papers and tools on all things DDI. If you are interested in presenting a paper, please use the online submission system of the conference. The deadline for submissions is October 3, 2012. Please use the same link to submit an abstract on your tool by October 3, 2012. The deadline for submissions of the final program code is November 17, 2012.

2nd Workshop on Semantic Statistics for Social, Behavioural, and Economic Sciences: Leveraging the DDI Model for the Linked Data Web
15th-19th October 2012, Leibniz Center for Informatics, Schloss Dagstuhl, Wadern, Germany
Further information on the workshop, including venue details and a registration form, is available on the website of the workshop:
