Research Data Management at Euro Sakai 2011

I recently visited Euro Sakai 2011 at the Pakhuis De Zwijger in the Eastern Docklands area of Amsterdam. The main purpose of the visit was to find out how we can make the most of our VRE software (Sakai) and where it is going, but some presentation strands on research data management (RDM) also caught my interest. The Sakai community seems to be closely involved with RDM: I counted at least two presentations focused on research data and a couple of others that mentioned it.

The most useful was from Hull’s Chris Awre, who described Hull’s approach to managing data through its whole lifecycle. They have built a versioned object repository based on “Fedora” (not the Linux distribution sponsored by Red Hat, but an open source object repository: http://fedora-commons.org/), a project that is developed and managed under “DuraSpace” (http://www.duraspace.org/). They said this approach is scalable, standards-based and content-agnostic, and that it allows the relationships between objects to be recorded.

The project has its roots in earlier JISC projects.

To summarise, their implementation uses Sakai 2.6.4 and Fedora 3.4 (although they have also built an integration that talks to SharePoint). It re-uses the Sakai resources section as a GUI front end for the Fedora repository, squashing the data objects down into a file and directory “view” inside Sakai; all the standard CRUD (create, read, update, delete) operations are translated through Sakai into the repository. The code for all of this is hosted on GitHub. Looking forward, they would like to create an OAE integration in which annotations captured on original documents are packaged up directly as metadata in the Fedora repository.
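To make the “file and directory view” idea a little more concrete, here is a minimal sketch of how flat repository objects might be squashed into paths. This is not Hull’s code: the record layout (`pid`, `label`, `parent`, `is_collection`) and the function name are assumptions made up purely for illustration.

```python
# Hypothetical flat records, loosely modelled on repository objects that
# carry an identifier, a label and an optional parent relationship.
OBJECTS = [
    {"pid": "demo:1", "label": "project-a", "parent": None, "is_collection": True},
    {"pid": "demo:2", "label": "raw-data", "parent": "demo:1", "is_collection": True},
    {"pid": "demo:3", "label": "readings.csv", "parent": "demo:2", "is_collection": False},
    {"pid": "demo:4", "label": "notes.txt", "parent": "demo:1", "is_collection": False},
]


def build_paths(objects):
    """Return a mapping of file-style path -> pid for every object."""
    by_pid = {obj["pid"]: obj for obj in objects}
    paths = {}
    for obj in objects:
        # Walk up the parent chain, collecting labels from leaf to root.
        parts = []
        current = obj
        while current is not None:
            parts.append(current["label"])
            parent = current["parent"]
            current = by_pid[parent] if parent else None
        path = "/" + "/".join(reversed(parts))
        # Collections are shown as folders, so give them a trailing slash.
        if obj["is_collection"]:
            path += "/"
        paths[path] = obj["pid"]
    return paths


if __name__ == "__main__":
    for path, pid in sorted(build_paths(OBJECTS).items()):
        print(path, "->", pid)
```

A front end that keeps a mapping like this can turn a file operation on a path into the matching create, read, update or delete call against the object behind it.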

The main lessons learned were to stay standards-based, which makes everything much easier throughout the entire project, and to draw up strong policies at the very start about what repositories are for and how they are to be used.

Presentation hosted by:

Chris Awre – c.awre@hull.ac.uk

Useful links/people:

https://github.com/uohull

http://www2.hull.ac.uk/discover/clif.aspx

https://edocs.hull.ac.uk/muradora/objectView.action?pid=hull:4194 (Final report of the CLIF project)

simon.waddington@kcl.ac.uk (King’s College contact for the CLIF project)

Other interesting stuff worth mentioning:

  • Chris Awre also mentioned the Hydra Project, an attempt to standardise the structure of data object repositories in order to aid interoperability between them.

Another presentation, by the University of Amsterdam and Edia (a private company that helps with Sakai integrations and was also the conference organiser), discussed the Fluor research data tool. To briefly summarise where it differs from Hull’s implementation: they have created the “Fluor tool” inside Sakai, which talks to their library’s Fedora object repository, and have attached the Fedora Generic Search Service to a Solr implementation to allow searching of the repository. (They also cited the possibility of using Lucene or Zebra; Solr is apparently based on Lucene but is easier to use and supports REST, JSON and XML.) Their implementation works with Sakai 2.5 and up and allows a fine-grained access model on a per-object basis, so data can be as open or closed as necessary. All data streams holding object data are encrypted (unlike Hull’s), and even their backups are encrypted.
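As a rough illustration of what the Solr side of such a setup offers, the sketch below builds a search URL for Solr’s HTTP query interface and asks for JSON back. The `q`, `rows` and `wt` parameters are standard Solr query parameters; the host, the `repository` core name and the `title` field are my own assumptions and have nothing to do with the actual Fluor configuration.

```python
from urllib.parse import urlencode


def build_search_url(base_url, text, rows=10):
    """Build a Solr select URL that returns results as JSON."""
    params = {
        "q": text,     # the query itself, e.g. field:value
        "rows": rows,  # maximum number of hits to return
        "wt": "json",  # response writer: JSON rather than XML
    }
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical Solr core holding the indexed repository metadata.
    print(build_search_url("http://localhost:8983/solr/repository", "title:rainfall"))
```

Because the search is just an HTTP GET, any tool that can fetch a URL and parse JSON can query the index without knowing anything about the repository behind it.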

Presentation hosted by:

Roland Groen (Edia) – http://www.edia.nl/en/edia/founders

Casper Treijtel (UvA) – dpcmedewerkers-uba@uva.nl

Useful links:

http://www.slideshare.net/RolandGroen/fluor-sakai-la-2011

https://confluence.sakaiproject.org/display/CONF2011/Fluor+-+Your+connection+to+the+Fedora+Digital+Objects+Repository


Here’s a rough representation of my understanding of the model universities are taking when integrating RDM solutions with the VRE (interspersed with a couple of my own ideas).
To briefly explain, starting at the bottom left: you define structures for the context of your data (so how should a medical repository, a mathematics repository or a geography repository look?). This helps with organising the contents of the repository (and potentially with comparison between repositories). You can then use defined transport standards to interact with your repository and wrap it with search-and-discover functionality and/or general input/output interactions.
To make all of this usable, you then need to integrate your research-focused tools (your VRE, ELN, profile or research project systems, third-party tools etc.) with your repository system. This can be done either via a custom link that caters for the connecting tool (e.g. making the objects appear as files/folders for a VRE system) or by fronting the repository with some sort of service-based system that offers a defined API for your tools to talk to.
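The “service-based system with a defined API” option can be sketched in a few lines. Everything here is hypothetical: the `RepositoryService` interface, the in-memory stand-in for a real repository and the `vre_listing` helper are invented names, intended only to show tools talking to one agreed API rather than to the repository directly.

```python
from abc import ABC, abstractmethod


class RepositoryService(ABC):
    """The defined API that every connecting tool talks to (hypothetical)."""

    @abstractmethod
    def store(self, object_id, content, metadata):
        """Save an object together with its descriptive metadata."""

    @abstractmethod
    def fetch(self, object_id):
        """Return the stored object for the given identifier."""

    @abstractmethod
    def search(self, term):
        """Return identifiers of objects whose metadata mentions the term."""


class InMemoryRepository(RepositoryService):
    """Stand-in for a real object repository, for illustration only."""

    def __init__(self):
        self._objects = {}

    def store(self, object_id, content, metadata):
        self._objects[object_id] = {"content": content, "metadata": dict(metadata)}

    def fetch(self, object_id):
        return self._objects[object_id]

    def search(self, term):
        term = term.lower()
        return sorted(
            oid
            for oid, obj in self._objects.items()
            if any(term in str(value).lower() for value in obj["metadata"].values())
        )


def vre_listing(service, term):
    """How a VRE-style tool might present search hits as file names."""
    return [f"{oid.replace(':', '_')}.dat" for oid in service.search(term)]


if __name__ == "__main__":
    repo = InMemoryRepository()
    repo.store("demo:1", b"...", {"subject": "Geography"})
    repo.store("demo:2", b"...", {"subject": "Medicine"})
    print(vre_listing(repo, "geo"))
```

The point of the design is that a VRE, an ELN or a third-party tool only depends on the interface, so the repository behind it can be swapped or upgraded without touching the tools.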
Andrew Martin
Research and Collaborative Services

About Andrew Martin
Digital Systems Analyst, Digital Platforms
