February 14, 2012
Making Research Data Discoverable
As followers of this list will be aware, the Iridium project at Newcastle is taking a comprehensive ‘soup to nuts’ look at Research Data Management, all the way from scoping academic and funder requirements through to what policies, guidance, tools, systems and training are needed to support them.
You may also know, particularly if you work anywhere near a Research Office, that EPSRC has asked all universities to have a road map in place by 1 May 2012 that aligns their policies and procedures with the EPSRC research data management expectations. Through the Iridium project we have made significant headway here, mainly by drafting a Research Data Management Policy and a supporting Code of Good Practice. However, we are missing an appropriate catalogue in which we can record what data we have and make it discoverable to others (a key requirement). With that in mind we met to discuss what could be put in place, and other groups may find our discussion useful.
To start with, we decided that the system we put in place for May 1st will not be a repository but rather a straightforward web-based searchable catalogue of data, and that we will only collect information on data that supports publication. We have opted for this measure because data supporting publication should already have been prepared (i.e. confidentiality respected through the scrubbing of data, fields marked sensibly, etc.). We also feel that data is normally available at this point for peer review and as a matter of good scientific practice, so (hopefully) we’re not asking too much more of our academics by having them fill in data information at the same point they enter their new publication info in our output system.
The main part of the meeting was concerned with which fields we want to use and these were:
Key Information for the system:
- File Type and Format (publicly available)
- Size (publicly available)
- Location (Private): resolved down to the most appropriate level (e.g. file type, drive, URL or repository)
- Type-Specific Key Words (Public): Free text field, c. 250 character limit, to describe the data resource
- Subject-Specific Key Words (Public): Free text field, c. 250 character limit, to describe the data resource
- Distinct Title (Public): Free text field, c. 250 character limit
- Creator (Public): University user ID
- Submission Date (Public)
- Funder / Owner (Either Private or Public)
- Terms and Conditions (Private, available on request)
- Status (Public): live, deleted, corrupted, etc.
- Last Access Date (Private): Controlled through access mechanism, e.g. updated by the Research Office
- University project Ref (Private)
- Publications (Public)
- Unique ref / DOI equivalent
- Upload time / date
- Mechanism (suggest that this should run URO)
- Our T&Cs (small number of boilerplate T&Cs)
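To make the public/private split in the list above a little more concrete, here is a minimal Python sketch of how one catalogue entry might be held and searched. It is only an illustration, not our actual system: the class name `DatasetRecord`, the field names, the helper functions and the example values are all hypothetical, only a subset of the fields is shown, and any field whose visibility the list leaves open (such as Funder / Owner) is simply treated as private here.

```python
from dataclasses import dataclass, field, asdict
from datetime import date
from typing import List, Optional

# Fields the list above marks as public. Anything not named here is treated
# as private, including fields whose visibility is undecided (an assumption).
PUBLIC_FIELDS = {
    "file_format", "size_bytes", "type_keywords", "subject_keywords",
    "title", "creator_id", "submission_date", "status", "publications",
}

TEXT_LIMIT = 250  # the "c. 250 character" limit on the free text fields


@dataclass
class DatasetRecord:
    """One catalogue entry (hypothetical subset of the proposed fields)."""
    title: str
    creator_id: str            # University user ID
    file_format: str
    size_bytes: int
    location: str              # private: drive, URL or repository
    type_keywords: str
    subject_keywords: str
    submission_date: date
    status: str = "live"       # live, deleted, corrupted, ...
    funder: Optional[str] = None
    project_ref: Optional[str] = None
    publications: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Enforce the character limit on the three free text fields.
        for name in ("title", "type_keywords", "subject_keywords"):
            if len(getattr(self, name)) > TEXT_LIMIT:
                raise ValueError(f"{name} exceeds {TEXT_LIMIT} characters")


def public_view(record):
    """Return only the fields that may be shown to the outside world."""
    return {k: v for k, v in asdict(record).items() if k in PUBLIC_FIELDS}


def search(records, term):
    """Case-insensitive substring search over title and key words."""
    term = term.lower()
    hits = []
    for record in records:
        haystack = " ".join(
            [record.title, record.type_keywords, record.subject_keywords]
        ).lower()
        if term in haystack:
            hits.append(public_view(record))
    return hits


# Entirely made-up example entry.
example = DatasetRecord(
    title="Example sensor readings",
    creator_id="nabc1",
    file_format="CSV",
    size_bytes=2_048_000,
    location="//example-share/project-x/readings.csv",
    type_keywords="time series, sensor",
    subject_keywords="civil engineering",
    submission_date=date(2012, 5, 1),
)
```

Calling `search([example], "sensor")` returns the public view of the entry, so private details such as the location or project reference never reach a searcher.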
Useful suggestions included that we should have a change log for each field, but that, given the constraints, there should probably be controlled access after the upload, and that, in future, we should have a method for allowing data re-users to add their own contextual information as a tag onto the data. There was also a very interesting question as to whether we could assign a non-financial value, e.g. value to science, and a suggestion that we could do that via the publications link (4* pub = 4* data). Finally, we’ll also look at linking our existing systems, as this would add significantly to the wealth of information held about the data and would make elements such as confidentiality and/or publications automatic.
We’re aiming to start working on this soon but, as always, we’d appreciate hearing from anyone who can add anything or direct us to good examples of what’s already out there. Please get in touch at niall.o’email@example.com.
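Two of these suggestions, the per-field change log and the "4* pub = 4* data" idea, are easy to sketch in Python. Again, this is a hedged illustration rather than a design decision: the names `ChangeLogEntry`, `update_field` and `data_value_from_publications` are hypothetical, the record is represented as a plain dictionary for brevity, and taking the highest star rating when a dataset supports several publications is our assumption, since the meeting did not settle that point.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class ChangeLogEntry:
    """One recorded change to a single field of a catalogue entry."""
    field_name: str
    old_value: Any
    new_value: Any
    changed_by: str        # e.g. a University user ID
    changed_at: datetime


def update_field(record, log, field_name, new_value, changed_by):
    """Change one field of a record and append the change to its log."""
    old_value = record.get(field_name)
    if old_value == new_value:
        return  # nothing changed, so nothing to log
    record[field_name] = new_value
    log.append(
        ChangeLogEntry(
            field_name, old_value, new_value, changed_by,
            datetime.now(timezone.utc),
        )
    )


def data_value_from_publications(star_ratings):
    """Derive a star value for a dataset from its linked publications.

    Assumption: the dataset inherits the highest rating among them;
    a dataset with no linked publications gets no value (None).
    """
    return max(star_ratings) if star_ratings else None
```

For example, after `update_field(record, log, "status", "deleted", "nabc1")` the log holds one entry recording the old and new status, and `data_value_from_publications([3, 4])` gives 4.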