iridium – dissemination at Jisc MRD02 Achievements, Challenges and Recommendations Programme Workshop March 2013

The iridium project (@iridium_mrd) is attending the Jisc MRD02 Achievements, Challenges and Recommendations Programme Workshop March 25-26 2013.

Ben Allen will be presenting on observations of the technical landscape.

We will also present a poster on key project outputs.

iridium_poster_final_jisc

iridium_Jisc_final_meeting_25_3_2013_sml

iridium – postgraduate evaluation of MANTRA RDM training (2) – Sharing, preservation and licensing unit

From Jack:

The final module of the MANTRA online research data management training is entitled Sharing, Preservation and Rights. The second of two new modules (the previous one being Data Protection, Rights and Access), it focuses on the back end of the research lifecycle. When working on a project, the researcher’s main focus will be gathering the data and achieving outputs, and there may initially be little focus beyond this. Once work has been completed, preservation and sharing may be of great importance to ensure the greatest possible impact; if research is intended to be cumulative and part of a community, then making research data available should be a priority. However, for some there may be restrictions on the extent to which they can make data available and limits on how others are able to use it. These are also covered in this module.

The module outlines the benefits of sharing research data. There are benefits for researchers themselves (scientific integrity, funder requirements and preservation for one’s own future use) and for the research community more widely (teaching, impact, collaboration and public record). Whilst for the most part we may take on good faith the validity of outputs published in journals and other academic papers, the module outlines some high-profile instances in which results have been fabricated by researchers. The authors argue that making available the data on which outputs are based ensures the legitimacy of research and a culture of openness.

Whilst outlining the importance of preserving data for future reuse, the module highlights the difficulties and potential problems of maintaining it over time. Rapid changes in file formats and obsolete storage methods are cited as potential future issues for access. Though this may appear to pose an undue hindrance to one’s research activities, I see it as emphasising the importance of proper and correctly managed data preservation. Reasons are given, with emphasis, for placing data into repositories. A further strength of the module is that, whilst for the most part it focuses on the creator of data, it also recognises the position of the secondary data user and provides help for them.

For further guidance on data preservation and best practice, the recommended reading, the DCC Curation Reference Manual (http://www.dcc.ac.uk/resources/curation-reference-manual), provides in-depth curation techniques split into several chapters (some still in development).

This final module of the MANTRA training completes a comprehensive yet straightforward beginner’s guide to research data management. Having recently reviewed the content of several online data management guides, I would recommend the University of Edinburgh learning units as an introduction for fellow postgraduate researchers and, equally, for anybody with a related interest in research data management.

MANTRA available from: http://datalib.edina.ac.uk/mantra/preservation.html (CC-by licensed)

iridium – ‘core’ institutional research data management plan development

Research data management plan authoring is a key part of our draft institutional RDM policy and good practice. Most RCUK funders (apart from EPSRC, currently) require a formal RDMP. NERC now require a pre- and post award RDMP.  These types of templates are available in the DMP Online system.

We wanted to write an institutional RDMP within iridium for research projects that do not have a Funder mandated template (a fair proportion, ~66% of research projects?).  This was to be as easy to complete by end user as possible (i.e. low time burden for researchers) and to be used across Faculties (disciplines) if possible.

What are the ‘essential’ RDMP questions (a ‘core plan’, Donnelley, 2012)? We reviewed several RCUK RDMP templates from different disciplines for similarities, but also for distinctive and pertinent questions. We also drew on project-specific criteria, together with good practice from the DATUM RDMP template, which has a strong emphasis on actions and review (an ‘active plan’).

It was decided to pursue a post-award RDMP template approach for projects without a mandated plan, as this avoids the burden of writing ‘core’ plans for projects that are not awarded in the end, and should maximise uptake. We noted the need for key aspects of RDM planning to be brought forward into pre-award processes and RIM systems (such as ethics, which is already strongly monitored institutionally, but also including RDM costs and atypical data volume size (plus extended curation duration?)). For example, we recommend that these questions/planning be a ‘flag’/check-box in a ‘minimal’ (‘ultra-minimal’?) RDMP check-list in existing RIM systems or the pre-award Faculty peer review process.

—- —- —

iridium institutional template post-award RDMP v5 [DRAFT]

This template is for projects that DO NOT have a Funder mandated research data management (RDM) plan. Funding body requirements relating to the creation of a research data management plan are available from …

{ Our institution RIM system MyProjects contains research project administration data (see below). In the long term it would be useful to have this imported and auto-populated into a RDMP direct from RIM system. This aligns to the ‘header’ information in the DMP Online template }

Reference:
Proposal Type:
Proposal Title:
Proposal Short Title:
… ….  … …. etc.

Contact details of named individuals (Role/Name/Unit):

MyProjects Owner:

Date of creation of this plan:
Plan version/supersedes:

 Aims and purpose of plan: … …

[SCOPE NOTES: Guidance on completion of this plan is available from …. ‘DCC 1.x’ references link to additional guidance provided by the Digital Curation Centre]

1 Introduction and Context
1.1 Introduction and Context
[DCC 1.2]: Short description of the project’s fundamental aims and purpose
[DCC 1.3.2(re-worded)]: Describe how you have considered the Newcastle University RDM institutional policy and any Faculty/research group guidelines, together with any other policy-related dependencies:
[From RC template] Document the RDM advice you have sought on planning your proposed project, including any consultation with projects using similar methods.
[DCC 10.2]: Glossary of terms
2 Data Types, Formats, Standards and Capture Methods
2.1 Data Types, Formats, Standards and Capture Methods
[SCOPE NOTE – for further guidance on ‘data’ definitions and the capture of non-digital data, please see XYZ]
[DCC 2.1]: Give a short overview description of the data being generated or reused in this research
[SCOPE NOTE – for further guidance on ‘open’ file formats, please see …]
[DCC 2.3.3(re-worded)]: Which open file formats will you use, and why?
[DCC 2.3.4]: What criteria and/or procedures will you use for Quality Assurance/Management?
[SCOPE NOTE – for further guidance on ‘Quality Assurance/Management’, please see …]
[DCC 2.5.1]: Are the datasets which you will be capturing/creating self-explanatory, or understandable in isolation?
[DCC 2.5.2]: If you answered No to [DCC 2.5.1], what contextual details are needed to make the data you capture or collect meaningful?
[DCC 2.5.3]: How will you create or capture these metadata?
[DCC 2.5.4]: What form will the metadata take?
3A Ethics
3A Ethics
HAVE YOU COMPLETED A NEWCASTLE UNIVERSITY ETHICS APPLICATION? [YES] [NO] [NOT APPLICABLE]
REFERENCE NUMBER:
{ We already have strong RIM/institutional check points for ethics, we don’t want to duplicate information gathering, thus this section is brief. }
3B Intellectual Property
3B Intellectual Property
[SCOPE NOTE – for further guidance on Intellectual Property/licensing, please see …]
[DCC 3.2.1]: Will the dataset(s) be covered by copyright or the Database Right? If so give details in DCC 3.2.2, below.
[DCC 3.2.2]: If you answered Yes to [DCC 3.2.1], Who owns the copyright and other Intellectual Property?
[DCC 3.2.3]: If you answered Yes to [DCC 3.2.1], How will the dataset be licensed?
4 Access, Data Sharing and Re-Use
4.1 Access, Data Sharing and Re-Use
[From Research Council template] Are there issues of consent, confidentiality (including commercial), anonymisation and other ethical considerations?
[From RC templates] What are the main risks to data security/ confidentiality?
[DCC 4.2.3]: Are there any embargo periods for political/commercial/patent reasons?
[DCC 4.2.4]: If you answered Yes to DCC 4.2.3, Please give details.
[DCC 4.3.1]: Which groups or organisations are likely to be interested in the data that you will create/capture?
[DCC 4.3.2]: How do you anticipate your new data being reused?
[DCC 5.3.2]: How will you implement permissions, restrictions and/or embargoes?
[DCC 4.1.1]: Are you under obligation or do you have plans to share all or part of the data you create/capture?
[DCC 4.1.3]: If you answered Yes to DCC 4.1.1, How will you make the data available?
[DCC 4.1.4]: If you answered Yes to DCC 4.1.1, When will you make the data available?
[DCC 4.1.5]: If you answered Yes to DCC 4.1.1, What is the process for gaining access to the data?
[From RC template] What will be the responsibilities of data sets users (for example as detailed in a ‘Statement of Agreement’)?
[SCOPE NOTE – for further guidance on the responsibilities of data set users and ‘Statement of Agreement’ wording, please see ….]
[DCC 4.1.6]: Will access be chargeable?
5 Short-Term Storage and Data Management
5.1 Short-Term Storage and Data Management
[DCC 5.1.1]: Where (physically) will you store the data during the project’s lifetime?
[DCC 5.1.2]: What media will you use for primary storage during the project’s lifetime?
[From RC template] What is the anticipated (‘ballpark’ figure) of data volume that will be collected? Will this vary after processing?
[DCC 5.2.1]: How will you back-up the data during the project’s lifetime?
[DCC 5.2.2]: How regularly will back-ups be made?
Has the back-up process been tested and successfully validated?
Who is responsible for the back-up process?
[DCC 5.3.1]: How will you manage access restrictions and data security during the project’s lifetime?
6 Deposit and Long-Term Preservation
6.1 Deposit and Long-Term Preservation
[DCC 6.1]: What is the long-term strategy for maintaining, curating and archiving the data?
[SCOPE NOTE – for further guidance on curation and archiving of data sets, please see …]
[DCC 6.2.1]: Will or should data be kept beyond the life of the project?
What is your deletion policy? Will data sets be deleted? When, by whom and how will they be identified?
[DCC 6.2.2]: If you answered Yes to DCC 6.2.1, How long will or should data be kept beyond the life of the project?
[DCC 6.2.3]: If you answered Yes to DCC 6.2.1, What data centre/ repository/ archive have you identified as the long-term place of deposit?
What is the anticipated (‘ballpark’ figure) of data volume that will be archived?
[DCC 6.2.7]: Will transformations be necessary to prepare data for preservation and/or data sharing?
[SCOPE NOTE – for further guidance on data set transformations, please see …]
[DCC 6.2.8]: If you answered Yes to DCC 6.2.7, what transformations will be necessary to prepare data for preservation / future re-use?
[DCC 6.3.3]: Will you include links to published materials and/or outcomes?
[SCOPE NOTE – for further guidance on including links to published materials and/or outcomes, including the Research Data Catalogue, please see …]
[DCC 6.3.4]: If you answered Yes to [DCC 6.3.3], please give details.
[DCC 6.3.5]: How will you address the issue of persistent citation?
[SCOPE NOTE – for further guidance on persistent citation, please see …]
[DCC 6.4.1]: Who will have responsibility over time for decisions about the data once the original personnel have gone?
7 Resourcing
7.1 Resourcing
[DCC 7.1]: Outline the staff/organisational roles and responsibilities for research data management
[DCC 7.2]: How will data management activities be funded during the project’s lifetime?
[DCC 7.3]: How will longer-term data management activities be funded after the project ends?
Describe how funding for RDM has been specifically costed into the funding application (where appropriate).
[SCOPE NOTE – for further guidance on costings for RDM, please see …]
8 Adherence and Review
8.1 Adherence and Review
[DCC 8.1.1]: How will adherence to this data management plan be checked or demonstrated?
[DCC 8.1.2]: Who will check this adherence?
[DCC 8.2.1]: When will this data management plan be reviewed?
[SCOPE NOTE – for further guidance on review points for RDM plans, please see …]
[DCC 8.2.2]: Who will carry out reviews?
9 Actions Required
9.1 Actions Required
Please list actions and timelines against named individuals identified as a result of completing this RDM plan.
For example please indicate additional hardware, software and relevant technical expertise, support and training that is likely to be needed and how it will be acquired.
For any deferred or unanswered questions outline how you plan to seek advice.
Action: / Responsibility: / Review Date:
-: / -: / -:
-: / -: / -:
Signature Date
Print name Role/Institution
Signature Date
Print name Role/Institution
Signature Date
Print name Role/Institution

[Attribution]

DMPOnline: https://dmponline.dcc.ac.uk/

© Northumbria University School of Computing, Engineering & Information Sciences, 2012 cc: by-nc-sa DATUM DMP template

© Newcastle University, iridium project, 2012 cc: by-nc-sa

— — — —-

We are currently evaluating end user acceptance of this draft plan, time required to complete and support required to assist with writing.
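
The note in the template about auto-populating the ‘header’ fields from the RIM system could look roughly like the following. This is only a minimal sketch: the record keys simply reuse the header labels from the draft template, and the `RdmpHeader` class, the `header_from_rim` function and the example record are all hypothetical, not a description of how MyProjects or DMP Online actually export or import data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RdmpHeader:
    """Header fields of the draft plan; names mirror the template labels."""
    reference: str
    proposal_type: str
    proposal_title: str
    proposal_short_title: str
    owner: str
    created: str
    version: str = "1"


def header_from_rim(record: dict, today: date) -> RdmpHeader:
    """Build a plan header from a RIM export record (hypothetical key names)."""
    return RdmpHeader(
        reference=record["Reference"],
        proposal_type=record["Proposal Type"],
        proposal_title=record["Proposal Title"],
        # Optional fields fall back to an empty string for the researcher to fill in.
        proposal_short_title=record.get("Proposal Short Title", ""),
        owner=record.get("MyProjects Owner", ""),
        created=today.isoformat(),
    )


# Invented example record, for illustration only.
example = {
    "Reference": "EXAMPLE-0001",
    "Proposal Type": "Research grant",
    "Proposal Title": "An example project",
    "MyProjects Owner": "A. Researcher",
}
header = header_from_rim(example, date(2013, 1, 1))
```

The point of the sketch is that the researcher would only need to answer the RDM questions themselves, keeping the time burden low.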

iridium – postgrad evaluation of MANTRA RDM training – Sharing, Preservation and Licensing unit

From Amy.

The new unit from the MANTRA Data Management Training programme focuses on Sharing, Preservation and Licensing, which follows on well from the previous unit on Data Protection, Rights and Access. The module took about an hour to get through, making notes as I went, and I found it a useful introduction to a topic that I know fairly little about.

The unit discusses the reasons for and against sharing research data and the benefits that can be enjoyed by researchers who do decide to share data. Other guides that I have read on this topic seem to offer a more one-sided view of the debate as they are trying to encourage researchers to share data. While this is understandable, and ultimately the aim of increasing awareness will be that more researchers share more data, it can sometimes make the source appear slightly less credible. For this reason, I was really pleased that this unit included a section on the barriers to sharing research data. For the issue of confidentiality it offered the solution of anonymisation, but it also recognised that financial and ownership issues are sometimes capable of preventing sharing altogether. By recognising that not all research data can be shared, its advice on data that can be shared became more realistic.

The unit provides extensive benefits of sharing research data including scientific integrity, meeting funder requirements, increasing research impact and preserving data for personal future use. This is all underlined by the examples given of real-life cases where the repercussions of not properly preserving/sharing data have caused problems. The unit gives an example of a postgraduate research student whose project was spoiled because they could not access the relevant data. While this is useful, the point is underlined far more seriously by the examples given of researchers who were accused of falsifying data and not having the records to back up their research. One of the benefits given that I could identify with the most was the impact that sharing data can have on teaching. The unit suggests that using research data in teaching is a good way to teach students how to collect and analyse data. Also, in my experience as a student, some of the most interesting teaching sessions I have had were those when lecturers talked about their current or recent projects and showed us data that they had collected for these. It made teaching much more closely related to research and made us, as students, feel more involved with what was going on in the University than when you feel like you’re just being taught from a set syllabus.

The unit also covers issues on licensing and introduces Open Data Commons as a source of guidance and licences that are conformant with the principles set out in the Open Knowledge Foundation’s definition of open knowledge. The unit definitely succeeded in its aims, as the information provided, combined with the activities which outlined key terms and definitions, was useful to me as a postgraduate student in consideration of my own research, but also in consideration of data that I am using that belongs to someone else.

See MANTRA http://datalib.edina.ac.uk/mantra/preservation.html

iridium – evaluation of DataStage and DataBank research data management tools from DataFlow project

DataFlow project background:

DCC catalogue record: http://www.dcc.ac.uk/resources/external/datastage

Two tools

(a) DataStage, for researchers to manage their research data locally.

DataFlow lets researchers save their work to a DataStage file system that appears as a mapped drive on their computer, a lightweight system requiring them to install no special software on their computers.

More details: http://www.dataflow.ox.ac.uk/index.php/datastage/users/researchers

(b) DataBank, to preserve and publish valuable research.

DataStage is a secure personalized ‘local’ file management environment for use at the research group level, appearing as a mapped drive on the end-user’s computer.

More details:  http://www.dataflow.ox.ac.uk/index.php/databank 

Firstly, it’s great that the DataFlow team have released this system openly for re-use. Below are some of our findings.

From a local technical infrastructure assessment:

Ubuntu is not our standard Linux platform (which is Red Hat/CentOS). It would almost certainly be possible to port the DataFlow packages to CentOS (and feed this back to the main project) or use Ubuntu as an appliance (but this would mean that the systems used for this would not be managed by our standard configuration system). Either option comes with a reasonably significant cost.

The feeling that we got from installation (testing prior to 24 July 2012) is that the system is in the early stages of its lifecycle, and our assessment is that DataFlow is not yet of sufficient maturity to deploy in production at Newcastle. It would be worth re-evaluating this decision at a later time. This would be prioritised according to feedback from end users who have tried the system, i.e. the more that they liked it, the more worthwhile it would be to put resources into trying it again and working with the DataStage developers.

In terms of initial user testing (in early August 2012/and on ‘v0.3.1rc2’ Oxford installation), initial feedback was:

User testing – DataStage:

Users liked the feature specification of what it offered as a tool (desktop integration through mapped drives, web access aiding working from home, no need for a designated computer for their research work, setting of different access rights (private, public, and collaborative) and the ‘invite to share’ options). The system interface is fine, basic yet functional, and could be ‘skinned’ to institutional brand. Uploading documents/data files is straightforward.

My opinion was if an institution had no existing RDM systems, it would be a very useful ‘bootstrap’ system providing a simple functional system.

Seamless integration of a data file staging system/VRE with the user desktop (ideally through ‘drag & drop’/mapping over existing user networked drives) and through web access are key features that are top of an ‘average’ researcher’s wish list.

Making sure research data sets can be appended with an appropriate level of metadata in ‘data staging’ RDM tools (or perhaps later in the lifecycle, as practical?), so that metadata can flow through to an eventual data catalogue or national repository, is an important RDM requirement. Thus, it is important to flag that this function should be provided to researchers, and DataStage/DataBank are a good approach to this.

I thought more data file re-use metadata capture would have been an option in DataStage (noting manifest/Zip package upload feature), pulling in automatically from individual data file itself (that’s probably me being simplistic on technical aspects?) ahead of the DataBank stage?

We noted that not all users were comfortable with, or had success in, Windows drive mapping (network path errors), so some end user support would be needed. Users have high expectations on usability – ‘as easy as DropBox’.

Error messages while testing – access forbidden, 505/405, and ‘submit as data package’, where an entered/saved password was looping? More helpful customisation of error messages, such as ‘this problem normally occurs because of x, y or z – wrong password, wrong file path, etc.’ (rather than ‘Error 505’/‘Error 404’), would be helpful.
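
As a rough illustration of the kind of customisation we have in mind, a tool could map raw status codes to plain-language hints. This is a minimal sketch only: the `FRIENDLY_HINTS` table, the `friendly_error` function and the likely causes listed are our own illustrative guesses, not DataStage behaviour or code.

```python
# Hypothetical mapping from HTTP status code to a plain-language hint.
# The likely causes listed here are illustrative guesses, not DataStage behaviour.
FRIENDLY_HINTS = {
    403: "Access forbidden. This normally occurs because of a wrong password "
         "or because you do not have permission for this folder.",
    404: "Not found. This normally occurs because of a wrong file path "
         "or a mistyped network drive address.",
    500: "The server hit an internal problem. Please try again and contact "
         "support if it persists.",
}

DEFAULT_HINT = "An unexpected problem occurred. Please contact support."


def friendly_error(status: int) -> str:
    """Return a helpful message, keeping the raw code for support staff."""
    hint = FRIENDLY_HINTS.get(status, DEFAULT_HINT)
    return f"{hint} (Error {status})"
```

Keeping the raw code at the end of the message means support staff can still diagnose the underlying fault, while the researcher sees a likely cause first.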

User testing – DataBank

 Liked:

– Simple, clean functional interface – again could be ‘skinned’ to institutional brand.

– Current search/’on-off’ filters were good

– Assigning a DOI and the RDF were useful RDM-specific features.

– Licensing/embargo fields

– Simple admin interface

– CSV/JSON exports are useful

– REST API was documented

 Suggestions:

– Clarifying, who was intended user audience for DataBank? Researcher or archivist?

– Terminology – not understood by user testers – ‘Silo’, ‘Mediator’, ‘Aggregate’ – obviously this could be changed easily.

– RDF and click-through access to XML schema was confusing for our testers (they were not archivists, librarians or metadata experts – who would probably appreciate this function – i.e. package/manifest upload/explore)

– A basic tagging interface/fields to populate the RDF/XML for non-specialists would be more friendly

– Again frequent error messages (404 not found/ 500 Internal Server Error, ‘Add manifest’ gives 505)
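
To illustrate the suggestion about a basic tagging interface, the sketch below shows how a few simple form fields might be turned into RDF/XML behind the scenes, so that non-specialists never see the markup. It is an assumption-laden example: it uses generic Dublin Core element names and an invented dataset address, and the `tags_to_rdf` function is hypothetical. We do not know which schema DataBank actually uses internally.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF and the Dublin Core element set.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def tags_to_rdf(dataset_uri: str, fields: dict) -> str:
    """Turn simple 'field name -> value' tags into an RDF/XML description."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": dataset_uri}
    )
    for name, value in fields.items():
        # Each form field becomes one Dublin Core element, e.g. dc:title.
        ET.SubElement(desc, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Invented example values, as a researcher might type them into a form.
xml_text = tags_to_rdf(
    "http://example.org/datasets/1",
    {"title": "Example survey data", "creator": "A. Researcher", "subject": "ageing"},
)
```

Archivists and metadata experts could still be given the full RDF view, with the simple form offered as the default for researchers.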

Documentation for DataStage/DataFlow researcher end users:

User documentation for researchers seemed a little sparse (I think the project/developers noted it is a work in progress i.e. https://github.com/dataflow/RDFDatabank/wiki). More end user documentation would facilitate wider take up. To note, technical installation documentation was more detailed with screen shares, etc.

We look forward to further DataFlow project developments.

DataFlow user forum is at: https://groups.google.com/forum/?fromgroups=#!forum/dataflow-users

iridium – JISC MRD02 monthly updates to January ’13

Progress to date

Workpackage 0 (Project Management)

  • 4th Steering Group meeting took place on 08 January 2013, with two further meetings planned for April and June 2013

Workpackage 1 (Requirements)

  • activities completed

Workpackage 2 (Policy)

Workpackage 3/4/5 (Tools, Systems, & Implementation)

  • Research Data Catalogue (RDC) user testing data loaded
  • DCC DMP Online tool used to host draft iridium institution-specific post-award DMP template
  • local CKAN test system established
  • e-Science Central Shibboleth authentication added, with SWORD protocol functionality being investigated

Workpackage 6 (Human Support)

  • additional good practice guidance content and revised draft policy principles added to developing RDM support website
  • meetings with Netskills on 09 Jan and 13 Feb 2013 to outline RDM workshop and online training requirements
  • documentation for RDM tools authored
  • productive meeting with Staff Development Unit on 14 February 2013

Workpackage 8 (Evaluation)

  • draft RDM policy principles available for open consultation from institutional website
  • testing fitness for purpose of identified RDM tools underway (see https://iridiummrd.wordpress.com/?s=tools+evaluation)
  • RDC minimal additional metadata entry testing and user acceptance will take place from February 2013, with 12 PIs invited to record data location (representing 39 research projects and 474 publications)
  • iridium support team have evaluated project RDMP template implementation within DMP Online system, to be followed by research funding support staff and researchers

Workpackage 9 (Dissemination)

  • Niall O’Loughlin (RES) gave a talk to research office representatives from Brunswick Group on 25 January 2013 on ‘Open Research Data’.

Activities in the last months

  • Dr Ben Allen and Dr Simon Kometa registered to attend ‘CKAN for RDM’ on 18 February 2013
  • Paul Haldane registered to attend ‘RDM Storage Workshop’ on 25 February 2013
  • Niall O’Loughlin registered to attend ‘Research data in the Visual Arts’ event on 06 March 2013

Risks & issues

Risk/issues likely to present in the coming 3 months are:

  • project still needs to be very clear that the iridium project is not about data storage space provision
  • managing researchers’ expectations about provision of RDM tools
  • REF 2014 will be a major priority for researchers and institution in coming months

Milestones & challenges

In the coming 3 months, identified milestones are:

  • project outputs (policy, tools and training) aligned into pilot infrastructure
  • pilot infrastructure evaluated against research projects’ needs
  • monitoring institutional IT re-structuring for embedding of project outputs
  • promoting uptake of pilot infrastructure and outputs
  • business case authoring

iridium JISC MRD02 – monthly updates to November ’12

Progress to date

Workpackage 0 (Project Management)

  • draft proposal for extension of project submitted
  • full-team meeting took place on 12 November 2012 to review project against JISC Progress Meeting RDM ‘components’

Workpackage 1 (Requirements)

  • requirements gathering outputs disseminated locally and externally to JISC MRD Programme
  • resultant project actions disseminated locally to stakeholders through Registrar’s regular update to Heads of Departments
  • anonymised requirements data shared with local projects and initiatives (Computing Science data security Choice Modelling, Digital Campus Initiative Infrastructure Discovery and ISS Information Security projects)

Workpackage 2 (Policy)

  • working group met on 06 September and 27 September 2012
  • policy principles and Code of Good Practice have been revised and are in a mature draft form
  • briefing of senior project advocates is planned ahead of tabling of draft policy document at URC
  • permission will be requested from URC for draft policy principles to be published on project website for open consultation

Workpackage 3/4/5 (tools, systems, & implementation)

  • research data catalogue will be tested with subset of Changing Age researchers
  • draft Newcastle-specific post-award DMP template will be tested within DCC DMP Online system from December 2012
  • SWORD2 protocol investigations blogged
  • CKAN platform to be tested

Workpackage 6 (Human support)

  • RDM support materials writing sessions continued on 08 and 18 October 2012
  • RDM support website has been populated with FAQ, draft policy principles and good practice guidance content
  • documentation of selected RDM-specific tools continuing
  • Dr Simon Kometa (ISS) attended JISC ‘Research Data Management Training’ workshop on 26 October 2012

Workpackage 8 (Evaluation)

  • draft policy principles reviewed in light of comments received from stakeholders to date
  • evaluation to date reported internally on tools (i.e. e-Science Central and DMP Online)
  • Lindsay Wood attended JISC benefits meeting in Bristol 29-30 November 2012

Workpackage 9 (Dissemination)

  • newsletter-style project update disseminated on 05 October 2012
  • project disseminated at JISC Progress Meeting on 24-25 October 2012
  • Niall O’Loughlin (RES) gave a talk to AFRD Research Committee on 09 November 2012

Activities in the last months

  • EPSRC-funded FRICCTT report published that referenced JISC iridium project
  • local EPSRC-funded Cyber Security project awarded that referenced JISC iridium project requirements gathering data in case for support
  • Lindsay Wood attended JISC IRIOS-2 RIM meeting in Newcastle on 21 September 2012
  • Suzanne Hardy (MEDEV) attended the DataCite citing sensitive data workshop on 29 October 2012 at the British Library
  • Dr Ben Allen (ISS) attended RDMF9 workshop on 14-15 November 2012 in Cambridge

Risks & issues

Risk/issues likely to present in the coming 3 months are:

  • project still needs to be very clear that the iridium project is not about data storage space provision
  • managing researchers’ expectations about provision of RDM tools
  • degree of policy revisions post-URC, that might be required, is unknown
  • maintaining good communication links with the URC during project team staffing changes
  • REF 2014 will be a major priority for researchers and institution in coming months

Milestones & challenges

In the coming 3 months, identified milestones are:

  • draft policy approved by URC and then by Executive Board
  • project outputs (policy, tools and training) aligned into pilot infrastructure
  • pilot infrastructure evaluated against research projects’ needs
  • monitoring institutional IT re-structuring for embedding of project outputs