iridium – ‘core’ institutional research data management plan development

Research data management plan authoring is a key part of our draft institutional RDM policy and good practice. Most RCUK funders (apart from EPSRC, currently) require a formal RDMP. NERC now require both pre- and post-award RDMPs. Templates of these types are available in the DMP Online system.

We wanted to write an institutional RDMP template within iridium for research projects that do not have a Funder-mandated template (a fair proportion, perhaps ~66% of research projects). It was intended to be as easy as possible for end users to complete (i.e. a low time burden for researchers) and, if possible, usable across Faculties (disciplines).

What are the ‘essential’ RDMP questions (a ‘core plan’, Donnelley, 2012)? We reviewed several RCUK RDMP templates from different disciplines for similarities, and also for distinctive, pertinent questions. We also drew on project-specific criteria, together with good practice from the DATUM RDMP template, which places a strong emphasis on actions and review (an ‘active plan’).

We decided to pursue a post-award RDMP template for projects without a mandated plan. This avoids the burden of writing ‘core’ plans for projects that are not awarded in the end, and should maximise uptake. We noted the need for key aspects of RDMP planning to be brought forward into pre-award processes and RIM systems: ethics, which is already strongly monitored institutionally, but also RDM costs and atypical data volumes (plus, perhaps, extended curation duration). For example, we recommend that these questions be a ‘flag’/check-box in a ‘minimal’ (‘ultra-minimal’?) RDMP checklist within existing RIM systems or the pre-award Faculty peer review process.

—- —- —

iridium institutional template post-award RDMP v5 [DRAFT]

This template is for projects that DO NOT have a Funder-mandated research data management (RDM) plan. Funding body requirements relating to the creation of a research data management plan are available from …

{ Our institutional RIM system, MyProjects, contains research project administration data (see below). In the long term it would be useful to have these data imported and auto-populated into an RDMP directly from the RIM system. This aligns with the ‘header’ information in the DMP Online template. }

Reference:
Proposal Type:
Proposal Title:
Proposal Short Title:
… ….  … …. etc.

Contact details of named individuals (Role/Name/Unit):

MyProjects Owner:

Date of creation of this plan:
Plan version/supersedes:

 Aims and purpose of plan: … …

[SCOPE NOTES: Guidance on completing this plan is available from …. ‘DCC’ references (e.g. DCC 1.2) link to additional guidance provided by the Digital Curation Centre.]

1 Introduction and Context
1.1 Introduction and Context
[DCC 1.2]: Short description of the project’s fundamental aims and purpose
[DCC 1.3.2 (re-worded)]: Describe how you have considered the Newcastle University RDM institutional policy and any Faculty/research group guidelines, together with any other policy-related dependencies:
[From RC template] Document the RDM advice you have sought on planning your proposed project, including any consultation with projects using similar methods.
[DCC 10.2]: Glossary of terms
2 Data Types, Formats, Standards and Capture Methods
2.1 Data Types, Formats, Standards and Capture Methods
[SCOPE NOTE – for further guidance on ‘data’ definitions and the capture of non-digital data, please see XYZ]
[DCC 2.1]: Give a short overview description of the data being generated or reused in this research
[SCOPE NOTE – for further guidance on ‘open’ file formats, please see …]
[DCC 2.3.3 (re-worded)]: Which open file formats will you use, and why?
[DCC 2.3.4]: What criteria and/or procedures will you use for Quality Assurance/Management?
[SCOPE NOTE – for further guidance on ‘Quality Assurance/Management’, please see …]
[DCC 2.5.1]: Are the datasets which you will be capturing/creating self-explanatory, or understandable in isolation?
[DCC 2.5.2]: If you answered No to [DCC 2.5.1], what contextual details are needed to make the data you capture or collect meaningful?
[DCC 2.5.3]: How will you create or capture these metadata?
[DCC 2.5.4]: What form will the metadata take?
3A Ethics
HAVE YOU COMPLETED A NEWCASTLE UNIVERSITY ETHICS APPLICATION? [YES] [NO] [NOT APPLICABLE] REFERENCE NUMBER: { We already have strong RIM/institutional checkpoints for ethics and do not want to duplicate information gathering, so this section is brief. }
3B Intellectual Property
[SCOPE NOTE – for further guidance on Intellectual Property/licensing, please see …]
[DCC 3.2.1]: Will the dataset(s) be covered by copyright or the Database Right? If so, give details in [DCC 3.2.2] below.
[DCC 3.2.2]: If you answered Yes to [DCC 3.2.1], who owns the copyright and other Intellectual Property?
[DCC 3.2.3]: If you answered Yes to [DCC 3.2.1], how will the dataset be licensed?
4 Access, Data Sharing and Re-Use
4.1 Access, Data Sharing and Re-Use
[From Research Council template] Are there issues of consent, confidentiality (including commercial), anonymisation and other ethical considerations?
[From RC templates] What are the main risks to data security/confidentiality?
[DCC 4.2.3]: Are there any embargo periods for political/commercial/patent reasons?
[DCC 4.2.4]: If you answered Yes to [DCC 4.2.3], please give details.
[DCC 4.3.1]: Which groups or organisations are likely to be interested in the data that you will create/capture?
[DCC 4.3.2]: How do you anticipate your new data being reused?
[DCC 5.3.2]: How will you implement permissions, restrictions and/or embargoes?
[DCC 4.1.1]: Are you under obligation or do you have plans to share all or part of the data you create/capture?
[DCC 4.1.3]: If you answered Yes to [DCC 4.1.1], how will you make the data available?
[DCC 4.1.4]: If you answered Yes to [DCC 4.1.1], when will you make the data available?
[DCC 4.1.5]: If you answered Yes to [DCC 4.1.1], what is the process for gaining access to the data?
[From RC template] What will be the responsibilities of data set users (for example, as detailed in a ‘Statement of Agreement’)?
[SCOPE NOTE – for further guidance on the responsibilities of data set users and ‘Statement of Agreement’ wording, please see ….]
[DCC 4.1.6]: Will access be chargeable?
5 Short-Term Storage and Data Management
5.1 Short-Term Storage and Data Management
[DCC 5.1.1]: Where (physically) will you store the data during the project’s lifetime?
[DCC 5.1.2]: What media will you use for primary storage during the project’s lifetime?
[From RC template] What is the anticipated volume of data (a ‘ballpark’ figure) that will be collected? Will this vary after processing?
[DCC 5.2.1]: How will you back-up the data during the project’s lifetime?
[DCC 5.2.2]: How regularly will back-ups be made?
Has the back-up process been tested and successfully validated?
Who is responsible for the back-up process?
[DCC 5.3.1]: How will you manage access restrictions and data security during the project’s lifetime?
6 Deposit and Long-Term Preservation
6.1 Deposit and Long-Term Preservation
[DCC 6.1]: What is the long-term strategy for maintaining, curating and archiving the data?
[SCOPE NOTE – for further guidance on curation and archiving of data sets, please see …]
[DCC 6.2.1]: Will or should data be kept beyond the life of the project?
What is your deletion policy? Will data sets be deleted? When, by whom and how will they be identified?
[DCC 6.2.2]: If you answered Yes to [DCC 6.2.1], how long will or should data be kept beyond the life of the project?
[DCC 6.2.3]: If you answered Yes to [DCC 6.2.1], what data centre/repository/archive have you identified as the long-term place of deposit?
What is the anticipated volume of data (a ‘ballpark’ figure) that will be archived?
[DCC 6.2.7]: Will transformations be necessary to prepare data for preservation and/or data sharing?
[SCOPE NOTE – for further guidance on data set transformations, please see …]
[DCC 6.2.8]: If you answered Yes to [DCC 6.2.7], what transformations will be necessary to prepare data for preservation/future re-use?
[DCC 6.3.3]: Will you include links to published materials and/or outcomes?
[SCOPE NOTE – for further guidance on including links to published materials and/or outcomes, including the Research Data Catalogue, please see …]
[DCC 6.3.4]: If you answered Yes to [DCC 6.3.3], please give details.
[DCC 6.3.5]: How will you address the issue of persistent citation?
[SCOPE NOTE – for further guidance on persistent citation, please see …]
[DCC 6.4.1]: Who will have responsibility over time for decisions about the data once the original personnel have gone?
7 Resourcing
7.1 Resourcing
[DCC 7.1]: Outline the staff/organisational roles and responsibilities for research data management
[DCC 7.2]: How will data management activities be funded during the project’s lifetime?
[DCC 7.3]: How will longer-term data management activities be funded after the project ends?
Describe how RDM has been specifically costed into the funding application (where appropriate).
[SCOPE NOTE – for further guidance on costings for RDM, please see …]
8 Adherence and Review
8.1 Adherence and Review
[DCC 8.1.1]: How will adherence to this data management plan be checked or demonstrated?
[DCC 8.1.2]: Who will check this adherence?
[DCC 8.2.1]: When will this data management plan be reviewed?
[SCOPE NOTE – for further guidance on review points for RDM plans, please see …]
[DCC 8.2.2]: Who will carry out reviews?
9 Actions Required
9.1 Actions Required
Please list the actions identified as a result of completing this RDM plan, with timelines and the named individuals responsible.
For example, please indicate any additional hardware, software, relevant technical expertise, support and training that is likely to be needed, and how it will be acquired.
For any deferred or unanswered questions, outline how you plan to seek advice.
Action: / Responsibility: / Review Date:
Signature Date
Print name Role/Institution
Signature Date
Print name Role/Institution
Signature Date
Print name Role/Institution

[Attribution]

DMPOnline: https://dmponline.dcc.ac.uk/

© Northumbria University School of Computing, Engineering & Information Sciences, 2012, CC BY-NC-SA (DATUM DMP template)

© Newcastle University, iridium project, 2012, CC BY-NC-SA

— — — —-

We are currently evaluating end-user acceptance of this draft plan, the time required to complete it, and the support needed to help with writing it.


2 Responses to iridium – ‘core’ institutional research data management plan development

  1. Tim Banks says:

    Thanks for a useful blog post. I’m interested in the re-wording of DCC 2.3.3 to ‘Which *open* file formats will you use, and why?’. Whilst I accept that it is best practice to store data in an open file format for the purposes of long-term preservation, this is not always possible (or sometimes even desirable during the course of the research project). For example, the researcher may need to use a proprietary format whilst analysing or editing the data, and will only convert the final data set to an open format prior to submission to a repository.

    There may also be no suitable open file formats for the particular type of data being processed, or (more commonly) converting to an open format may result in elements being lost, such as conditional formatting in a spreadsheet. In the latter case, the researcher may well choose to store both the open and proprietary versions of the file. Where no open format is available, storing a VM containing the version of the software needed to open the files may be a last resort. There’s an interesting blog post from Chris Rusbridge which covers this area here:
    http://unsustainableideas.wordpress.com/2012/09/27/oh-no-not-emulation/

    Either way, wording the question to presume that open file formats will always be used will probably not reflect the reality of the situation for many projects.

    I’m also interested in the length of the plan (c. 55 questions). This seems very long, even at award stage, and I wonder whether the length in itself may be a barrier to completion. It’s also worth noting DCC’s recent announcement of major changes to DMPOnline, including a substantial reduction in the number of questions in the DCC checklist:
    http://www.dcc.ac.uk/news/future-plans-dmponline

    Have you had any feedback from researchers who have completed one of these plans?

    David Shotton’s blog post ’20 questions for RDM planning’ also suggests an approach based around a smaller question set:
    http://datamanagementplanning.wordpress.com/2012/03/07/twenty-questions-for-research-data-management/

    There does seem to be a general move towards a shorter question set. Jez Cope’s work in this area resulted in a 27-question plan, the DataTrain project had a very minimalist 12 questions, and the California Digital Library’s DMP tool uses as few as 17 questions.

    Tim Banks
    University of Leeds
    RoaDMaP project

    • Lindsay Wood says:

      Thank you for your feedback. Yes, with *open* file formats I was being presumptuous. For that question I was deliberately starting from a ‘position of openness as the default’ and expecting justification for why ‘open’ was not possible. This was partly to test the water, but also to cut down the question sequence of ‘what file formats will you use?’ followed by ‘have you considered open file formats?’.
      Your comments on whether open file formats are always available or desirable are, of course, valid. The ‘store both’ approach (open and proprietary versions) makes sense.
      Storing a VM sounds like a useful approach. We discussed these issues locally, early in our project, without resolving all the permutations.
      Yes, at ‘55 questions’ (including sub-questions/conditional questions), as formatted in the blog post, it looks overbearing. You are not the first person to say that! The time required will be a barrier to uptake; unfortunately, I think any extra requirement of more than ~20 minutes will be a barrier.
      The conditional questions might not be relevant to all projects, and it may well be possible to shorten the template. The sub-questions could be written as scope notes/guidance, and in effect they largely are that already: they mostly guide the author towards deeper planning. For example, several questions could be combined (e.g. on backup detail and adherence), with the sub-question text moved into scope notes or related support/good practice training. Without this we might expect to get short ‘Yes, we will do backups’ responses.
      I am definitely in favour of RDM plans that are as short as possible. We noted a couple of questions we could cut because of overlap. However, we do want a useful, robust RDM plan and not just a ‘tick box’ activity.
      As a project recommendation, at pre-award stage a very short, minimal RDM ‘plan’ (ethics, costs, etc.) is likely to be a check-box activity within RIM systems.
      I think future demands to write RDMPs will be something of a culture shock and will not be popular. It would be good if the 10 or so key questions of RDM planning were in day-to-day ‘working memory’ and followed more widely.
      I am looking forward to future developments with the DCC Checklist (http://www.dcc.ac.uk/news/future-plans-dmponline). Our own template is a filtering down of the various question pools, including the essentials of three funders’ plans. It is possible that some institutional RDMP templates are more demanding than some individual funders’ requirements.
      Testing the template is where we are now, and we will report on that later. Some of our postgrad support team did like the more detailed template plans.

      I am not sure whether there is a longer-term move to shorter plans (I would not be opposed to that), or just that some short plans have been released recently. Certainly they will be more popular, and there is no use having a wonderful long template that results in good RDM planning if no one uses it. We looked at and liked the Shotton (http://datamanagementplanning.wordpress.com/2012/03/07/twenty-questions-for-research-data-management/) and Bath plans (http://blogs.bath.ac.uk/research360/2012/03/postgraduate-dmp-template-first-draft/) (see blog post https://iridiummrd.wordpress.com/2012/10/06/iridium-early-findings-on-research-data-management-planning-approaches-tools-and-writing-plans/). The Shotton post describes the ‘basis of a workable research data management plan’ and adds that ‘once in a while, you should revisit these questions and see whether your data management practices can be improved, updating’. The Bath plan, at ~27 questions, has additional questions/scope notes in text boxes (so perhaps not so different from ours?). It also seems to be designed for postgrad students, with background notes saying that some sections/questions were left out as not appropriate for postgrad use. DataTrain (http://archaeologydataservice.ac.uk/attach/DataTrainDownload/Post-Graduate_DMP_Form.pdf) was also designed for postgrads, so you might expect it to be shorter.
      I had not seen the California DL plan before, so I will take a look.
      I have not seen figures on how much time an ‘average’ RDMP takes to complete, or on how long is too long. For a £250,000 research project, you would expect a good level of RDM planning? Similarly, if you want to keep data for 10+ years, that will require some detailed planning.
      But should a £5,000 award then have a less detailed RDMP? Is the value of an award any indicator of potential reputational issues or data sensitivity levels, should RDM turn out to be poor and issues arise?
      Some local discussions have asked whether each research group should have its own ‘Standard Operating Procedure’ for RDM planning, tied to day-to-day research activities, which would be easier to re-use and adapt for each new project as needed. The first RDMP would probably be the most time-consuming to author, as you are starting from scratch.
      Do shorter RDMP templates produce good, robust planning if RDM concepts are new to the author? Or do they produce RDMPs that are just ‘good enough’ (or better than nothing)? Starting shorter and building up might be less of a shock to the system. What would be a stronger motivator for time spent on good RDMPs? How many RDMPs are returned to authors for further planning/detail? Is review/auditing of RDMPs at institutional or Funder level getting stricter yet?
