iridium – summary of thematic analysis of RDM researchers’ requirements from interviews

iridium thematic analysis – summary of qualitative interviews

Process

  • A two-stage thematic analysis was conducted on the interview data generated by the iridium project.
  • 29 interviews were analysed.
  • The first stage of analysis was deductive, and was conducted by the interviewers on the interview transcripts.
  • The deductive analysis was categorised into 5 initial themes:
    • Perception – To ascertain interviewees’ concepts of data and data management.
    • Purpose – To ascertain interviewees’ data usage and destination.
    • Process – To ascertain interviewees’ data lifecycle.
    • People – To ascertain the people involved in the data lifecycle.
    • Provoking – A catch-all to gather any other salient points expressed by interviewees.
  • Interim Categorisation Results:

[Figure: iridium thematic analysis, interim categorisation results]

  • The second stage of the analysis was inductive.
  • It used the results of the deductive analysis as a starting point and attempted to build meaningful themes from the whole data corpus.
  • The analysis was done by a single researcher.
  • It generated a new set of themes.

Final Themes

Diversity

Informed by elements from across the deductive categories, in particular perceptions of data usage, longevity and security, purpose of data, the people involved and the processes required.

The overarching theme is that across many aspects of data management there is a great deal of diversity amongst users and any policy should enable users to achieve best practice rather than apply a one size fits all “solution” to data management.

Data Analysis

Generated mainly from the process and perception of data and its life-cycle, a Data Analysis theme was apparent across many interviewees.

The consensus was that much of the processing of data that currently takes place on local machines would be more efficiently accomplished on larger scale servers, but that users are largely unaware that such services may exist in the university.

Longevity / Life Cycle

The longevity of data was a strong theme from across all categories. There was a strong consensus from interviewees that data should never be thrown away. There should be a separate system for archiving data and current data.

Any policy should attempt to support long term storage of research data, and the access to it, as well as current data.

Responsibility

The strongest theme coming from the people category was that of responsibility – who should be doing what with the data, with storage, with security and with access. Many interviewees were unclear about what falls to them and what the responsibility of the university is.

The recommendation to take forward is to make the situation clear, and provide training if needed for users.

Sharing and Collaboration

Another strong theme that emerged from across the initial categories is the concept of data access, in particular sharing data with collaborators, both internal and external. This becomes problematic with very large data sets, or with collaborators insisting on using “their” systems.

The university should provide a method of sharing data post publication, which should be linked to publications and to researchers’ profiles, and provide flexible guidelines on sharing with collaborators.

The full thematic analysis report is available to download: http://research.ncl.ac.uk/media/sites/researchwebsites/iridium/iridium_interview_thematic_analysis_5_7_2012_v1_PH.pdf

iridium – summary of online RDM requirements gathering survey findings

Quantitative online survey –  summary

Key Findings:

  • One hundred and twenty eight projects completed the online survey and over half of the projects are from the Faculty of Medical Sciences (nearly 52%).
  • Over 97% of projects’ data is in digital / physical format.
  • Generally projects have many files, e.g. 23.4% of projects have between 100 and 1,000 files, 28.1% between 10 and 100 files and 19.5% between 1,000 and 10,000 files.
  • Thirty one percent of projects’ files take up to 4 GB of space. Just over 11% of projects’ files take up between 64 GB and 0.5 TB. One project had more than 100 TB, but no project required more than 1 PB.
  • For nearly 54% of projects space required at collection is greatly different from space required after processing / analysis. However for nearly 29% of projects the space was not greatly different.
  • Data file formats varied widely amongst projects, and some projects used many different formats. However, Excel was the most common file format.
  • About 30% of projects said they store their data on ISS managed systems; about 18% of projects used academic unit managed systems and about 21% used personal systems. A small number of projects used external systems / services such as cloud and SurveyMonkey.
  • The largest group of projects (nearly 43%) intend to keep data for 5 to 10 years. Just over 29% intend to keep data for 10 – 25 years and about 17% intend to keep data for more than 25 years. Eleven percent intend to keep data for 1 – 5 years.
  • 93% of projects have multiple copies / partial back up of data.
  • Nearly 49% of the projects have tested how successful it will be to retrieve backed up data.
  • Just over 73% of projects share their data with others within the University.
  • Just over 50% of projects share data externally.
  • Of the 64 projects (50%) that share data external to the University, nearly 30% don’t have any agreement in place, while nearly 33% have other types of agreement that were not specified.
  • About 56% of projects have a data management plan or partial / informal plan.
  • Nearly 65% of projects don’t have any specific tool for RDM.
  • The largest group of projects (46%) said that they do not have any deletion policy, and just under 17% have a deletion policy.
  • For over 75% of projects data have to be quite secure or very secure.
  • 84% of projects said they store their data quite securely or very securely.
  • Over 90% of projects used password, anonymisation or physical measures for data security.
  • Awareness of the policies and legislation that apply to projects’ data is patchy, e.g. only 63% of projects are aware of the DPA and only 40% are aware of FOI.
  • 64% of projects said that the PI currently has the primary responsibility for RDM support. Next in line was the Research Associate with just 11%.
  • 55% of projects believe that going forward the PI should still have the primary responsibility for RDM support. All the other officers got less than 10% of the vote except for computing support officer where just over 17% of projects think they should have the primary responsibility for RDM support.
  • Only 5 projects said they are aware of training sessions and materials on RDM.
  • 60% of projects gave a positive response to make their research data publicly available at the end of the project.
  • 73% of projects have not deposited any of their data in a data repository.
  • Nearly 60% of projects are willing to submit data to a data repository.
  • An overwhelming majority of projects (nearly 80%) are happy to submit data to a repository at the publication stage.
  • Nearly 41% of projects are willing to make data supporting any publication available immediately.
  • Nearly 73% of projects are willing to share data if they have control over who can access the data.
  • There is no clear consensus among projects on who holds the intellectual property rights (IPR) for their data: just over 30% of projects believe that the University does, just over 19% think their research group does, about 15% the researcher, about 7% the funder, 10.4% said other and about 17% do not know.
  • The majority of projects were funded by either a charity (35.5%) or a research council (31.4%), a total of nearly 67%.
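
Several of the figures above can be cross-checked against the structure of the survey questions. The following is a minimal sketch, not part of the project's own analysis, and the function names are illustrative only. It recomputes a share from a raw count and checks that the reported shares of a single-choice question add up to roughly 100%, using the approximate retention-period figures quoted above.

```python
from typing import Mapping


def share(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * count / total, 1)


def shares_are_complete(shares: Mapping[str, float], tolerance: float = 2.0) -> bool:
    """True if the shares of a single-choice question sum to 100% within tolerance."""
    return abs(sum(shares.values()) - 100.0) <= tolerance


# Figures from the summary above: 128 projects responded, 64 share data externally.
external_sharing = share(64, 128)

# Reported retention intentions (approximate, as quoted in the summary).
retention = {
    "1-5 years": 11.0,
    "5-10 years": 43.0,
    "10-25 years": 29.0,
    "25+ years": 17.0,
}

print(external_sharing)
print(shares_are_complete(retention))
```

A check like this only flags questions where the quoted options do not cover all responses (for example, where "I don't know" answers are not reported); it says nothing about the accuracy of the individual figures.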

For more details, see full survey report: http://research.ncl.ac.uk/media/sites/researchwebsites/iridium/iridium_online_survey_report_17_8_2012_v2.1_SK.pdf

iridium – poster presentation of RDM thematic analysis at Digital Research 2012, Oxford

iridium project team members from the Digital Institute presented a poster on the JISC-funded thematic analysis work carried out with local research and related staff to gather and understand RDM requirements. The work was presented at the recent Digital Research 2012 conference in Oxford.

iridium DI Digital Research 2012 poster


iridium – RDM systems/tools ‘connectivity’ (busy researchers don’t like duplication of metadata entry!)

Briefly, when discussing the potential scope for the proof-of-concept Research Data Catalogue (RDC), we talked of *possible* future ‘connectivity’ needs.

We noted that researchers have told us *very* clearly that they do not want to enter the same research project admin data/outputs metadata in multiple systems, whether internal or external to the institution.

This requires us to understand some of the systems the RDC may need to exchange metadata with that have existing information already entered. These could be local research group metadata catalogues, local/national repositories and other online systems (for example, we would need to outline the technical protocols (standards) for interoperability with national funder research output systems our researchers have been using).

1. Possible external connectivity needs?

i) ROS?
http://www.rcuk.ac.uk/research/Pages/ResearchOutcomesProject.aspx
– example -> https://logon.rcuk.ac.uk/
Schema: ?
API: ?

ii) MRC e-Val/Researchfish?

https://www.researchfish.com/

“Can I integrate the data from e-Val with other systems?

At Researchfish we want to reduce the burden on researchers required to provide information to their Funders and other organisations. As such we offer an API to allow you to use the data in other systems and the data is available for you to download at any time. We are also working with a number of organisations such as EuroCris to ensure future compatibility of our data with other research information systems”

Schema: ?
API: Yes, need details.

iii) Je-S?

https://je-s.rcuk.ac.uk/JeS2WebLoginSite/Login.aspx
Schema: ?
API: ?

2. Possible internal connectivity needs?

i) A future desktop tool?

ii) Sakai VRE?

iii) e-Science Central?

iv) Most likely research group data set/outputs catalogues/’repositories’ (at least 1 possible example identified). Many have not been discovered yet, so standard interoperable feeds (in and out) are important, e.g. OAI-PMH, SWORD2 (maybe also CERIF/XML, RSS?)
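
To illustrate what a standard interoperable feed could look like in practice, here is a minimal sketch of the harvesting side of OAI-PMH in Python. It builds a ListRecords request URL and extracts Dublin Core titles from a response. The base URL and the sample record are hypothetical and no actual RDC or research group endpoint is implied; only the verb, the oai_dc metadata prefix and the XML namespaces are taken from the OAI-PMH 2.0 protocol.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL for a repository base URL."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def record_titles(response_xml: str) -> list:
    """Return (identifier, title) pairs from an OAI-PMH ListRecords response."""
    root = ET.fromstring(response_xml)
    pairs = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        pairs.append((identifier, title))
    return pairs


# Hypothetical response from a research group catalogue, trimmed to one record.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:dataset-1</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example data set</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://catalogue.example.org/oai"))
print(record_titles(SAMPLE_RESPONSE))
```

The sketch only covers reading metadata out of another system. Pushing metadata in (for example via SWORD2) and handling OAI-PMH resumption tokens for large result sets would need separate work.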

NB: ‘JISC and Research Councils UK work to reduce reporting burden on Universities‘ post from August 2012 is topical.

iridium – research data management requirements online survey

The iridium project online RDM requirements survey ran until 11th May 2012. Responses are currently being analysed. Below is the content of the online survey.

iridium research data management project online survey

This short online survey forms part of the iridium project requirements gathering that aims to assess and make recommendations on how we handle research data as an institution and to plan what developments are needed in the future.

You can find more information on the iridium project at: http://research.ncl.ac.uk/iridium

All fields marked (*) are required.

1. Context of responses

1.1. Please state the name of your research project (*).
1.2. Please state Faculty (or Faculties) principally involved in the project.
  • HaSS
  • FMS
  • SAgE

2. Thinking about your data

2.1. What format are your data in?
  • Physical
  • Digital
  • Both
  • Other (please specify)
If other, please specify.

3. For any digital research data

3.1. Approximately how many files exist?
  • 1 – 10
  • 10 – 100
  • 100 – 1,000
  • 1,000 – 10,000
  • 10,000 – 100,000
  • 100,000 – 1,000,000
3.2. What is the total amount of space your files take up at this stage?
Guidance:
4GB (the storage space of your ISS home folder)
16 GB (the storage space of a memory stick)
64 GB (the storage space of an iPod)
0.5 TB (the storage space of a typical external hard drive)

Note: This question is considering space for files (and NOT storage location).

  • 0 < > 4 GB
  • 4 < > 16 GB
  • 16 < > 64 GB
  • 64 GB < > 0.5 TB
  • 0.5 < > 1 TB
  • 1 TB < > 10 TB
  • 10 TB < > 100 TB
  • 100 TB < > 1 PB
  • 1 PB+
3.3. Does the total amount of space required at collection differ greatly after processing/analysis phase?
For example, 1 PB for collection phase, processed to 10 GB for long-term storage
  • Yes
  • No
  • I don’t know
3.4. What are the top three formats that your data are stored in?
For example, Excel, Mdb database, WAV audio, TIFF images, PZF statistics, ‘open’/proprietary software formats, etc.
  • Data format 1
  • Data format 2
  • Data format 3
3.6. Where do you store your data?
Select all that apply.
  • ISS managed systems
  • Academic Unit managed systems
  • Project managed systems
  • Off-campus managed system
  • Personal systems
  • External Cloud (IaaS) e.g. Amazon/EC2
  • External Service (SaaS) e.g. SurveyMonkey
  • At home
  • Other (please specify)
3.7. How long do you intend to keep the data for?
  • <1 year
  • 1 < > 5 years
  • 5 < > 10 years
  • 10 < > 25 years
  • 25+ years
Digital data section – additional details.
If you have answered ‘other’ to any previous questions, please specify here.

4. Research data management

4.1. Are there multiple copies of the data (i.e. backed up)?
For example, copies on office PC and at home, external hard drive copy, off-site storage copy, mirrored servers, etc.
  • Yes
  • Partial back up
  • No
  • I don’t know
4.2. Have you successfully tested retrieving data from the backup?
  • Yes
  • No
  • Don’t Know
4.3. Do you share or potentially share data with others in the University?
  • Yes
  • No
  • I don’t know
4.4. Do you share or potentially share data with people external to the University?
  • Yes
  • No
  • I don’t know
If yes to the above, is there an agreement in place to govern the sharing?
For example, codified agreement, Memorandum of Understanding, consortium agreement, licence to share, etc.
  • Codified
  • Non codified
  • Other agreement
  • No agreement
  • I Don’t Know
  • Not applicable (no sharing)
4.5. Do you have a research data management plan?
A data management plan typically includes what data will be created, ethics/intellectual property, storage, access and methods for sharing, timeframes and any restrictions that are required.
  • Yes
  • Partially/informal
  • No
  • I don’t know
4.6. Which tools do you use to manage your research data?
For example, online or local tools, toolkits, or applications to assist with organising, storing, archiving or sharing your data. Please specify where possible.
  • Data management planner
  • Ontology editor
  • Metadata explorer
  • Data file browser/search
  • Data manager application
  • Repository file transfer
  • MyExperiment
  • Other (please specify)
  • No specific tool
  • I don’t know
Additional tool details.
Please provide additional details, where possible. If other tool(s), please specify.
4.7. Do you have a deletion policy?
  • Yes
  • No
  • I don’t delete data
  • I don’t know
4.8. How securely does your data need to be stored?
  • Very securely
  • Quite securely
  • Not at all securely
4.9. How securely do you actually store your data?
  • Very securely
  • Quite securely
  • Not at all securely
4.10. What strategies do you use to secure your data?
Select all that apply.
  • Encryption
  • Anonymisation
  • Password protection
  • Physical measures
  • Other (please specify)
4.11. Which guidance, policies and legislation are you aware of that covers your data?
Select all that apply.
  • Data Protection Act
  • Freedom of Information
  • ISS Guidelines on Data Security
  • NHS requirements (please specify)
  • University requirements (please specify)
  • Academic Unit requirements (please specify)
  • Research team requirements (please specify)
  • Research Councils
  • Publication journals
  • Local ethics/LREC
  • NHS ethics/NRES
  • Other (please specify)
4.12. Which guidance, policies and legislation would it be most useful for the University to provide you with further information on?
4.13. Who has primary responsibility for research data management support?
  • Principal Investigator
  • Computing Support Officer
  • School Manager
  • Research Administrator
  • Research Assistants
  • Technical support
  • PhD Student
  • Other (please specify)
4.14. In the future, who should have primary responsibility for research data management support?
  • Principal Investigator
  • Computing Support Officer
  • School Manager
  • Research Administrator
  • Research Assistants
  • Technical support
  • PhD Student
  • Other (please specify)
4.15. What research data management (or closely related) training sessions or training materials are you aware of?
Please include who provides this.
Data management section – additional details.
If you have answered ‘other’ to any previous questions, please specify here.

5. Data repositories

Places that acquire and curate data such as institutional systems or the UK Data Archive.
5.1. At the end of your project, are you happy to have your data publicly discoverable and accessible?
  • Yes
  • Partially
  • If participants are anonymised
  • No
  • Maybe
5.2. Have you ever deposited any of your data into a data repository?
  • Yes, I am required to do so
  • Yes, I choose to
  • No
  • I don’t know
5.3. Would you be willing to submit your data to a data repository?
  • Yes
  • Partially
  • If participants are anonymised
  • No
  • I don’t know
5.4. At what stage in the data’s lifecycle would you submit your data to the repository?
  • Collection Stage
  • Processing Stage
  • Publication Stage
  • I wouldn’t
5.5. Thinking now, specifically about data at publication, how long after publication would you be willing to make the data supporting that publication available?
  • Immediately
  • 1 – 6 months
  • 6 months – 1 year
  • 2 – 5 years
  • 5 years or more
  • After my retirement
  • At my death
  • Never
  • Other (please specify)
5.6. Would you be more likely to share the data if you controlled who could access it?
  • Yes
  • No
  • Maybe
Data repositories section – additional details.
If you have any further comments on the above section, please specify here.

6. Intellectual property

Intellectual property rights are granted to creators and owners of works that are the result of human intellectual creativity.
6.1. Who has the intellectual property rights for your research data?
  • Me
  • My research group
  • The University
  • My Funder
  • Another group/organisation
  • Other (please specify)
  • I don’t know
6.2. Who funded this project?
For example, funding councils, commercial/private funding, etc.
  • Research Council
  • European Commission
  • NHS/Department of Health
  • Industry/public corporations
  • Charity
  • UK government department
  • Unfunded
  • Other (please specify)
Intellectual property section – additional details.
If you have any further comments on the above section, please specify here.

7. Questions, comments and further information.

Please feel free to add any other comments, questions or request further information here.
If this online survey does not appropriately describe your data please tell us.

Face-to-face interviews (*)

We are also arranging face-to-face interviews, please tick this box if you are interested in being involved and we will contact you.
  • Yes
  • No

iridium project mailing list (*)

Would you like to be updated with the results of this survey, feedback sessions and related news from the project?
  • Yes
  • No

Contact details. If you answered ‘YES’ to either question above, please provide an email address.

 Data Protection Statement:

Your comments will be treated as confidential, although anonymised information will be included in our report.

The personal and identifiable data we collect from your survey responses will be accessible to project team members only, until March 2013.

Anonymised data will be retained for up to 10 years from the last date of a publication citing it.

Your data will be used to inform and help synthesise draft policy recommendations and research data management systems.

Thank you for completing this survey.

—-

iridium research data management requirements interview survey questions

The iridium project has reviewed several useful requirements surveys (DAF, Sudamih, MaDAM, IDMB, ERIM, Purdue, etc.) and blogs (e.g. Research Data @Essex), debated internally and carried out interview trials/iterative development. This is a recent draft outline of the main questions from our face-to-face interview script (these questions are supplemented by ‘scope notes’ to assist).

1. Data
1.1. What does “data” mean to you?
1.2. What data do you have?
1.3. How do you record what the data is / means?
1.4. What format is your data in?
1.5. Where is your data stored?
1.6. Approximate size of data?
1.7. Is any of your data intrinsically linked to any applications?
1.8. What is the most valuable data you have?
1.9. Is data of value independently or as part of a whole set?

2. Lifecycle
2.1. What is the lifecycle of your data?
2.2. Where does data come from and where does it go?
2.3. Can you visualise your data lifecycle?
2.4. Is your data backed up?
2.5. What is your deletion policy?
2.6. How do you process or analyse the data?
2.7. Where do you analyse or process the data?
2.8. When is the data used?
2.9. How is your data archived?
2.10. How changeable is your data over time?
2.11. What is your final outcome – data or process?
2.12. How much do you know about your data before you start working?
2.13. Does your data transcend projects or is it only used in a single project?

3. Legislation
3.1. Who do you get funding from?
3.2. What policy and legislation are you aware of that covers your data?
3.3. What sorts of access is required to your data and by whom?
3.4. How secure does your data need to be?

4. Future
4.1. What would you want to do that you can’t do now?
4.2. What should you be doing now but aren’t or can’t?
4.3. Is the interviewee aware of things that they ought to be doing now but can’t?
4.4. Who might need access to data in the future and how might this change?
4.5. What data exists that you would like to get access to?

5. Other
5.1. Anything else which the interviewee thinks might be useful?
5.2. Would the interviewee be happy to provide further information, interviews, written comments, or ISS audit in the future?
5.3. Who else might we usefully interview?
5.4. What else should we have asked?

Comments on these survey questions would be very welcome. A shorter online survey is also being authored.