iridium – postgraduate evaluation of MANTRA RDM training (2) – Sharing, preservation and licensing unit

From Jack:

The final module of the MANTRA online research data management training is entitled Sharing, Preservation and Licensing. It is the second of two new modules (the previous one being Data Protection, Rights and Access) and focuses on the back end of the research lifecycle. While working on a project, a researcher’s main focus will be gathering data and achieving outputs, and there may be little thought initially beyond this. Once the work has been completed, however, preservation and sharing may be of great importance in ensuring the greatest possible impact; if research is intended to be cumulative and part of a community, then making research data available should be a priority. For some, though, there may be restrictions on the extent to which they can make data available and limits on how others are able to use it. These are also covered in this module.

The module outlines the benefits of sharing research data. There are benefits for researchers themselves (scientific integrity, funder requirements and preservation for one’s own future use) and for the research community more widely (teaching, impact, collaboration and public record). Whilst for the most part we may take on good faith the validity of outputs published in journals and other academic papers, the module outlines some high-profile instances in which results have been fabricated by researchers. It argues that making available the data upon which outputs are based ensures the legitimacy of research and a culture of openness.

Whilst outlining the importance of preserving data for future reuse, the module highlights the difficulties and potential problems of maintaining it over time. Rapid changes in file formats and obsolete storage methods are cited as potential future issues for access. Though this may seem an undue hindrance to one’s research activities, I see it as emphasising the importance of proper and correctly managed data preservation. The module places particular emphasis on the reasons for placing data into repositories. A further strength is that, whilst for the most part it focuses on the creator of data, it also recognises the position of the secondary data user and provides help for them.
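One common way of checking that preserved files have not silently changed or gone missing over time is to record a checksum for each file and compare again later. The sketch below is not taken from the MANTRA module; it is a minimal illustration in Python, and the function names (`sha256_of`, `build_manifest`, `changed_files`) are hypothetical names chosen for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to data_dir) to its checksum."""
    return {
        str(p.relative_to(data_dir)): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(data_dir: Path, manifest: dict[str, str]) -> list[str]:
    """List files from the manifest that are now missing or altered."""
    current = build_manifest(data_dir)
    return sorted(
        name for name, digest in manifest.items()
        if current.get(name) != digest
    )
```

In practice the manifest would be saved alongside the data (for example as a text file) when the data is archived, and `changed_files` run against it whenever the data is copied or migrated to new storage.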

For further guidance on data preservation and best practice, the recommended reading, the DCC Curation Reference Manual (http://www.dcc.ac.uk/resources/curation-reference-manual), provides in-depth curation techniques split into several chapters (some still in development).

This final module of the MANTRA training completes a comprehensive yet straightforward beginner’s guide to research data management. Having reviewed the content of several online data management guides recently, I would recommend the University of Edinburgh learning units as an introduction for fellow postgraduate researchers and equally for anybody with a related interest in research data management.

MANTRA available from: http://datalib.edina.ac.uk/mantra/preservation.html (CC-by licensed)


iridium – postgraduate student evaluation of MANTRA RDM training – Sharing, Preservation and Licensing unit

From Blanca:

I have probably blogged before about how useful the MANTRA training units are and how much I enjoy them. This month MANTRA released its new unit called “Sharing, preservation & licensing”, which is no different from the other units in terms of how effectively it manages to get the message across.

More than that, I believe this to be a dramatic unit. Leaving aside specific barriers to sharing data, such as commercial considerations, protecting subjects’ confidentiality and data ownership (all of which may or may not have solutions), there are other reasons which are linked to how the data has been managed during its lifecycle. This unit provides some dramatic examples of why you should share your data and how you need to treat your data from the moment you first get it.

One of the examples this unit provides is an animated cartoon, which I found hilariously frustrating (putting myself in the shoes of a researcher who wants to re-use some data and finds herself at the mercy of the owner of the data). It raises problems such as backing up (what are the perils of using physical devices for storage in the short and long term?), appropriate formats (what do you do if the software you used for manipulating your data becomes unsupported? What does this mean for future users?), and metadata recording (do you want other researchers to depend on you to interpret your data? Are you actually going to be available during the whole life cycle of the data?).
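On the metadata question, one simple habit is to write a small description file next to each data file, so that nobody has to track down the creator to interpret it. The following is only a sketch of that idea in Python; the field names and the `.meta.json` suffix are assumptions made for this example and do not follow any particular metadata standard or anything prescribed by MANTRA.

```python
import json
from pathlib import Path


def write_sidecar(data_file: Path, description: str, creator: str,
                  variables: dict[str, str], software: str) -> Path:
    """Write a JSON 'sidecar' file describing a data file, next to it."""
    record = {
        "file": data_file.name,
        "description": description,
        "creator": creator,
        "variables": variables,  # column name -> meaning and units
        "software": software,    # what was used to produce the file
    }
    # Hypothetical naming convention: <data file name>.meta.json
    sidecar = data_file.with_name(data_file.name + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

Plain-text formats such as JSON have the advantage that they can still be opened if the software originally used to manipulate the data becomes unsupported.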

This simple animated cartoon made me reflect on the fact that, besides barriers such as the ones mentioned above, some barriers are created by researchers themselves, and having an effective research data management plan can help you decide whether or not to share your data, and how you want to share it. In any case, how your data has been managed should not be a barrier to sharing it.

How the data is managed really is important. This unit presents striking real cases of data fabrication and falsification. These cases are truly unbelievable and I can only think: why would somebody put his/her reputation on the line in such a way? The consequences are simply terrifying.

The unit also mentions the benefits of sharing your data, which may bring various rewards: scientific integrity; increased impact in terms of primary and secondary publications; collaboration between data users and data creators; and other innovative, unrelated research based on the same data. There are indeed various benefits and, perhaps more importantly, the researcher maximises the transparency and accountability of his/her research while at the same time complying with funders’ requirements.

Making your data shareable is not an easy task; there are several things to take into account, especially the need to define how you want your data to be re-used. This unit introduces open data licensing briefly, a topic which I would possibly like to see developed further in another unit.

In general, this is a really useful unit which I genuinely enjoyed reading.

MANTRA unit available from: http://datalib.edina.ac.uk/mantra/preservation.html

iridium – postgrad evaluation of MANTRA RDM training – Data protection, rights and access unit

From Blanca.

Today I had the opportunity to explore the “Data protection, rights and access” unit of MANTRA. This is quite a new unit which offers plenty of relevant and essential concepts.

Firstly, it discusses the concept of ethics and how ethical requirements need to be taken into consideration when planning RDM. Ethics is a serious issue, especially when it involves people. Most of the examples and RDM strategies discussed in the unit concern data about people.

The essential concepts this unit focusses on are privacy, consent and confidentiality. The first step towards ethical research is to obtain consent from your research subjects (this way people are given the right to make decisions on the use of their personal data). Next, the researcher needs to guarantee the protection of subjects’ privacy. To do so, the researcher will need to outline confidentiality strategies (an agreement between the researcher and the research subjects on how their identifiable private information will be handled, managed and disseminated).

Besides ethics, the unit stresses how important legal considerations are for RDM. The Data Protection Act 1998 regulates the handling of personal data. Failure to comply with these regulations can incur extremely severe consequences for organisations and individuals; the unit provides a series of stark examples. Even huge institutions such as the NHS are not exempt!

Next, the unit provides some very useful anonymisation techniques (masking data so that no personal identifiers are present); a document with some examples is provided.
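To give a flavour of what masking can look like in practice, here is a minimal Python sketch. It is not taken from the MANTRA document; the column names (`name`, `email`, `participant_id`, `age`) and the helper functions are hypothetical. It drops direct identifiers, replaces the participant ID with a keyed hash, and generalises exact ages into bands. Masking of this kind reduces risk but does not by itself guarantee that nobody can be re-identified, for instance through combinations of the remaining fields.

```python
import hashlib
import hmac

# Hypothetical column names treated as direct identifiers and dropped.
DIRECT_IDENTIFIERS = {"name", "email"}


def pseudonym(participant_id: str, secret_key: bytes) -> str:
    """Replace an ID with a keyed hash so records can still be linked."""
    mac = hmac.new(secret_key, participant_id.encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()[:12]


def age_band(age: int, width: int = 10) -> str:
    """Generalise an exact age into a band such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def mask_record(record: dict, secret_key: bytes) -> dict:
    """Return a copy of the record with identifiers removed or generalised."""
    masked = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    masked["participant_id"] = pseudonym(str(record["participant_id"]), secret_key)
    masked["age"] = age_band(int(record["age"]))
    return masked
```

The secret key would need to be stored separately from the shared data, under the same care as the original identifiers.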

Finally, the unit discusses what “Intellectual Property Rights” and “Freedom of Information” are.

Intellectual property (IP) is all about creations of the mind. Laws try to make sure the owners of these creations are granted certain exclusive rights when it comes to commercialisation of their creations. There are two categories: industrial property (which includes patents, trademarks…) and copyright (for literary and artistic works). On the other hand, Freedom of Information (FoI) is about giving the public the right to access information held by public bodies.

In general, I found this unit to be quite broad in content. The approach it takes to explaining the concepts is really good and concise. However, it didn’t have as many interactive parts as previous units. The unit also provides some other recommended resources.

MANTRA Data protection, rights and access unit: http://datalib.edina.ac.uk/mantra/dataprotection.html

iridium – research data management frequent and key questions

Our requirements gathering interviews, online survey free-text responses, and stakeholder engagement have given rise to numerous questions and discussions about best practice in RDM. Several institutional RDM sites and projects have developed RDM FAQ pages. I note the Open Exeter project touched on this during their June workshop a while back. Questions raised then included …

“What is data?

What to do with data after you finish your PhD or project?”

“the best way to back-up,

use of central university storage,

number of passwords,

complexity of working online (which can make free cloud services more attractive),

lack of support with queries or uncertainty about who to contact;

selection and disposal,

uncertainty over who owns the data”

[JISC MRD Evidence Gathering project blog report on the workshop]

… and I’ve found/seen blogged a few more institutional RDM FAQs:

http://www.lib.cam.ac.uk/repository/help/faq.html

http://securedata.data-archive.ac.uk/about/faq

http://www.southampton.ac.uk/library/research/researchdata/restrictingaccess.html

http://www.southampton.ac.uk/library/research/researchdata/faqs_researchdata.html

http://shard-jisc.blogspot.co.uk/2012/07/how-do-i-preserve-my-research-data-faqs.html

https://as.exeter.ac.uk/library/resources/researchoutputrepositoryeric/quickguides/#submitting_content

From our requirements gathering interviews/surveys with researchers (and related staff), staff queries and our own internal project questions/discussions when planning for end user needs, this long list has been gathered (though not yet all the answers to these questions!). Obviously, there are some good sources of information out there for many queries such as the DCC website/resources, JISC Legal, JISC MRD outputs, individual Funder guidance, existing University guidance, the UK Data Archive help, some .ac.uk/.edu Library pages, specific .gov.uk legal sites, etc.

This is what has come up so far:

What is ‘research data management’? Why is it important?

What is the University doing about it?

What is the University policy on RDM?

What is RDM good practice? What should I be doing?

My discipline is XXX, what is the specific RDM good practice? (e.g. Microscopy, Engineering, History, Biomedicine, Visual arts, Architecture, etc.)

What does a RDM plan consist of? Can I see examples?

How is research data defined? What do you mean by ‘data’?

What research data do I keep? How long for?

What research data do I need to share, my XXX Funder’s policy is ambiguous?

My research data is not digital, what do I archive/curate?

What is metadata? What metadata, specifically for research data, do I record?

How do I record the context of my research? What is sufficient context? How do I keep this/package with my data?

What is data curation/preservation? Isn’t curation a time burden?

What file formats should I use for long term archiving?

I have many CDs/DVDs (etc.) with archived research data on, can I retrospectively index/capture metadata?

What tool(s) will help me manage my (or my Academic Unit’s) research data?

I want to share my data, how/where do I do this?

I want to share my data, how do I licence it?

I do not want to share my data, do I have to? How can I avoid this (I have reasons not to share)?

What about commercialisation considerations?

What about ethics considerations?

My Funder and I disagree on the extent to which project research data should be placed in a national repository (there are sensitive ethical issues), what should I do?

What is Funder X’s policy? My funder doesn’t have a policy? Does it?

Funder policies keep changing, how do I keep up to date? My different Funders each have different policies, what do I do?

Who in the University can I get guidance from?

Where should I store/archive my data?

What are the national data centres? What repository is available for my research discipline?

There is no national repository for my discipline, does the University have one?

I have (lots of) data I want to bring from my former University, how can I do this?

I have had an FOI request for my research data, what do I do?

I have had an FOI request for a staff member’s data, who has long since left, what do I do?

Is my institutional personal “Home” drive suitable for storing/archiving my research data?

What *approved* tools are available for sharing data internally to University?

What *approved* tools are available for sharing data externally to University?

I need access to my data off campus/while overseas, how? (also Mac compatible).

What *approved* tools are available for collaborating/discussion on research projects internally/externally to University?

How can I share large files securely?

Is Google Docs appropriate/approved for research data?

Is DropBox appropriate/approved for research data?

Is the ‘cloud’ appropriate for research data storage?

Can cloud services be used to make research data processing/analysis more efficient?

How can I improve processing speed of raw data?

What can I do about low bandwidth for data transfer?

What induction training is there for PIs/Visitors/PDRA/PhD/UGs?

How do the Environmental Information Regulations (EIR) affect research data requests?

What is a ‘data paper’? Do data papers attract REF?

Can research data be submitted for REF?

How do I cite my research data?

Where does funding for long term storage come from? How do I add to direct/indirect cost?

Will my research grant still be funded if I include data curation costs?

Where can I find RDM training materials to incorporate/adapt?

What RDM training can I attend locally/nationally (at different career levels/roles)?

What is data security? What encryption good practice is recommended (e.g. BitLocker/ISO9xxx)? What physical security measures are adequate?

I have paper-based data/data backed to tape, what is the institutional provision for a fire-proof storage location?

The RDM policy states XXX /’appropriate storage, treatment and security’ what are definitions/examples of this?

What is the research data life cycle?

Should I ever delete data?

What is file compression?

I am running out of research data storage space, can I get more?

Where can I find policy XYZ?

I am a NHS employee, but I do research in the University, how do I …?

I am a HE employee, but I do research in the NHS, how do I …?

I have a NHS/HE dual/honorary contract, my data is on an NHS encrypted drive, when I bring it to the University … … ?

The NHS requires that I have a separate XXX/laptop for NHS work, this causes difficulty and means that … ….?

Are external USB hard drives ‘bad’ for research data storage? Why?

Are external RAID (RAID5?) devices acceptable?

Can you outline examples of different storage options, their appropriateness for research data storage, failure rate and cost per TB per year?

Who owns my research data? Do I own my research data? What is IPR? Do undergraduates own their data? Do postgraduates own their data?

What is good practice in data backup? Do you have an example Standard Operating Procedure?

I believe research data is at risk in my Academic Unit (security/or hardware), who do I report it to?

What are the implications of storing/sharing data across UK/EU/International borders, etc.?

How do I measure ‘last accessed’ date of my research data sets?

I wrote a RDM plan, it didn’t predict my actual storage needs accurately as project was more successful/went in new direction, what can I do? Can I get some temporary storage?

We have our own database/repository, how can we link up with the institutional Research Data Catalogue (RDC)?

What interoperability (ingest/harvest feed formats) does the RDC offer?

How can I bulk upload to/ingest from RDC for a large number of metadata records?

How will the iridium project feed into the local XXX project/initiative?

How has the recent XXX review fed into iridium project?

My data was licensed for a specific purpose, so I cannot share data (share metadata), can I?

How do I consent for open access data? How do I extend/should I contact my data subject again/patients again to get consent for open access?

Why do I need to do a full ethics review again because of [condition X] and [condition Y]?

Can the University provide guidance on negotiating with differing policies from IRAS, local NHS Trusts and national NHS policies?

Do people actually read RDM plans? What teeth do Research Councils have?

I have a staff member leaving, how should I prepare in terms of RDM?

We no longer have funding to maintain XXX research data collection, what should we do with the research data?

My research data has national security implications, I cannot share data/metadata?

I don’t think repository XXX will be able to store my XXX research data type?

What would be the terms and conditions for someone using my data?

My data would require a custom/obscure piece of software for its meaningful use? How can this be achieved if I deposit data?

What would be the reason(s) to release data prior to publication?

I want to be selective about the data sets I release, part of it will form future research funding proposals, is this OK with University policy/Funder XXX policy?

Do you have a sample data/client agreement or licence agreement?

Can I create controlled access/levels of access for different people internal/external to University?

How do I document the context/conditions/parameters of my research and its data?

It would be impossible/difficult for me to pass on the context of my research data, without this the data does not have re-use value?

I have seen XXX tool, can I/should I use that for RDM?

iridium – what training support do RDM policies and systems require?

Looking around at other projects’ and institutions’ outputs on RDM training/support, I’ve noted these themes:

Understanding of research data definitions and various forms (across different disciplines)
Writing a RDM plan
Organising data files
Data file versioning
File formats/open standards to share/archive data
Documenting sufficient research data context and high-quality metadata for data discovery/re-use
Safe data storage and security (including secure data transfer, guidance on use of email for research data) [UPDATED – storage methods (Cloud, etc.) – pros/cons]
Local/national RDM-related policies
Transparency and open access agendas
Data curation/preservation explained  [UPDATED – what to keep? Appraisal and selection (see comments below)]
Sharing and licensing data sets
Searching for and finding data sets
Re-use of data sets, correct attribution/citation of data sets
Local and national tools to support RDM
Sources of assistance/guidance on discipline-specific RDM

Is this a comprehensive list (comments welcome!)? What else is needed?

Also, noted the following institutional websites with public outputs supporting local RDM practice and approaches to organising RDM support information:

http://www.southampton.ac.uk/library/research/researchdata/
http://www.gla.ac.uk/services/datamanagement/organisingyourdata/
http://www.admin.ox.ac.uk/rdm/
http://www.keele.ac.uk/researchsupport/researchdatamanagement/
http://www.ed.ac.uk/schools-departments/information-services/services/research-support/data-library/research-data-mgmt
http://www.bath.ac.uk/rdso/datamanagement.html
http://www.exeter.ac.uk/research/rkt/grantlifecycle/datamanagement//
http://www2.le.ac.uk/services/research-data

iridium – third postgraduate student feedback blog on MANTRA RDM training

Continuing the series of posts on the MANTRA RDM online training tool from postgrads in different disciplines.

Third post is from Jack, postgraduate student in Philosophy.

“I recently read through MANTRA, the online research data management training guide from Edinburgh University, as a novice beginning work on the Iridium project. By novice I mean my own current research is broadly in Philosophy at masters level. As such, with regard to my own area of study, research data management took little priority beyond maintaining a bibliography of secondary texts I had read for reference. My hope for the training was to widen my understanding of research data across the spectrum of methods and data types.

The first section proper (“Research data explained”) I found to be a useful introduction to the remit of what may constitute research data and how it may be generated, with practical examples for each. The next section (“Data Management Plans”) emphasises the importance of having a management plan, placing it in the context of one’s own research by asking what best suits the type of data the researcher uses. It then breaks down the general components of a data management plan with a checklist of what should go into each:

(c) EDINA and Data Library, University of Edinburgh. Research Data MANTRA [online course], http://datalib.edina.ac.uk/mantra CC:BY


With “Organising Data”, the need for good file management will be familiar to researchers with lax conventions for saving. With the best will in the world, whilst one may be confident at the time of remembering where data was saved and under what title, a few weeks or months down the line this becomes a cause of frustration. Intelligible and simple ways of saving are provided to avoid this. Reference is made to bulk renaming tools to help apply conventions across large numbers of files. The RDM Blankety Blank-style summaries at the end of the sections are a good way to check what information has been absorbed:

(c) EDINA and Data Library, University of Edinburgh. Research Data MANTRA [online course], http://datalib.edina.ac.uk/mantra CC:BY

I found the video in the “Documentation and Metadata” unit beneficial as it relates directly to recording metadata in the social sciences. The student outlines the issues involved and also highlights how recording metadata can be important for one’s own sake, as a reminder of what methods were used and for what reasons when returning to projects later on. For an introduction, I found some of the speakers in the other videos perhaps too in-depth, though their enthusiasm is creditable.

The final unit, “Storage and Security”, is by far the largest section and could act as a standalone training module. It goes into detail on the importance of regular backups and appropriate storage. Whilst most researchers will be familiar with the pains of losing data they have worked on, a few first-hand horror stories from those who have lost data are given to affirm the importance of this.

The concluding “Recommended Resources” section provides recent documents and webpages for further introductory guides to research data and management plans, which open embedded in the training page. For anybody looking to go on to create a management plan, I found the Sarah Jones guide from the Digital Curation Centre to be a detailed yet straightforward guide to doing so.

To summarise, with my own work being broadly based in the humanities, understanding what constitutes research data seemed less clear-cut than in other faculties. Whereas with chemistry or biology the raw data is readily demarcated, as that which is studied in a laboratory, I initially found it less discernible with more academic essay writing. However, the training asks you to question your own data generally and covers a great range of data types. This provides a greater direct understanding in relation to my own work and research data more generally. I’ve found splitting the different areas of RDM into separate units beneficial.

Following interviews conducted for Iridium I have since gone back to individual sections to go through the area again to clarify any doubts.”

iridium – postgraduate student feedback on MANTRA RDM training

The postgraduate support team completed the MANTRA RDM online training tool a while back. Here is the first in (hopefully) a series of postgraduate blog posts reviewing the training package from different discipline perspectives.

First post is from Amy, postgraduate student in History of the Americas.

“I recently completed the Research Data MANTRA training – an online resource provided by the University of Edinburgh. I undertook this training in order to help me get to grips with some of the basics of research data management (RDM) primarily to help me understand the issues that researchers are facing with data management. As part of the Iridium project I carried out interviews with staff from across the University to find out their thoughts on RDM and I hoped that this training would help me build up a base knowledge of some of these issues.

I found the first module, ‘Research Data Explained’, a really useful starting point. It outlined some simple information, such as the different forms of data and its uses, as well as the difference between research data and research records. Although this might sound basic, I had never really given a lot of thought to this area and now that I have a clear understanding of it, I can see how important it is to have a solid foundation of knowledge in the basic terms of RDM.

The second module, ‘Data Management Plans’, was not as interesting as the first, but I think that this is because data management plans are something that some researchers (definitely me) tend to do unconsciously in an ad-hoc sort of way, so actually taking the time to write out something that, for some, is an inherent but unwritten part of research can seem tedious. However, I also recognise that for some researchers, detailed data management plans are the essential beginnings of any project and some of the points raised in this module will be useful for when I’m interviewing people and getting them to think about how they store their data. It discusses the benefits of planning data management in terms of funding, efficiency and integrity, as well as re-using data for teaching and learning which is an important point to link to the research-led teaching aspect of the REF when we’re talking to researchers.

The third section, ‘Organising Data’, was the most basic of the modules and really just covered general rules that people probably already know about properly naming files and organising their work so that it can be easily referred to in the future. However, there were some specifics in the modules that I hadn’t considered before, such as making sure that the file names are scalable to the proper degree. I enjoyed the training although some of the videos were distracting as they could be quite subject specific and it may have been better if they’d used a more general approach to research data (although I recognise this is difficult considering the variation between disciplines). The summary pages at the end were really good at consolidating the topic and as this is a topic that I already feel comfortable with, I think the summary pages would be the part of the module that I referred back to if I did need any support. This module would also be useful for anyone working in a group for the first time and wanting to consider how this might affect their research practices.

I found the module on ‘File formats and Transformations’ really useful as I knew very little about this subject. One of the most important points that this module made was the difference between operating systems in relation to file formats. As researchers share more and more work this will become an increasingly important issue and I imagine that this is now something that researchers working in groups have to take into account when they are creating a data management plan. I think that some of the more simple points – such as the longevity of text files – offer a good starting point to demonstrate to those researchers who do not give much consideration to this topic, that it is a necessary part of data management and there are simple things that can be done to improve their data’s integrity.

The fifth module on ‘Documentation and Metadata’ could be received by different researchers in very different ways. For those researchers who regularly use lab books and research data documentation, this module is probably very basic. However, some researchers may be unaware of the importance of metadata or its repercussions for ‘machine-to-machine’ interpretation of data. The module usefully outlines the different categories of metadata and I think that most researchers could probably work with at least one of the categories and therefore start purposefully producing and recording metadata for their research. It is important to make sure that all researchers feel that they are capable of doing this as it will be increasingly important for the purposes of transparency and accountability, as well as the sharing of data online. If metadata is something that interests funding bodies then all researchers should be made aware of it and feel comfortable creating and using it.

The final module, ‘Storage and Security’, made some valid points but I found it quite serious and I find it unlikely that researchers who are dealing with anything but highly confidential or valuable material would be inclined to follow its instructions very closely. However, the information it provides about organised and regular back-ups would be applicable to almost all researchers. I also found the information that it provided on deleting sensitive data interesting as I was unaware that such measures were required to properly delete data.

Although I undertook this training primarily in relation to my work on the Iridium project, it has given me a better understanding of how I should be aware of RDM in my own research. Being a History student, I rarely produce any of my own data – most of it is gathered from already existing sources. Therefore, I had never really considered RDM to be an issue for me. After completing this training though, as well as working on the Iridium project, I can see how issues surrounding RDM such as storage, security and especially organisation, affect every researcher to varying degrees. As the level at which I am studying increases, I think that these issues will become more and more relevant and hopefully I will be able to apply some of the advice that has been given in this training to my research methods. I also think that completing this course has made me much more confident in my ability to go out and talk to researchers as part of the Iridium project about RDM as I now have a much better understanding of the issues that they are facing.”
