Reflections on Repository Fringe 15: 5 Positive Parallels between Data Repositories and Electronic Lab Notebooks

Today we are delighted to share a new guest post from Rory Macneil, from Research Space, who attended and presented at Repository Fringe 2015 in Edinburgh last week and shares his reflections on the event.

Dr David Prosser, Executive Director of RLUK speaking at Repository Fringe 2015

“We have failed to engage researchers adequately and I think that the busyness of academics is an insufficient reason to explain that. So why have we failed to engage and to get academics to see this as something they should do on a daily basis?”

So said David Prosser near the beginning of his opening remarks at #rfringe15. It strikes me that until recently the same could have been said about electronic lab notebooks. The stories I heard at #rfringe15 about the difficulty of getting academics to use repositories echoed the difficulties faced in encouraging uptake of ELNs.

The recent upsurge in interest among academic researchers in ELNs is, I think, a positive sign for increased uptake of repositories. I’ve described below five things repositories and electronic notebooks have in common, after which I reflect on the implications for the future of both.

1. Drivers

Funder requirements to publish and preserve data are driving interest among academic researchers in both electronic lab notebooks and repositories.

2. Benefits

At the Tuesday afternoon breakout session led by Claire Knowles we came up with two ‘use cases’ for repositories that are sure to resonate with at least some academics: preservation of data that is at risk of being lost, and discovery by other researchers who might otherwise never come across the research to which the data relates. The first is a major driver for uptake of electronic lab notebooks; the second is a secondary driver.

3. Workflows

Collection, analysis and presentation of data are at the heart of existing researcher workflows. Both electronic lab notebooks and repositories fit into, and enhance, those workflows without causing extensive disruption or requiring fundamental changes to them.

4. Synergies

Electronic lab notebooks and repositories complement and reinforce each other. ELNs may in future come to be seen as the ‘repository’s friend’: data in ELNs is already available and structured, so it is easier and more natural for researchers using an ELN to deposit their data into a repository. And researchers who use ELNs are more likely to understand the benefits of depositing their data into a repository.

5. The institutional context

Research data managers/data librarians have taken the lead in developing repositories and repository services, and in introducing researchers to repositories. Increasingly they are also taking the lead in introducing researchers to ELNs: evaluating and procuring them, and assisting with their introduction, including providing and/or procuring ELN training. As a result, the researcher experience of discovering and using repositories and ELNs is quite similar.

Concluding Thoughts

The title of David Prosser’s talk, “Fulfilling their potential: is it time for institutional repositories to take centre stage?”, is more optimistic than his quote cited at the opening of this post. I share this optimism because of the broader picture: the accelerating introduction of new tools and technologies into evolving researcher workflows, and the increasing relevance of repositories to the day-to-day needs of researchers. Thinking about ELNs side by side with repositories highlights key trends in the broader picture and also brings out ways in which ELNs and repositories complement and reinforce each other.

Many thanks to Rory for his take on last week’s Repository Fringe 2015. Remember, if you would like to share your own reflections on the event, or if you would like us to link to your blog post or coverage of the event, just get in touch via email or via the comments below.

We’ll have a round-up post coming soon. In the meantime, why not browse the pictures from the event on Flickr, or explore or search the Twitter archive for #rfringe15? And if you haven’t already completed our feedback survey, please do – we really value your ideas and comments.

Repository Fringe Day Two LiveBlog

Image of Les Carr's jacket, taken by Paul Walk (EDINA)

Welcome back to day two of Repository Fringe 2015! For a second day we will be sharing most of the talks via our liveblog here, or you can join the conversation at #rfringe15. We are also taking images around the event and encourage you to share your own images, blog posts, etc. Just use the hashtag and/or let us know where to find them and we’ll make sure we link to your coverage, pictures and comments.

This is a liveblog, which means there may be a few spelling errors and some corrections may be required. We welcome your comments and, if you do have any corrections or additional links, we encourage you to post them here.

Integration – at the heart of  – Claire Knowles, University of Edinburgh; Steve Mackey, Arkivum

Steve is leading this session, which has been billed as “storage” but is really about integration.

We are a company which came out of the University of Southampton, and our flagship Arkivum100 service has a 100% data integrity guarantee. We sign long-term contracts, for 25 years, whereas most cloud services sign yearly contracts. But we also have a data escrow exit – there is a copy on tape that enables you to retrieve your data after you have left. It uses open source encryption throughout, which means the data can be decrypted as long as you have the key.

So why use a service like Arkivum to keep data alive for 25+ years? Well, things change all the time. We add media more or less continually… We do monthly checks and maintenance updates, but also annual data retrieval and integrity checks. There are companies, Sky would be an example, that have a continual technology process in place – three parallel systems – for their media storage in order to keep up with technology. There is a 3-5 year obsolescence cycle for services, operating systems and software, so we carry out hardware refreshes and software and hardware migrations.

The Arkivum appliance presents a CIFS/NFS interface, which means it integrates easily with local file systems. There is also a robust REST API. There is simple administration of user permissions, storage allocations etc. We have a GUI for file ingest status, but also for recovery pre-staging and security. There is also an ingest process triggered by timeout, checksum, change or manifest – we are keen that if anything changes you are prompted to check and archive the data before you potentially lose or remove your local copy.
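
Timeout and checksum change are among the ingest triggers listed. As a rough illustration only, here is a minimal Python sketch of such a trigger policy. The function names, the five-minute quiet period and the choice of SHA-256 are all hypothetical; this is not Arkivum's implementation.

```python
import hashlib
import time
from pathlib import Path
from typing import Optional


def file_checksum(path: Path) -> str:
    """Hex SHA-256 digest of a file's contents (hypothetical helper)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def should_ingest(
    path: Path,
    archived_checksum: Optional[str],
    quiet_seconds: float = 300.0,
    now: Optional[float] = None,
) -> bool:
    """Hypothetical trigger policy: archive a file once it has been left
    untouched for quiet_seconds (the timeout) AND its content differs from
    the last archived copy (the checksum change)."""
    now = time.time() if now is None else now
    if now - path.stat().st_mtime < quiet_seconds:
        return False  # file may still be being written; wait for the timeout
    # None means the file has never been archived, so it always differs.
    return file_checksum(path) != archived_checksum
```

Requiring both conditions avoids archiving half-written files while still catching any later edit to an already archived file.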

So the service starts with the original datasets and files. We take a copy for ingest via the Arkivum Gateway on the appliance, and we encrypt it and also decrypt it to check the process. We compute checksums at all stages. Once everything is checked, the data is validated and sent to our archive on the Janet Network, and it is also copied to a second archive and to the escrow copy on tape.
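
To make “checksums at all stages” concrete, here is a minimal Python sketch of a checksum-verified copy to several destinations, standing in for the primary archive, second archive and escrow copy. It is an illustration under stated assumptions, not Arkivum's pipeline: the function names are hypothetical, SHA-256 is an assumed algorithm, and the encrypt/decrypt round trip is omitted because the Python standard library has no encryption.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verified_copy(source: Path, destinations: list[Path]) -> str:
    """Copy source into each destination directory, checking every copy.

    Hypothetical sketch: the destinations stand in for the primary archive,
    the second archive and the escrow copy. The encrypt/decrypt round trip
    from the talk is NOT modelled here; only the checksum checks are.
    """
    expected = sha256_of(source)  # checksum taken before ingest
    for destination in destinations:
        destination.mkdir(parents=True, exist_ok=True)
        copy = destination / source.name
        shutil.copy2(source, copy)
        actual = sha256_of(copy)  # re-read the copy and check it again
        if actual != expected:
            raise OSError(f"checksum mismatch for {copy}: {actual} != {expected}")
    return expected
```

A real service would also store the returned digest, so that periodic integrity checks have a reference value to compare against.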

We sit as the data vault, the storage layer within the bigger system which includes data repository, data asset register, and CRIS. Robin Taylor will be talking more about that bigger ecosystem.

We tend to think of data as existing in two overlapping cycles – live data and archive data. We tend to focus much more on the archive side of things, which relates to funder expectations. But there is often less focus on live data being generated by researchers – and those may be just as valuable and in need of securing as that archive data.

A recent Jisc Research Data Spring report discusses the concept of RDM workflows. The report, “A consortial approach to building an integrated RDM system – ‘small and specialist’”, specifically discusses example workflows, including researcher-centric workflows that lay out the steps a researcher takes in managing their research data. Examples in the report include those created for Loughborough and Southampton.

Loughborough have a CRIS, they have DSpace, and they use FigShare for data dissemination. You can see interactions in terms of the data flow are very complex [slides will be shared but until then I can confirm this is a very complex picture] and the intention of the workflow and process of integration is to make that process simpler and more transparent for the researcher.

So, why integrate? Well, we want those processes to be simpler and easier, to encourage adoption and to lower the cost of institutional support to the research base. It’s one thing to have a tick box; it’s another to get researchers to actually use it. Having been involved in several roll-outs, we also have experience of the process of rolling RDM out – our work with ULCC on CHEST particularly helped us explore and develop approaches to this. So we are checking quality and consistency in RDM across the research base. We are deploying RDM as a community-driven shared service so that smaller institutions can “join forces” to benefit from access to common RDM infrastructure.

So, in terms of integrations, we work with customers using DSpace and EPrints, and with customers using FigShare. Moving a little away from the repository and towards live research data, we are also doing work around Sharegate (for archiving SharePoint), iRODS and QStar, and with ExLibris Rosetta and Archivematica. We have yet to see real preservation work being done in research data management, but it’s coming, and Archivematica is an established tool for preservation in the cultural heritage and museums sector.

Q1) Do you have any metrics on the files you are storing?

A1) Yes, you can generate reports from the API, or access them via the GUI. The QStar tool, an HSM (hierarchical storage management) tool, allows you to do a full survey of the environment: it will crawl your system and tell you about file age, storage etc. And you can run a simulation of what will happen.

Q2) Can I ask about the integration with EPrints?

A2) We are currently developing a new plugin, driven by new requirements for much larger datasets going into EPrints and linking through. But the work we have previously done with ULCC is open source, and the EPrints plugins are open source. Some patches were developed by @mire, so that was a different process, but once those have been funded they are willing for them to be open source.

Q3) When repositories were set up there was a real drive for the biggest repository possible, being sure that everyone would want the most storage possible… But that is also expensive… And it can take a long time to see uptake. Is there anything you can say that is helpful for planning and practical advice about getting a service in place to start with? To achieve something practical at a more modest scale.

A3) If you use managed services you can use as little as you want. If you build your own you tend to be committed to a fixed capital sum… and a level of staffing that requires a certain scale. We start at a few terabytes…

Comment – Frank, ULCC) We have a few customers who go for the smallest possible set up for trial and error type approach. Most customers go for the complete solution, then reassess after 6 months or a year… Good deal in terms of price point.

A3) The work with Jisc has been about looking at what those requirements are. From CHEST it is clear that not all organisations want to set up at the same scale.

Unfortunately our next speaker, Pauline Ward from EDINA, is unwell. In place of her presentation, “Are your files too big? (for upload / download)”, we will be hearing from Robin Taylor.

Data Vault – Robin Taylor 

This is a collaborative project with the University of Manchester, funded by Jisc Research Data Spring.

Some time back we purchased a lot of kit and space for researchers, giving each their own allocation. But in the researcher’s workflow, data is generated and goes into a repository, yet researchers are not sure what data to keep and make available, what might be useful again, and what they may be mandated to retain. So we wanted a way to store that data, with some sort of interface to enable that.

Edinburgh and Manchester had common scenarios and common usages. We are both dealing with big volumes of data, hundreds of thousands or even millions of files. It is impossible to upload that through a web interface, so we need the archiving to happen in the background.

So, our solution has been to establish the Data Vault User Interface, which interacts with a Data Vault Broker/Policy Engine; the Broker in turn interacts with both the data archive and the active storage. But we didn’t want to build something so bespoke that it didn’t integrate with other systems – the RSpace lab notebooks, for instance. And the archive might be Arkivum, or Amazon, or tape… So our mechanism abstracts that, giving researchers a simple way to archive their data in a standard “bag-it” (BagIt) style package.
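
BagIt is a published packaging format (now RFC 8493): a bag holds a bagit.txt declaration, the payload under a data/ directory, and a manifest giving a checksum for every payload file. The sketch below is a minimal Python approximation of that layout, not the Data Vault’s code. The function names are hypothetical, it writes only the required files (no optional tag files such as bag-info.txt), and the pluggable storage backend described in the talk is not modelled.

```python
import hashlib
import shutil
from pathlib import Path


def _sha256(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_bag(source_dir: Path, bag_dir: Path) -> Path:
    """Package every file under source_dir as a minimal BagIt-style bag."""
    payload = bag_dir / "data"
    shutil.copytree(source_dir, payload)  # creates bag_dir/data and parents
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    lines = []
    for path in sorted(p for p in payload.rglob("*") if p.is_file()):
        relative = path.relative_to(bag_dir).as_posix()  # e.g. "data/run1/x.csv"
        lines.append(f"{_sha256(path)} {relative}\n")
    (bag_dir / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
    return bag_dir


def verify_bag(bag_dir: Path) -> bool:
    """Recompute payload checksums and compare them with the manifest."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    listed = set()
    for line in manifest.splitlines():
        checksum, relative = line.split(" ", 1)
        listed.add(relative)
        target = bag_dir / relative
        if not target.is_file() or _sha256(target) != checksum:
            return False
    # Every payload file must appear in the manifest, and vice versa.
    actual = {
        p.relative_to(bag_dir).as_posix()
        for p in (bag_dir / "data").rglob("*")
        if p.is_file()
    }
    return actual == listed
```

Because a bag is self-describing, the same package can be handed to whichever archive sits behind the broker, whether Arkivum, Amazon or tape, and checked again on retrieval.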

But we should say that preservation is not something we have been looking at. Format migration isn’t realistic at this point: the scale at which we are receiving data makes that work impractical. But, more importantly, we don’t know what the suitable time is, so instead we are storing that data for the required period of time and then seeing what happens.

Q1) You mentioned you are not looking at preservation at the moment?

A1) It’s not on our to-do list at the moment. The obvious comparison would be with Archivematica, which enables data to be stored in more sustainable formats. But we don’t know what those formats will be… That company has a specific list of formats it can deal with, and that’s not everything. That’s not to say that down the line it won’t be something we want to look at; it’s just not what we are looking at at the moment. We are addressing researchers’ need to store data on a long-term basis.

Q1) I ask because what accessibility means is important here.

A1) On the live data, which is published, there is more onus on the institution to ensure that it is available. This holds back wider data collections.

Q2) Has anyone ever asked for their files back?

A2) There is ongoing discussion about what a backup is, what an archive is, etc. Some use this as a backup, but that is not what it should be. This is about data that needs to be kept securely but doesn’t necessarily need instant access. We have people asking about archiving, but we haven’t had requests for data back. The other question is whether we delete data that we have archived – the researchers are best placed to decide that, so we will find out in due course how that works.

Q3) Is there a limit on the live versus the archive storage?

A3) Yes, every researcher has a limited quantity of active storage, but a research group can also buy extra storage if needed.  But the more you work with quotas, the more complex this gets.

Comment) I would imagine that if you charge for something, the use might be more thoughtful.

A3) There is a school of thought that charging means that usage won’t be endless, it will be more thoughtful.

Repositories Unleashing Data! Who else could be using your data? – Graham Steel, ContentMine/Open Knowledge Scotland

Graham is wearing his “ContentMine: the right to read is the right to mine!” T-shirt for his talk…

As Graham said (via social media) in advance of his talk, his slides are online.

I am going to briefly talk about Open Data… And I thought I would start with a wee review of when I was at Repository Fringe 2011 and I learned that not all content in repositories was open access, which I was shocked by! Things have gotten better apparently, but we are still talking about Gold versus Green even in Open Access.

In terms of sharing data and information generally many of you will know about PubMed.

A few years ago, Jo Brodie, a blogger, diabetes researcher and regular tweeter, asked why PubMed didn’t have social media sharing buttons. I crowd-sourced opinion on the issue and sent the results off to David Lipman (who I know well), who is in overall charge of NCBI/PubMed. And David said “what’s social media?”.

It took about a year and three follow ups but PubMed Central added a Twitter button and by July 2014, sharing buttons were in place…

Information wants to be out there, but we have various ways in which we stop that – geographical and license restricted streaming of video for instance.

The late, great Jean-Claude Bradley saw science as heading towards being led by machines… This slide is about 7 years old now but I sense matters have progressed since then!

But at times it is still not that easy to access or mine data – some publishers charge £35 to mine each “free” article, a ridiculous cost for what should be a core function.

The Open Knowledge Foundation has been working with open data since 2004… [cue many daft pictures of Data from Star Trek: The Next Generation!].

We also have many open data repositories: figshare (which now has just under 2 million uploads), etc. Until two weeks ago I didn’t even realise that many universities have data repositories, but we also want Repositories Unleashing Data Everywhere [RUDE!], and we also have the new initiative, the Radical Librarians Collective…

Les Carr, University of Southampton (data.ac.uk)

SLIDES

The Budapest Open Access Initiative kind of kicked us off about ten years ago. Down in Southampton, we’ve been very involved in Open Government Data and those have many common areas of concern about transparency, sharing value, etc.

And we now have data.gov.uk, which enables the sharing of data that has been collected by government. At Southampton we have also recently been involved in understanding the data, infrastructure, activities and equipment of academia by setting up data.ac.uk. That is a national aggregator that collects open data from every institution… So if you need information on, e.g., DNA-related equipment and who to contact to use it, you can find it there.

This is made possible because institutions are putting together data on their own assets, made available as institutional open data in standard ways that can be automatically scraped. We make building information available openly, for instance about energy use, services available, cafes, etc. Why? Who will use this? Well, this is the whole thing of other people knowing better than you what you should do with your data. So students in our computer science department, for instance, looked at building route-recommendation apps, e.g. for getting between lectures. Others crossed that with catering facilities – e.g. “nearest caffeine” apps! It sounds ridiculous but students really value that. And we can combine city bus data with timetables and UK Food Hygiene ratings – so you can work out which bus to take to which pub before an event, etc. And campus maps too!

Now we have a world of open platforms – we have the internet, the web, etc. But not Google – definitely not open. So… why are closed systems bad? Well, we need to move from Knowledge, to Comprehension, to Application, to Analysis, to Synthesis and to Evaluation. We have repositories at the bottom – that’s knowledge – but we are all running about worrying about REF2020, and that is about evaluation: who knows what, where is that thing, what difference does that make…

So to finish I thought I’d go to the Fringe website and this year it’s looking great – and quite like a repository! This year they include the tweets, the discussion, etc. all in one place. Repositories can learn from the Fringe. Loads of small companies desperate for attention, and a few small companies who aren’t bothered at all, they know they will find their audience.

Jisc on Repositories unleashing data – Daniela Duca, Jisc

SLIDES

I work in the research team at Jisc; we are trying to support universities in their core business and help make the research process more productive. I will talk about two projects in this area: the UK Research Data Discovery Service and Research Data Usage and Metrics.

The Research Data Discovery Service (RDDS) is about making data more discoverable. The project is halfway through and is being run with the UK Data Archive and the DCC. We want to move from a pilot to a service that makes research data more discoverable.

In Phase 1 we ran a pilot to evaluate Research Data Australia, developed by ANDS, with contributions from the UK Data Archive, the Archaeology Data Service and NERC data centres. In Phase 2, Jisc, with support from the DCC and UKDA, is funding 9 more institutions to trial this service.

The second project, Research Data Usage and Metrics comes out of an interest in the spread of academic work, and in the effectiveness of data management systems and processes. We are trying to assess use and demand for metrics and we will develop a proof of concept tool using IRUS. We will be contributing to and drawing upon a wide range of international standards.

And, with that, we are dispersing into five super-fast 17-minute breakout groups which we hope will add their comments/notes here – keep an eye on those tweets (#rfringe15) as well!

We will be back on the blog at 11.15 am after the breakouts, then coffee and a demo of DMAOnline by Hardy Schwamm, Lancaster University.

And we are back, with William Nixon (University of Glasgow) chairing, and he is updating our schedule for this afternoon which sees our afternoon coffee break shortened to 15 minutes.

Linking Data – Neil Chue Hong, Software Sustainability Institute; Rachael Kotarski, Project THOR

Rachael: Neil and I will be talking about some work we have been doing on linking research outputs. I am based at the British Library, as part of a team working on research outputs.

Research is represented by many outputs. Articles are some of the easier outputs to recognise, but what about the samples, data and objects emerging from research? There could be hundreds of things… Data citation enables reproducibility: if you don’t have the right citation and the right information, you can’t reproduce that work.

Citation also enables acknowledgement, for instance of historical data sets and longitudinal research over many years which is proving useful in unexpected ways.

Data citation does also raise authorship issues though. A one line citation with a link is not necessarily enough. So some of the work at DataCite and the British Library has been around linking data and research objects to authors and people, with use of ORCID alongside DOIs, URLs, etc. Linking a wide range of people and objects together.

THOR (Technical and Human infrastructure for Open Research) is a project which started in June. This is more work on research objects, subject areas, funders, organisations… really broadening the scope of what should be combined and linked together here.

So the first area here is research – understanding the gaps there and how those can be addressed, and bringing funders into that. We are also looking at integration of services etc. So one thing we did in ODIN and are bringing into THOR is connecting your ISNI identifier to ORCID, so that there is a relationship there and that data stays up to date. The next part of the work is on Outreach – bootcamps, webinars, etc. – to enable you to feed into the research work as well. And, finally, we will be looking at Sustainability: how what we are doing can be self-funded beyond the end of the project, through memberships of partner organisations: CERN, DataCite, ORCID, Dryad, EMBL-EBI, ANDS, PLoS, Elsevier Labs, Panomia(?). This is an EU-funded project but it has international scope and an international infrastructure.

So we want to hear about what the issues are for you. Talk to us, let us know.

Linking Software: citations, roles, references and more – Neil Chue Hong

Rachael gave you the overview; I’m going into some of the detail for software, my area. We know that software is part of the research lifecycle. That lifecycle relies on the ability to attribute and credit things, and that can go a bit wrong for software. That’s because our process is a little odd… We start research, we write software, we use software, we produce results, and we publish research papers. Now, if we are good, we may mention the software. We might release data or software after publication… rather than before…

A better process might be to start the research, identify existing software, adapt or extend it, release the software (maybe even publish a software paper), use the software, produce results, perhaps release data and publish a data paper, and then publish the research paper. That’s great but also more complex. Right now we use software and data papers as proxies for sharing our process.

But software is not that simple; the boundaries can be blurry… Is it the workflow, the software that runs the workflow, the software that references the workflow, the software that supports the software that references the workflow, etc.? What’s the useful part? Where should the DOI be, for instance? It is currently at programme level, but is that the right granularity? Should it be at algorithm level? At library level? Software has the concept of versioning – I’d love our research to be versioned rather than “final”, but that’s a whole other talk! The versioning concept indicates a change, it allows change… but again, how do we decide when a new version occurs?

And software also has the problem of authorship – which authors have had what impact on each version of the software? Who has made the largest contribution to the scientific results in a paper? On a project I might make the most edits to a code repository – all of them updating the license – and so appear to have made the biggest contribution, but would the research community agree? Perhaps not. I used to give this talk and say “this is why software is nothing like data”, but now I say “this is why software is exactly like data”!

So, there are different things happening now to link these bits and pieces together. GitHub, Zenodo, FigShare and institutional repositories have looked at “package level” one-click deposit, with a citable DOI. There has been work around SWORD deposit, which Stuart Lewis has been looking at too. So you can now archive software easily – that’s the easy part – it’s the social side that needs dealing with. There is a brand new working group led by Force11 on Software Citation – do get involved.

There are also projects making the roles of authors/contributors clearer: Project Credit is looking at Gold/Silver/Bronze levels, while Contributor Badges is looking at more granular recognition. And we also have work on code as a research object, and the CodeMeta project, which is looking at defining minimal metadata.

So that brings us to the role of repositories and the Repository Fringe community. Imperial College for instance is looking at those standards and how to include them in repositories. And that leads me to my question to you: how does the repository community support this sort of linkage?

Q1 – William Nixon) Looking at principles of citation… how do you come up with those principles?

A1 – Neil) Force11 has come up with those citation principles and they are being shared with the community… But all communities are different. It is easy to get high-level agreement, but hard to get agreement on implementation details. Authorship changes over time, and from version to version. So when we create principles for citation, do we credit everyone collectively and equally, or do we go the complex route of acknowledging specific individual contributions to a particular version? This causes huge debate and controversy in the open source community about who deserves the appropriate credit etc. For me, the question is what do we need to deposit? Some information might be useful later in the reward lifecycle… but if I’m lead author, will that be a priority here?

Q2 – Paul Walk) My inner hippie says that altruism and public good come into open source software, and I wonder if we are at risk of messing with that through the sordid research system…

A2 – Neil) I would rebut that: most open source contribution and development is not altruistic. People are rewarded in some way – doing things open source gives them more back than working alone. I wouldn’t say altruism is the driving force, or at least it hasn’t been for some time. It’s already part of that research-type system.

Comment) For me this is at such a high level compared to where we are. You are talking about how we recognise contribution, citation etc., but just getting things deposited is the issue for me right now… I’d love to find out more about this, but even convincing management to pay for ORCID iDs for everyone is an issue…

A2 – Rachael) We do need to get the word out about this; showing how researchers have done this, and the value of doing so, will help. It may not just be through institutions but through academic societies etc. as well.

A2 – Neil) And this is back to the social dimension and thinking about what will motivate people to deposit. They may take notice of editors… Sharing software can positively impact citations and that will help: in the image processing community, for instance, releasing software has been shown to increase citations, and that can be really motivating. And then there is the economic impact for universities – is there a way we can create studies showing positive reputational and economic impacts on the institution that will prove the benefit to them?

Q3) A simple question: there are many potential solutions for software and data… but will we see any benefits from them until the REF changes to value software and data to the same extent as other outputs?

A3 – Neil) I think we are seeing a change coming. It won’t be about software being valued as much as papers. It will be about credit going to the right person so that they are valued. What I have seen in research council meetings is that they recognise that other outputs are important. But in a research project, credit tends to go to the original writer of a new algorithm, perhaps, not the developer who has made substantial changes. So where credit goes matters – the user, implementer, contributor, originator, etc.? If I don’t think I will get suitable credit, then where is the motivation for me to deposit my software?

EC Open Data Pilot, EUDAT, OpenAIRE, FOSTER and PASTEUR4OA – Martin Donnelly, Digital Curation Centre

I was challenged yesterday by Rachael and by Daniela at Jisc to do my presentation in the form of a poem…

There once was a man from Glasgee
Who studied data policy
In a project called FOSTER
Many long hours lost were
And now he’ll show some slides to ye.

So I will be talking about four European-funded projects on research data management and open access that are all part of Horizon 2020. Many of you will be part of Horizon 2020 consortia, or will be supporting researchers who are. It is useful to remind ourselves of the context in which these came about…

Open Science is situated within a context of ever greater transparency, accessibility and accountability. It is both a bottom-up issue – the OA concept was coined about 10 years back in Budapest and was led by the high energy physics community, who wanted to share their work more openly and more quickly – and a top-down one, driven by government/funder support and increasing public and commercial engagement in research, to ensure better take-up and use of research that has been invested in.

Policy-wise, in the UK RCUK has seven Common Principles on Data Policy, and six of the RCUK funders require data management plans. That fits into wider international policy moves. Indeed, if you thought the four-year EPSRC embargo timeline was tight, South Africa has just introduced a requirement of no more than 12 months.

Open Access was a pilot in FP7; this ran from August 2008 until the end of FP7 in 2013. It covered parts of FP7, but it covers all of FP8/Horizon 2020, although that is a pilot process intended to be mainstreamed by FP9 or whatever it is known by. The EC sees real economic benefit in OA, supporting SMEs and NGOs that can’t afford subscriptions to the latest research. Alma Swan and colleagues have written on the opportunity costs, which provides useful context on the difference Open Access can make.

Any project with H2020 funding has to make every peer-reviewed journal article it publishes openly available and free to access, free of charge, via a repository, regardless of how it publishes and whether by green or gold OA.

H2020 also features an Open Research Data pilot, likely to be a requirement by FP9. It applies to data and metadata needed to validate scientific results, which should be deposited in a dedicated data repository. Interestingly, whilst data management plans need to be created 6 months into the project, and again towards the end, they don’t have to be filed with the EU at the outset.

So, lastly, I want to talk about four projects funded by the EU.

Pasteur4OA aims to simplify OA mandates across the EU – so that funders don’t have conflicting policy issues. That means it is a complex technical and diplomatic process.

OpenAIRE aims to promote use and reuse of outputs from EU-funded research.

EUDAT offers common data services through a geographically distributed, resilient network of 35 European organisations. Jisc and DCC are both working on this, integrating the DCC’s DMP Online tool into those services.

The FOSTER project is supporting different stakeholders, especially younger researchers, in adopting open access in the context of the European Research Area, and in becoming aware of the H2020 requirements on them – with a big carrot and a small stick, in a way. We want researchers to integrate open access principles and practice into their current research workflow, rather than asking them to change their way of working entirely. We are doing train-the-trainer activities in this area and are also facilitating the adoption, reinforcement and implementation of OA policies within and beyond the EC. FOSTER is doing this work through various methods, including identifying existing content that can be reused, repackaged, etc.

Jisc Workshop on Research Data Management and Research at Risk Activities, and Shared Services – Rachel Bruce, Daniela Duca, Linda Naughton, Jisc

Rachel is leading this session…

This is really a discussion session but I will start by giving you a very quick overview of some of the work in Research at Risk as well. But this is a fluid session – we are happy to accommodate other topics that you might want to talk about. While we give you a quick overview, do think about an RDM challenge topic you might want to take the chance to talk about.

So, in terms of Research at Risk, this is a co-design challenge. This is a process Jisc takes forward for research and development (or the development end of the spectrum) to address sector challenges. The challenge facing the sector here is the fragmented approach to research data and infrastructure, and because of that we are probably not reaching all the goals we would wish to. Some of that relates quite closely to what David Prosser was saying yesterday about open access and the benefits of scale and shared services. So, we have been asked to address those issues in Research at Risk.

Within Research at Risk we have a range of activities, one of the biggest is about shared services, including in the preservation and curation gap. You have already heard about discovery and research data usage, also the Research Data Spring.

So, the challenges we want to discuss with you are:

  1. The Shared services for RDM – yesterday there was discussion around the SHERPA services for instance. (Rachel will lead this discussion)
  2. Journal research data policy registry (Linda will lead this session)
  3. Business case and funding for RDM – articulating the role of RDM (Daniela will lead this session)
  4. But also anything else you may want to discuss… (Varsha will lead this group discussion)

So, Shared Services… This is an architecture diagram we have put together to depict all of the key services needed to support a complete data management service, but also linking to national and international services. And I should credit Stuart Lewis at UoE and John Lewis (Sheffield?) who had done much of this mapping already. We have also undertaken a survey of repositories around the potential needs of HEIs. Responses included interest in a possible national data repository, and a call for Jisc to work with funders on data storage requirements so that funders provide suitable discipline-specific data storage mandates.

Linda: I will talk a bit about the Journal Research Data Policies Registry – you can find out more on our blog and website. We want to create a registry that allows us to turn back time to see what we can learn from OA practices. The aim is to develop best practice on journal policies between publishers and other stakeholders. We want to know what might make your life easier in terms of policies, and in navigating research data policies, and your input into this early-stage work would be very valuable.

Daniela: The business case and costings for RDM is at a very early stage but we are looking at an agreed set of guidance for the case for RDM and for costing information to support the business case in HEIs for research data management. This reflects the fact that currently approaches to funding RDM services and infrastructure vary hugely, and uncertainty remains… And I would like to talk to you about this.

Rachel: we thought we would have these discussions in groups and we will take notes on the discussions as they take place, and we will share this on our blog. We also want you to write down – on those big post it notes – the one main challenge that you think needs to be addressed which we will also take away.

So, the blog will be going quiet again for a while but we’ll try and tweet highlights from groups, and grab some images of these discussions. As Rachel has said there will also be notes going up on the Jisc Research at Risk blog after today capturing discussions… 

Cue a short pause for lunch, during which there was also a demo from: DMPonline – Mary Donaldson and Mick Eadie, University of Glasgow.

Our first talk of this afternoon, introduced by William Nixon, is:

Unlocking Thesis Data – Stephen Grace, University of East London

This project is for several different audiences. For students it is about bridging to the norms of being a career researcher: visibility and citations, and helping them to understand the scholarly communication norms that are becoming the reality of the world. But this also benefits funders, researchers, etc.

We undertook a survey (see: http://dx.doi.org/10.15123/PUB.4274) and found several institutions already assigning DOIs to theses, with others looking to do more in this area. We also undertook case studies in six institutions to help us better understand what the processes actually are. Our case studies were for University of East London; University of Southampton; LSE; UAL; University of Bristol; and University of Leicester. It was really interesting to see the systems in place.

We undertook test creation of thesis DOIs with University of East London and University of Glasgow, and University of Southampton did this via an XML upload, so a slightly more complex process. In theory all of that was quite straightforward. We were grateful for the Jisc funding for that three-month project; it didn’t get continuation funding but we are keen to understand how this can happen in more institutions and to explore other questions: for instance, how does research data relate to the thesis, what is its role, is it part of the thesis, a related object, etc.?

So questions we have are: What systems would you use and can they create/use persistent identifiers? Guidance on what could/should/must be deposited? One record or more? Opportunities for efficiencies?

On the issue of one record or more, a thesis we deposited at UEL was a multimedia thesis, about film making and relating to the making of two documentary films, which were deposited under their own DOIs. Is that a good thing or a bad thing? Is that flexibility good?

Efficiencies could be possible around cataloguing theses – that can be a repeated process for the repository copy and for the library’s copy and those seem like they should be joined up processes.

We would love your questions and comments and you can find all project outputs.

Q1) What is the funder requirement on data being deposited with theses?

A1) If students are funded by research councils, they will have expectations regardless of whether the thesis is completed.

Q2) Have you had any feedback from the (completed) students whose work has been deposited on how they have found this?

A2) I have had feedback from the student who deposited that work on documentary films. She said that as a documentary filmmaker there are fewer and fewer ways to exhibit those films. As a non-commercial filmmaker, seeing her work out there and available is important, and this acts as an archive and as a source of feedback that she appreciates.

Q3) On assigning ORCID IDs to students – I struggle to think of why that would be an issue?

A3) Theoretically there is no issue, we should be encouraging it.

Comment: Sometimes where there is a need to apply an embargo to a thesis because it contains content in which a publisher has copyright – it may be useful to have a DOI for the thesis and separate DOIs for the data, so that the data can be released prior to the thesis being released from embargo. [Many thanks to Philippa Stirlini for providing this edit via the comments (below)].

IRUS UK – Jo Alcock, IRUS UK

We are a national aggregation service for any UK Institutional Repositories which collects usage statistics. That includes raw download data from UK IRs for all item types within repositories. And it processes raw data into COUNTER compliant statistics. And that aggregation – of 87 IRs – enables you to get a different picture than just looking at your own repository.

IRUS-UK is funded by Jisc. Jisc provides project and service management for IRUS-UK and hosts it. Cranfield University undertakes development, and Evidence Base at Birmingham City University undertakes user engagement and evaluation.

Behind the scenes, IRUS-UK is a small piece of code that can be added to repository software and which employs the “Tracker Protocol”. We have patches for DSpace, plug-ins for EPrints, and implementation guidelines for Fedora. It gathers basic data for each download and sends it to the IRUS-UK server. The reports are COUNTER compliant (Report 1 and Report 4). We also have an API and a SUSHI-like service.
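To make the tracker-to-report idea a little more concrete, here is a minimal Python sketch of turning raw download events into monthly per-item counts. It is not IRUS-UK’s code or the Tracker Protocol itself: the `DownloadEvent` fields, the tiny robot list and the 30-second double-click window are assumptions made for illustration, and the COUNTER Code of Practice defines the real filtering rules.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DownloadEvent:
    """One download as a repository-side tracker might report it (hypothetical fields)."""
    item_id: str
    client_ip: str
    user_agent: str
    timestamp: datetime


# Hypothetical, deliberately tiny robot filter; real COUNTER processing uses a maintained list.
ROBOT_MARKERS = ("bot", "crawler", "spider")

# Assumed double-click window; the COUNTER Code of Practice defines the actual rule.
DOUBLE_CLICK_WINDOW = timedelta(seconds=30)


def monthly_item_counts(events):
    """Count downloads per (item_id, 'YYYY-MM') after robot and double-click filtering."""
    counts = Counter()
    last_click = {}  # (item, ip, user agent) -> time of that user's previous click on the item
    for ev in sorted(events, key=lambda e: e.timestamp):
        if any(marker in ev.user_agent.lower() for marker in ROBOT_MARKERS):
            continue  # drop obvious robot traffic
        key = (ev.item_id, ev.client_ip, ev.user_agent)
        previous = last_click.get(key)
        last_click[key] = ev.timestamp
        if previous is not None and ev.timestamp - previous <= DOUBLE_CLICK_WINDOW:
            continue  # repeat click within the window: same download, counted once
        counts[(ev.item_id, ev.timestamp.strftime("%Y-%m"))] += 1
    return counts
```

Because the window is measured from the same user’s previous click, a rapid chain of clicks collapses into a single counted download, while a return visit a few minutes later counts again.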

At present we have around 400k items covered by IRUS-UK. There are a number of different reports – and lots of ways to filter the data. One thing we have changed this year is that we have combined some of these related reports, and we have added a screen that enables you to filter the information. Repository Report 1 enables you to look across all repositories by month, and you can view it or export it as Excel or CSV.

As repositories you are probably more concerned with the Item Report 1 which enables you to see the number of successful item download requests by Month and Repository Identifier. You can look at Item Statistics both in tabular and graphical form. You can see, for instance, spikes in traffic that may warrant further investigation – a citation, a news article etc. Again you can export this data.

You can also access IRUS-UK Item Statistics which enable you to get a (very colourful) view of how that work is being referenced – blogged, tweeted, cited, etc.

We also have a Journal Report 1, which allows you to see anything downloaded from a given journal within the IRUS-UK community. You can view the articles, and see all of the repositories each article is in. So you can compare performance between repositories, for instance.

We have also spent quite a lot of time looking at how people use IRUS-UK. We undertook a number of use cases around the provision of standards based, reliable repository statistics; reporting to institutional managers; reporting to researchers; benchmarking; and also for supporting advocacy. We have a number of people using IRUS-UK as a way to promote the repository, but also some encouraging competition through newsletters etc. And you can find out more about all of these use cases from a recent webinar that is available on our website.

So, what are the future priorities for IRUS? We want to increase the number of participating repositories in IRUS-UK. We want to implement the IRUS tracker for other repository and CRIS software. We want to expand views of data and reports in response to user requirements, for instance potentially altmetrics. We also want to include supplementary data and do more international engagement.

If you want to contact us our website is http://irus.mimas.ac.uk; email irus@jisc.ac.uk; tweet @IRUSNEWS.

Q1) Are the IRUS-UK statistics open?

A1) They are all available via a UK Federation login. There is no reason they could not technically be shared… We have a community advisory group that have recently raised this so it is under discussion.

Q2) How do data repositories fit in, especially for text mining and data dumps?

A2) We have already got one data repository in IRUS-UK but we will likely need different reporting to reflect the very different ways those are used.

Q3) If a data set has more than one file, is that multiple downloads?

A3) Yes.

Q4) Could that be fixed?

A4) Yes, we are looking at separate reporting for data repositories for just this sort of reason.
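The multi-file counting issue raised in the questions above can be illustrated with a small sketch. This is not how IRUS-UK reports data (the answer only says separate reporting for data repositories is being looked at); it assumes a hypothetical mapping from file identifiers to datasets and an assumed 30-minute “visit” gap, and shows a naive per-file count alongside a dataset-level count in which one person fetching every file in a dataset counts once.

```python
from collections import Counter
from datetime import datetime, timedelta


def dataset_level_counts(file_requests, file_to_dataset, session_gap=timedelta(minutes=30)):
    """Collapse per-file downloads into per-dataset downloads (illustrative sketch).

    file_requests: iterable of (file_id, client_id, timestamp) tuples.
    file_to_dataset: hypothetical mapping of file id -> dataset id.
    Returns (file_level, dataset_level): two Counters keyed by dataset id.
    """
    file_level = Counter()
    dataset_level = Counter()
    last_request = {}  # (dataset_id, client_id) -> time of that client's previous file request
    for file_id, client_id, ts in sorted(file_requests, key=lambda r: r[2]):
        dataset_id = file_to_dataset.get(file_id, file_id)  # unmapped files stand alone
        file_level[dataset_id] += 1  # what a naive per-file count would report
        key = (dataset_id, client_id)
        previous = last_request.get(key)
        last_request[key] = ts
        if previous is None or ts - previous > session_gap:
            dataset_level[dataset_id] += 1  # first file of a new visit to this dataset
    return file_level, dataset_level
```

The session gap is a design choice rather than a standard: a shorter gap splits long downloads of large datasets into several “visits”, a longer one risks merging genuinely separate uses.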

Sadly Yvonne Howard, University of Southampton, is unable to join us today due to unforeseen circumstances so her session, Educational Resources, will not be going ahead. Also the Developer Challenge has not been active so we will not have the Developer Challenge Feedback session that Paul Walk was to lead. On which note we continue our rejigged schedule…

Recording impact of research on your repository (not impact factors but impact in the REF sense!) – Mick Eadie & Rose-Marie Barbeau, University of Glasgow

Rose-Marie: Impact is my baby. I joined Glasgow specifically to address impact and the case studies. The main thing you need to know about the impact agenda is that all of our researchers are really stressed about it. Our operating landscape has changed, and all we have heard is that it will be worth even more in future REFs. So, we don’t “do” impact, but we are about ensuring our researchers are engaging with users and measuring and recording impact. So we are doing a lot of bridging work, around that breadcrumb trail that explains how your research made it into, e.g. a policy document…

So we have a picture on our wall that outlines that sort of impact path, showing the complexity and pathways around impact. And yet even this [complex] picture appears very simple; reality is far more complicated… When I talk to academics they find that path difficult: they know what they do, they know what they have to show… so I have to help them understand that they may have multiple impacts, and that these might come about by quite a circuitous route. So, for instance, a piece of archaeological work impacted policy, featured on Time Team, impacted the local community… Huge impact, extensive international news coverage… But this is the form for REF processes…

But my big message to researchers is that everything has changed: we need them to engage for impact and we take that work seriously. It’s easy to say you spoke to schools, to be part of the science festival. We want to capture what these academics are doing here professionally, things they may not think to show. And we want that visible on their public profile for example. And we want to know where to target support, where impact might emerge for the next REF.

So, I looked at other examples of how to capture evidence. Post REF a multitude of companies were offering solutions to universities struggling to adapt to the impact agenda. And the Jisc/Coventry-led project establishing some key principles for academic buy in – that it needed to be simple and very flexible – was very useful.

And so… Over to the library…

Mick: So Rose-Marie was looking for our help to capture some of this stuff. We thought EPrints might be useful to capture this stuff. It was already being used and our research admin staff were also quite familiar with the system, as are some of our academics. We also had experience of customising EPrints. And we have therefore added a workflow for Knowledge Exchange and Impact. We wanted this to be pretty simple – you can either share “activity” or “evidence”. There are a few other required fields, one of which is whether this should be a public record or not.

So, when an activity or piece of evidence is added, the lead academics can be included, as can any collaborating staff. The activity details follow the REF vocabulary. We include potential impact areas, for instance… And we’d like that record to be linked to other university systems. But we are still testing this with research admin staff.

We still have a few things to do… A Summary page; some reporting searching and browsing functionality – which should be quite easy; link to other university systems (staff profiles etc); and we would like to share this with the EPrints community.

Q1) What about copyright?

A1 – Rose-Marie) Some people do already upload articles etc. as they appear. The evidence repository is hidden away – to make life easier in preparing for the next REF – but the activity is shared more publicly. Evidence is

Q2 – Les) It’s great to hear someone talking about impact in a passionate and enthusiastic way! There is something really interesting in what you are doing and its intersection with preservation… In the last REF, evidence that had been on the web was lost. If you just have names and URLs, that won’t help you at the end of the day.

A2 – Rose-Marie) Yes, lack of institutional memory was the biggest issue in the last REF. I speak a lot to individuals and they are very concerned about that sort of data loss. So if we could persuade them to note things down it would jog memories and get them into that habit. If they note disappearing URLs that could be an issue, but I will also scan everything uploaded because I want to know what is going up there, to understand the pitfalls. And that lets me build on experience from the last REF. It’s a learning process. We also need to understand the size of storage we need: if everyone uploads every policy document, video etc. it will get big fast. But we do have a news service, and our media team are aware of what we are doing and we are trying to work with them. Chronological press listings from that media team aren’t the data structure we would hope for, so we are working on this.

William) I think it is exciting! We don’t think it’s perfect either – we just need to get started and then refine and develop it! Impact did much better than expected in the last REF, and if you can do that enthusiastically and engagingly that is really helpful.

A2 – Rose-Marie) And if I can get this all onto one screen that would be brilliant. If anyone has any questions, we’d love to hear them!

Impact and Kolola – Will Fyson, University of Southampton

I work for EPrints Services but I also work for Kolola, a company I established with co-PhD students – and very much a company coming out of that last REF.

The original thinking was for a bottom up project thinking about 50 or 60 PhDs who needed to capture the work they were doing. We wanted to break down the gap between day to day research practice and the repository. The idea was to allow administrators to have a way to monitor and plan, but also to ensure that marketing and comms teams were aware of developments as well.

So, our front page presents a sort of wall of activity, and personal icons which shows those involved in the activity. These can include an image and clicking on a record takes you through to more information. And these records are generated by a form with “yes” or “no” statements to make it less confusing to capture what you have done. These aren’t too complex to answer and allow you to capture most things.

We also allow evidence to be collected, for instance outreach to a school. You can also capture how many people you have reached in this activity. We allow our community to define what sort of data should be collected for which sort of activity. And analytics allow you to view across an individual, or a group. That can be particularly useful for a large research group. You can also build a case study from this  work – useful for the REF as it allows you to build up that case study as you go.

In terms of depositing papers, we can specify in the form that an EPrints deposit is required when certain types of impact activities are recorded, and highlight if that deposit has been missed. We can also export a Kolola activity to EPrints, providing a link to the Kolola activity and any associated collections, so you can explore works related to a particular paper, which can be very useful.

We’ve tried to build a distributed research infrastructure that is quite flexible and allows you to have different instances in an organisation, tailored to the different needs of different departments or disciplines, but all backed up by the institutional repository.

Q1) Do you have any evidence of researchers gathering evidence as they go along?

A1) We have a few of these running… And we do see people adding stuff, but occasionally researchers need prompting (or threatening!): for instance, for foreign travel you have to be up to date logging activity in order to go! But we also saw an example of researchers getting an entry in a raffle for every activity recorded, and that meant a lot of information was captured very quickly!

(Graham Steel @McDawg taking over from Nicola Osborne for the remainder of the day)

Demo: RSpace – Richard Adams, Research Space

 

RSpace ELN presentation and demo. Getting data online as early as possible is a great idea. RSpace at the centre of user data management. Now time for a live demo (in a bit).

Lab notebooks can get lost for a number of reasons. Much better is an electronic lab notebook. All data is timestamped, and who made what changes etc. is logged. Let’s make it easy to use. Here’s the entry screen when you first log in. You can search for anything and it’s very easy to use. It’s easy to create a new entry. We have a basic document into which you can write content with any text editor. You can drag and drop content in very simply. Once documents have been added they appear in the gallery. Work is saved continuously and timestamped.
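The “timestamped, with changes logged” model described here can be pictured as an append-only revision history. The sketch below is a hypothetical Python model, not RSpace’s implementation or data model: each save appends an immutable, timestamped record of who saved what, so earlier states of an entry stay available for review.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Revision:
    """One saved state of an entry (hypothetical structure)."""
    number: int
    author: str
    content: str
    saved_at: datetime


@dataclass
class NotebookEntry:
    """Hypothetical ELN entry with an append-only revision history."""
    title: str
    revisions: list = field(default_factory=list)

    def save(self, author, content, when=None):
        """Append a new timestamped revision; earlier revisions are left untouched."""
        when = when or datetime.now(timezone.utc)
        rev = Revision(len(self.revisions) + 1, author, content, when)
        self.revisions.append(rev)
        return rev

    @property
    def current(self):
        """Content of the latest revision, or an empty string for a new entry."""
        return self.revisions[-1].content if self.revisions else ""

    def history(self):
        """Return (revision number, author, ISO timestamp) for every save, oldest first."""
        return [(r.number, r.author, r.saved_at.isoformat()) for r in self.revisions]
```

Saving never overwrites, so “what changed and who changed it” is answered by comparing consecutive revisions rather than by a separate audit log.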

We also have file stores for large images and sequencing files.

NOW A LIVE DEMO.

It’s very easy to configure. Each lab has its own file server. Going back to the workspace, we’re keen to make it really easy to find stuff. Nothing is ever lost or forgotten in the workspace. You can look at the revision history and review what changes have been made. Now looking at a lab’s group page: you can look at, but not edit, content generated by other users. You can invite people to join your group and collaborate with other groups. You can set permissions for individual users. One question that comes up often is about how to get data out of the system. Items are tagged and contain metadata, making them easier to find. To share stuff, there are 3 formats for exporting content (ZIP, XML and PDF).

The community edition is free and uses Amazon web services. We’re trying to simplify RSpace as much as possible to make it really easy to use. We are just getting round to the formal launch of the product but have a number of customers already. It’s easy to link content from the likes of DropBox. You can share content with people that are not registered with an RSpace account. Thanks for your attention.

Q1) I do lots of work from a number of computers.

A1) We’re developing an API to integrate such content. It’s not available just yet.

Closing Remarks and presentation to winner of poster competition – Kevin Ashley, Digital Curation Centre

I’m Kevin Ashley from the Digital Curation Centre here in Edinburgh. Paul Walk mentioned that we’ve done RFringe events for 7 years. In the end, we abandoned the developer challenge this year due to a lack of uptake. Do people still care about it? Kevin said there is a sense of disappointment. Do we move on or change the way we do it? Les said: “I’ve had a great time, it’s been one of the best events I’ve been to for quite some time. This has been fantastic.” “Thanks Paul for your input there,” said Kevin.

David Prosser’s opening keynote was a great start to the event. There were some negative and worrying thoughts in his talk. We are good at identifying problems but not solutions. We have the attention of government departments in terms of open access and open data. We should maximise this opportunity before it disappears.

Things that we talked about as experiments a few years ago have now become a reality. We’re making a lot of progress generally. Machine learning will be key, there is huge potential.

I see progress and change when I come to these events. Most in the audience had not been to RFringe before.

Prizes for the poster competition. The voting was quite tight. In third place LSHTM, Rory. Second place. Lancaster. First place, Robin Burgess and colleagues.

Thanks to all for organising the event. Thanks for coming along. Thanks to Valerie McCutcheon for her contribution (gift handed over). Thanks to Lorna Brown for her help too. Go out and enjoy Edinburgh! (“and Glasgow” quipped William Nixon).

 

Posted in LiveBlog

Repository Fringe 2015 – Day One LiveBlog

Repository Fringe 2015: The Original Repositories Unconference | 3rd–4th August 2015

Welcome to Repository Fringe 2015! We are live for two packed days of all things repository related. We will be sharing talks via our liveblog here, or you can join the conversation at #rfringe15. We are also taking images around the event and encourage you to share your own images, blog posts, etc. Just use the hashtag and/or let us know where to find them and we’ll make sure we link to your coverage, pictures and comments.

This is a liveblog and that means there will be a few spelling errors and may be a few corrections required. We welcome your comments and, if you do have any corrections or additional links, we encourage you to post them here. 

Welcome to Edinburgh – Jeremy Upton, Director, Library and University Collections, University of Edinburgh

It gives me great pleasure to welcome you to the University of Edinburgh and to this great event, organised jointly by staff from the Digital Curation Centre, EDINA, and the University of Edinburgh.

If you have come from outside Edinburgh then it really is a beautiful city and I encourage you to explore it if you have time. And of course it’s the Edinburgh festival and I’m sure you’ve already had a sense of that coming in today. It is an event with a huge impact on the city, and on the University – we get involved in hosting and running events and I have to give a plug for our current exhibition, Towards Dolly, featuring Dolly the Sheep at the University of Edinburgh library.

So, as the new Library Director I really am pleased that Repository Fringe is running again here. In my time thus far two issues have really been a major priority, Open Access and Open Data, and I’m pleased to see both reflected in your programme. Amongst academics these issues can trigger a fair degree of panic and concern. But you can take a really positive opportunity here: the academic community is looking to our community to provide creative solutions.

I’m also delighted to be here because throughout my career I have been a fan of collaboration and shared working, and the areas of open access and open data are areas particularly ripe for collaborative and shared working, to share knowledge and share some of our pain, as we all meet these shared requirements.

We also find ourselves in an increasingly uncertain world and that makes the role of innovation and ideas so important – and that is why events like this are so important, giving us space to

Edinburgh was an early adopter of OA, starting in 2003. We have strong support for open access with champions within departments. We received over £1.1M RCUK funding of Gold OA this year. Our staff look at the landscape not only from our own institutional perspective, but also looking to the much wider sector. We work in collaboration with colleagues in EDINA and with DCC, talking regularly and sharing knowledge and expertise.

We are one of the partners in the new Alan Turing Institute, working with large data sets including large open data sets. That work is looking at the opportunities for new and innovative research in areas such as healthcare, and I heard our Vice Chancellor talking about the use of data in, for instance, treating diabetes in new ways.

Now, finally, I have a few practical items to mention. You all have stickers for voting on your favourite posters. Also Repository Fringe is very deliberately a fringe event – we want this event to have a looser structure than traditional conferences. The organisers want me to emphasise informality, please do dip in and out of sessions, move between them, your presenters will expect that so move as you wish. And if you want to create your own break out sessions there are rooms available – just ask at the Fringe registration desk. And the more that you put into this event, the more you will get out of it.

We would like to thank our sponsors this year: Arkivum, EPrints Repository Services, and the University of London Computing Centre.

So, please do enjoy the next few days and take the opportunity to see some of Edinburgh. And hopefully you will have a fruitful event finding new solutions to the challenges we all face.

Now it is my pleasure to introduce your opening keynote speaker. David Prosser came from the “dark side” of medical publishing, then moved on to undertake his doctorate and on to his work with Research Libraries UK…

Fulfilling their potential: is it time for institutional repositories to take centre stage? – David Prosser, Executive Director, RLUK

As someone who has been involved in Open Access for the last 12 years I want to look back a bit at our successes and failures, and use that to set the scene with where we might go forward.

I wanted to start by asking “What are repositories for?”. When we first set up repositories they were very much about the distribution of research, for those beyond the institution, and for those without the funding to access all of the journals being published. There was, and continues to be, debate about the high profits made by commercial publishers… There was a move towards non-commercial publishers. And there was something of a move to remove “dirty profits” from the world of scholarly communications that helped drive the push to open access.

We have also seen a move from simpler journals and books towards something much richer, which repositories enable. We were going to revolutionise scholarly communications but we haven’t done that. We have failed to engage researchers adequately and I think that the busyness of academics is an insufficient reason to explain that. So why have we failed to engage and to get academics to see this as something they should do on a daily basis? I warned Les Carr years back when he was talking about the Schools timetabling… He was saying how hard that is to do, but that he knew he had to do it because it was part of what he needed to do to achieve what he wanted. We have become fixated on making things easy in open access, even though people will take the time to do things that they feel are important.

And there has also been confusion about standards and about publication status: what authors could share as pre- or post-print and what they were allowed to do, which took a long time to resolve. It didn’t help that publishers were confused too. There is an interesting pub conversation to be had about whether the confusion is a deliberate tactic from publishers… In charitable moments you can see statements coming out that suggest confusion, that publishers don’t understand the issues.

But there are places where those issues have been overcome. arXiv is so well established in high energy physics that no publisher restricts authors from depositing there. Another subject-based repository, PubMed Central, has also seen success, but that is in part because of requirements on authors and publishers, and because of working closely with publishers. We have also seen FigShare and Mendeley achieve great success – what is it about them that is attractive, and what can we learn and borrow from them for repositories?

Over the last 12 years there has been a real tension between pragmatism and idealism. When repositories first emerged we were happy to take in content without checking the quality so carefully – for instance, suitable and clear rights information. As a user, you find the rights information is not always there or clear. Not all the metadata we have, especially for older material, is necessarily fit for purpose, for our needs. To some extent we have a read-only corpus because of that. But is that enough? Or are there more interesting things we would want to do? It is difficult to look back and see those pragmatic decisions as the wrong ones: we wanted to demonstrate the value, to show authors the potential for dissemination of their research… It is hard to say that was wrong, but going forward we really need to make a conscious decision about what it is that we want.

So, one of the things about open access as a force for revolutionising scholarly communication… You can see scholarly communication as being made up of three functions: registration, archiving and dissemination. All of these can be fulfilled by repositories, but we still don’t seem to be using repositories for all of these things. We haven’t moved to using repositories for those functions first. Early on there was an idea that you would deposit work locally, then get it accepted, kitemarked, etc. by a journal or other process after that. But the journal has retained its position. Libraries in the SCOAP3 project, which looked at journals whose content was entirely available in arXiv, spent 10 years persuading and working with publishers to make those journals open access, so that post-prints were as open as all previous and parallel versions of the same paper. But that was about protecting journals. Libraries seem so keen on journals that they are desperate to protect them, sometimes in the face of huge opposition from publishers!

So we have a very conservative system. You have to see journals not as a form of scholarly communication but as a reward mechanism. If you are rewarded for being in one of those high energy physics journals, it makes sense that you should be so invested in supporting their existence. The current reward structures are the issue, but what is the solution there? One of the government’s key advisers, Dr Mike Walker, raised this issue without suggesting solutions. And in research institutions and libraries we are so far away, in terms of our sphere of influence, from those reward mechanisms that all we can do is nudge and inform…

We can, however, see open access advocacy as a success. In the last government we saw some openness to talking about open access… We can debate whether its impact has been wholly helpful, but there has been impact. Something like 80 institutions now have open access policies – they vary in effectiveness, but even having them in place is remarkable. And where they work well they make a real difference, as with the Wellcome Trust, the University of the Age (?) and RLUK, and the HEFCE policy is really the game changer.

It has been interesting, over the last few weeks, to see a change in the HEFCE policy, and to see how ready institutions are for it. There are many that are not ready yet, and that could mean pressure to change the policy, but we see them stating that policy mistakes will be treated leniently, which is helpful. Authors usually know if their paper has been accepted but it can be harder to know when it has been published, which is an important trigger. But it seems that the stick of the HEFCE policy is too strong. Universities don’t trust academics and researchers to deposit regularly, and they recognise the risks that brings in terms of the REF and their future funding. This is why a lot of Russell Group universities in particular have lobbied for acceptance rather than publication date…

It says a lot about scholarly communications that authors and institutions do not always know when a paper has been accepted or published. The idea that notification of acceptance is a private transaction between the author and the publisher raises some concerns for research libraries.

Now, I wanted to make a small diversion here to talk a bit about RCUK. The comprehensive spending review is coming up in UK Government, and there is sabre rattling about 40% cuts in research budgets. I think funders, RCUK in particular, will look at what they are spending and ask if they are getting value for money. And I think researchers will also question, if their budgets are cut, why RCUK is paying so much money to Reed Elsevier. So there will be pressure to stop paying for open access. And there is a transition period where longer embargoes are allowed for open access – this has led to a grotesque growth of publishers’ decision trees! It could be that the end of that transition period, and a cut in funding for gold OA, may put the focus back on repositories. That is an important scenario that we should be thinking seriously about. And the issue of embargoes means I need to say that there is still no harm in shorter embargoes. Any embargo is a concession to the publisher. It’s a concession that potentially slows down the communication and sharing of research.

There is also an important embargo change where publishers fail to respond to enquiries about gold OA, such that those crucial first few weeks of interest may be lost. Now I think that’s another incompetence issue rather than something more sinister.

By failing to engage authors in the deposit process, to engage them in that way… We are making APC payments easier – we just ask authors to tell us where they are publishing and we pay for them. I am concerned about separating the author from the process in general, but particularly from APCs. The author doesn’t know or care about the costs involved. If they do engage with that, if they do look, then they need to make a choice about whether the price charged is worth it for the relevance, impact or importance of that journal. Separating the author from the process puts us in danger of creating an APC crisis in the same way that we had a serials crisis.

Traditionally, universities have shown a shocking indifference to their scholarly output – the research papers, publications, etc. It was very hard to understand what was published, what was created. There was very little responsibility on scholars to capture their own published outputs, and an assumption that the library would purchase them, but that assumption was not always correct. Some of that is being addressed by the REF, but also by universities becoming much more aware of their intellectual output. Capturing and reflecting on that output is no longer seen as weird or alien, and that is good for our work, for our arguments about the value of open access, of repositories, etc. But universities do also care about cost-benefit analysis for this work. And for data in particular there can be really high costs associated with making data available for reuse. We need better stories to explain how the benefits outweigh the costs.

We have had issues over the last 15 years around the vision of open access that we originally had… In the UK we could talk about

Danny Kingsley, of Cambridge University, talked at LIBER about the idea that, in a sense, the compliance engine aspects of repository fringe can devalue the potential of repositories, of what they could be for open access in the academic community. If open access is “just” a side effect of repositories, it is an amazing side effect! Making work available under open access is a real achievement; even if the route has been rather tortuous, and has involved pain in negotiating the confusion and issues with publishers, we have made a real difference. And there is nothing wrong with being flexible over open access, and with jumping onto bandwagons. Compliance is a useful bandwagon right now, so we should use it! We should stop worrying about whether people do the right thing for the wrong reason, and just be glad that the right things are taking place.

But over the next few days we should be thinking about how we can use what is in our repositories, how easy rights statements make licensing, and how we can easily look across a topic in multiple repositories. And, in terms of preservation, how concerned are we, and should we be, about that? Are repositories more about dissemination? If we are going to get an explosion of material over the next few years, do we have the capacity to handle and interpret that material?

So we have had a messy, tortuous route here, but open access is really happening, and we have several days to develop our vision for what we should be doing with this. David Willetts has talked a lot about open access, and I think he’s rather overplayed his hand based on what is happening in the US, but there is so much more that we could do with open access…

Q&A

Q1, Grant, University of Leicester) How many repositories should there be? There tends to be 1 per university. There are some joint ones between institutions…

A1) If you started with a blank slate today, would you set up 100-120 institutional repositories? I tend to think no, you wouldn’t… You would want something more centralised. There is a variety of institutions with very varied expertise: Edinburgh is very skilled and engaged and would want its own repository, but there are many institutions that are really concerned about what they can set up to meet HEFCE requirements, and there is an opportunity there for someone to bring them together so that they can all meet those requirements in a centralised way. I think there should be more centralisation…

Comment – Paul Walk) There is a shadow issue there: not the number of repositories, but who controls them. A collaborative set-up where control is retained seems the important thing…

A1) I think White Rose seems like a great example – a shared repository, but it looks like each institution has its own space in terms of how that is presented on the web.

One of the big areas in fashion is the idea of library as publisher, of each institution publishing. I think the lesson to be learned is that infrastructure for university presses should be shared, but content is where each institution should focus. The idea of all institutions using their own publishing platforms, with different set-ups that appear to be interoperable but are not quite, doesn’t seem like the way to go.

Q2 – Kevin Ashley, DCC) I remember Andrew Prestwick talking about institutional repositories in Wales where he commented that for smaller institutions the issue of control, of their own system, was really important to them.

A2) We live in a strange world where authors are hugely keen to give away all of their Intellectual Property to commercial publishers but can be odd about making it open access.

Comment – Rachel Bruce, Jisc) I remember the conversations Kevin talked about, and we set up a shared repository, the Depot, but that was not a success. The institutional repository structure seemed more effective at that time.

A2) I think that may have been an issue of that being too early. The Depot has been more a repository for lost souls, for authors without institutions… But there wasn’t really an attempt to engage institutions.

Comment – RB again) It was a repository of last resort… And we would engage differently around that if we were doing that now.

Q3 – Les Carr, Southampton) In terms of where things should be put, should there be departmental repositories? As someone with a national view, looking over a national research ecology, how would you have reshaped the research landscape 15 years ago? We seem to have gotten stuck in commerciality, compliance, quality of journals, quality of research, and not questioning the system. How would you have shaken up the system in 2000 to change that?

A3) It is really hard. Many of the decisions of the last 15 years were made with good intent. The whole of scholarly communications is about the reward structure. It makes people write papers that are not really intended for communicating results, but for getting rewards. You see Peter Murray-Rust talking about this a lot… You have a huge range of data and outputs that you have to reduce to 5 pages of write-up and results that are not easy to reuse. We do that because of pay and reward… Here we have the bizarre situation that HEFCE says that Impact Factor and where you publish isn’t the issue in REF, but everything academics and researchers believe is that that stuff matters. And so much of what we are doing is hampered by the idea that journals are how we decide funding, how people develop their careers. But if Dr Mike Walker can’t say what the alternative would be, I don’t think I can.

Repositories for Open Access, Research Data Management and beyond – Rory McNicholl, Timothy Miles-Board, University of London Computer Centre

I am going to start with a short potted history of the University of London Computing Centre… In 1966 the Flowers Report assessed the probable computer needs of users in universities and civil research during the next five years. The great and the good of the University of London met to discuss this and they commissioned a glamorous building in 1968 for computing. By the 1980s we had a new machine which had a fantastic amount of computing power which could be used by researchers around the region.

After the 1980s there was deemed to be less need for a single computer centre in quite the same way, but there was still a real need for computing for the HE and public sectors. So, what are we doing at Repository Fringe? Well, back in 1997 Kevin Ashley and colleagues recognised the need to preserve at-risk digital objects, and work was undertaken to address that through a project, NDAD, that ran to 2009. Following that we have been working on a new project, from 2006, including a Digital Preservation Training Programme, and what we are now calling the Research Technologies Service.

The Research Technologies Service provides various things, including open access repositories; research data repositories; eJournals, in which there has been growing interest; archival storage; and bespoke asset presentation, a way to have a front end customised for specific organisations.

To achieve this we are using ePrints, alongside OJS for our eJournals, and Arkivum (A-Stor in ULCC DC), as well as Python, Django and Elasticsearch. And we do that for various institutions, which means we need to be interoperable with third-party systems. So we are interoperable with institutional HR systems, harvesters, etc., and with Crossref, FundRef, CERIF, IRUS UK, Altmetric, BL, OpenAIRE, ORCID, DataCite and SHERPA. But there are so many more – too many to detail in full.

How do we do what we do? We are flexible, a small team which is very well supported with infrastructure expertise and a service desk. We are community driven, as part of the HE community responsive to that community. We are also fluid, platform agnostic, and ready to listen to our customers and embrace change.

That brings us on to the community platform, how we realise those things. Funder mandates (HEFCE and SFC) tend to drive what we do… That’s what keeps the community up at night, and thinking about what they can achieve and how. We engage in a way that takes best advantage of the shared code and initiatives around open source software. So developers write code and share it on GitHub, and the people we host can then access that shared expertise and development via ePrints and the ePrints Bazaar, bypassing commercial coders and quickly ensuring they are able to address RDM, open access, etc. issues.

And we have a community platform for Open Access: oa_compliance; OpenAccess; rioxx2, which we’ve made into something we can put into a repository so it describes what needs to be described; datesdatesdates, a way to understand which dates count; reviewed_queue, to manage the process and workflow to crack the publication process; ref2014; and… more? The Open Access Button is of interest… ePrints has had a “request copy” button for years and years… Maybe we need a “request open access copy” button added?

Over the last couple of years there has been a huge push towards using repositories for research data, and for RDM. We have been working with the University of Essex and the University of Southampton to look at the ReCollect profile, for keeping research data and describing it effectively. And we have put that into the community platform. Then we undertook work with the University of East London and the London School of Hygiene and Tropical Medicine; we have worked on DataciteDOI, developing that a bit further with the University of Southampton and DataCite; and Arkivum has been a big part of the OA Mandate… Seeing that, it became clear that there was a need for infrastructure, and work with Arkivum has helped us top up access to the archive network. And another thing came out of the EdShare world, UEL, and LSHTM, which was about describing project data sets and collections. And more? We are working with Jisc, the University for the Creative Arts, and CREST on the next phase of the Research Data Spring project to improve the way that data moves from the researcher to the repository, to make that quicker and more efficient, and to improve the presentation of that data.

And beyond… Lots of other things have happened… RepoLink, for linking research papers together, which UEL have gone for to link research objects; pdf_publicationslist; soundcloud; iiif-manifest, coming out of work on presenting digitised objects, which is feeding back into how we do presentation; bootstrap, where our colleague at UEL did some fantastic work around Bootstrap to make repositories work well on mobile, and that’s available to use and explore now; crosswalks_sgul, which has been around Symplectic tools: we work with St George’s a lot on this and they have been happy to publish this back into the community.

So, all of the work we’ve done can be found on the community platform, ePrints bazaar (bazaar.eprints.org), but the source code around that isn’t always obvious so you can also find this work on GitHub (github.com/eprintsug).

So, what’s next? Well, the Public Knowledge Project and the idea of university presses seem timely; there are more opportunities for more community platforms. There are exciting things coming from our siblings at the School of Advanced Study and Senate House Library, who have made an interesting appointment in the area of digital, so exciting things should come out of that… And we are also looking at Preservation as a Service… working with Arkivum and Artefactual… or maybe something simpler. And we are also creeping backwards through the research object lifecycle… And of course more collaborations: we have ORCID in place, for instance, but can we help institutions get more impact from it?

Lastly… We have a job ad out – we’re hiring – so come join the team! Contact me: rory.mcnicholl@ulc.ac.uk. Thank you!

Q&A

Q1 – Rachel Bruce, Jisc) Who is using OJS?

A1) We have several universities using OJS: we’ve worked on a plugin for integrating it into repositories, and on a system for another university to encourage university staff and students to set up their own journals. We have three universities using OJS in those ways so far, but there is lots of interest in this area at the moment.

Q2 – Dominic Tate, UoE) Is there one area of service you are particularly looking at supporting?

A2) I think CREST has been an interesting example… providing archiving for organisations that can’t justify doing that separately. We do tend to focus on the technology…

Poster Session – why should we look at your poster? – Martin Donnelly orchestrating the minute madness!

Sebastian Palucha: I am talking about Hydra, and I can tell you about how we moved from ePrints to Hydra, and also about how we integrate DataCite.

Gareth Knight, from the London School of Hygiene and Tropical Medicine: We developed a plugin to add geospatial data to items in ePrints. Come and ask us about it!

Alan Hyndman from FigShare: My poster is on how FigShare can interoperate with institutional repositories, and also some of the other interoperabilities we are already doing…

Robin Burgess, from GSA: Apologies, no guitar this year! This is on exploring research data management in the digital arts, and in the communities fields. And this is my last chance to present here – I’m moving on to Sydney as their Repository and Digitisation Manager, so I wanted to go out with a bang!

Adam Carter, from the EPCC: I’m here for the Pericles project, an EU FP7 project on digital preservation. We are not building a digital repository; we are about the various aspects of managing change around a digital repository. We argue that getting data in is easy; the question is how you deal with technological change in terms of accessing and using data, and with changes in who uses your data and how, so it ties into repositories in many ways. The poster includes some work we are doing modelling the preservation ecosystem, and also on sheer curation: preservation when the data object is created, not when you deposit it.

Rory Macneil, from RSpace: on integrating electronic lab notebooks with RDM and linking in DataStore at the University of Edinburgh; we’ll be doing a demo at lunchtime on this too! RSpace supports export of documents, folders and associated metadata in XML files, and that work leads to an integrated RDM workflow for researchers and the institution, so that the data is collected, structured, archived and shared. That’s possible by working with researchers, RDM professionals and IT managers.

Pablo de Castro, from LIBER: my poster is on the EU FP7 post-grant open access pilot. This is an experiment which OpenAIRE has been managing in order to implement fair Gold OA. One specific constraint in this project is that publication in hybrid journals will not be funded. And we are working on what we call the APC alternative funding project. We are working with €4M on a pilot that began in May, with significant help from the University of Glasgow. Given those constraints, we are keen to engage institutions to make this a success, this idea for an alternative way to implement Gold OA. We have some idea of the main places requests are coming from, etc., but it should grow quite a lot in the forthcoming months.

Martin: In the spirit of the Fringe please do make use of the blank poster boards! Add your own literature, arrows, etc!

Hardy Schwamm: DMA Online is an online dashboard, funded under Jisc Research Data Spring, which provides a view of how many data sets are funded and created in your institution, how many have an RDM plan, and how much data they plan to use. It takes data from various places, hopefully including DMPonline, and any information that is held in spreadsheets. You can see our poster and our demo. Do come and tell us what you would like to see from the dashboard…

Dominic Tate: I’ve been asked by my colleague Pauline Ward to let you know that there are some noticeboards up in the forum for comments on tomorrow’s workshop – do come and ask me if you have any questions about that.

Lunch, which includes Demo: RSpace – Rory Macneil, Research Space

And we are back…

Open Access Workshop – Valerie McCutcheon, University of Glasgow

We are using the EPrints repository at Glasgow in this session, but this is just one example of open access, and you may have your own set-up or perspective. We’ll talk for about 40 minutes, then you can choose what you want to talk about in more detail.

So, we are going to have a live demo of the journey an open access article goes through when it goes into our repository, Enlighten.

So, we would select the type of item, we upload that file, and then we add details about that paper – the title, abstract, etc. [to fill these in Valerie is taking audience suggestions – not all of the journal titles being suggested sound quite authentic!]. Then our next screen adds the source of funding for the publication. That’s so far, so traditional… But wouldn’t it be nice to get some of that data from the journals?

So, without further ado, I’m handing over to Steve Byford from the Jisc to talk about the Jisc Publications Router

Wouldn’t it be lovely if publications data could automatically go into your institutional repository in a timely and REF-compliant sort of way? Now, to manage expectations a bit, this won’t fix all the possible problems, but the Router will prompt at two key stages in the publication process. The Router gathers details of research articles from publishers etc. Then it directs articles to appropriate institutions, alerting them to the outputs and helping capture of the content into a repository or CRIS.

This has been funded as a project based at EDINA. That project reached its conclusion on Friday. The aim of that project was to demonstrate a viable prototype, which it did. It processed real publications information and that worked well. Now that that project has finished, a successor system is being developed, with existing participants to be migrated in August to September 2015. Then we will be recruiting new participants, and then aiming for rapid expansion of content captured. We intend to move to full service by August 2016. So, if you want to hear more about that, then choose that for your breakout session following this one.

Back to Valerie

Now, once I’ve uploaded my article, and the information, I might want to look at the access status of that article. And on that note I’d like to introduce Bill Hubbard to give you an update on SHERPA services.

I’m here with my colleague who actually manages the SHERPA services… If you select us for a breakout session we’ll be doing a double act! So, we have five minutes to tell you what’s new… Hopefully you have already heard of us, and use the site. If you do then I hope you find us useful. We support open access processes around publishers rights, open access statuses, etc. We are about making your job easier, and so part of what I want to find out from you today is how we can do that, what we can do to help make your life easier.

When we started RoMEO, over 10 years ago, the world was a lot simpler, and the policies and rights picture has only become more complex. JULIET is a registry of policies on Open Access, and that is a more straightforward process. OpenDOAR is the world’s authoritative and quality-assured directory of open access repositories. We also run FACT, and also REF: advice to UK authors on compliance with HEFCE’s OA policy, which will launch soon!

In RoMEO we have rights data on over 19,000 journals; in JULIET we have over 155 funders; and in OpenDOAR we have around 2,937 IR listings.

Futures… I’m asking you not to tweet pictures… This is work in progress… We have a new interface and improved functionality coming in OpenDOAR and FACT; for RoMEO we are working on improved user feedback, improved international collaboration, and maybe even improved policies – we are working with publishers on the quality of expression. And REF is a new service of course. And what else? Well, we are moving towards an improved range of shared services with Jisc… Come to our session to find out more…

And now back to Valerie

So we are going to look at some of these services just now… So I will look up our article on SHERPA/RoMEO… We have integrated more open access information into our repository – we have a whole screen for this now. On this screen we see the estimated cost – let’s assume we’ve gone for the Gold option – and I can later update it with actual costs to reflect any changes in currency/price etc. And we can select the status of the paper, including Green, Gold but also “No OA Option”, Pending etc. And we can add the Article Reference, Date of compliant deposit, funder acknowledgement, etc. Then we have an RCUK screen for completion… And finally a deposit screen.

And now over to Balviar Notay to talk about how Jisc are working on RCUK Compliance.

I will be talking about the RIOXX metadata application profile and guidelines for research papers, which we worked on with RCUK and HEFCE. This was developed by Paul Walk (EDINA) and Sheridan Brown (Key Perspectives). You probably collect this data already, but this is about standardising it. RIOXX doesn’t cover all REF requirements but will cover many of the key areas. It has been a long time getting to this place, but we are now at a place where, if we do this, we can really see consistent tracking of research papers across systems in a coherent way.

RCUK will be releasing some communication in the coming weeks strongly recommending that all institutional repositories at research organisations in receipt of RCUK funding use RIOXX. We have developed plugins and patches to support implementation, including a RIOXX plugin for EPrints; the DSpace and CRIS user groups have also started to engage with RIOXX, but we need more engagement here.

Now, onto the REF plugin. This has been developed with HEFCE and will build on the original plugin developed for the 2014 REF. Institutions wishing to use the REF plugin must also install the RIOXX plugin. We are looking for expressions of interest to trial the EPrints plugin, and we are also looking at developing a DSpace plugin. The development team are Tim Miles-Board and Sheridan Brown (Key Perspectives).

Back to Valerie

The next screen after RCUK, moves us to a page where you can capture OpenAIRE compliance…

Balviar again: CORE is a system to aggregate open access content, providing access to content through a set of services. There are about 74 million items from around 666 repositories, 10k journals, and 60 countries. In terms of services there is a service, an API and a data dump, and you can access and data mine that content. They are also developing a dashboard for institutions to track their data in CORE, how the content has been harvested, etc., and looking at how to support funder compliance through that dashboard as well. We are looking at that work through a project called Jisc Monitor…

Back to Valerie

We now have a choice of breakout sessions… We have a sheet…

Comment: There has been a lot of discussion on UK CORE about peer-to-peer repositories, so that may be a useful session to add to the list.

Paul Walk: This has been about repositories alerting each other about non-corresponding authors, so discussion around peer-to-peer repositories. There is a Google Doc on this that can be accessed; I’ll add that under a working title of COAX.

We are now hearing a quick recap of the breakout groups looking at some themes in more detail:

  • SHERPA Services, Azhar Hussain, Jisc
·        
  • Open Access Metadata, Valerie McCutcheon, William Nixon, University of Glasgow
·        
  • Publications Router, Steve Byford, Jisc
·        
  • Profiles for Reporting to Funders (e.g. RCUK, REF, EU), Balviar Notay, Jisc
  • Aggregation Services, Balviar Notay, Jisc, Lucas Anastasiou, The Open University
  • What advice do we want about open access – Helen Blanchett, Jisc Customer Services
  • COAX / Co-operative Open Access eXchange (Peer to Peer Repository information sharing) – Paul Walk, EDINA

We’ll try and take the blog to some good sessions… 

In fact we have loitered in the main room, where Valerie and her colleague William Nixon are talking about the drivers for the various add-ons and customisations they have made to EPrints. They had been doing a lot of this work via spreadsheets, so this system represents a major time saving and improves accuracy.

Valerie and William, in response to questions, are also going through some of the features of their item records in more detail; for instance, there is the potential to add multiple transactions for one article, as that sometimes applies. Valerie: we would love a better solution for capturing some of the finance information on open access, and we welcome your comments and feedback, or your solution if you have one…

We are now moving on to roam the other Repository Fringe breakouts, taking pictures and tweeting. The best place to catch summaries and highlights from the breakouts is over on the hashtag #rfringe15. And if you are in or leading a session that you'd like to write up, just leave a comment here and we'd welcome a follow-up blog post from you!

Tea and coffee

Demo: DMPonline – Jonathan Rans, Digital Curation Centre

Demo: DMAOnline – Hardy Schwamm, Lancaster University

The blog will be in the EPrints session in the main room at Repository Fringe 2015, but there are three sessions taking place over the next half hour, so keep an eye on Twitter for more on all of these:

Parallel sessions:

  • EPrints update, Les Carr, University of Southampton
  • DSpace update, Sarah Molloy, QMUL
  • PURE update, Dominic Tate, University of Edinburgh, Appleton Tower, Lecture Theatre 3

EPrints update, Les Carr, University of Southampton

Adam Fields, the Community Manager for EPrints, is presenting this session via live video link from South Korea! It is around midnight there, so he's also presenting from the future!

We are starting (after a few snapshots to set the scene) with a chart of the EPrints Services team, to give a sense of how many people are in the team.

EPrints Services exists to serve the community with expertise and support, initially for the open access agenda, but it is becoming a lot more than that given what is happening with RDM in the sector. We are a not-for-profit service: we serve the OA community through commercial services (hosting, support, etc.), we lead on the EPrints roadmap and releases, and we provide funding for the development of the EPrints software.

I'm going to talk about the software side of things and the general trend of where we are going. I want to start with the past. The main feature released in EPrints 3.3.14 was a change to the EPrints Bazaar. It had been a bucket of packages, with little by way of tagging and properties, so we have added an accolades section to tag packages as having particular properties, and you can filter based on these accolades… These are listed alphabetically, with "EPrints Services Recommended" appearing at the top to indicate the packages that we have tested and recommend. Anyone can contribute to the Bazaar.

EPrints 4 / 3.4.0 has the key philosophy of a "Base" EPrints that stores and handles generic data and objects, plus "Layers" that handle specific metadata schema import/export, rendering, search, etc. for specific domains. So the concept is that the database and everything else are two separate aspects: there would be a layer for publications, but another layer for a data repository. The reason for doing this is to develop more sustainably against the increasingly complex requirements of the sector. So the 3.4 releases will be collections of metadata schemas, renderers, etc. to support this. Here are diagrams illustrating the difference between EPrints for Publications, for Open Data, and for Dataset Showcases… These releases are all broadly the same in the abstract. A Dataset Showcase is about showcasing or visualising a dataset, so you need a bespoke metadata schema for your datasets, but in the abstract the setup is the same: the metadata schema and the tools you need to describe your dataset. Similarly, we have EPrints for Social Media Data, for importing tweets, which has a similar abstract shape but with specific functionality to reflect the large scale of the dataset.
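To make the "Base plus Layers" idea more concrete, here is a minimal Python sketch of the separation being described: a generic store that knows nothing about any schema, and domain layers that each declare which fields they require. This is not EPrints code (EPrints itself is written in Perl), and the `Base`, `Layer` and field names are hypothetical, chosen only to illustrate the concept of gathering just the information relevant to each type of item.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """Hypothetical domain layer: a named metadata schema and its required fields."""
    name: str
    required_fields: tuple[str, ...]

    def validate(self, record: dict) -> list[str]:
        # Return the required fields that are absent or empty in the record.
        return [f for f in self.required_fields if not record.get(f)]


@dataclass
class Base:
    """Hypothetical generic store: keeps records as plain dicts, knows no schema."""
    records: list[dict] = field(default_factory=list)

    def deposit(self, layer: Layer, record: dict) -> int:
        # Schema rules live in the layer; the base only stores what passes them.
        missing = layer.validate(record)
        if missing:
            raise ValueError(f"{layer.name}: missing {', '.join(missing)}")
        self.records.append({"layer": layer.name, **record})
        return len(self.records) - 1


# One installation, two domain layers sharing the same generic base.
base = Base()
publications = Layer("publications", ("title", "creators", "publication_date"))
datasets = Layer("datasets", ("title", "creators", "licence"))

base.deposit(publications, {"title": "A paper", "creators": ["Smith, J."],
                            "publication_date": "2015"})
print(datasets.validate({"title": "Survey data", "creators": ["Smith, J."]}))  # ['licence']
```

The design choice the sketch tries to capture is that adding a new kind of repository means defining a new layer, not changing the store.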

Q1) Is this one instance or multiple instances?

A1 – Les) You can have multiple repositories running on one EPrints installation. So the idea is of having a range of repositories, but they could all be on one installation, and you can of course connect up repositories and mash things up. But the idea is to trim down how many metadata fields and how much data you collect, so that you only gather the relevant information for the relevant type of item.

A1 – Adam) This approach is about having a repository with key features… We categorise particular components, and these are examples of what that might look like. But anyone can customise EPrints for the combination of features and functionality that they need.

Adam: Now I want to talk a bit about my role. I am here to engage with the community and understand what it is you need and want from EPrints. I hang out a lot in the various community spaces, but I have also been working on supported developments, where individual organisations require something specific but need help to build it, for instance a thesis deposit tool. I am creating training videos for those supporting and administering EPrints repositories. We also have community members discussing improvements to the wiki, and I'm expecting lots of progress there from the last 6 to 8 months. I'm also always encouraging everyone to share or write documentation; everyone in the room here will have knowledge and expertise to share with other EPrints community members. Are there other things we can do to help? [No, not at the moment, based on audience response.]

One of the side effects of creating videos is that I get feedback and statistics on who is viewing them. For instance, I put up a video on installing an EPrints repository, and it has been viewed several times a week since it went live. But the intriguing thing is the countries it has been viewed from: the top two have been Indonesia and India. It has been viewed around 40 times in the UK, but 120 times in Indonesia, and there have been 20 views in Iraq and Guatemala. That suggests a truly global community, but also means we need to think about how we could bring this community together.

Finally, a quick plug for the EPrints UK User Group meeting, which is on 11th September in Southampton. If you would like to present please do post to the EPrints UK User Group Google Group, or contact Adam directly (af05v@ecs.soton.ac.uk).

Q2) Is there more information on when these versions will be released?

A2 – Les) Version 3.4 is coming soon and that will take us to the modular stage, but at the moment we are waiting for a developer to get us there. As for getting to EPrints 4.0… much of what we needed will have been delivered in 3.4. The whole point of 3.4 is that it contains the same underlying system but moves us to that modular layer idea.

And on that note we leave Adam to get some much needed rest in Korea, as we turn to our final session of the day… 

Panel Session: Building data networks: exploring trust and interoperability between authors, repositories and journals with:

  • Varsha Khodiyar (VK), Scientific Data (Chair);
  • Neil Chue Hong (NCH), Journal of Open Research Software;
  • Rachael Kotarski (RK), DataCite;
  • Reza Salek (RS), European Bioinformatics Institute;
  • Peter McQuilton (PM), Biosharing.

Varsha is introducing this session for us: I work for Nature Publishing Group, one of the "evil publishers", as a data curator at Scientific Data, part of the NPG group. An example of the sort of repository we work with is PhysioNet, a very specialist space in which data is shared.

We have a number of requirements for data repositories, and our criteria are that they must (1) be recognised within their scientific community, (2) ensure long-term preservation of datasets, (3) implement relevant reporting standards, (4) allow confidential review of submitted datasets, (5) provide stable identifiers for submitted datasets, and (6) allow public access to data without unnecessary restrictions. We have a questionnaire online to help assess data repositories against these criteria.

Neil Chue Hong – I am Director of the Software Sustainability Institute, but for the purpose of this presentation I am also Editor in Chief of the Journal of Open Research Software. So what does a metapaper in this journal look like? Well, it describes the software, the licence, the potential for reuse, etc. A paper as a whole tends to include an introduction, how the software came to be, screenshots, implementation, quality control, metadata, reuse and references. In some ways it is a proxy for

For the panel: in some ways we are trying to do the same things for software as for research data, but we have concerns about preserving code. Google Code is shutting down, so how do we preserve what is there?

Rachael Kotarski: We are looking at assigning DOIs to data, theses and software, among other things, so that DOIs remain stable even as data develops and changes over time. We are working with 52 organisations across the UK. I also have a role at the British Library around providing collections as data, enabling researchers to use large-scale collections of data. And the Alan Turing Institute is to be physically hosted at the British Library; we aren't a partner in that project, but we are hosting it.

Reza Salek: I am at the European Bioinformatics Institute, which holds the largest collection of freely available life sciences data, available for reuse on completely open access terms. The repository at EMBL houses a couple of experiments, and we were the first to provide a repository for sharing data in this way. Historically this community was not as happy to share their data. We have learned quite a lot and hope to learn a bit more.

Peter McQuilton: I work at Biosharing.org, a web-based, curated and searchable portal where biological standards and databases are registered, linked and discoverable. We have a database registry, a standards registry and a policies registry. You can also make a collection of your own from a subset of these materials.

Our mission is to help people make the right choice – for researchers, developers and curators who lack support and guidance on which format or checklist standards to use. We are a small team with collaborators that include NPG, EMBOPress, BioMedCentral, Jisc and others.

Varsha: I have some questions for our panel, but do just jump in…

Q1 – Varsha) How did your community embed your repository?

A1 – Rachael) For us, persistent identifiers are key to enabling reuse over time. Specifically for DataCite we have very few requirements: there are five fields that enable you to cite the DOI. There are more fields one would want in order to actually use the data, but because DataCite is cross-subject and cross-format we can't specify exactly what those should be. The other thing we require is a landing page, a target for the DOI, so that the object can be found and used. You could make a DOI link to, say, your Excel spreadsheet, but it is preferable to have a landing page with more information on how that object can be used. We also expect longevity, but we leave it to the community to decide what longevity means for them.
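For readers less familiar with DataCite, the five mandatory properties in the version 3 DataCite Metadata Schema (current at the time of this event) are Identifier, Creator, Title, Publisher and PublicationYear; later schema versions also make ResourceType mandatory. The Python sketch below shows how those five fields are enough to build a citation in DataCite's recommended "Creator (PublicationYear): Title. Publisher. Identifier" pattern. The `DataCiteRecord` class and `cite` function are hypothetical helpers, not part of any DataCite API, and the DOI uses the 10.5072 test prefix, so it does not resolve to a real landing page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCiteRecord:
    """Hypothetical container for the five mandatory DataCite (schema 3) properties."""
    identifier: str               # the DOI, without the resolver prefix
    creators: tuple[str, ...]     # "Family, Given" names
    title: str
    publisher: str
    publication_year: int


def cite(record: DataCiteRecord) -> str:
    # Pattern: Creator (PublicationYear): Title. Publisher. Identifier
    creators = "; ".join(record.creators)
    # Resolving the DOI via doi.org leads to the landing page, not the raw file.
    return (f"{creators} ({record.publication_year}): {record.title}. "
            f"{record.publisher}. https://doi.org/{record.identifier}")


example = DataCiteRecord(
    identifier="10.5072/example-dataset-1",
    creators=("Smith, Jane", "Jones, Ali"),
    title="Hypothetical survey data",
    publisher="Example University",
    publication_year=2015,
)
print(cite(example))
# Smith, Jane; Jones, Ali (2015): Hypothetical survey data. Example University. https://doi.org/10.5072/example-dataset-1
```

Anything beyond these five fields (formats, methods, access conditions) would live in the richer metadata or on the landing page, which is the split Rachael describes.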

Comment – Paul Walk) I absolutely agree about the necessity for that, but for machine-to-machine processes that isn't as much of a priority…

A1 – Rachael) We recognise the importance of M2M interfaces, but we argue that shouldn’t be the default. So from that page you might then have the information there on how to access in an M2M way.

A1 – Varsha) Actually for privacy and sensitive data

Q2) For some of our researchers longevity might mean 50 or 100 or 200 years and longevity can really be about preservation in the long term…

A2 – Peter M) And that is about format of course, having the technology to read that data is its own challenge.

A2 – Paul Walk) I was at something at the British Library talking about longevity in terms of generations, and that seemed like a useful approach.

Comment – Rachel Bruce) That came from the National Science Foundation work.

Q2) It is also about funding around preservation.

A2 – Reza S) The scale of data and datasets is also changing really quickly. But even recent datasets are effectively archived. It is so hard to know what the technology will be, or what science will evolve into… Is there a solution or approach that works here?

Comment) An astronomical image from 10 years back versus now allows you to see what has changed, whereas an archaeological site you probably only dig up once… You can't recreate that data… But then we can't keep everything!

A2 – Neil CH) I think sometimes the data can be recaptured, and sometimes we only have one shot. But in many cases it is interesting to ask why we preserve the data. Is it for reuse and sharing? Or is it for checking and comparison? Those two purposes have very different timelines and requirements associated with them. It is not always the data that needs preserving.

Comment) Surely the whole point is that we cannot predict what others might want to do with our data…?

A2 – Varsha) Sure, the historical ships' logs being used in climate change research are a great example.

A2 – Neil CH) Interestingly, those ships' logs can still be used because our means of expression haven't changed that much. But in software we are used to moving on… and that is much harder to go back to. If we forgot how to read a PDF file, that would be a disaster… But we have a lot of examples. We have to be careful not to support niche standards if we are talking about long-term preservation.

Comment) Do we know of data that has been well preserved but the means to read them has been lost?

Comment) I have WordPerfect files on my computer!

Varsha) A fellow researcher had a similar issue with floppy discs: nowhere in his university was there any way to read them…

Comment – Kevin Ashley) The issue is also about what is worth doing… You can read a floppy disc but you have to want it enough to be worth a high level of expense.

Varsha) Do you have researchers depositing data? What are the issues there about deposit and reuse?

Les) A lot of my research is about social media and existing data, and there the existence and readability of the data isn't the issue. But when we make some sort of collection of it, away from the wild west of the web, and curate it as a selection within the University, we create all sorts of ethical and legal problems. That is the issue when we gather data from lots of people interacting. The deposit mechanism isn't the issue; it's convincing people that it is the right thing to do, and the processes around thinking that through: the data for access, for anonymity…

Neil CH) I am a part-time researcher and we have been creating datasets. Because I give talks on licensing and data policy for RCUK, I feel I should be able to do all the right things with my own data… So, asking this room, why is it so difficult to do the right thing here? I put my data in a data repository, share my colleagues' names, and also record it in my asset register… But I can't just give the register my DOI so that all of those details get imported. This is where trust breaks down. If I can't be bothered to add all the authors, and it's just me, then I've broken compliance. I use PURE, and if I have a copy in RoMEO that can be one click, and I love that. But everything else should be that easy too.

Comment) Rather than have a go at publishers, let's have a go at museums! I am a palaeontologist. I have a great 3D scan of bones I am researching… I'd love to share that with the world, but if I did I would be in trouble, as the museum believes all images created there are its property. It is a political issue, though. If the collections management and commercial arms of the museum can be talked to, you are fine… unless there is deemed to be a potential commercial application or use of that scan.

Rachael K) There is one library allowing photographs of its material, where the copyright status is appropriate, and those are shared on Flickr. But the people taking pictures have to understand what they can take pictures of… Digitisation is expensive. Phone cameras in a reading room aren't great, but

Comment) My feeling is that the copyright on a 110 million year old bone should have expired!

Neil CH) We are looking to work on a project at the Natural History Museum where some of the same issues arise about who owns the copyright of derivative products like that, for educational use. Educational use may be a way to do that in future, but it is still early days.

Comment) In Germany they have the view that if they make the scan of their own materials, they hold data, but others scanning it can do as they wish.

Neil CH) I think in Australia they have also had some quite forward thinking examples there.

Varsha) We have drifted from repositories a little… But in our last few minutes what are the best ways to support our communities around repositories? How can we say that a repository is trustworthy?

Comment) For me, the way trust is tied to how easy it is to deposit is important. The issue I find is that it is also hard to find and discover data…

Peter M) That is changing, though… In biology it is improving, as it is recognised that it is important for data to be discoverable.

Comment) Perhaps rather than prescribed repositories or journals, there is a peer review process. When you say that it is peer reviewed, does that include the data?

Varsha) Yes, that includes the data and that it is shared in the right repository. We make sure that we can access the data, download files, etc. before we will publish. We only publish if that is appropriate.

Neil CH) We do something similar. We have a list of repositories and documentation that helps ensure that data is accessible, and we check for identifiers and some sort of plan for managing the software. I actually kicked off a debate, inadvertently, by suggesting that this is an expensive checking process and that it happens at the wrong end of the cycle… So there is an argument that you should pre-register before you generate data, and have that signed off at the end. It is an interesting idea to have peer review at the outset, not after the data has been generated.

Reza S) These are good questions. It can take a long time to go through that process. Repositories usually come at the end of the process, and there are issues there… it takes time… but that is culture change. In my experience, for every year working on data you should expect maybe 3 days of curation work before depositing.

Varsha) And on that note, thank you to all of our panel and for all your excellent questions.

Dominic Tate is announcing our drinks reception; remember, whilst you are out there, to vote for your favourite poster!

And, with that, the blog is done for the day. Remember to pass on comments, corrections, etc. and we will be back tomorrow for Day Two of Repository Fringe! 

Posted in LiveBlog

Preview: Repositories for OA, RDM and beyond

With just a few days to go until we see you all in Edinburgh, we are delighted to bring you this guest post and podcast from Frank Steiner and Rory McNicholl, both from ULCC, one of our lovely sponsors this year.

In the run up to this year’s Repository Fringe event I sat down with Rory McNicholl, Lead Developer at ULCC to find out more about the event and his talk “Repositories for OA, RDM and beyond”.

Hopefully this quick whistle-stop tour of Rory's repository experience, some of the projects he and the team have worked on, and a preview of things to come makes for a nice little introduction to his talk on Monday morning.

We look forward to seeing you in Edinburgh next week.

Frank Steiner, Marketing Manager, ULCC

Posted in guest posts, programme

Still time to register for Repository Fringe 2015

We’ve already had a fantastic response for Repository Fringe 2015 but there is still time to register free!

Head on over to our EventBrite page to secure your place at this year’s edition of the original repository unconference.

Whether or not you have registered for this year's event, we'd encourage you to share your repository wisdom on our hashtag #rfringe15, which is also a great way to meet your fellow participants ahead of August. If you haven't used Twitter before (or for a while), or are keen to get involved in Repository Fringe through blogging, sharing images, etc., then you might find our beginners' guide to social media useful too. We wrote it for our 2013 event, but we welcome discussion and participation in any space, including those that have emerged since that post; just use the #rfringe15 tag on your images and updates when possible!

We will be sharing our updated programme soon, so do keep an eye out for that. In the meantime, we look forward to seeing you all in Edinburgh in a few weeks' time!

– Nicola on behalf of the Repository Fringe organising Team

Posted in Uncategorized
