Designing Open Science in a Decentralized World

*This is a guest blog post by Judy Ruttenberg, ARL Program Director for Strategic Initiatives*

In my past role as a program officer in a regional library consortium, and in my current role at the Association of Research Libraries, I’ve had the privilege of visiting many libraries. I have observed that librarians at research-intensive universities, both public and private, often characterize their campus as “highly decentralized” when explaining a (usually challenging) aspect of local culture or practice. The new consensus report from the National Academies of Sciences, Engineering, and Medicine, “Open Science by Design: Realizing a Vision for 21st Century Research,” recognizes that because institutions and the entire research enterprise are highly decentralized, the stewardship of research assets must be coordinated across key stakeholders. If we, the stewardship community, get this coordination right, researchers will be able to practice and reap the benefits of open science with the confidence that their scholarly contributions will be supported, rewarded, and discoverable in the future.

“Open Science by Design” provides a high-level roadmap for the stewardship community, in which research libraries embody a unique combination of mission and professional expertise. Research libraries provide enduring and barrier-free access to knowledge for current and future generations. The open science/open scholarship movement has expanded the research community’s definition of knowledge assets worthy of curation for long-term use to include software, data, code, and more. By practicing scholarship openly, researchers not only create knowledge assets across the lifecycle (from hypothesis and study design to data collection and narrative publication) but also generate a digital paper trail that contributes to our collective understanding of research dynamics and workflow. Given the report’s observation that “commercial publishers have undertaken significant horizontal and vertical integration in recent years, … acquiring important pieces of the scholarly communications infrastructure, such as preprint servers, institutional repositories, and expanding data archiving, and analytics services associated with their journals” (p. 118), decentralization is actually a strength against consolidation and enclosure of that workflow.

If research institutions and funders embrace the NAS recommendations to encourage, support, and reward openness across the scholarly workflow, librarians can contribute both information science and archival expertise early and often throughout that workflow, as well as preserve research environments to enable the study of science itself. For example, the preprint communities hosted by the Center for Open Science’s OSF Preprints now have an integrated annotation layer in hypothes.is. Librarians are embedded in the leadership of many of these preprint services—and in many research projects themselves—and can advise on the stewardship aspects of peer review as the hypothes.is service is implemented. Librarians will also continue to work in long-standing coalitions to influence the information policy environment to support openness. This fall, ARL will produce a Code of Best Practices in Fair Use for Software Preservation, funded by the Alfred P. Sloan Foundation, to ensure that the subjects, products, and tools of scholarship will continue to be accessible despite evolving technology.

Decentralization, often presented as a barrier to coordination, is an advantage and a goal in the context of open scholarship, provided that the stakeholder community adheres to the report’s implementation principles of interoperability, including:

  • Researchers choosing open repositories for their preprints, publications, and data
  • Research funders ensuring that research products are available in repositories that allow for bulk transfer of digital objects
  • Requirement of unique, persistent identifiers for digital objects identified for long-term preservation
  • Greater attention to, and investment in, metadata schemas for improved discovery
  • Participation of professional societies and research funders in the networking and federation of existing repositories for improved discovery.

The SHARE project team has worked with many different types of open repositories (data, institutional repositories, preprints, grant databases, etc.) on all of these issues, and the implementation of improved metadata schemas, persistent identifiers, and methods of bulk transfer is complex. What we’ve learned is that more decentralization, not less, is the answer. Rather than continuing the centralized harvest of metadata from many sources, the SHARE technical team is now developing easy-to-use tools that let institutions and repositories write their own harvesters to push their metadata out to the network, and develop local frameworks for hosting the data they exchange with others. The Data Curation Network, also funded by the Sloan Foundation, is leveraging the decentralization of expertise across more than ten institutions to improve the treatment, discoverability, and use of data.
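For readers who want a concrete picture of the push model, here is a minimal Python sketch of a repository-side harvester. It is an illustration only, not SHARE's actual tooling or schema: the `Record` fields, the `harvest` and `push` functions, the `send` callback, and the sample DOI are all hypothetical names chosen for this example. The idea it shows is that the repository itself selects items that carry a persistent identifier, normalizes their metadata, and sends the records outward, instead of waiting for a central service to collect them.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List


@dataclass
class Record:
    """Hypothetical metadata record; field names are illustrative, not a SHARE schema."""
    identifier: str  # unique, persistent identifier, here a DOI expressed as a URL
    title: str
    repository: str
    contributors: List[str] = field(default_factory=list)


def harvest(local_items):
    """Turn a repository's own items into records, skipping any without a DOI."""
    records = []
    for item in local_items:
        if not item.get("doi"):
            continue  # no persistent identifier, so the item is not shared yet
        records.append(Record(
            identifier="https://doi.org/" + item["doi"],
            title=item["title"].strip(),
            repository=item["repository"],
            contributors=list(item.get("authors", [])),
        ))
    return records


def push(records, send):
    """Serialize each record as JSON and hand it to `send`.

    `send` is a stand-in for whatever transport a network would really use.
    """
    for record in records:
        send(json.dumps(asdict(record), sort_keys=True))
    return len(records)


# Made-up sample items; the DOI below is a placeholder, not a real one.
items = [
    {"doi": "10.1234/example.1", "title": " Sample dataset ",
     "repository": "Example IR", "authors": ["A. Researcher"]},
    {"title": "Untitled draft", "repository": "Example IR"},
]
outbox = []  # collects pushed records in place of a real network call
sent = push(harvest(items), outbox.append)
```

In this sketch only the first item is pushed, because the second has no persistent identifier. Keeping the transport behind a `send` callback is a design choice for the example: the local harvesting logic stays the same whether the records go to a file, a queue, or a remote service.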

Research libraries will be critical partners within their institutions and within the research enterprise in the implementation of NAS’s open science principles, standards, and business arrangements. ARL looks forward to continuing existing partnerships and developing new ones to support Open Science by Design.