Tag Archives: Cynthia Hudson-Vitale

Opportunities for Libraries in the AI Ecosystem

Guest post by Cynthia Hudson-Vitale, head, Research Informatics and Publishing, Penn State University Libraries

Artificial intelligence (AI) for data discovery and reuse was the topic of a recent conference sponsored by the National Science Foundation (NSF) and hosted by Carnegie Mellon University (CMU), in cooperation with the Association for Computing Machinery (ACM). Beth Plale, Senior Advisor for Public Access for NSF, set the context: Harnessing the data revolution will require research, educational pathways, and advanced cyberinfrastructure.

Librarians, researchers across disciplines, computer scientists, industry representatives, and technologists came together at CMU to share practices and discuss methods for leveraging machine learning and artificial intelligence for metadata generation, data curation, data discovery, and data integration. Prominent themes of data privacy, data security, and mechanisms to limit algorithmic bias ran through many of the papers.

While many institutions and researchers are exploring or developing AI models to solve complex issues, this conference was unique both in the variety of perspectives it provided and in its intentional focus on data discovery and reuse practices. Notable papers and presentations included:

  • Extracting key phrases from texts to aid in discovery
  • Creating descriptive tags for images
  • Recognizing and transcribing handwriting from digitized assets
  • Finding and extracting dataset references from published articles
  • Protecting clinical patient privacy  
  • Developing synthetic control arms for clinical trials

While much research focused on AI, many speakers emphasized human curation and intervention as a required component of workflows for model design and validation.

Keith Webster, Dean of CMU Libraries, summed up the takeaways and themes of the conference as demands for:

  • collaboration across disciplines and domains,
  • improved mechanisms for data discovery,
  • increased incentives for sharing data,
  • improved standards for data interoperability and adoption,
  • a better understanding and application of ethical guidelines,
  • research on the power of data reuse, and
  • enhanced tools for AI.

Huajin Wang, PhD, Research Liaison, Biology & Computer Science at CMU and co-PI for the conference, said, “I am really excited and touched by the enthusiasm participants shared for moving forward as a unique and diverse community. I look forward to growing the community, and encourage everyone to keep the conversation going and join the mailing list aidr-all@lists.andrew.cmu.edu.”

For libraries, this conference surfaced a number of opportunities, including:

  • Delivering training and education around AI and data science topics
  • Providing expertise around metadata and controlled vocabularies
  • Acting as facilitators of local communities of practice for AI
  • Leveraging AI models to supplement human curation of datasets and enhance the discoverability of library digital assets (including digitized images, text, etc.)
  • Supporting and advocating for AI privacy initiatives

Discussions around data privacy and AI reinforced many of the ongoing conversations that libraries are having about protecting student and library patron privacy, including:

Presentation and poster abstracts, some of which are published as an F1000Research collection, may be found on the conference website. Full papers of selected presentations will be peer-reviewed and published shortly in AIDR ’19 – ACM ICPS.

 

Report from AAU-APLU Workshop on Accelerating Access to Research Data

*This is a guest blog post by Mary Lee Kennedy, Executive Director of ARL; Judy Ruttenberg, Program Director for Strategic Initiatives; and Cynthia Hudson-Vitale, Head, Digital Scholarship and Data Services, Penn State University Libraries*

Over the past two days we participated in the AAU-APLU workshop on Accelerating Access to Research Data, sponsored by the National Science Foundation (NSF). Eighteen of the thirty teams were ARL institutions from Canada and the United States.

This workshop followed directly from the November 2017 AAU-APLU Public Access Working Group Report and Recommendations, and was further informed by the National Academies' recommendations in their 2018 consensus report, Open Science by Design: Realizing a Vision for 21st Century Research. Those who attended the Association meeting will remember the update from Alexa McCray, Chair of the National Academies report, and Kacy Redd, Assistant Vice President, Science & Mathematics Education Policy at APLU, who staffed the AAU-APLU working group.

This workshop was a pivotal experience at a time when governmental agencies in the US, Canada, and the EU are focusing on open science, and when many institutions are figuring out how to apply and influence policies, practices, and infrastructure. Thirty institutional teams, some of whose members had never worked together before, grappled with the above-mentioned report recommendations and committed to a set of next steps. Most teams included someone from the research office, IT/high-performance or academic computing, and the library, while some included provosts and faculty.

The NSF, NIH, Department of Energy, National Institute of Standards and Technology, the Department of Defense, OSTP, and National Academies actively participated. Alexa McCray and Sarah Nusser (chair of the AAU-APLU Public Access Working Group) set the context up front: agencies, institutions, and institutional teams including their libraries need to collaboratively design researcher-centered data services and support; research data management (RDM) is an integral part of good study design; and research data is a valuable institutional asset.

With this context in mind, the teams got to work, holding many conversations and committing to work together on specific tasks back at their institutions and to continue working together as a whole. I know we all look forward to the workshop report and decisive next steps. In the meantime, please find below a sample of identified priorities and an initial set of next steps for ARL, as well as steps to consider in your institutions.

Institutional Priorities for Public Access to Research Data

A number of themes emerged when institutions shared their priorities for accelerating public access to research data. A sample of these included:

  • Facilitating low-barrier, seamless support and services for public access to data at the institutional level through:
    • Establishment of local “one-stop” research support services, including data management and sharing stakeholder groups to coordinate faculty-centered research data services;
    • Development of training and workshops for public access to data and open science practices, specifically focused on graduate students.
  • Collecting and then mining data management plans (DMPs) of funded research to:
    • Plan for the deposit and curation of research data;
    • Work with faculty members earlier in the research process to facilitate good data management practices;
    • Identify high-value data.
  • Leveraging existing partnerships, cross-institutional collaborations, resources, and tools to extend capabilities for research data services, such as the:

Initial Considerations on Community Next Steps

With our greatest impact being at the intersection of the institutional, research and learning, and public policy communities, ARL will work with:

  • Our colleagues at AAU and APLU, including
    • Articulating a vision, a strategy, and a direction for accelerating public access to data, and
    • Collaborating to scope, and as appropriate participate in, additional workshops for the university and agency communities.
  • Our Advocacy and Public Policy Committee and Research Communications and Collections Working Group to seek ways to influence federal data management policies by representing the needs, capacity, and role of the research library.
  • National agencies, associations, and our ARL Academy (as appropriate) to support the membership in developing open science and open scholarship fluency—particularly as it relates to methods, tools, and data management practices across the institution, and with other research communities.
  • Scholarly and professional societies as potential partners in articulating disciplinary expectations around research data quality, value, and retention.

What can you do as an ARL member?

Please reach out to Mary Lee or Judy to discuss the workshop and its outcomes. ARL member directors James Hilton, Erik Mitchell, and Steve Mandeville-Gamble were also present, along with 15 additional ARL institutions, many of which included library staff on their teams.

The workshop provided a structured and focused opportunity for institution-based teams to meet and begin to map their assets—technology, policy, people, and other—as well as their challenges. Many institutional teams pledged to continue meeting. If you were not able to attend, you could circulate the agenda to your institutional colleagues (in the research office, in IT, in high performance or academic computing, and other) and encourage discussion along the same lines.

The workshop organizers at AAU and APLU are considering site visits beginning in early 2019 to include institutions that were not able to participate in the workshop. If this advances into a plan, please watch for an announcement of that opportunity.

This was a very engaging workshop, concluding with commitments to concrete deliverables. It sets an optimistic tone for the path ahead.