Guest post by Cynthia Hudson-Vitale, Head, Research Informatics and Publishing, Penn State University Libraries
Artificial intelligence (AI) for data discovery and reuse was the topic of a recent conference sponsored by the National Science Foundation (NSF) and hosted by Carnegie Mellon University (CMU), in cooperation with the Association for Computing Machinery (ACM). Beth Plale, Senior Advisor for Public Access for NSF, set the context: Harnessing the data revolution will require research, educational pathways, and advanced cyberinfrastructure.
Librarians, researchers across disciplines, computer scientists, industry representatives, and technologists came together at CMU to share practices and discuss methods for leveraging machine learning and artificial intelligence for metadata generation, data curation, data discovery, and data integration. Data privacy, data security, and mechanisms to limit algorithmic bias were prominent themes running through many of the papers.
While many institutions and researchers are exploring or developing AI models to solve complex problems, this conference was unique, both in the variety of perspectives it brought together and in its intentional focus on data discovery and reuse practices. Notable papers and presentations included:
- Extracting key phrases from texts to aid in discovery
- Creating descriptive tags for images
- Recognizing and transcribing handwriting from digitized assets
- Finding and extracting dataset references from published articles
- Protecting clinical patient privacy
- Developing synthetic control arms for clinical trials
While much of the research focused on AI, many speakers emphasized human curation and intervention as a required component of workflows for model design and validation.
Keith Webster, Dean of CMU Libraries, summed up the takeaways and themes of the conference as demands for:
- collaboration across disciplines and domains,
- improved mechanisms for data discovery,
- increased incentives for sharing data,
- improved standards for data interoperability and adoption,
- a better understanding and application of ethical guidelines,
- research on the power of data reuse, and
- enhanced tools for AI.
Huajin Wang, PhD, Research Liaison, Biology & Computer Science at CMU and Co-PI for the conference, said, “I am really excited and touched by the enthusiasm participants shared for moving forward as a unique and diverse community. I look forward to growing the community, and encourage everyone to keep the conversation going and join the mailing list email@example.com.”
For libraries, this conference surfaced a number of opportunities, including:
- Delivering training and education around AI and data science topics
- Providing expertise around metadata and controlled vocabularies
- Acting as facilitators of local communities of practice for AI
- Leveraging AI models to supplement human curation of datasets and enhance the discoverability of library digital assets (including digitized images, text, etc.)
- Supporting and advocating for AI privacy initiatives
Discussions around data privacy and AI reinforced many of the ongoing conversations that libraries are having about protecting student and library patron privacy, including:
Presentation and poster abstracts may be found on the conference website, and some of them are published as an F1000Research collection. Full papers of selected presentations will be peer-reviewed and published shortly in AIDR ’19 – ACM ICPS.