A Tool for Reading Scientific Text and Interpreting Metadata from the PDF Documents

This article introduces PDFDataExtractor, an open-source, template-based plug-in for ChemDataExtractor that uses spatial layout and rule-based methods to extract logical text blocks and metadata from scientific PDFs. The tool extracts key elements such as titles, authors, abstracts, keywords, DOIs, and references, which supports downstream text mining and data-driven materials discovery. Evaluated on chemistry journal articles from multiple sources, it demonstrates high precision and recall, making it a useful resource for researchers and organizations that need to extract, organize, and analyze large volumes of scientific literature efficiently.
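To make the idea of rule-based metadata extraction concrete, here is a minimal Python sketch. It is not PDFDataExtractor's API or its actual templates. It assumes the PDF has already been split into plain-text blocks in reading order and applies two hypothetical rules to them, one for a DOI and one for a keywords line. The names `extract_metadata`, `DOI_PATTERN`, and `KEYWORDS_PATTERN` are invented for illustration, and the spatial-layout step is not covered.

```python
import re

# Hypothetical rules for illustration only; not PDFDataExtractor's templates.
# A DOI starts with "10.", a 4-9 digit registrant code, a slash, then a suffix.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")
# A keywords block is assumed to start with "Keywords" or "Key words".
KEYWORDS_PATTERN = re.compile(r"^\s*key\s*words?\s*[:.]?\s*(.+)$", re.IGNORECASE)


def extract_metadata(blocks):
    """Apply simple rules to a list of text blocks (strings) in reading order."""
    metadata = {"doi": None, "keywords": []}
    for block in blocks:
        if metadata["doi"] is None:
            match = DOI_PATTERN.search(block)
            if match:
                # Drop trailing punctuation that often follows a DOI in text.
                metadata["doi"] = match.group(0).rstrip(".,;)")
        if not metadata["keywords"]:
            match = KEYWORDS_PATTERN.match(block)
            if match:
                # Keywords are assumed to be separated by commas or semicolons.
                parts = re.split(r"[;,]", match.group(1))
                metadata["keywords"] = [
                    p.strip().rstrip(".") for p in parts if p.strip()
                ]
    return metadata


# Made-up input blocks; the DOI below is a placeholder, not a real identifier.
example_blocks = [
    "An Example Title",
    "Keywords: text mining; PDF parsing; metadata.",
    "DOI: 10.1234/example.5678.",
]
result = extract_metadata(example_blocks)
```

A real template-based system would also use each block's position on the page (for example, treating the largest text near the top of the first page as the title), which this sketch leaves out.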

Author(s): Miao Zhu, Jacqueline M. Cole

Key topics

Data Science for social impact