A Tool for Reading Scientific Text and Interpreting Metadata from PDF Documents

This article introduces PDFDataExtractor, an open-source, template-based plug-in for ChemDataExtractor that combines spatial layout information with rule-based methods to extract logical text blocks and metadata from scientific PDFs. The tool extracts key elements such as the title, authors, abstract, keywords, DOI, and references, which facilitates downstream text mining and data-driven materials discovery. Evaluated on chemistry journals from multiple publishers, the tool achieves high precision and recall, making it a useful resource for researchers and organizations that need to extract, organize, and analyze large volumes of scientific literature efficiently.
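The summary describes the approach only at a high level: layout information plus rules, organized as templates. As a rough illustration of what layout-aware rules can look like, here is a minimal Python sketch. It is not PDFDataExtractor's code or API. The `TextBlock` record and every heuristic in it (the largest font on the first page is the title, the first DOI-shaped string is the DOI, blocks headed "Abstract" or "Keywords" introduce those fields) are hypothetical assumptions, and it assumes the text blocks have already been recovered from the PDF by some separate layout-analysis step.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TextBlock:
    """Hypothetical layout record for one logical text block (an assumption, not a real API)."""
    text: str
    page: int         # 0-based page index
    font_size: float  # dominant font size in points
    top: float        # distance from the top of the page; smaller means higher up


# DOI shape: "10." + 4-9 digit registrant code + "/" + non-space suffix.
DOI_RE = re.compile(r"\b10\.\d{4,9}/\S+")
# A block starting with an "Abstract" heading, optionally followed by the body text.
ABSTRACT_RE = re.compile(r"\s*abstract\b\s*[:.\-]?\s*(.*)", re.IGNORECASE | re.DOTALL)
# A block starting with "Keywords" / "Key words", followed by the keyword list.
KEYWORDS_RE = re.compile(r"\s*key\s*words?\b\s*[:\-]?\s*(.*)", re.IGNORECASE | re.DOTALL)


def reading_order(blocks: List[TextBlock]) -> List[TextBlock]:
    """Approximate reading order: by page, then top to bottom (assumes a single column)."""
    return sorted(blocks, key=lambda blk: (blk.page, blk.top))


def extract_title(blocks: List[TextBlock]) -> Optional[str]:
    """Hypothetical rule: the title is the largest-font block on the first page."""
    first_page = [b for b in blocks if b.page == 0]
    if not first_page:
        return None
    return max(first_page, key=lambda b: b.font_size).text.strip()


def extract_doi(blocks: List[TextBlock]) -> Optional[str]:
    """Hypothetical rule: the first DOI-shaped string in reading order."""
    for b in reading_order(blocks):
        m = DOI_RE.search(b.text)
        if m:
            return m.group(0).rstrip(".,;")  # drop trailing sentence punctuation
    return None


def extract_abstract(blocks: List[TextBlock]) -> Optional[str]:
    """Hypothetical rule: text after an 'Abstract' heading, or the next block if the heading stands alone."""
    ordered = reading_order(blocks)
    for i, b in enumerate(ordered):
        m = ABSTRACT_RE.match(b.text)
        if m:
            body = m.group(1).strip()
            if body:
                return body
            return ordered[i + 1].text.strip() if i + 1 < len(ordered) else None
    return None


def extract_keywords(blocks: List[TextBlock]) -> List[str]:
    """Hypothetical rule: a 'Keywords' block, split on commas or semicolons."""
    for b in reading_order(blocks):
        m = KEYWORDS_RE.match(b.text)
        if m:
            return [k.strip() for k in re.split(r"[;,]", m.group(1)) if k.strip()]
    return []


if __name__ == "__main__":
    # Toy first page; all values are invented for illustration.
    page = [
        TextBlock("Journal of Example Chemistry", 0, 9.0, 20.0),
        TextBlock("A Rule-Based Reader for Example PDFs", 0, 18.0, 60.0),
        TextBlock("Abstract", 0, 11.0, 120.0),
        TextBlock("We describe a toy example.", 0, 10.0, 135.0),
        TextBlock("Keywords: PDF; text mining, metadata", 0, 10.0, 200.0),
        TextBlock("https://doi.org/10.1234/example.5678.", 0, 8.0, 700.0),
    ]
    print(extract_title(page))     # A Rule-Based Reader for Example PDFs
    print(extract_doi(page))       # 10.1234/example.5678
    print(extract_abstract(page))  # We describe a toy example.
    print(extract_keywords(page))  # ['PDF', 'text mining', 'metadata']
```

Generic rules like these break easily on real papers (two-column layouts, running headers set in large type, publisher-specific heading names), which is presumably why a template-based tool tailors its rules to each journal's layout rather than relying on one universal set.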

Author(s): Miao Zhu, Jacqueline M. Cole
