A Tool for Reading Scientific Text and Interpreting Metadata from PDF Documents

This article introduces PDFDataExtractor, an open-source, template-based plug-in for ChemDataExtractor that uses spatial layout and rule-based methods to extract logical text blocks and metadata from scientific PDFs. The tool extracts key elements such as titles, authors, abstracts, keywords, DOIs, and references, which supports downstream text mining and data-driven materials discovery. Evaluated on chemistry journals from multiple sources, the tool demonstrates high precision and recall, making it a useful resource for researchers and organizations that need to extract, organize, and analyze large volumes of scientific literature efficiently.
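To make the idea of rule-based metadata extraction concrete, here is a minimal sketch in Python. It is not PDFDataExtractor's API or method: the function names, the regular expressions, and the sample text (including the made-up DOI) are all assumptions for illustration, and the sketch assumes the page text has already been pulled out of the PDF as plain strings. It also shows how precision and recall can be computed for a set of extracted items against a hand-labelled reference set.

```python
import re

# Hypothetical rules, not taken from PDFDataExtractor.
# DOI rule: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")
# Keywords rule: a line that starts with "Keywords:" (or "Key words -").
KEYWORDS_PATTERN = re.compile(
    r"^\s*key\s?words\s*[:\-]\s*(.+)$", re.IGNORECASE | re.MULTILINE
)


def extract_doi(text):
    """Return the first DOI-like string in text, or None if there is none."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Drop sentence punctuation that the suffix rule may have swallowed.
    return match.group(0).rstrip(".,;")


def extract_keywords(text):
    """Return the keywords listed on a 'Keywords:' line, split on , or ;."""
    match = KEYWORDS_PATTERN.search(text)
    if match is None:
        return []
    parts = re.split(r"[;,]", match.group(1))
    return [p.strip().rstrip(".") for p in parts if p.strip()]


def precision_recall(predicted, gold):
    """Compute (precision, recall) for two sets of extracted items."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Made-up sample text standing in for one extracted text block.
SAMPLE = "Keywords: text mining; metadata, PDF parsing.\nDOI: 10.1234/example.5678.\n"

if __name__ == "__main__":
    print(extract_doi(SAMPLE))
    print(extract_keywords(SAMPLE))
    print(precision_recall({"a", "b"}, {"b", "c"}))
```

A real template-based extractor would also rely on layout cues such as font size and block position, which this text-only sketch leaves out.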

Authors: Miao Zhu, Jacqueline M. Cole

Key topics

Data Science for social impact
