A text mining framework for accelerating the semantic curation of literature

Batista-Navarro, R.; Hammock, J.; Ulate, W.; Ananiadou, S.

doi:10.1007/978-3-319-43997-6_44

Batista-Navarro, R.; Hammock, J.; Ulate, W.; Ananiadou, S. (2016). A text mining framework for accelerating the semantic curation of literature, in: Fuhr, N. et al. Research and advanced technology for digital libraries. Lecture Notes in Computer Science, 9819: pp. 459-462. https://dx.doi.org/10.1007/978-3-319-43997-6_44

In: Fuhr, N. et al. (Ed.) (2016). Research and advanced technology for digital libraries. Lecture Notes in Computer Science, 9819. Springer International Publishing: Switzerland. ISBN 978-3-319-43996-9. XXV, 476 pp. https://dx.doi.org/10.1007/978-3-319-43997-6

In: Lecture Notes in Computer Science. Springer-Verlag: Heidelberg; Berlin. ISSN 0302-9743; e-ISSN 1611-3349

Available in VLIZ: Non-open access 310525
Document type: Conference paper

Abstract

The Biodiversity Heritage Library is the world’s largest digital library of biodiversity literature. Currently containing almost 40 million pages, the library can be explored with a search interface employing keyword-matching, which unfortunately fails to address issues brought about by ambiguity. Helping alleviate these issues are tools that automatically attach semantic metadata to documents, e.g., biodiversity concept recognisers. However, gold standard, semantically annotated textual corpora are critical for the development of these advanced tools. In the biodiversity domain, such corpora are almost non-existent especially since the construction of semantically annotated resources is typically a time-consuming and laborious process. Aiming to accelerate the development of a corpus of biodiversity documents, we propose a text mining framework that hastens curation through an iterative feedback-loop process of (1) manual annotation, and (2) training and application of statistical concept recognition models. Even after only a few iterations, our curators were observed to have spent less time and effort on annotation.