{"refrec":{"BRefID":354115,"RR":"<b>Paragkamian, S.; Sarafidou, G.; Mavraki, D.; Pavloudi, C.; Beja, J.; Eliezer, M.; Lipizer, M.; Boicenco, L.; Vandepitte, L.; Perez Perez, R.; Zafeiropoulos, H.; Arvanitidis, C.; Pafilis, E.; Gerovasileiou, V.</b> (2022). Automating the curation process of historical literature on marine biodiversity using text mining: The DECO Workflow. <i>Front. Mar. Sci. 9</i>: 940844. <a href=\"https://dx.doi.org/10.3389/fmars.2022.940844\" target=\"_blank\">https://dx.doi.org/10.3389/fmars.2022.940844</a>","BEntID":351828,"PublicFlag":1,"CheckedFlag":0,"wosflag":1,"vabbflag":0,"RefStringPartII":". <i>Front. Mar. Sci. 9</i>: 940844. <a href=\"https://dx.doi.org/10.3389/fmars.2022.940844\" target=\"_blank\">https://dx.doi.org/10.3389/fmars.2022.940844</a>","DocTypID":8,"DocType":"Journal article","MarineFlag":1,"FreshFlag":0,"BrackishFlag":0,"TerrestrialFlag":0,"Authorstring":"Paragkamian, S.; Sarafidou, G.; Mavraki, D.; Pavloudi, C.; Beja, J.; Eliezer, M.; Lipizer, M.; Boicenco, L.; Vandepitte, L.; Perez Perez, R.; Zafeiropoulos, H.; Arvanitidis, C.; Pafilis, E.; Gerovasileiou, V.","OrigTitleTranslFlag":0,"Authorstringtrunc":"Paragkamian, S. <i>et al.</i>","Englishabstract":"Historical biodiversity documents comprise an important link to the long-term data life cycle and provide useful insights on several aspects of biodiversity research and management. However, because of their historical context, they present specific challenges, primarily time- and effort-consuming in data curation. The data rescue process requires a multidisciplinary effort involving four tasks: (a) Document digitisation (b) Transcription, which involves text recognition and correction, and (c) Information Extraction, which is performed using text mining tools and involves the entity identification, their normalisation and their co-mentions in text. Finally, the extracted data go through (d) Publication to a data repository in a standardised format. Each of these tasks requires a dedicated multistep methodology with standards and procedures. During the past 8 years, Information Extraction (IE) tools have undergone remarkable advances, which created a landscape of various tools with distinct capabilities specific to biodiversity data. These tools recognise entities in text such as taxon names, localities, phenotypic traits and thus automate, accelerate and facilitate the curation process. Furthermore, they assist the normalisation and mapping of entities to specific identifiers. This work focuses on the IE step (c) from the marine historical biodiversity data perspective. It orchestrates IE tools and provides the curators with a unified view of the methodology; as a result the documentation of the strengths, limitations and dependencies of several tools was drafted. Additionally, the classification of tools into Graphical User Interface (web and standalone) applications and Command Line Interface ones enables the data curators to select the most suitable tool for their needs, according to their specific features. In addition, the high volume of already digitised marine documents that await curation is amassed and a demonstration of the methodology, with a new scalable, extendable and containerised tool, “DECO” (bioDivErsity data Curation programming wOrkflow) is presented. DECO’s usage will provide a solid basis for future curation initiatives and an augmented degree of reliability towards high value data products that allow for the connection between the past and the present, in marine biodiversity research.","AbstractOtherLang":null,"BibLvlCode":"AS","StandardTitle":"Automating the curation process of historical literature on marine biodiversity using text mining: The DECO Workflow","OrigTitleLangCode":"en","OrigTitleLangCodeExtended":"eng","OrigTitleLangID":15,"DateLastModified":{"date":"2026-04-18 01:32:30.100490","timezone_type":1,"timezone":"+02:00"},"UserAccessRight":null,"UserAccID":null,"AuthorKeywords":"marine historical ecology, marine biodiversity data rescue, data archaeology, data curation, text mining, information extraction, scientific workflow, software containers","OtherDescriptors":null,"Notes":null,"AnaPub":2022,"MonPub":null,"DateUpdate":"2022-11-18","DateCreate":"2022-07-25","SecASFANote":null,"ConfID":null,"PeerRev":1,"VlizCoreFlag":1,"WoScode":"WOS:000837216000001","VABBcode":null,"OpenAcc":1,"DOI":"10.3389/fmars.2022.940844"},"refs":null,"anarec":{"AnaID":354115,"PubliDate":2022,"Pagination":"940844","XtraPublOfAnaID":null,"ISBN":null,"Volume":"9","Issue":null,"BRefMon":null,"BRefMonRR":null,"BRefXtra":null,"BRefXtraRR":null,"SerBRefID":233602,"SerRR":"Frontiers in Marine Science. Frontiers Media: Lausanne.  ISSN 2296-7745; e-ISSN 2296-7745","StandardTitleSer":"Frontiers in Marine Science","ISSN":"2296-7745","AbbrevSer":"Front. Mar. Sci.","StandardTitleMon":null,"StartPage":null,"Pages":null,"ToPubliDate":null,"BRefBibLvlCode":"S","SerNotes":null},"monrec":null,"serrec":null,"relations":null,"relationsRev":null,"addrec":null,"othpubs":null,"ownerships":null,"authors":[{"AutName":"Paragkamian","Firstname":"Savvas","Initials":"S.","Affiliation":null,"Discriminator":null,"CorporateFlag":0,"BEntID":351828,"AutID":471461,"OrderNr":1,"DegrID":null,"EditorFlag":0,"CorrespFlag":0,"IllustratorFlag":0,"ReviserFlag":0,"TranslatorFlag":0,"InsAcronym":null,"InsFSN":null,"ORCID":null,"PersID":null,"InsID":null},{"AutName":"Sarafidou","Firstname":"Georgia","Initials":"G.","Affiliation":null,"Discriminator":null,"CorporateFlag":0,"BEntID":351828,"AutID":497757,"OrderNr":2,"DegrID":null,"EditorFlag":0,"CorrespFlag":0,"IllustratorFlag":0,"ReviserFlag":0,"TranslatorFlag":0,"InsAcronym":null,"InsFSN":null,"ORCID":null,"PersID":null,"InsID":null},{"AutName":"Mavraki","Firstname":"Dimitra","Initials":"D.","Affiliation":null,"Discriminator":null,"CorporateFlag":0,"BEntID":351828,"AutID":234159,"OrderNr":3,"DegrID":null,"EditorFlag":0,"CorrespFlag":0,"IllustratorFlag":0,"ReviserFlag":0,"TranslatorFlag":0,"InsAcronym":null,"InsFSN":null,"ORCID":null,"PersID":null,"InsID":null},{"AutName":"Pavloudi","Firstname":"Christina","Initials":"C.","Affiliation":null,"Discriminator":null,"CorporateFlag":0,"BEntID":351828,"AutID":201016,"OrderNr":4,"DegrID":null,"EditorFlag":0,"CorrespFlag":0,"IllustratorFlag":0,"ReviserFlag":0,"TranslatorFlag":0,"InsAcronym":null,"InsFSN":null,"ORCID":"0000-0001-5106-6067","PersID":30471,"InsID":null},{"AutName":"Beja","Firstname":"Joana","Initials":"J.","Affiliation":"Flanders Marine Institute (VLIZ)","Discriminator":null,"CorporateFlag":0,"BEntID":351828,"AutID":405722,"OrderNr":5,"DegrID":null,"EditorFlag":0,"CorrespFlag":0,"IllustratorFlag":0,"ReviserFlag":0,"TranslatorFlag":0,"InsAcronym":"VLIZ","InsFSN":"Vlaams Instituut voor de Zee","ORCID":"0000-0002-5196-8447","PersID":38046,"InsID":36},{"AutName":"Eliezer","Firstname":"Menashè","Initials":"M.","Affiliation":null,"Discriminator":null,"CorporateFlag":0,"BEntID":351828,"AutID":244211,"OrderNr":6,"DegrID":null,"EditorFlag":0,"CorrespFlag":0,"IllustratorFlag":0,"ReviserFlag":0,"TranslatorFlag":0,"InsAcronym":null,"InsFSN":null,"ORCID":null,"PersID":null,"InsID":null},{"AutName":"Lipizer","Firstname":"Marina","Initials":"M.","Affiliation":null,"Discriminator":null,"CorporateFlag":0,"BEntID":351828,"AutID":11259,"OrderNr":7,"DegrID":null,"EditorFlag":0,"CorrespFlag":0,"IllustratorFlag":0,"ReviserFlag":0,"TranslatorFlag":0,"InsAcronym":null,"InsFSN":null,"ORCID":null,"PersID":null,"InsID":null},{"AutName":"Boicenco","Firstname":"Laura","Initials":"L.","Affiliation":null,"Discriminator":null,"CorporateFlag":0,"BEntID":351828,"AutID":234156,"OrderNr":8,"DegrID":null,"EditorFlag":0,"CorrespFlag":0,"IllustratorFlag":0,"ReviserFlag":0,"TranslatorFlag":0,"InsAcronym":null,"InsFSN":null,"ORCID":null,"PersID":null,"InsID":null},{"AutName":"Vandepitte","Firstname":"Leen","Initials":"L.","Affiliation":"Flanders Marine Institute (VLIZ)","Discriminator":null,"CorporateFlag":0,"BEntID":351828,"AutID":145843,"OrderNr":9,"DegrID":null,"EditorFlag":0,"CorrespFlag":0,"IllustratorFlag":0,"ReviserFlag":0,"TranslatorFlag":0,"InsAcronym":"VLIZ","InsFSN":"Vlaams Instituut voor de Zee","ORCID":"0000-0002-8160-7941","PersID":7528,"InsID":36},{"AutName":"Perez Perez","Firstname":"Ruben","Initials":"R.","Affiliation":"Flanders Marine Institute (VLIZ)","Discriminator":null,"CorporateFlag":0,"BEntID":351828,"AutID":405721,"OrderNr":10,"DegrID":null,"EditorFlag":0,"CorrespFlag":0,"IllustratorFlag":0,"ReviserFlag":0,"TranslatorFlag":0,"InsAcronym":"VLIZ","InsFSN":"Vlaams Instituut voor de Zee","ORCID":"0000-0003-0974-3401","PersID":36274,"InsID":36},{"AutName":"Zafeiropoulos","Firstname":"Haris","Initials":"H.","Affiliation":null,"Discriminator":null,"CorporateFlag":0,"BEntID":351828,"AutID":471457,"OrderNr":11,"DegrID":null,"EditorFlag":0,"CorrespFlag":0,"IllustratorFlag":0,"ReviserFlag":0,"TranslatorFlag":0,"InsAcronym":null,"InsFSN":null,"ORCID":null,"PersID":null,"InsID":null},{"AutName":"Arvanitidis","Firstname":"Christos","Initials":"C.","Affiliation":null,"Discriminator":null,"CorporateFlag":0,"BEntID":351828,"AutID":348215,"OrderNr":12,"DegrID":null,"EditorFlag":0,"CorrespFlag":0,"IllustratorFlag":0,"ReviserFlag":0,"TranslatorFlag":0,"InsAcronym":null,"InsFSN":null,"ORCID":null,"PersID":null,"InsID":null},{"AutName":"Pafilis","Firstname":"Evangelos","Initials":"E.","Affiliation":null,"Discriminator":null,"CorporateFlag":0,"BEntID":351828,"AutID":234152,"OrderNr":13,"DegrID":null,"EditorFlag":0,"CorrespFlag":0,"IllustratorFlag":0,"ReviserFlag":0,"TranslatorFlag":0,"InsAcronym":null,"InsFSN":null,"ORCID":null,"PersID":null,"InsID":null},{"AutName":"Gerovasileiou","Firstname":"Vasilis","Initials":"V.","Affiliation":null,"Discriminator":null,"CorporateFlag":0,"BEntID":351828,"AutID":237292,"OrderNr":14,"DegrID":null,"EditorFlag":0,"CorrespFlag":0,"IllustratorFlag":0,"ReviserFlag":0,"TranslatorFlag":0,"InsAcronym":null,"InsFSN":null,"ORCID":"0000-0002-9143-7480","PersID":32909,"InsID":null}],"mapdetails":null,"datasets":null,"monographs":null,"monparts":null,"serparts":null,"BEntOpen":null,"BEntPrivate":null,"availability":[{"BInstID":379916,"LibID":36,"BRefID":354115,"EmbargoDate":null,"FullEmbargoDate":null,"PhysMedID":16,"hasOCRd":1,"ShelfLocCode":"379916","RFID":null,"PaidValue":null,"Medium":"Server","Description":"VLIZ Open Access","Acronym":"VLIZ","Library":"Vlaams Instituut voor de Zee","DutchTerm":"Open access","URL":null,"ClassifID":53,"Classification":"Open access","ReqLink":null,"ClassifTypID":1,"URLLocation":"https://www.vliz.be/imisdocs/publications/","SubDir":null,"InternalReq":0,"LoggedInReq":0,"Disclaimer":null,"DutchDisclaimer":null,"FileFormat":".pdf","FileDescr":"pdf","InsPub":1,"InsID":36,"FileFormID":6,"LendableFlag":1,"PublicFlag":1,"orderLib":"A","Notes":null,"AccConID":null,"AccessConstraint":null,"LicURL":null}],"litstyles":null,"thespers":null,"arch2discl":null,"SERpubls":[{"PublName":"Frontiers Media","City":"Lausanne"}],"MONpubls":null,"pictures":[],"thestermsPath":null,"thestermsASFA":null,"taxtermsASFA":null,"geotermsASFA":null,"collections":[{"Collection":"OMA - Open Marien Archief","ShortName":"OMA"},{"Collection":"VLIZ Acknowledged Publications","ShortName":"VLIZ ackn"}],"conf":null,"proj":null,"Physdatasets":null,"spcols":{"222":{"SpName":"BMB - Belgische Mariene Bibliografie","SpColID":222,"ParSpColID":null,"TopParID":null,"ShortName":"BMB","URLLocation":null,"LibID":36,"OpenRepoFlag":null,"SpTypID":null,"TopParIDNotWebsite":null,"SpColPath":"BMB"},"918":{"SpName":"EMODnet acknowledged","SpColID":918,"ParSpColID":39,"TopParID":39,"ShortName":"EMODnet ack","URLLocation":null,"LibID":36,"OpenRepoFlag":null,"SpTypID":null,"TopParIDNotWebsite":39,"SpColPath":"VLIZ ackn/EMODnet ack"},"980":{"SpName":"EMODnet Biology acknowledged","SpColID":980,"ParSpColID":918,"TopParID":39,"ShortName":"EMODnet Biology ackn","URLLocation":null,"LibID":36,"OpenRepoFlag":null,"SpTypID":null,"TopParIDNotWebsite":39,"SpColPath":"VLIZ ackn/EMODnet ack/EMODnet Biology ackn"},"992":{"SpName":"FRIS (VLIZ) - Flanders Research Information Space","SpColID":992,"ParSpColID":null,"TopParID":null,"ShortName":"FRIS","URLLocation":null,"LibID":36,"OpenRepoFlag":null,"SpTypID":null,"TopParIDNotWebsite":null,"SpColPath":"FRIS"},"923":{"SpName":"Lifewatch acknowledged","SpColID":923,"ParSpColID":null,"TopParID":null,"ShortName":"Lifewatch ackn","URLLocation":null,"LibID":36,"OpenRepoFlag":null,"SpTypID":null,"TopParIDNotWebsite":null,"SpColPath":"Lifewatch ackn"},"941":{"SpName":"LifeWatch Species Information Backbone","SpColID":941,"ParSpColID":39,"TopParID":39,"ShortName":"LifeWatch Species Information Backbone","URLLocation":null,"LibID":36,"OpenRepoFlag":null,"SpTypID":null,"TopParIDNotWebsite":39,"SpColPath":"VLIZ ackn/LifeWatch Species Information Backbone"},"793":{"SpName":"Marine Regions acknowledged","SpColID":793,"ParSpColID":941,"TopParID":39,"ShortName":"Marine Regions ackn","URLLocation":null,"LibID":36,"OpenRepoFlag":null,"SpTypID":null,"TopParIDNotWebsite":39,"SpColPath":"VLIZ ackn/LifeWatch Species Information Backbone/Marine Regions ackn"},"880":{"SpName":"OBIS-cited publications","SpColID":880,"ParSpColID":null,"TopParID":null,"ShortName":"OBIS-cited publications","URLLocation":null,"LibID":36,"OpenRepoFlag":null,"SpTypID":null,"TopParIDNotWebsite":null,"SpColPath":"OBIS-cited publications"},"221":{"SpName":"OMA - Open Marien Archief","SpColID":221,"ParSpColID":null,"TopParID":null,"ShortName":"OMA","URLLocation":null,"LibID":36,"OpenRepoFlag":1,"SpTypID":null,"TopParIDNotWebsite":null,"SpColPath":"OMA"},"1064":{"SpName":"Peer reviewed VLIZ publications","SpColID":1064,"ParSpColID":null,"TopParID":null,"ShortName":"PR VLIZ pubs","URLLocation":null,"LibID":null,"OpenRepoFlag":null,"SpTypID":null,"TopParIDNotWebsite":null,"SpColPath":"PR VLIZ pubs"},"39":{"SpName":"VLIZ Acknowledged Publications","SpColID":39,"ParSpColID":null,"TopParID":null,"ShortName":"VLIZ ackn","URLLocation":null,"LibID":36,"OpenRepoFlag":null,"SpTypID":null,"TopParIDNotWebsite":null,"SpColPath":"VLIZ ackn"},"507":{"SpName":"World Register of Marine Species","SpColID":507,"ParSpColID":null,"TopParID":null,"ShortName":"WoRMS website","URLLocation":null,"LibID":null,"OpenRepoFlag":null,"SpTypID":null,"TopParIDNotWebsite":null,"SpColPath":"WoRMS website"},"915":{"SpName":"World Register of Marine Species (WoRMS) acknowledged","SpColID":915,"ParSpColID":941,"TopParID":39,"ShortName":"WoRMS ackn","URLLocation":null,"LibID":36,"OpenRepoFlag":null,"SpTypID":null,"TopParIDNotWebsite":39,"SpColPath":"VLIZ ackn/LifeWatch Species Information Backbone/WoRMS ackn"},"947":{"SpName":"WoRMS ackn - direct reference","SpColID":947,"ParSpColID":915,"TopParID":39,"ShortName":"WoRMS ackn - direct","URLLocation":null,"LibID":36,"OpenRepoFlag":null,"SpTypID":null,"TopParIDNotWebsite":39,"SpColPath":"VLIZ ackn/LifeWatch Species Information Backbone/WoRMS ackn/WoRMS ackn - direct"}},"doi":null,"publs":null,"serparttypes":null,"monauthors":null,"MParts":null,"SParts":null,"hLibs":null,"langs":[{"BEntID":351828,"AbstractFlag":0,"LangID":15,"LangCode":"en","Lang":"English","DutchTerm":"Engels","LangCodeExtended":"eng"},{"BEntID":351828,"AbstractFlag":1,"LangID":15,"LangCode":"en","Lang":"English","DutchTerm":"Engels","LangCodeExtended":"eng"}],"urls":[{"URL":"https://dx.doi.org/10.3389/fmars.2022.940844","externalID":"10.3389/fmars.2022.940844","URLTypeCode":"DOI","URLID":104689,"URLTypID":13,"URLType":"DOI","URLPrefix":"http://dx.doi.org/"}],"thesterms":null,"taxterms":null,"geoterms":null,"othterms":null,"asfacodes":null,"asfa2codes":null,"thestermsFRIS":null,"taxtermsFRIS":null,"geotermsFRIS":null,"othtermsFRIS":null,"resmessage":"","complete":1,"sessions":{"newSesName":"Chisala, Chilekwa, C.","newSesDate":{"date":"2022-07-25 13:35:40.433000","timezone_type":3,"timezone":"Europe/Brussels"},"updSesName":"Lust, Heike, H.","updSesDate":{"date":"2022-11-18 15:06:31.297000","timezone_type":3,"timezone":"Europe/Brussels"}}}
