Here’s the first focused book that puts the full range of cutting-edge biological text
mining techniques and tools at your command. This comprehensive volume
describes the methods of natural language processing (NLP) and their
applications in the biological domain, and spells out in detail the various
lexical, terminological, and ontological resources now at your disposal — and
how best to utilize them.
You see how terminology management tools like term extraction and term structuring
facilitate effective mining, and learn ways to readily identify biomedical
named entities and abbreviations. The book offers step-by-step guidance to
implement various information extraction methods for biological applications,
from pattern matching and full parsing approaches to sublanguage- and
ontology-driven extraction techniques. It discusses strategies to make the most
of text collections and to use corpora and corpus annotation efficiently in
text mining applications, and also gives you tested guidelines for evaluating
and optimizing text mining systems. Rounding out the volume are techniques for
integrating text mining and data mining efforts to further facilitate
biological analyses.
Both a critical review of the state of the art and a solution-focused guide packed
with field-tested expertise and advice, this first-of-its-kind work will prove
indispensable whether you’re long experienced with text mining from biomedical
literature or entirely new to the field.
Supplementary Material
Sophia Ananiadou and John McNaught discuss the motivations that led to the creation of this cutting-edge resource.
Introduction to Text Mining for Biology and Biomedicine ‑ Text Mining: Aims, Challenges
and Solutions. Outline of the Book. References.
Levels of Natural Language Processing for Text Mining ‑ Introduction. The
Lexical Level of Natural Language Processing. The Syntactic Level of Natural
Language Processing. The Semantic Level of Natural Language Processing. Natural
Language System Architecture for Text Mining. Conclusions and Outlook.
References.
Lexical, Terminological and Ontological Resources for Biological Text Mining ‑
Introduction. Extended Example. Lexical Resources. Terminological Resources.
Ontological Resources. Issues Related to Entity Recognition. Issues Related to
Relation Extraction. Conclusion. References.
Automatic Terminology Management in Biomedicine ‑ Introduction. Terminological
Resources in Biomedicine. Automatic Terminology Management. Automatic Term
Recognition. Dealing with Term Variation and Ambiguity. Automatic Term
Structuring. Examples of Automatic Term Management Systems. Conclusion.
References.
Abbreviations in Biomedical Text ‑ Introduction. Identifying Abbreviations. Normalizing
Abbreviations. Defining Abbreviations in Text. Abbreviation Databases.
Conclusions. References.
Named Entity Recognition ‑ Introduction. Biomedical Named Entities. Issues in
Gene/Protein Name Recognition. Approaches to Gene and Protein Name Recognition.
Discussion. Conclusion. References.
Information Extraction ‑ Information Extraction: The Task. The Message Understanding
Conferences. Approaches to Information Extraction in Biology. Conclusion.
References.
Corpora and their Annotation ‑ Introduction. Literature Databases in Biology.
Corpora. Corpus Annotation in Biology. Issues on Manual Annotation. Annotation
Tools. Conclusion.
Evaluation of Text Mining in Biology ‑ Introduction. Why Evaluate? What to Evaluate?
Current Assessments for Text Mining in Biology. What Next? References.
Integrating Text Mining with Data Mining ‑ Introduction: Biological Sequence Analysis
and Text Mining. Gene Expression Analysis and Text Mining. Conclusion.
References.
Sophia Ananiadou is deputy director of the National Centre for Text Mining and a reader in text mining at the School of Informatics at the University of Manchester.
John McNaught is associate director of the National Centre for Text Mining and a lecturer in
informatics at the University of Manchester.