CoTri

CoTri: extracting chemical-disease relations with co-reference resolution and common trigger words

Introduction:

The main purpose of this program is to extract chemical-disease relations (CDR) from MEDLINE (abstracts of the biomedical literature).
This program was developed for the BioCreative V track 3 CDR task (see the task page for the data, background, and task goals).

Source code, documentation, related tools, and related data download:

Download link (Google Drive)

Usage:

There are two main ways to use our program; a brief description of each follows.

Details of our methods are given in the documentation (in the “doc” directory) included in the download.

Case 1: If you have the entire dataset (training and test sets) with named entity recognition (NER) results and you want to evaluate this system’s results (precision/recall/F-score), use the first method.

Case 2: If you have a single abstract or an entire dataset without NER results and you want to extract potential chemical-disease relations, use the second method.

First method (for case 1; a Java sketch of steps a and b follows this list):
a. Convert the data (training set and test set) from abstract level to sentence level
You can use the program (main function) in the “AbstractToSentences” class (in the “preprocess” package)
b. Evaluate the results at sentence level
You can use the program (main function) in the “Demo” class (in the “extractCDRfeatures.relation” package)
c. Evaluate the results at abstract level
You can use the “BC5CDR_Evaluation” tool (in the “BC5CDR_Evaluation” directory)
(More details in “BC5CDR_Evaluation/readme.txt”)
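
The following is a minimal Java sketch of steps a and b, assuming the downloaded source is on the classpath. The class and package names come from this README; the file names passed as arguments are hypothetical, so check the documentation in the “doc” directory for the exact argument order.

    // Minimal sketch of steps a and b (the argument lists are hypothetical;
    // see the documentation in the "doc" directory for the exact usage).
    public class RunFirstMethod {
        public static void main(String[] args) throws Exception {
            // a. Convert the training and test sets from abstract level to sentence level
            preprocess.AbstractToSentences.main(new String[] {
                    "CDR_TrainingSet.txt", "CDR_TrainingSet.sentence.txt" });
            preprocess.AbstractToSentences.main(new String[] {
                    "CDR_TestSet.txt", "CDR_TestSet.sentence.txt" });

            // b. Evaluate the extracted relations at sentence level (precision/recall/F-score)
            extractCDRfeatures.relation.Demo.main(new String[] {
                    "CDR_TestSet.sentence.txt" });
        }
    }

Each class can also be run directly with java and the appropriate classpath; step c uses the separate “BC5CDR_Evaluation” tool as described in its readme.txt.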

Second method (for case 2; a sketch of annotating an abstract through the servlet follows this list):
a. Turn on the poll versions of the named entity recognition (NER) tools, “DNorm” and “tmChem”

i. Use the command line to turn on “DNorm”
ii. Use the command line to turn on “tmChem”

Note that these two poll versions require a Linux OS and about 20 GB of memory in total.
b. Annotate a single abstract or an entire dataset
For this last step, use the program (main function) in the “CDRServlet” class (in the “org.biocreative.cdr.web” package)
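
Once “DNorm”, “tmChem”, and the servlet are running, an abstract can be submitted over HTTP. The sketch below is a minimal, hypothetical client: the host, port, servlet path (“/cdr”), and form parameter name (“abstract”) are assumptions and should be checked against the servlet mapping (web.xml) in the downloaded package.

    // Minimal, hypothetical client for the deployed CDRServlet.
    // The URL and the "abstract" parameter name are assumptions; check the
    // servlet mapping in the downloaded web application for the real values.
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.net.URLEncoder;
    import java.nio.charset.StandardCharsets;
    import java.util.Scanner;

    public class AnnotateAbstract {
        public static void main(String[] args) throws Exception {
            String abstractText = "Text of the abstract to annotate ...";
            String body = "abstract=" + URLEncoder.encode(abstractText, "UTF-8");

            HttpURLConnection conn = (HttpURLConnection)
                    new URL("http://localhost:8080/cdr").openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body.getBytes(StandardCharsets.UTF_8));
            }

            // Print the chemical-disease relations returned by the servlet
            try (Scanner in = new Scanner(conn.getInputStream(), "UTF-8")) {
                while (in.hasNextLine()) {
                    System.out.println(in.nextLine());
                }
            }
        }
    }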

Results:

Test results for the CDR task of BioCreative V track 3

Method                                              Precision   Recall   F-score
Co-occurrence                                         0.164       0.765    0.271
Our method without co-reference resolution            0.417       0.412    0.414
Our method with partial co-reference resolution*      0.418       0.414    0.416
Our method with co-reference resolution**             0.414       0.401    0.407

*We performed co-reference resolution only when the abstract contained fewer than 15 sentences and every sentence contained fewer than 30 words (see the sketch below).

**This configuration took too much time and failed to complete the submission for several test cases.
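
For reference, the gating condition in footnote * corresponds to a check like the following sketch; the list-of-strings representation of an abstract is a hypothetical stand-in for the classes in the source code.

    // Sketch of the partial co-reference resolution gate from footnote *:
    // resolve co-references only for abstracts with fewer than 15 sentences
    // in which every sentence has fewer than 30 words.
    import java.util.List;

    public class CoreferenceGate {
        static boolean shouldResolveCoreference(List<String> sentences) {
            if (sentences.size() >= 15) {
                return false;
            }
            for (String sentence : sentences) {
                if (sentence.split("\\s+").length >= 30) {
                    return false;
                }
            }
            return true;
        }
    }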

Contact:

Main programmer: Ming-Yu Chien
E-mail: imwilly37@iir.csie.ncku.edu.tw

References:

  • BioCreative V track 3: a challenge task of automatic extraction of mechanistic and biomarker chemical-disease relations from the biomedical literature
  • DNorm: Disease named entity recognition
  • tmChem: Chemical named entity recognition