Automatic Extension of Gene Ontology with Flexible Identification of Candidate Terms


Gene Ontology (GO) has been developed manually to provide a controlled vocabulary for gene product attributes. It continues to evolve with new concepts, most of which are composed from existing concepts. Given the relatively slow growth of GO compared with the rapid accumulation of biological data, an automatic means of predicting new concepts from existing ones is highly desirable.

We present a novel method that predicts more detailed concepts by exploiting syntactic relations among existing concepts. We also propose a measure for validating the automatically predicted concepts by matching them against biomedical articles. Finally, we suggest how to find a suitable direction for extending a constantly growing ontology such as GO.
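The abstract does not spell out the prediction rule or the validation measure, so the following is only a minimal Python sketch of the general idea under explicitly hypothetical assumptions. It assumes that a term Y is a more detailed version of a term X when Y is X preceded by a modifier (e.g. "neuron apoptosis" versus "apoptosis"), that compound terms have the form "<prefix> of <X>", and that validation is the fraction of article texts containing a candidate verbatim. The function names `specializations`, `predict_candidates`, and `corpus_support`, as well as the toy terms and sentences, are illustrative inventions, not taken from the paper or from GO.

```python
# Minimal sketch under hypothetical assumptions; NOT the paper's actual method.
from collections import defaultdict


def specializations(terms):
    """Map each term X to simple compounds 'modifier X' (assumed more detailed).

    Terms containing ' of ' are excluded as specializations so that, e.g.,
    'positive regulation of apoptosis' is not treated as a kind of 'apoptosis'.
    """
    spec = defaultdict(set)
    for x in terms:
        for y in terms:
            if y != x and y.endswith(" " + x) and " of " not in y:
                spec[x].add(y)
    return spec


def predict_candidates(terms):
    """Propose '<prefix> of Y' when '<prefix> of X' exists, Y specializes X,
    and '<prefix> of Y' is not already a term."""
    terms = set(terms)
    spec = specializations(terms)
    candidates = set()
    for t in terms:
        prefix, sep, x = t.partition(" of ")  # split at the first ' of '
        if not sep:
            continue  # not a compound '<prefix> of <X>' term
        for y in spec.get(x, ()):
            new_term = f"{prefix} of {y}"
            if new_term not in terms:
                candidates.add(new_term)
    return candidates


def corpus_support(candidate, documents):
    """Fraction of documents whose text contains the candidate (case-insensitive)."""
    if not documents:
        return 0.0
    needle = candidate.lower()
    hits = sum(1 for doc in documents if needle in doc.lower())
    return hits / len(documents)


if __name__ == "__main__":
    # Toy terms for illustration only; real GO term names may differ.
    toy_terms = [
        "apoptosis",
        "neuron apoptosis",
        "regulation of apoptosis",
        "positive regulation of apoptosis",
    ]
    toy_docs = [
        "We studied the regulation of neuron apoptosis in mice.",
        "Caspases drive apoptosis.",
    ]
    for cand in sorted(predict_candidates(toy_terms)):
        print(cand, corpus_support(cand, toy_docs))
    # prints:
    # positive regulation of neuron apoptosis 0.0
    # regulation of neuron apoptosis 0.5
```

The quadratic scan in `specializations` is adequate for a toy list; a full ontology would call for indexing terms by their final words, and a realistic validation measure would likely need tokenization and handling of term variants rather than raw substring matching.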

    Extension of GO
         Extended GO from the version of June 2004
    Validation of the extended GO
         Test 1 (evaluation for 55 concepts)
         Baseline test 1 (evaluation for 55 concepts)
         Test 2 (evaluation for 50 concepts)
         Baseline test 2 (evaluation for 50 concepts)


