ChemMine Database: A Web-Based Infrastructure that Enhances Chemical Genomics Studies in Plants and Other Organisms


Cheminformatic infrastructure development. A fundamental challenge in high-throughput research is disseminating information effectively to the community. To facilitate the discovery of bioactive chemicals and disseminate the resulting information, researchers in the Center for Plant Cell Biology at the University of California, Riverside (UC Riverside) have developed ChemMine, a free-access database for mining chemical compound structures and data from compound screens (Girke et al., 2005).

This chemical genomics Web portal allows users to query compound collections by chemical property, structure, substructure, superstructure, and biological activity. An integrated drug-informatics workbench provides access to a variety of online analysis tools that facilitate study of chemical compounds identified by screening biological organisms for phenotypes or proteins for changes in structure or activity. The analysis of “hits” uses computational approaches such as structure-based clustering (hierarchical, binning) and predictions of chemical properties and drug-likeness.
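To make the structure-based binning mentioned above concrete, the following is a minimal sketch, not ChemMine's own implementation. It assumes each compound is represented by a set of structural features (the feature names, compound names, and the 0.6 cutoff are all hypothetical), scores pairs with the Tanimoto coefficient, and uses a simple greedy scheme in which a compound joins the first bin whose seed compound is similar enough.

```python
from typing import Dict, FrozenSet, List

Fingerprint = FrozenSet[str]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared features divided by total distinct features."""
    if not a and not b:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / len(a | b)


def bin_cluster(fps: Dict[str, Fingerprint], cutoff: float) -> List[List[str]]:
    """Greedy binning (an assumed, simplified scheme).

    Each compound joins the first bin whose seed (its first member) has a
    Tanimoto similarity of at least `cutoff`; otherwise it starts a new bin.
    """
    bins: List[List[str]] = []
    for name, fp in fps.items():
        for members in bins:
            if tanimoto(fp, fps[members[0]]) >= cutoff:
                members.append(name)
                break
        else:
            bins.append([name])
    return bins


# Hypothetical toy fingerprints; real fingerprints would be derived from structures.
compounds: Dict[str, Fingerprint] = {
    "cmpd_A": frozenset({"benzene", "hydroxyl", "carboxyl"}),
    "cmpd_B": frozenset({"benzene", "hydroxyl", "carboxyl", "methyl"}),
    "cmpd_C": frozenset({"indole", "amine"}),
    "cmpd_D": frozenset({"indole", "amine", "carboxyl"}),
}

clusters = bin_cluster(compounds, cutoff=0.6)
print(clusters)
```

With these toy inputs, the two benzene-based compounds fall into one bin and the two indole-based compounds into another. The result of a greedy scheme like this depends on input order; hierarchical clustering avoids that at the cost of computing all pairwise similarities.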

Improvement of ChemMine is an ongoing process. Additions include:

  1. A public screening and phenotype database that efficiently manages the complex screening data of the ChemGen Integrative Graduate Education and Research Traineeship (IGERT) program at UC Riverside.
  2. The Plant Gene Expression Database (Horan et al., 2008).
  3. ChemMineR, a framework for mining biologically active compounds for specific protein targets, developed by Eddie Cao, a participant in the UC Riverside ChemGen program sponsored by campus matching funds.

ChemMine is a crucial and versatile tool for participants in UC Riverside’s NSF-ChemGen IGERT program. It allows researchers to evaluate compound hits and identify related molecules. It also allows students to upload compound screening data, including images, movies, textual information, and quantitative data, so that the data can be easily analyzed and shared with the community. Student data are password protected until formal release but are available to UC Riverside chemical genomicists. This cyberinfrastructure facilitates recognition of bioactive compounds found to be active in diverse screens encompassing an array of biological organisms.

Girke, T., Cheng, L.C. and Raikhel, N. (2005) ChemMine. A compound mining database for chemical genomics. Plant Physiology, 138: 573-577.

Horan, K., Jang, C., Bailey-Serres, J., Mittler, R., Shelton, C., Harper, J.F., Zhu, J.-K., Cushman, J.C., Gollery, M. and Girke, T. (2008) Annotating genes of known and unknown function by large-scale co-expression analysis. Plant Physiology, 147: 41-57.

Address Goals

The ChemMine database portal provides a critically needed, publicly available database for the analysis of chemical compounds and their biological activity. The graduate students who participate in the ongoing improvement of ChemMine are informatics specialists but receive hands-on training in biology. This is accomplished through a ten-week rotation with a biologist, in which the student performs biological assays on compounds selected from the project's ~50,000-compound library on the basis of specific criteria, such as structural similarity to other bioactive molecules or prediction of target proteins. The purpose of this extradisciplinary experience in biology is to deepen the computational scientists' understanding of the challenges and resources of biologists.