CAPRI: Content-based Analysis of Protein Structure for Retrieval and Indexing

Type: Article
Conference: The 2nd Workshop on Data Mining in Bioinformatics (DMB 2007), September 23, 2007, Vienna, Austria
Abstract: In molecular biology, current research suggests that the function of a protein may be inferred from its structure. Two proteins with similar local parts (or active sites) and shape are often closely related. This observation is important when determining the adverse effects of new medicines, identifying new protein architectures, predicting protein interactions such as the docking problem (where the so-called receptor connects to the ligand), and explaining unexpected evolutions. Due to the vast number of newly discovered protein structures, there is an urgent need for multimedia data mining systems that can efficiently find similar protein structures, based on both shape and physical properties. In this paper, we describe the Content-based Analysis of Protein Structure for Retrieval and Indexing (CAPRI) data mining system, which is used to explore very large multimedia databases containing numerous protein structure families. CAPRI finds similar proteins based on their structure by utilizing, firstly, the 2D colours, textures and composition and, secondly, the 3D structure of the proteins. Our results against more than 26,000 protein structures contained in the Protein Data Bank show that our system is able to accurately and efficiently locate related protein structures. Through the use of the CAPRI system, domain experts are able to find these similar protein structures using a “query by prototype” approach. In this way, they are aided in the task of labelling new structures effectively, finding the families of existing proteins, identifying mutations and explaining unexpected evolutions.
Language: English
Affiliation: NRC Institute for Information Technology; National Research Council Canada
Peer reviewed: No
NRC number: 49841
NPARC number: 8913303
Record identifier: 0d493998-977e-47bd-a783-3f550ec7252b
Record created: 2009-04-22
Record modified: 2016-05-09