pdb-l: number of proteins in nature and pdb

Kevin Karplus karplus at soe.ucsc.edu
Fri Oct 20 10:06:06 PDT 2006

Jinfeng Zhang asked:

> I have a couple of questions about some general knowledge on
> proteins. I did not find the answers by google.  Does anyone know
> approximately how many different proteins are there in nature?  PDB
> now have almost 40,000 entries. Does anyone know how many of them
> are natural proteins? Basically, I want to know what is the fraction
> of the natural proteins with known structures among all natural
> proteins. 

The question is not well posed.  What does "different" mean?  If you
are looking for identical strings of amino acids, you get huge counts
for both PDB and natural proteins, including very minor variations,
parts of pro-proteins that are excised, sequencing errors, gene
prediction errors, ...   Once you try to cluster proteins so that
almost identical proteins are only counted once, you get very
different answers depending on how tightly you do the clustering.
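
To make the clustering point concrete, here is a minimal sketch in
Python.  Everything in it is an illustrative assumption: the toy
sequences are made up, difflib's ratio() is only a crude stand-in for
a real alignment-based identity measure, and the greedy
"first representative wins" scheme is one simple way to cluster, not
the method used by any particular database or tool.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Crude stand-in for alignment-based sequence identity (assumption):
    # 2 * (matched characters) / (total length of both strings).
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(seqs, threshold):
    # Keep a sequence as a new representative only if it is not at least
    # `threshold` similar to any representative kept so far.
    representatives = []
    for s in seqs:
        if not any(similarity(s, r) >= threshold for r in representatives):
            representatives.append(s)
    return representatives


# Made-up toy sequences, not real proteins.
toy = [
    "MKTAYIAKQR",
    "MKTAYIAKQR",  # exact duplicate of the first
    "MKTAYIAKQK",  # one substitution
    "MKTAYLAKQK",  # two substitutions
    "GSHMLEDPVA",  # unrelated
]

# The number of "different" proteins depends on the threshold chosen.
for threshold in (1.0, 0.85, 0.75):
    print(threshold, len(greedy_cluster(toy, threshold)))
```

With these five toy strings the count of "different" proteins is 4, 3,
or 2 depending only on the threshold, and a greedy scheme like this
one can also give different counts if the input order changes.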

One can make very rough estimates that about 1% of sequenced proteins
have experimentally determined structures, and about 50% of sequenced
proteins have a substantial part similar enough to a solved structure
to make a rough model.  Refining these estimates is probably not worth
the bother, as the numbers change more as a result of differences in
estimating them than they do from changes in the databases they are
based on.

Kevin Karplus 	karplus at soe.ucsc.edu	http://www.soe.ucsc.edu/~karplus
Professor of Biomolecular Engineering, University of California, Santa Cruz
Undergraduate and Graduate Director, Bioinformatics
(Senior member, IEEE)	(Board of Directors & Chair of Education Committee, ISCB)
life member (LAB, Adventure Cycling, American Youth Hostels)
Effective Cycling Instructor #218-ck (lapsed)
Affiliations for identification only.
