     Things would be very simple if the amino acids in every
chain were numbered in the obvious way, starting with 1.
The problem with numbering started when people wanted to
compare the 'same' proteins from different species.  They
found that the sequences could differ in the following
ways:

1. More or fewer residues at either end.
2. Extra residues at various places within the chain.
3. Fewer residues at various places within the chain.
4. Different amino acids at the same place.

Now imagine that residue PHE 195 is very important for the
activity of the protein in species A.  But in species B
it is residue PHE 197 and in species C it is PHE 212,
because the proteins from species B and C are not the same
length as the one from species A.

     Because people felt it was important to preserve the
amino acid numbering for 'important' residues and to be
able to readily discuss and compare the structures from
different species, various groups decided to number
the proteins from species B, C, etc. to match the numbering
used for species A.  In doing this, one must leave gaps
(missing numbers) where a sequence is shorter.  But what
should one do when a sequence is longer?  Then extra numbers
must be inserted, and this is done with insertion codes:
typically each extra residue takes the number of the
preceding residue plus a letter, e.g. 195A, 195B.
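The renumbering scheme described above can be sketched in Python.  This is a minimal illustration under stated assumptions, not an official PDB procedure: it assumes a pairwise alignment of the species B sequence against the species A reference has already been computed (one-letter codes, '-' for gaps), and the function name renumber_to_reference is hypothetical.  Reference positions with no B residue are skipped, leaving missing numbers; B residues opposite a reference gap reuse the previous reference number with an insertion code.

```python
from string import ascii_uppercase


def renumber_to_reference(ref_aln, target_aln, ref_start=1):
    """Number a target sequence to match a reference sequence's numbering.

    Hypothetical helper.  ref_aln and target_aln are equal-length aligned
    sequences (one-letter codes, '-' for a gap).  Returns a list of
    (residue, number, insertion_code) tuples for the target residues;
    a blank insertion code is written as ' '.
    """
    if len(ref_aln) != len(target_aln):
        raise ValueError("aligned sequences must have the same length")
    numbered = []
    ref_num = ref_start - 1  # number of the most recent reference residue
    n_inserted = 0           # extra target residues seen since that residue
    for ref_res, tgt_res in zip(ref_aln, target_aln):
        if ref_res != "-":
            ref_num += 1
            n_inserted = 0
            if tgt_res != "-":
                numbered.append((tgt_res, ref_num, " "))
            # If the target has a gap here, ref_num is simply skipped,
            # which leaves a 'missing' number in the target.
        elif tgt_res != "-":
            # Extra target residue: reuse the previous number plus a letter.
            # This sketch raises IndexError after 26 consecutive insertions.
            numbered.append((tgt_res, ref_num, ascii_uppercase[n_inserted]))
            n_inserted += 1
    return numbered
```

For a reference AGF-KL aligned with a target AGFWK-, the extra W is numbered 3A and reference number 5 is missing from the target.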

     Thus the insertion code is an integral part of the
residue number, and it is improper to ignore that field when
using a PDB entry: 195 and 195A are different residues.
You must also allow for 'missing' numbers, since adjacent
residues in a chain need not have consecutive numbers.
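In practice this means a residue should be identified by chain, number and insertion code together, never by the number alone.  The Python sketch below reads ATOM records using the fixed-width columns of the PDB format (chain identifier in column 22, residue number in columns 23-26, insertion code in column 27) and then lists numbering gaps within a chain.  The function names are illustrative only, and HETATM records are ignored for simplicity.

```python
def residue_ids(pdb_lines):
    """Distinct (chain, number, insertion_code) keys on ATOM records, in file order."""
    keys = {}
    for line in pdb_lines:
        if not line.startswith("ATOM  "):
            continue
        # A trimmed line may have lost the blank insertion code in column 27.
        line = line.rstrip("\n").ljust(27)
        key = (line[21], int(line[22:26]), line[26])
        keys[key] = None  # dict keeps first-seen order and drops repeats
    return list(keys)


def missing_numbers(residues, chain):
    """Numbers skipped between the lowest and highest residue number in a chain."""
    present = {num for ch, num, _ in residues if ch == chain}
    if not present:
        return []
    return [n for n in range(min(present), max(present) + 1) if n not in present]
```

Sorting these tuples places 195 before 195A, because a blank sorts before any letter, so the keys can also be used to put residues in numbering order.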

     Another issue that has not arisen in this discussion
is that some residues present in the material being
studied may not be located in the experiment (for example,
disordered regions of a crystal structure).  The amino
acid sequence on the SEQRES records is defined as the
sequence of the material being studied.  The amino acid
residues that have coordinates on ATOM records may,
therefore, be fewer than the total on SEQRES.
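A rough per-chain check of how many SEQRES residues lack coordinates can be sketched in Python.  This assumes the standard fixed-width layout (chain identifier in column 12 of SEQRES records, residue names from column 20 onward) and counts only ATOM records, so residues whose coordinates appear on HETATM records, such as some modified amino acids, will also be counted as unobserved.  The function names are hypothetical.

```python
from collections import Counter


def seqres_counts(pdb_lines):
    """Number of residues listed on SEQRES records, per chain."""
    counts = Counter()
    for line in pdb_lines:
        if line.startswith("SEQRES"):
            counts[line[11]] += len(line[19:].split())
    return counts


def observed_counts(pdb_lines):
    """Number of distinct residues with coordinates on ATOM records, per chain."""
    residues = set()
    for line in pdb_lines:
        if line.startswith("ATOM  "):
            line = line.rstrip("\n").ljust(27)
            # Chain ID plus residue number and insertion code (columns 22-27).
            residues.add((line[21], line[22:27]))
    return Counter(chain for chain, _ in residues)


def unobserved_counts(pdb_lines):
    """SEQRES residues per chain minus residues that have ATOM coordinates."""
    expected = seqres_counts(pdb_lines)
    observed = observed_counts(pdb_lines)
    return {chain: n - observed[chain] for chain, n in expected.items()}
```

A count only says how many residues are unobserved; finding which ones requires aligning the SEQRES sequence with the residues on the ATOM records.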

     Please let me know if you need more explanation.

