What is CATH?
Cath is a database of manually curated protein domain structures. It’s a hierarchical domain classification of protein structures in the Protein Data Bank. Protein structures are classified using a combination of automated and manual procedures. There are four major levels in this hierarchy:
-
Class – structures are classified according to their secondary structure composition (mostly alpha, mostly beta, mixed alpha/beta or few secondary structures).
-
Architecture – structures are classified according to their overall shape as determined by the orientations of the secondary structures in 3D space but ignores the connectivity between them.
-
Topology (fold family) – structures are grouped into fold groups at this level depending on both the overall shape and connectivity of the secondary structures.
-
Homologous superfamily – this level groups together protein domains which are thought to share a common ancestor and can therefore be described as homologous.
Cath Domain
structure Domains are regions of contiguous polypeptide chain that have been described as compact, local, and semi-independent units. Within a protein, domains can be anything from independant globular units joined only by a flexible length of polypeptide chain, to units which have a very extensive interface. There are a number of algorithms that have been developed to detect domains automatically, some of which have been incorporated into the CATH update protocol. Many domains, however, still …