ProSequence Assisted Protein - DataBase (ProSeqAProDB) is a comprehensive, non-redundant, and manually curated dataset of proteins that are translated in a pre-pro-form. The “pro” represents a stretch of amino acids termed the prosequence, which is cleaved off during protein maturation, often autocatalytically. Prosequences are considered to act as Intra-Molecular Chaperones (IMCs), helping their cognate proteins achieve their native structure. These sequences were earlier thought to be a feature specific to protease-like proteins. However, experimental data accumulated over the last two decades make it evident that prosequence-containing proteins are neither rare nor unique. They appear to be widely distributed across all forms of life: prokaryotes (archaea and bacteria), eukaryotes, and viruses. Such proteins perform a multitude of functions and participate in a variety of cellular and biological processes.
So far, there has been no comprehensive or systematic study of the mechanism by which prosequences drive the folding of their cognate proteins. Their additional roles, e.g., in protein transport and in inhibition of the cognate protein, also remain poorly understood. ProSeqAProDB aims to collate information about such proteins on a common platform and to annotate these data in order to gain further insight into the role of prosequences in biological systems. At present, the database includes an overview of the distribution of such proteins across taxonomy, cellular locations, and biological processes, to emphasize their prevalence in biological systems.