Analysis of proteins and genes based solely on their sequences is one of the tools employed successfully since the early days of bioinformatics. Conservation of a gene or protein across different species is usually defined as the corresponding sequences being highly similar. Such sequence conservation usually implies a similar function.
In this work, we explore conservation based on a very weak notion: the existence of a minimal similarity among sequences. We study this notion in the context of 18 species, including mammals and insects. We show that even this minimal requirement yields interesting observations relating conservation, function, and the evolutionary history as represented by the tree of life.
The main tools we use are enrichment analysis, based on the hypergeometric distribution, and clustering.
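As a rough illustration of the first tool, hypergeometric enrichment asks how surprising it is to see at least a given number of annotated genes in a selected gene set, if that set were drawn at random from the whole population. The sketch below is a generic textbook version, not necessarily the exact procedure used in this work; the function name and its parameters are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """Upper-tail hypergeometric probability P(X >= hits).

    population: total number of genes considered (assumed background set)
    annotated:  genes in the population carrying the annotation of interest
    sample:     size of the selected gene set (e.g. one cluster)
    hits:       annotated genes observed inside the selected set
    """
    upper = min(annotated, sample)   # the largest possible overlap
    lower = max(hits, 0)             # negative counts are treated as zero
    total = comb(population, sample)  # number of equally likely gene sets
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(lower, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 1000 genes, 50 annotated, a cluster of 20 with 6 hits.
p = enrichment_p_value(1000, 50, 20, 6)
print(f"enrichment p-value: {p:.3g}")
```

A small p-value suggests the annotation is over-represented in the selected set. In practice many annotations are tested at once, so some correction for multiple testing would normally be applied as well.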
The talk will be self-contained.
Joint work with Jonathan Witztum, Erez Persi, David Horn, and Metsada Pasmanik-Chor.