The best choice of scoring method is difficult to establish, but the determinant model works better than the discriminant one

In this work we have argued that a heuristic method for detecting specificity in a set of paralogous proteins can be broken down into several independent components: the conservation scoring function, the overlap scoring function, the rule for adding them together into a combined score and, last but not least, the underlying model of evolution, which specifies which groups are expected to be conserved and which groups are expected to overlap in their choice of amino acid type. This disassembly of a heuristic scoring function makes it possible to track down the information contributing to the score and to discuss the merits of each particular choice of its individual components. Some attention should be devoted to the model of evolution built into the score: the siren call of symmetry across functionally divergent branches is a trap we easily fall into. Contrary to this intuition, it is easy to demonstrate on the examples provided here that, with everything else kept the same, a method rewarding determinant behavior may fare better than one looking for discriminants. Stated plainly, positions of functional importance in one group need not be conserved in the groups of its paralogues. Somewhat more puzzlingly, the linear combination of the scores tends to perform better than the Euclidean one, perhaps simply because of the even distribution of scores in the space. Also, one of the outcomes of our investigation is the conclusion that, as intriguing as the assumption might seem, non-conserved, non-overlapping positions do not typically fall into the set of residues determining the functional divergence, and that scores which do not impose conservation as a requirement do not seem to be a good strategy for accommodating our intuitive expectations about the exchangeability of amino acid types. In our experiments with the scoring functions, we have demonstrated that the scoring functions that involve some degree of exchangeability of amino acid types fare better than the ones that include none.
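The decomposition described above can be made concrete in a short sketch. Every specific choice below is an illustrative assumption, not the scoring functions of this work (Eqs. 7 and 14 are not reproduced here): conservation is taken as one minus the normalized Shannon entropy of a group's residues, overlap as the histogram intersection of two groups' residue frequencies, exchangeability as a hypothetical reduced alphabet `CLASSES`, and the equal weights in the linear and Euclidean rules are arbitrary. The two hypothetical functions `determinant_score` and `discriminant_score` share all other components and differ only in the model of evolution: the former requires conservation in the group of interest alone, the latter in every group.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical reduced alphabet: residues in the same class are treated as
# exchangeable. A crude stand-in for a substitution matrix.
CLASSES = {
    **dict.fromkeys("AVLIMC", "hydrophobic"),
    **dict.fromkeys("FWY", "aromatic"),
    **dict.fromkeys("STNQ", "polar"),
    **dict.fromkeys("DE", "acidic"),
    **dict.fromkeys("KRH", "basic"),
    **dict.fromkeys("G", "glycine"),
    **dict.fromkeys("P", "proline"),
}


def frequencies(residues, exchangeable=False):
    """Relative frequencies of residue types (or residue classes) in one group."""
    symbols = [CLASSES.get(r, r) if exchangeable else r for r in residues]
    n = len(symbols)
    return {s: c / n for s, c in Counter(symbols).items()}


def conservation(residues, exchangeable=False):
    """1 - normalized Shannon entropy: 1.0 for a fully conserved column."""
    freqs = frequencies(residues, exchangeable)
    alphabet = len(set(CLASSES.values())) if exchangeable else 20
    entropy = -sum(p * math.log(p) for p in freqs.values())
    return 1.0 - entropy / math.log(alphabet)


def overlap(res_a, res_b, exchangeable=False):
    """Histogram intersection of two groups: 0 = disjoint, 1 = identical."""
    fa = frequencies(res_a, exchangeable)
    fb = frequencies(res_b, exchangeable)
    return sum(min(p, fb.get(s, 0.0)) for s, p in fa.items())


def combine(cons, ovl, rule="linear"):
    """Merge conservation and overlap into a single score in [0, 1]."""
    if rule == "linear":
        return 0.5 * cons + 0.5 * (1.0 - ovl)
    # Euclidean rule: distance from the ideal point (cons=1, ovl=0), rescaled.
    return 1.0 - math.sqrt(((1.0 - cons) ** 2 + ovl ** 2) / 2.0)


def determinant_score(column, group, rule="linear", exchangeable=False):
    """Conservation required only in `group`, which must not overlap any paralogue group."""
    cons = conservation(column[group], exchangeable)
    others = [g for g in column if g != group]  # assumes at least two groups
    ovl = max(overlap(column[group], column[g], exchangeable) for g in others)
    return combine(cons, ovl, rule)


def discriminant_score(column, rule="linear", exchangeable=False):
    """Symmetric model: every group conserved, no pair of groups overlapping."""
    cons = min(conservation(r, exchangeable) for r in column.values())
    ovl = max(overlap(column[a], column[b], exchangeable)
              for a, b in combinations(column, 2))
    return combine(cons, ovl, rule)


# Toy column: D conserved in group "A", variable in its paralogue group "B".
column = {"A": list("DDDD"), "B": list("KRST")}
print(determinant_score(column, "A"))  # 1.0
print(discriminant_score(column))      # about 0.77
```

On this toy column the determinant model gives the position a perfect score, while the symmetric discriminant model penalizes it because group "B" is not conserved, which is the kind of position the text argues should not be discounted. Passing `exchangeable=True` lets, for example, a column of D and E count as conserved.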
However, the available amount of experimental data does not presently allow us to prove that any one way of treating conservation and overlap, or of including the exchangeability of amino acid types, systematically outperforms the rest. Their different rankings across examples indicate that they all lie within the noise bracket imposed by the underlying experiment, by the estimate of the average evolutionary behavior, and by the assumption that each site evolves independently. We merely note that the description we offered in Eqs. 7 and 14 performs stably and matches our intuitive expectations well. Finally, one may ask why one should bother with a heuristic approach that dispenses with the evolutionary tree, when methods for a detailed description, including branching events, exist. The answer lies in its robustness, which allows one to deduce the gross features of evolutionary behavior that should be reproduced, and bettered, in the development of a chronological model of the evolution of a protein family. At the same time, the very lack of detailed features, in particular, of the order of the branching events leading to the observed set of sequences.
