Abstract
The algorithm AMIGOS was developed to identify and characterise gene clusters conserved in microbial genomes. It is based on a categorisation of genes using a predefined set of gene functions (GFs). After all genes of a genome had been categorised, distances between GFs were determined from the gene locations on each replicon and stored in genome-specific matrices. These matrices were used to identify GF clusters, such as those strictly conserved in 13 archaeal genomes, in 47 bacterial genomes, and in the combination of both sets. Within the combined set of 60 microbial genomes, only two strictly conserved clusters exist, each harbouring two ribosomal genes: those for L4 and L23, and those for L22 and L29. To characterise less strictly conserved GF clusters, the genomes, i.e. their matrices, were analysed pairwise. The resulting clusters were merged into (meta-)clusters if their content overlapped. A scoring system named cons(CL) was developed, which quantifies the conservedness of cluster membership for individual GFs. For the genome of Escherichia coli, it was shown that grouping cluster elements by their cons(CL) values dissected the clusters into smaller sets. These sets frequently overlapped with known transcriptional units (TUs). This finding justifies the use of cons(CL) scores to predict TU membership of genes. In addition, cons(CL) values provide a sound basis for non-homologous gene annotation. Based on cons(CL) values, examples are given of conserved clusters that contain annotated genes together with single genes of unknown function.
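
The matrix-based steps described above can be illustrated with a minimal Python sketch. It is not the AMIGOS implementation: the function names, the toy GF labels, the use of positional gene counts as the distance measure, the linear treatment of replicons, and the `max_dist` threshold are all assumptions made for illustration. The cons(CL) score is not sketched, because its definition is not given in this abstract.

```python
from itertools import combinations


def gf_distance_matrix(replicons):
    """Map each GF pair of one genome to its minimal positional distance.

    `replicons` is a list of replicons, each an ordered list of GF labels
    (None for genes without an assigned GF). Distances are counted in gene
    positions and replicons are treated as linear (both are assumptions).
    """
    dist = {}
    for genes in replicons:
        positions = [(i, gf) for i, gf in enumerate(genes) if gf is not None]
        for (i, a), (j, b) in combinations(positions, 2):
            if a == b:
                continue
            key = frozenset((a, b))
            if j - i < dist.get(key, float("inf")):
                dist[key] = j - i
    return dist


def strictly_conserved_pairs(matrices, max_dist=1):
    """Return GF pairs lying within `max_dist` positions in every genome."""
    if not matrices:
        return set()
    shared = set.intersection(*(set(m) for m in matrices))
    return {p for p in shared if all(m[p] <= max_dist for m in matrices)}


def merge_overlapping(clusters):
    """Merge clusters (sets of GFs) that share at least one GF."""
    merged = []
    for cluster in map(set, clusters):
        for other in [m for m in merged if m & cluster]:
            cluster |= other
            merged.remove(other)
        merged.append(cluster)
    return merged


# Hypothetical toy genomes; the GF labels are placeholders, not real data.
genome_a = [["GF1", "GF2", None, "GF3", "GF4"]]
genome_b = [["GF3", "GF4", "GF9"], ["GF2", "GF1"]]

matrices = [gf_distance_matrix(g) for g in (genome_a, genome_b)]
conserved = strictly_conserved_pairs(matrices, max_dist=1)
meta_clusters = merge_overlapping([{"A", "B"}, {"B", "C"}, {"D"}])
```

In this toy input, the pairs GF1/GF2 and GF3/GF4 are adjacent in both genomes and are therefore reported as strictly conserved, while the overlapping clusters {A, B} and {B, C} are merged into one (meta-)cluster.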