*Single species represented by multiple clades*

*Prochlorococcus* and marine *Synechococcus* are small marine cyanobacteria whose genomes are characterized by small size and an evolutionary trend toward low GC content [25]. Although many shared derived characters define *Prochlorococcus* as a clade, many genome-based analyses recover them as paraphyletic. The single species *Prochlorococcus marinus* comprises six named ecotypes. Our ribosomal marker analysis and whole-genome alignment analysis (described in the Methods section) suggest that this species should be represented by 11 different clades (see Figure 7). These results are consistent with a recent genomic analysis of the genus *Prochlorococcus* [26].

**Figure 6.** A ribosomal-marker-based clade comprising various *Brucella* species. The pairwise genome distance is defined by the number of shared proteins in the core set of the *Brucella* pan-genome. Green dots – proteins present in the CORE set; red dots – proteins absent from the CORE set.

**Figure 7.** Intraspecies diversity of *Prochlorococcus marinus*. The dendrogram is calculated using the BLAST genome alignment score (%identity). Leaf nodes, displayed as circles, represent genomes of individual isolates/strains.
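As an illustration of how a dendrogram such as the one in Figure 7 can be cut into clades, the sketch below applies average-linkage (UPGMA) clustering to pairwise distances defined as 100 - %identity and stops merging once the closest pair of clusters exceeds a distance threshold. The linkage method, the `threshold` parameter and the function name `upgma_clades` are assumptions for illustration; the text does not state how the 11 clades were delimited.

```python
from itertools import combinations


def upgma_clades(identity, threshold):
    """Group genomes by average-linkage (UPGMA) clustering.

    identity: dict mapping frozenset({a, b}) -> pairwise %identity (0-100);
              every pair of genomes must be present.
    threshold: largest average distance (100 - %identity) allowed for a merge
               (hypothetical cut level).
    Returns a sorted list of clades, each a sorted list of genome names.
    """
    names = sorted({g for pair in identity for g in pair})
    clusters = [[n] for n in names]  # start with one cluster per genome

    def dist(c1, c2):
        # Average-linkage distance: mean pairwise distance between members.
        total = sum(100.0 - identity[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]),
        )
        if dist(clusters[i], clusters[j]) > threshold:
            break  # every remaining pair is farther apart than the cut level
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)

    return sorted(sorted(c) for c in clusters)
```

Because average-linkage merge distances never decrease, stopping at the first merge above the threshold is equivalent to cutting the dendrogram at that height.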

Novel species from uncultured, non-isolated single-cell and metagenome assemblies, as well as new unclassified isolates (<genus> sp.) from clinical and epidemiological studies, can be organized into hierarchical groups by genome sequence comparison methods. These groups can be used for downstream analyses: 1) pan-genome analysis by clades rather than by species; 2) groups of closely related genomes below the species level, which can be calculated by nucleotide whole-genome comparison methods such as k-mer or BLAST; 3) classification validation; and 4) visualization of large data sets by selecting genome representatives. Some applications of marker-based clades and tight genome groups have been briefly described previously [27,28].
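Item 2 above, grouping closely related genomes by nucleotide k-mer comparison, together with the representative selection of item 4, can be sketched as follows. This minimal example assumes exact k-mer sets, Jaccard similarity, single-linkage grouping at a hypothetical cutoff `min_sim`, and picks as representative the member with the highest mean similarity to the rest of its group. All names and defaults are illustrative rather than the authors' pipeline, and the sketch ignores reverse complements.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of distinct k-mers in a nucleotide sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tight_groups(genomes, k=21, min_sim=0.9):
    """Single-linkage groups of genomes with k-mer Jaccard similarity >= min_sim.

    genomes: dict name -> nucleotide sequence (hypothetical input format).
    Returns a list of (representative, members) tuples sorted by members.
    """
    sets = {name: kmers(seq, k) for name, seq in genomes.items()}
    parent = {name: name for name in genomes}  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    sim = {}
    for a, b in combinations(sorted(genomes), 2):
        s = jaccard(sets[a], sets[b])
        sim[(a, b)] = sim[(b, a)] = s
        if s >= min_sim:
            parent[find(a)] = find(b)  # link the two genomes' groups

    groups = {}
    for name in sorted(genomes):
        groups.setdefault(find(name), []).append(name)

    result = []
    for members in groups.values():
        # Representative: member with the highest mean similarity to the rest.
        rep = max(
            members,
            key=lambda m: sum(sim[(m, o)] for o in members if o != m)
            / max(len(members) - 1, 1),
        )
        result.append((rep, members))
    return sorted(result, key=lambda r: r[1])
```

For full bacterial genomes, exact k-mer sets are memory-intensive; sketching tools such as Mash approximate the same Jaccard similarity with MinHash.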
