More than 6,869 mycobacteriophages have been isolated and purified. Of these, 1,367 genomes have been sequenced, and more are added each year through the SEA-PHAGES program. Sequenced mycobacteriophages are grouped into clusters based on nucleotide identity of 50% or greater. The number and breadth of these clusters reflect the diversity present in the environment. Each year, as students in the SEA-PHAGES program discover new phages, the question arises: “Which isolates should we sequence?” To sequence phages that represent the greatest possible diversity, and thus broaden under-represented clusters and identify new singletons, we need a rapid way to determine a phage's cluster membership or singleton status before selecting it for DNA sequencing. One approach is to identify short nucleotide sequences that are shared by every member of a cluster and found in no other cluster. Such unique sequences could then be used as primers or probes to assign a phage to a cluster or to flag it as a potential singleton. A computer program called PhageUniqueSeq was written in the Go language to identify all oligonucleotides that are common to all members of a cluster but absent from every other cluster. The program generated millions of unique sequences that can be used as probes or in polymerase chain reactions to determine sub-cluster assignment. These unique sequences will help us target under-represented phages for sequence analysis.
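
The selection criterion described above (oligonucleotides present in every member of a cluster and in no other cluster) can be illustrated with a minimal Go sketch. This is not the PhageUniqueSeq source. The names `kmerSet` and `uniqueKmers`, the fixed oligonucleotide length `k`, the exact-match comparison, and the toy sequences are all assumptions made for illustration, and the sketch ignores reverse complements and memory limits that matter for real genomes.

```go
package main

import (
	"fmt"
	"sort"
)

// kmerSet returns the set of all length-k substrings of seq.
// (Hypothetical helper; exact matching on one strand only.)
func kmerSet(seq string, k int) map[string]bool {
	set := make(map[string]bool)
	if k <= 0 {
		return set
	}
	for i := 0; i+k <= len(seq); i++ {
		set[seq[i:i+k]] = true
	}
	return set
}

// uniqueKmers maps each cluster name to the sorted k-mers that occur in
// every genome of that cluster and in no genome of any other cluster.
func uniqueKmers(clusters map[string][]string, k int) map[string][]string {
	common := make(map[string]map[string]bool) // cluster -> k-mers shared by all members
	owners := make(map[string]map[string]bool) // k-mer -> clusters in which it appears

	for name, genomes := range clusters {
		for i, genome := range genomes {
			set := kmerSet(genome, k)
			// Record that this cluster contains each k-mer of this genome.
			for kmer := range set {
				if owners[kmer] == nil {
					owners[kmer] = make(map[string]bool)
				}
				owners[kmer][name] = true
			}
			// The first genome seeds the intersection; later genomes prune it.
			if i == 0 {
				common[name] = set
				continue
			}
			for kmer := range common[name] {
				if !set[kmer] {
					delete(common[name], kmer)
				}
			}
		}
	}

	result := make(map[string][]string)
	for name, set := range common {
		for kmer := range set {
			// Keep only k-mers seen in exactly one cluster (this one).
			if len(owners[kmer]) == 1 {
				result[name] = append(result[name], kmer)
			}
		}
		sort.Strings(result[name])
	}
	return result
}

func main() {
	// Toy sequences, not real phage genomes.
	clusters := map[string][]string{
		"A": {"ACGTAC", "TACGTA"},
		"B": {"GGGTAC", "TACGGG"},
	}
	res := uniqueKmers(clusters, 3)
	fmt.Println(res["A"], res["B"]) // prints: [CGT] [GGG]
}
```

In the toy input, `TAC` occurs in every genome of both clusters and is therefore rejected, while `CGT` and `GGG` are each shared within one cluster and absent from the other, which is the property a cluster-specific primer or probe would need.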

Dr. Claire Rinehart


Biochemistry | Bioinformatics | Genomics