Jul 20, 2015

Contact: Jim Tripp, DOE Joint Genome Institute
hjtripp@lbl.gov

Description of contents:

The AdditionalFile1.xlsx file is described in the main text. The subfolders contain peptide and gene call data for a proteogenomics paper by H. James Tripp et al. (2015).

The genecalls subdirectory contains the directories listed in the RefSeq Gene Call Directories column (col C) of the Excel spreadsheet in Additional File 1 of the paper. The peptides subdirectory contains the .gff files listed in the Peptide Source column (col D) of the same Excel file.

The RefSeq subdirectories inside the genecalls subdirectory are named by organism. Each subdirectory contains .fna files of nucleotide sequence for each replicon in the organism's genome, along with a set of files holding Glimmer, GeneMark, and Prodigal gene predictions for each replicon. Note that not every replicon was used for each organism; only the 45 replicons listed in Additional File 1 were used.

The peptide .gff files whose names begin with "NC_" come from Venter et al. 2011. The other peptide .gff files, whose names contain organism names, come from the sources given in the Peptide Source and Peptide Ref columns (cols D and E) of Additional File 1.
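
As a rough illustration of the layout described above, the following Python sketch builds an inventory of an unpacked copy of this archive. It is not part of the original data release. The function name inventory_archive is hypothetical, and the sketch assumes that the genecalls and peptides subdirectories sit directly under the archive root. Because the naming of the gene prediction files is not documented here, everything that is not a .fna file is simply reported as "other".

```python
from pathlib import Path


def inventory_archive(root):
    """Summarize the genecalls and peptides subdirectories under root.

    Assumes (not confirmed by the archive itself) that both
    subdirectories are direct children of root.
    """
    root = Path(root)

    # One entry per organism-named subdirectory of genecalls.
    genecalls = {}
    genecalls_dir = root / "genecalls"
    if genecalls_dir.is_dir():
        for organism_dir in sorted(genecalls_dir.iterdir()):
            if not organism_dir.is_dir():
                continue
            files = sorted(f for f in organism_dir.iterdir() if f.is_file())
            genecalls[organism_dir.name] = {
                # Nucleotide sequence files for the replicons.
                "fna": [f.name for f in files if f.suffix == ".fna"],
                # Gene prediction files and anything else; naming is unknown.
                "other": [f.name for f in files if f.suffix != ".fna"],
            }

    # Split peptide .gff files by the "NC_" filename prefix.
    nc_prefixed, organism_named = [], []
    peptides_dir = root / "peptides"
    if peptides_dir.is_dir():
        # rglob is used in case the .gff files are nested in subfolders.
        for gff in sorted(peptides_dir.rglob("*.gff")):
            if gff.name.startswith("NC_"):
                nc_prefixed.append(gff.name)
            else:
                organism_named.append(gff.name)

    return {
        "genecalls": genecalls,
        "peptides_nc": nc_prefixed,
        "peptides_named": organism_named,
    }
```

Calling inventory_archive(".") from the archive root returns a dictionary that can be compared by hand against columns C and D of AdditionalFile1.xlsx. Keep in mind that only the 45 replicons listed in that file were used, so the inventory may show more .fna files than the spreadsheet lists.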