The raw contigs FASTA file, the GTF annotations and the peptide sequences were downloaded from:
https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_040712315.1/

The hard-masked FASTA file was produced from the soft-masked NCBI sequences by replacing every lowercase (soft-masked) base with N. Header lines (those starting with >) are left unchanged. The conversion used the following Perl one-liner:

perl -lne 'if(!/^>/){ s/[a-z]/N/g } print' Castanea_sativa_Marrone.GCF040712315.1.RefSeq.dna.toplevel.fna > Castanea_sativa_Marrone.GCF040712315.1.RefSeq.dna_rm.genome.fa

The remaining files were produced by RSAT-Tools when installing this genome.
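For readers who prefer Python, the sketch below mirrors the Perl hard-masking step: sequence lines have every ASCII lowercase letter replaced with N, and header lines are copied unchanged. It also returns the number of masked bases, which can serve as a quick sanity check. The function name hard_mask_fasta is hypothetical and is not part of RSAT-Tools or the NCBI distribution; this is an illustrative equivalent, not the command that was actually run.

```python
import re

# Matches a single ASCII lowercase letter, like [a-z] in the Perl one-liner.
LOWER = re.compile(r"[a-z]")


def hard_mask_fasta(src_path, dst_path):
    """Hypothetical helper: hard-mask a soft-masked FASTA file.

    Header lines (starting with '>') are written unchanged; in all other
    lines each lowercase base is replaced with 'N'. Like `perl -l`, every
    output line ends with a single newline. Returns the number of bases
    that were masked.
    """
    masked = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            line = line.rstrip("\n")
            if not line.startswith(">"):
                # subn returns the new string and the number of substitutions.
                line, n = LOWER.subn("N", line)
                masked += n
            dst.write(line + "\n")
    return masked
```

Usage, with the file names from the command above:
hard_mask_fasta("Castanea_sativa_Marrone.GCF040712315.1.RefSeq.dna.toplevel.fna", "Castanea_sativa_Marrone.GCF040712315.1.RefSeq.dna_rm.genome.fa")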