Abstract
Supporting dataset for the genome assemblies in NCBI BioProjects PRJNA1032953 (UTT2), PRJNA722258 (IDT2) and PRJNA795150 (IDT3, reference genome). The FASTA assemblies used as input to the EDTA (Extensive de-novo TE Annotator) analysis, which generated the output files, are available from the NCBI Genome database. The raw sequence data are available from the NCBI SRA database. Each input FASTA contains the nine pseudo-chromosomes described in Melton et al. 2022. Reads from each sample were mapped to the nine pseudo-chromosomes and used to call a consensus sequence. The EDTA analysis produces several outputs, listed below. The primary file of interest is "SAMPLE_consensus.fasta.mod.EDTA.TEanno.gff3", which served as the input for comparing transposable element (TE) content across the three genomes.
Please visit https://github.com/oushujun/EDTA for more information about the EDTA pipeline.
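The sketch below shows one way to compare TE content using the TEanno.gff3 files. It tallies annotated base pairs per feature type (GFF3 column 3). It assumes only the standard nine-column, tab-delimited GFF3 layout with 1-based inclusive coordinates. The function names `te_bp_by_type` and `summarize_file` are hypothetical, and this is not necessarily how the comparison in this dataset was performed.

```python
from collections import defaultdict
from typing import Dict, Iterable


def te_bp_by_type(gff_lines: Iterable[str]) -> Dict[str, int]:
    """Sum annotated base pairs per GFF3 feature type (column 3).

    Hypothetical helper. Overlapping or nested features are each counted,
    so totals can exceed the number of distinct TE-covered bases.
    """
    totals: Dict[str, int] = defaultdict(int)
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, ## directives and # comments
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a complete nine-column feature line
        feature_type = fields[2]
        start, end = int(fields[3]), int(fields[4])  # 1-based, inclusive
        totals[feature_type] += end - start + 1
    return dict(totals)


def summarize_file(path: str) -> Dict[str, int]:
    """Apply te_bp_by_type to one GFF3 file on disk (hypothetical helper)."""
    with open(path) as handle:
        return te_bp_by_type(handle)
```

To compare genomes, call `summarize_file` once per sample, substituting each sample's prefix for the `SAMPLE` placeholder in "SAMPLE_consensus.fasta.mod.EDTA.TEanno.gff3", and then compare the per-type totals. Deriving non-overlapping coverage would require merging intervals first, which this sketch does not do.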