Oxford Nanopore Duplex data for HG002.

Duplex reads are found in the stereo_duplex and additional_duplex folders.

For questions about this data, contact khmiga@ucsc.edu

###############################################################################
##                                  Layout                                   ##
###############################################################################

stereo_duplex/
    *.bam               (all reads)
    *_pass.bam          (reads over Q10)
    *_pass.fastq.gz     (reads over Q10)
    *_fail.bam          (reads under Q10)
    *_fail.fastq.gz     (reads under Q10)

additional_duplex/
    *.bam               (all reads)
    *_pass.bam          (reads over Q10)
    *_pass.fastq.gz     (reads over Q10)
    *_fail.bam          (reads under Q10)
    *_fail.fastq.gz     (reads under Q10)

simplex/
    *.bam               (used for duplex calling; not recommended as input to analysis)
    *.fastq.gz          (used for duplex calling; not recommended as input to analysis)

ancillary_files/
    pair_ids/           (used for duplex calling; not recommended as input to analysis)
    sequencing_summary/ (used for duplex calling; not recommended as input to analysis)

###############################################################################
##                              Data Generation                              ##
###############################################################################

## Simplex calling

Fast5 files were converted to POD5 and then grouped by channel with:

```
pod5 convert fast5 --force-overwrite --threads 90 ${FAST5}/*.fast5 ${POD5}/output.pod5
pod5 subset --force_overwrite --output ${POD5_GROUPED} --summary $SEQSUMMARY --columns $POD5_GROUPING -M ${POD5}/output.pod5
```

Simplex reads were then basecalled with Dorado:

```
MODEL_PATH="dorado_v4_duplex_beta_models/dna_r10.4.1_e8.2_400bps_sup@v4.0.0"
dorado basecaller -x "cuda:all" $MODEL_PATH $POD5_GROUPED > ${OUTPUT}/${output_name}_Dorado_v0.1.1_400bps_sup.sam
```

The SAM output was converted to BAM and fastq.gz with samtools.

## Duplex calling

Candidate pairs were identified from the simplex BAM with duplex_tools and then duplex-called with Dorado:

```
duplex_tools pair ${OUTPUT}/${output_name}_Dorado_v0.1.1_400bps_sup.bam
dorado duplex ${MODEL_PATH} $POD5_GROUPED --pairs ${OUTPUT}/pairs_from_bam/pair_ids_filtered.txt > ${OUTPUT}/${output_name}_Dorado_v0.1.1_400bps_sup_stereo_duplex.sam
```

The SAM output was converted to BAM with samtools, and the Pass/Fail files were generated by filtering reads by Q-score (Q10 threshold).

## Read rescue and duplex calling on rescued reads

To recover additional duplex reads, first basecall with the fast model, emitting move tables (--emit-moves):

```
FAST_MODEL_PATH="dorado_v4_duplex_beta_models/dna_r10.4.1_e8.2_400bps_fast@v4.0.0"
dorado basecaller ${FAST_MODEL_PATH} ${POD5} --emit-moves > ${OUTPUT}/${output_name}_unmapped_reads_with_moves.sam
```

Second, use duplex_tools split_pairs to recover duplex reads whose template and complement strands were not split into separate reads:

```
duplex_tools split_pairs ${OUTPUT}/${output_name}_unmapped_reads_with_moves.sam ${POD5} pod5s_splitduplex/
```

Finally, duplex-call the split reads with the sup model:

```
dorado duplex ${MODEL_PATH} pod5s_splitduplex/ --pairs split_duplex_pair_ids.txt > ${OUTPUT}/${output_name}_duplex_splitduplex.sam
```

The SAM output was converted to BAM with samtools, and the Pass/Fail files were generated by filtering reads by Q-score (Q10 threshold).
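The exact samtools commands used for the SAM to BAM and fastq.gz conversions are not recorded here. A minimal sketch of one way to do it, assuming samtools and gzip are on PATH; the function name sam_to_bam_and_fastq is hypothetical and not part of the original pipeline:

```shell
#!/usr/bin/env bash
set -o pipefail
# Hypothetical helper: turn one Dorado SAM file into BAM and gzipped FASTQ.
# Sketch only; the samtools options actually used for this dataset are not recorded.
sam_to_bam_and_fastq() {
    local sam="$1"
    local prefix="${sam%.sam}"    # drop the .sam extension
    # Basecalls are unaligned, so no sorting or indexing is done here.
    samtools view -b -o "${prefix}.bam" "$sam" || return 1
    # samtools fastq writes reads to stdout by default; compress with gzip.
    samtools fastq "${prefix}.bam" | gzip > "${prefix}.fastq.gz"
}
```

For example, sam_to_bam_and_fastq "${OUTPUT}/${output_name}_Dorado_v0.1.1_400bps_sup.sam" would write the .bam and .fastq.gz next to the SAM file. Note that samtools fastq drops SAM tags unless they are requested with -T.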
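The Q10 Pass/Fail split can be approximated from the FASTQ files alone. This is a sketch under stated assumptions, not the exact filter used for this dataset: it recomputes each read's mean Q-score from the quality string by averaging per-base error probabilities (which may differ from the score Dorado itself reports), and it counts reads at exactly Q10 as pass. The function name split_by_qscore is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: split a FASTQ into pass/fail files by mean read Q-score.
# Assumptions (not taken from the original pipeline): 4-line FASTQ records,
# Phred+33 quality strings, mean Q computed by averaging per-base error
# probabilities, and reads with mean Q >= min_q counted as "pass".
split_by_qscore() {
    local in_fq="$1" pass_fq="$2" fail_fq="$3" min_q="${4:-10}"
    : > "$pass_fq"; : > "$fail_fq"    # create both outputs even if one stays empty
    awk -v min_q="$min_q" -v pass="$pass_fq" -v fail="$fail_fq" '
        # Lookup table: quality character -> ASCII code.
        BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i }
        NR % 4 == 1 { header = $0 }
        NR % 4 == 2 { seq = $0 }
        NR % 4 == 3 { plus = $0 }
        NR % 4 == 0 {
            n = length($0); sum_p = 0
            for (i = 1; i <= n; i++) {
                q = ord[substr($0, i, 1)] - 33    # Phred+33 decode
                sum_p += 10 ^ (-q / 10)           # per-base error probability
            }
            # Convert the mean error probability back to a Phred score.
            mean_q = (n > 0) ? -10 * log(sum_p / n) / log(10) : 0
            out = (mean_q >= min_q) ? pass : fail
            print header "\n" seq "\n" plus "\n" $0 > out
        }
    ' "$in_fq"
}
```

For gzipped input, bash process substitution works: split_by_qscore <(gzip -cd reads.fastq.gz) reads_pass.fastq reads_fail.fastq 10. Averaging in probability space means a short low-quality stretch lowers the score more than a plain mean of Phred values would; a read with qualities IIII##, for instance, fails here even though its average Phred value is above 10.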
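A duplex pair is the template and complement strand of one molecule read back to back through the same pore, which is presumably why the POD5 files were grouped by channel before duplex calling. The sketch below finds candidate pairs from timing alone, assuming a tab-separated sequencing summary with read_id, channel, start_time and duration columns (in seconds). The function candidate_pairs and its max_gap threshold are hypothetical; duplex_tools' actual pairing also looks at the read sequences, which this sketch does not. It prints pairs in the space-separated two-column layout that dorado duplex --pairs accepts.

```shell
#!/usr/bin/env bash
set -o pipefail
# Sketch (not duplex_tools' algorithm): candidate duplex pairs are consecutive
# reads on the same channel where the second starts shortly after the first ends.
# Assumes a TSV summary whose header names read_id, channel, start_time, duration.
candidate_pairs() {
    local summary="$1" max_gap="${2:-10}"   # max_gap: illustrative value, in seconds
    awk -F'\t' '
        # Map header names to column numbers.
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
        {
            st = $(col["start_time"])
            printf "%s\t%s\t%.6f\t%s\n", $(col["channel"]), st,
                   st + $(col["duration"]), $(col["read_id"])
        }
    ' "$summary" |
    LC_ALL=C sort -t $'\t' -k1,1n -k2,2n |   # order by channel, then start time
    awk -F'\t' -v max_gap="$max_gap" '
        # Same channel and a short gap between end of previous read and start of this one.
        NR > 1 && $1 == prev_ch && ($2 - prev_end) <= max_gap { print prev_id " " $4 }
        { prev_ch = $1; prev_end = $3; prev_id = $4 }
    '
}
```

Because every consecutive pair within the gap is reported, one read can appear in two candidate pairs; a sequence-based filter would be needed to resolve that.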