Duplicated sequences in reference fasta #42

@robbueck

Some taxa have duplicated sequences in the reference FASTA, so samtools fails with errors like these:

[W::sam_hdr_create] Duplicated sequence "QJKH01000049.1" in file "/tmp/panphlan_wrzoniqg.sam"
[E::sam_hrecs_update_hashes] Duplicate entry "QJKH01000001.1" in sam header
samtools view: failed to add PG line to the header
[W::hts_set_opt] Cannot change block size for this format
samtools sort: failed to read header from "-"
samtools index: "panphlan/output/Dielma_fastidiosa/map_results/SRR14117082_Dielma_fastidiosa_out.bam" is in a format that cannot be usefully indexed
[E] Samtools index encountered some error.

I fixed this by removing the duplicated sequences with seqkit rmdup -n.
I had this issue with the genomes of: Cutibacterium_acnes, Roseburia_intestinalis, Olsenella_uli, Acinetobacter_ursingii, Actinomyces_naeslundii, Dialister_pneumosintes, Peptoniphilus_lacrimalis and Dielma_fastidiosa.
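
For anyone without seqkit at hand, here is a minimal Python sketch of the same idea: keep the first record for each sequence name and drop later repeats. It is not PanPhlAn or seqkit code. The names dedup_fasta and dedup_file are made up for illustration. I am assuming that comparing the full header line corresponds to what the -n flag does, and that comparing only the first word of the header corresponds to the name samtools complains about.

```python
def dedup_fasta(lines, by_full_name=True):
    """Yield FASTA lines, keeping only the first record seen for each name.

    by_full_name=True compares the whole header line (assumed to match
    `seqkit rmdup -n`); False compares only the first word of the header.
    """
    seen = set()
    keep = False  # anything before the first header is dropped
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            if by_full_name:
                key = header
            else:
                parts = header.split(None, 1)
                key = parts[0] if parts else ""
            keep = key not in seen
            seen.add(key)
        if keep:
            yield line


def dedup_file(src_path, dst_path, by_full_name=True):
    """Hypothetical helper: write a deduplicated copy of src_path to dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(dedup_fasta(src, by_full_name=by_full_name))
```

If two records share an accession but differ in the rest of the header, comparing full names keeps both, so by_full_name=False may be the safer choice when the goal is only to get past the samtools header error.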
