taffy norm affected by changes between blocks (that aren't in rows to be merged) #75

@glennhickey

(edited for real example):

wget -q https://public.gi.ucsc.edu/~hickey/debug/irwin/case2-tiny-h2m.raw.maf.gz
wget -q https://public.gi.ucsc.edu/~hickey/debug/irwin/genome.list

taffy view -i case2-tiny-h2m.raw.maf.gz | taffy norm -k | mafFilter -i hg38,Tarsius_lariang -m - > norm.maf

zcat case2-tiny-h2m.raw.maf.gz | mafRowOrderer -m - --order-file genome.list | taffy view | taffy norm -k | mafFilter -i hg38,Tarsius_lariang -m - > sort.norm.maf

Tarsius is one row in the first MAF (norm.maf):

a
s       hg38.chr1       11707   6       +       248956422       GGGGC-C
s       Tarsius_lariang.k141_1886670    1879    6       -       2088    AGGGC-C
s       hg38.chr1       182226  6       +       248956422       GGGGC-C
s       hg38.chr12      9309644 6       +       133275309       GGGGC-C
s       hg38.chr12      123853115       6       -       133275309       GGGGC-C
s       hg38.chr12      31100331        6       +       133275309       GGGGC-C
s       hg38.chr12      11838   6       +       133275309       GGGGC-C
s       hg38.chr15      11930   6       -       101991189       GGGGC-C
s       hg38.chr16      11388   6       +       90338345        GGGGC-C
s       hg38.chr2       128591799       6       -       242193529       GGGGC-C
s       hg38.chr9       11820   6       +       138394717       GGGGC-C
s       hg38.chrX       12546   6       -       156040895       GGGGC-C

but two rows in the second MAF (sort.norm.maf):

a
s       hg38.chr1       11707   6       +       248956422       GGGG-C-C
s       hg38.chr1       182226  6       +       248956422       GGGG-C-C
s       hg38.chr12      9309644 6       +       133275309       GGGG-C-C
s       hg38.chr12      123853115       6       -       133275309       GGGG-C-C
s       hg38.chr12      31100331        6       +       133275309       GGGG-C-C
s       hg38.chr12      11838   6       +       133275309       GGGG-C-C
s       hg38.chr15      11930   6       -       101991189       GGGG-C-C
s       hg38.chr16      11388   6       +       90338345        GGGG-C-C
s       hg38.chr2       128591799       6       -       242193529       GGGG-C-C
s       hg38.chr9       11820   6       +       138394717       GGGG-C-C
s       hg38.chrX       12546   6       -       156040895       GGGG-C-C
s       Tarsius_lariang.k141_1886670    1882    3       -       2088    ---G-C-C
s       Tarsius_lariang.k141_1886670    1879    3       -       2088    AGG-----
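
To find the blocks where the two outputs disagree without reading them by eye, a rough sketch like the following could count the `s` rows per genome in each block. It is not part of taffy; the helper name `rows_per_genome` is hypothetical, and it assumes the genome name is everything before the first `.` in the sequence name, which holds for the rows above but may not hold for every MAF.

```python
from collections import Counter


def rows_per_genome(maf_text):
    """Return one Counter per alignment block: genome name -> number of 's' rows.

    Assumption: the genome name is the part of the sequence name before
    the first '.', e.g. 'hg38.chr1' -> 'hg38'.
    """
    blocks = []
    current = None
    for line in maf_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "a":
            # An 'a' line starts a new alignment block.
            current = Counter()
            blocks.append(current)
        elif fields[0] == "s" and current is not None:
            genome = fields[1].split(".", 1)[0]
            current[genome] += 1
    return blocks


# Abbreviated copies of the two blocks shown above (one hg38 row kept each).
norm_block = """a
s hg38.chr1 11707 6 + 248956422 GGGGC-C
s Tarsius_lariang.k141_1886670 1879 6 - 2088 AGGGC-C
"""

sort_norm_block = """a
s hg38.chr1 11707 6 + 248956422 GGGG-C-C
s Tarsius_lariang.k141_1886670 1882 3 - 2088 ---G-C-C
s Tarsius_lariang.k141_1886670 1879 3 - 2088 AGG-----
"""

for name, text in (("norm", norm_block), ("sort.norm", sort_norm_block)):
    for index, counts in enumerate(rows_per_genome(text)):
        print(name, index, dict(counts))
```

Running the same function over the full `norm.maf` and `sort.norm.maf` and comparing the per-block counts should point at the blocks where Tarsius was split, assuming the block order is the same in both files.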
