Merge delimiter for input file to define how to number columns #165

@nroak

I'd like to be able to define the column separator that bedtools merge uses when resolving the -c option. I have a bedj file whose 4th column contains characters that are automatically detected as column separators. I want those ignored and TAB alone used as the column separator. Below is an example file, followed by the output I expect and the output I actually get.

# Input file (generated from bedtools intersect)
chr1	4510001	4769999	{"color":"rgba(128,0,128,1.00)","exon":[[4510001,4519999],[4760001,4769999]],"name":"_n19_qBL1.7390461E-4"}	.	-1	-1		.	0
chr1	4850001	5099999	{"color":"rgba(128,0,128,1.00)","exon":[[4850001,4874999],[5075001,5099999]],"name":"_n59_qBL1.1236174E-9"}	chr1	4857814	4897909	Tcea1	40095
chr1	4850001	5099999	{"color":"rgba(128,0,128,1.00)","exon":[[4850001,4874999],[5075001,5099999]],"name":"_n59_qBL1.1236174E-9"}	chr1	5070018	5162529	Atp6v1h	29981
chr1	4850001	5099999	{"color":"rgba(128,0,128,1.00)","exon":[[4850001,4874999],[5075001,5099999]],"name":"_n59_qBL1.1236174E-9"}	chr1	4909576	5070285	Rgs20	160709
# Expected output with bedtools merge -c 8,9 -o collapse,collapse
chr1	4510001	5099999 Tcea1,Atp6v1h,Rgs20 40095,29981,160709
# Actual Output
chr1	4850001	5099999	{"color":"rgba(128	0	128	5099999]],5099999]],5099999]]	"name":"_n59_qBL1.1236174E-9"}	chr1	4857814	4897909	Tcea1	40095,"name":"_n59_qBL1.1236174E-9"}	chr1	5070018	5162529	Atp6v1h	29981,"name":"_n59_qBL1.1236174E-9"}	chr1	4909576	5070285	Rgs20	160709

As you can see, merge treated the comma as a delimiter instead of TAB. An option to define this input delimiter would be extremely useful.
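To make the requested behaviour concrete, here is a minimal Python sketch of a TAB-only merge with collapse. It is not bedtools code and not a proposed patch, just an illustration: the function name `merge_collapse` and its `cols` parameter are hypothetical, the input is assumed to be sorted by chromosome and start, and overlapping or book-ended intervals are merged (my assumption about what a default merge should do).

```python
def merge_collapse(lines, cols=(8, 9)):
    """Merge overlapping intervals from sorted, TAB-separated lines and
    collapse the given 1-based columns into comma-joined lists.

    Hypothetical helper for illustration only; not part of bedtools.
    """
    merged = []  # each entry: [chrom, start, end, [list of values per column]]
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        # Split on TAB only, so commas inside the 4th (JSON) column are untouched.
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # Use "." as a placeholder when a requested column is missing.
        values = [fields[c - 1] if c <= len(fields) else "." for c in cols]
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start <= last[2]:
            # Overlapping or book-ended: extend the interval and collect values.
            last[2] = max(last[2], end)
            for bucket, value in zip(last[3], values):
                bucket.append(value)
        else:
            merged.append([chrom, start, end, [[v] for v in values]])
    return [
        "\t".join([chrom, str(start), str(end)] + [",".join(b) for b in buckets])
        for chrom, start, end, buckets in merged
    ]


# Usage with three overlapping rows shaped like the input above
# (the JSON column is shortened here for readability).
rows = [
    'chr1\t4850001\t5099999\t{"color":"rgba(128,0,128,1.00)"}\tchr1\t4857814\t4897909\tTcea1\t40095',
    'chr1\t4850001\t5099999\t{"color":"rgba(128,0,128,1.00)"}\tchr1\t5070018\t5162529\tAtp6v1h\t29981',
    'chr1\t4850001\t5099999\t{"color":"rgba(128,0,128,1.00)"}\tchr1\t4909576\t5070285\tRgs20\t160709',
]
for out in merge_collapse(rows):
    print(out)
# prints: chr1	4850001	5099999	Tcea1,Atp6v1h,Rgs20	40095,29981,160709
```

Because the line is split on TAB alone, columns 8 and 9 resolve to the gene name and the final numeric column, which is the column numbering I would expect -c to use.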
