Description
I'm trying to use `sumstats.py lift` to lift hg19 SNPs in 5 GWAS sumstats files over to hg38. I have already run `sumstats.py csv` to standardise these files:
SNP CHR BP PVAL A1 A2 N Z OR BETA SE
rs11579922 1 1036860 .1662 A C 50914 -1.3868004 .97278 -.02759733 .0199
rs11579015 1 1036959 .1067 T C 49514 -1.6133769 .96435 -.03630098 .0225
rs11260592 1 1037303 .1716 T C 50914 -1.3683987 .97287 -.02750481 .0201
rs11260593 1 1037313 .169 A G 50914 -1.3730014 .97278 -.02759733 .0201
rs66622470 1 1038088 .1659 C G 50914 1.3867192 1.02798 .02759571 .0199
However, I'm getting the following error for 2 out of the 5 files so far - the others are still running:
Traceback (most recent call last):
  File "python_convert/sumstats.py", line 2212, in <module>
    args.func(args, log)
  File "python_convert/sumstats.py", line 1375, in make_lift
    df.loc[index, cols.CHR] = int(lifted[0][0][3:])
ValueError: invalid literal for int() with base 10: '2_KI270773v1_alt'
Analysis finished at Tue May 18 18:02:06 2021
Total time elapsed: 2.0h:7.0m:48.60999999999967s
This appears to relate to entries in the `hg19ToHg38.over.chain.gz` file, as there are no alt_chrs in the original GWAS sumstats files. There are 114 alt_chrs in total.
I'm wondering if there is a way around this, i.e. can I add a parameter to ignore or otherwise deal with these loci? What exactly does `--keep-bad-snps` do? I'm reluctant to use it without fully understanding what it does.
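To make the question concrete, here is a minimal sketch of the behaviour I have in mind. It is not code from sumstats.py: `parse_primary_chrom` is a hypothetical helper, and I am only assuming from the traceback that the lifted chromosome arrives as a string such as `chr2` or `chr2_KI270773v1_alt`. Instead of calling `int()` on everything after the `chr` prefix, it returns `None` for any name that is not a plain numbered chromosome, so the caller could skip or flag that SNP.

```python
import re

# Hypothetical helper, not part of sumstats.py.
# Matches only plain numbered chromosomes such as "chr2" or "chr17".
_PRIMARY_CHROM = re.compile(r"^chr(\d+)$")


def parse_primary_chrom(name):
    """Return the chromosome number for names like 'chr2', else None.

    Alt contigs such as 'chr2_KI270773v1_alt' (and, in this sketch,
    non-numeric names such as 'chrX') yield None instead of raising
    ValueError.
    """
    match = _PRIMARY_CHROM.match(name)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    for name in ["chr2", "chr2_KI270773v1_alt", "chrX"]:
        print(name, parse_primary_chrom(name))
```

Whether a SNP that lifts to an alt contig should be dropped or kept with a missing position is exactly the part I am unsure about.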
Interestingly, this error does not arise when I use the standard liftOver tool, but that requires generating BED files first. sumstats.py would be the neatest option for me.
Here is my code:
rule lift_over:
    input: SCRATCH + GWAS_DIR + "GWAS_sumstats_standardised/{GWAS}_hg19_withZ_sumstats.tsv"
    output: SCRATCH + GWAS_DIR + "GWAS_sumstats_standardised/{GWAS}_hg38_sumstats.tsv"
    message: "Formatting {input} sumstats"
    log: SCRATCH + "logs/lift_over/{GWAS}_hg38.log"
    params: SCRATCH + GWAS_DIR + "hg19ToHg38.over.chain.gz"
    shell:
        """
        python python_convert/sumstats.py lift \
            --sumstats {input} \
            --out {output} \
            --chain-file {params} \
            --log {log}
        """
I could also remove these entries from the chain file, but I thought I'd ask if there is a way to deal with them before proceeding.
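For reference, this is roughly how I would strip those entries, as a sketch only. It assumes the standard UCSC chain layout, where each chain starts with a header line `chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id` followed by its alignment lines. The function names and the output filename are my own placeholders, and I have not checked how sumstats.py then treats SNPs that no longer map.

```python
import gzip


def drop_alt_chains(lines):
    """Yield chain-file lines, skipping every chain that involves an alt contig.

    A chain is skipped when its reference name (header field 2) or its
    query name (header field 7) ends in '_alt'. The alignment lines and
    the blank separator that follow a skipped header are skipped too.
    """
    keep = True
    for line in lines:
        if line.startswith("chain"):
            fields = line.split()
            keep = not (fields[2].endswith("_alt") or fields[7].endswith("_alt"))
        if keep:
            yield line


def filter_chain_file(src, dst):
    """Write a copy of the gzipped chain file `src` to `dst` without alt chains."""
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        fout.writelines(drop_alt_chains(fin))


# Example call; the output name is a placeholder:
# filter_chain_file("hg19ToHg38.over.chain.gz", "hg19ToHg38.noalt.over.chain.gz")
```

This only removes `_alt` contigs. Other non-numeric targets would presumably still trip the same `int()` call.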
Many thanks.