Allow vcf-expression-annotator to work on multiple samples or work on a VCF that has previously been annotated #82

@wanqiangdehuoguo

For example, given a gene expression table with one column per sample:

$ head genetpm.tsv

  GeneID          N190533 T190533
ENSG00000000003  0.743   12.1  
ENSG00000000005  0.0232   0.115
ENSG00000000419 46.4     43.4  
ENSG00000000457  5.22     6.26 
ENSG00000000460  9.80     4.45 
ENSG00000000938  2.22    31.5  

190533.vep.vcf is a multi-sample VCF of somatic mutations:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  N190533 T190533
chr1    1041944 .       C       G       .       PASS AS_FilterStatus=SITE;AS_SB_TABLE=209,110|41,22;DP=397;ECNT=1;GERMQ=93;MBQ=20,20;MFRL=229,220;MMQ=60,60;MPOS=37;NALOD=2.01;NLOD=30.01;POPAF=6.00;ROQ=93;TLOD=136.09;CSQ=G|splice_polypyrimidine_tract_variant&intron_variant|LOW|AGRN|ENSG00000188157|Transcript|ENST00000379370.7|protein_coding||6/35|ENST00000379370.7:c.1178-12C>G|||||||||1||HGNC|HGNC:329|1|||MAGR......VVVGRHPLHLLEDAVTKPELRPCPTP      GT:AD:AF:DP:F1R2:F2R1:FAD:SB    0/0:142,0:9.708e-03:142:48,0:47,0:100,0:90,52,0,0       0/1:177,63:0.26 2:240:61,21:67,23:134,47:119,58,41,22

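Annotating the two samples one after the other fails on the second run, because the first run has already added the GX format header: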

$ vcf-expression-annotator --sample-name N190533 --id-column GeneID --expression-column N190533 --output-vcf 190533.vep1.vcf 190533.vep.vcf genetpm.tsv custom gene
WARNING:root:69 of 1300 genes did not have an expression entry for their gene id.
$ vcf-expression-annotator --sample-name T190533 --id-column GeneID --expression-column T190533 --output-vcf 190533.vep2.vcf 190533.vep1.vcf genetpm.tsv custom gene
Traceback (most recent call last):
  File "/home/sym/.conda/envs/pVACtools/bin/vcf-expression-annotator", line 8, in <module>
    sys.exit(main())
  File "/home/sym/.conda/envs/pVACtools/lib/python3.10/site-packages/vatools/vcf_expression_annotator.py", line 191, in main
    (vcf_reader, is_multi_sample) = create_vcf_reader(args)
  File "/home/sym/.conda/envs/pVACtools/lib/python3.10/site-packages/vatools/vcf_expression_annotator.py", line 90, in create_vcf_reader
    raise Exception("ERROR: VCF {} is already gene expression annotated. GX format header already exists.".format(args.input_vcf))
Exception: ERROR: VCF 190533.vep1.vcf is already gene expression annotated. GX format header already exists.

Passing both samples as comma-separated lists in a single run fails as well:

$ vcf-expression-annotator --sample-name N190533,T190533 --id-column GeneID --expression-column N190533,T190533 --output-vcf 190533.vep1.vcf 190533.vep.vcf genetpm.tsv custom gene
Traceback (most recent call last):
  File "/home/sym/.conda/envs/pVACtools/bin/vcf-expression-annotator", line 8, in <module>
    sys.exit(main())
  File "/home/sym/.conda/envs/pVACtools/lib/python3.10/site-packages/vatools/vcf_expression_annotator.py", line 191, in main
    (vcf_reader, is_multi_sample) = create_vcf_reader(args)
  File "/home/sym/.conda/envs/pVACtools/lib/python3.10/site-packages/vatools/vcf_expression_annotator.py", line 84, in create_vcf_reader
    raise Exception("ERROR: VCF {} does not contain a sample column for sample {}.".format(args.input_vcf, args.sample_name))
Exception: ERROR: VCF 190533.vep.vcf does not contain a sample column for sample N190533,T190533.
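To make the request concrete, here is a minimal sketch of the behaviour I am hoping for: one pass that adds a GX entry to every sample column. This is not the VAtools implementation. `load_expression` and `add_gx` are hypothetical helpers, the `gene|value` encoding of GX is my assumption, the record is a toy line rather than the real one above, and the gene IDs are passed in directly instead of being parsed from CSQ.

```python
import io


def load_expression(handle, id_column, sample_columns):
    """Read a whitespace-delimited table into {gene_id: {sample: value}}."""
    header = handle.readline().split()
    id_idx = header.index(id_column)
    col_idx = {sample: header.index(sample) for sample in sample_columns}
    table = {}
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        table[fields[id_idx]] = {s: fields[i] for s, i in col_idx.items()}
    return table


def add_gx(vcf_line, sample_names, gene_ids, table):
    """Append GX to FORMAT and to each sample column of one VCF record.

    Assumes sample_names is in the same order as the sample columns of
    the #CHROM header line, and that GX is encoded as gene|value
    (an assumption, not confirmed against VAtools).
    """
    fields = vcf_line.rstrip("\n").split("\t")
    fields[8] += ":GX"  # FORMAT is the 9th column
    for offset, sample in enumerate(sample_names):
        entries = [f"{g}|{table[g][sample]}" for g in gene_ids if g in table]
        # "." marks a missing value when no gene has an expression entry
        fields[9 + offset] += ":" + (",".join(entries) if entries else ".")
    return "\t".join(fields)


# Toy inputs modelled on the files in this issue
tsv = io.StringIO(
    "GeneID\tN190533\tT190533\n"
    "ENSG00000000003\t0.743\t12.1\n"
)
samples = ["N190533", "T190533"]
table = load_expression(tsv, "GeneID", samples)

record = "chr1\t100\t.\tC\tG\t.\tPASS\t.\tGT\t0/0\t0/1"
annotated = add_gx(record, samples, ["ENSG00000000003"], table)
print(annotated)
```

With something like this, each sample column would carry its own expression value, and a second run over an already annotated VCF would not be needed.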
