background noise too high for my data? #98

@zmz1988

Hello, thanks for the nice and user-friendly tool! I have ATAC-seq data (150 bp paired-end) from a small plant genome and have been trying HMMRATAC for a while. My problem is that I can't get the model right, even though I have tried many combinations of -l, -u, and even -z. One example is shown below.

Version:        1.2.10
Arguments Used:
-b      Sample.atac.forHMMRATAC.bam
-i     Sample.atac.forHMMRATAC.bam.bai
-g      Sample.genome
-o      Sample_60_25
--bedgraph      True
-u      60
-l      25
Fragment Expectation Maximum Done
Mean    50.0    StdDevs 20.0
Mean    190.63425024911237      StdDevs 62.502097145190604
Mean    400.60895549134216      StdDevs 50.93694387902569
Mean    729.022089951699        StdDevs 145.67453137113827
ScalingFactor   103.086173
Training Regions found and Zscore regions for exclusion found
Training Fragment Pileup completed
Kmeans Model:
HMM with 3 state(s)

State 0
  Pi: 0.3333333333333333
  Aij: 0.333 0.333 0.333
  Opdf: Multi-variate Gaussian distribution --- Mean: [ 0.531 1.169 1.604 4.339 ]

State 1
  Pi: 0.3333333333333333
  Aij: 0.333 0.333 0.333
  Opdf: Multi-variate Gaussian distribution --- Mean: [ 1.075 1.678 1.373 0.931 ]

State 2
  Pi: 0.3333333333333333
  Aij: 0.333 0.333 0.333
  Opdf: Multi-variate Gaussian distribution --- Mean: [ 2.349 4.864 4.519 4.93 ]

Model created and refined. See Can_60_25.model
Model:
HMM with 3 state(s)

State 0
  Pi: 0.3333333333333333
  Aij: 0.979 0.015 0.006
  Opdf: Multi-variate Gaussian distribution --- Mean: [ 0.501 0.855 0.831 1.056 ]

State 1
  Pi: 0.3333333333333333
  Aij: 0.012 0.985 0.003
  Opdf: Multi-variate Gaussian distribution --- Mean: [ 1.495 2.338 1.724 0.847 ]

State 2
  Pi: 0.3333333333333333
  Aij: 0.015 0.009 0.976
  Opdf: Multi-variate Gaussian distribution --- Mean: [ 1.065 1.737 2.032 2.792 ]

Genome split and subtracted masked regions
0 round viterbi done
37 round viterbi done
Total time (seconds)=   4519

I have tried at least ten different combinations of -l and -u, and none of them gives the ideal model described in your paper. After checking some of the log files posted in other issues, I realised that most people have mean values of 0 for state 0 (which is the background or starting model, if I understand correctly?). Does this mean that the background noise in my data is quite high?
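To make the comparison with other logs easier, here is a minimal sketch that pulls the refined-model state means out of the log above and expresses each state relative to the lowest one. Picking the "background" state as the one with the smallest summed mean is my own heuristic, not something HMMRATAC reports, and the function names are just for illustration.

```python
import re

# Refined-model lines copied from the log above (only the Opdf lines matter here).
LOG = """\
State 0
  Opdf: Multi-variate Gaussian distribution --- Mean: [ 0.501 0.855 0.831 1.056 ]
State 1
  Opdf: Multi-variate Gaussian distribution --- Mean: [ 1.495 2.338 1.724 0.847 ]
State 2
  Opdf: Multi-variate Gaussian distribution --- Mean: [ 1.065 1.737 2.032 2.792 ]
"""

MEAN_RE = re.compile(r"Mean:\s*\[([^\]]*)\]")


def parse_state_means(text):
    """Return one list of floats per 'Mean: [ ... ]' line, in order of appearance."""
    return [[float(v) for v in m.group(1).split()] for m in MEAN_RE.finditer(text)]


def lowest_state(means):
    """Index of the state with the smallest summed mean (assumed 'background' heuristic)."""
    return min(range(len(means)), key=lambda i: sum(means[i]))


means = parse_state_means(LOG)
bg = lowest_state(means)

for i, m in enumerate(means):
    # Ratio of each track mean to the same track in the lowest state.
    ratios = [round(x / b, 2) for x, b in zip(m, means[bg])]
    print(f"State {i}: means={m} ratio_to_state_{bg}={ratios}")
```

With my log, the lowest state is state 0, but its means are around 0.5 to 1.0 rather than 0, which is what prompted the question.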

I also attach an insert-size plot here (our data were generated through the purification of nuclei, and the insert sizes were calculated from the clean BAM after multiple filtering steps):
Can.dedup.clean.bam.insertsize.hist.pdf

Could you please give me some advice on how I could solve this problem with my data? Thanks a lot in advance!
