Dear Professor Smith,

I have been analyzing multiple morphological datasets with TreeSearch and have realized that when I try to estimate the support for each clade with [...]. I thought this might be a problem, so I also calculated (parsimony-based) site concordance factor values in IQ-TREE; these were quite different from those in TreeSearch and were generally lower.

Is this because TreeSearch's site concordance factor takes inapplicable characters into account? Or is there something wrong with the implementation in TreeSearch, or in IQ-TREE?

Below are the results of my tests using the dataset of Aria et al. (2015), which is a sample file in TreeSearch.

[Figure: Strict consensus tree with site concordance factor values calculated in TreeSearch]

[Figure: Strict consensus tree with site concordance factor values calculated in IQ-TREE]

Here's the folder used for the analysis. I would appreciate it if you could review this.
I've checked the maths in the TreeSearch [...].

My guess would be that, because IQ-TREE uses a random subsample of quartets where TreeSearch enumerates all quartets, its figures reflect the quartets it happened to sample. If IQ-TREE is configured with much larger datasets in mind, it may sample only a very small proportion of quartets, which in a small dataset like this could lead to statistically misleading numbers (e.g. the alarmingly low '7' near the base of the leanchoiliids). I don't often use IQ-TREE; perhaps you could check whether this is plausible by setting different random seeds or replicating the analysis?

Another possibility, which I have ruled out, is that this reflects a difference in how the QC value is averaged across sites. I've introduced this as an option to the user in #176.
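To illustrate the subsampling concern, here is a toy sketch in Python. It is not IQ-TREE's or TreeSearch's actual procedure: the list `outcomes`, the sample size of 20 and the function names are all made up for illustration. It simply compares a fraction computed over every quartet with the same fraction estimated from small random subsamples drawn with different seeds.

```python
import random

def concordance_fraction(outcomes):
    """Fraction of quartets scored as concordant (1) rather than not (0)."""
    return sum(outcomes) / len(outcomes)

def subsampled_fraction(outcomes, n_sample, seed):
    """Estimate the same fraction from a random subsample of the quartets."""
    rng = random.Random(seed)  # a fixed seed makes each run reproducible
    return concordance_fraction(rng.sample(outcomes, n_sample))

# Hypothetical data: 1000 quartets, of which 300 are concordant.
outcomes = [1] * 300 + [0] * 700

exact = concordance_fraction(outcomes)  # full enumeration: 0.3
estimates = [subsampled_fraction(outcomes, 20, seed) for seed in range(5)]

print("all quartets:", exact)
print("subsamples of 20, seeds 0-4:", estimates)
```

If the subsample estimates vary widely between seeds, sampling noise is a plausible explanation; if they agree with each other but not with the full enumeration, the difference must lie elsewhere.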



I've looked back over Minh et al. 2020, and think I see where the difference in behaviour lies.
The interpretation of a branch in Minh et al. (2020) is shown in their Fig. 1:
They interpret a branch as defining four clades, and thus only consider quartets that contain one representative of each surrounding subtree, A, B, C and D.
My implementation interprets a branch as defining two clades – i.e. I consider any quartet that contains two taxa from (AB), and two taxa from (CD).
Since we are sampling from a different subset of quartets, it makes sense that we obtain different values.
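The two interpretations can be compared with a small Python sketch. This is an illustration only, not the code of either package: the taxon labels, subtree sizes and function names below are hypothetical. It enumerates the quartets each interpretation would consider for one branch separating subtrees A and B from C and D.

```python
from itertools import combinations, product

def four_clade_quartets(A, B, C, D):
    """Quartets with exactly one taxon from each of the four subtrees."""
    return list(product(A, B, C, D))

def two_clade_quartets(A, B, C, D):
    """Quartets with any two taxa from (AB) and any two taxa from (CD)."""
    left = combinations(A + B, 2)
    right = combinations(C + D, 2)
    return [l + r for l, r in product(left, right)]

# Hypothetical subtrees around a single branch.
A, B = ["a1", "a2"], ["b1"]
C, D = ["c1", "c2", "c3"], ["d1"]

four = four_clade_quartets(A, B, C, D)
two = two_clade_quartets(A, B, C, D)

print(len(four))  # 2 * 1 * 3 * 1 = 6
print(len(two))   # C(3, 2) * C(4, 2) = 3 * 6 = 18

# Every four-clade quartet is also a two-clade quartet, but not vice versa.
print({frozenset(q) for q in four} <= {frozenset(q) for q in two})  # True
```

In this example the four-clade quartets are a strict subset of the two-clade ones: the two-clade set additionally includes quartets such as (a1, a2, c1, c2), which take both left-hand taxa from A and both right-hand taxa from C.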
I haven't yet wrapped my head around why the Minh et al. implementation would be preferable. I have s…