
@tmaklin (Owner) commented Mar 4, 2025

  • Add a -f/--format option to choose the output format (.aln by default; .vcf is also supported).
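A minimal sketch of how the -f/--format value might be mapped to an output mode, assuming the accepted values are aln and vcf (optionally written with a leading dot). OutputFormat and resolve_format are hypothetical names used only for illustration, not kbo-cli's actual types, and the sketch uses plain std parsing rather than whatever argument parser kbo-cli uses.

```rust
use std::str::FromStr;

/// Hypothetical enum for the value passed to -f/--format.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum OutputFormat {
    /// Concatenated alignment (.aln); used when -f/--format is not given.
    #[default]
    Aln,
    /// Per-contig SNP records (.vcf).
    Vcf,
}

impl FromStr for OutputFormat {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Accept "vcf", ".vcf", "VCF", ... (assumed spellings, not confirmed by the PR).
        match s.trim_start_matches('.').to_ascii_lowercase().as_str() {
            "aln" => Ok(OutputFormat::Aln),
            "vcf" => Ok(OutputFormat::Vcf),
            other => Err(format!("unsupported output format: {other}")),
        }
    }
}

/// Resolve an optional command-line value, falling back to .aln when absent.
pub fn resolve_format(arg: Option<&str>) -> Result<OutputFormat, String> {
    match arg {
        Some(value) => value.parse(),
        None => Ok(OutputFormat::default()),
    }
}
```

Keeping the default on the enum itself means "no flag given" and "-f aln" resolve through the same code path.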

The .vcf files written by kbo-cli have this format:

##fileformat=VCFv4.4
##contig=<ID=30224_1#305_1,length=4971108>
##fileDate=20250304
##source=kbo-cli v0.1.1
##reference=30224_1#305_1.fna
##phasing=none
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	unknown
30224_1#305_1	28660	.	C	T	.	.	.	GT	1
30224_1#305_1	87002	.	A	G	.	.	.	GT	1
30224_1#305_1	169420	.	G	A	.	.	.	GT	1

When writing .vcf, kbo-cli always processes and reports results for each reference contig separately. This differs from .aln output, where the contigs are also processed separately but the results are concatenated.
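The difference between the two output modes can be illustrated with a small std-only sketch. It assumes, hypothetically, that each reference contig comes with an aligned query sequence of the same length, with '-' at positions that were not mapped. A SNP is taken to be a position where both sequences carry A, C, G or T and they differ, reported with a 1-based POS as in VCF. Snp, snps_per_contig and concatenated_alignment are illustrative names, not kbo-cli functions.

```rust
/// One SNP call; field names are hypothetical, POS is 1-based as in VCF.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Snp {
    pub contig: String,
    pub pos: usize,
    pub ref_base: u8,
    pub alt_base: u8,
}

fn is_nucleotide(b: u8) -> bool {
    matches!(b.to_ascii_uppercase(), b'A' | b'C' | b'G' | b'T')
}

/// Compare one reference contig with the query aligned to it and collect
/// positions where both carry a nucleotide and they differ.
/// Gaps ('-') and ambiguous bases are skipped.
pub fn snps_for_contig(contig: &str, reference: &[u8], aligned: &[u8]) -> Vec<Snp> {
    assert_eq!(reference.len(), aligned.len(), "alignment must match contig length");
    reference
        .iter()
        .zip(aligned)
        .enumerate()
        .filter(|&(_, (&r, &q))| is_nucleotide(r) && is_nucleotide(q) && !r.eq_ignore_ascii_case(&q))
        .map(|(i, (&r, &q))| Snp {
            contig: contig.to_string(),
            pos: i + 1, // VCF positions are 1-based
            ref_base: r.to_ascii_uppercase(),
            alt_base: q.to_ascii_uppercase(),
        })
        .collect()
}

/// .vcf-style reporting: results stay grouped per reference contig.
pub fn snps_per_contig(contigs: &[(&str, &[u8], &[u8])]) -> Vec<Vec<Snp>> {
    contigs.iter().map(|&(name, r, q)| snps_for_contig(name, r, q)).collect()
}

/// .aln-style reporting: per-contig alignments are concatenated into one sequence.
pub fn concatenated_alignment(contigs: &[(&str, &[u8], &[u8])]) -> Vec<u8> {
    contigs.iter().flat_map(|&(_, _, q)| q.iter().copied()).collect()
}
```

Both functions consume the same per-contig results; only the final reporting step differs, which matches the description above of contigs always being processed separately.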

The .vcf header and records are built using noodles_vcf.
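To make the column layout of the example output explicit, it can be reproduced with plain string formatting. This sketch deliberately does not use noodles_vcf (kbo-cli itself does) and only mirrors the lines shown above: ID, QUAL, FILTER and INFO are written as '.', and each SNP gets a haploid genotype GT=1 in the single sample column named unknown. ContigSnps and write_vcf are hypothetical names.

```rust
use std::fmt::Write;

/// Hypothetical per-contig input; kbo-cli's own types are not shown in this PR.
pub struct ContigSnps<'a> {
    pub id: &'a str,
    pub length: usize,
    /// (1-based POS, REF, ALT)
    pub snps: &'a [(usize, char, char)],
}

pub fn write_vcf(contig: &ContigSnps<'_>, file_date: &str, source: &str, reference: &str) -> String {
    let mut out = String::new();
    // Meta-information lines, in the order used by kbo-cli's example output.
    // Writing to a String cannot fail, so unwrap() is safe here.
    writeln!(out, "##fileformat=VCFv4.4").unwrap();
    writeln!(out, "##contig=<ID={},length={}>", contig.id, contig.length).unwrap();
    writeln!(out, "##fileDate={file_date}").unwrap();
    writeln!(out, "##source={source}").unwrap();
    writeln!(out, "##reference={reference}").unwrap();
    writeln!(out, "##phasing=none").unwrap();
    // Header line: eight fixed columns, FORMAT, then one sample column.
    writeln!(out, "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tunknown").unwrap();
    for &(pos, r, a) in contig.snps {
        // Missing values are '.', genotype 1 = haploid ALT allele.
        writeln!(out, "{}\t{}\t.\t{}\t{}\t.\t.\t.\tGT\t1", contig.id, pos, r, a).unwrap();
    }
    out
}
```

A library such as noodles_vcf adds validation and typed header/record builders on top of this; the sketch only shows what ends up in the file.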

Caveats:

  • The .vcf output currently contains only SNPs.
  • Reporting INDELs should be algorithmically possible but needs further research.
  • map does not handle SNPs that are very close to each other (much less than k bases apart); this might be resolved by traversing the SBWT whenever short gaps are encountered.
  • The default map options perform poorly when the reference is very fragmented; better results can be obtained by adjusting -k and --max-error-prob.
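One way to see why closely spaced SNPs are awkward for any exact k-mer matching approach: every k-mer that overlaps a SNP cannot match the reference exactly. A single SNP therefore produces a run of k mismatching k-mers, while two SNPs d ≤ k bases apart produce one merged run of d + k mismatching k-mers instead of two separate runs. The function below only illustrates this counting argument; it is not how map works internally, and mismatching_kmer_runs is a hypothetical name.

```rust
/// For a query of length `n` and SNP positions (0-based), return maximal runs of
/// k-mer start positions whose k-mer overlaps at least one SNP, i.e. k-mers that
/// cannot match the reference exactly. Runs are half-open intervals [start, end).
pub fn mismatching_kmer_runs(n: usize, k: usize, snps: &[usize]) -> Vec<(usize, usize)> {
    let mut runs: Vec<(usize, usize)> = Vec::new();
    if k == 0 || n < k {
        return runs;
    }
    // Valid k-mer starts are 0..=n-k.
    for start in 0..=(n - k) {
        // The k-mer starting here covers positions [start, start + k).
        let hit = snps.iter().any(|&p| p >= start && p < start + k);
        if !hit {
            continue;
        }
        match runs.last_mut() {
            // Extend the current run if this start directly follows it.
            Some((_, end)) if *end == start => *end = start + 1,
            // Otherwise a new run of mismatching k-mers begins.
            _ => runs.push((start, start + 1)),
        }
    }
    runs
}
```

With n = 30 and k = 5, SNPs at positions 10 and 20 give two separate runs, (6, 11) and (16, 21), whereas SNPs at 10 and 12 give a single run (6, 13) with no exactly matching k-mer between them.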

@tmaklin (Owner, Author) commented Mar 20, 2025

Changes merged into #9

@tmaklin closed this Mar 20, 2025