This repository was archived by the owner on Jul 16, 2021. It is now read-only.

DBSCAN clusters of size < min_points being returned #196

@NatPRoach

Description


Hello, I've been using your implementation of DBSCAN and noticed that it has been outputting clusters smaller than the minimum size (`min_points`) I specified at initialization. The relevant section of code I've been using to inspect the clusters is below:

    let mut db = DBSCAN::new(1.5, 5); // eps = 1.5, min_points = 5
    db.train(&similarity_matrix).unwrap();
    let cluster_assignments = db.clusters().unwrap();

    // Group row indices by cluster id; `None` marks a noise point.
    let mut clusters: Vec<Vec<usize>> = Vec::new();
    for (i, assignment) in cluster_assignments.iter().enumerate() {
        match *assignment {
            Some(val) => {
                println!("read {} {}: cluster {}", i, read_ids[i], val);
                // Grow the list if this cluster id has not been seen yet,
                // so ids that arrive out of order cannot cause an index panic.
                if clusters.len() <= val {
                    clusters.resize(val + 1, Vec::new());
                }
                clusters[val].push(i);
            }
            None => println!("read {} {}: cluster {}", i, read_ids[i], -1),
        }
    }
    for (i, cluster) in clusters.iter().enumerate() {
        println!("Cluster {}, size {}:", i, cluster.len());
        for &index in cluster {
            println!(">{}", read_ids[index]);
            println!("{}", String::from_utf8(seqs[index].clone()).unwrap());
        }
    }

With this code I'm getting clusters with fewer than 5 members, in some cases only 1 or 2.
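For comparison, in textbook DBSCAN `min_points` bounds the size of a core point's eps-neighbourhood rather than the size of the final cluster: a border point within `eps` of core points from two clusters stays with whichever cluster reaches it first, so a later cluster can end up with fewer than `min_points` members. The sketch below is a standalone 1-D reimplementation written only for this comparison, not rusty-machine's code; `dbscan_1d` and `undersized_clusters` are hypothetical names, and it assumes `min_points` counts the point itself and that neighbours satisfy `distance <= eps`.

```rust
use std::collections::BTreeMap;

/// Textbook DBSCAN on 1-D points (standalone sketch, not rusty-machine's code).
/// A point is "core" if at least `min_points` points, itself included,
/// lie within `eps` of it. `None` marks noise.
fn dbscan_1d(points: &[f64], eps: f64, min_points: usize) -> Vec<Option<usize>> {
    let neighbours = |i: usize| -> Vec<usize> {
        (0..points.len())
            .filter(|&j| (points[i] - points[j]).abs() <= eps)
            .collect()
    };
    let mut labels: Vec<Option<usize>> = vec![None; points.len()];
    let mut visited = vec![false; points.len()];
    let mut next_cluster = 0;

    for i in 0..points.len() {
        if visited[i] {
            continue;
        }
        visited[i] = true;
        let seeds = neighbours(i);
        if seeds.len() < min_points {
            continue; // noise for now; may later be claimed as a border point
        }
        // i is a core point: start a new cluster and expand from it.
        labels[i] = Some(next_cluster);
        let mut queue = seeds;
        while let Some(j) = queue.pop() {
            // Points already claimed by an earlier cluster keep their label;
            // unlabelled points (unvisited or earlier noise) join this cluster.
            if labels[j].is_none() {
                labels[j] = Some(next_cluster);
            }
            if !visited[j] {
                visited[j] = true;
                let n = neighbours(j);
                if n.len() >= min_points {
                    queue.extend(n); // j is also core: keep expanding
                }
            }
        }
        next_cluster += 1;
    }
    labels
}

/// Hypothetical helper: (cluster id, size) for every cluster smaller than `min_points`.
fn undersized_clusters(labels: &[Option<usize>], min_points: usize) -> Vec<(usize, usize)> {
    let mut sizes: BTreeMap<usize, usize> = BTreeMap::new();
    for &id in labels.iter().flatten() {
        *sizes.entry(id).or_insert(0) += 1;
    }
    sizes.into_iter().filter(|&(_, size)| size < min_points).collect()
}

fn main() {
    // Toy data: 1.0 is a border point within eps of both core points 0.0 and 2.0.
    let points = [-1.0, -0.5, 0.0, 1.0, 2.0, 2.5, 3.0];
    let labels = dbscan_1d(&points, 1.0, 4);
    println!("{:?}", labels);
    for (id, size) in undersized_clusters(&labels, 4) {
        println!("cluster {} has {} members (min_points = 4)", id, size);
    }
}
```

On this toy data cluster 0 claims the shared border point 1.0 first, so cluster 1 keeps only 3 members even though `min_points` is 4. The same size check could be run on the crate's output by collecting the labels first, e.g. `let labels: Vec<Option<usize>> = cluster_assignments.iter().cloned().collect();`. The sketch does not settle whether the clusters of 1 or 2 reported above come from this border-point effect or from a bug in the implementation.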

Metadata


Assignees

No one assigned

    Labels

    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
