Hello,
I was interested in using your repo and running your benchmark suite, but some of it still uses pandas functions or mentions pandas (see the `rg` output below), and some of the documentation on how to run it is out of date (the "-s" option doesn't seem to exist).
I also tried a test sequence and found that an exact-match search (`max_mismatches=0`) returns nothing for a subsequence of it. A reproducible script is below.
This seems to come from https://github.com/IEDB/PEPMatch/blob/master/pepmatch/matcher.py#L308, because `target_kmers` does not include duplicates.
"""
stdout:
Matching peptides: 0%| | 0/1 [00:00<?, ?peptide/s]
Missing preprocessed file or table. Creating table for k=5. This may take a bit...
Matching peptides: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1.88peptide/s]
None
"""
from pepmatch import Matcher

if __name__ == "__main__":
    seq_fn = "keratin.fasta"
    # Human keratin, type I cytoskeletal 9 (UniProt P35527), written as a one-entry FASTA file
    keratin_seq = """>sp|P35527|K1C9_HUMAN Keratin, type I cytoskeletal 9 OS=Homo sapiens OX=9606 GN=KRT9 PE=1 SV=3
MSCRQFSSSYLSRSGGGGGGGLGSGGSIRSSYSRFSSSGGGGGGGRFSSSSGYGGGSSRV
CGRGGGGSFGYSYGGGSGGGFSASSLGGGFGGGSRGFGGASGGGYSSSGGFGGGFGGGSG
GGFGGGYGSGFGGFGGFGGGAGGGDGGILTANEKSTMQELNSRLASYLDKVQALEEANND
LENKIQDWYDKKGPAAIQKNYSPYYNTIDDLKDQIVDLTVGNNKTLLDIDNTRMTLDDFR
IKFEMEQNLRQGVDADINGLRQVLDNLTMEKSDLEMQYETLQEELMALKKNHKEEMSQLT
GQNSGDVNVEINVAPGKDLTKTLNDMRQEYEQLIAKNRKDIENQYETQITQIEHEVSSSG
QEVQSSAKEVTQLRHGVQELEIELQSQLSKKAALEKSLEDTKNRYCGQLQMIQEQISNLE
AQITDVRQEIECQNQEYSLLLSIKMRLEKEIETYHNLLEGGQEDFESSGAGKIGLGGRGG
SGGSYGRGSRGGSGGSYGGGGSGGGYGGGSGSRGGSGGSYGGGSGSGGGSGGGYGGGSGG
GHSGGSGGGHSGGSGGNYGGGSGSGGGSGGGYGGGSGSRGGSGGSHGGGSGFGGESGGSY
GGGEEASGSGGGYGGGSGKSSHS"""
    with open(seq_fn, "w") as fh:
        fh.write(keratin_seq)
    # Exact subsequence of KRT9 (residues 15-32), so an exact search should find it
    seq = ["GGGGGGGLGSGGSIRSSY"]
    m = Matcher(query=seq, proteome_file=seq_fn, max_mismatches=0, k=5)
    print(m.match())  # prints None (see stdout above) instead of the expected match
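To illustrate how dropping duplicate k-mers could produce exactly this symptom, here is a minimal, hypothetical sketch (not PEPMatch's actual code) of a k-mer index matcher that uses offset voting: each query k-mer at offset `j` votes for protein start `pos - j`, and a start counts as an exact match only if it collects one vote per overlapping query k-mer. The query above has 14 overlapping 5-mers but only 12 distinct ones (`GGGGG` occurs at offsets 0, 1 and 2), so once duplicates are collapsed the true start can collect at most 12 of the 14 required votes and the hit is silently lost. The function names and the `dedupe` flag are made up for this illustration.

```python
from collections import Counter, defaultdict


def build_index(protein: str, k: int) -> dict[str, list[int]]:
    """Map each k-mer of the protein to every 0-based position where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(len(protein) - k + 1):
        index[protein[pos:pos + k]].append(pos)
    return index


def exact_match_starts(query: str, protein: str, k: int, dedupe: bool = False) -> list[int]:
    """Return 0-based starts where `query` occurs exactly, using offset voting.

    dedupe=True is a hypothetical buggy variant that keeps only the first
    offset of each distinct query k-mer, i.e. a k-mer collection without duplicates.
    """
    if len(query) < k:
        raise ValueError("query must be at least k residues long")
    index = build_index(protein, k)
    # Every overlapping k-mer of the query, paired with its offset (duplicates kept).
    kmers = [(j, query[j:j + k]) for j in range(len(query) - k + 1)]
    if dedupe:
        first_offset: dict[str, int] = {}
        for j, kmer in kmers:
            first_offset.setdefault(kmer, j)
        kmers = [(j, kmer) for kmer, j in first_offset.items()]
    required = len(query) - k + 1  # one vote per overlapping query k-mer
    votes: Counter = Counter()
    for j, kmer in kmers:
        for pos in index.get(kmer, []):
            votes[pos - j] += 1  # k-mer is consistent with a match starting at pos - j
    return sorted(start for start, n in votes.items() if n == required)


if __name__ == "__main__":
    # First line of the KRT9 FASTA entry above
    protein = "MSCRQFSSSYLSRSGGGGGGGLGSGGSIRSSYSRFSSSGGGGGGGRFSSSSGYGGGSSRV"
    query = "GGGGGGGLGSGGSIRSSY"
    print(exact_match_starts(query, protein, k=5))               # [14]
    print(exact_match_starts(query, protein, k=5, dedupe=True))  # []
```

Keeping every (offset, k-mer) pair, even when the k-mer string repeats, is what makes the vote count line up with `len(query) - k + 1`; low-complexity stretches like the glycine runs in keratin are where the two versions diverge.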
% rg 'pandas|pd\.'
benchmarking/benchmarking.py
6:import pandas as pd
23:) -> pd.DataFrame:
49: benchmark_df = pd.DataFrame(columns = columns)
115: expected_df = pd.read_csv(inputs['expected'], sep='\t')
128: new_df = pd.DataFrame([benchmark_stats], columns = columns)
129: benchmark_df = pd.concat([benchmark_df, new_df], ignore_index = True)
137:def recall(results_df: pd.DataFrame, expected_df: pd.DataFrame) -> float:
142: results: pandas dataframe with results from the benchmarking.
143: expected_df: pandas dataframe with expected matches for the benchmarking."""
151: matched_rows = pd.merge(results, expected, how='inner', on=columns)
187: master_df['Searching Time (s)'] = pd.to_numeric(master_df['Searching Time (s)'])
README.md
24:- [Pandas](https://pandas.pydata.org/)
141:If specifying `dataframe`, the ```match()``` method will return a pandas dataframe which can be stored as a variable:
benchmarking/methods/blast.py
5:import pandas as pd
74: df = pd.read_csv(
157: return pd.DataFrame(all_matches, columns = columns)
benchmarking/methods/z.py
3:import pandas as pd
114: return pd.DataFrame(all_matches, columns = columns)
benchmarking/methods/mmseqs2.py
6:import pandas as pd
118: return pd.DataFrame(all_matches, columns = columns)
benchmarking/methods/horspool.py
3:import pandas as pd
106: return pd.DataFrame(all_matches, columns = columns)
benchmarking/methods/boyer_moore.py
3:import pandas as pd
280: return pd.DataFrame(all_matches, columns = columns)
benchmarking/methods/diamond.py
4:import pandas as pd
111: return pd.DataFrame(all_matches, columns = columns)
benchmarking/methods/NmerMatch.py
9:import pandas as pd
254: results_df = pd.DataFrame([s.split(',') for s in results], columns = columns)
benchmarking/methods/knuth_morris_pratt.py
3:import pandas as pd
125: return pd.DataFrame(all_matches, columns = columns)
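If the goal is to drop pandas from the benchmarks entirely, something like `recall()` can be done with the standard library. The sketch below assumes (from the `pd.merge(..., how='inner', on=columns)` call listed above) that recall is the share of expected rows that also appear in the results when compared on the given columns, and that tables are read with `csv.DictReader(..., delimiter="\t")` as a stand-in for `pd.read_csv(..., sep='\t')`. Unlike an inner merge, it counts each expected row at most once, so duplicate result rows do not inflate the score. `read_tsv` and the column names are hypothetical.

```python
import csv
import io


def read_tsv(text: str) -> list[dict[str, str]]:
    """Parse tab-separated text with a header row into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def recall(results: list[dict[str, str]], expected: list[dict[str, str]], columns: list[str]) -> float:
    """Fraction of expected rows whose values on `columns` also occur in results."""
    if not expected:
        raise ValueError("expected must contain at least one row")
    # Key each result row on the comparison columns, like the merge's `on=columns`.
    result_keys = {tuple(row[c] for c in columns) for row in results}
    matched = sum(tuple(row[c] for c in columns) in result_keys for row in expected)
    return matched / len(expected)


if __name__ == "__main__":
    # Hypothetical columns and rows, for illustration only.
    columns = ["query", "protein_id", "start"]
    expected = read_tsv("query\tprotein_id\tstart\nGGGGGGGLGSGGSIRSSY\tP35527\t15\nSRSGG\tP35527\t12\n")
    results = read_tsv("query\tprotein_id\tstart\nGGGGGGGLGSGGSIRSSY\tP35527\t15\n")
    print(recall(results, expected, columns))  # 0.5
```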