Feature/tree sitter python subprocess syscalls #195
base: main
Conversation
_ => continue,
};

if let Ok(libs) = query_db(&func_name) {
Question for all: Do we want only matches that exist in the databases?
I would tend to think we'd want all the matches, even if they don't exist in the database.
In some cases that might even be more interesting: "What's this call to this random program I haven't heard of <xyz>?"
The C++ parser (for #include statements, not sys calls) does include results that it doesn't find in the database; they just result in an entry with an empty list for "no matches found", e.g. ("example_included_file.h", [])
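To illustrate the "keep every match" behavior described above, here is a minimal sketch in Python rather than the parser's Rust. The names `lookup_all`, `fake_query_db`, and `FAKE_DB` are hypothetical, and the database contents are made up for the example; the only point is that a name with no database hit still gets an entry with an empty list.

```python
def lookup_all(names, query_db):
    """Map every name to its database matches, keeping misses as []."""
    results = {}
    for name in names:
        try:
            matches = query_db(name)
        except LookupError:
            # No database entry: record the name anyway, with no matches.
            matches = []
        results[name] = list(matches)
    return results


# Hypothetical stand-in for the real database query (made-up contents).
FAKE_DB = {"echo": ["coreutils"]}


def fake_query_db(name):
    # Raises KeyError (a LookupError subclass) when the name is unknown.
    return FAKE_DB[name]


report = lookup_all(["echo", "xyz"], fake_query_db)
```

With this shape, an unknown program such as "xyz" shows up in the report with an empty list instead of being dropped.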
Ok, great point. I will make that change!
src/parsing/python_parser.rs
fn process_files<T>(&self, file_paths: T) -> HashMap<PythonImport, Vec<Vec<String>>>
fn is_likely_syscall(module: &str, func: &str) -> bool {
let combined = format!("{}.{}", module, func);
let predefined = ["os.system", "subprocess.run", "os.run"];
I can't find os.run in https://docs.python.org/3/library/os.html.
Huh, yeah great point I can't find it now either. I'll take it out and also edit the test file I created with that. Thanks for catching that!
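One way to guard against entries like this is to check each predefined name against the Python standard library itself. The sketch below is only an illustration; `existing_calls` is a hypothetical helper and not part of this PR.

```python
import importlib


def existing_calls(candidates):
    """Return the dotted names that resolve to a callable module attribute."""
    found = []
    for dotted in candidates:
        module_name, _, attr = dotted.rpartition(".")
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # The module itself does not exist, so the name cannot be valid.
            continue
        if callable(getattr(module, attr, None)):
            found.append(dotted)
    return found


valid = existing_calls(["os.system", "subprocess.run", "os.run"])
```

Run against the list from the diff, `os.run` is filtered out because the `os` module has no such attribute.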
7d78d0b to 3ff6d3b
3ff6d3b to 8cb6a82
question: Are we looking up the command names in one of the Ubuntu datasets? I'm seeing packages that seem to match Python package import names rather than ones that would install the native (Ubuntu) commands that get run from a shell.
os.system("echo from os.system")

# Should match: "subprocess.run"
subprocess.run(["echo", "from subprocess.run"])
issue: It looks like this test case is giving us an extraneous result:
OS(Application("from")):
[]
When the argument to subprocess.run is a list, only the first item is the command to run; every item after it is just an argument to that command. When the argument to subprocess.run is a string, I think it behaves the same as os.system (in most cases). The underlying cause seems to be that, when the argument is a list, we currently treat each item in the list as if it were a command.
suggestion: Some other functions that can be treated the same way as subprocess.run for spawning a subprocess are: subprocess.Popen, subprocess.call, subprocess.check_call, and subprocess.check_output
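The list-versus-string rule above can be sketched with Python's standard `ast` module. This is an illustration of the intended behavior, not the tree-sitter implementation in this PR; `SPAWN_FUNCS` and `extract_commands` are hypothetical names. It assumes string arguments are handled like os.system, which matches `shell=True`; without `shell=True`, POSIX treats the whole string as the program name, so this is a simplification.

```python
import ast
import shlex

# Functions assumed to spawn a subprocess (taken from the review suggestion).
SPAWN_FUNCS = {
    "os.system",
    "subprocess.run",
    "subprocess.Popen",
    "subprocess.call",
    "subprocess.check_call",
    "subprocess.check_output",
}


def _literal_str(node):
    """Return the value of a string literal node, or None for anything else."""
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        return node.value
    return None


def extract_commands(source):
    """Collect the command name from each recognized module.func(...) call."""
    commands = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call) or not node.args:
            continue
        func = node.func
        if not (isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name)):
            continue
        if f"{func.value.id}.{func.attr}" not in SPAWN_FUNCS:
            continue
        first = node.args[0]
        if isinstance(first, (ast.List, ast.Tuple)):
            # List form: only element 0 is the command; the rest are arguments.
            text = _literal_str(first.elts[0]) if first.elts else None
            if text:
                commands.append(text)
        else:
            # String form: the first shell token is the command.
            text = _literal_str(first)
            try:
                tokens = shlex.split(text) if text else []
            except ValueError:
                tokens = []  # e.g. unbalanced quotes; skip rather than guess
            if tokens:
                commands.append(tokens[0])
    return commands


sample = '''
os.system("echo from os.system")
subprocess.run(["echo", "from subprocess.run"])
subprocess.check_output("ls -l", shell=True)
'''
found = extract_commands(sample)
```

On the sample, the second list element ("from subprocess.run") is never reported as a command, which is the behavior the issue above asks for.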
Summary
This PR closes issue #125. If merged, this pull request will add support for finding specific subprocesses that are called within Python source code. It also adds a unit test and a test file for the extract_sys_callls function.