@mgallegos4
Collaborator

Summary

This PR closes issue #125. If merged, this pull request will add support for finding specific subprocesses that are called within Python source code. It also adds a unit test and a test file for the extract_sys_calls function.

@mgallegos4 mgallegos4 self-assigned this Oct 1, 2025
@mgallegos4 mgallegos4 marked this pull request as ready for review October 1, 2025 23:40
@mgallegos4 mgallegos4 linked an issue Oct 1, 2025 that may be closed by this pull request
_ => continue,
};

if let Ok(libs) = query_db(&func_name) {
Collaborator Author

Question for all: Do we want only matches that exist in the databases?

Collaborator

@swest50 swest50 Oct 2, 2025

I would tend to think we'd want all the matches, even if they don't exist in the database.
In some cases that might even be more interesting: "What's this call to this random program I haven't heard of <xyz>"

The C++ parser (for #include statements, not sys calls) does include results that it doesn't find in the database; they just result in an entry with an empty list for "no matches found", e.g. ("example_included_file.h", [])
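
A minimal sketch of that "keep everything, empty list when unknown" behavior, written in Python for brevity. The `FAKE_DB` dictionary and the `lookup_all` helper are hypothetical stand-ins for illustration; the PR's real lookup is `query_db`, whose return shape is not shown here.

```python
# Hypothetical stand-in for the package database (not the project's real data).
FAKE_DB = {"echo": [["coreutils"]]}


def lookup_all(commands, db=FAKE_DB):
    """Return an entry for every command, using [] when the database has no match."""
    return {cmd: db.get(cmd, []) for cmd in commands}


results = lookup_all(["echo", "xyz"])
for command, matches in results.items():
    print(command, matches)
```

With this shape, an unrecognized program such as `xyz` still shows up in the output, just with nothing attached to it.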

Collaborator Author

Ok, great point. I will make that change!

fn process_files<T>(&self, file_paths: T) -> HashMap<PythonImport, Vec<Vec<String>>>
fn is_likely_syscall(module: &str, func: &str) -> bool {
let combined = format!("{}.{}", module, func);
let predefined = ["os.system", "subprocess.run", "os.run"];
Collaborator

Collaborator Author

Huh, yeah great point I can't find it now either. I'll take it out and also edit the test file I created with that. Thanks for catching that!
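
One way to sanity-check the entries of a list like `predefined` is to confirm that each dotted name actually resolves to a callable in the standard library. The sketch below is only an illustration; the `resolves` helper is a hypothetical name and is not part of the PR.

```python
import importlib


def resolves(dotted):
    """Return True if 'module.attr' names a callable in an importable module."""
    module_name, _, attr = dotted.rpartition(".")
    try:
        module = importlib.import_module(module_name)
    except (ImportError, ValueError):
        # ValueError covers names with no module part (empty module name).
        return False
    return callable(getattr(module, attr, None))


candidates = ["os.system", "subprocess.run", "os.run"]
for name in candidates:
    print(name, resolves(name))
```

A check like this could also live in the unit tests so that a nonexistent entry is flagged before review.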

@mgallegos4 mgallegos4 force-pushed the feature/tree-sitter-python-subprocess-syscalls branch from 7d78d0b to 3ff6d3b Compare October 9, 2025 00:16
@nightlark nightlark added the enhancement New feature or request label Oct 9, 2025
@mgallegos4 mgallegos4 force-pushed the feature/tree-sitter-python-subprocess-syscalls branch from 3ff6d3b to 8cb6a82 Compare October 21, 2025 01:13
Collaborator

@nightlark nightlark left a comment

question: Are we looking up the command names in one of the Ubuntu datasets? I'm seeing packages that seem to match Python package import names rather than ones that would install the native (Ubuntu) commands that get run from a shell.

os.system("echo from os.system")

# Should match: "subprocess.run"
subprocess.run(["echo", "from subprocess.run"])
Collaborator

issue: It looks like this test case is giving us an extraneous result:

OS(Application("from")):
	[]

When the argument to subprocess.run is a list, only the first item in the list is the command to run (every item after it is just an argument). When the argument to subprocess.run is a string, I think it behaves the same as os.system (in most cases). The underlying cause seems to be that, when the argument is a list, we are treating each item in the list as if it were a command.

suggestion: Some other commands that can be treated the same way as subprocess.run for spawning a subprocess are: subprocess.Popen, subprocess.call, subprocess.check_call, and subprocess.check_output
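
The list-versus-string handling described above could be sketched as follows. This is an illustration using Python's `ast` module rather than the tree-sitter queries the PR uses, and the names `SPAWN_FUNCS`, `dotted_name`, `first_command`, and `extract_commands` are hypothetical. It assumes calls are written in the plain `module.func(...)` form with a literal first argument.

```python
import ast
import shlex

# Assumption: mirrors the functions named in this review, not the PR's actual list.
SPAWN_FUNCS = {
    "os.system",
    "subprocess.run",
    "subprocess.Popen",
    "subprocess.call",
    "subprocess.check_call",
    "subprocess.check_output",
}


def dotted_name(node):
    """Return 'module.func' for a simple attribute call target, else None."""
    if isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
        return f"{node.value.id}.{node.attr}"
    return None


def first_command(arg):
    """Return the program name from a literal first argument, else None."""
    if isinstance(arg, (ast.List, ast.Tuple)):
        if not arg.elts:
            return None
        head = arg.elts[0]  # only the first item is the command
        if isinstance(head, ast.Constant) and isinstance(head.value, str):
            return head.value
        return None
    if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
        tokens = shlex.split(arg.value)  # treat the string like a shell command line
        return tokens[0] if tokens else None
    return None


def extract_commands(source):
    """Collect one command name per recognized spawn call in the source."""
    commands = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and node.args:
            if dotted_name(node.func) in SPAWN_FUNCS:
                command = first_command(node.args[0])
                if command is not None:
                    commands.append(command)
    return commands


sample = '''
import os, subprocess
os.system("echo from os.system")
subprocess.run(["echo", "from subprocess.run"])
subprocess.check_output(["ls", "-l"])
'''
print(extract_commands(sample))
```

Two caveats on this sketch: a string passed to `subprocess.run` without `shell=True` is taken as the program name as a whole, so splitting it is only an approximation, and shell strings with pipes or `&&` would need more than the first token.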

Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

Identify implicit Python dependencies

5 participants