PyArrow > 2.0.0 - FutureWarning: 'pyarrow.serialize' is deprecated as of 2.0.0 and will be removed in a future version. Use pickle or the pyarrow IPC functionality instead. #1

@guillermo-jimenez

Hi all,

First of all, nice tool, congrats!

I've been trying it out recently and ran into warnings when using it with PyArrow > 2.0.0. Take, for example, the serialization of a random array:

import fleetfmt
import pathlib
import numpy as np

aaa = np.random.rand(10000,10000)
print("Writing numpy content to a new Fleet file.")
with pathlib.Path("test.fleet").open('wb') as fhandle, fleetfmt.FileWriter(fhandle) as writer:
    for i,value in enumerate(aaa):
        writer.append(i, value)
print("Done.")

This raises warnings related to pyarrow.serialize and pyarrow.deserialize:

FutureWarning: 'pyarrow.serialize' is deprecated as of 2.0.0 and will be removed in a future version. Use pickle or the pyarrow IPC functionality instead.

This can be solved by making a few small changes in the code. In writer.py:

        import pickle
        [...]
        hbuf = SCHEMA_HEAD_SERDES.to_bytes(len(sbuf)) # Line 97
        [...]
        buf = pickle.dumps(record,protocol=5) # Line 106
        head = RECORD_HEAD_SERDES.to_bytes(len(buf)) # Line 107
        [...]
        kbuf = pickle.dumps(self._keymap,protocol=5) # Line 119

In reader.py:

        import pickle
        [...]
        rec = pickle.loads(buf) # Line 83
        [...]
        self._schema = pa.ipc.read_schema(wrap) # Line 96

And in base.py:

        import pickle
        [...]
        keymapdes = pickle.loads(keymapser) # Line 63
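
For illustration, the pickle-based write/read path that these changes amount to could look roughly like the sketch below. It is not fleetfmt's actual code: RECORD_HEAD, write_record and read_record are hypothetical names, a struct-based length prefix stands in for RECORD_HEAD_SERDES, and plain lists of floats stand in for the NumPy rows so that only the standard library is needed. Pickle protocol 5 requires Python 3.8 or later.

```python
import io
import pickle
import struct

# Hypothetical stand-in for fleetfmt's RECORD_HEAD_SERDES:
# a fixed-size little-endian unsigned 64-bit length prefix.
RECORD_HEAD = struct.Struct("<Q")


def write_record(stream, record):
    """Pickle one record (protocol 5) and write it behind a length prefix."""
    buf = pickle.dumps(record, protocol=5)
    stream.write(RECORD_HEAD.pack(len(buf)))
    stream.write(buf)


def read_record(stream):
    """Read one length-prefixed record, or return None at end of stream."""
    head = stream.read(RECORD_HEAD.size)
    if len(head) < RECORD_HEAD.size:
        return None
    (size,) = RECORD_HEAD.unpack(head)
    return pickle.loads(stream.read(size))


# Round trip a couple of (key, value) records through an in-memory stream.
stream = io.BytesIO()
rows = [(0, [0.1, 0.2, 0.3]), (1, [0.4, 0.5, 0.6])]
for row in rows:
    write_record(stream, row)

stream.seek(0)
restored = []
while (rec := read_record(stream)) is not None:
    restored.append(rec)
```

After the loop, restored should hold the same records as rows, in the same order.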

Furthermore, this improves performance. Comparing timeit runs on the aforementioned 10000x10000 array, pickle is roughly twice as fast as pyarrow:

315 ms ± 10.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) # With pyarrow
161 ms ± 446 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)  # With pickle protocol 5
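
A comparable measurement can be sketched with the standard-library timeit module. This is only an assumption about how such numbers might be gathered, not the exact benchmark behind the figures above: bench_ms and roundtrip are hypothetical helpers, and a small list-of-lists payload replaces the NumPy array to keep the sketch dependency-free, so the absolute timings will differ.

```python
import pickle
import statistics
import timeit


def bench_ms(func, repeat=7, number=10):
    """Return (mean, std. dev.) of the per-loop time in milliseconds."""
    totals = timeit.repeat(func, repeat=repeat, number=number)
    per_loop = [t / number * 1000.0 for t in totals]
    return statistics.mean(per_loop), statistics.pstdev(per_loop)


# Small stand-in payload; the issue itself uses a 10000x10000 NumPy array.
payload = [[float(i + j) for j in range(100)] for i in range(100)]


def roundtrip():
    """Serialize and deserialize the payload with pickle protocol 5."""
    return pickle.loads(pickle.dumps(payload, protocol=5))


mean_ms, std_ms = bench_ms(roundtrip)
print(f"{mean_ms:.3f} ms ± {std_ms:.3f} ms per loop "
      f"(mean ± std. dev. of 7 runs, 10 loops each)")
```

Swapping the body of roundtrip for another serializer gives a like-for-like comparison on the same payload.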
