Bug report [BUG] Tutorial throws pyarrow.lib.ArrowTypeError: Expected bytes, got a 'int' object #215

@arne-cl

Describe the bug
Cell [17] of the human cerebellum pycisTopic tutorial fails to produce a TSS annotation BED file and instead raises "pyarrow.lib.ArrowTypeError: Expected bytes, got a 'int' object".

To Reproduce
Install scenicplus via conda using the official instructions and follow the pycisTopic tutorial up to cell [17].

Error output

- Get TSS annotation from Ensembl BioMart with the following settings:
  - biomart_name: "hsapiens_gene_ensembl"
  - biomart_host: "http://www.ensembl.org/"
  - transcript_type: ['protein_coding']
  - use_cache: True
/home/arne/miniconda3/envs/scenicplus/lib/python3.11/site-packages/pybiomart/dataset.py:269: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
  result = pd.read_csv(StringIO(response.text), sep='\t')
Traceback (most recent call last):
  File "/home/arne/miniconda3/envs/scenicplus/bin/pycistopic", line 7, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/arne/miniconda3/envs/scenicplus/lib/python3.11/site-packages/pycisTopic/cli/pycistopic.py", line 26, in main
    args.func(args)
  File "/home/arne/miniconda3/envs/scenicplus/lib/python3.11/site-packages/pycisTopic/cli/subcommand/tss.py", line 459, in run_tss_get_tss_annotation
    get_tss_annotation_bed_file(
  File "/home/arne/miniconda3/envs/scenicplus/lib/python3.11/site-packages/pycisTopic/cli/subcommand/tss.py", line 164, in get_tss_annotation_bed_file
    tss_annotation_bed_df_pl = ga.get_tss_annotation_from_ensembl(
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/arne/miniconda3/envs/scenicplus/lib/python3.11/site-packages/pycisTopic/gene_annotation.py", line 172, in get_tss_annotation_from_ensembl
    ensembl_tss_annotation_bed_df_pl = pl.from_pandas(ensembl_tss_annotation).select(
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/arne/miniconda3/envs/scenicplus/lib/python3.11/site-packages/polars/convert.py", line 719, in from_pandas
    return pl.DataFrame._from_pandas(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/arne/miniconda3/envs/scenicplus/lib/python3.11/site-packages/polars/dataframe/frame.py", line 621, in _from_pandas
    pandas_to_pydf(
  File "/home/arne/miniconda3/envs/scenicplus/lib/python3.11/site-packages/polars/utils/_construction.py", line 1837, in pandas_to_pydf
    arrow_dict[str(col)] = _pandas_series_to_arrow(
                           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/arne/miniconda3/envs/scenicplus/lib/python3.11/site-packages/polars/utils/_construction.py", line 665, in _pandas_series_to_arrow
    return pa.array(values, pa.large_utf8(), from_pandas=nan_to_null)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/array.pxi", line 340, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 86, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: Expected bytes, got a 'int' object
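
A possible explanation, judging only from the DtypeWarning above: column 0 of the BioMart result holds a mix of integers and strings (for example numeric chromosome names next to "X" or "MT"), and the conversion to Arrow strings in `pl.from_pandas` fails on the integer values. The sketch below is a minimal illustration of that situation in pandas alone, not the pycisTopic code path. The column names and values are hypothetical, and whether either workaround fits into pybiomart or pycisTopic is an assumption.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the BioMart result: the first column contains
# both Python ints and strings, which is what the DtypeWarning hints at.
mixed = pd.DataFrame(
    {
        "Chromosome": [1, 2, "X", "MT"],
        "Start": [100, 200, 300, 400],
    }
)

# Collect the Python types actually stored in the column.
value_types = {type(v).__name__ for v in mixed["Chromosome"]}

# Workaround 1: cast the mixed column to str before handing the frame
# to any converter that expects a pure string column.
fixed = mixed.assign(Chromosome=mixed["Chromosome"].astype(str))

# Workaround 2: request str at parse time, as the warning itself suggests
# ("Specify dtype option on import").
tsv = "Chromosome\tStart\n1\t100\nX\t300\n"
parsed = pd.read_csv(StringIO(tsv), sep="\t", dtype={"Chromosome": str})

print(value_types)
print(fixed["Chromosome"].tolist())
print(parsed["Chromosome"].tolist())
```

With the column cast to `str`, every value is a string, so a string-typed conversion no longer encounters an `int`.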

Expected behavior
I expected the TSS annotation BED file to be written to "outs/qc/tss.bed".

Version:

  • Python 3.11.8
  • pycisTopic 2.0a0
