csr_matrices #122

@kwchurch

Description

I have a large csr_matrix in npz format. I'd like to use it as input as is, but it doesn't have an IDs field.

I added this to graph.py (but it doesn't work):

if 'IDs' in raw:
    self.set_node_ids(raw["IDs"].tolist())
else:
    # added by kwc
    self.set_node_ids(np.arange(raw["shape"][0]).tolist())
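For reference, `scipy.sparse.save_npz` writes a csr_matrix as the arrays `data`, `indices`, and `indptr` plus `format` and `shape`, and it keeps the bool dtype of the values. That is why there is no `IDs` key and why the fallback above has to read `raw["shape"]`. A quick way to inspect what ends up in such an archive (the file name `check.npz` is arbitrary):

```python
import os
import tempfile

import numpy as np
import scipy.sparse

# Tiny 3x3 boolean adjacency matrix, built the same way as in edg2npz.py.
rows = np.array([0, 1, 2], dtype=np.int32)
cols = np.array([1, 2, 0], dtype=np.int32)
M = scipy.sparse.csr_matrix((np.ones(3, dtype=bool), (rows, cols)), shape=(3, 3))

path = os.path.join(tempfile.mkdtemp(), "check.npz")
scipy.sparse.save_npz(path, M)

# List the arrays stored in the archive and the fields the fallback relies on.
with np.load(path) as raw:
    keys = sorted(raw.files)
    n_nodes = int(raw["shape"][0])
    data_dtype = raw["data"].dtype

print(keys)  # note: no "IDs" entry
print(n_nodes, data_dtype)
```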

I created edg2npz.py with this:

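import sys

import numpy as np
import scipy.sparse

# Usage: python edg2npz.py OUTPUT.npz {bool|int} < EDGES.edg
# Each input line starts with "src dst"; extra columns are ignored.
dtype = bool
if sys.argv[2] == "int":
    dtype = int

X = []
Y = []

for line in sys.stdin:
    fields = line.rstrip().split()
    if len(fields) >= 2:
        x, y = fields[0:2]
        X.append(int(x))
        Y.append(int(y))

X = np.array(X, dtype=np.int32)
Y = np.array(Y, dtype=np.int32)
N = 1 + max(np.max(X), np.max(Y))  # number of nodes: largest node ID + 1
V = np.ones(len(X), dtype=bool)     # one nonzero entry per edge

# Square N x N adjacency matrix; the values are cast to `dtype`.
M = scipy.sparse.csr_matrix((V, (X, Y)), dtype=dtype, shape=(N, N))

scipy.sparse.save_npz(sys.argv[1], M)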

I called it with:

python edg2npz.py demo/karate.bool.npz bool < demo/karate.edg

Unfortunately, I can't use this kind of csr_matrix with pecanpy...

I could write my matrix out to text and run pecanpy on that, but my matrix is very large (N = 300M nodes and E = 2B nonzero edges), so writing it out and reading it back in would take a long time.

 pecanpy --input demo/karate.bool.npz --output demo/karate.int.emb --mode SparseOTF
init pecanpy: p = 1, q = 1, workers = 1, verbose = False, extend = False, gamma = 0, random_state = None
WARNING: when p = 1 and q = 1 with unweighted graph, highly recommend using the FirstOrderUnweighted over SparseOTF. The runtime could be improved greatly with improved  memory usage.
Took 00:00:00.02 to load Graph
Took 00:00:00.00 to pre-compute transition probabilities
Traceback (most recent call last):
  File "/home/k.church/venv/gft/bin/pecanpy", line 8, in <module>
    sys.exit(main())
  File "/home/k.church/venv/gft/lib/python3.8/site-packages/pecanpy/cli.py", line 333, in main
    walks = simulate_walks(args, g)
  File "/home/k.church/venv/gft/lib/python3.8/site-packages/pecanpy/wrappers.py", line 18, in wrapper
    result = func(*args, **kwargs)
  File "/home/k.church/venv/gft/lib/python3.8/site-packages/pecanpy/cli.py", line 320, in simulate_walks
    return g.simulate_walks(args.num_walks, args.walk_length)
  File "/home/k.church/venv/gft/lib/python3.8/site-packages/pecanpy/pecanpy.py", line 153, in simulate_walks
    walk_idx_mat = self._random_walks(
  File "/home/k.church/venv/gft/lib/python3.8/site-packages/numba/core/dispatcher.py", line 468, in _compile_for_args
    error_rewrite(e, 'typing')
  File "/home/k.church/venv/gft/lib/python3.8/site-packages/numba/core/dispatcher.py", line 409, in error_rewrite
    raise e.with_traceback(None)
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Failed in nopython mode pipeline (step: nopython frontend)
Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<built-in function itruediv>) found for signature:

 >>> itruediv(array(bool, 1d, C), Literal[int](1))

There are 6 candidate implementations:

  • Of which 2 did not match due to:
    Overload in function 'NumpyRulesInplaceArrayOperator.generic': File: numba/core/typing/npydecl.py: Line 244.
    With argument(s): '(array(bool, 1d, C), int64)':
    Rejected as the implementation raised a specific error:
    AttributeError: 'NoneType' object has no attribute 'args'
    raised from /home/k.church/venv/gft/lib/python3.8/site-packages/numba/core/typing/npydecl.py:255
  • Of which 2 did not match due to:
    Operator Overload in function 'itruediv': File: unknown: Line unknown.
    With argument(s): '(array(bool, 1d, C), int64)':
    No match for registered cases:
    • (int64, int64) -> float64
    • (int64, uint64) -> float64
    • (uint64, int64) -> float64
    • (uint64, uint64) -> float64
    • (float32, float32) -> float32
    • (float64, float64) -> float64
    • (complex64, complex64) -> complex64
    • (complex128, complex128) -> complex128
  • Of which 2 did not match due to:
    Overload of function 'itruediv': File: numba/core/typing/npdatetime.py: Line 94.
    With argument(s): '(array(bool, 1d, C), int64)':
    No match.
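The failing signature `itruediv(array(bool, 1d, C), Literal[int](1))` says that an in-place division is being applied to a bool array, which is what edg2npz.py stores as edge weights (V is all True). Plain NumPy rejects the same operation, so one plausible fix is to store float edge weights. The sketch below writes a CSR matrix straight to an .npz archive with float32 weights and an IDs array, which would also avoid the text round-trip for a large matrix. It rests on assumptions: the helper `csr_to_pecanpy_npz` is hypothetical, and the key names `data`, `indices`, `indptr`, and `IDs` are assumed to match what pecanpy's SparseOTF loader reads (graph.py above reads `IDs`; check the others against your installed pecanpy version).

```python
import os
import tempfile

import numpy as np
import scipy.sparse


def csr_to_pecanpy_npz(path, M, weight_dtype=np.float32):
    """Hypothetical helper: save a square CSR adjacency matrix with float
    edge weights and node IDs 0..N-1.

    ASSUMPTION: pecanpy's SparseOTF loader reads the keys data, indices,
    indptr, and IDs; verify against graph.py in your installed version.
    """
    M = scipy.sparse.csr_matrix(M)  # ensure CSR layout
    if M.shape[0] != M.shape[1]:
        raise ValueError("adjacency matrix must be square")
    np.savez(  # uncompressed, so no time is spent compressing large arrays
        path,
        data=M.data.astype(weight_dtype),  # float weights; astype makes a copy
        indices=M.indices,
        indptr=M.indptr,
        IDs=np.arange(M.shape[0]),  # same fallback IDs as the graph.py patch
    )


# In-place division on bool weights fails in plain NumPy as well.
bool_weights = np.ones(3, dtype=bool)
try:
    bool_weights /= 1
    bool_division_failed = False
except TypeError:  # NumPy raises UFuncTypeError, a TypeError subclass
    bool_division_failed = True

# Convert a tiny boolean adjacency matrix and read it back.
rows = np.array([0, 1, 2], dtype=np.int32)
cols = np.array([1, 2, 0], dtype=np.int32)
A = scipy.sparse.csr_matrix((np.ones(3, dtype=bool), (rows, cols)), shape=(3, 3))

out = os.path.join(tempfile.mkdtemp(), "karate_like.npz")
csr_to_pecanpy_npz(out, A)
with np.load(out) as raw:
    keys = sorted(raw.files)
    weights = raw["data"]
    ids = raw["IDs"].tolist()

weights /= 1  # the same in-place division now succeeds on float32 weights
print(bool_division_failed, keys, weights.dtype, ids)
```

For a matrix with 2B nonzeros, float32 keeps the weight array at 4 bytes per edge (about 8 GB), but `astype` still allocates that array as a full copy alongside the original bool data.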

Labels: enhancement (New feature or request)