Eric J Ma's Website

Sparse Matrix Multiplication in Python 3

written by Eric J. Ma on 2016-07-24 | tags: data science python sparse matrix linear algebra


Sparse matrix multiplication shows up in many places, and in Python, it's often handy to use a sparse matrix representation for memory purposes.

One thing nice about the newest version of Python 3 is the @ operator, which takes two matrices and multiplies them. While numpy has had the np.dot(mat1, mat2) function for a while, I think mat1 @ mat2 can be a more expressive way of expressing the matrix multiplication operation. One hidden benefit of the @ operator: it handles scipy.sparse matrices pretty well!

Consider the following binary matrix in a dense format:

In [7]: dense = np.random.binomial(n=1, p=0.1, size=(10, 10))

In [8]: dense
Out[8]:
array([[0, 1, 0, 0, 0, 0, 0, 1, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 0, 0, 0, 1],
       [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 1, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 1, 0, 1, 1, 0, 0, 0, 0]])

We can convert that matrix to a sparse format:

In [9]: sparse = csr_matrix(dense)

In [10]: sparse
Out[10]:
<10x10 sparse matrix of type '<class 'numpy.int64'>'
    with 14 stored elements in Compressed Sparse Row format>

Let's say now that we want to multiply it against a random matrix. Let's first instantiate the random matrix:

In [11]: rand = np.random.random(size=(10, 1))

In [12]: rand
Out[12]:
array([[ 0.07692164],
       [ 0.96495678],
       [ 0.03917639],
       [ 0.00568423],
       [ 0.62973733],
       [ 0.09481768],
       [ 0.97617109],
       [ 0.58497563],
       [ 0.85287145],
       [ 0.76737415]])

Cool! So, starting with the np.dot() operation...

In [13]: np.dot(dense, rand)
Out[13]:
array([[ 1.54993241],
       [ 0.        ],
       [ 0.80655054],
       [ 0.96495678],
       [ 0.07692164],
       [ 0.04486062],
       [ 0.58497563],
       [ 0.        ],
       [ 0.07692164],
       [ 0.84065304]])

Because np.dot doesn't natively recognize scipy.sparse matrices, when we try to do the matrix multiplication, we get...

In [14]: np.dot(sparse, rand)
Out[14]: /Users/ericmjl/anaconda/lib/python3.5/site-packages/scipy/sparse/compressed.py:296: SparseEfficiencyWarning: Comparing sparse matrices using >= and <= is inefficient, using <, >, or !=, instead.
  "using <, >, or !=, instead.", SparseEfficiencyWarning)

array([[ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>],
       [ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>],
       [ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>],
       [ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>],
       [ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>],
       [ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>],
       [ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>],
       [ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>],
       [ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>],
       [ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>]], dtype=object)

Totally not what we wanted! On the other hand, if we used the @ operator...

In [15]: sparse @ rand
Out[15]:
array([[ 1.54993241],
       [ 0.        ],
       [ 0.80655054],
       [ 0.96495678],
       [ 0.07692164],
       [ 0.04486062],
       [ 0.58497563],
       [ 0.        ],
       [ 0.07692164],
       [ 0.84065304]])

Exactly what we wanted!

The reverse multiplication, in which we take a (10x1) matrix and try to multiply it with a (10x10) matrix, should fail:

In [16]: np.dot(rand, dense)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-17-5abc172f5ef1> in <module>()
----> 1 np.dot(rand, dense)

ValueError: shapes (10,1) and (10,10) not aligned: 1 (dim 1) != 10 (dim 0)

With the scipy.sparse matrix being multiplied in np.dot, however, it doesn't work out to the correct answer...

In [17]: np.dot(rand, sparse)
Out[17]: /Users/ericmjl/anaconda/lib/python3.5/site-packages/scipy/sparse/compressed.py:296: SparseEfficiencyWarning: Comparing sparse matrices using >= and <= is inefficient, using <, >, or !=, instead.
  "using <, >, or !=, instead.", SparseEfficiencyWarning)

array([[ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>],
       [ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>],
       [ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>],
       [ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>],
       [ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>],
       [ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>],
       [ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>],
       [ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>],
       [ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>],
       [ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>]], dtype=object)

On the other hand, the @ matrix multiplier catches the error perfectly!

In [18]: rand @ sparse
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-18-4aa22b6b8f2e> in <module>()
----> 1 rand @ sparse

/Users/ericmjl/anaconda/lib/python3.5/site-packages/scipy/sparse/base.py in __rmatmul__(self, other)
    402             raise ValueError("Scalar operands are not allowed, "
    403                              "use '*' instead")
--> 404         return self.__rmul__(other)
    405
    406     ####################

/Users/ericmjl/anaconda/lib/python3.5/site-packages/scipy/sparse/base.py in __rmul__(self, other)
    386             except AttributeError:
    387                 tr = np.asarray(other).transpose()
--> 388             return (self.transpose() * tr).transpose()
    389
    390     #####################################

/Users/ericmjl/anaconda/lib/python3.5/site-packages/scipy/sparse/base.py in __mul__(self, other)
    353
    354             if other.shape[0] != self.shape[1]:
--> 355                 raise ValueError('dimension mismatch')
    356
    357             result = self._mul_multivector(np.asarray(other))

ValueError: dimension mismatch

I hope I've provided one more piece of evidence in favour of switching to Python 3!


Cite this blog post:
@article{
    ericmjl-2016-sparse-3,
    author = {Eric J. Ma},
    title = {Sparse Matrix Multiplication in Python 3},
    year = {2016},
    month = {07},
    day = {24},
    howpublished = {\url{https://ericmjl.github.io}},
    journal = {Eric J. Ma's Blog},
    url = {https://ericmjl.github.io/blog/2016/7/24/sparse-matrix-multiplication-in-python-3},
}
  

I send out a newsletter with tips and tools for data scientists. Come check it out at Substack.

If you would like to sponsor the coffee that goes into making my posts, please consider GitHub Sponsors!

Finally, I do free 30-minute GenAI strategy calls for teams that are looking to leverage GenAI for maximum impact. Consider booking a call on Calendly if you're interested!