Eric J Ma's Website

Sparse Matrix Multiplication in Python 3

written by Eric J. Ma on 2016-07-24 | tags: data science python sparse matrix linear algebra


Sparse matrix multiplication shows up in many places, and in Python, it's often handy to use a sparse matrix representation for memory purposes.

One thing nice about the newest version of Python 3 is the @ operator, which takes two matrices and multiplies them. While numpy has had the np.dot(mat1, mat2) function for a while, I think mat1 @ mat2 can be a more expressive way of expressing the matrix multiplication operation. One hidden benefit of the @ operator: it handles scipy.sparse matrices pretty well!

Consider the following binary matrix in a dense format:

In [7]: dense = np.random.binomial(n=1, p=0.1, size=(10, 10))

In [8]: dense
Out[8]:
array([[0, 1, 0, 0, 0, 0, 0, 1, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 0, 0, 0, 1],
       [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 1, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 1, 0, 1, 1, 0, 0, 0, 0]])

We can convert that matrix to a sparse format:

In [9]: sparse = csr_matrix(dense)

In [10]: sparse
Out[10]:
<10x10 sparse matrix of type '<class 'numpy.int64'>'
    with 14 stored elements in Compressed Sparse Row format>

Let's say now that we want to multiply it against a random matrix. Let's first instantiate the random matrix:

In [11]: rand = np.random.random(size=(10, 1))

In [12]: rand
Out[12]:
array([[ 0.07692164],
       [ 0.96495678],
       [ 0.03917639],
       [ 0.00568423],
       [ 0.62973733],
       [ 0.09481768],
       [ 0.97617109],
       [ 0.58497563],
       [ 0.85287145],
       [ 0.76737415]])

Cool! So, starting with the np.dot() operation...

In [13]: np.dot(dense, rand)
Out[13]:
array([[ 1.54993241],
       [ 0.        ],
       [ 0.80655054],
       [ 0.96495678],
       [ 0.07692164],
       [ 0.04486062],
       [ 0.58497563],
       [ 0.        ],
       [ 0.07692164],
       [ 0.84065304]])

Because np.dot doesn't natively recognize scipy.sparse matrices, when we try to do the matrix multiplication, we get...

In [14]: np.dot(sparse, rand)
Out[14]: /Users/ericmjl/anaconda/lib/python3.5/site-packages/scipy/sparse/compressed.py:296: SparseEfficiencyWarning: Comparing sparse matrices using >= and <= is inefficient, using <, >, or !=, instead.
  "using <, >, or !=, instead.", SparseEfficiencyWarning)

array([[ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>],
       [ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>],
       [ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>],
       [ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>],
       [ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>],
       [ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>],
       [ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>],
       [ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>],
       [ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>],
       [ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>]], dtype=object)

Totally not what we wanted! On the other hand, if we used the @ operator...

In [15]: sparse @ rand
Out[15]:
array([[ 1.54993241],
       [ 0.        ],
       [ 0.80655054],
       [ 0.96495678],
       [ 0.07692164],
       [ 0.04486062],
       [ 0.58497563],
       [ 0.        ],
       [ 0.07692164],
       [ 0.84065304]])

Exactly what we wanted!

The reverse multiplication, in which we take a (10x1) matrix and try to multiply it with a (10x10) matrix, should fail:

In [16]: np.dot(rand, dense)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-17-5abc172f5ef1> in <module>()
----> 1 np.dot(rand, dense)

ValueError: shapes (10,1) and (10,10) not aligned: 1 (dim 1) != 10 (dim 0)

With the scipy.sparse matrix being multiplied in np.dot, however, it doesn't work out to the correct answer...

In [17]: np.dot(rand, sparse)
Out[17]: /Users/ericmjl/anaconda/lib/python3.5/site-packages/scipy/sparse/compressed.py:296: SparseEfficiencyWarning: Comparing sparse matrices using >= and <= is inefficient, using <, >, or !=, instead.
  "using <, >, or !=, instead.", SparseEfficiencyWarning)

array([[ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>],
       [ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>],
       [ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>],
       [ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>],
       [ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>],
       [ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>],
       [ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>],
       [ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>],
       [ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>],
       [ <10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>]], dtype=object)

On the other hand, the @ matrix multiplier catches the error perfectly!

In [18]: rand @ sparse
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-18-4aa22b6b8f2e> in <module>()
----> 1 rand @ sparse

/Users/ericmjl/anaconda/lib/python3.5/site-packages/scipy/sparse/base.py in __rmatmul__(self, other)
    402             raise ValueError("Scalar operands are not allowed, "
    403                              "use '*' instead")
--> 404         return self.__rmul__(other)
    405
    406     ####################

/Users/ericmjl/anaconda/lib/python3.5/site-packages/scipy/sparse/base.py in __rmul__(self, other)
    386             except AttributeError:
    387                 tr = np.asarray(other).transpose()
--> 388             return (self.transpose() * tr).transpose()
    389
    390     #####################################

/Users/ericmjl/anaconda/lib/python3.5/site-packages/scipy/sparse/base.py in __mul__(self, other)
    353
    354             if other.shape[0] != self.shape[1]:
--> 355                 raise ValueError('dimension mismatch')
    356
    357             result = self._mul_multivector(np.asarray(other))

ValueError: dimension mismatch

I hope I've provided one more piece of evidence in favour of switching to Python 3!


I send out a newsletter with tips and tools for data scientists. Come check it out at Substack.

If you would like to receive deeper, in-depth content as an early subscriber, come support me on Patreon!