SparCity Data Collection

A Collection of Sparse Problem Instances.

A repository of sparse problem instances (including sparse matrices, sparse graphs, sparse tensors and other relevant data sets) has been created with the aim of building an open data repository for the scientific community. The majority of the sparse matrices come from the well-known SuiteSparse Matrix Collection (https://sparse.tamu.edu/), and all of these SuiteSparse matrices have also been reordered with the Reverse Cuthill-McKee (RCM) algorithm and saved as new sparse matrix instances. (The original RCM algorithm has been extended to handle non-square matrices.) Other reordering strategies, such as those based on the PaToH partitioning tool, are currently being explored to generate differently permuted matrices. Storing the reordered matrices as separate instances helps ensure the reproducibility of research results, e.g., on performance optimizations that arise from reordering.

The remaining sparse matrices in the current collection come from a real-world simulator of cardiac electrophysiology. The matrices in this category arise from numerical discretizations of irregularly shaped 3D domains at various resolutions. The entire collection of sparse matrices currently amounts to around 8TB of data, stored in the file system of the eX3 infrastructure at Simula.
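This page does not say which RCM implementation was used. It also does not describe how RCM was extended to non-square matrices. The following is a minimal pure-Python sketch that shows what RCM reordering does for a square sparsity pattern. It assumes the pattern is given as 0-based (row, col) pairs. `rcm_order` and `bandwidth` are hypothetical helpers, not SparCity code. For simplicity, each search starts from a minimum-degree vertex instead of a pseudo-peripheral node. The non-square extension is not reproduced. SciPy offers a production implementation as `scipy.sparse.csgraph.reverse_cuthill_mckee`.

```python
from collections import deque


def rcm_order(n, entries):
    """Reverse Cuthill-McKee ordering for an n x n sparsity pattern.

    entries: iterable of 0-based (row, col) pairs. The pattern is treated as
    the undirected graph of A + A^T, as is usual for RCM.
    Returns perm, where perm[k] is the old index placed at new position k.
    """
    # Symmetric adjacency sets; diagonal entries do not affect the ordering.
    adj = [set() for _ in range(n)]
    for i, j in entries:
        if i != j:
            adj[i].add(j)
            adj[j].add(i)
    degree = [len(a) for a in adj]

    visited = [False] * n
    order = []
    # Handle every connected component. Each BFS starts from the unvisited
    # vertex of lowest degree (a simplification of the usual
    # pseudo-peripheral start-node search).
    for start in sorted(range(n), key=lambda v: degree[v]):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Cuthill-McKee step: visit neighbours in order of increasing degree.
            for w in sorted(adj[v], key=lambda u: (degree[u], u)):
                if not visited[w]:
                    visited[w] = True
                    queue.append(w)
    order.reverse()  # the "reverse" in Reverse Cuthill-McKee
    return order


def bandwidth(entries, perm=None):
    """Largest |i - j| over the nonzeros, after optionally renumbering rows
    and columns symmetrically with perm (as returned by rcm_order)."""
    if perm is None:
        return max((abs(i - j) for i, j in entries), default=0)
    pos = {old: new for new, old in enumerate(perm)}  # old index -> new index
    return max((abs(pos[i] - pos[j]) for i, j in entries), default=0)


if __name__ == "__main__":
    # A path 0-3-5-1-4-2 with scrambled labels, stored symmetrically
    # with a full diagonal.
    path = [0, 3, 5, 1, 4, 2]
    edges = list(zip(path, path[1:]))
    entries = edges + [(j, i) for i, j in edges] + [(i, i) for i in range(6)]
    perm = rcm_order(6, entries)
    print(perm)                      # [2, 4, 1, 5, 3, 0]
    print(bandwidth(entries))        # 4
    print(bandwidth(entries, perm))  # 1
```

In the demo, RCM relabels a scrambled path graph into a tridiagonal ordering. The bandwidth drops from 4 to 1. This is the kind of locality change whose performance effects the reordered instances are meant to help study reproducibly.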

Data Collection Details

Currently, the SparCity data collection contains one dataset.

Sparse Matrices: This dataset contains a mirror copy of [The SuiteSparse Matrix Collection](http://sparse.tamu.edu) as well as derivatives generated from the corresponding SuiteSparse matrices.

Browse and Download

The whole collection of sparse matrices in the SparCity data collection can be browsed and downloaded file by file here. Complete sets of the dataset files are available for download as the following ZIP archives:
| File | Description | Size |
|---|---|---|
| suitesparse.orig.mtx.zip | The entire set of original MTX files of the SuiteSparse dataset in one zip file. | 212GB |
| suitesparse.rcm.mtx.zip | The entire set of RCM-reordered MTX files of the SuiteSparse dataset in one zip file. | 214GB |
| suitesparse.orig.png.zip | The entire set of visual representations (PNG images) of the original MTX files of the SuiteSparse dataset in one zip file. | 73MB |
| suitesparse.rcm.png.zip | The entire set of visual representations (PNG images) of the RCM-reordered MTX files of the SuiteSparse dataset in one zip file. | 63MB |
| suitesparse.orig.csr.zip | The entire set of original SuiteSparse matrices converted to binary CSR format, in one zip file. | 91GB |
| suitesparse.rcm.csr.zip | The entire set of RCM-reordered SuiteSparse matrices converted to binary CSR format, in one zip file. | 100GB |
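This page does not describe the internal layout of the binary CSR files, so the sketch below does not read them. Instead, it illustrates what converting an MTX file to CSR involves. It is a minimal pure-Python sketch that parses Matrix Market coordinate text into the three CSR arrays (`row_ptr`, `col_idx`, `values`). `read_mtx_as_csr` is a hypothetical helper, not part of the SparCity tooling. Complex-valued and dense (array-format) MTX files are deliberately not handled. For real workloads, SciPy's `scipy.io.mmread` followed by `.tocsr()` is an established alternative.

```python
def read_mtx_as_csr(lines):
    """Parse Matrix Market coordinate data into CSR arrays.

    Handles real, integer and pattern fields with general, symmetric or
    skew-symmetric storage. Returns (nrows, ncols, row_ptr, col_idx, values).
    """
    it = iter(lines)
    header = next(it).lower().split()
    if len(header) != 5 or header[0] != "%%matrixmarket":
        raise ValueError("missing %%MatrixMarket header")
    _, obj, fmt, field, symmetry = header
    if obj != "matrix" or fmt != "coordinate" or field not in ("real", "integer", "pattern"):
        raise ValueError("only real/integer/pattern coordinate matrices are handled")

    # Skip comments (lines starting with '%') until the "rows cols nnz" size line.
    size_line = next(ln for ln in it if ln.strip() and not ln.startswith("%"))
    nrows, ncols, nnz = map(int, size_line.split())

    triples, stored = [], 0
    for line in it:
        if not line.strip() or line.startswith("%"):
            continue
        parts = line.split()
        i, j = int(parts[0]) - 1, int(parts[1]) - 1  # MTX indices are 1-based
        v = 1.0 if field == "pattern" else float(parts[2])
        stored += 1
        triples.append((i, j, v))
        # Symmetric storage keeps only one triangle: mirror off-diagonal entries.
        if i != j and symmetry == "symmetric":
            triples.append((j, i, v))
        elif i != j and symmetry == "skew-symmetric":
            triples.append((j, i, -v))
    if stored != nnz:
        raise ValueError(f"expected {nnz} entries, found {stored}")

    triples.sort()  # order by row, then column
    row_ptr = [0] * (nrows + 1)
    for i, _, _ in triples:
        row_ptr[i + 1] += 1  # count nonzeros per row
    for r in range(nrows):
        row_ptr[r + 1] += row_ptr[r]  # prefix sum -> start offset of each row
    col_idx = [j for _, j, _ in triples]
    values = [v for _, _, v in triples]
    return nrows, ncols, row_ptr, col_idx, values


if __name__ == "__main__":
    lines = [
        "%%MatrixMarket matrix coordinate real symmetric",
        "% lower triangle of a small symmetric matrix",
        "3 3 4",
        "1 1 2.0",
        "2 1 -1.0",
        "3 2 -1.0",
        "3 3 2.0",
    ]
    print(read_mtx_as_csr(lines))
    # (3, 3, [0, 2, 4, 6], [0, 1, 0, 2, 1, 2], [2.0, -1.0, -1.0, -1.0, -1.0, 2.0])
```

The function only iterates over lines, so an open file can be passed directly, e.g. `with open(path) as f: read_mtx_as_csr(f)`. Note that each stored entry of the symmetric example yields two CSR nonzeros when it lies off the diagonal.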

Cite

If you use the SparCity data collection or its parts in your research, please cite the following paper:

@inproceedings{dhandhania2021explaining,
    title = {Explaining the Performance of Supervised and Semi-Supervised Methods for Automated Sparse Matrix Format Selection},
    author = {
        Dhandhania, Sunidhi and 
        Deodhar, Akshay and 
        Pogorelov, Konstantin and 
        Biswas, Swarnendu and 
        Langguth, Johannes},
    year = {2021},
    booktitle = {50th International Conference on Parallel Processing Workshop},
    doi = {10.1145/3458744.3474049},
    pages={1--10},
    numpages = {10},
}

Terms of use

The use of the SparCity data collection is restricted to research and education purposes. Commercial use of the dataset is forbidden without prior written permission. For other purposes, contact us (see below). All documents and publications that use the SparCity dataset or report experimental results based on it must include a reference to the dataset paper (see above). Please email konstantin (at) simula (dot) no if you have any questions about how to cite the dataset.

Contact

Email konstantin (at) simula (dot) no if you have any questions about the dataset and our research activities. We always welcome collaboration and joint research!

References for third-party data used

If you use original or reordered sparse matrices from the SparCity data collection, please observe the following license and cite the corresponding papers:

SuiteSparse:

The matrices themselves are under the [CC-BY 4.0 License](https://creativecommons.org/licenses/by/4.0).

Timothy A. Davis and Yifan Hu. 2011. The University of Florida Sparse Matrix Collection. ACM Transactions on Mathematical Software 38, 1, Article 1 (December 2011), 25 pages. DOI: https://doi.org/10.1145/2049662.2049663

Kolodziej et al., (2019). The SuiteSparse Matrix Collection Website Interface. Journal of Open Source Software, 4(35), 1244, DOI: https://doi.org/10.21105/joss.01244