Publications

Efficient Weighted Graph Matching on GPUs

Abstract

Weighted matching identifies a maximal subset of edges in a graph such that these edges do not share any vertices in common with each other. As a prototypical graph problem, matching has numerous applications in science and engineering, such as linear algebra, multi-level graph algorithms, computer vision and machine learning. There is a critical need for efficient matching algorithms. However, there are challenges in developing efficient, parallel graph matching methods on contemporary GPGPU systems, due to common complexities in general graph processing, such as irregular memory access patterns and load imbalances. Furthermore, increasingly massive graph sizes and resultant intermediate data commonly exceeds available GPU memory. Although dense-GPU systems are mainstream and offer accelerated on-node interconnection to enhance data access bandwidth, data dependencies and device synchronization costs in multi-GPU enabled massive-graph processing create challenges to sustainable scalability.

Michael Mandulak, Sayan Ghosh, S M Ferdous, Mahantesh Halappanvar, George Slota, "Efficient Weighted Graph Matching on GPUs." In the proceedings of SC24: International Conference for High Performance Computing, Networking, Storage and Analysis, 2024.

Semi-Streaming Algorithms for Weighted k-Disjoint Matchings

Abstract

We design and implement two single-pass semi-streaming algorithms for the maximum weight k-disjoint matching (k-DM) problem. Given an integer k, the k-DM problem is to find k pairwise edge-disjoint matchings such that the sum of the weights of the matchings is maximized. For k ≥ 2, this problem is NP-hard. Our first algorithm is based on the primal-dual framework of a linear programming relaxation of the problem and is 1 -approximate. We also develop an approximation preserving reduction from k-DM 3+ε to the maximum weight b-matching problem. Leveraging this reduction and an existing semi-streaming b-matching algorithm, we design a ( 1 )(1 − 1 )-approximate semi-streaming algorithm for k-DM. For 2+ε k+1 any constant ε > 0, both of these algorithms require O(nk log21+ε n) bits of space. To the best of our knowledge, this is the first study of semi-streaming algorithms for the k-DM problem. We compare our two algorithms to state-of-the-art offline algorithms on 95 real-world and synthetic test problems, including thirteen graphs generated from data center network traces. On these instances, our streaming algorithms used significantly less memory (ranging from 6× to 512× less) and were faster in runtime than the offline algorithms. Our solutions were often within 5% of the best weights from the offline algorithms. We highlight that the existing offline algorithms run out of 1 TB of memory for most of the large instances (> 1 billion edges), whereas our streaming algorithms can solve these problems using only 100 GB memory for k = 8.

S. M. Ferdous, Bhargav Samineni, Alex Pothen, Mahantesh Halappanavar, Bala Krishnamoorthy, "Semi-Streaming Algorithms for Weighted k-Disjoint Matchings." In the proceedings of ESA 24: European Symposium on Algorithms, 2024.

AGS-GNN: Attribute-guided Sampling for Graph Neural Networks

Abstract

We propose AGS-GNN, a novel attribute-guided sampling algorithm for Graph Neural Networks (GNNs). AGS-GNN exploits the node features and the connectivity structure of a graph while simultaneously adapting for both homophily and heterophily in graphs. In homophilic graphs, vertices of the same class are more likely to be adjacent, but vertices of different classes tend to be adjacent in heterophilic graphs. GNNs have been successfully applied to homophilic graphs, but their utility to heterophilic graphs remains challenging. The state-of-the-art GNNs for heterophilic graphs use the full neighborhood of a node instead of sampling it, and hence do not scale to large graphs and are not inductive. We develop dual-channel sampling techniques based on feature-similarity and feature-diversity to select subsets of neighbors for a node that capture adaptive information from homophilic and heterophilic neighborhoods. Currently, AGS-GNN is the only algorithm that explicitly controls homophily in the sampled subgraph through similar and diverse neighborhood samples. For diverse neighborhood sampling, we employ submodularity, a novel contribution in this context. We pre-compute the sampling distribution in parallel, achieving the desired scalability. Using an extensive dataset consisting of 35 small ({ extless} 100𝐾 nodes) and large (≥ 100𝐾 nodes) homophilic and heterophilic graphs, we demonstrate the superiority of AGS-GNN compared to the state-of-the-art approaches. AGS-GNN achieves test accuracy comparable to the best-performing heterophilic GNNs, even outperforming methods that use the entire graph for node classification. AGS-GNN converges faster than methods that sample neighborhoods randomly, and can be incorporated into existing GNN models that employ node or graph sampling.

Siddhartha Shankar Das, S M Ferdous, Mahantesh M. Halappanavar, Edoardo Serra, Alex Pothen, "AGS-GNN: Attribute-guided Sampling for Graph Neural Networks." In the proceedings of Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024.

Streaming Matching and Edge Cover in Practice

Abstract

Graph algorithms with polynomial space and time requirements often become infeasible for massive graphs with billions of edges or more. State-of-the-art approaches therefore employ approximate serial, parallel, and distributed algorithms to tackle these challenges. However, such approaches require storing the entire graph in memory and thus need access to costly computing resources such as clusters and supercomputers. In this paper, we present practical streaming approaches for solving massive graph problems using limited memory for two prototypical graph problems: maximum weighted matching and minimum weighted edge cover. For matching, we conduct a thorough computational study on two of the semi-streaming algorithms including a recent breakthrough result that achieves a 1/(2 + ε)-approximation of the weight while using O(n log W/ε) memory (here n is the number of vertices and W is the maximum edge weight), designed by Paz and Schwartzman [SODA, 2017]. Empirically, we show that the semi-streaming algorithms produce matchings whose weight is close to the best 1/2-approximate offline algorithm while requiring less time and an order-of-magnitude less memory.

S M Ferdous, Alex Pothen, Mahantesh Halappanavar, "Streaming Matching and Edge Cover in Practice." In the proceedings of LIPIcs, Volume 301, SEA 2024, 2024.

Picasso: Memory-Efficient Graph Coloring Using Palettes With Applications in Quantum Computing

Abstract

A coloring of a graph is an assignment of colors to vertices such that no two neighboring vertices have the same color. The need for memory-efﬁcient coloring algorithms is motivated by their application in computing clique partitions of graphs arising in quantum computations where the objective is to map a large set of Pauli strings into a compact set of unitaries. We present Picasso, a randomized memory-efﬁcient iterative parallel graph coloring algorithm with theoretical sublinear space guarantees under practical assumptions. The parameters of our algorithm provide a trade-off between coloring quality and resource consumption. To assist the user, we also propose a machine learning model to predict the coloring algorithm’s parameters considering these trade-offs. We provide a sequential and parallel implementation of the proposed algorithm.

S M Ferdous, Reece Neff, Bo Peng, Salman Shuvo, Marco Minutoli, Sayak Mukherjee, Karol Kowalski, Michela Becchi, Mahantesh Halappanavar, "Picasso: Memory-Efficient Graph Coloring Using Palettes With Applications in Quantum Computing." In the proceedings of 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2024.

cuAlign: Scalable Network Alignment on GPU Accelerators

Abstract

Given two graphs, the objective of network alignment is to find a one-to-one mapping of vertices in one graph (𝐴) to vertices in the other (𝐵), such that the number of overlaps is maximized. We say that edges (𝑖, 𝑗) ∈ 𝐴 and (𝑖′, 𝑗′) ∈ 𝐵 are overlapped if 𝑖 is mapped to 𝑖′ and 𝑗 is mapped to 𝑗′. Network alignment is an important optimization problem with several applications in bioinformatics, computer vision and ontology matching. Since it is an NP-hard problem, efficient heuristics and scalable implementations are necessary. However, a combination of combinatorial and algebraic kernels within the network alignment algorithm poses significant hurdles for parallelization. Further, load imbalance and irregular DRAM traffic limit achievable performance on GPUs. In this work, we introduce a novel framework (cuAlign) that combines intra-network proximity using node (vertex) embedding, sparsification for computational efficiency, and belief propagation (BP) and approximate weighted matching for alignment. We demonstrate qualitative improvements up to 22% over state-of-the-art approaches. We provide a scalable implementation targeting modern GPU accelerators. Our novel approach identifies and exploits unique structural properties of the BP-based algorithm and employs code fusion to reduce data movement between different steps of the algorithm. Using a diverse set of inputs, we demonstrate up to 19× speedup for belief propagation, 3× speedup for approximate weighted matching, and 15× total, relative to a state-of-the-art multi-threaded implementation.

Lizhi Xiang, Arif Khan, S. M. Ferdous, Sr Aravind, Mahantesh Halappanavar, "cuAlign: Scalable Network Alignment on GPU Accelerators." In the proceedings of Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, 2023.

AMG Preconditioners based on Parallel Hybrid Coarsening and Multi-objective Graph Matching

Pasqua D'Ambra, Fabio Durastante, S M Ferdous, Salvatore Filippone, Mahantesh Halappanavar, Alex Pothen, "AMG Preconditioners based on Parallel Hybrid Coarsening and Multi-objective Graph Matching." In the proceedings of 2023 31st Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), 2023.

A Parallel Approximation Algorithm for Maximizing Submodular b-Matching

Abstract

We design new serial and parallel approximation algorithms for computing a maximum weight b-matching in an edge-weighted graph with a submodular objective function. This problem is NP-hard; the new algorithms have approximation ratio 1/3, and are relaxations of the Greedy algorithm that rely only on local information in the graph, making them parallelizable. We have designed and implemented Local Lazy Greedy algorithms for both serial and parallel computers. We have applied the approximate submodular b-matching algorithm to assign tasks to processors in the computation of Fock matrices in quantum chemistry on parallel computers. The assignment seeks to reduce the run time by balancing the computational load on the processors and bounding the number of messages that each processor sends. We show that the new assignment of tasks to processors provides a four fold speedup over the currently used assignment in the NWChemEx software on 8000 processors on the Summit supercomputer at Oak Ridge National Lab.

S M Ferdous, Alex Pothen, Arif Khan, Ajay Panyala, Mahantesh Halappanavar, "A Parallel Approximation Algorithm for Maximizing Submodular b-Matching." In the proceedings of First Applied and Combinatorial Discrete Algorithms (ACDA), 2021.

Privacy-Preserving Decentralized Aggregation for Federated Learning

Abstract

In this paper, we develop a privacy-preserving decentralized aggregation protocol for federated learning. We formulate the distributed aggregation protocol with the Alternating Direction Method of Multiplier (ADMM) algorithm and examine its privacy challenges. Unlike prior works that use differential privacy or homomorphic encryption for privacy, we develop a protocol that controls communication among participants in each round of aggregation to minimize privacy leakage. We establish the protocol’s privacy guarantee against an honestbut-curious adversary. We also propose an efﬁcient algorithm to construct such a communication pattern, which is inspired by combinatorial block design theory. Our secure aggregation protocol based on the novel group-based communication pattern leads to an efﬁcient algorithm for federated training with privacy guarantees. We evaluate our federated training algorithm on computer vision and natural language processing models over benchmark datasets with 9 and 15 distributed sites. Experimental results demonstrate the privacy-preserving capabilities of our algorithm while maintaining learning performance comparable to the baseline centralized federated learning.

Beomyeol Jeon, S M Ferdous, Muntasir Raihan Rahman, Anwar Walid, "Privacy-Preserving Decentralized Aggregation for Federated Learning." In the proceedings of IEEE INFOCOM 2021 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), 2021.

For the full version see the arXiv link.

Approximation algorithms in combinatorial scientific computing

Abstract

We survey recent work on approximation algorithms for computing degree-constrained subgraphs in graphs and their applications in combinatorial scientific computing. The problems we consider include maximization versions of cardinality matching, edge-weighted matching, vertex-weighted matching and edge-weighted-matching, and minimization versions of weighted edge cover and b-edge cover. Exact algorithms for these problems are impractical for massive graphs with several millions of edges. For each problem we discuss theoretical foundations, the design of several linear or near-linear time approximation algorithms, their implementations on serial and parallel computers, and applications. Our focus is on practical algorithms that yield good performance on modern computer architectures with multiple threads and interconnected processors. We also include information about the software available for these problems.

Alex Pothen, S. M. Ferdous, Fredrik Manne, "Approximation algorithms in combinatorial scientific computing." Acta Numerica, 2019.

Adaptive Anonymization of Data using b-Edge Cover

Abstract

We explore the problem of sharing data that pertains to individuals with anonymity guarantees, where each user requires a desired level of privacy. We propose the ﬁrst sharedmemory as well as distributed memory parallel algorithms for the adaptive anonymity problem that achieves this goal, and produces high quality anonymized datasets.

Arif Khan, Krzysztof Choromanski, Alex Pothen, S. M. Ferdous, Mahantesh Halappanavar, Antonino Tumeo, "Adaptive Anonymization of Data using b-Edge Cover." In the proceedings of SC18: International Conference for High Performance Computing, Networking, Storage and Analysis, 2018.

Parallel Algorithms Through Approximation: B-Edge Cover

Abstract

We describe a paradigm for designing parallel algorithms via approximation, and illustrate it on the b-Edge Cover problem. A b-Edge Cover of minimum weight in a graph is a subset C of its edges such that at least a specified number b(v) of edges in C is incident on each vertex v, and the sum of the edge weights in C is minimum. The Greedy algorithm and a variant, the LSE algorithm, provide 3/2-approximation guarantees in the worst-case for this problem, but these algorithms have limited parallelism. Hence we design two new 2-approximation algorithms with greater concurrency. The MCE algorithm reduces the computation of a b-Edge Cover to that of finding a b'-Matching, by exploiting the relationship between these subgraphs in an approximation context. The LSENW is derived from the LSE algorithm using static edge weights rather than dynamically computing effective edge weights. This relaxation gives S-LSE a worse approximation guarantee but makes it more amenable to parallelization. We prove that both the MCE and S-LSE algorithms compute the same b-EDGE COVER with at most twice the weight of the minimum weight edge cover. In practice, the 2-approximation and 3/2-approximation algorithms compute edge covers of weight within 10% the optimal. We implement three of the approximation algorithms, MCE, LSE, and S-LSE, on shared memory multi-core machines, including an Intel Xeon and an IBM Power8 machine with 8 TB memory. The MCE algorithm is the fastest of these by an order of magnitude or more. It computes an edge cover in a graph with billions of edges in 20 seconds using two hundred threads on the IBM Power8. We also show that the parallel depth and work can be bounded for the Suitor and b-Suitor algorithms when edge weights are random.

S M Ferdous, Arif Khan, Alex Pothen, "Parallel Algorithms Through Approximation: B-Edge Cover." In the proceedings of 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2018.

An Integer Programming Formulation of the Minimum Common String Partition Problem

Abstract

We consider the problem of finding a minimum common string partition (MCSP) of two strings, which is an NP-hard problem. The MCSP problem is closely related to genome comparison and rearrangement, an important field in Computational Biology. In this paper, we map the MCSP problem into a graph applying a prior technique and using this graph, we develop an Integer Linear Programming (ILP) formulation for the problem. We implement the ILP formulation and compare the results with the state-of-the-art algorithms from the literature. The experimental results are found to be promising.

S. M. Ferdous, M. Sohel Rahman, "An Integer Programming Formulation of the Minimum Common String Partition Problem." PLOS ONE, 2015.

Solving the Minimum Common String Partition Problem with the Help of Ants

S. M. Ferdous, M. Sohel Rahman, "Solving the Minimum Common String Partition Problem with the Help of Ants." In the proceedings of Advances in Swarm Intelligence, 2013.

Speech development of autistic children by interactive computer games

Abstract

Purpose – Speech disorder is one of the most common problems found with autistic children. The purpose of this paper is to investigate the introduction of computer-based interactive games along with the traditional therapies in order to help improve the speech of autistic children.

Mustafizur Rahman, S.M. Ferdous, Syed Ishtiaque, Anika Anwar, "Speech development of autistic children by interactive computer games." Interactive Technology and Smart Education, 2011.

A Computer Game Based Approach for Increasing Fluency in the Speech of the Autistic Children

Abstract

Autism is a complex developmental disability that typically appears during the first three years of life. This is the result of a neurological disorder that affects the functioning of human brain. Children diagnosed with autism often are self-absorbed and seem to exist in a private world where they are unable to successfully communicate and interact with others. Sometimes they have difficulties in developing speaking skills and understanding what others say to them. Lack of fluency in the speech is one of the most frequently found problems with autistic children. The traditional methods for increasing fluency were found to be monotonous and hence, not much successful during our three months of observations over the participants of Autism Welfare Foundation (AWF) at Dhaka. Therefore, we developed an interactive computer game for the autistic children for improving the fluency in their speech. Our game produced encouraging results over a participant during three months of observation. In this paper, we describe our project and the outcomes.

Anika Anwar, Md. Mustafizur Rahman, S.M. Ferdous, Samiul Alam Anik, Syed Ishtiaque Ahmed, "A Computer Game Based Approach for Increasing Fluency in the Speech of the Autistic Children." In the proceedings of 2011 IEEE 11th International Conference on Advanced Learning Technologies, 2011.

Increasing Intelligibility in the Speech of the Autistic Children by an Interactive Computer Game

Abstract

Autism is a disorder of neural development which affects about one in every 150 kids on average. One of the major complexities regarding autistic children in social communication is the speech disorder. The problems related to speech disorder fall into different categories and unintelligibility in speech is one of them. Although there is no definite medicine or treatment for autism, doctors, therapists, and special teachers can help kids with autism overcome many difficulties by different physical and psychological therapies. In this paper we have demonstrated our newly developed interactive computer game which will be helpful in increasing intelligibility in the speeches of autistic children and can be used as a therapy besides the traditional approaches. During our five months of intervention with the autistic children of Autism Welfare Foundation (AWF) at Dhaka, we checked the effectiveness of this therapy and got some encouraging results.

Md. Mustafizur Rahman, S.M. Ferdous, Syed Ishtiaque Ahmed, "Increasing Intelligibility in the Speech of the Autistic Children by an Interactive Computer Game." In the proceedings of 2010 IEEE International Symposium on Multimedia, 2010.