Research
My research interests mainly lie in the areas of computer vision, machine learning and more recently natural language processing, with a special focus on
efficient AI, including data efficiency and model efficiency.
I am also interested in image/video understanding, unsupervised/self-supervised representation learning and multimodal learning.
|
|
How Transferable are Video Representations Based on Synthetic Data?
Yo-whan Kim, Samarth Mishra, SouYoung Jin, Rameswar Panda, Hilde Kuehne, Leonid Karlinsky, Venkatesh Saligrama, Kate Saenko, Aude Oliva, Rogerio Feris
Neural Information Processing Systems Datasets (NeurIPS), 2022
[Dataset] [Supplementary Material]
We propose a new benchmark, SynAPT, for studying the transferability of synthetic video representations for action recognition.
|
|
FETA: Towards Specializing Foundational Models for Expert Task Applications
Amit Alfassy, Assaf Arbelle, Oshri Halimi, Sivan Harary, Roei Herzig, Eli Schwartz, Rameswar Panda, Michele Dolfi, Christoph Auer, Peter Staar, Kate Saenko, Rogerio Feris, Leonid Karlinsky
Neural Information Processing Systems Datasets (NeurIPS), 2022
[Dataset] [Supplementary Material]
We introduce the FETA benchmark for understanding technical documentation by learning to match its graphical illustrations to the corresponding language descriptions.
|
|
Selective Regression Under Fairness Criteria
Abhin Shah, Yuheng Bu, Joshua K Lee, Subhro Das, Rameswar Panda, Prasanna Sattigeri, Gregory Wornell
International Conference on Machine Learning (ICML), 2022
We demonstrate the performance disparities across different subgroups for selective regression and develop novel methods to mitigate such disparities.
|
|
VALHALLA: Visual Hallucination for Machine Translation
Yi Li, Rameswar Panda, Yoon Kim, Chun-Fu (Richard) Chen, Rogerio Feris, David Cox, Nuno Vasconcelos
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022
[Project Page] [Code] [Supplementary Material]
We introduce a visual hallucination framework which requires only source sentences at inference time and instead uses hallucinated visual representations for multimodal machine translation.
|
|
RegionViT: Regional-to-Local Attention for Vision Transformers
Chun-Fu (Richard) Chen, Rameswar Panda, Quanfu Fan
International Conference on Learning Representations (ICLR), 2022
[Code]
We propose a new architecture that adopts a pyramid structure and employs regional-to-local attention rather than global self-attention in vision transformers.
|
|
Can an Image Classifier Suffice For Action Recognition?
Quanfu Fan, Chun-Fu (Richard) Chen, Rameswar Panda
International Conference on Learning Representations (ICLR), 2022
[Code]
We introduce the idea of rearranging input video frames into super images to re-purpose an image classifier for video action recognition.
|
|
Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos
Brian Chen, Andrew Rouditchenko, Kevin Duarte, Hilde Kuehne, Samuel Thomas, Angie Boggust, Rameswar Panda, Brian Kingsbury, Rogerio Feris, David Harwath, James Glass, Michael Picheny, Shih-Fu Chang
International Conference on Computer Vision (ICCV), 2021
CVPR Workshop on Sight and Sound (CVPR-W), 2021 [PDF]
[Code]
We extend the concept of instance-level contrastive learning with a multimodal clustering step in the training pipeline to capture semantic similarities across modalities.
|
|
Detector-Free Weakly Supervised Grounding by Separation
Assaf Arbelle, Sivan Doveh, Amit Alfassy, Joseph Shtok, Guy Lev, Eli Schwartz, Hilde Kuehne, Hila Barak Levi, Prasanna Sattigeri, Rameswar Panda, Chun-Fu (Richard) Chen, Alex Bronstein, Kate Saenko, Shimon Ullman, Raja Giryes, Rogerio Feris, Leonid Karlinsky
International Conference on Computer Vision (ICCV), 2021 (Oral)
We propose a detector-free approach for weakly supervised grounding by learning to separate randomly blended images conditioned on the corresponding texts.
|
|
Fair Selective Classification Via Sufficiency
Joshua Lee, Yuheng Bu, Deepta Rajan, Prasanna Sattigeri, Rameswar Panda, Subhro Das, Gregory Wornell
International Conference on Machine Learning (ICML), 2021 (Oral)
We prove that sufficiency can be used to train fairer selective classifiers which ensure that precision always increases as coverage is decreased for all groups.
|
|
AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition
Yue Meng, Rameswar Panda, Chung-Ching Lin, Prasanna Sattigeri, Leonid Karlinsky, Kate Saenko, Aude Oliva, Rogerio Feris
International Conference on Learning Representations (ICLR), 2021
[Project Page] [Code]
We introduce an adaptive temporal fusion network that dynamically fuses channels from current and past feature maps for strong temporal modelling in action recognition.
|
|
VA-RED2: Video Adaptive Redundancy Reduction
Bowen Pan, Rameswar Panda, Camilo Fosco, Chung-Ching Lin, Alex Andonian, Yue Meng, Kate Saenko, Aude Oliva, Rogerio Feris
International Conference on Learning Representations (ICLR), 2021
[Project Page] [Code]
We propose an input-dependent adaptive framework for efficient video recognition that automatically decides what feature maps to compute per input instance.
|
|
NASTransfer: Analyzing Architecture Transferability in Large Scale Neural Architecture Search
Rameswar Panda, Michele Merler, Mayoore Jaiswal, Hui Wu, Kandan Ramakrishnan, Ulrich Finkler,
Richard Chen, Minsik Cho, Rogerio Feris, David Kung, Bishwaranjan Bhattacharjee
AAAI Conference on Artificial Intelligence (AAAI), 2021
We analyze the architecture transferability of different NAS methods by performing a series of experiments on several large scale image benchmarks.
|
|
Large Scale Neural Architecture Search with Polyharmonic Splines
Ulrich Finkler, Michele Merler, Rameswar Panda, Mayoore Jaiswal, Hui Wu, Kandan Ramakrishnan,
Chun-Fu Chen, Minsik Cho, David Kung, Rogerio Feris, Bishwaranjan Bhattacharjee
AAAI Workshop on Meta-Learning for Computer Vision (AAAI-W), 2021
We propose a NAS method based on polyharmonic splines that can perform architecture search directly on large scale image datasets like ImageNet22K.
|
|
Adversarial Knowledge Transfer from Unlabeled Data
Akash Gupta*, Rameswar Panda*, Sujoy Paul, Jianming Zhang, Amit K. Roy-Chowdhury
ACM Multimedia (MM), 2020
[Project Page] [Code]
We present a novel adversarial framework for transferring knowledge from internet-scale unlabeled data to improve the performance of a classifier on a given visual recognition task.
|
|
AR-Net: Adaptive Frame Resolution for Efficient Action Recognition
Yue Meng, Chung-Ching Lin, Rameswar Panda, Prasanna Sattigeri, Leonid Karlinsky, Aude Oliva, Kate Saenko, Rogerio Feris
European Conference on Computer Vision (ECCV), 2020
[Project Page] [Code]
We propose an adaptive approach that selects the optimal resolution for each frame, conditioned on the input, for efficient action recognition in long untrimmed videos.
|
|
Mitigating Dataset Imbalance via Joint Generation and Classification
Aadarsh Sahoo, Ankit Singh, Rameswar Panda, Rogerio Feris, Abir Das
ECCV Workshop on Imbalance Problems in Computer Vision (ECCV-W), 2020
We introduce a joint dataset repair strategy that combines a classifier with a GAN, which makes up for the deficit of training examples from the minority class by producing additional examples.
|
|
Fairness of Classifiers Across Skin Tones in Dermatology
Newton M. Kinyanjui, Timothy Odonga, Celia Cintas, Noel C. F. Codella, Rameswar Panda, Prasanna Sattigeri, Kush R. Varshney
Medical Image Computing and Computer Assisted Interventions (MICCAI), 2020
We present an approach to estimate the consistency in performance of classifiers across populations with varying skin tones in skin disease benchmarks.
|
|
Non-Adversarial Video Synthesis with Learned Priors
Abhishek Aich, Akash Gupta, Rameswar Panda, Rakib Hyder, Salman Asif, Amit K. Roy-Chowdhury
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020
[Project Page] [Code]
We introduce a novel non-adversarial framework for generating a wide range of diverse videos from latent noise vectors without any conditional input reference frame.
|
|
Camera On-boarding for Person Re-identification using Hypothesis Transfer Learning
Sk Miraj Ahmed, Aske R. Lejbolle, Rameswar Panda, Amit K. Roy-Chowdhury
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020
We propose an approach to swiftly on-board new camera(s) in an existing re-id network using only source models and limited labeled data, but without having access to source camera data.
|
|
Estimating Skin Tone and Effects on Classification Performance in Dermatology Datasets
Newton M. Kinyanjui, Timothy Odonga, Celia Cintas, Noel C. F. Codella, Rameswar Panda, Prasanna Sattigeri, Kush R. Varshney
NeurIPS Fair Machine Learning for Health Workshop (NeurIPS-W), 2019
We present an approach to estimate skin tone in benchmark skin disease datasets, and investigate whether model performance is dependent on this measure.
|
|
FFNet: Video Fast-Forwarding via Reinforcement Learning
Shuyue Lan, Rameswar Panda, Qi Zhu, Amit K. Roy-Chowdhury
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018
We introduce an online framework for fast-forwarding a video while presenting its important and interesting content on the fly without processing or even obtaining the entire video.
|
|
Weakly Supervised Summarization of Web Videos
Rameswar Panda, Abir Das, Ziyan Wu, Jan Ernst, Amit K. Roy-Chowdhury
International Conference on Computer Vision (ICCV), 2017
We introduce a weakly supervised approach that requires only video-level annotations for summarizing long unconstrained web videos.
|
|
Unsupervised Adaptive Re-identification in Open World Dynamic Camera Networks
Rameswar Panda*, Amran Bhuiyan*, Vittorio Murino, Amit K. Roy-Chowdhury
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017
We propose an unsupervised adaptation scheme for re-identification models where a new camera may be temporarily inserted into an existing system to get additional information.
|
|
Collaborative Summarization of Topic-Related Videos
Rameswar Panda, Amit K. Roy-Chowdhury
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017
This paper presents a collaborative video summarization approach that exploits visual context from a set of topic-related videos to extract an informative summary of a given video.
|
|
Diversity-aware Multi-Video Summarization
Rameswar Panda, Niluthpol C. Mithun, Amit K. Roy-Chowdhury
IEEE Transactions on Image Processing (TIP), 2017
This paper introduces a new generalized sparse optimization framework for summarizing multiple videos generated from a video search or from a multi-view camera network.
|
|
Sparse Modeling for Topic-oriented Video Summarization
Rameswar Panda, Amit K. Roy-Chowdhury
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017
This paper presents a diversity-aware sparse optimization framework for summarizing topic-related videos generated from a video search.
|
|
Video Summarization in a Multi-View Camera Network
Rameswar Panda, Abir Das, Amit K. Roy-Chowdhury
IEEE International Conference on Pattern Recognition (ICPR), 2016
This paper presents a framework for summarizing multi-view videos by exploiting both intra- and inter-view content correlations in a joint embedding space.
|
|
Embedded Sparse Coding for Summarizing Multi-View Videos
Rameswar Panda, Abir Das, Amit K. Roy-Chowdhury
IEEE International Conference on Image Processing (ICIP), 2016
This paper presents a stochastic multi-view frame embedding based on KL divergence to preserve correlations in multi-view learning.
|
|
Generating Diverse Image Datasets with Limited Labeling
Niluthpol C. Mithun, Rameswar Panda, Amit K. Roy-Chowdhury
ACM Multimedia (MM), 2016
This paper presents a semi-supervised sparse coding framework that can be used either to create a dataset from scratch or to enrich an existing dataset with diverse examples.
|
|
Active Image Pair Selection for Continuous Person Re-identification
Abir Das, Rameswar Panda, Amit K. Roy-Chowdhury
IEEE International Conference on Image Processing (ICIP), 2015
We present a continuous learning re-id system with a human in the loop who not only provides image labels but also improves the learned model by providing attribute-based explanations.
|
|
Scalable Video Summarization using Skeleton Graph and Random Walk
Rameswar Panda, Sanjay K. Kuanar, Ananda S. Chowdhury
IEEE International Conference on Pattern Recognition (ICPR), 2014
This paper presents a scalable video summarization framework for both analyzing the input video and generating summaries according to user-specified length constraints.
|
|
Video Storyboard Design using Delaunay Graphs
Ananda S. Chowdhury, Sanjay K. Kuanar, Rameswar Panda, Moloy N. Das
IEEE International Conference on Pattern Recognition (ICPR), 2012
This paper uses dynamic Delaunay graph clustering for summarizing videos.
|
Services
- Workshop Organizer: Dynamic Neural Networks (DNetCV) at CVPR 2022,
Dynamic Neural Networks (DNetCV) at CVPR 2021,
Neural Architecture Search (NAS) at CVPR 2021,
Multi-Modal Video Analysis at ECCV 2020,
Neural Architecture Search (NAS) at CVPR 2020,
Multi-modal Video Analysis and Moments in Time Challenge at ICCV 2019.
- Tutorial Organizer: Efficient Video Understanding at ICCV 2021,
Visual Data Summarization at CVPR 2019.
- Reviewer: CVPR, ICCV, ECCV, ICLR, NeurIPS, AAAI, WACV, ACCV, BMVC, TPAMI, TIP, TMM, TCSVT, IJCV, PR, PRL.
|
|