2024 |
Infrared and Visible Image Fusion: From Data Compatibility to Task Adaption.
Jinyuan Liu, Guanyao Wu, Zhu Liu, Di Wang, Zhiying Jiang, Long Ma, Wei Zhong, Xin Fan, Risheng Liu* Abstract:
Infrared-visible image fusion (IVIF) is a fundamental and critical task in the field of computer vision. Its aim is to integrate the unique characteristics of both infrared and visible spectra into a holistic representation. Since 2018, a growing number and diversity of IVIF approaches have entered the deep-learning era, introducing a broad spectrum of networks and loss functions for improving visual enhancement. As research deepens and practical demands grow, several intricate issues such as data compatibility, perception accuracy, and efficiency cannot be ignored. Regrettably, there is a lack of recent surveys that comprehensively introduce and organize this expanding domain of knowledge. Given the current rapid development, this paper aims to fill the existing gap by providing a comprehensive survey that covers a wide array of aspects. Initially, we introduce a multi-dimensional framework to elucidate the prevalent learning-based IVIF methodologies, spanning topics from basic visual enhancement strategies to data compatibility, task adaptability, and further extensions. Subsequently, we delve into a profound analysis of these new approaches, offering a detailed lookup table to clarify their core ideas. Last but not least, we summarize quantitative and qualitative performance comparisons, covering registration, fusion, and follow-up high-level tasks. Beyond delving into the technical nuances of these learning-based fusion approaches, we also explore potential future directions and open issues that warrant further exploration by the community. For additional information and a detailed data compilation, please refer to our GitHub repository: https://github.com/RollingPlain/IVIF Survey. |
Abstract:
Underwater images are often affected by light refraction and absorption, reducing visibility and interfering with subsequent applications. Existing underwater image enhancement methods primarily focus on improving visual quality while overlooking practical implications. To strike a balance between visual quality and application, we propose a heuristic invertible network for underwater perception enhancement, dubbed HUPE, which enhances visual quality and demonstrates flexibility in handling other downstream tasks. Specifically, we introduce an information-preserving reversible transformation with an embedded Fourier transform to establish a bidirectional mapping between underwater images and their clear counterparts. Additionally, a heuristic prior is incorporated into the enhancement process to better capture scene information. To further bridge the feature gap between vision-oriented enhanced images and application-oriented images, a semantic collaborative learning module is applied in the joint optimization process of the visual enhancement task and the downstream task, which guides the proposed enhancement model to extract more task-oriented semantic features while obtaining visually pleasing images. Extensive experiments, both quantitative and qualitative, demonstrate the superiority of our HUPE over state-of-the-art methods. The source code is available at https://github.com/ZengxiZhang/HUPE. |
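For readers unfamiliar with invertible mappings, the sketch below illustrates the kind of information-preserving, exactly reversible transformation the abstract refers to, using a generic affine coupling block in PyTorch; the structure and layer sizes are illustrative assumptions, not HUPE's actual architecture.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """Generic invertible coupling block (a sketch, not HUPE's exact design)."""
    def __init__(self, channels, hidden=64):
        super().__init__()
        # A small network predicts scale and shift from one half of the channels.
        self.net = nn.Sequential(
            nn.Conv2d(channels // 2, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, channels, 3, padding=1),
        )

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=1)
        log_s, t = self.net(x1).chunk(2, dim=1)
        y2 = x2 * torch.exp(log_s) + t          # transform the second half
        return torch.cat([x1, y2], dim=1)

    def inverse(self, y):
        y1, y2 = y.chunk(2, dim=1)
        log_s, t = self.net(y1).chunk(2, dim=1)
        x2 = (y2 - t) * torch.exp(-log_s)       # exactly recover the original half
        return torch.cat([y1, x2], dim=1)
```

Because the forward pass is analytically invertible, no information is lost when mapping between the degraded and enhanced domains.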
A Task-guided, Implicitly-searched and Meta-initialized Deep Model for Image Fusion.
Risheng Liu*, Zhu Liu, Jinyuan Liu, Xin Fan, Zhongxuan Luo Abstract:
Image fusion plays a key role in a variety of multi-sensor-based vision systems, especially for enhancing visual quality and/or extracting aggregated features for perception. However, most existing methods just consider image fusion as an individual task, ignoring its underlying relationship with downstream vision problems. Furthermore, designing proper fusion architectures often requires huge engineering labor, and there is a lack of mechanisms to improve the flexibility and generalization ability of current fusion approaches. To mitigate these issues, we establish a Task-guided, Implicit-searched and Meta-initialized (TIM) deep model to address the image fusion problem in challenging real-world scenarios. Specifically, we first propose a constrained strategy to incorporate information from downstream tasks to guide the unsupervised learning process of image fusion. Within this framework, we then design an implicit search scheme to automatically discover compact architectures for our fusion model with high efficiency. In addition, a pretext meta-initialization technique is introduced to leverage divergent fusion data to support fast adaptation to different kinds of image fusion tasks. Qualitative and quantitative experimental results on different categories of image fusion problems and related downstream tasks (e.g., visual enhancement and semantic understanding) substantiate the flexibility and effectiveness of our TIM. |
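One schematic way to read the task-guided constrained formulation (our notation, not the paper's exact equations): the fusion network F_\omega is trained on an unsupervised fusion loss while the downstream task loss, evaluated with task parameters obtained at the lower level, guides the learning,

\min_{\omega}\; \ell_{\mathrm{fuse}}\big(F_{\omega}(I_{A}, I_{B})\big) + \lambda\, \ell_{\mathrm{task}}\big(T_{\theta^{*}(\omega)}(F_{\omega}(I_{A}, I_{B}))\big) \quad \text{s.t.} \quad \theta^{*}(\omega) \in \arg\min_{\theta}\; \ell_{\mathrm{task}}\big(T_{\theta}(F_{\omega}(I_{A}, I_{B}))\big),

where T_\theta denotes the downstream network and I_A, I_B the source images; the implicit architecture search and meta initialization then operate on top of this nested structure.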
Learning with Constraint Learning: New Perspective, Solution Strategy and Various Applications.
Risheng Liu*, Jiaxin Gao, Xuan Liu, Xin Fan Abstract:
The complexity of learning problems, such as Generative Adversarial Networks (GANs) and their variants, multi-task and meta-learning, hyper-parameter learning, and a variety of real-world vision applications, demands a deeper understanding of their underlying coupling mechanisms. Existing approaches often address these problems in isolation, lacking a unified perspective that can reveal commonalities and enable effective solutions. Therefore, in this work, we propose a new framework, named Learning with Constraint Learning (LwCL), that can holistically examine challenges and provide a unified methodology to tackle all of the above-mentioned complex learning and vision problems. Specifically, LwCL is designed as a general hierarchical optimization model that captures the essence of these diverse learning and vision problems. Furthermore, we develop a gradient-response based fast solution strategy to overcome the optimization challenges of the LwCL framework. Our proposed framework efficiently addresses a wide range of applications in learning and vision, encompassing three categories and nine different problem types. Extensive experiments on synthetic tasks and real-world applications verify the effectiveness of our approach. The LwCL framework offers a comprehensive solution for tackling complex machine learning and computer vision problems, bridging the gap between theory and practice. |
Abstract:
Restoring high-quality images from degraded hazy observations is a fundamental and essential task in the field of computer vision. While deep models have achieved significant success with synthetic data, their effectiveness in real-world scenarios remains uncertain. To improve adaptability in real-world environments, we construct an entirely new computational framework by making efforts from three key aspects: imaging perspective, structural modules, and training strategies. To simulate the often-overlooked multiple degradation attributes found in real-world hazy images, we develop a new hazy imaging model that encapsulates multiple degradation factors, assisting in bridging the domain gap between synthetic and real-world image spaces. In contrast to existing approaches that primarily address the inverse imaging process, we design a new dehazing network following the "localization-and-removal" pipeline. The degradation localization module assists the network in capturing discriminative haze-related features, and the degradation removal module focuses on eliminating dependencies between features by learning a weighting matrix of training samples, thereby avoiding the spurious correlations of extracted features seen in existing deep methods. We also define a new Gaussian perceptual contrastive loss to further constrain the network to update in the direction of natural dehazing. In terms of multiple full-/no-reference image quality indicators and subjective visual effects on the challenging RTTS, URHI, and Fattal real hazy datasets, the proposed method achieves superior performance and outperforms current state-of-the-art methods. See more results: https://github.com/fyxnl/KA_Net |
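For context, the classical single-factor hazy imaging model that such extended formulations start from is the atmospheric scattering model

I(x) = J(x)\, t(x) + A\,\big(1 - t(x)\big),

where I is the observed hazy image, J the clean scene radiance, t the transmission map, and A the global atmospheric light; the model proposed in the paper augments this form with additional degradation factors, whose exact expression is not reproduced here.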
Moreau Envelope for Nonconvex Bi-Level Optimization: A Single-loop and Hessian-free Solution Strategy.
Risheng Liu, Zhu Liu, Wei Yao, Shangzhi Zeng, Jin Zhang Abstract:
This work focuses on addressing two major challenges in the context of large-scale nonconvex Bi-Level Optimization (BLO) problems, which are increasingly applied in machine learning due to their ability to model nested structures. These challenges involve ensuring computational efficiency and providing theoretical guarantees. While recent advances in scalable BLO algorithms have primarily relied on lower-level convexity simplification, our work specifically tackles large-scale BLO problems involving nonconvexity in both the upper and lower levels. We simultaneously address computational and theoretical challenges by introducing an innovative single-loop gradient-based algorithm, utilizing the Moreau envelope-based reformulation, and providing non-asymptotic convergence analysis for general nonconvex BLO problems. Notably, our algorithm relies solely on first-order gradient information, enhancing its practicality and efficiency, especially for large-scale BLO learning tasks. We validate our approach's effectiveness through experiments on various synthetic problems, two typical hyper-parameter learning tasks, and a real-world neural architecture search application, collectively demonstrating its superior performance. |
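As background on the Moreau-envelope-based reformulation (a standard construction written in our notation, not the paper's exact statement), the lower-level objective f(x, \cdot) is smoothed via

v_{\gamma}(x, y) = \min_{\theta}\; f(x, \theta) + \tfrac{1}{2\gamma}\,\lVert \theta - y \rVert^{2},

and the nonconvex bi-level problem can then be handled as a single-level problem with the value-gap constraint f(x, y) - v_{\gamma}(x, y) \le 0, whose gradients require only first-order information, which is what makes a single-loop, Hessian-free scheme possible.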
Abstract:
One-shot medical image segmentation (MIS) aims to cope with expensive, time-consuming, and inherently biased manual annotations. One prevalent approach to one-shot MIS is joint registration and segmentation (JRS) with a shared encoder, which mainly explores the voxel-wise correspondence between labeled and unlabeled data for better segmentation. However, this approach omits underlying connections between the task-specific decoders for segmentation and registration, leading to unstable training. In this paper, we propose a novel Bi-level Learning of Task-Specific Decoders for one-shot MIS, employing a pretrained fixed shared encoder that proves to adapt more quickly to brand-new datasets than the existing JRS paradigm without a fixed shared encoder. More specifically, we introduce a bi-level optimization training strategy that treats registration as the major objective and segmentation as a learnable constraint by leveraging inter-task coupling dependencies. Furthermore, we design an appearance conformity constraint strategy that learns the backward transformations generating the fake labeled data used for data augmentation, instead of the labeled image, to avoid the performance degradation caused by inconsistent styles between unlabeled and labeled data in previous methods. Extensive experiments on the brain MRI task across the ABIDE, ADNI, and PPMI datasets demonstrate that the proposed Bi-JROS outperforms state-of-the-art one-shot MIS methods for both segmentation and registration tasks. The code will be available at https://github.com/Coradlut/Bi-JROS. |
2023 |
Hierarchical Optimization-Derived Learning.
Risheng Liu, Xuan Liu, Shangzhi Zeng, Jin Zhang, Yixuan Zhang Abstract:
In recent years, by utilizing optimization techniques to formulate the propagation of deep models, a variety of so-called Optimization-Derived Learning (ODL) approaches have been proposed to address diverse learning and vision tasks. Although they have achieved relatively satisfying practical performance, fundamental issues still exist in existing ODL methods. In particular, current ODL methods tend to consider model construction and learning as two separate phases, and thus fail to formulate their underlying coupling and dependency relationship. In this work, we first establish a new framework, named Hierarchical ODL (HODL), to simultaneously investigate the intrinsic behaviors of optimization-derived model construction and its corresponding learning process. Then we rigorously prove the joint convergence of these two sub-tasks, from the perspectives of both approximation quality and stationary analysis. To the best of our knowledge, this is the first theoretical guarantee for these two coupled ODL components: optimization and learning. We further demonstrate the flexibility of our framework by applying HODL to challenging learning tasks, which have not been properly addressed by existing ODL methods. Finally, we conduct extensive experiments on both synthetic data and real applications in vision and other learning tasks to verify the theoretical properties and practical performance of HODL in various application scenarios. |
Value-Function-based Sequential Minimization for Bi-level Optimization.
Risheng Liu, Xuan Liu, Shangzhi Zeng, Jin Zhang, Yixuan Zhang Abstract:
Gradient-based Bi-Level Optimization (BLO) methods have been widely applied to handle modern learning tasks. However, most existing strategies are theoretically designed based on restrictive assumptions (e.g., convexity of the lower-level sub-problem) and are computationally inapplicable to high-dimensional tasks. Moreover, there are almost no gradient-based methods able to solve BLO in challenging scenarios such as BLO with functional constraints and pessimistic BLO. In this work, by reformulating BLO into approximated single-level problems, we provide a new algorithm, named Bi-level Value-Function-based Sequential Minimization (BVFSM), to address the above issues. Specifically, BVFSM constructs a series of value-function-based approximations, and thus avoids the repeated calculations of recurrent gradients and Hessian inverses required by existing approaches, which are time-consuming especially for high-dimensional tasks. We also extend BVFSM to address BLO with additional functional constraints. More importantly, BVFSM can be used for the challenging pessimistic BLO, which has never been properly solved before. In theory, we prove the convergence of BVFSM on these types of BLO, in which the restrictive lower-level convexity assumption is completely discarded. To the best of our knowledge, this is the first gradient-based algorithm that can solve different kinds of BLO (e.g., optimistic, pessimistic, and with constraints) with solid convergence guarantees. Extensive experiments verify the theoretical investigations and demonstrate our superiority on various real-world applications. |
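To make the value-function idea concrete (standard form, our notation): writing the lower-level value function as \varphi(x) = \min_{y} f(x, y), the optimistic bi-level problem can be rewritten as the single-level program

\min_{x, y}\; F(x, y) \quad \text{s.t.} \quad f(x, y) - \varphi(x) \le 0,

and BVFSM replaces \varphi by a sequence of smoothed, regularized approximations, which is how repeated recurrent-gradient and Hessian-inverse computations are avoided.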
Abstract:
Images captured from low-light scenes often suffer from severe degradations, including low visibility, color casts, and intensive noise. These factors not only degrade image quality but also affect the performance of downstream Low-Light Vision (LLV) applications. A variety of deep networks have been proposed to enhance the visual quality of low-light images. However, they mostly rely on significant architecture engineering and often incur a high computational burden. More importantly, an efficient paradigm to uniformly handle the various tasks in LLV scenarios is still lacking. To partially address the above issues, we establish Retinex-inspired Unrolling with Architecture Search (RUAS), a general learning framework that can address the low-light enhancement task and has the flexibility to handle other challenging downstream vision tasks. Specifically, we first establish a nested optimization formulation, together with an unrolling strategy, to explore the underlying principles of a series of LLV tasks. Furthermore, we design a differentiable strategy to cooperatively search scene- and task-specific architectures for RUAS. Last but not least, we demonstrate how to apply RUAS to both low- and high-level LLV applications (e.g., enhancement, detection, and segmentation). Extensive experiments verify the flexibility, effectiveness, and efficiency of RUAS. |
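The Retinex observation underlying the unrolling is the element-wise factorization (standard notation, not the paper's full model)

y = z \otimes x,

where y is the low-light observation, z the desired clear image, x the illumination, and \otimes denotes the Hadamard product; RUAS unrolls an optimization over this factorization and searches an architecture for each unrolled stage.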
Abstract:
In recent years, there has been growing interest in combining learnable modules with numerical optimization to solve low-level vision tasks. However, most existing approaches focus on designing specialized schemes to generate image/feature propagation, and there is a lack of unified consideration for constructing propagative modules, providing theoretical analysis tools, and designing effective learning mechanisms. To mitigate these issues, this paper proposes a unified optimization-inspired learning framework that aggregates Generative, Discriminative, and Corrective (GDC for short) principles with strong generalization for diverse optimization models. Specifically, by introducing a general energy minimization model and formulating its descent direction from different viewpoints (i.e., in a generative manner, based on a discriminative metric, and with optimality-based correction), we construct three propagative modules to effectively solve the optimization models with flexible combinations. We design two control mechanisms that provide non-trivial theoretical guarantees for both fully- and partially-defined optimization formulations. Supported by these guarantees, we can introduce diverse architecture augmentation strategies, such as normalization and search, to ensure stable propagation with convergence and to seamlessly integrate suitable modules into the propagation. Extensive experiments across varied low-level vision tasks validate the efficacy and adaptability of GDC. |
CoCoNet: Coupled Contrastive Learning Network with Multi-level Feature Ensemble for Multi-modality Image Fusion.
Jinyuan Liu, Runjia Lin, Guanyao Wu, Risheng Liu, Zhongxuan Luo, Xin Fan Abstract:
Infrared and visible image fusion aims to provide an informative image by combining complementary information from different sensors. Existing learning-based fusion approaches attempt to construct various loss functions to preserve complementary features, while neglecting to discover the inter-relationship between the two modalities, leading to redundant or even invalid information in the fusion results. Moreover, most methods focus on strengthening the network by increasing its depth while neglecting the importance of feature transmission, causing vital information degeneration. To alleviate these issues, we propose a coupled contrastive learning network, dubbed CoCoNet, to realize infrared and visible image fusion in an end-to-end manner. Concretely, to simultaneously retain typical features from both modalities and avoid artifacts emerging in the fused result, we develop a coupled contrastive constraint in our loss function. In a fused image, its foreground target / background detail part is pulled close to the infrared / visible source and pushed far away from the visible / infrared source in the representation space. We further exploit image characteristics to provide data-sensitive weights, allowing our loss function to build a more reliable relationship with the source images. A multi-level attention module is established to learn rich hierarchical feature representations and to comprehensively transfer features in the fusion process. We also apply the proposed CoCoNet to medical image fusion of different types, e.g., magnetic resonance, positron emission tomography, and single-photon emission computed tomography images. Extensive experiments demonstrate that our method achieves state-of-the-art (SOTA) performance under both subjective and objective evaluation, especially in preserving prominent targets and recovering vital textural details. |
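A minimal sketch of the pull/push idea behind the coupled contrastive constraint is given below; the feature inputs, distance choice, and weighting are illustrative assumptions rather than CoCoNet's exact loss.

```python
import torch.nn.functional as F

def coupled_contrastive_loss(fused_fg, fused_bg, ir_feat, vis_feat, eps=1e-6):
    """Foreground features of the fused image are pulled toward the infrared source
    and pushed away from the visible one; the background term is symmetric."""
    fg_term = F.l1_loss(fused_fg, ir_feat) / (F.l1_loss(fused_fg, vis_feat) + eps)
    bg_term = F.l1_loss(fused_bg, vis_feat) / (F.l1_loss(fused_bg, ir_feat) + eps)
    return fg_term + bg_term
```

Minimizing each ratio keeps the anchor close to its positive source while keeping it away from the negative one, which is the relationship the abstract describes for foreground targets and background details.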
Bilevel Fast Scene Adaptation for Low-Light Image Enhancement.
Long Ma, Dian Jin, Nan An, Jinyuan Liu, Xin Fan, Zhongxuan Luo, Risheng Liu* Abstract:
Enhancing images captured in low-light scenes is a challenging but widely studied task in computer vision. Mainstream learning-based methods mainly acquire the enhancement model by learning the data distribution of specific scenes, causing poor adaptability (or even failure) when encountering real-world scenarios that have never been seen before. The main obstacle lies in the modeling conundrum caused by the distribution discrepancy across different scenes. To remedy this, we first explore relationships between diverse low-light scenes based on statistical analysis, i.e., the network parameters of encoders trained on different data distributions are close. We introduce the bilevel paradigm to model the above latent correspondence from the perspective of hyperparameter optimization. A bilevel learning framework is constructed to endow the encoder with scene-irrelevant generality towards diverse scenes (i.e., the encoder is frozen in the adaptation and testing phases). Further, we define a reinforced bilevel learning framework to provide a meta-initialization for the scene-specific decoder to further ameliorate visual quality. Moreover, to improve practicability, we establish a Retinex-induced architecture with adaptive denoising and apply our learning framework to acquire its parameters, using two training losses in supervised and unsupervised forms. Extensive experimental evaluations on multiple datasets verify our adaptability and competitive performance against existing state-of-the-art works. The code and datasets will be available at https://github.com/vis-opt-group/BL. |
Automated Learning for Deformable Medical Image Registration by Jointly Optimizing Network Architectures and Objective Functions.
Xin Fan, Zi Li, Ziyang Li, Xiaolin Wang, Risheng Liu, Zhongxuan Luo, Hao Huang Abstract:
Deformable image registration plays a critical role in various tasks of medical image analysis. A successful registration algorithm, whether derived from conventional energy optimization or deep networks, requires tremendous effort from computer experts to carefully design the registration energy or to carefully tune network architectures with respect to the medical data available for a given registration task/scenario. This paper proposes an automated learning registration algorithm (AutoReg) that cooperatively optimizes both architectures and their corresponding training objectives, enabling non-computer experts to conveniently find off-the-shelf registration algorithms for various registration scenarios. Specifically, we establish a triple-level framework to embrace the search for both network architectures and objectives with a cooperating optimization. Extensive experiments on multiple volumetric datasets and various registration scenarios demonstrate that AutoReg can automatically learn an optimal deep registration network for given volumes and achieve state-of-the-art performance. The automatically learned network also improves computational efficiency over the mainstream UNet architecture, from 0.558 to 0.270 seconds for a volume pair under the same configuration. |
Investigating intrinsic degradation factors by multi-branch aggregation for real-world underwater image enhancement.
Xinwei Xue, Zexuan Li, Long Ma, Qi Jia, Risheng Liu*, Xin Fan Abstract:
Recently, improving the visual quality of underwater images has received extensive attention in both the computer vision and ocean engineering fields. However, existing works mostly focus on directly learning clear images from degraded observations without careful investigation of the intrinsic degradation factors, thus requiring massive training data and lacking generalization ability. In this work, we propose a new method, named Multi-Branch Aggregation Network (termed MBANet), to partially address the above issue. Specifically, by analyzing underwater degradation factors from the perspective of both color distortions and veil effects, MBANet first constructs a multi-branch multi-variable architecture to obtain one intermediate coarse result and two degradation factors. We then establish a physical-model-inspired process to fully utilize the estimated degradation factors and thus obtain the desired clear output images. A series of evaluations on multiple datasets show the superiority of our method against existing state-of-the-art approaches in both execution speed and accuracy. Furthermore, we demonstrate that our MBANet can significantly improve the performance of salient object detection in the underwater environment. |
Multi-interactive Feature Learning and a Full-time Multi-modality Benchmark for Image Fusion and Segmentation.
Jinyuan Liu, Zhu Liu, Guanyao Wu, Long Ma, Risheng Liu, Wei Zhong, Zhongxuan Luo, Xin Fan Abstract:
Multi-modality image fusion and segmentation play a vital role in autonomous driving and robotic operation. Early efforts focus on boosting the performance of only one task, e.g., fusion or segmentation, making it hard to reach the 'Best of Both Worlds'. To overcome this issue, in this paper, we propose a multi-interactive feature learning architecture for image fusion and segmentation, namely SegMiF, and exploit dual-task correlation to promote the performance of both tasks. SegMiF is a cascade structure containing a fusion sub-network and a commonly used segmentation sub-network. By seamlessly bridging intermediate features between the two components, the knowledge learned from the segmentation task can effectively assist the fusion task, and in turn the improved fusion network supports the segmentation network to perform more precisely. Besides, a hierarchical interactive attention block is established to ensure fine-grained mapping of all the vital information between the two tasks, so that the modality/semantic features can fully interact. In addition, a dynamic weight factor is introduced to automatically adjust the corresponding weights of each task, which can balance the interactive feature correspondence and break through the limitation of laborious tuning. Furthermore, we construct an innovative multi-wave binocular imaging system and collect a full-time multi-modality benchmark with 15 annotated pixel-level categories for image fusion and segmentation. Extensive experiments on several public datasets and our benchmark demonstrate that the proposed method outputs visually appealing fused images and achieves on average 7.66% higher segmentation mIoU in real-world scenes than state-of-the-art approaches. |
Averaged Method of Multipliers for Bi-Level Optimization without Lower-Level Strong Convexity.
Risheng Liu, Yaohua Liu, Wei Yao, Shangzhi Zeng, Jin Zhang Abstract:
Gradient methods have become mainstream techniques for Bi-Level Optimization (BLO) in learning fields. The validity of existing works relies heavily on either a restrictive Lower-Level Strong Convexity (LLSC) condition, on solving a series of approximation subproblems with high accuracy, or on both. In this work, by averaging the upper- and lower-level objectives, we propose a single-loop Bi-level Averaged Method of Multipliers (sl-BAMM) for BLO that is simple yet efficient for large-scale BLO and dispenses with the restrictive LLSC condition. We further provide a non-asymptotic convergence analysis of sl-BAMM towards KKT stationary points, and the comparative advantage of our analysis lies in the absence of the strong gradient boundedness assumption that is always required by others. Our theory thus safely covers a wider variety of applications in deep learning, especially where the upper-level objective is quadratic w.r.t. the lower-level variable. Experimental results demonstrate the superiority of our method. |
Bi-level Dynamic Learning for Jointly Multi-modality Image Fusion and Beyond.
Zhu Liu, Jinyuan Liu, Guanyao Wu, Long Ma, Xin Fan, Risheng Liu* Abstract:
Recently, multi-modality scene perception tasks, e.g., image fusion and scene understanding, have attracted widespread attention for intelligent vision systems. However, early efforts typically boost a single task unilaterally while neglecting others, seldom investigating their underlying connections for joint promotion. To overcome these limitations, we establish a hierarchical dual-task-driven deep model to bridge these tasks. Concretely, we first construct an image fusion module to fuse complementary characteristics and cascade dual task-related modules, including a discriminator for visual effects and a semantic network for feature measurement. We provide a bi-level perspective to formulate image fusion and the follow-up downstream tasks. To incorporate distinct task-related responses into image fusion, we consider image fusion as the primary goal and the dual modules as learnable constraints. Furthermore, we develop an efficient first-order approximation to compute the corresponding gradients and present a dynamic weighted aggregation to balance the gradients for fusion learning, as sketched below. Extensive experiments demonstrate the superiority of our method, which not only produces visually pleasing fused results but also achieves significant improvements in detection and segmentation over state-of-the-art approaches. |
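The dynamic weighted aggregation step can be sketched as follows; the inverse-norm weighting used here is an illustrative assumption and not necessarily the paper's exact scheme.

```python
import torch

def aggregate_task_gradients(grads_per_task, eps=1e-8):
    """Balance gradients from the dual task modules (e.g., discriminator and semantic
    network) before updating the fusion network; larger-norm tasks get smaller weights."""
    norms = [torch.sqrt(sum(g.pow(2).sum() for g in grads)) for grads in grads_per_task]
    inv = torch.stack([1.0 / (n + eps) for n in norms])
    weights = inv / inv.sum()
    aggregated = []
    for per_param in zip(*grads_per_task):           # iterate over shared parameters
        aggregated.append(sum(w * g for w, g in zip(weights, per_param)))
    return aggregated
```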
WaterFlow: Heuristic Normalizing Flow for Underwater Image Enhancement and Beyond.
Zengxi Zhang, Zhiying Jiang, Jinyuan Liu, Xin Fan, Risheng Liu* Abstract:
Underwater images suffer from light refraction and absorption, which impair visibility and interfere with subsequent applications. Existing underwater image enhancement methods mainly focus on improving image quality while ignoring practical effects. To balance visual quality and application, we propose a heuristic normalizing flow for detection-driven underwater image enhancement, dubbed WaterFlow. Specifically, we first develop an invertible mapping to achieve the translation between a degraded image and its clear counterpart. Considering differentiability and interpretability, we incorporate a heuristic prior into the data-driven mapping procedure, where the ambient light and medium transmission coefficient benefit credible generation. Furthermore, we introduce a detection perception module to transmit implicit semantic guidance into the enhancement procedure, so that the enhanced images hold more detection-favorable features and are able to promote detection performance. Extensive experiments demonstrate the superiority of our WaterFlow against state-of-the-art methods, both quantitatively and qualitatively. |
Bilevel Generative Learning for Low-Light Vision.
Yingchi Liu, Zhu Liu, Long Ma, Jinyuan Liu, Xin Fan, Zhongxuan Luo, Risheng Liu* Abstract:
Recently, there has been growing interest in constructing deep learning schemes for Low-Light Vision (LLV) in the field of computer vision. Existing techniques primarily focus on designing task-specific and data-dependent vision models in the standard RGB domain, which inherently contain latent data associations. In this study, we propose a generic low-light vision solution by introducing a generative block to convert data from the RAW to the RGB domain. This novel approach connects diverse vision problems by explicitly depicting data generation, which is the first of its kind in the field. To precisely characterize the latent correspondence between the generative procedure and the vision task, we establish a bilevel model with the parameters of the generative block defined as the upper level and the parameters of the vision task defined as the lower level. We further develop two types of learning strategies targeting different goals, namely low cost and high accuracy, to acquire a new bilevel generative learning paradigm. The generative block acquires a strong generalization ability to other low-light vision tasks through bilevel optimization on the enhancement task. Extensive experimental evaluations on three representative low-light vision tasks, namely enhancement, detection, and segmentation, fully demonstrate the superiority of our proposed approach. The code will be available after acceptance of this work. |
PAIF: Perception-Aware Infrared-Visible Image Fusion for Attack-Tolerant Semantic Segmentation.
Zhu Liu, Jinyuan Liu, Benzhuang Zhang, Long Ma, Xin Fan, Risheng Liu* Abstract:
Infrared and visible image fusion is a powerful technique that combines complementary information from different modalities for downstream semantic perception tasks. Existing learning-based methods show remarkable performance but suffer from an inherent vulnerability to adversarial attacks, causing a significant decrease in accuracy. In this work, a perception-aware fusion framework is proposed to promote segmentation robustness in adversarial scenes. We first conduct systematic analyses of the components of image fusion, investigating their correlation with segmentation robustness under adversarial perturbations. Based on these analyses, we propose a harmonized architecture search with a decomposition-based structure to balance standard accuracy and robustness. We also propose an adaptive learning strategy to improve the parameter robustness of image fusion, which can learn effective feature extraction under diverse adversarial perturbations. Thus, the goals of image fusion (i.e., extracting complementary features from source modalities and defending against attacks) can be realized from the perspectives of architecture and learning strategy. Extensive experimental results demonstrate that our scheme substantially enhances robustness, with a gain of 15.3% segmentation mIoU in adversarial scenes, compared with advanced competitors. |
Multi-Spectral Image Stitching via Spatial Graph Reasoning.
Zhiying Jiang, Zengxi Zhang, Jinyuan Liu, Xin Fan, Risheng Liu* Abstract:
Multi-spectral image stitching leverages the complementarity between infrared and visible images to generate a robust and reliable wide field-of-view (FOV) scene. The primary challenge of this task is to explore the relations between multi-spectral images for aligning and integrating multi-view scenes. Capitalizing on the strengths of Graph Convolutional Networks (GCNs) in modeling feature relationships, we propose a spatial graph reasoning based multi-spectral image stitching method that effectively distills the deformation and integration of multi-spectral images across different viewpoints. To accomplish this, we embed multi-scale complementary features from the same view position into a set of nodes. The correspondence across different views is learned through powerful dense feature embeddings, where both inter- and intra-correlations are developed to exploit cross-view matching and enhance inner feature disparity. By introducing long-range coherence along spatial and channel dimensions, the complementarity of pixel relations and channel interdependencies aids in the reconstruction of aligned multi-view features, generating informative and reliable wide FOV scenes. Moreover, we release a challenging dataset named ChaMS, comprising both real-world and synthetic sets with significant parallax, providing a new option for comprehensive evaluation. Extensive experiments demonstrate that our method surpasses the state of the art. |
PEARL: Preprocessing Enhanced Adversarial Robust Learning of Image Deraining for Semantic Segmentation.
Xianghao Jiao, Yaohua Liu, Jiaxin Gao, Xinyuan Chu, Xin Fan, Risheng Liu* Abstract:
In light of the significant progress made in the development and application of semantic segmentation, there has been increasing attention towards improving the robustness of segmentation models against natural degradation factors (e.g., rain streaks) or artificial attack factors (e.g., adversarial attacks). However, most existing methods are designed to address a single degradation factor and are tailored to specific application scenarios. In this work, we present the first attempt to improve the robustness of semantic segmentation by simultaneously handling different types of degradation factors. Specifically, we introduce the Preprocessing Enhanced Adversarial Robust Learning (PEARL) framework based on an analysis of our proposed Naive Adversarial Training (NAT) framework. Our approach effectively handles both rain streaks and adversarial examples by transferring the robustness of the segmentation model to the image deraining model. Furthermore, as opposed to the commonly used Negative Adversarial Attack (NAA), we design the Auxiliary Mirror Attack (AMA) to introduce positive information prior to the training of the PEARL framework, which improves defense capability and segmentation performance. Our extensive experiments and ablation studies based on different deraining methods and segmentation models demonstrate the significant performance improvement of PEARL with AMA in defending against various adversarial attacks and rain streaks while maintaining high generalization performance across different datasets. |
Fearless Luminance Adaptation: a Macro-Micro-Hierarchical Transformer for Exposure Correction.
Gehui Li, Jinyuan Liu, Long Ma, Zhiying Jiang, Xin Fan, Risheng Liu* Abstract:
Photographs taken with less-than-ideal exposure settings often display poor visual quality. Since the required correction procedures vary significantly, it is difficult for a single neural network to handle all exposure problems. Moreover, the inherent limitations of convolutions hinder the model's ability to restore faithful color or details in extremely over-/under-exposed regions. To overcome these limitations, we propose a Macro-Micro-Hierarchical transformer, which consists of a macro attention to capture long-range dependencies, a micro attention to extract local features, and a hierarchical structure for coarse-to-fine correction. Specifically, the complementary macro-micro attention designs enhance locality while allowing global interactions. The hierarchical structure enables the network to correct exposure errors of different scales layer by layer. Furthermore, we propose a contrast constraint and couple it seamlessly into the loss function, where the corrected image is pulled towards the positive sample and pushed away from dynamically generated negative samples. Thus the remaining color distortion and loss of detail can be removed. We also extend our method as an image enhancer for low-light face recognition and low-light semantic segmentation. Experiments demonstrate that our approach obtains more attractive results than state-of-the-art methods, both quantitatively and qualitatively. |
2022 |
Abstract:
Underwater images suffer from severe distortion, which degrades the accuracy of object detection performed in an underwater environment. Existing underwater image enhancement algorithms focus on the restoration of contrast and scene reflection. In practice, the enhanced images may not benefit the effectiveness of detection and can even lead to a severe performance drop. In this paper, we propose an object-guided twin adversarial contrastive learning based underwater enhancement method to achieve both visual-friendly and task-oriented enhancement. Concretely, we first develop a bilateral constrained closed-loop adversarial enhancement module, which eases the requirement for paired data through an unsupervised manner and preserves more informative features by coupling with a twin inverse mapping. In addition, to confer a more realistic appearance on the restored images, we also adopt contrastive cues in the training phase. To narrow the gap between visually oriented and detection-favorable target images, a task-aware feedback module is embedded in the enhancement process, where the coherent gradient information of the detector is incorporated to guide the enhancement in the detection-pleasing direction. To validate the performance, we allocate a series of prolific detectors into our framework. Extensive experiments demonstrate that the enhanced results of our method show remarkable amelioration in visual quality, and the accuracy of different detectors on our enhanced images is promoted notably. Moreover, we also conduct a study on semantic segmentation to illustrate how object guidance improves high-level tasks. |
Abstract:
Due to the refraction and absorption of light by water, underwater images usually suffer from severe degradation, such as color cast, hazy blur, and low visibility, which would degrade the effectiveness of marine applications equipped on autonomous underwater vehicles. To eliminate the degradation of underwater images, we propose a target-oriented perceptual adversarial fusion network, dubbed TOPAL. Concretely, we consider the degradation factors of underwater images in terms of turbidity and chromatism. According to these degradation issues, we first develop a multi-scale dense boosted module to strengthen the visual contrast and a deep aesthetic render module to perform the color correction, respectively. After that, we employ a dual channel-wise attention module and guide the adaptive fusion of latent features, in which both diverse details and credible appearance are integrated. To bridge the gap between synthetic and real-world images, a global-local adversarial mechanism is introduced in the reconstruction. Besides, perceptual information is also embedded into the process to assist the understanding of scenery content. To evaluate the performance of TOPAL, we conduct extensive experiments on several benchmarks and make comparisons among state-of-the-art methods. Quantitative and qualitative results demonstrate that our TOPAL improves the quality of underwater images greatly and achieves superior performance over others. |
Hierarchical domain adaptation with local feature patterns.
Jun Wen, Junsong Yuan, Qian Zheng, Risheng Liu, Zhefeng Gong, Nenggan Zheng Abstract:
Domain adaptation is proposed to generalize learning machines and address performance degradation of models that are trained from one specific source domain but applied to novel target domains. Existing domain adaptation methods focus on transferring holistic features whose discriminability is generally tailored to be source-specific and inferiorly generic to be transferable. As a result, standard domain adaptation on holistic features usually damages feature structures, especially local feature statistics, and deteriorates the learned discriminability. To alleviate this issue, we propose to transfer primitive local feature patterns, whose discriminability is shown to be inherently more sharable, and perform hierarchical feature adaptation. Concretely, we first learn a cluster of domain-shared local feature patterns and partition the feature space into cells. Local features are adaptively aggregated inside each cell to obtain cell features, which are further integrated into holistic features. To achieve fine-grained adaptations, we simultaneously perform alignment on local features, cell features and holistic features, within which process the local and cell features are aligned independently inside each cell to maintain the learned local structures and prevent negative transfer. Experimenting on typical one-to-one unsupervised domain adaptation for both image classification and action recognition tasks, partial domain adaptation, and domain-agnostic adaptation, we show that the proposed method achieves more reliable feature transfer by consistently outperforming state-of-the-art models, and the learned domain-invariant features generalize well to novel domains. |
Abstract:
Low-light image enhancement aims to improve the quality of images captured under low-light conditions, which is a fundamental problem in the computer vision and multimedia areas. Although many efforts have been invested over the years, existing illumination-based models tend to generate unnatural-looking results (e.g., over-exposure). This is because the widely adopted illumination adjustment (e.g., Gamma Correction) breaks down the favorable smoothness property of the original illumination derived from the well-designed illumination estimation model. To settle this issue, a highly efficient and high-quality Self-Reinforced Retinex Projection (SRRP) model is developed in this paper, which contains optimization modules for both the illumination and reflectance layers. Specifically, we construct a new fidelity term with a self-reinforced function for the illumination optimization, to eliminate the dependence on illumination adjustment and obtain a desired illumination with an excellent smoothing property. By introducing a flexible feasible constraint, we obtain a reflectance optimization module with projection. Owing to its flexibility, we can extend our model to an enhanced version by integrating a data-driven denoising mechanism as the projection, which is able to effectively handle the noises/artifacts generated in the enhancement procedure. In the experimental part, on the one hand, we make ample comparative assessments on multiple benchmarks against numerous state-of-the-art methods. These evaluations fully verify the outstanding performance of our method, in terms of qualitative and quantitative analyses and execution efficiency. On the other hand, we also conduct extensive analytical experiments to indicate the effectiveness and advantages of our proposed model. Code is available at https://github.com/LongMa319/SRRP. |
Attention-guided Global-local Adversarial Learning for Detail-preserving Multi-exposure Image Fusion.
Jinyuan Liu, Jingjie Shang, Risheng Liu, Xin Fan Abstract:
Deep learning networks have recently yielded impressive progress for multi-exposure image fusion. However, how to restore realistic texture details while correcting color distortion remains a challenging problem. To alleviate these issues, in this paper we propose an attention-guided global-local adversarial learning network for fusing extreme-exposure images in a coarse-to-fine manner. Firstly, the coarse fusion result is generated under the guidance of attention weight maps, which acquire the essential regions of interest from both sides. Secondly, we formulate an edge loss function, along with a spatial feature transform layer, to refine the fusion process, so that it can make full use of edge information to deal with blurry edges. Moreover, by incorporating global-local learning, our method can balance the pixel intensity distribution and correct color distortion on spatially varying source images from both the image and patch perspectives. Such a global-local discriminator ensures that all local patches of the fused images align with realistic normal-exposure ones. Extensive experimental results on two publicly available datasets show that our method drastically outperforms state-of-the-art methods in visual inspection and objective analysis. Furthermore, sufficient ablation experiments prove that our method has significant advantages in generating high-quality fused results with appealing details, clear targets, and faithful color. |
Towards All Weather and Unobstructed Multi-Spectral Image Stitching: Algorithm and Benchmark.
Zhiying Jiang, Zengxi Zhang, Xin Fan, Risheng Liu* Abstract:
Image stitching is a fundamental task that requires multiple images from different viewpoints to generate a wide field-of-view (FOV) scene. Previous methods are developed on RGB images. However, severe weather and harsh conditions, such as rain, fog, low light, strong light, etc., may introduce evident interference in visible images, leading to distortion and misalignment of the stitched results. To remedy the deficient imaging of optical sensors, we investigate the complementarity across infrared and visible images to improve the perception of scenes in terms of visual information and viewing ranges. Instead of the cascaded fusion-stitching process, where the inaccuracy accumulation caused by image fusion hinders stitching performance, especially through content loss and ghosting effects, we develop a learnable feature adaptation network to investigate a stitch-oriented feature representation and perform the information complementation at the feature level. By introducing a pyramidal structure along with global fast correlation regression, the quadrature-attention-based correspondence becomes more reliable for feature alignment, and the estimation of sparse offsets can be realized in a coarse-to-fine manner. Furthermore, we propose the first infrared and visible image based multi-spectral image stitching dataset, covering a more comprehensive range of scenarios and diverse viewing baselines. Extensive experiments on real-world data demonstrate that our method reconstructs wide-FOV images with more credible structure and complementary information than the state of the art. |
PIA: Parallel Architecture with Illumination Allocator for Joint Enhancement and Detection in Low-Light.
Tengyu Ma, Long Ma, Xin Fan, Zhongxuan Luo, Risheng Liu* Abstract:
Visual perception in low-light conditions (e.g., nighttime) plays an important role in various multimedia-related applications (e.g., autonomous driving). Enhancement (providing a visual-friendly appearance) and detection (detecting instances of objects) in low light are two fundamental and crucial visual perception tasks. In this paper, we make efforts to simultaneously realize low-light enhancement and detection from two aspects. First, we define a parallel architecture to satisfy the demands of both tasks, in which a decomposition-type warm start acting at the entrance of the parallel architecture is developed to narrow down the adverse effects brought by low-light scenes to some extent. Second, a novel illumination allocator is designed by encoding the key illumination component (the inherent difference between normal light and low light) to extract hierarchical features for assisting both enhancement and detection. Further, we provide a substantive discussion of our proposed method: we solve enhancement in a coarse-to-fine manner and handle detection in a decomposed-to-integrated fashion. Finally, multidimensional analytical and evaluation experiments are performed to indicate our effectiveness and superiority. Code and results will be made public upon acceptance. |
Best of Both Worlds: See and Understand Clearly in the Dark.
Xinwei Xue, Jia He, Long Ma, Yi Wang, Xin Fan, Risheng Liu Abstract:
Recently, with the development of intelligent technology, the perception of low-light scenes has been gaining widespread attention. However, existing techniques usually focus on only one task (e.g., enhancement) and lose sight of the others (e.g., detection), making it difficult to perform all of them well at the same time. To overcome this limitation, we propose a new method that can handle visual quality enhancement and semantic-related tasks (e.g., detection, segmentation) simultaneously in a unified framework. Specifically, we build a cascaded architecture to meet the task requirements. To better enhance the entanglement in both tasks and achieve mutual guidance, we develop a new contrastive-alternative learning strategy for learning the model parameters, to largely improve the representational capacity of the cascaded architecture. Notably, the contrastive learning mechanism establishes the communication between two objective tasks in essence, which actually extends the capability of contrastive learning to some extent. Finally, extensive experiments are performed to fully validate the advantages of our method over other state-of-the-art works in enhancement, detection, and segmentation. A series of analytical evaluations are also conducted to reveal our effectiveness. The code will be available after this work is accepted. |
Abstract:
Recently, Optimization-Derived Learning (ODL) has attracted attention from the learning and vision areas, as it designs learning models from the perspective of optimization. However, previous ODL approaches regard the training and hyper-training procedures as two separate stages, meaning that the learnable parameters have to be fixed during the training process, so that simultaneously obtaining convergence of the training variables and the learnable parameters is impossible. In this work, we design a Generalized Krasnoselskii-Mann (GKM) scheme based on fixed-point iterations as our fundamental ODL module, which unifies existing ODL methods as special cases. Under the GKM scheme, a Bilevel Meta Optimization (BMO) algorithmic framework is constructed to jointly solve for the optimal iterative variables for training and the learnable parameters for hyper-training. We rigorously prove the joint convergence of the fixed-point iteration and learning processes, both on the approximation quality and on the stationary analysis. Experiments demonstrate the efficiency of BMO with competitive performance on sparse coding and real-world applications such as image deconvolution and rain streak removal. |
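For reference, the classical Krasnoselskii-Mann fixed-point iteration that the GKM scheme generalizes reads, in standard form (our notation),

u^{k+1} = (1 - \alpha_{k})\, u^{k} + \alpha_{k}\, \mathcal{T}_{\theta}(u^{k}), \qquad \alpha_{k} \in (0, 1),

where \mathcal{T}_{\theta} is a (learnable) nonexpansive operator defining the training iterations and \theta collects the learnable parameters optimized at the hyper-training level; BMO couples the convergence analysis of both.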
Unsupervised Misaligned Infrared and Visible Image Fusion via Cross-Modality Image Generation and Registration.
Di Wang, Jinyuan Liu, Xin Fan, Risheng Liu* Abstract:
Recent learning-based image fusion methods have made remarkable progress on pre-registered multi-modality data, but suffer from serious ghosting when dealing with misaligned multi-modality data, due to spatial deformation and the difficulty of narrowing the cross-modality discrepancy. To overcome these obstacles, in this paper we present a robust cross-modality generation-registration paradigm for unsupervised misaligned infrared and visible image fusion (IVIF). Specifically, we propose a Cross-modality Perceptual Style Transfer Network (CPSTN) to generate a pseudo infrared image taking a visible image as input. Benefiting from the favorable geometry preservation ability of the CPSTN, the generated pseudo infrared image embraces a sharp structure, which is more conducive to transforming cross-modality image alignment into mono-modality registration, coupled with the structure sensitivity of the infrared image. In this case, we introduce a Multi-level Refinement Registration Network (MRRN) to predict the displacement vector field between the distorted and pseudo infrared images and reconstruct the registered infrared image under the mono-modality setting. Moreover, to better fuse the registered infrared and visible images, we present a feature Interaction Fusion Module (IFM) to adaptively select more meaningful features for fusion in the Dual-path Interaction Fusion Network (DIFN). Extensive experimental results suggest that the proposed method exhibits superior capability on misaligned cross-modality image fusion. |
Hierarchical Bilevel Learning with Architecture and Loss Search for Hadamard-based Image Restoration.
Guijing Zhu, Long Ma, Xin Fan, Risheng Liu* Abstract:
In the past few decades, Hadamard-based image restoration problems (e.g., low-light image enhancement) have attracted wide attention in multiple areas related to artificial intelligence. However, existing works mostly focus on heuristically defining efficient architectures and losses based on engineering experience gained from extensive practice, which brings about expensive verification costs when seeking the optimal solution. To this end, we develop a novel hierarchical bilevel learning scheme to discover the architecture and loss simultaneously for different Hadamard-based image restoration tasks. More concretely, we first establish a new Hadamard-inspired neural unit to aggregate domain knowledge into the network design. We then model a triple-level optimization that consists of architecture, loss, and parameter optimization, to deliver a macro perspective for network learning. Next, we introduce a hierarchical bilevel learning scheme for solving the resulting triple-level model to progressively generate the desired architecture and loss. We also define an architecture search space consisting of a series of simple operations, together with an image-quality-oriented loss search space. Extensive experiments on three Hadamard-based image restoration tasks (including low-light image enhancement, image dehazing, and underwater image enhancement) fully verify our superiority against other state-of-the-art methods. |
Toward Fast, Flexible, and Robust Low-Light Image Enhancement.
Long Ma, Tengyu Ma, Risheng Liu*, Xin Fan, Zhongxuan Luo Abstract:
Existing low-light image enhancement techniques mostly struggle to balance visual quality and computational efficiency, and commonly fail in unknown complex scenarios. In this paper, we develop a new Self-Calibrated Illumination (SCI) learning framework for fast, flexible, and robust brightening of images in real-world low-light scenarios. Specifically, we establish a cascaded illumination learning process with weight sharing to handle this task. Considering the computational burden of the cascaded pattern, we construct a self-calibrated module that realizes convergence between the results of each stage, producing gains that allow only the single basic block to be used for inference (which has not been exploited in previous works) and drastically diminishing computation cost. We then define an unsupervised training loss to elevate the model's capability to adapt to general scenes. Further, we make comprehensive explorations to excavate SCI's inherent properties (lacking in existing works), including operation-insensitive adaptability (acquiring stable performance under the settings of different simple operations) and model-irrelevant generality (it can be applied to existing illumination-based works to improve performance). Finally, plenty of experiments and ablation studies fully indicate our superiority in both quality and efficiency. Applications on low-light face detection and nighttime semantic segmentation fully reveal the latent practical value of SCI. The source code will be made publicly available. |
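A compact sketch of the weight-sharing cascade with a self-calibration branch is shown below; layer sizes, the calibration form, and module names are assumptions for illustration, not SCI's released code.

```python
import torch
import torch.nn as nn

class SCISketch(nn.Module):
    """Illustrative weight-shared cascaded illumination learning (not SCI's exact code)."""
    def __init__(self, channels=3, stages=3):
        super().__init__()
        self.stages = stages
        # One basic illumination block shared by every stage (weight sharing).
        self.illum_block = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1), nn.Sigmoid(),
        )
        # Self-calibration maps each stage's result back toward the original input.
        self.calibrate = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1),
        )

    def forward(self, x):
        inputs, illums = x, []
        for _ in range(self.stages):
            illu = self.illum_block(inputs)            # estimate illumination
            illums.append(illu)
            enhanced = x / illu.clamp(min=1e-4)        # Retinex-style brightening
            inputs = x + self.calibrate(enhanced)      # calibrated input for next stage
        return illums                                  # training constrains all stages to agree

    @torch.no_grad()
    def infer(self, x):
        # At test time only the single shared block is applied once.
        illu = self.illum_block(x)
        return x / illu.clamp(min=1e-4)
```

Because the stage outputs are driven to converge during training, a single pass through the shared block suffices at inference, which is where the reported efficiency gain comes from.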
Abstract:
This study addresses the issue of fusing infrared and visible images that appear differently for object detection. Aiming at generating an image of high visual quality, previous approaches discover commonalities underlying the two modalities and fuse on the common space either by iterative optimization or deep networks. These approaches neglect that the modality differences, which imply the complementary information, are extremely important for both fusion and the subsequent detection task. This paper proposes a bilevel optimization formulation for the joint problem of fusion and detection, which is then unrolled to a target-aware Dual Adversarial Learning (TarDAL) network for fusion and a commonly used detection network. The fusion network with one generator and dual discriminators seeks commons while learning from differences, which preserves structural information of targets from the infrared and textural details from the visible. Furthermore, we build a synchronized imaging system with calibrated infrared and optical sensors, and collect currently the most comprehensive benchmark covering a wide range of scenarios. Extensive experiments on several public datasets and our benchmark demonstrate that our method outputs not only visually appealing fused images but also, on average, 10.9% higher detection mAP than the state-of-the-art approaches in various challenging scenarios. |
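One illustrative way to write such a fusion-detection coupling as a bilevel problem (generic notation, not necessarily the paper's exact formulation) is:

\[
\min_{\omega_d}\ \mathcal{L}_{\mathrm{det}}\!\left(D_{\omega_d}\!\left(F_{\omega_f^{\ast}}(I_{ir}, I_{vis})\right)\right)
\quad \text{s.t.}\quad
\omega_f^{\ast} \in \operatorname*{arg\,min}_{\omega_f}\ \mathcal{L}_{\mathrm{fuse}}\!\left(F_{\omega_f}(I_{ir}, I_{vis})\right),
\]

where \(F_{\omega_f}\) is the fusion generator, \(D_{\omega_d}\) the detector, and \(I_{ir}, I_{vis}\) the source images; unrolling the lower-level solver yields an end-to-end trainable pipeline.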
Abstract:
To overcome the overfitting issue of dehazing models trained on synthetic hazy-clean image pairs, many recent methods have attempted to improve the models' generalization ability by training on unpaired data. Most of them simply formulate dehazing and rehazing cycles, yet ignore the physical properties of the real-world hazy environment, i.e., that haze varies with density and depth. In this paper, we propose a self-augmented image dehazing framework, termed D4 (Dehazing via Decomposing transmission map into Density and Depth), for haze generation and removal. Instead of merely estimating transmission maps or clean content, the proposed framework focuses on exploring the scattering coefficient and depth information contained in hazy and clean images. With estimated scene depth, our method is capable of re-rendering hazy images with different thicknesses, which further benefits the training of the dehazing network. It is worth noting that the whole training process needs only unpaired hazy and clean images, yet succeeds in recovering the scattering coefficient, depth map, and clean content from a single hazy image. Comprehensive experiments demonstrate that our method outperforms state-of-the-art unpaired dehazing methods with much fewer parameters and FLOPs. Our code will be made publicly available. |
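The physical model behind the density/depth decomposition mentioned above is the standard atmospheric scattering model:

\[
I(x) = J(x)\,t(x) + A\bigl(1 - t(x)\bigr), \qquad t(x) = e^{-\beta\, d(x)},
\]

where \(I\) is the hazy observation, \(J\) the clean scene radiance, \(A\) the global atmospheric light, \(t\) the transmission map, \(\beta\) the scattering coefficient (haze density), and \(d(x)\) the scene depth; factoring \(t\) into \(\beta\) and \(d\) is what allows re-rendering haze of different thicknesses.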
Abstract:
It is challenging to accurately detect camouflaged objects from their highly similar surroundings. Existing methods mainly adopt a single-stage detection fashion, neglecting that small objects with low-resolution fine edges require more operations than larger ones. To tackle camouflaged object detection (COD), we are inspired by human attention and the coarse-to-fine detection strategy, and thereby propose an iterative refinement framework, coined SegMaR, which integrates Segment, Magnify and Reiterate in a multi-stage detection fashion. Specifically, we design a new discriminative mask which makes the model attend to the fixation and edge regions. In addition, we leverage an attention-based sampler to magnify the object region progressively with no need to enlarge the image size. Extensive experiments show that our SegMaR achieves remarkable and consistent improvements over other state-of-the-art methods. Our performance on small objects surpasses two competitive methods by 7.4% and 20.0%, respectively, on average over standard evaluation metrics. Code is available in the supplementary material. |
ReCoNet: Recurrent Correction Network for Fast and Efficient Multi-modality Image Fusion.
Zhanbo Huang, Jinyuan Liu, Xin Fan, Risheng Liu, Wei Zhong, Zhongxuan Luo Abstract:
Recent advances in deep networks have gained great attention in infrared and visible image fusion (IVIF). Nevertheless, most existing methods are incapable of dealing with slight misalignment in the source images and suffer from high computational and spatial expenses. This paper tackles these two critical issues, rarely touched in the community, by developing a recurrent correction network for robust and efficient fusion, namely ReCoNet. Concretely, we design a deformation module to explicitly compensate for geometrical distortions and an attention mechanism to mitigate ghosting-like artifacts, respectively. Meanwhile, the network consists of a parallel dilated convolutional layer and runs in a recurrent fashion, significantly reducing both spatial and computational complexities. ReCoNet can effectively and efficiently alleviate both structural distortions and textural artifacts brought by slight misalignment. Extensive experiments on two public datasets demonstrate the superior accuracy and efficacy of our ReCoNet against the state-of-the-art IVIF methods. Consequently, we obtain a 16% relative improvement of CC on datasets with misalignment and boost the efficiency by 86%. |
Semantic-aware Texture-Structure Feature Collaboration for
Underwater Image Enhancement.
Di Wang, Long Ma, Risheng Liu, Xin Fan Abstract:
Underwater image enhancement has become an attractive topic as a significant technology in marine engineering and aquatic robotics. However, the limited number of datasets and imperfect hand-crafted ground truth weaken its robustness to unseen scenarios, and hamper the application to high-level vision tasks. To address the above limitations, we develop an efficient and compact enhancement network in collaboration with a high-level semantic-aware pretrained model, aiming to exploit its hierarchical feature representation as an auxiliary for the low-level underwater image enhancement. Specifically, we tend to characterize the shallow layer features as textures while the deep layer features as structures in the semantic-aware model, and propose a multi-path Contextual Feature Refinement Module (CFRM) to refine features in multiple scales and model the correlation between different features. In addition, a feature dominative network is devised to perform channel-wise modulation on the aggregated texture and structure features for the adaptation to different feature patterns of the enhancement network. Extensive experiments on benchmarks demonstrate that the proposed algorithm achieves more appealing results and outperforms state-of-the-art methods by large margins. We also apply the proposed algorithm to the underwater salient object detection task to reveal the favorable semantic-aware ability for high-level vision tasks. |
2021 |
A General Descent Aggregation Framework for Gradient-based Bi-level Optimization.
Risheng Liu, Pan Mu, Xiaoming Yuan, Shangzhi Zeng, Jin Zhang Abstract:
In recent years, a variety of gradient-based methods have been developed to solve Bi-Level Optimization (BLO) problems in machine learning and computer vision areas. However, the theoretical correctness and practical effectiveness of these existing approaches always rely on some restrictive conditions (e.g., Lower-Level Singleton, LLS), which can hardly be satisfied in real-world applications. Moreover, previous literature only proves theoretical results based on their specific iteration strategies, thus lacking a general recipe to uniformly analyze the convergence behaviors of different gradient-based BLOs. In this work, we formulate BLOs from an optimistic bi-level viewpoint and establish a new gradient-based algorithmic framework, named Bi-level Descent Aggregation (BDA), to partially address the above issues. Specifically, BDA provides a modularized structure to hierarchically aggregate both the upper- and lower-level subproblems to generate our bi-level iterative dynamics. Theoretically, we establish a general convergence analysis template and derive a new proof recipe to investigate the essential theoretical properties of gradient-based BLO methods. Furthermore, this work systematically explores the convergence behavior of BDA in different optimization scenarios, i.e., considering various solution qualities (i.e., global/local/stationary solutions) returned from solving approximation subproblems. Extensive experiments justify our theoretical results and demonstrate the superiority of the proposed algorithm for hyper-parameter optimization and meta-learning tasks. Latex Bibtex Citation:
@article{liu2021generic, |
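As an illustrative template for the entry above (simplified notation, not the paper's exact scheme), the aggregation idea amounts to combining descent information from both levels when updating the lower-level variable:

\[
\mathbf{y}_{k+1} = \mathbf{y}_k - s_k\Bigl(\alpha_k\,\mathbf{d}_F(\mathbf{x}, \mathbf{y}_k) + (1-\alpha_k)\,\mathbf{d}_f(\mathbf{x}, \mathbf{y}_k)\Bigr), \qquad \alpha_k \in (0, 1),
\]

where \(\mathbf{d}_F\) and \(\mathbf{d}_f\) denote descent directions of the upper- and lower-level objectives with respect to the lower-level variable \(\mathbf{y}\), and \(s_k\) is a step size.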
Investigating Bi-Level Optimization for Learning and Vision from a Unified Perspective: A Survey and Beyond.
Risheng Liu, Jiaxin Gao, Jin Zhang, Deyu Meng, Zhouchen Lin Abstract:
Bi-Level Optimization (BLO) originated in the area of economic game theory and was then introduced into the optimization community. BLO is able to handle problems with a hierarchical structure, involving two levels of optimization tasks, where one task is nested inside the other. In the machine learning and computer vision fields, despite the different motivations and mechanisms, a lot of complex problems, such as hyper-parameter optimization, multi-task and meta learning, neural architecture search, adversarial learning and deep reinforcement learning, actually all contain a series of closely related subproblems. In this paper, we first uniformly express these complex learning and vision problems from the perspective of BLO. Then we construct a best-response-based single-level reformulation and establish a unified algorithmic framework to understand and formulate mainstream gradient-based BLO methodologies, covering aspects ranging from fundamental automatic differentiation schemes to various accelerations, simplifications, extensions and their convergence and complexity properties. Last but not least, we discuss the potential of our unified BLO framework for designing new algorithms and point out some promising directions for future research. Latex Bibtex Citation:
@article{liu2021investigating, |
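For reference, the generic bi-level problem underlying the formulations surveyed above can be written as:

\[
\min_{\mathbf{x} \in \mathcal{X}}\ F(\mathbf{x}, \mathbf{y}) \quad \text{s.t.} \quad \mathbf{y} \in \mathcal{S}(\mathbf{x}) := \operatorname*{arg\,min}_{\mathbf{y}}\ f(\mathbf{x}, \mathbf{y}),
\]

where \(F\) and \(f\) are the upper- and lower-level objectives and \(\mathbf{x}\), \(\mathbf{y}\) the upper- and lower-level variables; hyper-parameter optimization, meta learning, and architecture search all instantiate \(F\) and \(f\) differently.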
Learning Deformable Image Registration from Optimization: Perspective, Modules, Bilevel Training and Beyond.
Risheng Liu, Zi Li, Xin Fan, Chenying Zhao, Hao Huang, Zhongxuan Luo Abstract:
Conventional deformable registration methods aim at solving an optimization model carefully designed on image pairs, and their computational costs are exceptionally high. In contrast, recent deep learning-based approaches can provide fast deformation estimation. However, these heuristic network architectures are fully data-driven and thus lack the explicit geometric constraints that are indispensable for generating plausible deformations, e.g., topology preservation. Moreover, these learning-based approaches typically pose hyper-parameter learning as a black-box problem and require considerable computational and human effort to perform many training runs. To tackle the aforementioned problems, we propose a new learning-based framework to optimize a diffeomorphic model via multi-scale propagation. Specifically, we introduce a generic optimization model to formulate diffeomorphic registration and develop a series of learnable architectures to obtain propagative updating in the coarse-to-fine feature space. Further, we propose a new bilevel self-tuned training strategy, allowing efficient search of task-specific hyper-parameters. This training strategy increases the flexibility to various types of data while reducing computational and human burdens. We conduct two groups of image registration experiments on 3D volume datasets, including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data. Extensive results demonstrate the state-of-the-art performance of the proposed method with diffeomorphic guarantees and extreme efficiency. We also apply our framework to challenging multi-modal image registration, and investigate how our registration supports downstream tasks for medical image analysis, including multi-modal fusion and image segmentation. Latex Bibtex Citation:
@article{liu2021learning, |
Task-Oriented Convex Bilevel Optimization with Latent Feasibility.
Risheng Liu, Long Ma, Xiaoming Yuan, Shangzhi Zeng, Jin Zhang
Abstract:
This paper first proposes a convex bilevel optimization paradigm to formulate and optimize popular learning and vision problems in real-world scenarios. Different from conventional approaches, which directly design their iteration schemes based on a given problem formulation, we introduce a task-oriented energy as our latent constraint, which integrates richer task information. By explicitly re-characterizing the feasibility, we establish an efficient and flexible algorithmic framework to tackle convex models with both a shrunken solution space and a powerful auxiliary (based on domain knowledge and the data distribution of the task). In theory, we present the convergence analysis of our latent feasibility re-characterization based numerical strategy. We also analyze the stability of the theoretical convergence under computational error perturbation. Extensive numerical experiments are conducted to verify our theoretical findings and evaluate the practical performance of our method on different applications. Latex Bibtex Citation:
|
Abstract:
Video deraining is an important issue for outdoor vision systems and has been investigated extensively. However, designing optimal architectures by aggregating model formation and data distribution is a challenging task for video deraining. In this paper, we develop a model-guided triple-level optimization framework to deduce the network architecture with a cooperating optimization and auto-searching mechanism, named Triple-level Model Inferred Cooperating Searching (TMICS), to deal with various video rain circumstances. In particular, to mitigate the problem that existing methods cannot cover various rain streak distributions, we first design a hyper-parameter optimization model over the task variables and hyper-parameters. Based on the proposed optimization model, we design a collaborative structure for video deraining. This structure includes a Dominant Network Architecture (DNA) and a Companionate Network Architecture (CNA), which cooperate through an Attention-based Averaging Scheme (AAS). To better explore inter-frame information from videos, we introduce a macroscopic structure searching scheme that searches over an Optical Flow Module (OFM) and a Temporal Grouping Module (TGM) to help restore the latent frame. In addition, we apply differentiable neural architecture searching over a compact candidate set of task-specific operations to discover desirable rain streak removal architectures automatically. Extensive experiments on various datasets demonstrate that our model shows significant improvements in fidelity and temporal consistency over the state-of-the-art works. Latex Bibtex Citation:
@article{mu2021triple, |
Investigating Customization Strategies and Convergence Behaviors of Task-Specific ADMM.
Risheng Liu, Pan Mu, Jin Zhang
Abstract:
The Alternating Direction Method of Multipliers (ADMM) has been a popular algorithmic framework for separable optimization problems with linear constraints. Since numerical ADMM exploits neither the particular structure of the problem at hand nor the input data information, leveraging task-specific modules (e.g., neural networks and other data-driven architectures) to extend ADMM is a significant but challenging task. This work focuses on designing a flexible algorithmic framework to incorporate various task-specific modules (with no additional constraints) to improve the performance of ADMM in real-world applications. Specifically, we propose Guidance from Optimality (GO), a new customization strategy, to embed task-specific modules into ADMM (GO-ADMM). By introducing an optimality-based criterion to guide the propagation, GO-ADMM establishes an updating scheme agnostic to the choice of additional modules. Existing task-specific methods just plug their task-specific modules into the numerical iterations in a straightforward manner; even with some restrictive constraints on the plug-in modules, they can only obtain relatively weak convergence properties for the resulting ADMM iterations. Fortunately, without any restrictions on the embedded modules, we prove the convergence of GO-ADMM regarding objective values and constraint violations, and derive the worst-case convergence rate measured by iteration complexity. Extensive experiments are conducted to verify the theoretical results and demonstrate the efficiency of GO-ADMM. Latex Bibtex Citation:
@article{liu2021investigating, |
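To make the plug-in idea concrete, here is a minimal sketch of scaled ADMM on a toy problem (min_x 0.5||x - y||^2 + lam*||z||_1 subject to x = z), where the analytic z-update can optionally be replaced by a task-specific module; this illustrates where such modules enter the iterations, not the GO-ADMM criterion itself:

import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm(y, lam=0.1, rho=1.0, iters=50, z_module=None):
    # Scaled ADMM for min_x 0.5||x - y||^2 + lam*||z||_1  s.t.  x = z.
    x, z, u = y.copy(), y.copy(), np.zeros_like(y)
    for _ in range(iters):
        x = (y + rho * (z - u)) / (1.0 + rho)                            # x-update (quadratic subproblem)
        v = x + u
        z = z_module(v) if z_module else soft_threshold(v, lam / rho)    # z-update (replaceable module)
        u = u + x - z                                                    # dual (multiplier) update
    return x

x_hat = admm(np.random.randn(256))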
Underexposed Image Correction via Hybrid Priors Navigated Deep Propagation.
Risheng Liu, Long Ma, Jiaao Zhang, Xin Fan, Zhongxuan Luo Abstract:
Enhancing visual quality for underexposed images is an extensively concerned task that plays important roles in various areas of multimedia and computer vision. Most existing methods often fail to generate high-quality results with appropriate luminance and abundant details. To address these issues, in this work we develop a novel framework integrating both knowledge from physical principles and implicit distributions from data to solve the underexposed image correction task. More concretely, we propose a new perspective to formulate this task as an energy-inspired model with advanced hybrid priors. A propagation procedure navigated by the hybrid priors is well designed for simultaneously propagating the reflectance and illumination toward the desired results. We conduct extensive experiments to verify the necessity of integrating both underlying principles (i.e., with knowledge) and distributions (i.e., from data) as navigated deep propagation. Plenty of experimental results on underexposed image correction demonstrate that our proposed method performs favorably against the state-of-the-art methods on both subjective and objective assessments. Additionally, we execute the task of face detection to further verify the naturalness and practical value of underexposed image correction. What's more, we employ our method for single image haze removal, whose experimental results further demonstrate its superiority. Latex Bibtex Citation:
@article{liu2021underexposed, |
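The reflectance/illumination propagation mentioned in the abstract above builds on the classical Retinex decomposition:

\[
I(x) = R(x) \circ L(x),
\]

where \(I\) is the observed underexposed image, \(R\) the reflectance, \(L\) the illumination, and \(\circ\) the element-wise product; the hybrid-prior propagation alternately refines \(R\) and \(L\) until the corrected image is obtained.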
Abstract:
Enhancing the quality of low-light (LOL) images plays a very important role in many image processing and multimedia applications. In recent years, a variety of deep learning techniques have been developed to address this challenging task. A typical framework is to simultaneously estimate the illumination and reflectance, but such methods disregard the scene-level contextual information encapsulated in feature spaces, causing many unfavorable outcomes, e.g., detail loss, color unsaturation, and artifacts. To address these issues, we develop a new context-sensitive decomposition network (CSDNet) architecture to exploit the scene-level contextual dependencies on spatial scales. More concretely, we build a two-stream estimation mechanism including reflectance and illumination estimation networks. We design a novel context-sensitive decomposition connection to bridge the two-stream mechanism by incorporating the physical principle. The spatially varying illumination guidance is further constructed for achieving the edge-aware smoothness property of the illumination component. According to different training patterns, we construct CSDNet (paired supervision) and a context-sensitive decomposition generative adversarial network (CSDGAN) (unpaired supervision) to fully evaluate our designed architecture. We test our method on seven testing benchmarks [including MIT-Adobe FiveK, LOL, ExDark, and naturalness preserved enhancement (NPE)] to conduct plenty of analytical and evaluation experiments. Thanks to our designed context-sensitive decomposition connection, we successfully realized excellent enhanced results (with sufficient details, vivid colors, and little noise), which fully indicates our superiority against existing state-of-the-art approaches. Finally, considering the practical need for high efficiency, we develop a lightweight CSDNet (named LiteCSDNet) by reducing the number of channels. Furthermore, by sharing an encoder for these two components, we obtain a more lightweight version (SLiteCSDNet for short). SLiteCSDNet contains just 0.0301M parameters but achieves almost the same performance as CSDNet. Code is available at https://github.com/KarelZhang/CSDNet-CSDGAN. Latex Bibtex Citation:
@article{ma2021learning, |
Learning a Deep Multi-scale Feature Ensemble and an Edge-attention Guidance for Image Fusion.
Jinyuan Liu, Xin Fan, Ji Jiang, Risheng Liu, Zhongxuan Luo Abstract:
Image fusion integrates a series of images acquired from different sensors, e.g., infrared and visible, outputting an image with richer information than either one. Traditional and recent deep-based methods have difficulties in preserving prominent structures and recovering vital textural details for practical applications. In this paper, we propose a deep network for infrared and visible image fusion, cascading a feature learning module with a fusion learning mechanism. Firstly, we apply a coarse-to-fine deep architecture to learn multi-scale features for multi-modal images, which enables discovering prominent common structures for later fusion operations. The proposed feature learning module requires no well-aligned image pairs for training. Compared with the existing learning-based methods, the proposed feature learning module can ensemble numerous examples from the respective modalities for training, increasing the ability of feature representation. Secondly, we design an edge-guided attention mechanism upon the multi-scale features to guide the fusion to focus on common structures, thus recovering details while attenuating noise. Moreover, we provide a new aligned infrared and visible image fusion dataset, RealStreet, collected in various practical scenarios for comprehensive evaluation. Extensive experiments on two benchmarks, TNO and RealStreet, demonstrate the superiority of the proposed method over the state-of-the-art in terms of both visual inspection and objective analysis on six evaluation metrics. We also conduct experiments on the FLIR and NIR datasets, containing foggy weather and poor light conditions, to verify the generalization and robustness of the proposed method. Latex Bibtex Citation:
@article{liu2021learning, |
A Convergent Framework with Learnable Feasibility for Hadamard-based Image Recovery.
Yiyang Wang, Long Ma, Risheng Liu Abstract:
In this paper, we propose a framework for recovering image degradations that can be formulated as the Hadamard product of clear images with degradation factors. By training the mapping from datasets, we show that implicit feasibilities can be learned in the form of latent domains. Then, with the feasibilities and acknowledged data priors, the recovery problems are formulated as a general optimization model in which the domain knowledge of the degradations is also nicely involved. We then solve the model based on the classical coordinate update with plugged-in networks so that all the variables can be well estimated. Even better, our updating scheme is designed under the guidance of theoretical analyses, so its stability can always be guaranteed in practice. We show that different recovery problems can be solved under our unified framework, and extensive experimental results verify that the proposed framework is superior to state-of-the-art methods on both benchmark datasets and real-world images. Latex Bibtex Citation:
@article{wang2021convergent, |
Abstract:
Low-light image enhancement plays a very important role in low-level vision areas. Recent works have built a great deal of deep learning models to address this task. However, these approaches mostly rely on significant architecture engineering and suffer from a high computational burden. In this paper, we propose a new method, named Retinex-inspired Unrolling with Architecture Search (RUAS), to construct a lightweight yet effective enhancement network for low-light images in real-world scenarios. Specifically, building upon the Retinex rule, RUAS first establishes models to characterize the intrinsic underexposed structure of low-light images and unrolls their optimization processes to construct our holistic propagation structure. Then, by designing a cooperative reference-free learning strategy to discover low-light prior architectures from a compact search space, RUAS is able to obtain a top-performing image enhancement network which is fast and requires few computational resources. Extensive experiments verify the superiority of our RUAS framework against recently proposed state-of-the-art methods. The project page is available at http://dutmedia.org/RUAS/. Latex Bibtex Citation:
@inproceedings{liu2021retinex, |
Bridging the Gap between Low-Light Scenes: Bilevel Learning for Fast Adaptation.
Dian Jin, Long Ma, Risheng Liu*, Xin Fan
Abstract:
Brightening low-light images of diverse scenes is a challenging but widely concerned task in the multimedia community. Convolutional Neural Network (CNN) based approaches mostly acquire the enhancement model by learning the data distribution of specific scenes. However, these works present poor adaptability (and even fail) when meeting real-world scenarios never encountered before. To conquer this, we develop a novel bilevel learning scheme for fast adaptation to bridge the gap between low-light scenes. Concretely, we construct a Retinex-induced encoder-decoder with an adaptive denoising mechanism, aiming at covering more practical cases. Different from existing works that directly learn model parameters from massive data, we provide a new hyperparameter optimization perspective to formulate a bilevel learning scheme towards general low-light scenarios. This scheme depicts the latent correspondence (i.e., scene-irrelevant encoder) and the respective characteristics (i.e., scene-specific decoder) among different data distributions. Because the expensive inner optimization makes exact estimation of the hyper-parameter gradient prohibitive, we develop an approximate hyper-parameter gradient method by introducing a one-step forward approximation and a finite difference approximation to ensure highly efficient inference. Extensive experiments are conducted to reveal our superiority against other state-of-the-art methods. A series of analytical experiments are also executed to verify our effectiveness. Latex Bibtex Citation:
@inproceedings{jin2021bridging, |
Abstract:
Multi-modality image fusion refers to generating a complementary image that integrates typical characteristics from the source images. In recent years, we have witnessed the remarkable progress of deep learning models for multi-modality fusion. Existing CNN-based approaches strain every nerve to design various architectures for realizing these tasks in an end-to-end manner. However, these handcrafted designs are unable to cope with highly demanding fusion tasks, resulting in blurred targets and lost textural details. To alleviate these issues, in this paper, we propose a novel approach aiming at searching for effective architectures according to various modality principles and fusion mechanisms. Specifically, we construct a hierarchically aggregated fusion architecture to extract and refine fused features from feature-level and object-level fusion perspectives, which is responsible for obtaining complementary target/detail representations. Then, by investigating diverse effective practices, we composite a more flexible fusion-specific search space. Motivated by the collaborative principle, we employ a new search strategy with different principled losses and hardware constraints for sufficient discovery of components. As a result, we can obtain a task-specific architecture with fast inference time. Extensive quantitative and qualitative results demonstrate the superiority and versatility of our method against state-of-the-art methods. Latex Bibtex Citation: @inproceedings{liu2021searching, |
Underwater Species Detection using Channel Sharpening Attention.
Lihao Jiang, Yi Wang, Qi Jia, Shengwei Xu, Yu Liu, Xin Fan, Haojie Li, Risheng Liu, Xinwei Xue, Ruili Wang
Abstract:
With the continuous exploration of marine resources, underwater artificial intelligent robots play an increasingly important role in the fish industry. However, the detection of underwater objects is a very challenging problem due to the irregular movement of underwater objects, the occlusion of sand and rocks, the diversity of water illumination, and the poor visibility and low color contrast of the underwater environment. In this article, we first propose a real-world underwater object detection dataset (UODD), which covers more than 3K images of the most common aquatic products. Then we propose the Channel Sharpening Attention Module (CSAM) as a plug-and-play module to further fuse high-level image information, providing the network with the privilege of selecting feature maps. Fusion of original images through CSAM can improve the accuracy of detecting small and medium objects, thereby improving the overall detection accuracy. We also use Water-Net as a preprocessing method to remove the haze and color cast in complex underwater scenes, which yields satisfactory detection results on small-sized objects. In addition, we use a class-weighted loss as the training loss, which can accurately describe the relationship between classification and the precision of the bounding boxes of targets, and the loss function converges faster during the training process. Experimental results show that the proposed method reaches a maximum AP of 50.1%, outperforming other traditional and state-of-the-art detectors. In addition, our model only needs an average inference time of 25.4 ms per image, which is quite fast and might suit real-time scenarios. Latex Bibtex Citation:
@inproceedings{jiang2021underwater, |
Abstract:
Bi-level optimization model is able to capture a wide range of complex learning tasks with practical interest. Due to the witnessed efficiency in solving bi-level programs, gradient-based methods have gained popularity in the machine learning community. In this work, we propose a new gradient-based solution scheme, namely, the Bi-level Value-Function-based Interior-point Method (BVFIM). Following the main idea of the log-barrier interior-point scheme, we penalize the regularized value function of the lower level problem into the upper level objective. By further solving a sequence of differentiable unconstrained approximation problems, we consequently derive a sequential programming scheme. The numerical advantage of our scheme relies on the fact that, when gradient methods are applied to solve the approximation problem, we successfully avoid computing any expensive Hessian-vector or Jacobian-vector product. We prove the convergence without requiring any convexity assumption on either the upper level or the lower level objective. Experiments demonstrate the efficiency of the proposed BVFIM on non-convex bi-level problems. Latex Bibtex Citation:
@InProceedings{pmlr-v139-liu21o, |
Abstract:
In recent years, Bi-Level Optimization (BLO) techniques have received extensive attention from both the learning and vision communities. A variety of BLO models in complex and practical tasks have a non-convex follower structure in nature (a.k.a. without Lower-Level Convexity, LLC for short). However, this challenging class of BLOs lacks developments in both efficient solution strategies and solid theoretical guarantees. In this work, we propose a new algorithmic framework, named Initialization Auxiliary and Pessimistic Trajectory Truncated Gradient Method (IAPTT-GM), to partially address the above issues. In particular, by introducing an auxiliary as initialization to guide the optimization dynamics and designing a pessimistic trajectory truncation operation, we construct a reliable approximate version of the original BLO in the absence of the LLC hypothesis. Our theoretical investigations establish the convergence of solutions returned by IAPTT-GM towards those of the original BLO without LLC. As an additional bonus, we also theoretically justify the quality of our IAPTT-GM embedded with Nesterov's accelerated dynamics under LLC. The experimental results confirm both the convergence of our algorithm without LLC, and the theoretical findings under LLC. Latex Bibtex Citation:
@article{liu2021towards, |
Abstract:
Meta-learning (a.k.a. learning to learn) has recently emerged as a promising paradigm for a variety of applications. There are now many meta-learning methods, each focusing on different modeling aspects of base and meta learners, but all can be (re)formulated as specific bilevel optimization problems. This work presents BOML, a modularized optimization library that unifies several meta-learning algorithms into a common bilevel optimization framework. It provides a hierarchical optimization pipeline together with a variety of iteration modules, which can be used to solve the mainstream categories of meta-learning methods, such as meta-feature-based and meta-initialization-based formulations. The library is written in Python and is available at https://github.com/dut-media-lab/BOML. Latex Bibtex Citation:
@inproceedings{liu2021boml, |
2020 |
A Bilevel Integrated Model with Data-driven Layer Ensemble for Multi-modality Image Fusion.
Risheng Liu, Jinyuan Liu, Zhiying Jiang, Xin Fan, Zhongxuan Luo Abstract:
Image fusion plays a critical role in a variety of vision and learning applications. Current fusion approaches are designed to characterize source images, focusing on a certain type of fusion task while being limited in wider scenarios. Moreover, other fusion strategies (i.e., weighted averaging, choose-max) cannot undertake the challenging fusion tasks, which furthermore leads to undesirable artifacts easily emerging in their fused results. In this paper, we propose a generic image fusion method with a bilevel optimization paradigm, targeting multi-modality image fusion tasks. Corresponding alternation optimization is conducted on certain components decoupled from the source images. Via adaptive integration weight maps, we are able to obtain a flexible fusion strategy across multi-modality images. We successfully applied it to three types of image fusion tasks, including infrared and visible, computed tomography and magnetic resonance imaging, and magnetic resonance imaging and single-photon emission computed tomography image fusion. Results highlight the performance and versatility of our approach from both quantitative and qualitative aspects. Latex Bibtex Citation:
@article{liu2020bilevel, |
A Deep Framework Assembling Principled Modules for CS-MRI: Unrolling Perspective, Convergence Behaviors, and Practical Modeling.
Risheng Liu, Yuxi Zhang, Shichao Cheng, Zhongxuan Luo, Xin Fan Abstract:
Compressed Sensing Magnetic Resonance Imaging (CS-MRI) significantly accelerates MR acquisition at a sampling rate much lower than the Nyquist criterion. A major challenge for CS-MRI lies in solving the severely ill-posed inverse problem of reconstructing aliasing-free MR images from the sparse k-space data. Conventional methods typically optimize an energy function, producing restorations of high quality, but their iterative numerical solvers unavoidably bring extremely large time consumption. Recent deep techniques provide fast restoration by either learning a direct prediction to the final reconstruction or plugging learned modules into the energy optimizer. Nevertheless, these data-driven predictors cannot guarantee that the reconstruction follows the principled constraints underlying the domain knowledge, so the reliability of their reconstruction process is questionable. In this paper, we propose a deep framework assembling principled modules for CS-MRI that fuses a learning strategy with the iterative solver of a conventional reconstruction energy. This framework embeds an optimal condition checking mechanism, fostering efficient and reliable reconstruction. We also apply the framework to three practical tasks, i.e., complex-valued data reconstruction, parallel imaging, and reconstruction with Rician noise. Extensive experiments on both benchmark and manufacturer-testing images demonstrate that the proposed method reliably converges to the optimal solution more efficiently and accurately than the state-of-the-art in various scenarios. Latex Bibtex Citation:
@article{liu2020deep, |
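The conventional reconstruction energy referred to above typically takes the following form (standard CS-MRI notation, shown here only for orientation):

\[
\min_{x}\ \tfrac{1}{2}\,\bigl\lVert P\,\mathcal{F}x - y \bigr\rVert_2^2 + \lambda\, R(x),
\]

where \(\mathcal{F}\) is the Fourier transform, \(P\) the k-space undersampling mask, \(y\) the acquired k-space data, and \(R\) a regularizer; unrolled solvers alternate such data-consistency steps with learned regularization modules.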
Location-aware and Regularization-adaptive Correlation Filters for Robust Visual Tracking.
Risheng Liu, Qianru Chen, Yuansheng Yao, Xin Fan, Zhongxuan Luo Abstract:
Correlation filter (CF) based methods have recently been widely used for visual tracking. The estimation of the search window and the filter-learning strategies are the key components of CF trackers. Nevertheless, prevalent CF models address these issues separately in heuristic manners. The commonly used CF models directly set the estimated location in the previous frame as the search center for the current one. Moreover, these models usually rely on simple and fixed regularization for filter learning, and thus their performance is compromised by the search window size and optimization heuristics. To break these limits, this article proposes a location-aware and regularization-adaptive CF (LRCF) for robust visual tracking. LRCF establishes a novel bilevel optimization model to simultaneously address the location-estimation and filter-training problems. We prove that our bilevel formulation can successfully obtain a globally converged CF and the corresponding object location in a collaborative manner. Moreover, based on the LRCF framework, we design two trackers named LRCF-S and LRCF-SA, and a series of comparisons to prove the flexibility and effectiveness of the LRCF framework. Extensive experiments on different challenging benchmark datasets demonstrate that our LRCF trackers perform favorably against the state-of-the-art methods in practice. Latex Bibtex Citation:
@article{liu2020location, |
Investigating Task-driven Latent Feasibility for Nonconvex Image Modeling.
Risheng Liu, Pan Mu, Jian Chen, Xin Fan, Zhongxuan Luo Abstract:
Properly modeling latent image distributions plays an important role in a variety of image-related vision problems. Most existing approaches aim to formulate this problem as optimization models (e.g., Maximum A Posteriori, MAP) with handcrafted priors. In recent years, different CNN modules have also been considered as deep priors to regularize the image modeling process. However, these explicit regularization techniques require deep understanding of the problem and elaborate mathematical skills. In this work, we provide a new perspective, named Task-driven Latent Feasibility (TLF), to incorporate specific task information to narrow down the solution space of the optimization-based image modeling problem. Thanks to the flexibility of TLF, both designed and trained constraints can be embedded into the optimization process. By introducing control mechanisms based on monotonicity and boundedness conditions, we can also strictly prove the convergence of our proposed inference process. We demonstrate that different types of image modeling problems, such as image deblurring and rain streak removal, can all be appropriately addressed within our TLF framework. Extensive experiments also verify the theoretical results and show the advantages of our method against existing state-of-the-art approaches. Latex Bibtex Citation:
@article{liu2020investigating, |
Learning Hadamard-Product-Propagation for Image Dehazing and Beyond.
Risheng Liu, Shiqi Li, Jinyuan Liu, Long Ma, Xin Fan, Zhongxuan Luo Abstract:
Image dehazing has evolved into an attractive research field in the computer vision community over the past few decades. Previous traditional approaches attempt to design energy-based objective functions. However, they cannot accurately express the intrinsic characteristics of the images, posing weak adaptation ability for real-world complex scenarios. More recently, deep learning techniques for image dehazing have matured and become more reliable, showing outstanding performance. Nevertheless, these methods heavily depend on training data, restricting their application range. More importantly, both traditional and deep learning approaches ignore a common issue: noises/artifacts always appear in the recovery process. To this end, a new Hadamard-Product (HP) model is proposed, which consists of a series of data-driven priors. Based on this model, we derive a Learnable Hadamard-Product-Propagation (LHPP) by cascading a series of principle-inspired guidance and recovery modules, in which the principle-inspired guidance related to transmission is endowed with the smoothness property and the recovery module satisfies the distribution of natural images. The Hadamard-product-based propagation is generated in our developed learnable framework for the task of image dehazing. In this way, we can eliminate noises/artifacts in the recovery procedure to obtain the ideal outputs. Subsequently, owing to the generality of our HP model, we successfully extend our LHPP to settle low-light image enhancement and underwater image enhancement problems. A series of analytical experiments are performed to verify our effectiveness. Plenty of performance evaluations on three complex tasks fully reveal our superiority against multiple state-of-the-art methods. Latex Bibtex Citation:
@article{liu2020learning, |
Dual Neural Networks Coupling Data Regression with Explicit Priors for Monocular 3D Face Reconstruction.
Xin Fan, Shichao Cheng, Kang Huyan, Minjun Hou, Risheng Liu, Zhongxuan Luo Abstract:
We address the challenging issue of reconstructing a 3D face from one single image under various expressions and illuminations, which is widely applied in multimedia tasks. Methods built upon classical parametric morphable models (3DMMs) succeed in reconstructing the global geometry of a 3D face, but fail to precisely characterize local facial details. Recently, deep neural networks (DNNs) have been applied to the reconstruction to directly predict depth maps, showing compelling performance on detail recovery. Unfortunately, their reconstruction is prone to structural distortions owing to the lack of explicit prior constraints. In this paper, we propose dual neural networks that optimize one energy coupling data fitting with a local explicit geometric prior. Specifically, we build one residual network upon traditional convolution layers in order to directly predict 3D structures by fitting an input image. Meanwhile, we devise a novel architecture stacking shallow networks to refine 3D clouds with geometric priors given by Markov random fields (MRFs). Quantitative evaluations demonstrate the superior performance of the dual networks over either end-to-end DNNs or parametric models. Comparisons with the state-of-the-art also show competitive reconstruction quality under various conditions. Latex Bibtex Citation:
@article{fan2020dual, |
Abstract:
Underwater image enhancement is such an important low-level vision task with many applications that numerous algorithms have been proposed in recent years. These algorithms, developed upon various assumptions, demonstrate successes from various aspects using different data sets and different metrics. In this work, we set up an undersea image capturing system and construct a large-scale Real-world Underwater Image Enhancement (RUIE) data set divided into three subsets. The three subsets target three challenging aspects for enhancement, i.e., image visibility quality, color casts, and higher-level detection/classification, respectively. We conduct extensive and systematic experiments on RUIE to evaluate the effectiveness and limitations of various algorithms for enhancing visibility and correcting color casts on images with hierarchical categories of degradation. Moreover, underwater image enhancement in practice usually serves as a preprocessing step for mid-level and high-level vision tasks. We thus exploit the object detection performance on enhanced images as a brand new task-specific evaluation criterion. The findings from these evaluations not only confirm what is commonly believed, but also suggest promising solutions and new directions for visibility enhancement, color correction, and object detection on real-world underwater images. The benchmark is available at: https://github.com/dlut-dimt/Realworld-Underwater-Image-Enhancement-RUIE-Benchmark. Latex Bibtex Citation:
@article{liu2020real, |
Abstract:
We address the challenging issue of deformable registration that robustly and efficiently builds dense correspondences between images. Traditional approaches based on iterative energy optimization typically incur an expensive computational load. Recent learning-based methods are able to efficiently predict deformation maps by incorporating learnable deep networks. Unfortunately, these deep networks are designed to learn deterministic features for classification tasks, which are not necessarily optimal for registration. In this paper, we propose a novel bi-level optimization model that enables jointly learning deformation maps and features for image registration. The bi-level model takes the energy for deformation computation as the upper-level optimization while formulating the maximum a posteriori (MAP) estimation of features as the lower-level optimization. Further, we design learnable deep networks to simultaneously optimize the cooperative bi-level model, yielding robust and efficient registration. These deep networks derived from our bi-level optimization constitute an unsupervised end-to-end framework for learning both features and deformations. Extensive experiments on image-to-atlas and image-to-image deformable registration on 3D brain MR datasets demonstrate that we achieve state-of-the-art performance in terms of accuracy, efficiency, and robustness. Latex Bibtex Citation:
@inproceedings{LiuLZFL20, |
Optimization Learning: Perspective, Method, and Applications.
Risheng Liu Abstract:
Numerous tasks at the core of statistics, learning, and vision areas are specific cases of ill-posed inverse problems. Recently, learning-based (e.g., deep) iterative methods have been empirically shown to be useful for these problems. Nevertheless, integrating learnable structures into iterations is still a laborious process, which can only be guided by intuitions or empirical insights. Moreover, there is a lack of rigorous analysis of the convergence behaviors of these reimplemented iterations, and thus the significance of such methods is a little bit vague. We move beyond these limits and propose a theoretically guaranteed optimization learning paradigm, a generic and provable paradigm for nonconvex inverse problems, and develop a series of convergent deep models. Our theoretical analysis reveals that the proposed optimization learning paradigm allows us to generate globally convergent trajectories for learning-based iterative methods. Thanks to the superiority of our framework, we achieve state-of-the-art performance on different real applications. Latex Bibtex Citation:
@inproceedings{liu2021optimization, |
A Generic First-Order Algorithmic Framework for Bi-Level Programming Beyond Lower-Level Singleton.
Risheng Liu, Pan Mu, Xiaoming Yuan, Shangzhi Zeng, Jin Zhang Abstract:
In recent years, a variety of gradient-based bi-level optimization methods have been developed for learning tasks. However, theoretical guarantees of these existing approaches often heavily rely on the simplification that for each fixed upper-level variable, the lower-level solution must be a singleton (a.k.a. Lower-Level Singleton, LLS). In this work, by formulating bi-level models from the optimistic viewpoint and aggregating hierarchical objective information, we establish Bi-level Descent Aggregation (BDA), a flexible and modularized algorithmic framework for bi-level programming. Theoretically, we derive a new methodology to prove the convergence of BDA without the LLS condition. Furthermore, we improve the convergence properties of conventional first-order bi-level schemes (under the LLS simplification) based on our proof recipe. Extensive experiments justify our theoretical results and demonstrate the superiority of the proposed BDA for different tasks, including hyper-parameter optimization and meta learning. Latex Bibtex Citation:
@inproceedings{liu2020generic, |
2019 |
Learning Converged Propagations with Deep Prior Ensemble for Image Enhancement.
Risheng Liu, Long Ma, Yiyang Wang, Lei Zhang Abstract:
Enhancing visual qualities of images plays very important roles in various vision and learning applications. In the past few years, both knowledge-driven maximum a posteriori (MAP) with prior modelings and fully data-dependent convolutional neural network (CNN) techniques have been investigated to address specific enhancement tasks. In this paper, by exploiting the advantages of these two types of mechanisms within a complementary propagation perspective, we propose a unified framework, named deep prior ensemble (DPE), for solving various image enhancement tasks. Specifically, we first establish the basic propagation scheme based on the fundamental image modeling cues and then introduce residual CNNs to help predict the propagation direction at each stage. By designing prior projections to perform feedback control, we theoretically prove that even with experience-inspired CNNs, DPE is definitely converged and the output will always satisfy our fundamental task constraints. The main advantage over conventional optimization-based MAP approaches is that our descent directions are learned from collected training data, and thus are much more robust to unwanted local minimums. Meanwhile, compared with existing CNN-type networks, which are often designed in heuristic manners without theoretical guarantees, DPE is able to gain advantages from rich task cues investigated on the basis of domain knowledge. Therefore, DPE actually provides a generic ensemble methodology to integrate both knowledge- and data-based cues for different image enhancement tasks. More importantly, our theoretical investigations verify that the feed-forward propagations of DPE are properly controlled toward our desired solution. Experimental results demonstrate that the proposed DPE outperforms state-of-the-arts on a variety of image enhancement tasks in terms of both quantitative measures and visual perception quality. Latex Bibtex Citation:
@article{liu2019learning, |
Deep Proximal Unrolling: Algorithmic Framework, Convergence Analysis and Applications.
Risheng Liu, Shichao Cheng, Long Ma, Xin Fan, Zhongxuan Luo Abstract:
Deep learning models have gained great success in many real-world applications. However, most existing networks are typically designed in heuristic manners, and thus these approaches lack rigorous mathematical derivations and clear interpretations. Several recent studies try to build deep models by unrolling a particular optimization model that involves task information. Unfortunately, due to the dynamic nature of network parameters, their resultant deep propagations do not possess the nice convergence property of the original optimization scheme. In this work, we develop a generic paradigm to unroll nonconvex optimization for deep model design. Different from most existing frameworks, which just replace the iterations by network architectures, we prove in theory that the propagation generated by our proximally unrolled deep model can globally converge to the critical point of the original optimization model. Moreover, even if the task information is only partially available (e.g., no prior regularization), we can still train convergent deep propagations. We also extend these theoretical investigations to the more general multi-block models, and thus a lot of real-world applications can be successfully handled by the proposed framework. Finally, we conduct experiments on various low-level vision tasks (i.e., non-blind deconvolution, dehazing, and low-light image enhancement) and demonstrate the superiority of our proposed framework compared with existing state-of-the-art approaches. Latex Bibtex Citation:
@article{liu2019deep, |
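A minimal, hypothetical sketch of the unrolling idea behind the entry above (a learned network standing in for the proximal operator of the regularizer; the toy degradation operator below is assumed symmetric so that it acts as its own adjoint):

import torch
import torch.nn as nn

class ProxNet(nn.Module):
    # A small residual network playing the role of a learned proximal operator.
    def __init__(self, channels=1):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(16, channels, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)

def unrolled_recovery(y, degrade, prox, steps=5, step_size=0.5):
    # Unrolled proximal-gradient iterations for min_x 0.5||A(x) - y||^2 + R(x),
    # where the proximal map of R is replaced by the network `prox`.
    x = y.clone()
    for _ in range(steps):
        grad = degrade(degrade(x) - y)     # gradient of the data term (A assumed self-adjoint)
        x = prox(x - step_size * grad)     # learned proximal step
    return x

blur = nn.AvgPool2d(3, stride=1, padding=1)   # toy symmetric degradation operator A
x_hat = unrolled_recovery(torch.rand(1, 1, 64, 64), blur, ProxNet())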
On the Convergence of Learning-based Iterative Methods for Nonconvex Inverse Problems.
Risheng Liu, Shichao Cheng, Yi He, Xin Fan, Zhouchen Lin, Zhongxuan Luo Abstract:
Numerous tasks at the core of statistics, learning and vision areas are specific cases of ill-posed inverse problems. Recently, learning-based (e.g., deep) iterative methods have been empirically shown to be useful for these problems. Nevertheless, integrating learnable structures into iterations is still a laborious process, which can only be guided by intuitions or empirical insights. Moreover, there is a lack of rigorous analysis about the convergence behaviors of these reimplemented iterations, and thus the significance of such methods is a little bit vague. This paper moves beyond these limits and proposes the Flexible Iterative Modularization Algorithm (FIMA), a generic and provable paradigm for nonconvex inverse problems. Our theoretical analysis reveals that FIMA allows us to generate globally convergent trajectories for learning-based iterative methods. Meanwhile, the devised scheduling policies on flexible modules should also be beneficial for classical numerical methods in the nonconvex scenario. Extensive experiments on real applications verify the superiority of FIMA. Latex Bibtex Citation:
@article{liu2019convergence, |
Learning Aggregated Transmission Propagation Networks for Haze Removal and Beyond.
Risheng Liu, Xin Fan, Minjun Hou, Zhiying Jiang, Zhongxuan Luo, Lei Zhang Abstract:
Single image dehazing is an important low-level vision task with many applications. Early researches have investigated different kinds of visual priors to address this problem. However, they may fail when their assumptions are not valid on specific images. Recent deep networks also achieve relatively good performance in this task. But unfortunately, due to the disappreciation of rich physical rules in hazes, large amounts of data are required for their training. More importantly, they may still fail when there exist completely different haze distributions in testing images. By considering the collaborations of these two perspectives, this paper designs a novel residual architecture to aggregate both prior (i.e., domain knowledge) and data (i.e., haze distribution) information to propagate transmissions for scene radiance estimation. We further present a variational energy based perspective to investigate the intrinsic propagation behavior of our aggregated deep model. In this way, we actually bridge the gap between prior driven models and data driven networks and leverage advantages but avoid limitations of previous dehazing approaches. A lightweight learning framework is proposed to train our propagation network. Finally, by introducing a task-aware image separation formulation with a flexible optimization scheme, we extend the proposed model to more challenging vision tasks, such as underwater image enhancement and single image rain removal. Experiments on both synthetic and real-world images demonstrate the effectiveness and efficiency of the proposed framework. Latex Bibtex Citation:
@article{liu2019learning, |
Knowledge-driven Deep Unrolling for Robust Image Layer Separation.
Risheng Liu, Zhiying Jiang, Xin Fan, Zhongxuan Luo Abstract:
Single-image layer separation aims to decompose the observed image into two independent components in terms of different application demands. It is known that many vision and multimedia applications can be (re)formulated as a separation problem. Due to the fundamentally ill-posed nature of these separations, existing methods are inclined to investigate model priors on the separated components elaborately. Nevertheless, it is knotty to optimize the cost function with complicated model regularizations. Effectiveness is greatly conceded by the settled iteration mechanism, and adaptation cannot be guaranteed due to the poor data fitting. What is more, for a universal framework, the most taxing point is that one type of visual cue cannot be shared with different tasks. To partly overcome the weaknesses mentioned earlier, we delve into a generic optimization unrolling technique to incorporate deep architectures into iterations for adaptive image layer separation. First, we propose a general energy model with implicit priors, which is based on maximum a posteriori estimation, and employ the extensively accepted alternating direction method of multipliers to determine our elementary iteration mechanism. By unrolling with one general residual architecture prior and one task-specific prior, we successfully attain a straightforward, flexible, and data-dependent image separation framework. We apply our method to four different tasks, including single-image rain streak removal, high-dynamic-range tone mapping, low-light image enhancement, and single-image reflection removal. Extensive experiments demonstrate that the proposed method is applicable to multiple tasks and outperforms the state of the art by a large margin qualitatively and quantitatively. Latex Bibtex Citation:
@article{liu2019knowledge, |
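To make the unrolling idea above concrete, here is a minimal LaTeX sketch of a generic two-layer separation energy and an alternating update scheme of the kind such methods unroll; the data term, the priors Phi and Psi, and the exact splitting are illustrative placeholders rather than the paper's precise model.

% Generic two-layer separation energy for an observation o = u + v (+ noise)
\[
  \min_{u,\,v}\ \tfrac{1}{2}\,\|o - u - v\|_2^2 + \Phi(u) + \Psi(v).
\]
% Block-wise alternating updates; in an unrolled network the proximal
% operators of \Phi and \Psi are replaced by learned modules, e.g. a general
% residual architecture for one layer and a task-specific network for the other:
\[
  u^{k+1} = \mathrm{prox}_{\Phi}\!\bigl(o - v^{k}\bigr),
  \qquad
  v^{k+1} = \mathrm{prox}_{\Psi}\!\bigl(o - u^{k+1}\bigr).
\]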
Toward Efficient Image Representation: Sparse Concept Discriminant Matrix Factorization.
Meng Pang, Yiu-ming Cheung, Risheng Liu, Jian Lou, Chuang Lin Abstract:
The key ingredients of matrix factorization lie in basis learning and coefficient representation. To enhance the discriminant ability of the learned basis, discriminant graph embedding is usually introduced into the matrix factorization model. However, existing matrix factorization methods based on graph embedding generally conduct discriminant analysis via a single type of adjacency graph, either similarity-based graphs (e.g., the Laplacian eigenmaps graph) or reconstruction-based graphs (e.g., the L1-graph), while ignoring the cooperation of different types of adjacency graphs that can better depict the discriminant structure of the original data. To address this issue, we propose a novel Fisher-like criterion, based on graph embedding, to extract sufficient discriminant information via two different types of adjacency graphs. One graph preserves the reconstruction relationships of neighboring samples in the same category, and the other suppresses the similarity relationships of neighboring samples from different categories. Moreover, we also leverage sparse coding to promote the sparsity of the coefficients. By virtue of the proposed Fisher-like criterion and sparse coding, a new matrix factorization framework called Sparse concept Discriminant Matrix Factorization (SDMF) is proposed for efficient image representation. Furthermore, we extend the Fisher-like criterion to an unsupervised context, thus yielding an unsupervised version of SDMF. Experimental results on seven benchmark datasets demonstrate the effectiveness and efficiency of the proposed SDMFs on both image classification and clustering tasks. (A hedged sketch of a two-graph regularized factorization objective follows this entry.) Latex Bibtex Citation:
@article{pang2019toward, |
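One plausible way to picture how a two-graph, Fisher-like regularizer can enter a sparse factorization objective is sketched in LaTeX below; the trade-off weights alpha and beta and the trace-difference form are assumptions for illustration, not the exact SDMF objective.

% X: data matrix, B: basis, C: coefficients; L_w comes from a reconstruction-
% based graph over same-class neighbors (relationships to preserve), L_b from a
% similarity-based graph over different-class neighbors (relationships to
% suppress); the l1 term promotes sparse coefficients.
\[
  \min_{B,\,C}\ \|X - BC\|_F^2
  + \alpha\Bigl(\mathrm{tr}\bigl(C L_{w} C^{\top}\bigr)
  - \mathrm{tr}\bigl(C L_{b} C^{\top}\bigr)\Bigr)
  + \beta\,\|C\|_1 .
\]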
Robust Heterogeneous Discriminative Analysis for Face Recognition with Single Sample per Person.
Meng Pang, Yiu-ming Cheung, Binghui Wang, Risheng Liu Abstract:
Single sample per person face recognition is one of the most challenging problems in face recognition (FR), where only a single sample per person (SSPP) is enrolled in the gallery set for training. Although existing patch-based methods have achieved great success in FR with SSPP, they still have limitations in the feature extraction and identification stages when handling complex facial variations. In this work, we propose a new patch-based method called Robust Heterogeneous Discriminative Analysis (RHDA) for FR with SSPP. To enhance the robustness against complex facial variations, we first present a new graph-based Fisher-like criterion, which incorporates two manifold embeddings, to learn heterogeneous discriminative representations of image patches. Specifically, for each patch, the Fisher-like criterion is able to preserve the reconstruction relationship of neighboring patches from the same person, while suppressing the similarities between neighboring patches from different persons. Then, we introduce two distance metrics, i.e., patch-to-patch distance and patch-to-manifold distance, and develop a fusion strategy that combines the recognition outputs of the two distance metrics via a joint majority vote for identification. Experimental results on various benchmark datasets demonstrate the effectiveness of the proposed method. (A hedged sketch of the decision-level voting fusion follows this entry.) Latex Bibtex Citation:
@article{pang2019robust, |
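A short Python sketch of the decision-level fusion idea described above: per-patch identities are predicted under the two distance metrics and merged by a joint majority vote. The function name and the simple vote-counting scheme are assumptions for illustration, not the paper's exact fusion rule.

from collections import Counter

def joint_majority_vote(p2p_labels, p2m_labels):
    # p2p_labels: identity predicted for each patch with patch-to-patch distance
    # p2m_labels: identity predicted for each patch with patch-to-manifold distance
    # Returns the identity receiving the most votes across both metrics.
    votes = Counter(p2p_labels) + Counter(p2m_labels)
    return votes.most_common(1)[0][0]

# Hypothetical usage: each list holds one predicted identity per image patch.
print(joint_majority_vote(["A", "B", "A", "A"], ["A", "B", "B", "A"]))  # -> "A"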
Blind Image Deblurring via Adaptive Optimization with Flexible Sparse Structure Control.
Risheng Liu, Caisheng Mao, Zhi-Hui Wang, Haojie Li Abstract:
Blind image deblurring is a long-standing ill-posed inverse problem which aims to recover a latent sharp image given only a blurry observation. So far, existing studies have designed many effective priors w.r.t. the latent image within the maximum a posteriori (MAP) framework in order to narrow down the solution space. These non-convex priors are always integrated into the final deblurring model, which makes the optimization challenging. However, due to the unknown image distribution, complex kernel structure, and non-uniform noise in real-world scenarios, it is indeed challenging to explicitly design a fixed prior for all cases. Thus we adopt the idea of adaptive optimization and propose sparse structure control (SSC) for the latent image during the optimization process. In this paper, we only formulate the necessary optimization constraints in a lightweight MAP model with no priors. Then we develop an inexact projected gradient scheme to incorporate flexible SSC into MAP inference. Besides the lp-norm based SSC of our previous work, we also train a group of denoising convolutional neural networks (CNNs) to learn the sparse image structure automatically from training data under different noise levels, and we show that CNN-based SSC achieves results similar to the lp-norm while being more robust to noise. Extensive experiments demonstrate that the proposed adaptive optimization scheme with both types of SSC achieves state-of-the-art results on both synthetic data and real-world images. (A hedged sketch of the MAP model and projected-gradient step follows this entry.) Latex Bibtex Citation:
@article{liu2019blind, |
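As a rough reference for the prior-free MAP model and the projected-gradient idea above, the LaTeX below writes a generic blind deconvolution energy and one inexact projected gradient step on the latent image; the notation (u latent image, k blur kernel, b blurry input, S a sparse-structure set, eta a step size) and the simplex constraint on the kernel are generic assumptions rather than the paper's exact formulation.

% Lightweight MAP-style energy with only the necessary constraints
\[
  \min_{u,\,k}\ E(u,k) := \tfrac{1}{2}\,\|k \otimes u - b\|_2^2
  \quad \text{s.t.}\quad k \ge 0,\ \ \textstyle\sum_i k_i = 1.
\]
% One inexact projected gradient step on u: a gradient step on the data term,
% then projection onto a sparse-structure set S, realized either by lp-norm
% shrinkage or by a learned denoising CNN:
\[
  u^{t+1} = \mathcal{P}_{S}\!\bigl(u^{t} - \eta\,\nabla_{u} E(u^{t}, k^{t})\bigr).
\]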
Semi-supervised Skin Detection by Network with Mutual Guidances.
Yi He, Jiayuan Shi, Chuan Wang, Haibin Huang, Jiaming Liu, Guanbin Li, Risheng Liu, Jue Wang Abstract:
In this paper we present a new data-driven method for robust skin detection from a single human portrait image. Unlike previous methods, we incorporate the human body as a weak semantic guidance into this task, considering that acquiring large-scale human-labeled skin data is commonly expensive and time-consuming. To be specific, we propose a dual-task neural network for the joint detection of skin and body via a semi-supervised learning strategy. The dual-task network contains a shared encoder but two separate decoders for skin and body. For each decoder, its output also serves as a guidance for its counterpart, making both decoders mutually guided. Extensive experiments were conducted to demonstrate the effectiveness of our network with mutual guidance, and the results show that our network outperforms the state of the art in skin detection. (A hedged sketch of a shared-encoder, mutually guided dual-decoder layout follows this entry.) Latex Bibtex Citation:
@inproceedings{he2019semi, |
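The following PyTorch-style Python sketch illustrates a shared-encoder, dual-decoder layout with mutual guidance of the kind described above: each decoder also consumes the other decoder's previous prediction as an extra input channel. All layer sizes, the single guidance exchange, and the module names are illustrative assumptions, not the paper's architecture.

import torch
import torch.nn as nn

class DualTaskSkinBodyNet(nn.Module):
    # Shared encoder with two mutually guided decoders (illustrative only).
    def __init__(self, ch=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True))
        # Each decoder sees the shared features plus the counterpart's map (+1 channel).
        self.skin_decoder = nn.Conv2d(ch + 1, 1, 3, padding=1)
        self.body_decoder = nn.Conv2d(ch + 1, 1, 3, padding=1)

    def forward(self, image):
        feats = self.encoder(image)
        blank = torch.zeros_like(feats[:, :1])  # no guidance on the first pass
        skin0 = torch.sigmoid(self.skin_decoder(torch.cat([feats, blank], dim=1)))
        body0 = torch.sigmoid(self.body_decoder(torch.cat([feats, blank], dim=1)))
        # Mutual guidance: refine each map using the other task's prediction.
        skin = torch.sigmoid(self.skin_decoder(torch.cat([feats, body0], dim=1)))
        body = torch.sigmoid(self.body_decoder(torch.cat([feats, skin0], dim=1)))
        return skin, body

# Hypothetical usage on a single 3x256x256 portrait image.
skin_map, body_map = DualTaskSkinBodyNet()(torch.randn(1, 3, 256, 256))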
Task Embedded Coordinate Update: A Realizable Framework for Multivariate Non-convex Optimization.
Yiyang Wang, Risheng Liu*, Long Ma, Xiaoliang Song. Abstract:
In this paper we propose a realizable framework, TECU, which embeds task-specific strategies into the update schemes of coordinate descent for optimizing multivariate non-convex problems with coupled objective functions. On one hand, TECU is capable of improving algorithmic efficiency by embedding productive numerical algorithms for optimizing univariate sub-problems with nice properties. On the other hand, it also increases the probability of obtaining desired results by embedding advanced techniques in the optimization of realistic tasks. Integrating both numerical algorithms and advanced techniques, TECU is proposed as a unified framework for solving a class of non-convex problems. Although the task-embedded strategies introduce inaccuracies into the sub-problem optimizations, we provide a realizable criterion to control the errors while ensuring robust performance with rigorous theoretical analyses. By respectively embedding ADMM and a residual-type CNN into our algorithmic framework, the experimental results verify both the efficiency and the effectiveness of embedding task-oriented strategies in coordinate descent for solving practical problems. (A hedged sketch of an inexact coordinate-descent update follows this entry.) Latex Bibtex Citation:
@inproceedings{wang2019task, |
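A minimal LaTeX sketch of what an inexact, task-embedded coordinate-descent update for a coupled bivariate objective might look like; the proximal weights mu and nu and the summable error tolerance {eps^k} are generic assumptions standing in for TECU's actual criterion.

% Coupled non-convex objective over two blocks x and y
\[
  \min_{x,\,y}\ F(x, y) := f(x) + g(y) + H(x, y).
\]
% Block updates solved only approximately by an embedded solver
% (e.g. ADMM for one block, a residual-type CNN for the other),
% with the inexactness controlled by a summable error sequence:
\[
  x^{k+1} \approx \arg\min_{x}\ F(x, y^{k}) + \tfrac{\mu}{2}\,\|x - x^{k}\|^2,
  \qquad
  y^{k+1} \approx \arg\min_{y}\ F(x^{k+1}, y) + \tfrac{\nu}{2}\,\|y - y^{k}\|^2,
\]
\[
  \text{with errors } \|e^{k}\| \le \varepsilon^{k}
  \ \text{ and }\ \textstyle\sum_{k} \varepsilon^{k} < \infty .
\]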
A Theoretically Guaranteed Deep Optimization Framework for Robust Compressive Sensing MRI.
Risheng Liu, Yuxi Zhang, Shichao Cheng, Xin Fan, Zhongxuan Luo Abstract:
Magnetic Resonance Imaging (MRI) is one of the most dynamic and safe imaging techniques available for clinical applications. However, the rather slow speed of MRI acquisition limits patient throughput and potential indications. Compressive Sensing (CS) has proven to be an efficient technique for accelerating MRI acquisition. The most widely used CS-MRI model, founded on the premise of reconstructing an image from an incompletely filled k-space, leads to an ill-posed inverse problem. In past years, a lot of effort has been made to efficiently optimize the CS-MRI model. Inspired by deep learning techniques, some preliminary works have tried to incorporate deep architectures into the CS-MRI process. Unfortunately, convergence guarantees (hard to establish for experience-based networks) and robustness (due to the lack of real-world noise modeling) are still missing from these deeply trained optimization methods. In this work, we develop a new paradigm to integrate designed numerical solvers with data-driven architectures for CS-MRI. By introducing an optimal-condition checking mechanism, we can successfully prove the convergence of our established deep CS-MRI optimization scheme. Furthermore, we explicitly formulate the Rician noise distribution within our framework and obtain an extended CS-MRI network to handle the real-world noise in the MRI process. Extensive experimental results verify that the proposed paradigm outperforms existing state-of-the-art techniques in reconstruction accuracy and efficiency as well as robustness to noise in real scenes. (A hedged sketch of the basic CS-MRI inverse problem follows this entry.) Latex Bibtex Citation:
@inproceedings{liu2019theoretically, |
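For orientation, the LaTeX below states the standard CS-MRI reconstruction problem this line of work optimizes; the regularizer phi and weight lambda are generic placeholders, and the Rician extension is only indicated schematically rather than as the paper's exact model.

% x: image to reconstruct, F: Fourier transform, P: binary undersampling mask,
% y: acquired k-space samples, phi: sparsity-promoting regularizer.
\[
  \min_{x}\ \tfrac{1}{2}\,\|P\,\mathcal{F}x - y\|_2^2 + \lambda\,\varphi(x).
\]
% Handling Rician-corrupted magnitude data amounts to replacing the quadratic
% data term with the negative log-likelihood of the Rician observation model.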
Exploiting Local Feature Patterns for Unsupervised Domain Adaptation.
Jun Wen, Risheng Liu, Nenggan Zheng, Qian Zheng, Zhefeng Gong, Junsong Yuan Abstract:
Unsupervised domain adaptation methods aim to alleviate performance degradation caused by domain-shift by learning domain-invariant representations. Existing deep domain adaptation methods focus on holistic feature alignment by matching source and target holistic feature distributions, without considering local features and their multi-mode statistics. We show that the learned local feature patterns are more generic and transferable and a further local feature distribution matching enables fine-grained feature alignment. In this paper, we present a method for learning domain-invariant local feature patterns and jointly aligning holistic and local feature statistics. Comparisons to the state-of-the-art unsupervised domain adaptation methods on two popular benchmark datasets demonstrate the superiority of our approach and its effectiveness on alleviating negative transfer. Latex Bibtex Citation:
@inproceedings{wen2019exploiting, |
Asynchronous Proximal Stochastic Gradient Algorithm for Composition Optimization Problems.
Pengfei Wang, Risheng Liu, Nenggan Zheng, Zhefeng Gong Abstract:
In machine learning research, many emerging applications can be (re)formulated as composition optimization problems with nonsmooth regularization penalties. To solve such problems, the traditional stochastic gradient descent (SGD) algorithm and its variants either have a low convergence rate or are computationally expensive. Recently, several stochastic composition gradient algorithms have been proposed; however, these methods are still inefficient and do not scale to large-scale composition optimization problem instances. To address these challenges, we propose an asynchronous parallel algorithm, named Async-ProxSCVR, which effectively combines asynchronous parallel implementation with a variance reduction method. We prove that the algorithm admits the fastest convergence rate for both strongly convex and general nonconvex cases. Furthermore, we analyze the query complexity of the proposed algorithm and prove that linear speedup is attainable as the number of processors increases. Finally, we evaluate Async-ProxSCVR on two representative composition optimization problems: value function evaluation in reinforcement learning and sparse mean-variance optimization. Experimental results show that the algorithm achieves significant speedups and is much faster than existing methods. (A hedged statement of the composition optimization problem follows this entry.) Latex Bibtex Citation:
@inproceedings{wang2019asynchronous, |
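For reference, the LaTeX below writes the generic nonsmooth composition optimization problem the abstract refers to, together with the proximal update that gives the algorithm its "Prox" part; the symbols are standard placeholders (inner and outer expectations, a nonsmooth term h, a step size eta, and a variance-reduced gradient estimate v^k), not the paper's precise assumptions.

% Nonsmooth composition optimization
\[
  \min_{x}\ f\bigl(g(x)\bigr) + h(x),
  \qquad
  f(w) = \mathbb{E}_{\xi}\bigl[F(w;\xi)\bigr],
  \quad
  g(x) = \mathbb{E}_{\zeta}\bigl[G(x;\zeta)\bigr].
\]
% h is a nonsmooth regularizer (e.g. an l1 penalty); each proximal stochastic
% step uses a variance-reduced estimate v^k of the composition gradient:
\[
  x^{k+1} = \mathrm{prox}_{\eta h}\bigl(x^{k} - \eta\, v^{k}\bigr).
\]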
2018 and before |
Online Low-Rank Representation Learning for Joint Multi-Subspace Recovery and Clustering.
Bo Li, Risheng Liu, Junjie Cao, Jie Zhang, Yukun Lai, and Xiuping Liu |
Explicit Shape Regression with Characteristic Number for Facial Landmark Localization.
Xin Fan, Risheng Liu, Zhongxuan Luo, Yuntao Li, and Yuyao Feng |
Learning to Diffuse: A New Perspective to Design PDEs for Visual Analysis.
Risheng Liu, Guangyu Zhong, Junjie Cao, Zhouchen Lin, Shiguang Shan, and Zhongxuan Luo |
Linearized Alternating Direction Method with Parallel Splitting and Adaptive Penalty for Separable Convex Programs in Machine Learning.
Zhouchen Lin, Risheng Liu, Huan Li |
Structure-Constrained Low-Rank Representation.
Kewei Tang, Risheng Liu, Zhixun Su and Jie Zhang |
Low-Rank Structure Learning via Nonconvex Heuristic Recovery.
Yue Deng, Qionghai Dai, Risheng Liu, Zengke Zhang, Sanqing Hu |
Feature Extraction by Learning Lorentzian Metric Tensor and Its Extensions.
Risheng Liu, Zhouchen Lin, Zhixun Su, Kewei Tang |
Learning Collaborative Generation Correction Modules for Blind Image Deblurring and Beyond.
Risheng Liu, Yi He, Shichao Cheng, Xin Fan, and Zhongxuan Luo |
A Bridging Framework for Model Optimization and Deep Propagation.
Risheng Liu, Shichao Cheng, Xiaokun Liu, Long Ma, Xin Fan, Zhongxuan Luo |
Toward Designing Convergent Deep Operator Splitting Methods for Task-specific Nonconvex Optimization.
Risheng Liu, Shichao Cheng, Yi He, Xin Fan, and Zhongxuan Luo |
Fast Factorization-free Kernel Learning for Unlabeled Chunk Data Streams.
Yi Wang, Nan Xue, Xin Fan, Jiebo Luo, Risheng Liu, Bin Chen, Haojie Li, Zhongxuan Luo |
Proximal Alternating Direction Network: A Globally Converged Deep Unrolling Framework.
Risheng Liu, Xin Fan, Shichao Cheng, Xiangyu Wang, Zhongxuan Luo |
Unsupervised Representation Learning with Long-Term Dynamics for Skeleton Based Action Recognition.
Nenggan Zheng, Jun Wen, Risheng Liu, Liangqu Long, Jianhua Dai and Zhefeng Gong |
Self-reinforced Cascaded Regression for Face Alignment.
Xin Fan, Risheng Liu, Kang Huyan and Zhongxuan Luo |
Deep Location-Specific Tracking.
Lingxiao Yang, Risheng Liu, David Zhang, and Lei Zhang |
Deep Hybrid Residual Learning with Statistic Priors for Single Image Super-Resolution.
Risheng Liu, Xiangyu Wang, Xin Fan, Haojie Li, and Zhongxuan Luo |
Blind Image Deblurring via Adaptive Dynamical System Learning.
Risheng Liu, Shichao Cheng, Xin Fan, Zhongxuan Luo |
Linearized Alternating Direction Method with Penalization for Nonconvex and Nonsmooth Optimization.
Yiyang Wang, Risheng Liu, Xiaoliang Song and Zhixun Su |
Characteristic Number Regression for Facial Feature Extraction.
Yuntao Li, Xin Fan, Risheng Liu, Yuyao Feng, Zhongxuan Luo and Zezhou Li |
Adaptive Partial Differential Equation Learning for Visual Saliency Detection.
Risheng Liu, Junjie Cao, Zhouchen Lin and Shiguang Shan |
Robust Visual Tracking Using Latent Subspace Projection Pursuit.
Wei Jin, Risheng Liu*, Zhixun Su, Changcheng Zhang and Shanshan Bai |
Fixed-Rank Representation for Unsupervised Visual Learning.
Risheng Liu, Zhouchen Lin, Fernando De la Torre, Zhixun Su |
Linearized Alternating Direction Method with Adaptive Penalty for Low-Rank Representation.
Zhouchen Lin, Risheng Liu, Zhixun Su |
Learning PDEs for Image Restoration via Optimal Control.
Risheng Liu, Zhouchen Lin, Wei Zhang, Zhixun Su |