Weakly Incremental Learning for Semantic Segmentation (WILSS) leverages a pre-trained segmentation model to segment new classes using cost-effective and readily available image-level labels. A prevailing way to solve WILSS is to generate seed areas for each new class, which serve as a form of pixel-level supervision. However, a scenario often arises where a pixel is concurrently predicted as an old class by the pre-trained segmentation model and as a new class by the seed areas. Such a scenario is particularly problematic in WILSS, as the lack of pixel-level annotations on new classes makes it intractable to ascertain whether the pixel pertains to the new class or not. To surmount this issue, we propose a tendency-driven relationship of mutual exclusivity, tailored to govern the behavior of the seed areas and the predictions generated by the pre-trained segmentation model. This relationship stipulates that predictions for the new and old classes must not conflict while prioritizing the preservation of predictions for the old classes, which not only addresses the conflicting prediction issue but also effectively mitigates the inherent challenge of incremental learning, namely catastrophic forgetting. Furthermore, under this tendency-driven mutual exclusivity relationship, we generate pseudo masks for the new classes, allowing for concurrent execution with model parameter updating via the resolution of a bi-level optimization problem. Extensive experiments substantiate the effectiveness of our framework, resulting in the establishment of new benchmarks and paving the way for further research in this field.
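The conflict-resolution idea can be illustrated with a minimal sketch. It assumes a hard per-pixel rule in place of the learned, tendency-driven relationship and the bi-level optimization that the abstract describes. The function name `merge_labels`, the background index `0`, and the use of `None` for pixels without a seed are hypothetical choices made only for illustration.

```python
BACKGROUND = 0  # assumed index of the background class (hypothetical)


def merge_labels(old_pred, seed, background=BACKGROUND):
    """Merge old-model predictions with new-class seed labels, pixel by pixel.

    old_pred: list of old-class labels predicted by the pre-trained model.
    seed:     list of new-class labels from seed areas, None where no seed exists.
    """
    merged = []
    for old, new in zip(old_pred, seed):
        if new is None:
            # No seed at this pixel: keep the old model's prediction.
            merged.append(old)
        elif old != background:
            # Conflict: the old model sees an old class and the seed sees a new one.
            # This sketch resolves it in favor of the old class.
            merged.append(old)
        else:
            # Old model says background, so the new-class seed is accepted.
            merged.append(new)
    return merged


if __name__ == "__main__":
    print(merge_labels([0, 3, 0, 2], [None, 7, 7, None]))
```

Favoring the old class on conflicts is what ties this rule to forgetting: pixels the old model already explains are never overwritten by noisy seeds.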
In partial label learning (PLL), each instance is associated with a set of candidate labels among which only one is ground-truth. Most existing works focus on constructing robust classifiers to estimate the labeling confidence of candidate labels in order to identify the correct one. However, these methods usually struggle to rectify mislabeled samples. To help existing PLL methods identify and rectify mislabeled samples, in this paper, we introduce a novel partner classifier and propose a novel ``mutual supervision'' paradigm. Specifically, we instantiate the partner classifier based on the implicit fact that non-candidate labels of a sample should not be assigned to it, which is inherently accurate and has not been fully investigated in PLL. Furthermore, a novel collaborative term is formulated to link the base classifier and the partner one. During each stage of mutual supervision, both classifiers blur each other's predictions through a blurring mechanism to prevent overconfidence in a specific label. Extensive experiments demonstrate that the performance and disambiguation ability of several well-established stand-alone and deep-learning-based PLL approaches can be significantly improved by coupling with this learning paradigm.
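The fact that non-candidate labels should not be assigned to a sample can be turned into a training signal. The sketch below shows one plausible form, a loss that penalizes probability mass placed on non-candidate labels. It is an assumption for illustration, not the partner classifier's actual objective, and the names `softmax` and `non_candidate_loss` are hypothetical. The blurring mechanism and the collaborative term are not modeled.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def non_candidate_loss(logits, candidate_set, eps=1e-12):
    """Sum of -log(1 - p_j) over labels j outside the candidate set.

    The loss is zero when no probability mass falls on non-candidate labels
    and grows as the classifier assigns mass to labels known to be wrong.
    """
    probs = softmax(logits)
    return -sum(
        math.log(1.0 - p + eps)
        for j, p in enumerate(probs)
        if j not in candidate_set
    )


if __name__ == "__main__":
    candidates = {0, 1}
    print(non_candidate_loss([5.0, 5.0, -5.0, -5.0], candidates))  # small
    print(non_candidate_loss([-5.0, -5.0, 5.0, 5.0], candidates))  # large
```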
Existing scene text detection methods typically rely on extensive real data for training. Due to the lack of annotated real images, recent works have attempted to exploit large-scale labeled synthetic data (LSD) for pre-training text detectors. However, a synth-to-real domain gap emerges, further limiting the performance of text detectors. In contrast, we propose \textbf{FreeReal}, a real-domain-aligned pre-training paradigm that exploits the complementary strengths of both LSD and unlabeled real data (URD). Specifically, to bridge the real and synthetic worlds for pre-training, a novel glyph-based mixing mechanism (GlyphMix) is tailored for text images. GlyphMix delineates the character structures of synthetic images and embeds them as graffiti-like units onto real images. Without introducing real domain drift, GlyphMix freely yields real-world images with annotations derived from synthetic labels. Furthermore, when given free fine-grained synthetic labels, GlyphMix can effectively bridge the linguistic domain gap stemming from English-dominated LSD to URD in various languages. Without bells and whistles, FreeReal achieves average gains of 4.56\%, 3.85\%, 3.90\%, and 1.97\% in improving the performance of DBNet, PANet, PSENet, and FCENet methods, respectively, consistently outperforming previous pre-training methods by a substantial margin across four public datasets. Code will be released soon.
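The glyph-embedding step can be sketched as mask-based compositing. This is a simplified illustration under stated assumptions, not the released GlyphMix implementation: images are plain 2-D lists of grayscale values, the character mask is assumed to be given, and the function name `glyph_mix` is hypothetical.

```python
def glyph_mix(real_image, synth_image, glyph_mask):
    """Paste synthetic glyph pixels onto a real image.

    All three arguments are equally sized 2-D lists. glyph_mask holds 1 where
    a synthetic character stroke is present and 0 elsewhere.
    Returns the mixed image and the pixel-level text label.
    """
    mixed = [
        [s if m else r for r, s, m in zip(real_row, synth_row, mask_row)]
        for real_row, synth_row, mask_row in zip(real_image, synth_image, glyph_mask)
    ]
    # The annotation is inherited from the synthetic mask, so the real
    # background needs no manual labeling.
    label = [list(row) for row in glyph_mask]
    return mixed, label


if __name__ == "__main__":
    real = [[10, 10], [10, 10]]
    synth = [[200, 201], [202, 203]]
    mask = [[1, 0], [0, 1]]
    print(glyph_mix(real, synth, mask))
```

Only stroke pixels are copied, so the background statistics of the mixed image remain those of the real image.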
In this paper, a hybrid IRS-aided amplify-and-forward (AF) relay wireless network is put forward, where the hybrid IRS is made up of passive and active elements. To maximize the signal-to-noise ratio (SNR), a low-complexity method based on successive convex approximation and fractional programming (LC-SCA-FP) is proposed to jointly optimize the beamforming matrix at the AF relay and the reflecting coefficient matrices at the IRS. Simulation results verify that the rate achieved by the proposed LC-SCA-FP method surpasses that of the benchmark schemes, namely the passive IRS-aided AF relay network and the AF-relay-only network.
Due to its intrinsic ability to combat the double fading effect, the active intelligent reflective surface (IRS) has become popular. The main feature of an active IRS is that it must be supplied with power, so the problem of how to allocate the total power between the base station (BS) and the IRS arises naturally: the goal is to fully exploit the rate gain achievable by power allocation (PA) and to close the rate gap between existing PA strategies and the optimal exhaustive search (ES). First, the signal-to-noise ratio (SNR) is derived as a function of the PA factor beta in [0, 1]. Then, to improve the rate performance of conventional gradient ascent (GA), an equal-spacing-multiple-point-initialization GA (ESMPI-GA) method is proposed. Owing to the slow linear convergence of iterative GA, the proposed ESMPI-GA has high complexity. Finally, to reduce this complexity, a low-complexity closed-form PA method based on a third-order Taylor expansion (TTE) centered at beta0 = 0.5 is proposed. Simulation results show that the proposed ESMPI-GA harvests about 0.5 bit gain over conventional GA, and 1.2 and 0.8 bits gain over existing methods such as equal PA and Taylor polynomial approximation (TPA) for small-scale IRS, and that the proposed TTE performs much better than TPA and fixed PA strategies at extremely low complexity.
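The multiple-point initialization idea behind ESMPI-GA can be sketched as follows: run projected gradient ascent from equally spaced starting points in [0, 1] and keep the best result. This is a minimal illustration under stated assumptions. The objective `toy_rate` is a hypothetical multimodal stand-in, not the SNR expression derived in the paper, and the step size, iteration count, and number of starting points are arbitrary.

```python
import math


def toy_rate(beta):
    """Hypothetical multimodal objective on [0, 1] (not the paper's SNR)."""
    return 0.2 * math.sin(6.0 * math.pi * beta) + 1.0 - (beta - 0.7) ** 2


def gradient_ascent(f, beta0, step=0.005, iters=2000, h=1e-6):
    """Projected gradient ascent on [0, 1] with a central-difference gradient."""
    beta = beta0
    for _ in range(iters):
        grad = (f(beta + h) - f(beta - h)) / (2.0 * h)
        beta = min(1.0, max(0.0, beta + step * grad))  # project back onto [0, 1]
    return beta


def esmpi_ga(f, num_points=8):
    """Run gradient ascent from equally spaced initial points and keep the best."""
    starts = [k / (num_points - 1) for k in range(num_points)]
    candidates = [gradient_ascent(f, b0) for b0 in starts]
    return max(candidates, key=f)


if __name__ == "__main__":
    single = gradient_ascent(toy_rate, 0.0)
    multi = esmpi_ga(toy_rate)
    print(single, toy_rate(single))
    print(multi, toy_rate(multi))
```

A single run started at 0 stalls at a nearby local maximum of the toy objective, while the multi-start version reaches the better one. The cost is one full gradient-ascent run per starting point, which matches the complexity concern raised in the abstract.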
As a promising solution for improving communication quality, the unmanned aerial vehicle (UAV) has been widely integrated into wireless networks. In this paper, to enhance the message exchange rate between User1 (U1) and User2 (U2), an intelligent reflective surface (IRS)-and-UAV-assisted two-way amplify-and-forward (AF) relay wireless system is proposed, where U1 and U2 communicate with each other via a UAV-mounted IRS and an AF relay. An optimization problem of maximizing the minimum rate is formulated, where the variables to be optimized are the AF relay beamforming matrix and the IRS phase shifts of two time slots. To achieve a maximum rate, a low-complexity alternately iterative (AI) scheme based on zero forcing and successive convex approximation (LC-ZF-SCA) is put forward, where the AF relay beamforming matrix is derived in semi-closed form by the ZF method, and the IRS phase shift vectors of the two time slots are optimized by the SCA algorithm. To obtain a significant rate enhancement, a high-performance AI method based on one step, semidefinite programming and penalty SCA (ONS-SDP-PSCA) is proposed, where the beamforming matrix at the AF relay is first solved by singular value decomposition and the ONS method, and the IRS phase shift matrices of the two time slots are optimized by the SDP and PSCA algorithms. Simulation results show that the rate performance of the proposed LC-ZF-SCA and ONS-SDP-PSCA methods surpasses that of the random-phase and AF-relay-only benchmarks. In particular, when the total transmit power is 30 dBm, the two proposed methods harvest more than 68.5% rate gain compared to the random-phase and AF-relay-only benchmarks. Meanwhile, the ONS-SDP-PSCA method achieves a higher rate than the LC-ZF-SCA method at the cost of extremely high complexity.
In this paper, we propose a scribble-based video colorization network with temporal aggregation called SVCNet. It can colorize monochrome videos based on different user-given color scribbles. It addresses three common issues in the scribble-based video colorization area: colorization vividness, temporal consistency, and color bleeding. To improve the colorization quality and strengthen the temporal consistency, we adopt two sequential sub-networks in SVCNet for precise colorization and temporal smoothing, respectively. The first stage includes a pyramid feature encoder to incorporate color scribbles with a grayscale frame, and a semantic feature encoder to extract semantics. The second stage fine-tunes the output from the first stage by aggregating the information of neighboring colorized frames (as short-range connections) and the first colorized frame (as a long-range connection). To alleviate the color bleeding artifacts, we learn video colorization and segmentation simultaneously. Furthermore, we perform the majority of operations at a fixed small image resolution and use a Super-resolution Module at the tail of SVCNet to recover the original size. This allows SVCNet to fit different image resolutions at inference. Finally, we evaluate the proposed SVCNet on the DAVIS and Videvo benchmarks. The experimental results demonstrate that SVCNet produces both higher-quality and more temporally consistent videos than other well-known video colorization approaches. The code and models can be found at https://github.com/zhaoyuzhi/SVCNet.
Since the reconfigurable intelligent surface (RIS) is regarded as a passive reflector that can enhance rate performance, a RIS-aided amplify-and-forward (AF) relay network is presented. By jointly optimizing the beamforming matrix at the AF relay and the phase-shift matrices at the RIS, two schemes are put forward to solve the signal-to-noise ratio (SNR) maximization problem. First, to achieve a high rate, a high-performance alternating optimization (AO) method based on the Charnes-Cooper transformation and semidefinite programming (CCT-SDP) is proposed, where the optimization problem is decomposed into three subproblems solved by CCT-SDP, and rank-one solutions are recovered by Gaussian randomization. However, the optimization variables in the CCT-SDP method are matrices, which leads to extremely high complexity. To reduce the complexity, a low-complexity AO scheme based on Dinkelbach's transformation and successive convex approximation (DT-SCA) is put forward, where the matrix variables are transformed into vector variables and the three decoupled subproblems are solved by DT-SCA. Simulation results verify that, compared to two benchmarks (i.e., a RIS-assisted AF relay network with random phase and an AF relay network without RIS), the proposed CCT-SDP and DT-SCA schemes achieve better rate performance. Furthermore, the rate of the low-complexity DT-SCA method is close to that of the CCT-SDP method.
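Dinkelbach's transformation, on which DT-SCA builds, replaces a ratio maximization max f(x)/g(x) with a sequence of subtractive problems max f(x) - lambda g(x). The sketch below shows the generic iteration on a toy scalar problem. It assumes a finite candidate set, so the inner problem is solved by enumeration rather than by SCA. The function name `dinkelbach` and the toy numerator and denominator are hypothetical and unrelated to the paper's SNR expression.

```python
def dinkelbach(numerator, denominator, candidates, tol=1e-9, max_iter=100):
    """Maximize numerator(x) / denominator(x) over a finite candidate set.

    Requires denominator(x) > 0 for every candidate.
    Returns the maximizer and the optimal ratio.
    """
    x = candidates[0]
    lam = numerator(x) / denominator(x)
    for _ in range(max_iter):
        # Inner problem: maximize the subtractive form for the current lambda.
        x = max(candidates, key=lambda c: numerator(c) - lam * denominator(c))
        gap = numerator(x) - lam * denominator(x)
        # Outer update: lambda becomes the ratio achieved by the new point.
        lam = numerator(x) / denominator(x)
        if gap < tol:
            break  # the subtractive optimum is zero, so lambda is the best ratio
    return x, lam


if __name__ == "__main__":
    grid = [k / 1000.0 for k in range(2001)]  # x in [0, 2]
    x_best, ratio = dinkelbach(lambda x: 1.0 + 4.0 * x, lambda x: 1.0 + x * x, grid)
    print(x_best, ratio)
```

The appeal of the transformation is that each inner problem has no fraction in it, which is what makes a convex approximation of the subproblems practical.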
Intelligent reflecting surface (IRS) is an emerging technology for wireless communication composed of a large number of low-cost passive devices with reconfigurable parameters, which can reflect signals with a certain phase shift and is capable of building a programmable communication environment. In this paper, to avoid the high hardware cost and energy consumption in spatial modulation (SM), an IRS-aided hybrid secure SM (SSM) system with a hybrid precoder is proposed. To improve the security performance, we formulate an optimization problem to maximize the secrecy rate (SR) by jointly optimizing the beamforming at the IRS and the hybrid precoding at the transmitter. Considering that the SR has no closed-form expression, an approximate SR (ASR) expression is derived as the objective function. To improve the SR performance, three IRS beamforming methods, called IRS alternating direction method of multipliers (IRS-ADMM), IRS block coordinate ascent (IRS-BCA) and IRS semi-definite relaxation (IRS-SDR), are proposed. As for the hybrid precoding design, an approximated secrecy rate-successive convex approximation (ASR-SCA) method and a cut-off rate-gradient ascent (COR-GA) method are proposed. Simulation results demonstrate that the proposed IRS-SDR and IRS-ADMM beamformers harvest substantial SR performance gains over IRS-BCA. In particular, the proposed IRS-ADMM and IRS-BCA have low complexity at the expense of a small performance loss compared with IRS-SDR. For hybrid precoding, the proposed ASR-SCA performs better than COR-GA in the high transmit power region.
In this paper, a hybrid IRS-aided amplify-and-forward (AF) relay wireless network is considered, where an optimization problem is formulated to maximize the signal-to-noise ratio (SNR) by jointly optimizing the beamforming matrix at the AF relay and the reflecting coefficient matrices at the IRS, subject to the transmit power budgets at the source, AF relay, and hybrid IRS and to the unit-modulus constraint on the passive IRS phase shifts. To achieve high rate performance and extend the coverage range, a high-performance method based on semidefinite relaxation and fractional programming (HP-SDR-FP) is presented. Due to its extremely high complexity, a low-complexity method based on successive convex approximation and FP (LC-SCA-FP) is put forward. To further reduce the complexity, a lower-complexity method based on whitening filter, general power iterative and generalized Rayleigh-Ritz (WF-GPI-GRR) is proposed. Unlike the above two methods, it assumes that the amplifying coefficients of all active IRS elements are equal, so that the corresponding analytical solution of the amplifying coefficient can be obtained from the transmit powers at the AF relay and hybrid IRS. Simulation results show that the three proposed methods can greatly improve the rate performance compared to existing networks, such as the passive IRS-aided AF relay network and the AF-relay-only network. In particular, a rate gain of approximately 50.0% over the existing networks is achieved in the high power budget region of the hybrid IRS. Moreover, it is verified that the three proposed beamforming methods rank in increasing order of rate performance as follows: WF-GPI-GRR, LC-SCA-FP and HP-SDR-FP.
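For reference, the generalized Rayleigh-Ritz theorem named in WF-GPI-GRR can be stated as follows. The symbols are generic and not taken from the paper: A is assumed Hermitian, B is assumed Hermitian positive definite, and w is the vector being optimized. How the paper maps its SNR expression onto this form is not specified in the abstract.

```latex
\max_{\mathbf{w} \neq \mathbf{0}}
\frac{\mathbf{w}^{H}\mathbf{A}\mathbf{w}}{\mathbf{w}^{H}\mathbf{B}\mathbf{w}}
= \lambda_{\max}\!\left(\mathbf{B}^{-1}\mathbf{A}\right),
\qquad
\mathbf{A}\mathbf{w}^{\star} = \lambda_{\max}\,\mathbf{B}\,\mathbf{w}^{\star}.
```

With the factorization B = L L^H, the substitution v = L^H w turns the ratio into an ordinary Rayleigh quotient of L^{-1} A L^{-H}, which is one common way a whitening step and a Rayleigh-Ritz step fit together.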