Huy H. Nguyen

Fine-Tuning Text-To-Image Diffusion Models for Class-Wise Spurious Feature Generation

Feb 13, 2024
AprilPyone MaungMaung, Huy H. Nguyen, Hitoshi Kiya, Isao Echizen

We propose a method for generating spurious features by leveraging large-scale text-to-image diffusion models. Although previous work detected spurious features in a large-scale dataset like ImageNet and introduced Spurious ImageNet, we found that not all spurious images are spurious across different classifiers. Although spurious images help measure a classifier's reliance on spurious features, filtering many images from the Internet to find more spurious features is time-consuming. To this end, we use an existing approach for personalizing large-scale text-to-image diffusion models with the available discovered spurious images and propose a new spurious feature similarity loss based on the neural features of an adversarially robust model. Specifically, we fine-tune Stable Diffusion with several reference images from Spurious ImageNet using a modified objective that incorporates the proposed spurious feature similarity loss. Experimental results show that our method can generate spurious images that are consistently spurious across different classifiers. Moreover, the generated spurious images are visually similar to the reference images from Spurious ImageNet.
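
The abstract does not give the exact form of the modified objective, so the following is only a minimal sketch of how a feature similarity term might be added to a fine-tuning loss. It assumes a cosine-distance term and a scalar weight; the names `combined_loss`, `gen_features`, `ref_features`, and `weight` are hypothetical and do not come from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("feature vectors must be non-zero")
    return dot / (norm_a * norm_b)


def combined_loss(denoising_loss, gen_features, ref_features, weight=0.1):
    """Add a weighted feature-similarity penalty to a base loss.

    Assumption: the penalty is 1 - cosine similarity between features of a
    generated image and a reference image, both taken from a robust model.
    The paper's actual loss may differ.
    """
    similarity_loss = 1.0 - cosine_similarity(gen_features, ref_features)
    return denoising_loss + weight * similarity_loss
```

With this form, identical feature vectors add no penalty, and orthogonal ones add the full `weight`.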

Enhancing Robustness of LLM-Synthetic Text Detectors for Academic Writing: A Comprehensive Analysis

Jan 16, 2024
Zhicheng Dou, Yuchen Guo, Ching-Chun Chang, Huy H. Nguyen, Isao Echizen

The emergence of large language models (LLMs), such as Generative Pre-trained Transformer 4 (GPT-4) used by ChatGPT, has profoundly impacted the academic and broader community. While these models offer numerous advantages in terms of revolutionizing work and study methods, they have also garnered significant attention due to their potential negative consequences. One example is generating academic reports or papers with little to no human contribution. Consequently, researchers have focused on developing detectors to address the misuse of LLMs. However, most existing methods prioritize achieving higher accuracy on restricted datasets, neglecting the crucial aspect of generalizability. This limitation hinders their practical application in real-life scenarios where reliability is paramount. In this paper, we present a comprehensive analysis of the impact of prompts on the text generated by LLMs and highlight the potential lack of robustness in one of the current state-of-the-art GPT detectors. To mitigate these issues concerning the misuse of LLMs in academic writing, we propose a reference-based Siamese detector named Synthetic-Siamese which takes a pair of texts, one as the inquiry and the other as the reference. Our method effectively addresses the lack of robustness of previous detectors (OpenAI detector and DetectGPT) and significantly improves the baseline performances in realistic academic writing scenarios by approximately 67% to 95%.

Cross-Attention Watermarking of Large Language Models

Jan 12, 2024
Folco Bertini Baldassini, Huy H. Nguyen, Ching-Chung Chang, Isao Echizen

A new approach to linguistic watermarking of language models is presented in which information is imperceptibly inserted into the output text while preserving its readability and original meaning. A cross-attention mechanism is used to embed watermarks in the text during inference. Two methods using cross-attention are presented that minimize the effect of watermarking on the performance of a pretrained model. Exploration of different training strategies for optimizing the watermarking and of the challenges and implications of applying this approach in real-world scenarios clarified the tradeoff between watermark robustness and text quality. Watermark selection substantially affects the generated output for high entropy sentences. This proactive watermarking approach has potential application in future model development.

* 5 pages, 3 figures. Accepted to ICASSP 2024

Surface Normal Estimation with Transformers

Jan 11, 2024
Barry Shichen Hu, Siyun Liang, Johannes Paetzold, Huy H. Nguyen, Isao Echizen, Jiapeng Tang

We propose the use of a Transformer to accurately predict normals from point clouds with noise and density variations. Previous learning-based methods utilize PointNet variants to explicitly extract multi-scale features at different input scales, then focus on a surface fitting method by which local point cloud neighborhoods are fitted to a geometric surface approximated by either a polynomial function or a multi-layer perceptron (MLP). However, fitting surfaces to fixed-order polynomial functions can suffer from overfitting or underfitting, and learning MLP-represented hyper-surfaces requires pre-generated per-point weights. To avoid these limitations, we first unify the design choices in previous works and then propose a simplified Transformer-based model to extract richer and more robust geometric features for the surface normal estimation task. Through extensive experiments, we demonstrate that our Transformer-based method achieves state-of-the-art performance on both the synthetic shape dataset PCPNet, and the real-world indoor scene dataset SceneNN, exhibiting more noise-resilient behavior and significantly faster inference. Most importantly, we demonstrate that the sophisticated hand-designed modules in existing works are not necessary to excel at the task of surface normal estimation.

Generalized Deepfakes Detection with Reconstructed-Blended Images and Multi-scale Feature Reconstruction Network

Dec 13, 2023
Yuyang Sun, Huy H. Nguyen, Chun-Shien Lu, ZhiYong Zhang, Lu Sun, Isao Echizen

The growing diversity of digital face manipulation techniques has led to an urgent need for a universal and robust detection technology to mitigate the risks posed by malicious forgeries. We present a blending-based detection approach that remains robust on unseen datasets. It combines two components: a method for generating synthetic training samples, i.e., reconstructed blended images, that incorporate potential deepfake generator artifacts; and a detection model, a multi-scale feature reconstruction network, that captures the generic boundary artifacts and noise distribution anomalies introduced by digital face manipulations. Experiments demonstrated that this approach results in better performance in both cross-manipulation detection and cross-dataset detection on unseen data.

How Close are Other Computer Vision Tasks to Deepfake Detection?

Oct 02, 2023
Huy H. Nguyen, Junichi Yamagishi, Isao Echizen

In this paper, we challenge the conventional belief that supervised ImageNet-trained models have strong generalizability and are suitable for use as feature extractors in deepfake detection. We present a new measurement, "model separability," for visually and quantitatively assessing a model's raw capacity to separate data in an unsupervised manner. We also present a systematic benchmark for determining the correlation between deepfake detection and other computer vision tasks using pre-trained models. Our analysis shows that pre-trained face recognition models are more closely related to deepfake detection than other models. Additionally, models trained using self-supervised methods are more effective in separation than those trained using supervised methods. After fine-tuning all models on a small deepfake dataset, we found that self-supervised models deliver the best results, but there is a risk of overfitting. Our results provide valuable insights that should help researchers and practitioners develop more effective deepfake detection models.

* Accepted to be Published in Proceedings of the IEEE International Joint Conference on Biometrics (IJCB 2023)

Defending Against Physical Adversarial Patch Attacks on Infrared Human Detection

Sep 27, 2023
Lukas Strack, Futa Waseda, Huy H. Nguyen, Yinqiang Zheng, Isao Echizen

Infrared detection is an emerging technique for safety-critical tasks owing to its remarkable anti-interference capability. However, recent studies have revealed that it is vulnerable to physically realizable adversarial patches, posing risks in its real-world applications. To address this problem, we are the first to investigate defense strategies against adversarial patch attacks on infrared detection, especially human detection. We have devised a straightforward defense strategy, patch-based occlusion-aware detection (POD), which efficiently augments training samples with random patches and subsequently detects them. POD not only robustly detects people but also identifies adversarial patch locations. Surprisingly, while being extremely computationally efficient, POD easily generalizes to state-of-the-art adversarial patch attacks that are unseen during training. Furthermore, POD improves detection precision even in a clean (i.e., no-patch) situation due to the data augmentation effect. Evaluation demonstrated that POD is robust to adversarial patches of various shapes and sizes. Our baseline approach is shown to be a viable defense mechanism for real-world infrared human detection systems, paving the way for future research directions.

* Lukas Strack and Futa Waseda contributed equally. 4 pages, 2 figures, Under-review
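
The augmentation step described in the abstract (adding random patches to training samples so that the detector can also learn to locate them) can be illustrated with a small sketch. This is not the authors' implementation: the rectangular patch shape, the uniform noise fill, the size limits, and the name `add_random_patch` are all assumptions made for illustration.

```python
import random


def add_random_patch(image, rng, min_size=2, max_size=4):
    """Paste a random-noise rectangle onto a grayscale image.

    `image` is a list of rows of floats in [0, 1]. Returns a copy of the
    image with the patch applied and the patch bounding box as
    (left, top, right, bottom), which could serve as an extra detection label.
    Assumes the image is at least `min_size` pixels in each dimension.
    """
    height, width = len(image), len(image[0])
    patch_h = rng.randint(min_size, min(max_size, height))
    patch_w = rng.randint(min_size, min(max_size, width))
    top = rng.randint(0, height - patch_h)
    left = rng.randint(0, width - patch_w)

    augmented = [row[:] for row in image]  # leave the input untouched
    for y in range(top, top + patch_h):
        for x in range(left, left + patch_w):
            augmented[y][x] = rng.random()  # hypothetical fill: uniform noise

    return augmented, (left, top, left + patch_w, top + patch_h)
```

A training pipeline would call this per sample, for example `add_random_patch(image, random.Random(0))`, and append the returned box to the sample's annotations under a separate "patch" class.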

Face Forgery Detection Based on Facial Region Displacement Trajectory Series

Dec 07, 2022
YuYang Sun, ZhiYong Zhang, Isao Echizen, Huy H. Nguyen, ChangZhen Qiu, Lu Sun

Deep-learning-based technologies such as deepfakes have been attracting widespread attention in both society and academia, particularly those used to synthesize forged face images. These automatic face manipulation technologies, which require no professional skill, can be used to replace the face in an original image or video with any target object while maintaining the expression and demeanor. Since human faces are closely related to identity characteristics, maliciously disseminated identity-manipulated videos could trigger a crisis of public trust in the media and could even have serious political, social, and legal implications. To effectively detect manipulated videos, we focus on the position offset in the face blending process, which results from the forced affine transformation of the normalized forged face. We introduce a method for detecting manipulated videos that is based on the trajectory of the facial region displacement. Specifically, we develop a virtual-anchor-based method for extracting the facial trajectory, which can robustly represent displacement information. This information is used to construct a network, based on dual-stream spatial-temporal graph attention and a gated recurrent unit backbone, for exposing multidimensional artifacts in the trajectory sequences of manipulated videos. Testing of our method on various manipulation datasets demonstrated that its accuracy and generalization ability are competitive with those of the leading detection methods.

Analysis of Master Vein Attacks on Finger Vein Recognition Systems

Oct 18, 2022
Huy H. Nguyen, Trung-Nghia Le, Junichi Yamagishi, Isao Echizen

Finger vein recognition (FVR) systems are used commercially, especially in ATMs, for customer verification. Thus, it is essential to measure their robustness against various attack methods, especially when a hand-crafted FVR system is used without any countermeasures. In this paper, we are the first in the literature to introduce master vein attacks, in which we craft a vein-looking image that an FVR system falsely matches with as many identities as possible. We present two methods for generating master veins for use in attacking these systems. The first uses an adaptation of the latent variable evolution algorithm with a proposed generative model (a multi-stage combination of beta-VAE and WGAN-GP models). The second uses an adversarial machine learning attack method to attack a strong surrogate CNN-based recognition system. The two methods can be easily combined to boost their attack ability. Experimental results demonstrated that the proposed methods alone and together achieved false acceptance rates up to 73.29% and 88.79%, respectively, against Miura's hand-crafted FVR system. We also point out that Miura's system is easily compromised by non-vein-looking samples generated by a WGAN-GP model, with false acceptance rates up to 94.21%. The results raise the alarm about the robustness of such systems and suggest that master vein attacks should be considered an important security threat.

* Accepted to be Published in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023
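
For readers unfamiliar with the metric, the false acceptance rate of a single master sample can be sketched as the fraction of enrolled identities it falsely matches. The scoring convention below (higher score means a closer match, accepted at or above a threshold) and the name `false_acceptance_rate` are assumptions; the paper's evaluation protocol may differ.

```python
def false_acceptance_rate(match_scores, threshold):
    """Fraction of enrolled identities that accept an impostor sample.

    `match_scores` holds one similarity score per enrolled identity for the
    same impostor (e.g. master vein) image. Assumption: a score at or above
    `threshold` counts as an acceptance.
    """
    if not match_scores:
        raise ValueError("match_scores must not be empty")
    accepted = sum(1 for score in match_scores if score >= threshold)
    return accepted / len(match_scores)
```

For example, scores of `[0.9, 0.2, 0.7, 0.4]` with a threshold of `0.5` mean two of four identities accept the sample, a rate of 0.5.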

Rethinking Adversarial Examples for Location Privacy Protection

Jun 28, 2022
Trung-Nghia Le, Ta Gu, Huy H. Nguyen, Isao Echizen

We have investigated a new application of adversarial examples, namely location privacy protection against landmark recognition systems. We introduce mask-guided multimodal projected gradient descent (MM-PGD), in which adversarial examples are trained on different deep models. Image contents are protected by analyzing the properties of regions to identify the ones most suitable for blending in adversarial examples. We investigated two region identification strategies: class activation map-based MM-PGD, in which the internal behaviors of trained deep models are targeted; and human-vision-based MM-PGD, in which regions that attract less human attention are targeted. Experiments on the Places365 dataset demonstrated that these strategies are potentially effective in defending against black-box landmark recognition systems without the need for much image manipulation.
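
The idea of restricting an adversarial perturbation to selected regions can be sketched as one step of projected gradient descent with a mask. This is a generic illustration, not the authors' MM-PGD: it assumes an L-infinity budget, pixel values in [0, 1], and a loss gradient that has already been aggregated over whatever models are attacked. The name `masked_pgd_step` and all parameter defaults are hypothetical.

```python
def masked_pgd_step(image, original, gradient, mask, step_size=0.01, epsilon=0.03):
    """One untargeted PGD step restricted to masked pixels.

    All arguments are flat lists of equal length. `mask` holds 1 where the
    image may be perturbed and 0 elsewhere (for example, regions chosen from
    a class activation map or a human-attention map).
    """
    updated = []
    for x, x0, g, m in zip(image, original, gradient, mask):
        sign = (g > 0) - (g < 0)  # sign of the loss gradient: -1, 0, or 1
        candidate = x + step_size * sign * m  # ascend the loss inside the mask
        # Project back into the epsilon-ball around the original pixel.
        candidate = max(x0 - epsilon, min(x0 + epsilon, candidate))
        # Keep the pixel in the valid intensity range.
        updated.append(max(0.0, min(1.0, candidate)))
    return updated
```

Repeating this step while recomputing the gradient gives the usual iterative attack; pixels outside the mask never change, which is what keeps the visible manipulation small.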
