Models, code, and papers for "Yuankai Huo":
Longitudinal reproducibility is an essential concern in automated medical image segmentation, yet it has proven to be an elusive objective, as manual brain structure tracings have shown more than 10% variability. To improve reproducibility, longitudinal segmentation (4D) approaches have been investigated to reconcile temporal variations with traditional 3D approaches. In the past decade, multi-atlas label fusion has become a state-of-the-art segmentation technique for 3D images, and many efforts have been made to adapt it to a 4D longitudinal fashion. However, the previous methods were either limited by using application-specific energy functions (e.g., surface fusion and multi-model fusion) or only considered temporal smoothness on two consecutive time points (t and t+1) under a sparsity assumption. Therefore, a 4D multi-atlas label fusion theory for general label fusion purposes that simultaneously considers temporal consistency on all time points is appealing. Herein, we propose a novel longitudinal label fusion algorithm, called 4D joint label fusion (4DJLF), to incorporate temporal consistency modeling via non-local patch-intensity covariance models. The advantages of 4DJLF include: (1) 4DJLF operates under the general label fusion framework by simultaneously incorporating the spatial and temporal covariance on all longitudinal time points. (2) The proposed algorithm is a longitudinal generalization of a leading joint label fusion method (JLF) that has proven adaptable to a wide variety of applications. (3) The spatial-temporal consistency of atlases is modeled in a probabilistic model inspired by both voting-based and statistical fusion. The proposed approach improves the consistency of the longitudinal segmentation while retaining sensitivity compared with the original JLF approach using the same set of atlases. The method is available online as open source.
Whole brain segmentation and cortical surface parcellation are essential in understanding the anatomical-functional relationships of the brain. Multi-atlas segmentation has been regarded as one of the leading methods for whole brain segmentation. In our recent work, the multi-atlas technique has been adapted to surface reconstruction using a method called Multi-atlas CRUISE (MaCRUISE). The MaCRUISE method not only performed consistent volume-surface analyses but also showed advantages in robustness compared with the FreeSurfer method. However, a detailed surface parcellation was not provided by MaCRUISE, which hindered region of interest (ROI) based analyses on surfaces. Herein, the MaCRUISE surface parcellation (MaCRUISEsp) method is proposed to perform surface parcellation on the inner, central, and outer surfaces that are reconstructed by MaCRUISE. MaCRUISEsp parcellates each of the inner, central, and outer surfaces with 98 cortical labels using a volume segmentation based surface parcellation (VSBSP), following a topological correction step. To validate the performance of MaCRUISEsp, 21 scan-rescan magnetic resonance imaging (MRI) T1 volume pairs from the Kirby21 dataset were used to perform a reproducibility analysis. MaCRUISEsp achieved a median Dice Similarity Coefficient (DSC) of 0.948 for central surfaces. Meanwhile, FreeSurfer achieved 0.905 DSC for inner surfaces and 0.881 DSC for outer surfaces, while the proposed method achieved 0.929 DSC for inner surfaces and 0.835 DSC for outer surfaces. Qualitatively, the results are encouraging, but they are not directly comparable because the two approaches use different definitions of cortical labels.
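The reproducibility figures above are Dice similarity coefficients between scan and rescan labelings. As a minimal sketch (not the MaCRUISEsp evaluation code), the Dice coefficient for one label can be computed from two labelings of the same points as follows. The function name, the flat-list representation, and the toy labels are assumptions made for illustration.

```python
def dice_coefficient(labels_a, labels_b, label):
    """Dice similarity coefficient for one label between two labelings.

    labels_a and labels_b are equal-length sequences holding one label per
    voxel or surface vertex (a hypothetical flat representation).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same points")
    in_a = sum(1 for a in labels_a if a == label)
    in_b = sum(1 for b in labels_b if b == label)
    in_both = sum(1 for a, b in zip(labels_a, labels_b)
                  if a == label and b == label)
    if in_a + in_b == 0:
        # Label absent from both labelings: treated here as perfect
        # agreement, which is a convention rather than a rule.
        return 1.0
    return 2.0 * in_both / (in_a + in_b)


# Toy scan-rescan example with three labels (0, 1, 2).
scan = [1, 1, 2, 2, 0, 0]
rescan = [1, 2, 2, 2, 0, 1]
print(dice_coefficient(scan, rescan, 2))  # 2 * 2 / (2 + 3) = 0.8
```

A per-cohort summary such as a median DSC would then be taken over the per-label or per-subject values, depending on the study design.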
Probabilistic atlases provide essential spatial contextual information for image interpretation, Bayesian modeling, and algorithmic processing. Such atlases are typically constructed by grouping subjects with similar demographic information. Importantly, use of the same scanner minimizes inter-group variability. However, the generalizability and spatial specificity of such approaches are more limited than one might like. Inspired by Commowick's "Frankenstein's creature paradigm", which builds a person-specific anatomical atlas, we propose a data-driven framework to build a person-specific probabilistic atlas under a large-scale data scheme. The data-driven framework clusters regions with similar features using a point distribution model to learn different anatomical phenotypes. Regional structural atlases and corresponding regional probabilistic atlases are used as indices and targets in the dictionary. By indexing the dictionary, the whole brain probabilistic atlases adapt to each new subject quickly and can be used as spatial priors for visualization and processing. The novelties of this approach are: (1) It provides a new perspective on generating person-specific whole brain probabilistic atlases (132 regions) under a data-driven scheme across sites. (2) The framework employs a large amount of heterogeneous data (2349 images). (3) The proposed framework achieves low computational cost since only one affine registration and one Pearson correlation operation are required for a new subject. Our method matches individual regions better, with higher Dice similarity values when testing the probabilistic atlases. Importantly, the advantage of the large-scale scheme is demonstrated by the better performance obtained with the large-scale training data (1888 images) than with a smaller training set (720 images).
Whole brain segmentation on structural magnetic resonance imaging (MRI) is essential for understanding neuroanatomical-functional relationships. Traditionally, multi-atlas segmentation has been regarded as the standard method for whole brain segmentation. In the past few years, deep convolutional neural network (DCNN) segmentation methods have demonstrated their advantages in both accuracy and computational efficiency. Recently, we proposed the spatially localized atlas network tiles (SLANT) method, which is able to segment a 3D MRI brain scan into 132 anatomical regions. Commonly, DCNN segmentation methods yield inferior performance under external validation, especially when the testing patterns were not present in the training cohorts. Recently, we obtained a multi-sequence MRI brain cohort of 1480 clinically acquired, de-identified brain MRI scans from 395 patients using seven different MRI protocols. Moreover, each subject has at least two scans from different MRI protocols. Herein, we assess the SLANT method's intra- and inter-protocol reproducibility. SLANT achieved a coefficient of variation (CV) of less than 0.05 for intra-protocol experiments and less than 0.15 for inter-protocol experiments. The results show that the SLANT method achieved high intra- and inter-protocol reproducibility.
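The reproducibility measure reported above is the coefficient of variation (CV), the standard deviation divided by the mean. A minimal sketch for repeated volume measurements of one region follows. The abstract does not say whether a sample or population standard deviation was used, so the sample form is an assumption here, and the volumes are invented for illustration.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumed CV definition)."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean


# Hypothetical volumes (mm^3) of one region from three scans of one subject.
volumes = [1000.0, 1020.0, 980.0]
print(coefficient_of_variation(volumes))  # stdev 20 / mean 1000 = 0.02
```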
Brain imaging analysis on clinically acquired computed tomography (CT) is essential for the diagnosis, risk prediction of progression, and treatment of the structural phenotypes of traumatic brain injury (TBI). However, in real clinical imaging scenarios, CT images of other body regions (e.g., neck, abdomen, chest, pelvis) are typically captured along with whole brain CT scans. For instance, in a typical sample of a clinical TBI imaging cohort, only ~15% of CT scans actually contain whole brain CT images suitable for volumetric brain analyses; the remaining scans are partial brain or non-brain images. Therefore, a manual image retrieval process is typically required to isolate the whole brain CT scans from the entire cohort. However, manual image retrieval is time and resource consuming, and becomes even more difficult for larger cohorts. To alleviate the manual effort, in this paper we propose an automated 3D medical image retrieval pipeline, called deep montage-based image retrieval (dMIR), which performs classification on 2D montage images via a deep convolutional neural network. The novelty of the proposed method is to formulate the medical image retrieval task on montage images. In a cohort of 2000 clinically acquired TBI scans, 794 scans were used as training data, 206 scans were used as validation data, and the remaining 1000 scans were used as testing data. The proposed method achieved accuracy=1.0, recall=1.0, precision=1.0, f1=1.0 on the validation data, and accuracy=0.988, recall=0.962, precision=0.962, f1=0.962 on the testing data. Thus, the proposed dMIR is able to perform accurate CT whole brain image retrieval from large-scale clinical cohorts.
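To make the montage idea concrete, the sketch below tiles evenly spaced slices of a 3D volume into a single 2D image that a 2D classifier could consume. It is an illustration under stated assumptions, not the dMIR implementation: the grid layout, the even slice spacing, the nested-list volume format, and the function name `make_montage` are all hypothetical.

```python
def make_montage(volume, rows, cols):
    """Tile rows * cols evenly spaced slices of a 3D volume into one 2D image.

    volume is a list of slices; each slice is a list of pixel rows, and all
    slices share the same shape (an assumed, simplified volume format).
    """
    n_slices = len(volume)
    n_tiles = rows * cols
    height = len(volume[0])
    if n_tiles == 1:
        indices = [n_slices // 2]
    else:
        # Evenly spaced slice indices from the first slice to the last.
        indices = [round(i * (n_slices - 1) / (n_tiles - 1))
                   for i in range(n_tiles)]
    montage = []
    for r in range(rows):
        for y in range(height):
            line = []
            for c in range(cols):
                # Append pixel row y of the tile at grid position (r, c).
                line.extend(volume[indices[r * cols + c]][y])
            montage.append(line)
    return montage


# Toy volume: 5 slices of 2x2 pixels, each filled with its slice index.
toy_volume = [[[z, z], [z, z]] for z in range(5)]
for pixel_row in make_montage(toy_volume, 1, 3):
    print(pixel_row)  # slices 0, 2 and 4 side by side
```

Working on one 2D montage per scan keeps the classifier input small and fixed in size regardless of how many slices the original scan contains.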
A lack of generalizability is one key limitation of deep learning based segmentation. Typically, one manually labels new training images when segmenting organs in different imaging modalities or segmenting abnormal organs from distinct disease cohorts. The manual effort can be alleviated if one is able to reuse manual labels from one modality (e.g., MRI) to train a segmentation network for a new modality (e.g., CT). Previously, two-stage methods have been proposed that use cycle generative adversarial networks (CycleGAN) to synthesize training images for a target modality and then train a segmentation network independently using the synthetic images. However, these two independent stages did not use the complementary information between synthesis and segmentation. Herein, we propose a novel end-to-end synthesis and segmentation network (EssNet) to achieve unpaired MRI to CT image synthesis and CT splenomegaly segmentation simultaneously without using manual labels on CT. The end-to-end EssNet achieved a significantly higher median Dice similarity coefficient (0.9188) than the two-stage strategy (0.8801), and even higher than canonical multi-atlas segmentation (0.9125) and the ResNet method (0.9107), both of which used the CT manual labels.
An abdominal ultrasound examination, which is the most common ultrasound examination, requires substantial manual efforts to acquire standard abdominal organ views, annotate the views in texts, and record clinically relevant organ measurements. Hence, automatic view classification and landmark detection of the organs can be instrumental to streamline the examination workflow. However, this is a challenging problem given not only the inherent difficulties from the ultrasound modality, e.g., low contrast and large variations, but also the heterogeneity across tasks, i.e., one classification task for all views, and then one landmark detection task for each relevant view. While convolutional neural networks (CNN) have demonstrated more promising outcomes on ultrasound image analytics than traditional machine learning approaches, it becomes impractical to deploy multiple networks (one for each task) due to the limited computational and memory resources on most existing ultrasound scanners. To overcome such limits, we propose a multi-task learning framework to handle all the tasks by a single network. This network is integrated to perform view classification and landmark detection simultaneously; it is also equipped with global convolutional kernels, coordinate constraints, and a conditional adversarial module to leverage the performances. In an experimental study based on 187,219 ultrasound images, with the proposed simplified approach we achieve (1) view classification accuracy better than the agreement between two clinical experts and (2) landmark-based measurement errors on par with inter-user variability. The multi-task approach also benefits from sharing the feature extraction during the training process across all tasks and, as a result, outperforms the approaches that address each task individually.
Manually tracing regions of interest (ROIs) within the liver is the de facto standard method for measuring liver attenuation on computed tomography (CT) in diagnosing nonalcoholic fatty liver disease (NAFLD). However, manual tracing is resource intensive. To address this limitation and to expand the availability of a quantitative CT measure of hepatic steatosis, we propose the automatic liver attenuation ROI-based measurement (ALARM) method for automated liver attenuation estimation. The ALARM method consists of two major stages: (1) deep convolutional neural network (DCNN)-based liver segmentation and (2) automated ROI extraction. First, liver segmentation was achieved using our previously developed SS-Net. Then, a single central ROI (center-ROI) and three circular ROIs (periphery-ROI) were computed based on the liver segmentation and morphological operations. The ALARM method is available as an open source Docker container (https://github.com/MASILab/ALARM). 246 subjects with 738 abdomen CT scans from the African American-Diabetes Heart Study (AA-DHS) were used for external validation (testing), independent from the training and validation cohort (100 clinically acquired CT abdominal scans).
Recently, multi-task networks have been shown to offer both additional estimation capabilities and, perhaps more importantly, increased performance over single-task networks on a "main/primary" task. However, balancing the optimization criteria of multi-task networks across different tasks is an area of active exploration. Here, we extend a previously proposed 3D attention-based network with four additional multi-task subnetworks for the detection of lung cancer and four auxiliary tasks (diagnosis of asthma, chronic bronchitis, chronic obstructive pulmonary disease, and emphysema). We introduce and evaluate a learning policy, Periodic Focusing Learning Policy (PFLP), that alternates the dominance of tasks throughout training. To improve performance on the primary task, we propose an Internal-Transfer Weighting (ITW) strategy to suppress the loss functions of the auxiliary tasks in the final stages of training. To evaluate this approach, we examined 3386 patients (single scan per patient) from the National Lung Screening Trial (NLST) and de-identified data from the Vanderbilt Lung Screening Program, with a 2517/277/592 (scans) split for training, validation, and testing. Baseline networks include a single-task strategy and a multi-task strategy without adaptive weights (PFLP/ITW), while the primary experiments are multi-task trials with PFLP, ITW, or both. On the test set for lung cancer prediction, the baseline single-task network achieved a prediction AUC of 0.8080, and the multi-task baseline failed to converge (AUC 0.6720). However, applying PFLP helped the multi-task network, which achieved a test set lung cancer prediction AUC of 0.8402. Furthermore, our ITW technique boosted the PFLP-enabled multi-task network, which achieved an AUC of 0.8462 (McNemar test, p < 0.01).
Generalizability is an important problem in deep neural networks, especially in the context of the variability of data acquisition in clinical magnetic resonance imaging (MRI). Recently, the Spatially Localized Atlas Network Tiles (SLANT) approach has been shown to effectively segment whole brain non-contrast T1w MRI with 132 volumetric labels. Enhancing generalizability of SLANT would enable broader application of volumetric assessment in multi-site studies. Transfer learning (TL) is commonly used to update the neural network weights for local factors; yet, it is commonly recognized to risk degradation of performance on the original validation/test cohorts. Here, we explore TL by data augmentation to address these concerns in the context of adapting SLANT to anatomical variation and scanning protocol. We consider two datasets: First, we optimize for age with 30 T1w MRI of young children with manually corrected volumetric labels, and accuracy of automated segmentation defined relative to the manually provided truth. Second, we optimize for acquisition with 36 paired datasets of pre- and post-contrast clinically acquired T1w MRI, and accuracy of the post-contrast segmentations assessed relative to the pre-contrast automated assessment. For both studies, we augment the original TL step of SLANT with either only the new data or with both original and new data. Over baseline SLANT, both approaches yielded significantly improved performance (signed rank tests; pediatric: 0.89 vs. 0.82 DSC, p<0.001; contrast: 0.80 vs 0.76, p<0.001). The performance on the original test set decreased with the new-data only transfer learning approach, so data augmentation was superior to strict transfer learning.
Coronary artery calcium (CAC) is a biomarker of advanced subclinical coronary artery disease and predicts myocardial infarction and death prior to age 60 years. Slice-wise manual delineation has been regarded as the gold standard for coronary calcium detection. However, manual delineation is time and resource consuming and is impractical to apply to large-scale cohorts. In this paper, we propose the attention identical dual network (AID-Net) to perform CAC detection using scan-rescan longitudinal non-contrast CT scans with weakly supervised attention, using only per-scan level labels. To improve performance, 3D attention mechanisms were integrated into the AID-Net to provide complementary information for the classification tasks. Moreover, 3D Gradient-weighted Class Activation Mapping (Grad-CAM) was also proposed at the testing stage to interpret the behavior of the deep neural network. 5075 non-contrast chest CT scans were used as training, validation, and testing datasets. Baseline performance was assessed on the same cohort. The proposed AID-Net achieved superior performance in classification accuracy (0.9272) and AUC (0.9627).
Whole brain segmentation on structural magnetic resonance imaging (MRI) is essential for the non-invasive investigation of neuroanatomy. Historically, multi-atlas segmentation (MAS) has been regarded as the de facto standard method for whole brain segmentation. Recently, deep neural network approaches have been applied to whole brain segmentation by learning from random patches or 2D slices. Yet, few previous efforts have been made on detailed whole brain segmentation using 3D networks due to the following challenges: (1) fitting an entire whole brain volume into 3D networks is restricted by current GPU memory, and (2) there is a large number of target labels (e.g., > 100 labels) with a limited number of training 3D volumes (e.g., < 50 scans). In this paper, we propose the spatially localized atlas network tiles (SLANT) method to distribute multiple independent 3D fully convolutional networks to cover overlapped sub-spaces in a standard atlas space. This strategy simplifies the whole brain learning task to localized sub-tasks, which was enabled by combining canonical registration and label fusion techniques with deep learning. To address the second challenge, auxiliary labels on 5111 initially unlabeled scans were created by MAS for pre-training. In empirical validation, the state-of-the-art MAS method achieved mean Dice values of 0.76, 0.71, and 0.68, while the proposed method achieved 0.78, 0.73, and 0.71 on three validation cohorts. Moreover, the computational time was reduced from > 30 hours using MAS to ~15 minutes using the proposed method. The source code is available online at https://github.com/MASILab/SLANTbrainSeg
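The idea of covering a standard space with overlapping sub-spaces can be sketched as a simple tiling computation. The code below is an illustration only, not the SLANT implementation: the function names, the even spacing of tiles, and the toy volume and tile sizes are assumptions, and the actual tile layout used by SLANT is not stated in the abstract.

```python
from itertools import product


def tile_starts(axis_length, tile_size, n_tiles):
    """Start indices of n_tiles evenly spaced tiles along one axis.

    Neighbouring tiles overlap whenever n_tiles * tile_size > axis_length.
    """
    if tile_size > axis_length:
        raise ValueError("tile is larger than the axis")
    if n_tiles * tile_size < axis_length:
        raise ValueError("tiles cannot cover the whole axis")
    if n_tiles == 1:
        return [0]
    span = axis_length - tile_size
    # Integer arithmetic keeps the first tile at 0 and the last at span.
    return [(i * span) // (n_tiles - 1) for i in range(n_tiles)]


def tile_grid(shape, tile_shape, tiles_per_axis):
    """All tiles as tuples of (start, stop) ranges, one range per axis."""
    starts = [tile_starts(length, size, n)
              for length, size, n in zip(shape, tile_shape, tiles_per_axis)]
    return [tuple((s, s + size) for s, size in zip(corner, tile_shape))
            for corner in product(*starts)]


# Hypothetical 10x10x10 space covered by 2x2x2 tiles of size 6x6x6.
tiles = tile_grid((10, 10, 10), (6, 6, 6), (2, 2, 2))
print(len(tiles))   # 8
print(tiles[0])     # ((0, 6), (0, 6), (0, 6))
print(tiles[-1])    # ((4, 10), (4, 10), (4, 10))
```

In such a scheme, each tile would be handled by its own network, and voxels that fall in the overlap of several tiles would need their predictions merged, for example by label fusion as the abstract mentions.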
Annual low dose computed tomography (CT) lung screening is currently advised for individuals at high risk of lung cancer (e.g., heavy smokers between 55 and 80 years old). The recommended screening practice significantly reduces all-cause mortality, but the vast majority of screening results are negative for cancer. If patients at very low risk could be identified based on individualized, image-based biomarkers, the health care resources could be more efficiently allocated to higher risk patients and reduce overall exposure to ionizing radiation. In this work, we propose a multi-task (diagnosis and prognosis) deep convolutional neural network to improve the diagnostic accuracy over a baseline model while simultaneously estimating a personalized cancer-free progression time (CFPT). A novel Censored Regression Loss (CRL) is proposed to perform weakly supervised regression so that even single negative screening scans can provide small incremental value. Herein, we study 2287 scans from 1433 de-identified patients from the Vanderbilt Lung Screening Program (VLSP) and Molecular Characterization Laboratories (MCL) cohorts. Using five-fold cross-validation, we train a 3D attention-based network under two scenarios: (1) single-task learning with only classification, and (2) multi-task learning with both classification and regression. The single-task learning leads to a higher AUC compared with the Kaggle challenge winner pre-trained model (0.878 v. 0.856), and multi-task learning significantly improves the single-task one (AUC 0.895, p<0.01, McNemar test). In summary, the image-based predicted CFPT can be used in follow-up year lung cancer prediction and data assessment.
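The abstract does not define the Censored Regression Loss, so the following is only a sketch of one common way to regress on censored times, not the authors' formulation. It assumes that for a censored sample (a negative scan, where only a lower bound on the cancer-free time is known) the loss penalizes a prediction only when it falls below that bound, while an uncensored sample uses an ordinary squared error. All names and numbers are hypothetical.

```python
def censored_regression_loss(predicted, observed, is_censored):
    """Mean one-sided squared error for censored regression (assumed form).

    predicted:   predicted times, e.g. in years
    observed:    event time if uncensored, last known event-free time otherwise
    is_censored: True where only a lower bound on the true time is known
    """
    total = 0.0
    for p, t, censored in zip(predicted, observed, is_censored):
        if censored:
            # Only predictions below the known lower bound are penalized.
            shortfall = max(0.0, t - p)
            total += shortfall ** 2
        else:
            total += (p - t) ** 2
    return total / len(predicted)


# Toy example: one observed event and two censored follow-ups.
loss = censored_regression_loss(
    predicted=[2.0, 5.0, 1.0],
    observed=[3.0, 4.0, 2.0],
    is_censored=[False, True, True],
)
print(loss)  # (1 + 0 + 1) / 3
```

The one-sided term is what lets a single negative scan contribute a small training signal without asserting an exact event time.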
Early detection of lung cancer is essential in reducing mortality. Recent studies have demonstrated the clinical utility of low-dose computed tomography (CT) to detect lung cancer among individuals selected based on very limited clinical information. However, this strategy yields high false positive rates, which can lead to unnecessary and potentially harmful procedures. To address such challenges, we established a pipeline that co-learns from detailed clinical demographics and 3D CT images. Toward this end, we leveraged data from the Consortium for Molecular and Cellular Characterization of Screen-Detected Lesions (MCL), which focuses on early detection of lung cancer. A 3D attention-based deep convolutional neural net (DCNN) is proposed to identify lung cancer from the chest CT scan without prior anatomical location of the suspicious nodule. To improve upon the non-invasive discrimination between benign and malignant, we applied a random forest classifier to a dataset integrating clinical information to imaging data. The results show that the AUC obtained from clinical demographics alone was 0.635 while the attention network alone reached an accuracy of 0.687. In contrast when applying our proposed pipeline integrating clinical and imaging variables, we reached an AUC of 0.787 on the testing dataset. The proposed network both efficiently captures anatomical information for classification and also generates attention maps that explain the features that drive performance.
A key limitation of deep convolutional neural network (DCNN) based image segmentation methods is the lack of generalizability. Manually traced training images are typically required when segmenting organs in a new imaging modality or from a distinct disease cohort. The manual effort can be alleviated if the manually traced images in one imaging modality (e.g., MRI) can be used to train a segmentation network for another imaging modality (e.g., CT). In this paper, we propose an end-to-end synthetic segmentation network (SynSeg-Net) to train a segmentation network for a target imaging modality without having manual labels. SynSeg-Net is trained by using (1) unpaired intensity images from source and target modalities, and (2) manual labels only from the source modality. SynSeg-Net is enabled by the recent advances in cycle generative adversarial networks (CycleGAN) and DCNN. We evaluate the performance of SynSeg-Net in two experiments: (1) MRI to CT splenomegaly synthetic segmentation for abdominal images, and (2) CT to MRI total intracranial volume (TICV) synthetic segmentation for brain images. The proposed end-to-end approach achieved superior performance to two-stage methods. Moreover, SynSeg-Net achieved comparable performance to the traditional segmentation network using target modality labels in certain scenarios. The source code of SynSeg-Net is publicly available (https://github.com/MASILab/SynSeg-Net).
Detailed whole brain segmentation is an essential quantitative technique, which provides a non-invasive way of measuring brain regions from structural magnetic resonance imaging (MRI). Recently, deep convolutional neural networks (CNN) have been applied to whole brain segmentation. However, restricted by current GPU memory, 2D based methods, downsampling based 3D CNN methods, and patch-based high-resolution 3D CNN methods have been the de facto standard solutions. 3D patch-based high-resolution methods typically yield superior performance among CNN approaches on detailed whole brain segmentation (>100 labels); however, their performance is still commonly inferior to that of multi-atlas segmentation (MAS) methods due to the following challenges: (1) a single network is typically used to learn both spatial and contextual information for the patches, and (2) limited manually traced whole brain volumes are available (typically fewer than 50) for training a network. In this work, we propose the spatially localized atlas network tiles (SLANT) method to distribute multiple independent 3D fully convolutional networks (FCN) for high-resolution whole brain segmentation. To address the first challenge, multiple spatially distributed networks were used in the SLANT method, in which each network learned contextual information for a fixed spatial location. To address the second challenge, auxiliary labels on 5111 initially unlabeled scans were created by multi-atlas segmentation for training. Since the method integrated multiple traditional medical image processing methods with deep learning, we developed a containerized pipeline to deploy the end-to-end solution. The proposed method achieved superior performance compared with multi-atlas segmentation methods, while reducing the computational time from >30 hours to 15 minutes (https://github.com/MASILab/SLANTbrainSeg).
Spleen volume estimation using automated image segmentation techniques may be used to detect splenomegaly (abnormally enlarged spleen) on Magnetic Resonance Imaging (MRI) scans. In recent years, Deep Convolutional Neural Network (DCNN) segmentation methods have demonstrated advantages for abdominal organ segmentation. However, variations in both the size and shape of the spleen on MRI images may result in large false positive and false negative labeling when deploying DCNN based methods. In this paper, we propose the Splenomegaly Segmentation Network (SSNet) to address spatial variations when segmenting extraordinarily large spleens. SSNet was designed based on the framework of image-to-image conditional generative adversarial networks (cGAN). Specifically, the Global Convolutional Network (GCN) was used as the generator to reduce false negatives, while the Markovian discriminator (PatchGAN) was used to alleviate false positives. A cohort of clinically acquired 3D MRI scans (both T1 weighted and T2 weighted) from patients with splenomegaly was used to train and test the networks. The experimental results demonstrated a mean Dice coefficient of 0.9260 and a median Dice coefficient of 0.9262 using SSNet on independently tested MRI volumes of patients with splenomegaly.
Deep brain stimulation (DBS) has the potential to improve the quality of life of people with a variety of neurological diseases. A key challenge in DBS is the placement of a stimulation electrode in the anatomical location that maximizes efficacy and minimizes side effects. Pre-operative localization of the optimal stimulation zone can reduce surgical times and morbidity. Current methods of producing efficacy probability maps follow anatomical guidance on magnetic resonance imaging (MRI) to identify the areas with the highest efficacy in a population. In this work, we propose to revisit this problem as a classification problem, where each voxel in the MRI is a sample informed by the surrounding anatomy. We use a patch-based convolutional neural network to classify a stimulation coordinate as having a positive reduction in symptoms during surgery. We use a cohort of 187 patients with a total of 2,869 stimulation coordinates, from which 3D patches were extracted and associated with an efficacy score. We compare our results with a registration-based method of surgical planning. We show an improvement in the classification of intraoperative stimulation coordinates as a positive response in reduction of symptoms, with an AUC of 0.670 compared to a baseline registration-based approach, which achieves an AUC of 0.627 (p < 0.01). Although additional validation is needed, the proposed classification framework and deep learning method appear well-suited for improving pre-surgical planning and personalizing treatment strategies.
Tissue window filtering has been widely used in deep learning for computed tomography (CT) image analyses to improve training performance (e.g., soft tissue windows for abdominal CT). However, the effectiveness of tissue window normalization is questionable since the generalizability of the trained model might be further harmed, especially when such models are applied to new cohorts with different CT reconstruction kernels, contrast mechanisms, dynamic variations in the acquisition, and physiological changes. We evaluate the effectiveness of both with and without using soft tissue window normalization on multisite CT cohorts. Moreover, we propose a stochastic tissue window normalization (SWN) method to improve the generalizability of tissue window normalization. Different from the random sampling, the SWN method centers the randomization around the soft tissue window to maintain the specificity for abdominal organs. To evaluate the performance of different strategies, 80 training and 453 validation and testing scans from six datasets are employed to perform multi-organ segmentation using standard 2D U-Net. The six datasets cover the scenarios, where the training and testing scans are from (1) same scanner and same population, (2) same CT contrast but different pathology, and (3) different CT contrast and pathology. The traditional soft tissue window and nonwindowed approaches achieved better performance on (1). The proposed SWN achieved general superior performance on (2) and (3) with statistical analyses, which offers better generalizability for a trained model.
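A rough sketch of the two normalization strategies discussed above follows: a fixed soft tissue window and a stochastic window whose center and width are randomized around it. This is not the authors' SWN code. The window values (center 40 HU, width 400 HU), the uniform jitter, and the jitter ranges are assumptions chosen for illustration, since the abstract does not specify the sampling scheme.

```python
import random

# Assumed soft tissue window; these values are illustrative, not from the paper.
SOFT_TISSUE_CENTER_HU = 40.0
SOFT_TISSUE_WIDTH_HU = 400.0


def window_normalize(hu_values, center, width):
    """Clip Hounsfield units to [center - width/2, center + width/2], scale to [0, 1]."""
    low = center - width / 2.0
    high = center + width / 2.0
    return [(min(max(v, low), high) - low) / (high - low) for v in hu_values]


def stochastic_window_normalize(hu_values, rng,
                                center_jitter=20.0, width_jitter=50.0):
    """Window normalization with center and width jittered around the soft tissue window.

    The uniform jitter and its ranges are hypothetical choices.
    """
    center = SOFT_TISSUE_CENTER_HU + rng.uniform(-center_jitter, center_jitter)
    width = SOFT_TISSUE_WIDTH_HU + rng.uniform(-width_jitter, width_jitter)
    return window_normalize(hu_values, center, width)


voxels_hu = [-1000.0, -160.0, 40.0, 240.0, 1000.0]
print(window_normalize(voxels_hu, SOFT_TISSUE_CENTER_HU, SOFT_TISSUE_WIDTH_HU))
# [0.0, 0.0, 0.5, 1.0, 1.0]
print(stochastic_window_normalize(voxels_hu, random.Random(0)))
```

In training, the stochastic variant would typically be applied per sample as a form of augmentation, with the fixed or non-windowed input used at test time depending on the experimental setup.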
Human in-the-loop quality assurance (QA) is typically performed after medical image segmentation to ensure that the systems are performing as intended, as well as to identify and exclude outliers. By performing QA on large-scale, previously unlabeled testing data, categorical QA scores can be generated. In this paper, we propose a semi-supervised multi-organ segmentation deep neural network consisting of a traditional segmentation model generator and a QA-involved discriminator. A large-scale dataset of 2027 volumes is used to train the generator, whose 2-D montage images and segmentation masks with QA scores are used to train the discriminator. To generate the QA scores, the 2-D montage images were reviewed manually and coded 0 (success), 1 (errors consistent with published performance), and 2 (gross failure). Then, the ResNet-18 network was trained with 1623 montage images in equal distribution of all three code labels and achieved an accuracy of 94% for classification predictions, with 404 montage images withheld for the test cohort. To assess the performance of using the QA supervision, the discriminator was used as a loss function in a multi-organ segmentation pipeline. The inclusion of the QA-loss function boosted performance on the unlabeled test dataset from 714 patients to 951 patients over the baseline model. Additionally, the number of failures decreased from 606 (29.90%) to 402 (19.83%). The contributions of the proposed method are threefold: We show that (1) the QA scores can be used as a loss function to perform semi-supervised learning for unlabeled data, (2) the well-trained discriminator is learned from QA scores rather than traditional true/false labels, and (3) the performance of multi-organ segmentation on unlabeled datasets can be fine-tuned with more robustness and higher accuracy than the original baseline method.